openssl verify with 1B certificates

ebe ebe
Hello,

I am a CS graduate student doing a measurement study of the SSL ecosystem. I have approximately 1 billion SSL certificates, and I would like to run openssl verify on each one to sift out the invalid certificates. My major concern, as you might guess, is whether this verification is feasible given the size of my dataset. An alternative idea is to replicate the verification steps of openssl myself. More specifically, I am working with a Hadoop infrastructure, and I can perform some of the verification steps without running into scalability issues (e.g., checking that the certificate is within its notBefore/notAfter window, and matching subject key and authority key identifiers). With this approach, however, I feel that verifying the signature would be a big challenge. Any ideas on how I can tackle these problems?

Regards,
Ceyhun
--
openssl-users mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-users
Re: openssl verify with 1B certificates

Richard Moore
It depends on what information you need. If you just need a binary valid/not valid answer, then prune the data set first and then verify. If you want a more fine-grained data set, then don't prune. Write some code - forking and running openssl verify for each certificate will be insanely slow, so don't do that. I doubt you really have a billion unique certificates, so avoid testing duplicates. Also, don't forget that you really need certificate chains, so I hope you captured the intermediate certificates too!
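
As a rough illustration of the "avoid testing duplicates" advice, here is a minimal Python sketch that keeps only the first copy of each certificate, keyed by the SHA-256 fingerprint of its DER encoding. The function name unique_certificates is made up for this example, and it assumes the certificates are already available as DER byte strings.

```python
import hashlib
from typing import Iterable, Iterator


def unique_certificates(der_certs: Iterable[bytes]) -> Iterator[bytes]:
    """Yield each distinct certificate once, in first-seen order.

    Assumption: every item is the raw DER encoding of one certificate,
    so byte-identical encodings are treated as the same certificate.
    """
    seen = set()
    for der in der_certs:
        fingerprint = hashlib.sha256(der).digest()
        if fingerprint not in seen:
            seen.add(fingerprint)
            yield der
```

An in-memory set is unlikely to be comfortable for a billion entries. On a Hadoop cluster the same fingerprint could instead serve as the grouping key, so that each reducer sees all copies of one certificate together.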

Cheers

Rich.

Re: openssl verify with 1B certificates

Jakob Bohm-7
Also consider using the functions that the "openssl verify"
command uses (source file: apps/verify.c), perhaps from a
bulk process that can be run on each CPU node on your
compute cluster.  With a little thought, these can be done
efficiently, with lots of reused (i.e. not repeated) actions,
such as setting up parameters, loading known CA and intermediary
certs, opening files that contain multiple certs, etc.
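
The "reused, not repeated" idea can be sketched in Python as a long-lived worker that pays the setup cost once and then loops over many certificates. This is only a sketch of the pattern: BulkVerifier, load_store and verify_one are hypothetical names, and the actual verification (for instance a binding to the functions used in apps/verify.c) is passed in as a callable rather than implemented here.

```python
from typing import Callable, Iterable, Iterator, Tuple, TypeVar

Store = TypeVar("Store")


class BulkVerifier:
    """Set up shared state once, then check many certificates with it."""

    def __init__(
        self,
        load_store: Callable[[], Store],
        verify_one: Callable[[Store, bytes], bool],
    ) -> None:
        # Expensive, one-time work per worker process: parameters,
        # known CA and intermediate certificates, and so on.
        self._store = load_store()
        self._verify_one = verify_one

    def verify_all(self, der_certs: Iterable[bytes]) -> Iterator[Tuple[bytes, bool]]:
        # Per-certificate work only; the store is reused, never reloaded.
        for der in der_certs:
            yield der, self._verify_one(self._store, der)
```

One such worker per CPU node (or per mapper) avoids both the process fork and the repeated trust-store loading that a per-certificate command-line invocation would incur.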

Enjoy

Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S.  https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark.  Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded
Re: openssl verify with 1B certificates

Michael Wojcik
A lot depends on what you mean by "verify", too.  TLS endpoints should perform a large number of checks on certificates; some of them aren't relevant for your purposes, and others might not be.

For example, a TLS client such as a browser will check whether the received entity certificate identifies the peer it wants to connect to - generally checking the subjAltName extensions, and possibly falling back on e.g. the CN of the subject DN if the certificate isn't X.509v3. That's not relevant in your case.

And then there are things like CRLs and OCSP checks. If you don't care about those, obviously that's work you don't have to do. What about, oh, certificate purpose, for example? Do you care about the chain length?

So what are you checking? My guess is the list is something like this:
1. Object is a valid X.509 certificate (ASN.1 parsing doesn't show any errors, structure is appropriate, contains required fields...).
2. Within the validity period, as you noted in your original message.
3. Valid signature. This means you'll need the public key of the signing certificate, of course. Are you going to chase it all the way to the root? Do you care about whether the root's in some collection of trust anchors?
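
Step 2 is cheap enough to run as a pre-filter before any signature work. A minimal Python sketch, assuming the notBefore/notAfter values are available as the text strings that openssl prints (for example "Jan  5 09:34:43 2018 GMT"). The function name within_validity and the parameter at are made up for this example. For a measurement study, at would probably be the time the certificate was observed rather than the current time, but that is a guess about the study design.

```python
import ssl


def within_validity(not_before: str, not_after: str, at: float) -> bool:
    """Return True if `at` falls inside the certificate's validity window.

    `not_before` and `not_after` are strings such as
    "Jan  5 09:34:43 2018 GMT"; `at` is seconds since the epoch (UTC).
    """
    start = ssl.cert_time_to_seconds(not_before)
    end = ssl.cert_time_to_seconds(not_after)
    return start <= at <= end
```

Certificates that fail this check can be dropped (or labeled) without ever loading an issuer key, which is the "prune first" approach when only a valid/not valid answer is needed.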

That's a lot simpler than verifying a peer certificate for TLS - my checklist for that is 11 steps, and recurses as it walks the chain. But it's still a fair bit of work.

Personally, for a project like this, as I harvested public keys I'd put them in a NoSQL key-value store, with the certificate subject DN as the key. Then I wouldn't have to find and parse the signing certificate for step 3, if I'd already stored the corresponding key.
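
A minimal sketch of such a store in Python, with a plain dict standing in for the NoSQL back end. IssuerKeyIndex and signature_ok are hypothetical names. The sketch keeps a set of keys per DN because more than one key can appear under the same subject DN (for example after a CA key rollover), and it assumes the DN strings have already been normalized to one canonical form. The signature check itself is supplied by the caller.

```python
from collections import defaultdict
from typing import Callable, Dict, Optional, Set


class IssuerKeyIndex:
    """Maps a subject DN to every public key harvested under that DN."""

    def __init__(self) -> None:
        self._keys: Dict[str, Set[bytes]] = defaultdict(set)

    def add(self, subject_dn: str, public_key: bytes) -> None:
        # Called while harvesting: record the key under the cert's subject DN.
        self._keys[subject_dn].add(public_key)

    def find_signer(
        self, issuer_dn: str, signature_ok: Callable[[bytes], bool]
    ) -> Optional[bytes]:
        # Look up by the *issuer* DN of the certificate being checked and
        # try each candidate key until one validates the signature.
        for key in sorted(self._keys.get(issuer_dn, ())):
            if signature_ok(key):
                return key
        return None
```

A lookup that returns None means either the issuer was never harvested or none of its known keys match, which is worth recording separately from a bad signature.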

Michael Wojcik
Distinguished Engineer, Micro Focus


