Multithreading: Global locks causing bottleneck in parallel SSL_write calls

Dipak Gaigole
Hi,
 
I have a Windows multi-threaded SSL server application that handles each client request in a new thread. The server handles different types of requests. One request type is “send file”, where the server thread has to read a file from the local filesystem and send its content to the client.
Server configuration:
    FIPS: Enabled
    SSL Protocol: TLSv1.2
    Cipher: AES256-SHA
 
It was observed that as the number of parallel threads increases, the throughput decreases.
To profile the server, I recompiled the OpenSSL and FIPS sources with debug symbol information. When the server was run under the statistical profiler “verysleepy” (http://www.codersnotes.com/sleepy), it pointed to the stack below as the hotspot consuming most of the time.
###################################
WaitForSingleObjectEx  KERNELBASE  [unknown]                                                   0     0x7fefd2610dc
CRYPTO_lock            LIBEAY64    c:\openssl_src\openssl-1.0.2f\crypto\cryptlib.c             597   0xfb0bb26
FIPS_lock              LIBEAY64    c:\fips_src\openssl-fips-2.0.10\fips\utl\fips_lck.c         69    0xfceb291
fips_drbg_bytes        LIBEAY64    c:\fips_src\openssl-fips-2.0.10\fips\rand\fips_drbg_rand.c  86    0xfcfe868
RAND_bytes             LIBEAY64    c:\openssl_src\openssl-1.0.2f\crypto\rand\rand_lib.c        159   0xfc0dbe5
tls1_enc               SSLEAY64    c:\openssl_src\openssl-1.0.2f\ssl\t1_enc.c                  786   0x3b6675c
do_ssl3_write          SSLEAY64    c:\openssl_src\openssl-1.0.2f\ssl\s3_pkt.c                  1042  0x3b4c336
ssl3_write_bytes       SSLEAY64    c:\openssl_src\openssl-1.0.2f\ssl\s3_pkt.c                  830   0x3b4badd
ssl3_write             SSLEAY64    c:\openssl_src\openssl-1.0.2f\ssl\s3_lib.c                  4404  0x3b4796c
SSL_write              SSLEAY64    c:\openssl_src\openssl-1.0.2f\ssl\ssl_lib.c                 1047  0x3b7a3e4
###################################
 
To check whether this behavior can be seen outside of our code, I wrote a standalone multi-threaded SSL server that performs the same task as “send file”. Profiling the standalone server points to a similar stack, so I was able to reproduce this behavior in a standalone program.
File size used: 340 MB
 
To find out how the bottleneck varies with the parallel thread count in the standalone SSL server program, I analyzed the behavior of one thread at different levels of parallelism. Here are the results:
######################
“Parallel thread count”  ->  “% of time spent waiting for the global lock”
1 -> 1 %
2 -> 2 %
5 -> 5 %
10 -> 40 %
15 -> 46 %
20 -> 65 %
25 -> 68 %
30 -> 70 %
######################
 
After digging into the FIPS code, I found that there is a global lock around the random number generation code, which causes the bottleneck when multiple threads want to perform SSL_write operations in parallel.
Code snippet from fips/rand/fips_drbg_rand.c:
######################
/* Since we only have one global PRNG used at any time in OpenSSL use a global
 * variable to store context.
 */
static DRBG_CTX ossl_dctx;
….
….
static int fips_drbg_bytes(unsigned char *out, int count)
    {
    DRBG_CTX *dctx = &ossl_dctx;
    int rv = 0;
    unsigned char *adin = NULL;
    size_t adinlen = 0;
    CRYPTO_w_lock(CRYPTO_LOCK_RAND);
    ….
    ….
    CRYPTO_w_unlock(CRYPTO_LOCK_RAND);
######################
 
As the comment in fips_drbg_rand.c says, do we really need to have one global PRNG at any time in OpenSSL? Does anyone have any suggestions on how the starvation of parallel SSL_write calls (due to the global lock) can be reduced? Any suggestions are welcome :)
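
For illustration, the contention pattern in the snippet above (one shared generator state behind one lock) can be contrasted with a per-thread state that needs no lock on the hot path. This is only a sketch under stated assumptions: it uses POSIX threads and C11 `_Thread_local`, `toy_rng` is a non-cryptographic placeholder and not the FIPS DRBG, and every name in it is hypothetical. Whether per-thread generator state is acceptable for a FIPS-validated module is a separate question that this sketch does not answer.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SEED 0x9E3779B97F4A7C15ULL

/* Toy generator state. NOT a DRBG and NOT cryptographically secure; it only
 * stands in for "state that must not be updated by two threads at once". */
typedef struct { uint64_t s; } toy_rng;

static void toy_rng_fill(toy_rng *r, unsigned char *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* One xorshift64 step per output byte. */
        r->s ^= r->s << 13;
        r->s ^= r->s >> 7;
        r->s ^= r->s << 17;
        out[i] = (unsigned char)(r->s & 0xff);
    }
}

/* Pattern A: a single shared state guarded by a single lock, as in the
 * quoted fips_drbg_bytes(). Every caller in every thread serializes here. */
static toy_rng global_rng = { TOY_SEED };
static pthread_mutex_t global_rng_lock = PTHREAD_MUTEX_INITIALIZER;

void shared_rand_bytes(unsigned char *out, size_t n)
{
    assert(out != NULL || n == 0);
    pthread_mutex_lock(&global_rng_lock);
    toy_rng_fill(&global_rng, out, n);
    pthread_mutex_unlock(&global_rng_lock);
}

/* Pattern B: one state per thread, so there is nothing to lock.
 * The seed is used only on the first call made by each thread; a real
 * design would seed each thread from an entropy source instead. */
static _Thread_local toy_rng local_rng;
static _Thread_local int local_rng_seeded;

void per_thread_rand_bytes(uint64_t seed, unsigned char *out, size_t n)
{
    assert(out != NULL || n == 0);
    if (!local_rng_seeded) {
        local_rng.s = seed ? seed : 1;   /* xorshift state must be nonzero */
        local_rng_seeded = 1;
    }
    toy_rng_fill(&local_rng, out, n);
}
```

With pattern A, the time spent waiting grows with the number of threads calling `shared_rand_bytes` concurrently, which matches the shape of the measurements above. With pattern B, threads never wait on each other, at the cost of having to seed and manage one state per thread.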
 
Thanks,
Dipak

--
openssl-users mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-users
Re: Multithreading: Global locks causing bottleneck in parallel SSL_write calls

OpenSSL - User mailing list
On 04/12/2017 05:54 AM, dipakgaigole wrote:
> Hi,
>
> I have a windows multi-threaded SSL server application which handles each client request in a new thread. The Server handles different types of requests. One of the request type is like “send file” where server thread has to read a file from local filesystem and send the content to the client.
> Server configurations:
>     FIPS: Enabled
>     SSL Protocol: TLSv1.2
>     Cipher: AES256-SHA

The OpenSSL PRNG story is currently not so great, yes.
But maybe you should try without FIPS, and also with a different cipher?  AES256-SHA is both CBC and SHA1, neither of which is really a current best practice.
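
As a rough illustration of how a suite name like AES256-SHA can be read: in OpenSSL's naming, AEAD suites carry the mode in the name (GCM, CCM, CHACHA20), and a name ending in "-SHA" denotes the SHA-1 based MAC. The sketch below encodes just that naming heuristic; the function names are hypothetical, and the authoritative answer for a given build is the Enc= and Mac= columns printed by `openssl ciphers -v`.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if an OpenSSL-style suite name advertises an AEAD mode.
 * Heuristic on the name only; it does not query the library. */
int suite_name_is_aead(const char *name)
{
    assert(name != NULL);
    return strstr(name, "GCM") != NULL ||
           strstr(name, "CCM") != NULL ||
           strstr(name, "CHACHA20") != NULL;
}

/* Returns 1 if the name ends in "-SHA", i.e. the suite uses the SHA-1
 * based MAC. Names such as "...-SHA256" or "...-SHA384" do not match. */
int suite_name_uses_sha1_mac(const char *name)
{
    size_t len;

    assert(name != NULL);
    len = strlen(name);
    return len >= 4 && strcmp(name + len - 4, "-SHA") == 0;
}
```

For example, "AES256-SHA" is reported as non-AEAD with a SHA-1 MAC, while "ECDHE-RSA-AES256-GCM-SHA384" is reported as AEAD without one.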

-Ben

Re: Multithreading: Global locks causing bottleneck in parallel SSL_write calls

Dipak Gaigole
> The OpenSSL PRNG story is currently not so great, yes.
> But maybe you should try without FIPS, and also with a different cipher?  AES256-SHA is both CBC and SHA1, neither of which is really a current best practice.
>
> -Ben

Thanks Ben. I will try disabling FIPS. Where can I find a current best-practice cipher list? Or can you suggest a few?

-Dipak
 
Re: Multithreading: Global locks causing bottleneck in parallel SSL_write calls

Michael Wojcik
> From: openssl-users [mailto:[hidden email]] On Behalf Of Dipak Gaigole
> Sent: Thursday, April 13, 2017 15:12

>  I will try with disabling FIPS.

Opinions differ, but many people - including myself - recommend not enabling FIPS mode unless it is explicitly required (generally because you work for the US Federal Government or a relatively small number of other organizations that let bureaucracy stand in the way of security).

> Where can i find current best practice cipher list? Or Can you suggest few?

The free ebook /OpenSSL Cookbook/ from Feisty Duck is a good place to start. It was updated only a year ago, so it's reasonably current.

https://www.feistyduck.com/books/openssl-cookbook/

Beyond that, you really need to be following current research, or at least the people who write knowledgeably about current research.

Ben wrote "AES256-SHA is both CBC and SHA1, neither of which is really a current best practice". Certainly the bloom is off the rose of SHA1, particularly since the "SHAttered" demonstration of a successful collision. The reality is that SHA1 is still fine for many purposes in practice; but if you're in a position to pick a better digest, it makes sense to do so. That means SHA-256 or SHA-384; or perhaps SHA3 with appropriate parameters, but SHA3 hasn't seen widespread adoption yet. (That's more or less by design - NIST wanted to standardize SHA3 before it was needed.)

Regarding CBC, he was presumably referring to the various issues with CBC mode and the general move to AE and AEAD combining modes, particularly GCM. AES-GCM suites are most people's default recommendation these days, when there aren't any compelling reasons to use something else. With GCM you have to be careful that you have a solid implementation and that you never reuse an IV under the same key, so it's a bit easier for a non-expert to screw up. But considering the aforementioned issues with CBC, it's easy to see why people recommend GCM.
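
To make the "never reuse an IV" point concrete, here is a minimal sketch of a deterministic nonce generator for code that calls GCM directly (inside TLS, the record layer does this for you). It assumes a 12-byte nonce built from a 4-byte fixed part plus an 8-byte counter, similar to the layout used by the TLS 1.2 AES-GCM suites; the type and function names are hypothetical. The point is that the counter never repeats for a given key and that the generator refuses to continue once the counter space is used up.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define GCM_NONCE_LEN 12

typedef struct {
    unsigned char salt[4];  /* fixed part, chosen once per key */
    uint64_t next_seq;      /* next unused counter value */
    int exhausted;          /* set once every counter value has been issued */
} nonce_gen;

void nonce_gen_init(nonce_gen *g, const unsigned char salt[4])
{
    assert(g != NULL && salt != NULL);
    memcpy(g->salt, salt, 4);
    g->next_seq = 0;
    g->exhausted = 0;
}

/* Writes the next 12-byte nonce to out. Returns 1 on success, or 0 if all
 * counter values have been issued (the key must then be replaced). */
int nonce_gen_next(nonce_gen *g, unsigned char out[GCM_NONCE_LEN])
{
    assert(g != NULL && out != NULL);
    if (g->exhausted)
        return 0;

    memcpy(out, g->salt, 4);
    for (int i = 0; i < 8; i++)   /* big-endian counter in bytes 4..11 */
        out[4 + i] = (unsigned char)(g->next_seq >> (56 - 8 * i));

    if (g->next_seq == UINT64_MAX)
        g->exhausted = 1;         /* never wrap around to 0 */
    else
        g->next_seq++;
    return 1;
}
```

One generator instance belongs to exactly one key; sharing it across threads would need its own synchronization, and starting a fresh instance with an old key would repeat nonces.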

There's a ton of information - more than a non-expert can be expected to absorb - on these topics available online, even if we ignore the actual primary research and just look at treatments for lay readers. Adam Langley talks about the problems with AES-CBC in particular in this post, for example:

https://security.googleblog.com/2013/11/a-roster-of-tls-cipher-suites-weaknesses.html

In TLS, AES-CBC is vulnerable to the BEAST (TLS 1.0 only) and Lucky13 attacks, given certain other conditions. Lucky13 (aka "Lucky Thirteen") actually applies to all block ciphers in CBC mode, if the implementation exposes certain timing side channels. These days decent implementations (including OpenSSL) try to remove or whiten side channels, but that's actually quite difficult to do comprehensively (see various pieces of research published over the past several years). Again, for many applications, these attacks simply aren't feasible. But many applications are developed without the benefit of a cryptographer who can look at them and decide whether you need to worry about them.


Michael Wojcik
Distinguished Engineer, Micro Focus



Re: Multithreading: Global locks causing bottleneck in parallel SSL_write calls

Jakob Bohm-7
On 13/04/2017 22:01, Michael Wojcik wrote:

> In TLS, AES-CBC is vulnerable to the BEAST (TLS 1.0 only) and Lucky13 attacks, given certain other conditions. Lucky13 (aka "Lucky Thirteen") actually applies to all block ciphers in CBC mode, if the implementation exposes certain timing side channels. These days decent implementations (including OpenSSL) try to remove or whiten side channels, but that's actually quite difficult to do comprehensively (see various pieces of research published over the past several years). Again, for many applications, these attacks simply aren't feasible. But many applications are developed without the benefit of a cryptographer who can look at them and decide whether you need to worry about them.

Please note that all of these "CBC vulnerabilities" you specifically
mention are SSL/TLS vulnerabilities in the particular ways that SSL3
and current TLS versions handle padding and IV management, not
issues with CBC itself.

Also note that GCM is very much a "marginal" design, operating at the
very edge of what is safe to do and furthermore putting all the
cryptographic "eggs" in one basket (AES and GF-2^n arithmetic).

Enjoy

Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S.  https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark.  Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded

Re: Multithreading: Global locks causing bottleneck in parallel SSL_write calls

Michael Wojcik
> From: openssl-users [mailto:[hidden email]] On Behalf
> Of Jakob Bohm
> Sent: Tuesday, April 18, 2017 06:22
>
> Please note that all of these "CBC vulnerabilities" you specifically
> mention are SSL/TLS vulnerabilities in the particular ways that SSL3
> and current TLS versions handle padding and IV management, not
> issues with CBC itself.
>
> Also note that GCM is very much a "marginal" design, operating at the
> very edge of what is safe to do and furthermore putting all the
> cryptographic "eggs" in one basket (AES and GF-2^n arithmetic).

Agreed on both points.

Of course with any block cipher operating mode that requires padding you have the possibility of protocol and implementation errors that create padding oracles. But with GCM you have the possibility of, say, implementation errors that lead to nonce reuse. All of the modes have risks.

(Also AE / AEAD modes seem like they're on the edge of violating Moxie Marlinspike's Cryptographic Doom Principle: If message integrity isn't the very first thing you check, sooner or later you'll regret it. The CDP isn't scientific, but then neither is cryptographic protocol design. The rush to AEAD modes seems to be largely driven by performance concerns, and that does not make for good crypto. Take TLSv1.3's 0-RTT session resumption, for example.)
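
As a sketch of what "integrity is the very first thing you check" looks like in code: the receiver recomputes the tag over the received bytes, compares it without a data-dependent early exit, and rejects the record before any decryption or parsing. The names below are hypothetical, and `toy_tag` is only a placeholder for a real keyed MAC (it is an XOR fold and provides no security).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TAG_LEN 16

/* Compares n bytes without exiting early on the first mismatch.
 * Returns 1 if equal, 0 otherwise. */
int ct_equal(const unsigned char *a, const unsigned char *b, size_t n)
{
    unsigned char diff = 0;

    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++)
        diff |= (unsigned char)(a[i] ^ b[i]);
    return diff == 0;
}

typedef void (*tag_fn)(const unsigned char *msg, size_t len,
                       unsigned char tag[TAG_LEN]);

/* Placeholder for a real keyed MAC. NOT secure; illustration only. */
void toy_tag(const unsigned char *msg, size_t len, unsigned char tag[TAG_LEN])
{
    memset(tag, 0, TAG_LEN);
    for (size_t i = 0; i < len; i++)
        tag[i % TAG_LEN] ^= msg[i];
}

/* Returns 1 if the record may be processed further, 0 if it must be
 * dropped. Nothing else is done with the ciphertext before this check. */
int accept_record(const unsigned char *ct, size_t ct_len,
                  const unsigned char tag[TAG_LEN], tag_fn compute_tag)
{
    unsigned char expected[TAG_LEN];

    assert(ct != NULL && tag != NULL && compute_tag != NULL);
    compute_tag(ct, ct_len, expected);
    return ct_equal(expected, tag, TAG_LEN);
}
```

Decryption, padding removal, and parsing would only happen after `accept_record` returns 1, so a tampered record never reaches the code paths where padding oracles tend to live.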

And for most applications, attacking the crypto isn't a particularly likely mode of attack anyway. There is lower-hanging fruit, and even flawed crypto will direct the attacker's attention elsewhere. Or the nature of the application doesn't provide enough volume or flexibility to exploit a theoretical vulnerability.

Michael Wojcik
Distinguished Engineer, Micro Focus
