Decryption slower in 1.1.1 branch?

Decryption slower in 1.1.1 branch?

Dan Heinz

I upgraded a library that used OpenSSL 1.0.2 to OpenSSL 1.1.1d.  On Windows, I have found that the time to decrypt has doubled.

After a bit of timestamp logging, I found the RSA_private_decrypt function is taking twice as long with 1.1.1d as it did with 1.0.2t.  This is being called from a Windows 64-bit DLL.

For example, decrypting 8680 bytes of data averages about 0.3 seconds with the OpenSSL 1.0.2t library (statically linked).  Decrypting the same data with the OpenSSL 1.1.1d library averages about 0.6 seconds.

I’m wondering if perhaps my build configuration is incorrect or missing something for the 1.1.1d build.  Here are the configuration parameters for the 64-bit build:

Configure VC-WIN64A --prefix=%RootPath_ThirdParty%\%OPENSSL_VERSION% -DPURIFY -DOPENSSL_NO_COMP -D_USING_V110_SDK71_ no-shared no-asm no-idea no-mdc2 no-rc5 no-ssl2 no-ssl3 no-zlib no-comp no-pinshared

 

I logged things granular enough to see the speed difference was in RSA_private_decrypt, but I’m not sure why it is so much slower with 1.1.1d.  Any help or ideas would be appreciated!

 

Re: Decryption slower in 1.1.1 branch?

Viktor Dukhovni
On Mon, Jan 27, 2020 at 06:20:27PM +0000, Dan Heinz wrote:

> I upgraded a library that used OpenSSL 1.0.2 to the OpenSSL 1.1.1d.
> On Windows, I have found that the time to decrypt had doubled.  After
> a bit of timestamp logging, I found the RSA_private_decrypt function
> is taking twice as long with 1.1.1d as it did with 1.0.2t.  This is
> being called from a Windows 64-bit DLL.

RSA is not intended for bulk data decryption; its intended uses are key
transport and signing.  Bulk data decryption is done via AES or similar.

> For example, decrypting 8680 bytes of data averages about .3 seconds
> with the OpenSSL 1.0.2t library (statically linked).  Decrypting the
> same data with the OpenSSL 1.1.1d library averages about .6 seconds.

Are you sure that's seconds and not milliseconds?  These are absurdly
long times, almost certainly dominated by factors other than the
encryption algorithms.  On my 2015 laptop (macOS) I get:

  OpenSSL 1.0.2h:

    options:bn(64,64) rc4(ptr,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
    compiler: cc -I. -I.. -I../include  -fPIC -fno-common -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -arch x86_64 -O3 -DL_ENDIAN -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM
                      sign    verify    sign/s verify/s
    rsa 2048 bits 0.000883s 0.000027s   1132.3  37397.7

  OpenSSL 1.1.1a:

    options:bn(64,64) rc4(16x,int) des(int) aes(partial) idea(int) blowfish(ptr)
    compiler: cc -fPIC -arch x86_64 -O3 -Wall -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPADLOCK_ASM -DPOLY1305_ASM -D_REENTRANT -DNDEBUG -DOPENSSL_API_COMPAT=0x10100000L
                      sign    verify    sign/s verify/s
    rsa 2048 bits 0.000883s 0.000026s   1132.8  38377.8

This shows an RSA 2048-bit signature (a private key decrypt operation)
taking less than 1 ms, with no material difference between 1.0.2 and
1.1.1.

> I'm wondering if perhaps my build configuration is incorrect or
> missing something for the 1.1.1d build.  Here are the configuration
> parameters for the 64-bit build:

There's probably a deeper issue with what you're doing.  You need to be
much more specific about what you're measuring.  Is this S/MIME?  CMS?
What is the RSA key size?  What is the bulk encryption cipher?

> Configure VC-WIN64A --prefix=%RootPath_ThirdParty%\%OPENSSL_VERSION%
> -DPURIFY -DOPENSSL_NO_COMP -D_USING_V110_SDK71_ no-shared no-asm
> no-idea no-mdc2 no-rc5 no-ssl2 no-ssl3 no-zlib no-comp no-pinshared

PURIFY must not be enabled in production builds; it is only for memory
allocation/safety debugging.  You've also disabled assembly
optimizations, which reduces side-channel resistance and hurts
performance.

> I logged things granular enough to see the speed difference was in
> RSA_private_decrypt, but I'm not sure why it is so much slower with
> 1.1.1d.  Any help or ideas would be appreciated!

At 600ms for 8KB, it is not plausible that the time is spent doing
cryptography.  That's barely fast enough to feed a 1980s modem.

--
    Viktor.
RE: Decryption slower in 1.1.1 branch?

Dan Heinz
Thank you for the information, Viktor.

>> I upgraded a library that used OpenSSL 1.0.2 to the OpenSSL 1.1.1d.
>> On Windows, I have found that the time to decrypt had doubled.  After
>> a bit of timestamp logging, I found the RSA_private_decrypt function
>> is taking twice as long with 1.1.1d as it did with 1.0.2t.  This is
>> being called from a Windows 64-bit DLL.

>RSA is not intended for bulk data decryption, its intended uses are key transport and signing.  Bulk data decryption is done via AES or similar.

>> For example, decrypting 8680 bytes of data averages about .3 seconds
>> with the OpenSSL 1.0.2t library (statically linked).  Decrypting the
>> same data with the OpenSSL 1.1.1d library averages about .6 seconds.

>Are you sure that's seconds and not milliseconds?  These are absurdly long times, almost certainly dominated by factors other than the encryption algorithms.  On my 2015 laptop (MacOS) I get:

Yes, it is seconds.  
Our library source is cross-platform, and I tested on Linux with execution times around 20 milliseconds.  That was with a static build rather than a shared one.  I'm running the Linux tests in a VM on the same machine on which I test the Windows builds, yet the Windows build is much slower with the same source code.  That's why I initially thought it was something in my OpenSSL configure parameters.

While I'm ok with the execution speed with OpenSSL 1.0.2, I'd like to figure out why the times doubled with OpenSSL 1.1.1.  

I'm logging times before and after the calls to RSA_private_decrypt.  With OpenSSL 1.0.2, each RSA_private_decrypt call takes about 4-8 milliseconds on average.  With OpenSSL 1.1.1d, each call takes 10-15 milliseconds.  There were no code changes other than those the upgrade required, such as replacing direct access to the RSA structure fields.
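
A minimal sketch of this kind of before-and-after timing, written in Python for brevity and not taken from the library under discussion.  The helper name time_per_call and the stand-in workload are assumptions for illustration only; the workload is a plain 1024-bit modular exponentiation, not a call to RSA_private_decrypt.  Averaging over many iterations with a monotonic high-resolution clock makes a single coarse timestamp less likely to distort the per-call figure.

```python
import time


def time_per_call(func, iterations=200):
    """Return the mean wall-clock seconds per call of func."""
    start = time.perf_counter()          # monotonic, high-resolution clock
    for _ in range(iterations):
        func()
    elapsed = time.perf_counter() - start
    return elapsed / iterations


# Stand-in workload (hypothetical): arbitrary 1024-bit numbers,
# NOT a real RSA key and NOT the OpenSSL code path.
MODULUS = (1 << 1023) | 1
EXPONENT = (1 << 1022) | 0x10001
BASE = 0x1234567890ABCDEF


def workload():
    return pow(BASE, EXPONENT, MODULUS)


mean_seconds = time_per_call(workload)
print(f"mean per call: {mean_seconds * 1e6:.1f} microseconds")
```

Timing the whole loop and dividing by the iteration count gives a steadier number than differencing two log timestamps around one call.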

>> I'm wondering if perhaps my build configuration is incorrect or
>> missing something for the 1.1.1d build.  Here are the configuration
>> parameters for the 64-bit build:

>There's probably a deeper issue with what you're doing, you need to be much more specific about what you're measuring.  Is this SMIME?  CMS?
>What is the RSA key size?  What is the bulk encryption cipher?

The data being decrypted is local on the client machine and is just an XML file.
RSA key is 1024 bits.  
I'm using OAEP padding.

>> Configure VC-WIN64A --prefix=%RootPath_ThirdParty%\%OPENSSL_VERSION%
>> -DPURIFY -DOPENSSL_NO_COMP -D_USING_V110_SDK71_ no-shared no-asm
>> no-idea no-mdc2 no-rc5 no-ssl2 no-ssl3 no-zlib no-comp no-pinshared

>PURIFY must not be enabled in production builds, it is only for memory allocation/safety debugging.  You've also disabled assembly optimizations, which reduces side-channel resistance and hurts performance.

Thank you for the information.  I removed -DPURIFY from the configuration parameters, but I didn't really notice a difference in execution time.  I also removed the no-asm parameter, set up NASM, and rebuilt, with no noticeable change.

>> I logged things granular enough to see the speed difference was in
>> RSA_private_decrypt, but I'm not sure why it is so much slower with
>> 1.1.1d.  Any help or ideas would be appreciated!

>At 600ms for 8KB, it is not plausible that the time is spent doing cryptography.  That's barely fast enough to feed a 1980s modem.

I would expect the execution times to be more in line with what I saw on Linux for both 1.0.2 and 1.1.1.  But even so, I do not understand why just upgrading to 1.1.1 causes the RSA_private_decrypt calls to take twice as long as they did with 1.0.2.

Re: Decryption slower in 1.1.1 branch?

Viktor Dukhovni
On Tue, Jan 28, 2020 at 06:24:06PM +0000, Dan Heinz wrote:

> >RSA is not intended for bulk data decryption, its intended uses are
> >key transport and signing.  Bulk data decryption is done via AES or
> >similar.

It sounds like you're directly encrypting data with RSA.  That's a
mistake.  RSA is for decrypting a symmetric algorithm key, which then
decrypts the data.

> >Are you sure that's seconds and not milliseconds?  These are absurdly
> >long times, almost certainly dominated by factors other than the
> >encryption algorithms.  On my 2015 laptop (MacOS) I get:
>
> Yes, it is seconds.  

Sorry, 0.6 seconds for a single 1024-bit RSA_private_decrypt() (128
bytes of data) is not plausible, but you say you have just over 8KB of
data, which would take ~65 calls to RSA_private_decrypt() to decrypt
piecewise.  It sure looks like you're measuring something other than
what you claim to be measuring, or not describing it accurately.
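
The call count can be sanity-checked with a little arithmetic.  This is a sketch under stated assumptions: the thread does not say whether the 8680 bytes are ciphertext or plaintext, so both cases are computed, and the OAEP hash is assumed to be SHA-1 (20-byte digest), which the thread also does not state.

```python
import math

KEY_BITS = 1024        # RSA key size given in the thread
DATA_BYTES = 8680      # data size given in the thread
HASH_LEN = 20          # assumption: OAEP with SHA-1

# One RSA operation handles one modulus-sized block.
modulus_bytes = KEY_BITS // 8

# OAEP leaves k - 2*hLen - 2 bytes of plaintext per block (RFC 8017).
oaep_max_plaintext = modulus_bytes - 2 * HASH_LEN - 2

# Case 1: 8680 bytes is ciphertext, split into modulus-sized blocks.
calls_if_ciphertext = math.ceil(DATA_BYTES / modulus_bytes)

# Case 2: 8680 bytes is plaintext, split into OAEP-sized chunks.
calls_if_plaintext = math.ceil(DATA_BYTES / oaep_max_plaintext)

print(modulus_bytes, oaep_max_plaintext)        # 128 86
print(calls_if_ciphertext, calls_if_plaintext)  # 68 101
```

Either way the workload is dozens of private-key operations, in the same range as the rough figure of ~65 above.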

    OpenSSL 1.1.1c-dev  xx XXX xxxx
    options:bn(64,64) rc4(16x,int) des(int) aes(partial) idea(int) blowfish(ptr)
    compiler: cc -fPIC -arch x86_64 -g -O0 -Wall -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM -D_REENTRANT
                      sign    verify    sign/s verify/s
    rsa 1024 bits 0.000135s 0.000013s   7414.8  78566.9

On my laptop RSA_private_decrypt (aka sign) takes 135 microseconds.  You
claim 600 milliseconds for perhaps ~60 calls, which might be 10ms each,
but that still is about two orders of magnitude too slow.
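
The "two orders of magnitude" estimate follows directly from the numbers quoted above.  A small worked calculation, using the ~60-call figure and the 135 microsecond benchmark result as given (variable names are illustrative):

```python
import math

total_ms = 600.0                 # reported total decryption time
calls = 60                       # approximate number of RSA calls
benchmark_seconds = 0.000135     # "openssl speed" time per 1024-bit private op

measured_per_call_ms = total_ms / calls              # 10.0 ms
benchmark_per_call_ms = benchmark_seconds * 1000.0   # 0.135 ms

ratio = measured_per_call_ms / benchmark_per_call_ms
orders_of_magnitude = math.log10(ratio)

print(f"measured per call: {measured_per_call_ms:.1f} ms")
print(f"slowdown vs benchmark: {ratio:.0f}x "
      f"(about {orders_of_magnitude:.1f} orders of magnitude)")
```

The ratio comes out at roughly 74x, a little under two full orders of magnitude.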

So, sorry, whatever you're measuring, it is not the performance of
RSA_private_decrypt().

> While I'm ok with the execution speed with OpenSSL 1.0.2, I'd like to
> figure out why the times doubled with OpenSSL 1.1.1.  

Neither is a reasonable performance level, but also it is not reasonable
to use RSA for bulk data encryption.

> I'm logging times before and after the calls to RSA_private_decrypt.

How many calls?  What else is happening to feed the data into the
decryption algorithm, and reassemble the output?

> With OpenSSL 1.0.2 it takes on average about 4-8 milliseconds for each
> RSA_private_decrypt call.  With OpenSSL 1.1.1d, it takes 10-15
> milliseconds for each RSA_private_decrypt call.

Now we see that you're in fact chunking data for multiple calls to
"decrypt" via RSA.  That's a fatal design flaw. This is not a valid
operating mode for RSA.  You MUST NOT do this.

> >> I'm wondering if perhaps my build configuration is incorrect or
> >> missing something for the 1.1.1d build.  Here are the configuration
> >> parameters for the 64-bit build:

You have a deeper problem: your use of RSA is broken.

> The data being decrypted is local on the client machine and is just an XML file.
> RSA key is 1024 bits.  
> I'm using OAEP padding.

This is a mistake; for asymmetric encryption you should be using CMS.

> Thank you for the information.  I removed it from the configuration
> parameters.  I didn't really notice a difference in execution time
> though.  I also removed the no-asm parameter, setup nasm, and rebuilt
> with no noticeable changes.  

Likely the time is dominated by something other than the RSA operations,
but since those are a mistake anyway, it hardly matters.

> > I logged things granular enough to see the speed difference was in
> > RSA_private_decrypt, but I'm not sure why it is so much slower with
> > 1.1.1d.  Any help or ideas would be appreciated!

STOP.  Fix your design to use CMS.  Report any performance differences
in CMS between 1.0.2 and 1.1.1 when built correctly with asm support.

> >At 600ms for 8KB, it is not plausible that the time is spent doing
> >cryptography.  That's barely fast enough to feed a 1980s modem.
>
> I would expect the execution times to be more in line with what I saw
> with Linux for both 1.0.2 and 1.1.1.  But even so, I do not understand
> why just upgrading to 1.1.1 causes the RSA_private_decrypt calls to
> double in execution time from what they were with 1.0.2?

I would expect execution times that are 2 to 3 orders of magnitude
faster, especially if you were using sound cryptographic primitives.

--
    Viktor.