OpenSSL 1.1.1g Windows build slow rsa tests


OpenSSL 1.1.1g Windows build slow rsa tests

Dan Heinz

Hello,

I'm building OpenSSL 1.1.1g on multiple platforms, and I found that the RSA speed tests are significantly slower in my Windows build than on the other platforms (Linux and macOS).

I downloaded a Windows 64-bit binary distribution of OpenSSL from https://kb.firedaemon.com/support/solutions/articles/4000121705, since they publish the configure parameters used for their build.

I ran the RSA speed tests on their Windows 64-bit binary, and they were much faster than the tests on my build.

Here's some output.
My openssl binary, executed with "openssl speed rsa":

Doing 2048 bits private rsa's for 10s: 409 2048 bits private RSA's in 10.00s
Doing 2048 bits public rsa's for 10s: 15663 2048 bits public RSA's in 10.02s
Doing 4096 bits private rsa's for 10s: 60 4096 bits private RSA's in 10.00s
Doing 4096 bits public rsa's for 10s: 4316 4096 bits public RSA's in 10.02s
OpenSSL 1.1.1g  21 Apr 2020
built on: Wed Jan 20 18:38:14 2021 UTC
options:bn(64,64) rc4(int) des(long) aes(partial) blowfish(ptr)
compiler: cl /Fdossl_static.pdb  /Gs0 /GF /Gy /MT /Zi /W3 /wd4090 /nologo /O2 -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_NO_DEPRECATED
                  sign    verify    sign/s verify/s
rsa 2048 bits 0.024450s 0.000639s     40.9   1563.9
rsa 4096 bits 0.166667s 0.002321s      6.0    430.9

Here is the output from the binary downloaded from https://kb.firedaemon.com/support/solutions/articles/4000121705:
Doing 2048 bits private rsa's for 10s: 1622 2048 bits private RSA's in 10.02s
Doing 2048 bits public rsa's for 10s: 72622 2048 bits public RSA's in 10.00s
Doing 4096 bits private rsa's for 10s: 255 4096 bits private RSA's in 10.03s
Doing 4096 bits public rsa's for 10s: 18976 4096 bits public RSA's in 10.00s
OpenSSL 1.1.1j-dev  xx XXX xxxx
built on: Wed Jan  6 11:11:12 2021 UTC
options:bn(64,64) rc4(int) des(long) aes(partial) idea(int) blowfish(ptr)
compiler: cl /Zi /Fdossl_static.pdb /Gs0 /GF /Gy /MD /W3 /wd4090 /nologo /O2 -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_NO_DEPRECATED
                  sign    verify    sign/s verify/s
rsa 2048 bits 0.006175s 0.000138s    161.9   7262.2
rsa 4096 bits 0.039338s 0.000527s     25.4   1897.6

That is roughly 4 to 4.6 times faster, depending on the operation.
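As a quick check of the speed-up, the per-second rates in the two summary tables can be divided directly. A minimal Python sketch; the numbers are copied from the output above, and the dictionary layout is only for illustration.

```python
# sign/s and verify/s values copied from the two "openssl speed rsa" summary tables.
mine = {
    ("rsa2048", "sign"): 40.9, ("rsa2048", "verify"): 1563.9,
    ("rsa4096", "sign"): 6.0, ("rsa4096", "verify"): 430.9,
}
firedaemon = {
    ("rsa2048", "sign"): 161.9, ("rsa2048", "verify"): 7262.2,
    ("rsa4096", "sign"): 25.4, ("rsa4096", "verify"): 1897.6,
}


def speedups(slow, fast):
    """Return the fast/slow throughput ratio for every benchmark present in both."""
    return {key: fast[key] / slow[key] for key in slow if key in fast}


ratios = speedups(mine, firedaemon)
for (alg, op), ratio in sorted(ratios.items()):
    print(f"{alg} {op:6s} {ratio:.2f}x")
```

The ratios range from just under 4x (2048-bit sign) to about 4.6x (2048-bit verify).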

Here are my configure parameters:
Configure VC-WIN64A no-shared no-asm no-idea no-mdc2 no-rc5 no-ssl2 no-ssl3 no-zlib no-comp no-pinshared no-ui-console -DOPENSSL_NO_DEPRECATED --api=1.1.0

And their configure parameters:
Configure VC-WIN64A no-asm no-ssl3 no-zlib no-comp no-ui-console --api=1.1.0 --prefix="%openssl-dst%" --openssldir=ssl -DOPENSSL_NO_DEPRECATED
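Comparing two long option lists by eye is error-prone. A minimal Python sketch that treats each Configure command line as a set of whitespace-separated tokens; the strings are copied from above, and the quoted --prefix value is kept as a single token.

```python
# Configure arguments copied from the two command lines above.
mine = ("VC-WIN64A no-shared no-asm no-idea no-mdc2 no-rc5 no-ssl2 no-ssl3 "
        "no-zlib no-comp no-pinshared no-ui-console -DOPENSSL_NO_DEPRECATED "
        "--api=1.1.0").split()
theirs = ("VC-WIN64A no-asm no-ssl3 no-zlib no-comp no-ui-console --api=1.1.0 "
          '--prefix="%openssl-dst%" --openssldir=ssl -DOPENSSL_NO_DEPRECATED').split()

# Order does not matter for spotting options present in only one build.
only_mine = sorted(set(mine) - set(theirs))
only_theirs = sorted(set(theirs) - set(mine))
print("only in my build:   ", only_mine)
print("only in their build:", only_theirs)
```

Both lines share VC-WIN64A and no-asm; the differences are no-shared, no-pinshared, a few extra disabled algorithms, and their install-location options. Configure options alone do not record which cl.exe (x86 or x64) was on the PATH when the build ran.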

Both my build and theirs are built with Visual Studio 2015.

Any ideas why my build is so much slower?  Is there something in my configuration that might cause this? 

Re: OpenSSL 1.1.1g Windows build slow rsa tests

Dr Paul Dale-2
Try building without the no-asm configuration option.

Pauli

On 21/1/21 6:18 am, Dan Heinz wrote:

> [...]

RE: OpenSSL 1.1.1g Windows build slow rsa tests

Michael Wojcik
> From: openssl-users <[hidden email]> On Behalf Of Dr Paul
> Dale
> Sent: Wednesday, 20 January, 2021 16:19
>
> Try building without the no-asm configuration option.

That was my first thought, but according to Dan's message, the firedaemon version is also built with no-asm.

The only relevant differences I see between the two builds are static (Dan's) versus dynamic (firedaemon's) linkage:

> On 21/1/21 6:18 am, Dan Heinz wrote:

> > compiler: cl /Fdossl_static.pdb  /Gs0 /GF /Gy /MT /Zi /W3 /wd4090
> > /nologo /O2 -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_NO_DEPRECATED

/MT uses the static-linked MSVC runtime.

> > Here is the downloaded binary from
> > https://kb.firedaemon.com/support/solutions/articles/4000121705
> > <https://kb.firedaemon.com/support/solutions/articles/4000121705>:
> > compiler: cl /Zi /Fdossl_static.pdb /Gs0 /GF /Gy /MD /W3 /wd4090 /nologo
> > /O2 -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_NO_DEPRECATED

/MD uses the dynamic-linked MSVC runtime.
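The same comparison can be made mechanically. A minimal Python sketch with the two "compiler:" lines copied from the quoted output; for the purpose of spotting added or missing switches, order is ignored and the switches are compared as sets.

```python
# "compiler:" switches copied from the two speed headers quoted above.
dan = ("/Fdossl_static.pdb /Gs0 /GF /Gy /MT /Zi /W3 /wd4090 /nologo /O2 "
       "-DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_NO_DEPRECATED").split()
firedaemon = ("/Zi /Fdossl_static.pdb /Gs0 /GF /Gy /MD /W3 /wd4090 /nologo /O2 "
              "-DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_NO_DEPRECATED").split()

# Symmetric difference: switches that appear in exactly one of the two builds.
difference = sorted(set(dan) ^ set(firedaemon))
print(difference)  # ['/MD', '/MT']
```

Note that the line records only "cl", not which cl.exe (x86 or x64) produced the objects, so a comparison like this cannot rule out a toolchain difference.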

> > Here are my configure parameters:
> > Configure VC-WIN64A no-shared  no-asm no-idea no-mdc2 no-rc5 no-ssl2
> > no-ssl3 no-zlib no-comp no-pinshared no-ui-console
> >   -DOPENSSL_NO_DEPRECATED --api=1.1.0
> >
> > And their configure parameters:
> > Configure VC-WIN64Ano-asm no-ssl3 no-zlib no-comp no-ui-console
> > --api=1.1.0 --prefix="%openssl-dst%" --openssldir=ssl
> > -DOPENSSL_NO_DEPRECATED

Assuming the lack of a space between "VC-WIN64A" and "no-asm" is a typo, they're also building with no-asm, and the only significant difference for this case that I can see is no-shared. (no-pinshared looks even less likely to affect this test, and does it even have any effect when building no-shared?)

Linking with /MT will affect code size and layout, which could adversely affect code caching. It's not impossible that would have a factor-of-four penalty on compute-bound code. I'm reluctant to conclude that's the problem, though, without more evidence.

Unfortunately tracking this down would likely require profiling.

That's assuming Dan is correct about the firedaemon build being configured with no-asm.

--
Michael Wojcik

Re: OpenSSL 1.1.1g Windows build slow rsa tests

Dr Paul Dale-2
I'd suggest giving a build without the no-asm option a try.  The
performance difference is usually quite significant.

Static vs dynamic builds wouldn't normally be associated with such a
large difference.  If the difference were routinely this large, nobody
would use dynamic linking.


Pauli

On 21/1/21 10:37 am, Michael Wojcik wrote:

> [...]

RE: OpenSSL 1.1.1g Windows build slow rsa tests

Michael Wojcik
> From: openssl-users <[hidden email]> On Behalf Of Dr Paul
> Dale
> Sent: Wednesday, 20 January, 2021 19:28
>
> I'd suggest giving a build without the no-asm option a try.  The
> performance difference is usually quite significant.

I agree. It just doesn't explain what Dan's email claims.

> Static vs dynamic builds wouldn't normally be associated with such a
> large difference.  If the difference were routinely this large, nobody
> would use dynamic linking.

In this case it's the static-linked version which is slower. But I'd be surprised if that's actually the cause.

--
Michael Wojcik

RE: OpenSSL 1.1.1g Windows build slow rsa tests

Dan Heinz
-----Original Message-----
From: openssl-users <[hidden email]> On Behalf Of Michael Wojcik
Sent: Thursday, January 21, 2021 9:28 AM
To: [hidden email]
Subject: RE: OpenSSL 1.1.1g Windows build slow rsa tests

>> From: openssl-users <[hidden email]> On Behalf Of Dr Paul Dale
>> Sent: Wednesday, 20 January, 2021 19:28
>>
>> I'd suggest giving a build without the no-asm option a try.  The
>> performance difference is usually quite significant.

> I agree. It just doesn't explain what Dan's email claims.

>> Static vs dynamic builds wouldn't normally be associated with such a
>> large difference.  If the difference were routinely this large, nobody
>> would use dynamic linking.

> In this case it's the static-linked version which is slower. But I'd be surprised if that's actually the cause.

Thank you all for the helpful suggestions. When I removed no-asm and built using nmake in the Developer Command Prompt for Visual Studio 2015, I ended up getting an error "VC-WIN64A X86 conflicts with target x64". From the command prompt I ran cl and saw this: "Microsoft (R) C/C++ Optimizing Compiler Version 19.00.24215.1 for x86". So I was building for x86? I'm not sure why the no-asm build worked, but it did.
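Since the problem was the x86 compiler on the PATH, a check before running Configure could catch it early. A minimal Python sketch, assuming the banner format quoted above ("... Version 19.00.24215.1 for x86"); the helper names are hypothetical and this is not part of OpenSSL's build system.

```python
import re
import subprocess
from typing import Optional

# Matches the tail of the MSVC banner, e.g.
# "Microsoft (R) C/C++ Optimizing Compiler Version 19.00.24215.1 for x86"
BANNER = re.compile(r"Optimizing Compiler Version [\d.]+ for (\w+)")


def cl_target(banner: str) -> Optional[str]:
    """Return the architecture named in a cl banner ('x86', 'x64', ...), or None."""
    match = BANNER.search(banner)
    return match.group(1) if match else None


def require_cl_target(expected: str = "x64") -> None:
    """Hypothetical guard: stop before Configure/nmake if cl targets the wrong arch."""
    # Running cl with no arguments prints its banner and a usage line, then exits.
    # Both streams are searched rather than assuming which one carries the banner.
    result = subprocess.run(["cl"], capture_output=True, text=True)
    target = cl_target(result.stderr + result.stdout)
    if target != expected:
        raise SystemExit(
            f"cl targets {target!r}, expected {expected!r}; "
            "open the x64 Native Tools Command Prompt before building VC-WIN64A"
        )
```

Calling require_cl_target() at the top of a build script would have failed immediately in the plain Developer Command Prompt described here, instead of silently producing a slow build.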

Once I ran the correct command prompt (I used Visual Studio x64 Native Tools Command Prompt), I saw a huge speed increase.  For example, 2048 bits:
Doing 2048 bits private rsa's for 10s: 8384 2048 bits private RSA's in 10.02s
Doing 2048 bits public rsa's for 10s: 236090 2048 bits public RSA's in 9.98s

Previously, I saw:
Doing 2048 bits private rsa's for 10s: 409 2048 bits private RSA's in 10.00s
Doing 2048 bits public rsa's for 10s: 15663 2048 bits public RSA's in 10.02s

For further testing, I added back no-asm and my speed tests were in line with the downloaded openssl binary I was testing with.  
Doing 2048 bits private rsa's for 10s: 1868 2048 bits private RSA's in 10.00s
Doing 2048 bits public rsa's for 10s: 71338 2048 bits public RSA's in 10.02s

You can see removing no-asm does make a pretty large speed increase too.

In summary, using the correct build tools helps (although I am surprised the no-asm build worked with the x86 tools), and removing no-asm sped things up further.
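Putting the 2048-bit numbers from this message side by side: a minimal Python sketch that parses the result lines quoted above and reports each build's throughput relative to the original build. The build labels are descriptive names chosen here; "private" operations correspond to the sign column of openssl speed and "public" operations to the verify column.

```python
import re

# 2048-bit result lines copied from this message.
RUNS = {
    "x86 cl, no-asm (original)": [
        "409 2048 bits private RSA's in 10.00s",
        "15663 2048 bits public RSA's in 10.02s",
    ],
    "x64 cl, no-asm": [
        "1868 2048 bits private RSA's in 10.00s",
        "71338 2048 bits public RSA's in 10.02s",
    ],
    "x64 cl, asm": [
        "8384 2048 bits private RSA's in 10.02s",
        "236090 2048 bits public RSA's in 9.98s",
    ],
}

LINE = re.compile(r"(\d+) (\d+) bits (private|public) RSA's in ([\d.]+)s")


def rates(lines):
    """Map 'private'/'public' to operations per second."""
    out = {}
    for line in lines:
        match = LINE.search(line)
        if match:
            count, _bits, kind, seconds = match.groups()
            out[kind] = int(count) / float(seconds)
    return out


baseline = rates(RUNS["x86 cl, no-asm (original)"])
for name, lines in RUNS.items():
    r = rates(lines)
    print(f"{name:26s} sign/s {r['private']:8.1f} ({r['private'] / baseline['private']:4.1f}x)"
          f"  verify/s {r['public']:9.1f} ({r['public'] / baseline['public']:4.1f}x)")
```

With these figures the x64 no-asm build comes out at roughly 4.6x the original, and the x64 asm build at roughly 20x for signing and 15x for verification.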

Re: OpenSSL 1.1.1g Windows build slow rsa tests

Jan Just Keijser-2
Hi Dan,

On 21/01/21 19:22, Dan Heinz wrote:
> [...]

Not sure why you'd want to do a 'no-asm' build to begin with, but
another thing worth testing with your "asm" build is to run the speed
test like this:
  set OPENSSL_ia32cap=0
  openssl speed rsa
(Linux/UNIX:  OPENSSL_ia32cap=0 openssl speed rsa)
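To script this comparison rather than eyeball it, here is a minimal Python sketch that runs "openssl speed rsa2048" once normally and once with OPENSSL_ia32cap=0 set only in the child process's environment, then compares the parsed rates. It assumes an openssl executable on the PATH; the function names are hypothetical.

```python
import os
import re
import subprocess

# Result lines look like "409 2048 bits private RSA's in 10.00s".
LINE = re.compile(r"(\d+) (\d+) bits (private|public) RSA's in ([\d.]+)s")


def parse_rates(output):
    """Return {(bits, 'private' or 'public'): operations per second}."""
    rates = {}
    for match in LINE.finditer(output):
        count, bits, kind, seconds = match.groups()
        rates[(int(bits), kind)] = int(count) / float(seconds)
    return rates


def run_speed(extra_env=None, openssl="openssl"):
    """Run 'openssl speed rsa2048' with optional extra environment variables."""
    env = dict(os.environ, **(extra_env or {}))
    # Merge stdout and stderr so the result lines are captured whichever
    # stream they are written to.
    result = subprocess.run([openssl, "speed", "rsa2048"], env=env,
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                            text=True, check=True)
    return parse_rates(result.stdout)


def compare_ia32cap():
    full = run_speed()
    masked = run_speed({"OPENSSL_ia32cap": "0"})
    for key in sorted(full):
        if key in masked:
            print(key, f"{masked[key] / full[key]:.0%} of the unmasked rate")
```

Unlike "set" in a cmd session, the variable here affects only the child process, so later commands in the same shell still see the full CPU capabilities.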

On my (10th-gen Intel) laptop this gives me a ~35% performance hit.
Explanation:
- no-asm build -> compiler generates all code, no hand-tuned assembly
used at all; should be slowest

- asm build + OPENSSL_ia32cap=0  -> no newer CPU features used, but
hand-tuned assembly is used. Especially AES encryption takes a hit if
you disable these newer features

- asm build -> hand-tuned assembly, including the use of all new CPU
features such as AES, SHA etc.

I've found that this sometimes helps manage expectations when the "build
environment" CPU and the "runtime environment" CPU are very different.
I've seen a developer claim his/her code runs blazingly fast on his/her
Core i7 bla bla but when deploying it on a cheaper runtime device
performance is terrible.

Note that no-asm + OPENSSL_ia32cap=0 should not have any effect compared
to "no-asm".
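OPENSSL_ia32cap can also be used more selectively than "=0". Per the OPENSSL_ia32cap documentation for 1.1.1, the first 64-bit word largely mirrors CPUID leaf 1 (bits 0-31 from EDX, bits 32-63 from ECX), and a value prefixed with "~" clears the listed bits instead of replacing the whole vector; the documented example ~0x200000200000000 disables only AES-NI and PCLMULQDQ. A minimal Python sketch that derives such a mask from bit positions (the helper is hypothetical):

```python
# Bit positions in the first 64-bit word of OPENSSL_ia32cap: bits 32-63
# correspond to CPUID(1).ECX, where PCLMULQDQ is bit 1 and AES-NI is bit 25.
PCLMULQDQ_BIT = 32 + 1
AESNI_BIT = 32 + 25


def disable_mask(*bits: int) -> str:
    """Hypothetical helper: build a '~'-prefixed value that clears the given bits."""
    mask = 0
    for bit in bits:
        mask |= 1 << bit
    return f"~{mask:#x}"


print(disable_mask(AESNI_BIT, PCLMULQDQ_BIT))  # ~0x200000200000000
```

Setting OPENSSL_ia32cap to that value before running "openssl speed -evp aes-128-gcm" should isolate the AES effect mentioned above while leaving the other hand-tuned code paths enabled.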

JM2CW,

JJK / Jan Just Keijser