Slow crypto initialization.


Brian Makin-2
I am seeing a very slow initialization on a single Windows 2003 box with
openssl-0.9.8l.

During initialization the function RAND_screen gets called.  This
effectively hashes the frame buffer to generate entropy.  In our case we
are running as an IIS user and I'm not even sure what screen it's
getting.

This function takes on the order of 3 seconds.

We have other identical boxes which are behaving correctly and a single
box which is very slow.

Two questions.
1. It appears that this is deprecated so would it be reasonable to
simply remove it?

2. Does anyone have any idea why this function is misbehaving?

--
BRIAN MAKIN
Senior Software Engineer
[hidden email]

Vivisimo [Search Done Right™]
1710 Murray Avenue
Pittsburgh, PA 15217 USA
tel: +1.412.422.2499
vivisimo.com

______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
User Support Mailing List                    [hidden email]
Automated List Manager                           [hidden email]

Re: Slow crypto initialization.

Ger Hobbelt
:-( I hope I recall correctly that what I mention next is indeed stuff happening in RAND_screen()... IIRC RAND_screen() isn't 'only' reading the screen but also doing a system-level heap traversal and a few other things, and it was exactly that system-level heap traversal that slowed a few spurious Win boxes down to a snail's pace, so my take is that it's doing that to yours too.
This has featured here a few times; there's a known but quite unpredictable issue in the Windows-specific heap walking code in there (I'd have to check the code to see exactly which calls were involved). There's been a patch to at least limit the scan to an upper bound in time, and as with anything entropy related, there's always the question 'is it really random?' or better: 'is it random enough?'

The technical (software) part of the issue is that the problem occurs only on some machines and is quite unpredictable in where and when it pops up. OpenSSL accesses some Win-internal structures there, which have been documented to some degree, so the problem is 'known' in the sense that we know it can happen, and a few fixes have been added to the code to at least limit the bother to an upper bound. But the issue of 'slow' isn't exactly /fixable/: it turns out the machine is spending all that time inside the Windows OS itself, and OpenSSL has no control over that once it has called that one Win API. Some boxes just take ages in there for unknown reasons.


Depending on what your servers do, the 'making certain' move re entropy is to connect a hardware entropy source to the box, but that's probably off-topic here (unless it's an IIS webserver working in a military/banking setting). Anyway, removing [semi-]entropy sources /is/ an option, but it's dangerous: remove them one by one and in the end you are left with zero entropy. We have a Debian fellow to thank for [accidentally, but /quite/ noticeably] showing everyone what happens when you like cleaning up so much that you lose what turns out, after the fact, to be a significant chunk of entropy gathering, and your streetcred ends up in the can, permanently.

Personally, I wouldn't even bother about those 3 secs and would let it do what it wants to do, as they [the 3 secs] only happen at library /init/ time, i.e. server [re]start. Of course, there'll be plenty who say 'just remove the code', but it gets quite inconclusive if you only count the 'votes' from security/cryptography knowledgeable folks.

And, no, that heap walk in RAND_screen() isn't a big source of entropy, probably a small one rather, but you grab what you can when you don't go the hardware entropy source route. You have to realize that you're 'faking' the whole entropy thing right there, all the way, so the game isn't about entropy-as-is but about making it bloody dang hard for any hacker to predict what your 'random' pool looks like at time t. There are no hard and fast answers to the question 'when have I done enough gathering?' And RAND_screen() does add several chunks of unpredictability to that game. How many bits of /entropy/ it's delivering, I won't (and can't) say (OpenSSL takes a guess, but that's all). It's checking several sources, and eliminating sources [one at a time] because they bother you is a plenty dangerous game if you don't /exactly/ know what you're doing. Hence my basic answer: 'let it be'; maybe not what you'd like to hear, but it saves lots of $$$ in discussion / security review / calamities down the line.

[For the monetarily inclined, this subject has been discussed a lot in the ML before and when you count those emails @ some hourly rate and see what the result (or rather: the amount of change) is, then calc that cost sum and compare with X times a slower restart of N servers and the $-quantified cost, material and immaterial, of that... yeah. Let it be.]


And when you go about 'removing it' anyway, be very very careful WHAT you remove, because I don't think it's the screen sampling that'll turn out to eat the cycles on that one box of yours but the heap traversal sys calls which are part of RAND_poll()/RAND_screen() and they are only a part of the whole RAND_whatever entropy collecting thing.


Bottom line: commenting out the call(s) to RAND_screen() would quite definitely turn you out as 'the IIS guy who's related to that Debian guy y'all heard about before' several months down the line. A slightly 'smarter' removal would take out that heap walk loop if it /really/ hurts, but remember... Cave canem! (And this one has a /serious/ bite to it!)






--
Met vriendelijke groeten / Best regards,

Ger Hobbelt

--------------------------------------------------
web:    http://www.hobbelt.com/
       http://www.hebbut.net/
mail:   [hidden email]
mobile: +31-6-11 120 978
--------------------------------------------------


Re: Slow crypto initialization.

Dr. Stephen Henson
On Wed, Jun 30, 2010, Ger Hobbelt wrote:

> :-( I hope I recall correctly that what I mention next is indeed stuff
> happening in RAND_screen()... IIRC RAND_screen() isn't 'only' reading the
> screen but also doing a system-level heap traversal and a few other things
> and it was exactly that system-level heap traversal that slowed a few
> spurious Win boxes down to a snail's pace, so my take is it doing that to
> yours too.

Just a correction. RAND_screen() doesn't perform those other activities such
as heap walking; those happen during the automatic entropy gathering on Windows.
There are known issues with Windows 7 and some 64-bit versions of Windows too.

I asked about this and the actual bug is the previous "quick" behaviour. The
functions apparently need to do a lot of time-consuming things which previous
versions of Windows didn't handle correctly.

I'd first suggest trying a newer version of OpenSSL such as 0.9.8o which does
include some timing code that halts heap walking if it is taking too long. It
isn't the only source of entropy, there are several others including the
CryptoAPI PRNG.
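
The idea of such a time limit can be sketched roughly as below. This is an illustration in Python, not the code that ships in 0.9.8o: `gather_with_deadline`, its parameters, and the choice of SHA-256 are all assumptions made for the example, and the `sources` iterable merely stands in for the heap entries being walked.

```python
import hashlib
import time


def gather_with_deadline(sources, budget_seconds, clock=time.monotonic):
    """Mix items from `sources` into a hash until the time budget runs out.

    Returns (digest, items_consumed). `sources` is any iterable of bytes.
    """
    pool = hashlib.sha256()
    deadline = clock() + budget_seconds
    consumed = 0
    for chunk in sources:
        pool.update(chunk)
        consumed += 1
        # Check the clock after every item, so a slow source can overrun
        # the budget by at most the cost of one item.
        if clock() >= deadline:
            break
    return pool.digest(), consumed
```

Whatever was gathered before the deadline is still mixed in; the loop only gives up on the remainder, which is why the other sources matter.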

Steve.
--
Dr Stephen N. Henson. OpenSSL project core developer.
Commercial tech support now available see: http://www.openssl.org

Re: Slow crypto initialization.

Dr. Stephen Henson
On Wed, Jun 30, 2010, Dr. Stephen Henson wrote:

> Just a correction. RAND_screen() doesn't perform those other activities such
> as heap walking, those happen during the automatic entropy gather on Windows.
> There are known issues with Windows 7 and some 64-bit versions of Windows too.
>

Ooops, I should check the code before making statements like that... bugger.

RAND_screen() used to just add the screen. It now also does the automatic PRNG
seeding via RAND_poll(), though that happens automatically anyway.

Steve.
--
Dr Stephen N. Henson. OpenSSL project core developer.
Commercial tech support now available see: http://www.openssl.org

Re: Slow crypto initialization.

Brian Makin-2
In reply to this post by Ger Hobbelt

Thank you... this is mostly what I expected.
In our case we are having a problem with a CGI program, so the response time
is important and initialization happens many times.

We may just have to hope no other boxes display this behavior :)


Re: Slow crypto initialization.

Brian Makin-2
In reply to this post by Dr. Stephen Henson

This is Windows 2003, 64 bit, and it's definitely in RAND_screen.
I'm trying to move things to 1.0.0a now.


Re: Slow crypto initialization.

Ger Hobbelt
In reply to this post by Brian Makin-2

On Wed, Jun 30, 2010 at 9:12 PM, Brian Makin <[hidden email]> wrote:

> Thank you... this is mostly what I expected.
> In our case we are having a problem with a CGI program so the response time
> is important and initialization happens many times.
>
> We may just have to hope no other boxes display this behavior :)


I hadn't thought about the Win7 screen issue mentioned by Dr. Henson, as you said you had several machines (of, I assume, similar make, OS-wise) and only one playing up, which is a symptom I have learned to associate with the RAND_poll() heap walking issue. That doesn't mean I'm right, so checking is the way to make sure here.

Since, in your case, the cost of the delay is significant (as it's happening in a CGI app), it's worth checking which one it is on your box. I'd be interested to hear whether it's the heap walk loop or (IIRC) readscreen() itself causing the delay. (I bet on the former, as Win7 ~~ Server2008 in this respect and you mentioned you're running on 2003.)



[OT here] Just a thought, but since you're init-ing and exit-ing OpenSSL repeatedly due to starting and exiting the CGI exe, you may consider 'improving your random pool over time': add code in the CGI to dump the random pool to a file at the end of the CGI run, and RAND_add() the file content at CGI start. Open the file for write with exclusive access, and just shrug and don't write at all when the OS says another instance is already /writing/; this isn't about keeping everything, best effort is what we're after. Note that I say RAND_add(), so loading some 'unknown' amount of previously gathered entropy from a file doesn't replace the 'regular' entropy gathering that happens with each start!

The thought here is that you can alleviate any suspected reduction in entropy gathering 'quality' - as sources are removed from the gathering - by accumulating the gathered entropy over an extended time, surpassing the CGI run-time lifetime boundaries by 'persisting' the entropy gathered so far.
The whole file I/O thing is best effort, so for multiple CGI instances running in parallel only one gets to 'win' and the collection in the other instances is 'lost'. That's okay, in a way, as we're considering the longer term here, where a CGI instance I(t+n) now gets a /chance/ at obtaining more entropy than it would on its own, thanks to the successful persisting action by a previous instance I(t) (instance 'I' at time 't'). [An alternative to classic fopen/fwrite/fclose might be memory mapped I/O with shared write access; no interprocess locking is needed as we don't care who gets to write his stuff in there, just as long as it happens fast and we don't run the risk of a completely zeroed file content or some such nastiness.]

It's not ideal, but it at least helps cover your tracks when you cut out the offending source in OpenSSL itself.
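
A rough sketch of that load-at-start, save-at-exit pattern, written in Python for brevity. `ssl.RAND_add()` and `ssl.RAND_bytes()` are the Python standard library's wrappers around the OpenSSL calls of the same name; the function names, the lock-file convention, the `.tmp` suffix, and `SEED_BYTES` are assumptions made up for this example. Note that it saves fresh PRNG output rather than the raw pool, and credits the loaded data with zero entropy so it can only add to the regular gathering. In C, RAND_load_file() and RAND_write_file() cover similar ground.

```python
import os
import ssl

SEED_BYTES = 1024  # arbitrary size chosen for this sketch


def load_seed(path):
    """Mix a previously saved seed file into OpenSSL's pool, if one exists."""
    try:
        with open(path, "rb") as f:
            data = f.read(SEED_BYTES)
    except OSError:
        return 0  # no file yet (first run) or unreadable: just carry on
    if data:
        # Claim 0.0 bytes of entropy: the file supplements the regular
        # startup gathering and never replaces it.
        ssl.RAND_add(data, 0.0)
    return len(data)


def save_seed(path):
    """Best-effort save; returns False if another instance is writing."""
    lock = path + ".lock"
    try:
        fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # someone else is writing: shrug and skip
    try:
        tmp = path + ".tmp"
        with open(tmp, "wb") as f:
            f.write(ssl.RAND_bytes(SEED_BYTES))
        os.replace(tmp, path)  # readers never see a half-written file
        return True
    finally:
        os.close(fd)
        os.remove(lock)
```

A CGI would call `load_seed()` right after startup and `save_seed()` just before exit. Two caveats with this sketch: a crashed writer leaves a stale lock file behind, and the seed file has to be readable only by the account the CGI runs as, or it helps an attacker rather than you.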



Re: Slow crypto initialization.

Jean-Marc Desperrier-2
Ger Hobbelt wrote:
> a symptom I have learned to associate with the Rand_poll() heap walking
> issue.

AFAIR some time ago there was a problem where *just the first call* to
the heap walking function would, under 64-bit Windows, take seconds in
some circumstances. That's clearly a bug, and only Microsoft can do
something about it.

Re: Slow crypto initialization.

goulding (Bugzilla)
In reply to this post by Brian Makin-2
To start off with, we're sorry for any confusion we caused, but this
issue does not have any interaction with the heap-walking code. The
problem definitely has to do with the readscreen function. Making
readscreen() a no-op decreases initialization time by 3 seconds.

Delving more into the issue, we added debug prints to measure the
timing [1] of various sections of readscreen(). For our particular
case, the inner loop is called 47 times. Inside the loop, each
invocation of GetBitmapBits takes about 78ms. The total runtime of
readscreen alone was 3640ms.
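
This kind of per-section timing can be sketched as follows, in Python for brevity rather than the C used inside readscreen(). `SectionTimer` and its methods are hypothetical names invented for this example, and Python's `time.perf_counter` stands in for the Windows timing call used for the measurements above.

```python
import time
from collections import defaultdict


class SectionTimer:
    """Accumulate call counts and elapsed time per named section."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock
        self.calls = defaultdict(int)
        self.total = defaultdict(float)

    def measure(self, name, func, *args):
        """Run func(*args), charging its elapsed time to section `name`."""
        start = self.clock()
        try:
            return func(*args)
        finally:
            self.total[name] += self.clock() - start
            self.calls[name] += 1

    def report(self):
        """Map each section name to (number of calls, total seconds)."""
        return {name: (self.calls[name], self.total[name]) for name in self.calls}
```

Counting calls alongside the total is what makes the figures above add up: 47 calls at roughly 78ms each comes to about 3.7 seconds, which matches the measured total for readscreen() and shows that nearly all of the time goes into that one call.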

The MSDN page indicates that GetBitmapBits is a 16-bit Windows
compatibility function [2]. Some random web searches seem to indicate
that it has to go through multiple layers of compatibility code,
involving acquiring a global lock of some kind.

The same MSDN page points to the suggested replacement function,
GetDIBits [3].

Please find attached a patch to modify readscreen() to use the newer
GetDIBits function. On our problem machine, this decreases the time
taken in readscreen() to 94ms. The majority of the code in the patch
comes from an MSDN example on device contexts [4].

To verify that the new code functions identically, we modified the
original code and printed out the hex of the hashed data. For each
chunk of framebuffer, the hash is
1adc95bebe9eea8c112d40cd4ab7a8d75c4f961, in both sets of code.
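
A check of this shape can be sketched as below, in Python for brevity. The reader callables are stand-ins for the old and new screen-reading code, and `chunk_digests`, `first_mismatch`, and the use of SHA-1 are assumptions made for the example, not taken from the patch.

```python
import hashlib


def chunk_digests(read_chunk, n_chunks):
    """Return the SHA-1 hex digest of every chunk produced by `read_chunk`."""
    return [hashlib.sha1(read_chunk(i)).hexdigest() for i in range(n_chunks)]


def first_mismatch(old_reader, new_reader, n_chunks):
    """Index of the first chunk whose digests differ, or None if all match."""
    old = chunk_digests(old_reader, n_chunks)
    new = chunk_digests(new_reader, n_chunks)
    for i, (a, b) in enumerate(zip(old, new)):
        if a != b:
            return i
    return None
```

Comparing digests chunk by chunk pinpoints where two implementations diverge without having to dump raw pixel data. It is also worth noting that an identical digest for every chunk means every chunk held the same bytes, which is what one would expect from a blank or unavailable screen.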

Of special note is that this code runs in the context of the anonymous
IIS user, which is unlikely to have a useful screen. One might wonder
how often this happens. Presumably, user-mode software can actually
access the screen, but server-type software will not have a screen.

Please let me know if we can provide further data or help. Any
comments, insight, or suggestions on the patch are appreciated.

Thanks!

[1] The timing information was provided by the Windows function
    GetSystemTimeAsFileTime.
[2] http://msdn.microsoft.com/en-us/library/dd144850(VS.85).aspx
[3] http://msdn.microsoft.com/en-us/library/dd144879(v=VS.85).aspx
[4] http://msdn.microsoft.com/en-us/library/dd183402(v=VS.85).aspx

Jake Goulding | Software Engineer
[hidden email] | Connect: www.vivisimo.com
Vivisimo - Information Optimized


Attachment: faster-screen-bits.patch (3K)

Re: Slow crypto initialization.

Ger Hobbelt
Don't be sorry, this is great work!! I'm glad the culprit has been found (and fixed)!

BTW: To help the OpenSSL core team track and fix this, it would be good to submit your message + patch to [hidden email] so it ends up as an issue ticket in the tracker and this material does not disappear off the horizon of an ever-progressing discussion list. A reference to this email thread in the RT would be handy, e.g.: http://www.bluequartz.us/phpBB2/viewtopic.php?t=131309 (the entire thread is easily viewable as a set of forum messages there, so one page carries it all)

