1.1.1b crash (RUN_ONCE problem?)

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

1.1.1b crash (RUN_ONCE problem?)

Norm Green
I'm debugging a failure in a debug build on Solaris SPARC in the below
code in rand_lib.c.  On line 744, rand_meth_lock is NULL, which suggests
the RUN_ONCE code is not working.  Wondering if anyone else has seen
this problem?
We did not see this issue in 1.1.1a.  Perhaps changes in the RUN_ONCE
code in this commit are responsible?
https://github.com/openssl/openssl/commit/f725fe5b4b6504df08e30f5194d321c3025e2336



737 const RAND_METHOD *RAND_get_rand_method(void)
738 {
739     const RAND_METHOD *tmp_meth = NULL;
740
741     if (!RUN_ONCE(&rand_init, do_rand_init))
742         return NULL;
743
744     CRYPTO_THREAD_write_lock(rand_meth_lock);
745     if (default_RAND_meth == NULL) {
746 #ifndef OPENSSL_NO_ENGINE
747         ENGINE *e;
748
749         /* If we have an engine that can do RAND, use it. */
750         if ((e = ENGINE_get_default_RAND()) != NULL
751                 && (tmp_meth = ENGINE_get_RAND(e)) != NULL) {
752             funct_ref = e;
753             default_RAND_meth = tmp_meth;
754         } else {
755             ENGINE_finish(e);
756             default_RAND_meth = &rand_meth;
757         }
758 #else
759         default_RAND_meth = &rand_meth;
760 #endif
761     }
762     tmp_meth = default_RAND_meth;
763     CRYPTO_THREAD_unlock(rand_meth_lock);
764     return tmp_meth;
765 }

Reply | Threaded
Open this post in threaded view
|

Re: 1.1.1b crash (RUN_ONCE problem?)

Viktor Dukhovni
On Fri, Mar 01, 2019 at 11:16:52AM -0800, Norm Green wrote:

[ Please avoid non-breaking spaces in your posts. ]

> I'm debugging a failure in a debug build on Solaris SPARC in the below
> code in rand_lib.c. On line 744, rand_meth_lock is NULL, which suggests
> the RUN_ONCE code is not working.  Wondering if anyone else has seen
> this problem?
> We did not see this issue in 1.1.1a.  Perhaps changes in the RUN_ONCE
> code in this commit are responsible?
> https://github.com/openssl/openssl/commit/f725fe5b4b6504df08e30f5194d321c3025e2336

That PR looks correct, and has no effect on the behaviour of RUN_ONCE
below.  It only introduces RUN_ONCE_ALT() and uses it in a few
special cases.

> 741     if (!RUN_ONCE(&rand_init, do_rand_init))
> 742         return NULL;
> 743
> 744     CRYPTO_THREAD_write_lock(rand_meth_lock);

Are you sure you're compiling linking and running with the desired
set of headers and libraries?

--
        Viktor.
Reply | Threaded
Open this post in threaded view
|

Re: 1.1.1b crash (RUN_ONCE problem?)

Norm Green
Yes I'm sure the build process is correct.
Turns out this problem was cause by one thread calling exit() while
another thread was doing SSL_write().  The SSL exit handler triggered by
exit() was causing the lock in question to be freed AFAIKT.
So it would seem that threads either need to exit with pthread_exit() or
return to a known state such that no SSL calls are in progress in any
thread before the process can safely exit().


On 3/1/2019 11:54 AM, Viktor Dukhovni wrote:

> On Fri, Mar 01, 2019 at 11:16:52AM -0800, Norm Green wrote:
>
> [ Please avoid non-breaking spaces in your posts. ]
>
>> I'm debugging a failure in a debug build on Solaris SPARC in the below
>> code in rand_lib.c. On line 744, rand_meth_lock is NULL, which suggests
>> the RUN_ONCE code is not working.  Wondering if anyone else has seen
>> this problem?
>> We did not see this issue in 1.1.1a.  Perhaps changes in the RUN_ONCE
>> code in this commit are responsible?
>> https://github.com/openssl/openssl/commit/f725fe5b4b6504df08e30f5194d321c3025e2336
> That PR looks correct, and has no effect on the behaviour of RUN_ONCE
> below.  It only introduces RUN_ONCE_ALT() and uses it in a few
> special cases.
>
>> 741     if (!RUN_ONCE(&rand_init, do_rand_init))
>> 742         return NULL;
>> 743
>> 744     CRYPTO_THREAD_write_lock(rand_meth_lock);
> Are you sure you're compiling linking and running with the desired
> set of headers and libraries?
>

Reply | Threaded
Open this post in threaded view
|

Re: 1.1.1b crash (RUN_ONCE problem?)

Viktor Dukhovni
> On Mar 4, 2019, at 11:29 PM, Norm Green <[hidden email]> wrote:
>
> Yes I'm sure the build process is correct.
> Turns out this problem was cause by one thread calling exit() while another thread was doing SSL_write().  The SSL exit handler triggered by exit() was causing the lock in question to be freed AFAIKT.
> So it would seem that threads either need to exit with pthread_exit() or return to a known state such that no SSL calls are in progress in any thread before the process can safely exit().

Yes, main() must not exit while threads are running code from
libraries with exit handlers.  This is issue is not unique to
OpenSSL.  I've observed the same problem with threading running
code in MIT's GSSAPI library.

Some users want libraries to be unloadable without memory leaks,
others want safe exit() while threads are running...  Serving
both is hard.

A multi-threaded program that wants to exit before all the
threads are done, should probably call _exit(), and skip
the exit handlers (when none of those are doing anything
critical, like flushing stdio buffers).

Alternatively arrange for a way to ask threads to terminate
promptly, and wait for them to do that.

--
--
        Viktor.