Re: [openssl-team] Discussion: design issue: async and -lpthread

classic Classic list List threaded Threaded
60 messages Options
123
Reply | Threaded
Open this post in threaded view
|

Re: [openssl-team] Discussion: design issue: async and -lpthread

Nico Williams
[Viktor asked me for my advice on this issue and bounced me the post
 that I'm following up to.  -Nico]

The summary of what I've to say is that making libcrypto and libssl need
-lpthread is something that does require discussion, as it will have
detrimental effects on some users.  Personally, I think that those
detrimental effects are a good thing (see below), but nonetheless I
encourage you to discuss whether this is actually what OpenSSL should
do.  In particular, it may be possible to avoid -lpthread on some
systems and still get a subset of lipthread functionality from libc or
the compiler (e.g., thread-locals), and that may be worth doing.

On Mon, Nov 23, 2015 at 01:53:47AM +0000, Viktor Dukhovni wrote:

> As a side-effect of the MacOS/X MR I've become aware that the async
> code in its current state links master with "-lphtread", and defines
> macros that enable multi-theaded (as opposed to merely thread-safe)
> compilation into OpenSSL.
>
>     commit 757d14905e3877abaa9f258f3dd0694ae3c7c270
>     Author: Matt Caswell <[hidden email]>
>     Date:   Thu Nov 19 14:55:09 2015 +0000
>
> Add pthread support
>
> The forthcoming async code needs to use pthread thread local variables. This
> updates the various Configurations to add the necessary flags. In many cases
> this is an educated guess as I don't have access to most of these
> environments! There is likely to be some tweaking needed.
>
> Reviewed-by: Kurt Roeckx <[hidden email]>
>
> This is quite possibly not be the right thing to do, and deserves
> some attention from the team.  We might even seek outside som
> outside advice from folks well versed in platform-specific library
> engineering (Christos Zoulas from NetBSD, Nico Williams formerly
> from Sun, ...).
>
> My concern is that introducing -lpthread automatically converts
> single-threaded applications that link with OpenSSL into threaded
> applications (with a single thread).  This may well have undesirable
> consequences.

Some background may be needed.  When threading was introduced in the
90s, and in some cases still to this day, generally the end result was
that the system had to support a number "process models", with potential
transitions from one to another at run-time:

 - single-threaded, dynamically linked
 - single-threaded, statically  linked
 - multi-threaded,  dynamically linked
 - multi-threaded,  statically  linked
 - multi-threaded,  mixed linkage

The statically-linked models can become mixed-linkage models via
dlopen().  The single-threaded model can become a threaded model via
dlopen() of an object linked with -lpthread, or by dlopen()ing
libpthread itself.

Solaris 9 and under used to have a veritable rats nest of code to deal
with the process model transitions from single-threaded to
multi-threaded.  Solaris 10 unified these and moved libpthread and libdl
into libc [with filters left behind for backwards compatibility].  Thus,
on Solaris 10 and up, and Illumos, OpenSSL using -lpthread or not makes
no difference.

I'm quite fond of the approach taken by Solaris 10 and up (and thus also
Illumos): there is but one process model, and it is threaded, with
pthreads in libc.  But that need not be the way it works everywhere.
Some systems may still support multiple process models.

For a library like OpenSSL making use of -lpthread does mean dictating
to users that they may only use a threaded process model.  OTOH, not
using -lpthread allows the user to choose a process model unconstrained
by such a library.

Until now OpenSSL has avoided forcing the user to choose any particular
process model.  Now with this commit OpenSSL is now taking the reverse
stance.  This seems like a very significant change that should at least
be noted prominently in the release notes, but it should also be
disucssed, indeed.

Personally, I believe this change is a good thing, as OpenSSL really
ought to either automatically initialize its "lock callbacks" or do away
with them completely (leaving backwards compatibility stubs) and use the
OS' locking facility by default / only.  Automatic lock callback
initialization without forcing the use of -lpthread and still allowing
static linking would be tricky indeed (for example: OpenSSL couldn't use
weak symbols to detect when -lpthread gets brought in).

Still, if -lpthread avoidance were still desired, you'd have to find an
alternative to pthread_key_create(), pthread_getspecific(), and friends.

For thread-specifics the obvious answer is to use C compiler
thread-local variable support.  This might or might not be available;
this would have to be determined at build configuration time.  Still, if
where the compiler supports thread-locals, OpenSSL could avoid
-lpthread.

For pthread mutex functions (for lock callbacks) and, perhaps,
pthread_once() (for automatic initialization of lock callbacks,
perhaps), it's very difficult to create alternatives to -lpthread, but
it can conceivably be done within a library like OpenSSL.  I don't think
it's worth doing though!  Instead I think it's best to just use
-lpthread and let the OS and its libc and libpthread take care of any
desirable optimizations in single-threaded cases, as well as any
automatic transitions to non-optimized implementations upon creation of
the first additional thread.

The single-threaded and statically linked process model has a number of
detrimental effects, and ISTM that by bending over backwards to avoid
forcing the choice of process model, OpenSSL and similar allow users to
believe in fairy tales.  One detrimental effect has been OpenSSL's
bloody lock callbacks, which in turn mean that it is impossible to use
OpenSSL perfectky thread-safely from another library!  Must every
program use -lcrypto and -lpthread, and setup lock callbacks just in
case some other library the program really does use calls dlopen()?

Other detrimental effects of static linking include:

 - the flattened dependency topology forced by static linking (with
   attendant confusion in *-cofig(1) programs), with attendant
   accidental symbol interposition;

 - partial loss of ASLR protection (one random load address for the
   entire executable text and all its libraries, versus one for each
   object)

 - the need for complex single- to multi-threaded transition support
   code in various system libraries (mostly in libc).

 - slower load times.  Yes, slower.  A single statically-linked
   executable will load faster than the same program dynamically-linked.
   But a larger system than just one executable (e.g., the host OS,
   related applications, helper programs, ...) will load slower because
   multiple copies of common texts will not be shareable.

If a major library like OpenSSL were to force operating systems to deal
with the shortcommings of mixed process models and static linking, that
would be a good thing.  That's what OpenSSL would be doing by using
-lpthread.

> The C-library on many platforms is carefully designed to operate
> efficiently and locklessly in single-threaded applications, and to
> only switch to thread-aware implementations of functions when the
> pthread library is loaded.

I don't think the loss of the lockless optimizations should be of
particular concern to OpenSSL: if that's a problem for a given OS, let
its implementors deal with it.  (They can do it, by, e.g., hot-swapping
lock functions at pthread_create() call time rather than at the time
that libpthread is loaded.)

> In NetBSD, for example, the <pthread.h> header file carefully
> defines non-threaded (or rather lazily threaded only when pthread
> is also loaded) versions of various pthread functions, see the
> extract of pthread.h below my signature.

It shouldn't be the loading of libpthread that causes transition to a
threaded model (and, therefore, loss of single-threaded optimizations),
but calling pthread_create().  If NetBSD really does the former, well,
too bad: it really ought to be changed to do the latter.

> Therefore, on any platform where the C-library is properly engineered
> in this way (at least NetBSD and MacOS/X), we should not be
> introducing -lpthread or definining "-D_REENTRANT", ...

Nonetheless, this is good advice.  If the necessary functionality from
libpthread can be found in libc (or, in the case of thread-locals, in
the compiler), then OpenSSL should avoid bringing in -lpthread.

As long as they support seamless upgrade to threaded model via dlopen(),
this is good advice.

Of course, this does mean that OpenSSL needs more complexity in its
configurator: to decide when to use -lpthread and when not to.

> We are a thread-safe library, *not* a threaded library, and must
> not introduce threading into threaded applications.

That's a fairy tale.

OpenSSL is only thread-safe when either a) used only in single-threaded
processes that never transition to multi-threaded process models, or b)
when the program initializes the OpenSSL lock callbacks before any
libraries that need OpenSSL.  Those two constraints are so difficult to
meet that I don't think it's fair to call OpenSSL a thread-safe library
at all.

It's time to fix this.  If OpenSSL will now require -lpthread, then it
gets easier, certainly, so I'm not exactly against OpenSSL requiring
-lpthread!  ;)

> We should think carefully about which platforms get async support
> (perhaps one at time, and before it has been tested), and whether
> "-pthread" is an acceptable penalty for any potential future
> advantage of async support (none at present).
>
> Thoughts?  Comments?

See above.  I'm not sure you'll be please with mine :)

> --
> Viktor.
>
> [NetBSD header commentary extracts elided]

Nico
--
_______________________________________________
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev
Reply | Threaded
Open this post in threaded view
|

Re: [openssl-team] Discussion: design issue: async and -lpthread

Nico Williams
[Resend, with slight edits.]

[Viktor asked me for my advice on this issue and bounced me the post
 that I'm following up to.  -Nico]

The summary of what I've to say is that making libcrypto and libssl need
-lpthread is something that does require discussion, as it will have
detrimental effects on some users.  Personally, I think that those
detrimental effects are a good thing (see below), but nonetheless I
encourage you to discuss whether this is actually what OpenSSL should
do.  In particular, it may be possible to avoid -lpthread on some
systems and still get a subset of lipthread functionality from libc or
the compiler (e.g., thread-locals), and that may be worth doing.

On a slightly related note, I asked and Viktor tells me that fiber
stacks are allocated with malloc().  I would prefer that they were
allocated with mmap(), because then you get a guard page.  A guard page
would allow one to safely tune down fiber stack size to the whatever
OpenSSL actually needs for a given use.

Comments below.

On Mon, Nov 23, 2015 at 01:53:47AM +0000, Viktor Dukhovni wrote:

> As a side-effect of the MacOS/X MR I've become aware that the async
> code in its current state links master with "-lphtread", and defines
> macros that enable multi-theaded (as opposed to merely thread-safe)
> compilation into OpenSSL.
>
>     commit 757d14905e3877abaa9f258f3dd0694ae3c7c270
>     Author: Matt Caswell <[hidden email]>
>     Date:   Thu Nov 19 14:55:09 2015 +0000
>
> Add pthread support
>
> The forthcoming async code needs to use pthread thread local variables. This
> updates the various Configurations to add the necessary flags. In many cases
> this is an educated guess as I don't have access to most of these
> environments! There is likely to be some tweaking needed.
>
> Reviewed-by: Kurt Roeckx <[hidden email]>
>
> This is quite possibly not be the right thing to do, and deserves
> some attention from the team.  We might even seek outside some
> outside advice from folks well versed in platform-specific library
> engineering (Christos Zoulas from NetBSD, Nico Williams formerly
> from Sun, ...).
>
> My concern is that introducing -lpthread automatically converts
> single-threaded applications that link with OpenSSL into threaded
> applications (with a single thread).  This may well have undesirable
> consequences.

Some background may be needed.  When threading was introduced in the
90s, and in some cases still to this day, generally the end result was
that the system had to support a number "process models", with potential
transitions from one to another at run-time:

 - single-threaded, dynamically linked
 - single-threaded, statically  linked
 - multi-threaded,  dynamically linked
 - multi-threaded,  statically  linked
 - multi-threaded,  mixed linkage

The statically-linked models can become mixed-linkage models via
dlopen().  The single-threaded model can become a threaded model via
dlopen() of an object linked with -lpthread, or by dlopen()ing
libpthread itself.

Solaris 9 and under used to have a veritable rats nest of code to deal
with the process model transitions from single-threaded to
multi-threaded.  Solaris 10 unified these and moved libpthread and libdl
into libc [with filters left behind for backwards compatibility].  Thus,
on Solaris 10 and up, and Illumos, OpenSSL using -lpthread or not makes
no difference.

I'm quite fond of the approach taken by Solaris 10 and up (and thus also
Illumos): there is but one process model, and it is threaded, with
pthreads in libc.  But that need not be the way it works everywhere.
Some systems may still support multiple process models.

For a library like OpenSSL making use of -lpthread does mean dictating
to users that they may only use a threaded process model.  OTOH, not
using -lpthread allows the user to choose a process model unconstrained
by such a library.

Until now OpenSSL has avoided forcing the user to choose any particular
process model.  Now with this commit OpenSSL is now taking the reverse
stance.  This seems like a very significant change that should at least
be noted prominently in the release notes, but it should also be
discussed, indeed.

Personally, I believe this change is a good thing, as OpenSSL really
ought to either automatically initialize its "lock callbacks" or do away
with them completely (leaving backwards compatibility stubs) and use the
OS' locking facility by default / only.  Automatic lock callback
initialization without forcing the use of -lpthread and still allowing
static linking would be tricky indeed (for example: OpenSSL couldn't use
weak symbols to detect when -lpthread gets brought in).

Still, if -lpthread avoidance were still desired, you'd have to find an
alternative to pthread_key_create(), pthread_getspecific(), and friends.

For thread-specifics the obvious answer is to use C compiler
thread-local variable support.  This might or might not be available;
this would have to be determined at build configuration time.  Still, if
where the compiler supports thread-locals, OpenSSL could avoid
-lpthread.

For pthread mutex functions (for lock callbacks) and, perhaps,
pthread_once() (for automatic initialization of lock callbacks,
perhaps),...  On some systems these functions will be in libc.  For the
rest it's very difficult to create alternatives to requiring -lpthread,
though it can conceivably be done within a library like OpenSSL.  I just
don't think it's worth doing though!  Instead I think it's best to just
use -lpthread and let the OS and its libc and libpthread take care of
any desirable optimizations in single-threaded cases, as well as any
automatic transitions to non-optimized implementations upon creation of
the first additional thread.

The single-threaded and statically linked process model has a number of
detrimental effects, and ISTM that by bending over backwards to avoid
forcing the choice of process model, OpenSSL and similar allow users to
believe in fairy tales.  One detrimental effect has been OpenSSL's
bloody lock callbacks, which in turn mean that it is impossible to use
OpenSSL thread-safely from another library!  Must every program use
-lcrypto and -lpthread, and setup lock callbacks just in case some other
library the program really does use calls dlopen()?

Other detrimental effects of static linking include:

 - the flattened dependency topology forced by static linking (with
   attendant confusion in *-config(1) programs), with attendant
   accidental symbol interposition;

 - partial loss of ASLR protection (one random load address for the
   entire executable text and all its libraries, versus one for each
   object)

 - the need for complex single- to multi-threaded transition support
   code in various system libraries (mostly in libc).

 - slower load times.  Yes, slower.  A single statically-linked
   executable will load faster than the same program dynamically-linked.
   But a larger system than just one executable (e.g., the host OS,
   related applications, helper programs, ...) will load slower because
   multiple copies of common texts will not be sharable.

If a major library like OpenSSL were to force operating systems to deal
with the shortcomings of mixed process models and static linking, that
would be a good thing.  That's what OpenSSL would be doing by using
-lpthread.

> The C-library on many platforms is carefully designed to operate
> efficiently and lock-less-ly in single-threaded applications, and to
> only switch to thread-aware implementations of functions when the
> pthread library is loaded.

I don't think the loss of the lock-less optimizations should be of
particular concern to OpenSSL: if that's a problem for a given OS, let
its implementors deal with it.  (They can do it, by, e.g., hot-swapping
lock functions at pthread_create() call time rather than at the time
that libpthread is loaded.)

> In NetBSD, for example, the <pthread.h> header file carefully
> defines non-threaded (or rather lazily threaded only when pthread
> is also loaded) versions of various pthread functions, see the
> extract of pthread.h below my signature.

It shouldn't be the loading of libpthread that causes transition to a
threaded model (and, therefore, loss of single-threaded optimizations),
but calling pthread_create().  If NetBSD really does the former, well,
too bad: it really ought to be changed to do the latter.

> Therefore, on any platform where the C-library is properly engineered
> in this way (at least NetBSD and MacOS/X), we should not be
> introducing -lpthread or definining "-D_REENTRANT", ...

Nonetheless, this is good advice.  If the necessary functionality from
libpthread can be found in libc (or, in the case of thread-locals, in
the compiler), then OpenSSL should avoid bringing in -lpthread.

As long as they support seamless upgrade to threaded model via dlopen(),
this is good advice.

Of course, this does mean that OpenSSL needs more complexity in its
configurator: to decide when to use -lpthread and when not to.

> We are a thread-safe library, *not* a threaded library, and must
> not introduce threading into threaded applications.

That's a bit of a fairy tale.

OpenSSL is not thread-safe when called from other libraries, as a) it's
not their place to initialize OpenSSL's lock callbacks, b) the program
might not do it either, c) OpenSSL defaults do no locking, d) the lock
callback initialization interfaces assume no races to initialize lock
callbacks, which means that other libraries can't safely initialize them
(in practice one library is likely to always win the race, still).

It's time to fix this.  If OpenSSL will now require -lpthread, then it
gets easier, certainly, so I'm not exactly against OpenSSL requiring
-lpthread!  ;)

> We should think carefully about which platforms get async support
> (perhaps one at time, and before it has been tested), and whether
> "-pthread" is an acceptable penalty for any potential future
> advantage of async support (none at present).
>
> Thoughts?  Comments?

See above.  I'm not sure you'll be please with mine :)

> --
> Viktor.
>
> [NetBSD header commentary extracts elided]

Nico
--
_______________________________________________
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev
Reply | Threaded
Open this post in threaded view
|

Re: [openssl-team] Discussion: design issue: async and -lpthread

Nico Williams
In reply to this post by Nico Williams
On Mon, Nov 23, 2015 at 01:53:47AM +0000, Viktor Dukhovni wrote:

[NetBSD header commentary extracts:]

> /*
>  * Use macros to rename many pthread functions to the corresponding
>  * libc symbols which are either trivial/no-op stubs or the real

No renaming is necessary if one's link-editor and RTLD support
filters...

  An ELF filter is a forwarding saying that the implementation of a
  symbol in the filter object is to be found elsewhere, e.g., in some
  other object.

  In Solaris/Illumos this is used to maintain backwards compatibility
  when symbols get moved from one library to another.

  E.g., libpthread and libdl moved into libc, but they remain as filters
  so that objects linked with those old libraries will a) still find
  them, b) still find the symbols the expect in them, c) get the correct
  implementations of those symbols from the object now providing them
  (here: libc).

  Filters are awesome.  Lack of universal support for them is very
  frustrating.  On Linux, for example, it's possible to create filters
  with strong link-editor-fu, but the RTLD does not support them.

>  * thing, depending on whether libpthread is linked in to the
>  * program. This permits code, particularly libraries that do not
>  * directly use threads but want to be thread-safe in the presence of
>  * threaded callers, to use pthread mutexes and the like without
>  * unnecessairly including libpthread in their linkage.

Just move these into libc, lock stock and barrel, and if you want to
have fast versions for the single-threaded case, just arrange for slower
versions to get hot-patched-in when pthread_create() is first called.

Or even just use a branch/computed jump (whichever is faster) to avoid
having to hot-patch.

It's important that pthread_mutex_init/lock/trylock/unlock/destroy work
correctly even in the optimized single-threaded case.  The main thread
might init and acquire some locks then create a second thread that will
block acquiring those locks.

>  * Left out of this list are functions that can't sensibly be trivial
>  * or no-op stubs in a single-threaded process (pthread_create,
>  * pthread_kill, pthread_detach), functions that normally block and
>  * wait for another thread to do something (pthread_join), and

Just move them into libc anyways.

>  * functions that don't make sense without the previous functions
>  * (pthread_attr_*). The pthread_cond_wait and pthread_cond_timedwait
>  * functions are useful in implementing certain protection mechanisms,
>  * though a non-buggy app shouldn't end up calling them in
>  * single-threaded mode.
>  *
>  * The rename is done as:
>  * #define pthread_foo __libc_foo
>  * instead of
>  * #define pthread_foo(x) __libc_foo((x))
>  * in order that taking the address of the function ("func =
>  * &pthread_foo;") continue to work.
>  *
>  * POSIX/SUSv3 requires that its functions exist as functions (even if
>  * macro versions exist) and specifically that "#undef pthread_foo" is
>  * legal and should not break anything. Code that does such will not
>  * successfully get the stub behavior implemented here and will
>  * require libpthread to be linked in.
>  */

All the more reason to not rename these symbols!  All you need is ELF
filter support.

Nico
--
_______________________________________________
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev
Reply | Threaded
Open this post in threaded view
|

Re: [openssl-team] Discussion: design issue: async and -lpthread

Matt Caswell-2
In reply to this post by Nico Williams


On 23/11/15 17:49, Nico Williams wrote:

> [Resend, with slight edits.]
>
> [Viktor asked me for my advice on this issue and bounced me the post
>  that I'm following up to.  -Nico]
>
> The summary of what I've to say is that making libcrypto and libssl need
> -lpthread is something that does require discussion, as it will have
> detrimental effects on some users.  Personally, I think that those
> detrimental effects are a good thing (see below), but nonetheless I
> encourage you to discuss whether this is actually what OpenSSL should
> do.  In particular, it may be possible to avoid -lpthread on some
> systems and still get a subset of lipthread functionality from libc or
> the compiler (e.g., thread-locals), and that may be worth doing.
>
> On a slightly related note, I asked and Viktor tells me that fiber
> stacks are allocated with malloc().  I would prefer that they were
> allocated with mmap(), because then you get a guard page.  A guard page
> would allow one to safely tune down fiber stack size to the whatever
> OpenSSL actually needs for a given use.

Interesting. I'll take a look at that.

> Still, if -lpthread avoidance were still desired, you'd have to find an
> alternative to pthread_key_create(), pthread_getspecific(), and friends.

Just a point to note about this. The async code that introduced this has
3 different implementations:

- posix
- windows
- null

The detection code will check if you have a suitable posix or windows
implementation and use that. Otherwise the fallback position is to use
the null implementation. With "null" everything will compile and run but
you won't be able to use any of the new async functionality.

Only the posix implementation uses the pthread* functions (and only for
thread local storage). Part of the requirement of the posix detection
code is that you have "Configured" with "threads" enabled. This is the
default. However it is possible to explicitly configure with
"no-threads". This suppresses stuff like the "-DRENENTERANT" flag. It
now will also force the use of the null implementation for async and
hence will not use any of the pthread functions.

One other option we could pursue is to use the "__thread" syntax for
thread local variables and avoid the need for libpthread altogether. An
earlier version of the code did this. I have not found a way to reliably
detect at compile time the capability to do this and my understanding is
that this is a lot less portable.

Matt

_______________________________________________
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev
Reply | Threaded
Open this post in threaded view
|

Re: [openssl-team] Discussion: design issue: async and -lpthread

Nico Williams
On Mon, Nov 23, 2015 at 08:34:29PM +0000, Matt Caswell wrote:
> On 23/11/15 17:49, Nico Williams wrote:
> > On a slightly related note, I asked and Viktor tells me that fiber
> > stacks are allocated with malloc().  I would prefer that they were
> > allocated with mmap(), because then you get a guard page.  A guard page
> > would allow one to safely tune down fiber stack size to the whatever
> > OpenSSL actually needs for a given use.
>
> Interesting. I'll take a look at that.

Please do.  It will make this much safer.  Also, you might want to run
some experiments to find the best stack size on each platform.  The
smaller the stack you can get away with, the better.

> > Still, if -lpthread avoidance were still desired, you'd have to find an
> > alternative to pthread_key_create(), pthread_getspecific(), and friends.
>
> Just a point to note about this. The async code that introduced this has
> 3 different implementations:
>
> - posix
> - windows
> - null
>
> The detection code will check if you have a suitable posix or windows
> implementation and use that. Otherwise the fallback position is to use
> the null implementation. With "null" everything will compile and run but
> you won't be able to use any of the new async functionality.
>
> Only the posix implementation uses the pthread* functions (and only for
> thread local storage). Part of the requirement of the posix detection
> code is that you have "Configured" with "threads" enabled. This is the
> default. However it is possible to explicitly configure with
> "no-threads". This suppresses stuff like the "-DRENENTERANT" flag. It
> now will also force the use of the null implementation for async and
> hence will not use any of the pthread functions.

Ah, I see.  I think that's fine.  Maybe Viktor misunderstood this?

> One other option we could pursue is to use the "__thread" syntax for
> thread local variables and avoid the need for libpthread altogether. An
> earlier version of the code did this. I have not found a way to reliably
> detect at compile time the capability to do this and my understanding is
> that this is a lot less portable.

I use this in an autoconf project (I know, OpenSSL doesn't use autoconf):

  dnl Thread local storage
  have___thread=no
  AC_MSG_CHECKING(for thread-local storage)
  AC_LINK_IFELSE([AC_LANG_SOURCE([
  static __thread int x ;
  int main () { x = 123; return x; }
  ])], have___thread=yes)
  if test $have___thread = yes; then
     AC_DEFINE([HAVE___THREAD],1,[Define to 1 if the system supports __thread])
  fi
  AC_MSG_RESULT($have___thread)

Is there something wrong with that that I should know?  I suppose the
test could use threads to make real sure that it's getting thread-
locals, in case the compiler is simply ignoring __thread.  Are there
compilers that ignore __thread??

Nico
--
_______________________________________________
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev
Reply | Threaded
Open this post in threaded view
|

Re: [openssl-team] Discussion: design issue: async and -lpthread

Peter Waltenberg

"
Please do.  It will make this much safer.  Also, you might want to run
some experiments to find the best stack size on each platform.  The
smaller the stack you can get away with, the better.
"

It does, but it also requires code changes in a few places. probable_prime() in bn_prime.c being far and away the worst offender. We instrumented our test code so we could find out what the stack usage was, for libcrypto you can get it under 4k for 32 bit and under 8k for 64 bit code on x86 Linux.

FYI, nothing elegant there, just have your code allocate and fill a large stack array then add check points further down to see how far you've eaten into it.

"
> > A guard page
> > would allow one to safely tune down fiber stack size to the whatever
> > OpenSSL actually needs for a given use.

"

Unless someone allocates a stack array larger than the size of the guard page and scribbles over another threads stack. This is another reason to never use large arrays on the stack.

"
Is there something wrong with that that I should know?  I suppose the
test could use threads to make real sure that it's getting thread-
locals, in case the compiler is simply ignoring __thread.  Are there
compilers that ignore __thread??
"


Only that it's a compile time choice and OpenSSL is currently 'thread neutral' at runtime, not at compile time ?.
Compile time is easy, making this work at runtime is hard and occasionally is really valuable - i.e. way back in the dim distant path when Linux had multiple thread packages available.

Peter




Inactive hide details for Nico Williams ---24/11/2015 06:49:51---On Mon, Nov 23, 2015 at 08:34:29PM +0000, Matt Caswell wrote: Nico Williams ---24/11/2015 06:49:51---On Mon, Nov 23, 2015 at 08:34:29PM +0000, Matt Caswell wrote: > On 23/11/15 17:49, Nico Williams wro

From: Nico Williams <[hidden email]>
To: [hidden email]
Date: 24/11/2015 06:49
Subject: Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread
Sent by: "openssl-dev" <[hidden email]>





On Mon, Nov 23, 2015 at 08:34:29PM +0000, Matt Caswell wrote:
> On 23/11/15 17:49, Nico Williams wrote:
> > On a slightly related note, I asked and Viktor tells me that fiber
> > stacks are allocated with malloc().  I would prefer that they were
> > allocated with mmap(), because then you get a guard page.  A guard page
> > would allow one to safely tune down fiber stack size to the whatever
> > OpenSSL actually needs for a given use.
>
> Interesting. I'll take a look at that.

Please do.  It will make this much safer.  Also, you might want to run
some experiments to find the best stack size on each platform.  The
smaller the stack you can get away with, the better.

> > Still, if -lpthread avoidance were still desired, you'd have to find an
> > alternative to pthread_key_create(), pthread_getspecific(), and friends.
>
> Just a point to note about this. The async code that introduced this has
> 3 different implementations:
>
> - posix
> - windows
> - null
>
> The detection code will check if you have a suitable posix or windows
> implementation and use that. Otherwise the fallback position is to use
> the null implementation. With "null" everything will compile and run but
> you won't be able to use any of the new async functionality.
>
> Only the posix implementation uses the pthread* functions (and only for
> thread local storage). Part of the requirement of the posix detection
> code is that you have "Configured" with "threads" enabled. This is the
> default. However it is possible to explicitly configure with
> "no-threads". This suppresses stuff like the "-DRENENTERANT" flag. It
> now will also force the use of the null implementation for async and
> hence will not use any of the pthread functions.

Ah, I see.  I think that's fine.  Maybe Viktor misunderstood this?

> One other option we could pursue is to use the "__thread" syntax for
> thread local variables and avoid the need for libpthread altogether. An
> earlier version of the code did this. I have not found a way to reliably
> detect at compile time the capability to do this and my understanding is
> that this is a lot less portable.

I use this in an autoconf project (I know, OpenSSL doesn't use autoconf):

 dnl Thread local storage
 have___thread=no
 AC_MSG_CHECKING(for thread-local storage)
 AC_LINK_IFELSE([AC_LANG_SOURCE([
 static __thread int x ;
 int main () { x = 123; return x; }
 ])], have___thread=yes)
 if test $have___thread = yes; then
    AC_DEFINE([HAVE___THREAD],1,[Define to 1 if the system supports __thread])
 fi
 AC_MSG_RESULT($have___thread)

Is there something wrong with that that I should know?  I suppose the
test could use threads to make real sure that it's getting thread-
locals, in case the compiler is simply ignoring __thread.  Are there
compilers that ignore __thread??

Nico
--
_______________________________________________
openssl-dev mailing list
To unsubscribe:
https://mta.openssl.org/mailman/listinfo/openssl-dev





_______________________________________________
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev
Reply | Threaded
Open this post in threaded view
|

Re: [openssl-team] Discussion: design issue: async and -lpthread

Dr Paul Dale
In reply to this post by Nico Williams

Somewhat tangentially related to this is the how thread locking in OpenSSL is slowing things up.

 

We've been doing some connection establishment performance analysis recently and have discovered a lot of waiting on locks is occurring. By far the worst culprit is CRYPTO_LOCK_EVP_PKEY in CRYPTO_add_lock calls. Changing these to gcc's atomic add operations (__sync_add_and_fetch) improved things significantly:

 

base OpenSSL 11935 connections/s 85% CPU utilisation

with Atomic Change 16465 connections/s 22% CPU utilisation

 

So fifty percent more connections for a quarter the CPU. At this point a number of other locks are causing the slow down.

 

Now, I'm not sure if such a change would be interesting to the community or not, but there definitely is room for significant gains in the multi threaded locking. Ignoring the atomic operations and moving to a separate lock per reference count would likely save a an amount of blocking -- is this a suitable use for dynamic locks?

 

 

I also submitted a bug report and fix recently [openssl.org #4135] to do with threading, which will hopefully get included eventually.

 

 

Regards,

 

Pauli

 

--

Oracle

Dr Paul Dale | Cryptographer | Network Security & Encryption

Phone +61 7 3031 7217

Oracle Australia

 


_______________________________________________
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev
Reply | Threaded
Open this post in threaded view
|

Re: [openssl-team] Discussion: design issue: async and -lpthread

Kurt Roeckx
In reply to this post by Nico Williams
On Mon, Nov 23, 2015 at 02:48:25PM -0600, Nico Williams wrote:

>
> I use this in an autoconf project (I know, OpenSSL doesn't use autoconf):
>
>   dnl Thread local storage
>   have___thread=no
>   AC_MSG_CHECKING(for thread-local storage)
>   AC_LINK_IFELSE([AC_LANG_SOURCE([
>   static __thread int x ;
>   int main () { x = 123; return x; }
>   ])], have___thread=yes)
>   if test $have___thread = yes; then
>      AC_DEFINE([HAVE___THREAD],1,[Define to 1 if the system supports __thread])
>   fi
>   AC_MSG_RESULT($have___thread)
>
> Is there something wrong with that that I should know?  I suppose the
> test could use threads to make real sure that it's getting thread-
> locals, in case the compiler is simply ignoring __thread.  Are there
> compilers that ignore __thread??

I think that we currently don't do any compile / link test to
detect features but that we instead explicitly say so for each
platform.

Anyway, the gcc the documentation is here:
https://gcc.gnu.org/onlinedocs/gcc/Thread-Local.html

TLS support clearly isn't supported everywhere.


Kurt

_______________________________________________
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev
Reply | Threaded
Open this post in threaded view
|

Re: [openssl-team] Discussion: design issue: async and -lpthread

Matt Caswell-2
In reply to this post by Dr Paul Dale


On 23/11/15 21:56, Paul Dale wrote:
> Somewhat tangentially related to this is the how thread locking in
> OpenSSL is slowing things up.

Alessandro has submitted an interesting patch to provide a much better
threading API. See:

https://github.com/openssl/openssl/pull/451

I'm not sure what the current status of this is though.

Matt
_______________________________________________
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev
Reply | Threaded
Open this post in threaded view
|

Re: [openssl-team] Discussion: design issue: async and -lpthread

Salz, Rich
> https://github.com/openssl/openssl/pull/451
> I'm not sure what the current status of this is though.

I've made several comments I think need to be addressed before we should merge it.
_______________________________________________
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev
Reply | Threaded
Open this post in threaded view
|

Re: [openssl-team] Discussion: design issue: async and -lpthread

Dr Paul Dale
In reply to this post by Matt Caswell-2

Thanks for the quick reply. That patch looks much improved on this front.

 

We'll wait for the changes and then retest performance.

 

 

Thanks again,

 

Pauli

 

On Mon, 23 Nov 2015 10:18:27 PM Matt Caswell wrote:

>

> On 23/11/15 21:56, Paul Dale wrote:

> > Somewhat tangentially related to this is the how thread locking in

> > OpenSSL is slowing things up.

>

> Alessandro has submitted an interesting patch to provide a much better

> threading API. See:

>

> https://github.com/openssl/openssl/pull/451

>

> I'm not sure what the current status of this is though.

>

> Matt

> _______________________________________________

> openssl-dev mailing list

> To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev

 

--

Oracle

Dr Paul Dale | Cryptographer | Network Security & Encryption

Phone +61 7 3031 7217

Oracle Australia

 


_______________________________________________
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev
Reply | Threaded
Open this post in threaded view
|

Re: [openssl-team] Discussion: design issue: async and -lpthread

Alessandro Ghedini
In reply to this post by Dr Paul Dale
On Tue, Nov 24, 2015 at 07:56:15am +1000, Paul Dale wrote:

> Somewhat tangentially related to this is the how thread locking in OpenSSL is
> slowing things up.
>
> We've been doing some connection establishment performance analysis recently
> and have discovered a lot of waiting on locks is occurring.  By far the worst
> culprit is CRYPTO_LOCK_EVP_PKEY in CRYPTO_add_lock calls.  Changing these to
> gcc's atomic add operations (__sync_add_and_fetch) improved things
> significantly:
>
> base OpenSSL        11935 connections/s    85% CPU utilisation
> with Atomic Change  16465 connections/s    22% CPU utilisation
>
> So fifty percent more connections for a quarter the CPU.

Is this TLS connections?

> At this point a number of other locks are causing the slow down.

I'd like to know more...

> Now, I'm not sure if such a change would be interesting to the community or
> not, but there definitely  is room for significant gains in the multi threaded
> locking.  Ignoring the atomic operations and moving to a separate lock per
> reference count would likely save a an amount of blocking -- is this a
> suitable use for dynamic locks?

As Matt mentioned, I'm curently working exactly on this. Would it be possible
for you to share your testing method and code? I'd like to verify that my
patches actually go in the right direction before having them merged (or maybe
you could do your tests on top of my patches, they should mostly work fine even
if the work is not complete yet, I think).

Thanks
_______________________________________________
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev
Reply | Threaded
Open this post in threaded view
|

Re: [openssl-team] Discussion: design issue: async and -lpthread

Nico Williams
In reply to this post by Matt Caswell-2
On Mon, Nov 23, 2015 at 10:18:27PM +0000, Matt Caswell wrote:

> On 23/11/15 21:56, Paul Dale wrote:
> > Somewhat tangentially related to this is the how thread locking in
> > OpenSSL is slowing things up.
>
> Alessandro has submitted an interesting patch to provide a much better
> threading API. See:
>
> https://github.com/openssl/openssl/pull/451
>
> I'm not sure what the current status of this is though.

I'll have to look myself.  A while back I posted a link to a
not-yet-complete attempt to make all the lock (and threadid...)
callbacks automatically initialized correctly/safely on Windows and
Unix.

It may be a good idea to rethink locking completely.

Nico
--
_______________________________________________
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev
Reply | Threaded
Open this post in threaded view
|

Re: [openssl-team] Discussion: design issue: async and -lpthread

Viktor Dukhovni
On Mon, Nov 23, 2015 at 05:28:18PM -0600, Nico Williams wrote:

> It may be a good idea to rethink locking completely.

There is some glimmer of hope in that as various libcrypto structures
become opaque, the locking moves from application code into the
library.  For example, we now have (yet to be documented):

        X509_up_ref()

function, because applications can no longer directly manipulate
the reference count.  For compatibility with older releases Postfix
now defines:

    src/tls/tls.h:
        #if OPENSSL_VERSION_NUMBER < 0x10100000L
        #define X509_up_ref(x) CRYPTO_add(&((x)->references), 1, CRYPTO_LOCK_X509)
        #endif

Mind you, both the old and new interfaces assume that the X509
object (presumably freshly returned by some function) that the
application wants to hang on somewhat longer than default, has not
been deallocated by any other thread in the mean time!  So in
particular we are assuming that the caller's thread still "owns"
structures that hold and are responsible for releasing at least
one of the references.

Anyway, with the locking moving into X509_up_ref(), the implementation
moves into the library, where it becomes easier to switch to
fine-grained locking.

We may in more cases need "get1" analogues of existing "get0"
interfaces, so that the user can obtain an already up-refed object
from the library, and just have to call the appropriate "free"
function at the end.

Doing this requires a global review of the API, and filling in
missing functions and documentation. :-(
 
--
        Viktor.
_______________________________________________
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev
Reply | Threaded
Open this post in threaded view
|

Re: [openssl-team] Discussion: design issue: async and -lpthread

Nico Williams
In reply to this post by Peter Waltenberg
On Mon, Nov 23, 2015 at 09:53:15PM +1000, Peter Waltenberg wrote:

>
> "
> Please do.  It will make this much safer.  Also, you might want to run
> some experiments to find the best stack size on each platform.  The
> smaller the stack you can get away with, the better.
> "
>
> It does, but it also requires code changes in a few places. probable_prime
> () in bn_prime.c being far and away the worst offender. We instrumented our
> test code so we could find out what the stack usage was, for libcrypto you
> can get it under 4k for 32 bit and under 8k for 64 bit code on x86 Linux.

Are you saying that using mmap() would be onerous?  Something else?

> FYI, nothing elegant there, just have your code allocate and fill a large
> stack array then add check points further down to see how far you've eaten
> into it.

Sure.

> "
> > > A guard page
> > > would allow one to safely tune down fiber stack size to the whatever
> > > OpenSSL actually needs for a given use.
> "
>
> Unless someone allocates a stack array larger than the size of the guard
> page and scribbles over another threads stack. This is another reason to
> never use large arrays on the stack.

alloca() and VLAs aren't safe for allocating more bytes than fit in a
guard page.  One should not use alloca()/VLAs for anything larger than
that.

This is no reason not to have a guard page!

This is a reason to have coding standards that address alloca()/VLAs.

> "
> Is there something wrong with that that I should know?  I suppose the
> test could use threads to make real sure that it's getting thread-
> locals, in case the compiler is simply ignoring __thread.  Are there
> compilers that ignore __thread??
> "
>
> Only that it's a compile time choice and OpenSSL is currently 'thread
> neutral' at runtime, not at compile time ?.

OpenSSL is "thread-neutral" at run-time as to locks and thread IDs
because of the lock/threadid callbacks.  But here we're talking about a
new feature (fibers) that uses thread-locals, and here using pthread
thread locals (pthread_getspecific()) clearly means no longer being
"thread-neutral" -- if I understand your definition of that term
anyways.

It's perfectly fine to use __thread in compiled code regardless of what
threading library is used, provided -of course- that __thread was
supported to begin with and that the compiler isn't lying.

> Compile time is easy, making this work at runtime is hard and occasionally
> is really valuable - i.e. way back in the dim distant path when Linux had
> multiple thread packages available.

If the compiler accepts __thread but allows it to break at run-time
depending on the available threading libraries, then the compiler is
broken and should now have allowed __thread to begin with.  I can't find
anything describing such brokenness.  If you assert such brokenness
exists then please post links or instructions for how to reproduce it.

BTW, https://en.wikipedia.org/wiki/Thread-local_storage#C_and_C.2B.2B
says that even Visual Studio supports thread-locals.  Though there's a
caveat that requires some care at configuration time:

  On Windows versions before Vista and Server 2008, __declspec(thread)
  works in DLLs only when those DLLs are bound to the executable, and
  will not work for those loaded with LoadLibrary() (a protection fault
  or data corruption may occur).[9]

There must, of course, be compilers that don't support thread locals
(pcc?).  Wouldn't it be fair to say that OpenSSL simply doesn't support
fibers on those compilers?  I think so.

Nico
--
_______________________________________________
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev
Reply | Threaded
Open this post in threaded view
|

Re: [openssl-team] Discussion: design issue: async and -lpthread

Dr Paul Dale
In reply to this post by Alessandro Ghedini

On Mon, 23 Nov 2015 11:11:37 PM Alessandro Ghedini wrote:

> Is this TLS connections?

 

Yes, this is just measuring the TLS handshake. Renegotiations predominately.

We deliberately didn't test the bulk symmetric crypto phase of the connection.

 

 

> I'd like to know more...

 

The data are a bit rough and ready but I've included what I can. I wasn't directly involved in taking these measurements, so Chinese whispers are entirely possible. I've been tasked with trying to find some performance enhancements.

 

The TLS stack results are:

 

stack CPU % connections/s

OpenSSL 85 11,935

atomic patch 22 16,465 proof of concept only, the stack is broken elsewhere

NSS 47 46,507 !!!!!

 

 

 

 

----------------------------------------------------------------------

The top ten bottlenecks identified by Sun Studio profiler.

There are before my atomic change.

 

Lock % 16 CPU % 41

+- EVP_MD_CTX_cleanup

+- EVP_PKEY_CTX_free

+- EVP_PKEY_free

+- CRYPTO_add_lock

+- locking_function

+- pthread_mutex_lock

 

 

Lock % 21 CPU % 50

+- EVP_MD_CTX_copy_ex

+- EVP_PKEY_CTX_dup

+- CRYPTO_add_lock

+- locking_function

+- pthread_mutex_lock

 

 

Lock % 5 CPU % -

+- EVP_PKEY_CTX_dup

+- CRYPTO_add_lock

+- locking_function

+- pthread_mutex_lock

 

 

Lock % 3 CPU % 3

+- EVP_PKEY_CTX_new

+- CRYPTO_add_lock

+- locking_function

+- pthread_mutex_lock

 

 

Lock % - CPU % 3

EVP_PKEY_free

 

 

Lock % 2 CPU % 2

pkey_hmac_copy

 

 

Lock % - CPU % 1

+- Connection::destroy()

+- Connection::close()

+- NZIO_Close

+- nzos_Destroy_Ctx

+- SSL_free

+- ssl_cert_free

+- ssl_cert_clear_certs

 

 

Lock % - CPU % 2

+- ERR_clear_error

+- ERR_get_state

+- int_thread_get_item

+- lh_retrieve

 

 

Locl % - CPU % 1

X509_check_private_key

 

 

Lock % - CPU % 1

sha1_block_data_order_ssse3

 

----------------------------------------------------------------------

 

After making all CRYPTO_add calls atomic, which breaks things elsewhere in

the TLS stack, we obtained the above performance improvement and these new

bottlenecks -- take these with more of a grain of salt:

 

Lock % 6 CPU % 13

SSL_new

+- 46.000 (7%) nzos_Create_Ctx

+- 38.530 (6%) SSL_new

+- 35.370 (6%) CRYPTO_new_ex_data

+- 35.310 (6%) int_new_ex_data

+- 34.360 (5%) def_get_class

+- 34.100 (5%) CRYPTO_lock

+- 34.020 (5%) locking_function

+- 32.620 (5%) pthread_mutex_lock

 

 

Lock % 6 CPU % 12

SSL_SESSION_new

+- 40.090 (6%) d2i_SSL_SESSION

+- 34.420 (6%) SSL_SESSION_new

+- 33.930 (5%) CRYPTO_new_ex_data

+- 33.880 (5%) int_new_ex_data

+- 32.630 (5%) def_get_class

+- 32.390 (5%) CRYPTO_lock

+- 32.370 (5%) locking_function

+- 30.860 (5%) pthread_mutex_lock

 

 

Lock % 9 CPU %22

BIO_new

+ BIO_new 1%

+ BIO_set 4%

+ CRYPTO_new_ex_data 5%

+ int_new_ex_data

+ def_get_class

+ CRYPTO_lock

+ locking_function

+ pthread_mutex_lock

 

 

Lock % 11 CPU % 26

BIO_free

+- BIO_free

+- CRYPTO_free_ex_data

+- int_free_ex_data

+- def_get_class

+- CRYPTO_lock

+- locking_function

+- pthread_mutex_lock

 

 

Lock % 17

tls1_PRF

+- tls1_PRF

+- tls1_P_hash

 

called from

 

+- 62.400 (10%) ssl3_get_finished

+- 62.120 (10%) ssl3_get_message

+- 44.940 (7%) ssl3_read_bytes

+- 28.530 (5%) ssl3_do_change_cipher_spec

+- 23.410 (4%) tls1_final_finish_mac

 

or

 

+- 16.300 (3%) ssl3_take_mac

+- 16.260 (3%) tls1_final_finish_mac

 

or

 

+- 17.430 (3%) ssl3_send_finished

+- 15.240 (2%) tls1_final_finish_mac

 

or

 

+- 65.480 (10%) tls1_setup_key_block

+- 62.890 (10%) tls1_generate_key_block

 

 

Lock % 6

ERR_clear_error

+- ERR_clear_error

+- ERR_get_state

+- int_thread_get_item

+- lh_retrieve

+- getrn

 

 

CPU % 18.93

SSL_free

 

 

CPU % 5.5

SSL_SESSION_free

 

 

 

> As Matt mentioned, I'm curently working exactly on this. Would it be possible

> for you to share your testing method and code? I'd like to verify that my

> patches actually go in the right direction before having them merged (or maybe

> you could do your tests on top of my patches, they should mostly work fine even

> if the work is not complete yet, I think).

 

I'm not sure I can share code and associated infrastructure at this point, we'd like to but we need approvals through the various internal channels.

It might be possible for us to run these tests against your patches and mail you the results (less than ideal but probably workable), I'd have to ask the engineer who did this to see if they can justify the time involved.

 

 

The bottom line is OpenSSL wants for finer grain locking, which the atomic operations provided. Having locks in the various contexts would achieve the same result and be less platform/compiler specific.

 

 

For reference, my proof of concept atomic patch was:

diff --git a/include/openssl/crypto.h b/include/openssl/crypto.h

index 56afc51..803d7b7 100644

--- a/include/openssl/crypto.h

+++ b/include/openssl/crypto.h

@@ -220,8 +220,13 @@ extern "C" {

CRYPTO_lock(CRYPTO_LOCK|CRYPTO_READ,type,__FILE__,__LINE__)

# define CRYPTO_r_unlock(type) \

CRYPTO_lock(CRYPTO_UNLOCK|CRYPTO_READ,type,__FILE__,__LINE__)

-# define CRYPTO_add(addr,amount,type) \

- CRYPTO_add_lock(addr,amount,type,__FILE__,__LINE__)

+# if defined(__GNUC__) || defined(__INTEL_COMPILER)

+# define CRYPTO_add(addr,amount,type) \

+ __sync_add_and_fetch(addr, amount)

+# else

+# define CRYPTO_add(addr,amount,type) \

+ CRYPTO_add_lock(addr,amount,type,__FILE__,__LINE__)

+# endif

# endif

# else

# define CRYPTO_w_lock(a)

This should never be applied, it breaks things and is quick and ugly.

 

 

Regards,

 

Pauli

 

--

Oracle

Dr Paul Dale | Cryptographer | Network Security & Encryption

Phone +61 7 3031 7217

Oracle Australia

 


_______________________________________________
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev
Reply | Threaded
Open this post in threaded view
|

Re: [openssl-team] Discussion: design issue: async and -lpthread

Peter Waltenberg
In reply to this post by Nico Williams

I wasn't saying there was anything wrong with mmap(), just that guard pages only work if you can guarantee your overrun hits the guard page (and doesn't just step over it). Large stack allocations increase the odds of 'stepping over' the guard pages. It's still better than not having guard pages, but they aren't a hard guarantee that you won't have mysterious bugs still.

You obviously realize that, but bn_prime() is the classic example of allocating very large chunks of memory on the stack.


As for fibre's, I doubt it'll work in general, the issue there is simply the range of OS's OpenSSL supports. If you wire it in you still have to run with man+dog+world in the process, that's a hard ask. One of the good points about OpenSSL up until now, it tends to not break those big messy apps where a whole lot of independly developed code ends up in the same process.


Peter


Inactive hide details for Nico Williams ---24/11/2015 10:42:57---On Mon, Nov 23, 2015 at 09:53:15PM +1000, Peter Waltenberg wroNico Williams ---24/11/2015 10:42:57---On Mon, Nov 23, 2015 at 09:53:15PM +1000, Peter Waltenberg wrote: >

From: Nico Williams <[hidden email]>
To: [hidden email]
Date: 24/11/2015 10:42
Subject: Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread
Sent by: "openssl-dev" <[hidden email]>





On Mon, Nov 23, 2015 at 09:53:15PM +1000, Peter Waltenberg wrote:

>
> "
> Please do.  It will make this much safer.  Also, you might want to run
> some experiments to find the best stack size on each platform.  The
> smaller the stack you can get away with, the better.
> "
>
> It does, but it also requires code changes in a few places. probable_prime
> () in bn_prime.c being far and away the worst offender. We instrumented our
> test code so we could find out what the stack usage was, for libcrypto you
> can get it under 4k for 32 bit and under 8k for 64 bit code on x86 Linux.

Are you saying that using mmap() would be onerous?  Something else?

> FYI, nothing elegant there, just have your code allocate and fill a large
> stack array then add check points further down to see how far you've eaten
> into it.

Sure.

> "
> > > A guard page
> > > would allow one to safely tune down fiber stack size to the whatever
> > > OpenSSL actually needs for a given use.
> "
>
> Unless someone allocates a stack array larger than the size of the guard
> page and scribbles over another threads stack. This is another reason to
> never use large arrays on the stack.

alloca() and VLAs aren't safe for allocating more bytes than fit in a
guard page.  One should not use alloca()/VLAs for anything larger than
that.

This is no reason not to have a guard page!

This is a reason to have coding standards that address alloca()/VLAs.

> "
> Is there something wrong with that that I should know?  I suppose the
> test could use threads to make real sure that it's getting thread-
> locals, in case the compiler is simply ignoring __thread.  Are there
> compilers that ignore __thread??
> "
>
> Only that it's a compile time choice and OpenSSL is currently 'thread
> neutral' at runtime, not at compile time ?.

OpenSSL is "thread-neutral" at run-time as to locks and thread IDs
because of the lock/threadid callbacks.  But here we're talking about a
new feature (fibers) that uses thread-locals, and here using pthread
thread locals (pthread_getspecific()) clearly means no longer being
"thread-neutral" -- if I understand your definition of that term
anyways.

It's perfectly fine to use __thread in compiled code regardless of what
threading library is used, provided -of course- that __thread was
supported to begin with and that the compiler isn't lying.

> Compile time is easy, making this work at runtime is hard and occasionally
> is really valuable - i.e. way back in the dim distant path when Linux had
> multiple thread packages available.

If the compiler accepts __thread but allows it to break at run-time
depending on the available threading libraries, then the compiler is
broken and should now have allowed __thread to begin with.  I can't find
anything describing such brokenness.  If you assert such brokenness
exists then please post links or instructions for how to reproduce it.

BTW,
https://en.wikipedia.org/wiki/Thread-local_storage#C_and_C.2B.2B
says that even Visual Studio supports thread-locals.  Though there's a
caveat that requires some care at configuration time:

 On Windows versions before Vista and Server 2008, __declspec(thread)
 works in DLLs only when those DLLs are bound to the executable, and
 will not work for those loaded with LoadLibrary() (a protection fault
 or data corruption may occur).[9]

There must, of course, be compilers that don't support thread locals
(pcc?).  Wouldn't it be fair to say that OpenSSL simply doesn't support
fibers on those compilers?  I think so.

Nico
--
_______________________________________________
openssl-dev mailing list
To unsubscribe:
https://mta.openssl.org/mailman/listinfo/openssl-dev





_______________________________________________
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev
Reply | Threaded
Open this post in threaded view
|

Re: [openssl-team] Discussion: design issue: async and -lpthread

Salz, Rich
In reply to this post by Peter Waltenberg

Ø  It does, but it also requires code changes in a few places. probable_prime() in bn_prime.c being far and away the worst offender.

This is fixed in master which uses malloc and free.  Actually, I think all egregious stack consumption has been fixed in master.


_______________________________________________
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev
Reply | Threaded
Open this post in threaded view
|

Re: [openssl-team] Discussion: design issue: async and -lpthread

Viktor Dukhovni
In reply to this post by Peter Waltenberg
On Tue, Nov 24, 2015 at 11:32:32AM +1000, Peter Waltenberg wrote:

> As for fibre's, I doubt it'll work in general, the issue there is simply
> the range of OS's OpenSSL supports. If you wire it in you still have to run
> with man+dog+world in the process, that's a hard ask. One of the good
> points about OpenSSL up until now, it tends to not break those big messy
> apps where a whole lot of independly developed code ends up in the same
> process.

It'll work because it has to be explicitly enabled by the application,
the OpenSSL library will provide the support on sufficiently capable
platforms, but they won't be used unless explicitly requested, and
they're only useful if there's some sort of asynchronous crypto
engine available.  They're light-weight glue to facilitate new
kinds of low-overhead asynchronous operations.  The use of fibres
will be rather rare until such engines become commonplace.

--
        Viktor.
_______________________________________________
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev
Reply | Threaded
Open this post in threaded view
|

Re: [openssl-team] Discussion: design issue: async and -lpthread

Nico Williams
In reply to this post by Peter Waltenberg
On Tue, Nov 24, 2015 at 11:32:32AM +1000, Peter Waltenberg wrote:
> I wasn't saying there was anything wrong with mmap(), just that guard pages
> only work if you can guarantee your overrun hits the guard page (and
> doesn't just step over it). Large stack allocations increase the odds of
> 'stepping over' the guard pages. It's still better than not having guard
> pages, but they aren't a hard guarantee that you won't have mysterious bugs
> still.
>
> You obviously realize that, but bn_prime() is the classic example of
> allocating very large chunks of memory on the stack.

Sure.  Rich Salz claims this is all fixed in master, so guard pages
strike me as a plus, not a wash.

> As for fibre's, I doubt it'll work in general, the issue there is simply
> the range of OS's OpenSSL supports. If you wire it in you still have to run
> with man+dog+world in the process, that's a hard ask. One of the good
> points about OpenSSL up until now, it tends to not break those big messy
> apps where a whole lot of independly developed code ends up in the same
> process.

That's... a joke, right?

OpenSSL very much does "break those big messy apps ...".  I don't see
the fibers project making that worse, nor better -- it's neutral.

Let me give you some examples.

Using Heimdal's libgssapi from Java JGSS (with the JNI GSS wrapper)
blows up due to no one initializing OpenSSL's lock callbacks.  Heimdal,
of course, uses OpenSSL.  Who should be initializing the lock callbacks?
Not the JVM -- why should it know/assume that some library used via
dlopen() will want OpenSSL?

But it's not a library's place to initialize OpenSSL's lock callbacks
either!  Suppose a Java program were loading via dlopen() *two*
libraries that use OpenSSL, and suppose different threads are racing to
do this.

This happens, and it happens because OpenSSL is used in so many
*libraries*, not just *programs*.  OpenSSL is a prime case of how a
library meant for use by programs comes to be used by libraries.
Another example is PKCS#11.  That's no excuse.  Use by libraries must be
supported.

And historically OpenSSL has been very bad at keeping its ABI
backwards-compatible, so DLL Hell cases often involve OpenSSL.

The more layers: the higher the likelihood of breakage involving
OpenSSL.  All threaded pluggable-software-using software (name services
switch, PAM, JNI, ...) is vulnerable to these problems.

If the OpenSSL team finally decides to do something about sane locking
by default, then it will be a huge improvement.  If this thread provides
the impetus, so much the better.

Nico
--
_______________________________________________
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev
123