Cert hot-reloading

Cert hot-reloading

David Arnold
Hi,

If you prefer this mailing list over github issues, I still want to ask for comments on:

Certificate hot-reloading #12753

Specifically, my impression is that this topic has died down a bit, and to my eye no concrete conclusion was drawn in the linked mailing list threads.

I'm not sure how to rank this motion in the context of OpenSSL development, but I guess OpenSSL is used to producing ripple effects, so the man-hour 
argument might be a genuinely valid one.

Please inform my research about this issue with your comments!

BR, David A

Re: Cert hot-reloading

Viktor Dukhovni
On Sun, Aug 30, 2020 at 05:45:41PM -0500, David Arnold wrote:

> If you prefer this mailing list over github issues, I still want to ask
> for comments on:
>
> Certificate hot-reloading #12753
> <https://github.com/openssl/openssl/issues/12753>
>
> Specifically, my impression is that this topic has died down a bit and
> from the linked mailing list threads, in my eye, no concrete conclusion
> was drawn.
>
> I'm not sure how to rank this motion in the context of OpenSSL
> development, but I guess OpenSSL is used to producing ripple effects,
> so the man-hour argument might be a genuinely valid one.
>
> Please inform my research about this issue with your comments!

This is a worthwhile topic.  It has a few interesting aspects:

    1.  Automatic key+cert reloads upon updates of key+cert chain PEM
        files.  This can be tricky when processes start privileged,
        load the certs and then drop privs, and are no longer able
        to reopen the key + cert chain file.

        - Here, for POSIX systems I'd go with an approach where
          it is the containing directory that is restricted to
          root or similar, and the actual cert files are group
          and/or world readable.  The process can then keep
          the directory file descriptor open, and then openat(2)
          to periodically check the cert file, reloading when
          the metadata changes.

        - With non-POSIX systems, or applications that don't
          drop privs, the openat(2) is not needed, and one
          just checks the cert chain periodically.

        - Another option is to use passphrase-protected keys,
          and load the secret passphrase at process start from
          a separate read-protected file, while the actual
          private key + cert chain file is world readable,
          with the access control via protecting the passphrase
          file.

        - In all cases, it is important to keep both the private
          key and the cert in the same file, and open it just
          once to read both, avoiding races in which the key
          and cert are read in a way that results in one or
          the other being stale.

    2.  Having somehow obtained a new key + cert chain, one
        now wants to non-disruptively apply them to running
        servers.  Here there are two potential approaches:

        - Hot plug a new pointer into an existing SSL_CTX structure.
          While the update itself could be made atomic, the readers
          of such pointers might read them more than once to separately
          extract the key and the cert chain, without checking that
          they're using the same pointer for both operations.

          This is bound to be fragile, though not necessarily
          impossible.

        - Build a new SSL_CTX, and use it to accept *new* connections,
          while existing connections use whatever SSL_CTX they started
          with.  I believe this can work well, because "SSL" handles
          increment the reference count of the associated SSL_CTX
          when they're created, and decrement it when destroyed.

          So when you create a replacement SSL_CTX, you can just
          SSL_CTX_free() the old, and it will only actually
          be deleted when the last SSL connection tied to that
          SSL_CTX is destroyed.

          It is true that typical SSL_CTX construction is modestly
          expensive (loading CA stores and the like) but some of
          that could be handled by sharing and reference-counting
          the stores.
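The directory-descriptor technique from item 1 can be sketched in portable POSIX C. This is a minimal illustration, not an OpenSSL API: the `cert_watch` struct and function names are made up for this sketch. The idea is to open the restricted containing directory while still privileged, then poll the world-readable cert file's metadata via fstatat(2) after dropping privileges.

```c
/* Sketch of the directory-fd approach from item 1: hold an O_DIRECTORY
 * descriptor open before dropping privileges, then poll the cert file's
 * metadata with fstatat(2) and reload only when it changes.  Struct and
 * function names here are illustrative, not an OpenSSL API. */
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

struct cert_watch {
    int dirfd;              /* kept open across the privilege drop */
    const char *name;       /* cert file name, relative to dirfd */
    struct stat last;       /* metadata seen at the last (re)load */
};

/* Open the containing directory while still privileged. */
static int cert_watch_init(struct cert_watch *w, const char *dir,
                           const char *name)
{
    w->dirfd = open(dir, O_RDONLY | O_DIRECTORY);
    if (w->dirfd < 0)
        return -1;
    w->name = name;
    memset(&w->last, 0, sizeof(w->last));
    return 0;
}

/* Returns true when inode/size/mtime differ from the last load and
 * records the new metadata; the caller then re-reads key+cert once. */
static bool cert_watch_changed(struct cert_watch *w)
{
    struct stat st;

    if (fstatat(w->dirfd, w->name, &st, 0) != 0)
        return false;       /* transient error: keep the old cert */
    if (st.st_ino != w->last.st_ino ||
        st.st_size != w->last.st_size ||
        st.st_mtime != w->last.st_mtime) {
        w->last = st;
        return true;
    }
    return false;
}
```

A server would call `cert_watch_changed()` from a periodic timer (or on accept) and, when it returns true, read the combined key+cert file once and build a replacement context.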

So my preferred approach would be to create a new SSL_CTX, and get new
connections using that.  Now in a multi-threaded server, it could be a
bit tricky to ensure that the SSL_CTX_free() does not happen before all
threads reading the pointer to the latest SSL_CTX see the new pointer
installed.  Something equivalent to RCU may be needed to ensure that the
free only happens after the new pointer is visible in all threads.
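The swap-and-refcount pattern can be sketched as follows. A stand-in `struct ctx` replaces SSL_CTX so the sketch is self-contained; in real code, SSL_CTX_up_ref() and SSL_CTX_free() (OpenSSL 1.1.0 and later) play the role of `ctx_ref()`/`ctx_unref()`. A mutex around acquire/install closes the window between loading the current pointer and taking a reference; RCU would avoid that lock on the accept path, as noted above.

```c
/* Sketch of "new SSL_CTX for new connections".  Not OpenSSL code: the
 * struct and helpers below stand in for SSL_CTX and its refcounting. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

struct ctx {
    atomic_int refs;
    /* ... keys, cert chain, CA store ... */
};

static pthread_mutex_t cur_lock = PTHREAD_MUTEX_INITIALIZER;
static struct ctx *current;          /* the "latest" context */

static struct ctx *ctx_new(void)
{
    struct ctx *c = malloc(sizeof(*c));
    atomic_init(&c->refs, 1);        /* the "current" slot owns one ref */
    return c;
}

static void ctx_ref(struct ctx *c) { atomic_fetch_add(&c->refs, 1); }

static void ctx_unref(struct ctx *c)
{
    if (atomic_fetch_sub(&c->refs, 1) == 1)
        free(c);                     /* last user gone: safe to delete */
}

/* Accept path: take a reference on whatever context is current.  Each
 * connection holds its ref until it closes, so an old context outlives
 * the swap for as long as its connections do. */
static struct ctx *ctx_acquire(void)
{
    pthread_mutex_lock(&cur_lock);
    struct ctx *c = current;
    if (c != NULL)
        ctx_ref(c);
    pthread_mutex_unlock(&cur_lock);
    return c;
}

/* Reload path: publish the new context, then drop the slot's ref on
 * the old one; it is freed only when its last connection closes. */
static void ctx_install(struct ctx *next)
{
    pthread_mutex_lock(&cur_lock);
    struct ctx *old = current;
    current = next;
    pthread_mutex_unlock(&cur_lock);
    if (old != NULL)
        ctx_unref(old);
}
```

The lock is held only for a pointer read plus an increment, so contention should be negligible next to a TLS handshake.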

Designs addressing various parts of this would be cool, provided they're
well thought out, and not just single-use-case quick hacks.

--
    Viktor.

Re: Cert hot-reloading

Karl Denninger


On 8/30/2020 19:28, Viktor Dukhovni wrote:
> [...]
>
> So my preferred approach would be to create a new SSL_CTX, and get new
> connections using that.  Now in a multi-threaded server, it could be a
> bit tricky to ensure that the SSL_CTX_free() does not happen before all
> threads reading the pointer to the latest SSL_CTX see the new pointer
> installed.

This works now; I use it with an application that checks in with a license server and can grab a new cert.  OpenSSL appears to have no problem with setting up a new SSL_CTX and using it for new connections; the old ones continue onward until they terminate, and new ones are fine as well.

This appears to be ok with the current code; I've yet to have it blow up in my face, although at present the certs in question are reasonably long-lived.  Whether it's robust enough to handle very short-lived certificates I do not know.

--
Karl Denninger
[hidden email]
The Market Ticker
[S/MIME encrypted email preferred]


Re: Cert hot-reloading

JordanBrown
In reply to this post by David Arnold
Well, I can restate the problem that I encountered.

We deliver an integrated storage system.  Under the covers it is a modified Solaris running a usual collection of proprietary and open-source components.  We supply an administrative user interface that, among many other things, lets you manage a list of "trusted" certificates - typically CA certificates that a program would use to authenticate its peers.  That is, it's the equivalent of Firefox's Tools / Options / Privacy & Security / Certificates / View Certificates, and the "Servers" and "Authorities" tabs there, with the additional tidbit that for each certificate you can control which services (e.g. LDAP, et cetera) that certificate is trusted for.

When an administrator makes a change to the trusted-certificates list, we want that change to take effect, system-wide.

The problem is that that means that some number of processes with active OpenSSL contexts need to drop those contexts and recreate them, and we don't know which processes those are.  Client operations are typically driven through a library, not a separate daemon, and so there's no centralized way to know which processes might be TLS clients.  In addition, there's the question of how to *tell* the process to recreate the context.  Simply restarting them may involve disruption of various sorts.

What we'd like would be for OpenSSL to, on every authentication, stat the file or directory involved, and if it's changed then wipe the in-memory cache.

Yes, aspects of this are system-specific, but that's true of many things.  There could easily be an internal API that captures a current-state object, and another that answers "is this still the same".  The default implementation could always say "yes".
-- 
Jordan Brown, Oracle ZFS Storage Appliance, Oracle Solaris

Re: Cert hot-reloading

Kyle Hamilton
In reply to this post by Viktor Dukhovni
I'm not sure I can follow the "in all cases it's important to keep the key and cert in the same file" argument, particularly in line with openat() usage on the cert file after privilege to open the key file has been dropped.  I agree that key/cert staleness is important to address in some manner, but I don't think it's necessarily appropriate here.

I also don't think it's necessarily okay to add a new requirement that e.g. letsencrypt clients reconcatenate their keys and certs, and that all of the Apache-style configuration guides be rewritten to consolidate the key and cert files. On a simple certificate renewal without a rekey, the best current practice is sufficient.  (As well, a letsencrypt client would possibly need to run privileged in that scenario to reread the private key file in order to reconcatenate it, which is not currently actually necessary.  Increasing the privileges required for any non-OS service for any purpose that isn't related to OS kernel privilege requirements feels a bit disingenuous.)

Of course, if you want to alter the conditions which led to the best current practice (and impose retraining on everyone), that's a different matter.  But I still think increasing privilege requirements would be a bad thing, under the least-privilege principle.

-Kyle H

On Sun, Aug 30, 2020, 18:36 Viktor Dukhovni <[hidden email]> wrote:
> [...]
>
>         - In all cases, it is important to keep both the private
>           key and the cert in the same file, and open it just
>           once to read both, avoiding races in which the key
>           and cert are read in a way that results in one or
>           the other being stale.
>
> [...]

Re: Cert hot-reloading

David Arnold
Should aspects of an implementation be configurable behavior with a sane default? I'd guess so...

Hot-plugging the pointer seems to force atomicity considerations downstream, which might be
educationally a good thing for OpenSSL to press for. It also addresses Jordan's use case, however
application-specific it might be. For compat reasons, a "legacy" mode which creates a new context
for *new* connections might be the necessary "bridge" into that transformation.

For change detection: I think "on next authentication" has enough (or even better) guarantees compared to a periodic loop.

For file read atomicity: what are the options to keep letsencrypt & co. comfortable? The inherited
"right (expectation) to comfort" is somewhat offset by a huge gain in functionality, so it still feels like a convincing deal.

- add a staleness check on every change detection? (maybe costly?)
- consume a tar if clients want those guarantees? (opt-in or opt-out?)





Re: Cert hot-reloading

Kyle Hamilton
In reply to this post by JordanBrown
Could this be dealt with by the simple removal of any caching layer
between an SSL_CTX and a directory processed by openssl c_rehash?
Would reading the filesystem on every certificate verification be too
heavy for your use case?

On Sun, Aug 30, 2020 at 7:20 PM Jordan Brown
<[hidden email]> wrote:

>
> [...] What we'd like would be for OpenSSL to, on every authentication,
> stat the file or directory involved, and if it's changed then wipe the
> in-memory cache.

Re: Cert hot-reloading

Karl Denninger
In reply to this post by JordanBrown
On 8/30/2020 20:19, Jordan Brown wrote:
> [...] When an administrator makes a change to the trusted-certificates
> list, we want that change to take effect, system-wide. [...] What we'd
> like would be for OpenSSL to, on every authentication, stat the file or
> directory involved, and if it's changed then wipe the in-memory cache.

I'm trying to figure out why you want to replace the context in an *existing* connection that is currently passing data rather than for new ones.

For new ones, as I've noted, it already works as you'd likely expect it to work, at least in my use case, including in multiple threads where the context is picked up and used for connections in more than one place.  I've had no trouble with this and a perusal of the documentation (but not the code in depth) suggested it would be safe due to how OpenSSL does reference counts.

While some of the client connections to the back end in my use case are "controlled" (an app on a phone, for example, so I could have control over what happens on the other end and could, for example, send down a sequence demanding the client close and reconnect) there is also a general web-style interface so the connecting party could be on any of the commodity web browsers, over which I have no code control.

Example meta-code:

get_lock(mutex)

if (web_context) {    /* If there is an existing context then free it up */
    SSL_CTX_free(web_context);
    web_context = NULL;    /* It is not ok to attempt to use SSL */
}

web_context = SSL_CTX_new(server_method);    /* Now get a new context */
.... (set options, callbacks, verification requirements on presented certs, DH and ECDH preferences, cert and key, etc.)
if NOT (happy with the previous sequence of options, setting key and cert, etc.) {
    SSL_CTX_free(web_context);
    web_context = NULL;
}

unlock(mutex)

Then in the code that actually accepts a new connection:

get_lock(mutex)

if (web_context) {
    ssl_socket = starttls(inbound_socket, web_context, &error);
    .... check non-null to know it's ok; if it is, store and use it
}

unlock(mutex)

("starttls" does an SSL_new on the context passed, does the SSL_set_fd and SSL_accept, etc., handles any errors generated from that, and if everything is ok returns the SSL structure)

I've had no trouble with this for a good long time; if there are existing connections they continue to run on the previous web_context until they close.  New connections come off the new one.  You just have to hold a mutex to make sure that you don't try to create a new connection while the "re-keying" is in process.

--
Karl Denninger
[hidden email]
The Market Ticker
[S/MIME encrypted email preferred]


Re: Cert hot-reloading

JordanBrown
In reply to this post by David Arnold
On 8/30/2020 7:24 PM, David Arnold wrote:
> Hot-plugging the pointer seems to force atomicity considerations downstream, which might be
> educationally a good thing for OpenSSL to press for. It also addresses Jordan's use case, however
> application-specific it might be. For compat reasons, a "legacy" mode which creates a new context
> for *new* connections might be the necessary "bridge" into that transformation.

I haven't particularly thought about the implementation; that seemed like it would Just Work.  There might need to be reference counts on the structures involved so that they can be safely "freed" while they are in active use by another thread.  Simply swapping out a pointer isn't going to be enough, because you can't know whether another thread already picked up a copy of that pointer, and so you can't know when you can free the old structure.  As I think about it more, there might be a challenge fitting such a mechanism into the existing functions.

-- 
Jordan Brown, Oracle ZFS Storage Appliance, Oracle Solaris

Re: Cert hot-reloading

JordanBrown
In reply to this post by Kyle Hamilton
On 8/30/2020 10:26 PM, Kyle Hamilton wrote:
> Could this be dealt with by the simple removal of any caching layer
> between an SSL_CTX and a directory processed by openssl c_rehash?
> Would reading the filesystem on every certificate verification be too
> heavy for your use case?

That might well be sufficient.  Rereading the file would probably be low-cost compared to the network connection.

--
Jordan Brown, Oracle ZFS Storage Appliance, Oracle Solaris

Re: Cert hot-reloading

JordanBrown
In reply to this post by Karl Denninger
On 8/31/2020 6:29 AM, Karl Denninger wrote:

I'm trying to figure out why you want to replace the context in an *existing* connection that is currently passing data rather than for new ones.


No, not for existing connections, just for new ones using the same context.

Note that I'm interested in the client case, not the server case - in the list of trusted certificates set up with SSL_CTX_load_verify_locations().  (Though the same issues, and maybe more, would apply to a server that is verifying client certificates.)

The hypothetical application does something like:

ctx = set_up_ctx();
forever {
    ...
    connection = new_connection(ctx);
    ...
    close_connection(connection)
    ...
}

The application could certainly create the context before making each connection, but probably doesn't - after all, the whole idea of contexts is to make one and then use it over and over again.

It's been a very long time since I last really looked at this[*], but I believe that I experimentally verified that simply deleting a certificate from the file system was not enough to make future connections refuse that certificate.  *Adding* a certificate to the directory works, because there's no negative caching, but *removing* one doesn't work.
[*] Which tells you that although my purist sense says that it would be nice to have and would improve correctness, customers aren't lined up waiting for it.
-- 
Jordan Brown, Oracle ZFS Storage Appliance, Oracle Solaris

Re: Cert hot-reloading

Viktor Dukhovni
In reply to this post by Kyle Hamilton
On Sun, Aug 30, 2020 at 07:54:34PM -0500, Kyle Hamilton wrote:

> I'm not sure I can follow the "in all cases it's important to keep the key
> and cert in the same file" argument, particularly in line with openat()
> usage on the cert file after privilege to open the key file has been
> dropped.  I agree that key/cert staleness is important to address in some
> manner, but I don't think it's necessarily appropriate here.

Well, the OP had in mind very frequent certificate chain rollover, where
presumably, in at least some deployments also the key would roll over
frequently along with the cert.

If the form of the key/cert rollover is to place new keys and certs into
files, then *atomicity* of these updates becomes important, so that
applications loading a new key+chain pair see a matching key and
certificate and not some cert unrelated to the key.

Thus, e.g., Postfix now supports loading both the key and the cert
directly from the same open file, reading both sequentially, without
racing against atomic file replacements by reopening the file
separately to read keys and certs.

If we're going to automate things more, and exercise them with much
higher frequency, the automation needs to be robust!

Note that nothing prevents applications that have separate configuration
for the key and cert locations from opening the same file twice.  If
they're using the normal OpenSSL PEM read key/cert routines, the key
is ignored when reading certs and the certs are ignored when reading
the key.

Therefore, the single-file model is unconditionally superior in this
context.  Yes, some tools (e.g. certbot), don't yet do the right
thing and atomically update a single file with both the key and the
obtained certs.  This problem can be solved.  We're talking about
new capabilities here, and don't need to adhere to outdated process
models.

--
    Viktor.

Re: Cert hot-reloading

OpenSSL - User mailing list
On 2020-09-01 01:52, Viktor Dukhovni wrote:

> On Sun, Aug 30, 2020 at 07:54:34PM -0500, Kyle Hamilton wrote:
>
>> I'm not sure I can follow the "in all cases it's important to keep
>> the key
>> and cert in the same file" argument, particularly in line with openat()
>> usage on the cert file after privilege to open the key file has been
>> dropped. I agree that key/cert staleness is important to address in some
>> manner, but I don't think it's necessarily appropriate here.
> Well, the OP had in mind very frequent certificate chain rollover, where
> presumably, in at least some deployments also the key would roll over
> frequently along with the cert.
>
> If the form of the key/cert rollover is to place new keys and certs into
> files, then *atomicity* of these updates becomes important, so that
> applications loading a new key+chain pair see a matching key and
> certificate and not some cert unrelated to the key.
>
> This, e.g., Postfix now supports loading both the key and the cert
> directly from the same open file, reading both sequentially, without
> racing atomic file replacements when reopening the file separately
> to reach keys and certs.
>
> If we're going to automate things more, and exercise them with much
> higher frequency, the automation needs to be robust!
Another synchronization method would be for the application to decree a
specific order of changing the two files, such that triggering reload on
the second file would correctly load the matching contents of the other.

If a future OpenSSL version includes an option to detect such change,
documentation as to which file it watches for changes would guide
applications in choosing which order to specify for changing the files.


> Note that nothing prevents applications that have separate configuration
> for the key and cert locations from opening the same file twice. If
> they're using the normal OpenSSL PEM read key/cert routines, the key
> is ignored when reading certs and the certs are ignored when reading
> the key.
>
> Therefore, the single-file model is unconditionally superior in this
> context. Yes, some tools (e.g. certbot), don't yet do the right
> thing and atomically update a single file with both the key and the
> obtained certs. This problem can be solved. We're talking about
> new capabilities here, and don't need to adhere to outdated process
> models.
>
Given the practical impossibility of managing atomic changes to a single
POSIX file of variable-length data, it will often be more practical to
create a complete replacement file, then replace the filename with the
"mv -f" command or rename(3) function.  This would obviously only work
if the directory remains accessible to the application, after it drops
privileges and/or enters a chroot jail, as will already be the case
for hashed certificate/crl directories.



Enjoy

Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S.  https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark.  Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded


Re: Cert hot-reloading

Viktor Dukhovni
> On Aug 31, 2020, at 10:57 PM, Jakob Bohm via openssl-users <[hidden email]> wrote:
>
> Given the practical impossibility of managing atomic changes to a single
> POSIX file of variable-length data, it will often be more practical to
> create a complete replacement file, then replace the filename with the
> "mv -f" command or rename(3) function.  This would obviously only work
> if the directory remains accessible to the application, after it drops
> privileges and/or enters a chroot jail, as will already be the case
> for hashed certificate/crl directories.

There is no such "impossibility", indeed that's what the rename(2) system
call is for.  It atomically replaces files.  Note that mv(1) can hide
non-atomic copies across file-system boundaries and should be used with
care.

And this is why I mentioned retaining an open directory handle, openat(2),
...

There's room here to design a robust process, if one is willing to impose
reasonable constraints on the external agents that orchestrate new cert
chains.

As for updating two files in a particular order, and reacting only to
changes in the one that's updated second, this behaves poorly when
updates are racing an application cold start.  The single file approach,
by being more restrictive, is in fact more robust in ways that are not
easy to emulate with multiple files.

If someone implements a robust design with multiple files, great.  I for
one don't know of an in principle decent way to do that without various
races, other than somewhat kludgey retry loops in the application (or
library) when it finds a mismatch between the cert and the key.

--
        Viktor.


Re: Cert hot-reloading

David Arnold
1. Construe symlinks to current certs in a folder (old or new / file by file)
2. Symlink that folder
3. Rename the current symlink to that new symlink atomically.

On the OpenSSL side, stat(2) would have to follow the symlinks - if it doesn't do so already.

This is more or less how Kubernetes atomically provisions ConfigMaps and Secrets to pods.

So there is precedent for applications to follow this pattern.

I totally agree that those constraints should be put on applications in order to have the freedom to focus on a sound design.

If OpenSSL really wanted to make it easy, it would provide an independent helper that performs exactly this operation on behalf of non-complying applications.

Does it look like we are actually getting somewhere here?

I'd still like to better understand why atomic pointer swaps can be difficult and how this can be mitigated. I sense a bolder move toward sounder certificate consumption is possible there too (with potential upsides further down). Do I sense right?




Re: Cert hot-reloading

Viktor Dukhovni
On Mon, Aug 31, 2020 at 11:00:31PM -0500, David Arnold wrote:

> 1. Construe symlinks to current certs in a folder (old or new / file by file)
> 2. Symlink that folder
> 3. Rename the current symlink to that new symlink atomically.

This is fine, but does not provide atomicity of access across files in
that directory.  It just lets you prepare the new directory with
non-atomic operations on the list of published files or file content.

But if clients need to see consistent content across files, this does
not solve the problem, a client might read one file before the symlink
is updated and another file after.  To get actual atomicity, the client
would need to be sure to open a directory file descriptor, and then
openat(2) to read each file relative to the directory in question.

Most application code is not written that way, but conceivably OpenSSL
could have an interface for loading a key and certchain from two (or
perhaps even more for the cert chain) files relative to a given
directory.  I know how to do this on modern Unix systems, no idea
whether something similar is possible on Windows.

The above is *complicated*.  Requiring a single file for both key and
cert is far simpler.  Either PEM with key + cert or perhaps (under
duress) even PKCS#12.


> Does it look like we are actually getting somewhere here?

So far, not much, just some rough notes on the obvious obstacles.
There's a lot more to do to design a usable framework for always fresh
keys.  Keeping it portable between Windows and Unix (assuming MacOS will
be sufficiently Unix-like) and gracefully handling processes that drop
privs will be challenging.

Not all applications will want the same approach, so there'd need to be
various knobs to set to choose one of the supported modes.  Perhaps
the sanest approach (but one that does nothing for legacy applications)
is to provide an API that returns the *latest* SSL_CTX via some new
handle that under the covers constructs a new SSL_CTX as needed.

    SSL_CTX *SSL_Factory_get1_CTX(SSL_CTX_FACTORY *);

This would yield a reference-counted SSL_CTX that each caller must
ultimately release via SSL_CTX_free() to avoid a leak.

    ... factory construction API calls ...
    ctx = SSL_Factory_get1_CTX(factory);    -- ctx ref count >= 1
    SSL *ssl = SSL_CTX_new(ctx);            -- ctx ref count >= 2
    ...
    SSL_free(ssl);                          -- ctx ref count >= 1
    SSL_CTX_free(ctx);                      -- ctx may be freed here

To address the needs of legacy clients is harder, because they
expect an SSL_CTX "in hand" to be valid indefinitely, but now
we want to be able to age out and free old contexts, so we want
some mechanism by which it becomes safe to free old contexts
that we're sure no thread is still using.  This is difficult
to do right, because some thread may be blocked for a long
time, before becoming active again and using an already known
SSL_CTX pointer.

It is not exactly clear how multi-threaded unmodified legacy software
can be ensured crash free without memory leaks while behind the scenes
we're constantly mutating the SSL_CTX.  Once a pointer to an SSL_CTX
has been read, it might be squirreled away in all kinds of places, and
there's just no way to know that it won't be used indefinitely.

--
    Viktor.

Re: Cert hot-reloading

David Arnold
An SSL_CTX API seems like a good idea to provide additional guarantees to applications.

Maybe OpenSSL, used as a library, could return an error to legacy applications - certificate "deemed not valid any more" - whenever they try to use an outdated pointer?

This ought to be a transparent scenario for a legacy application which *at the same time* also does frequent cert rolling.

Would it be appropriate to record some excerpts of this discussion in a GitHub gist? I can be the secretary, if that would be uncontroversial.


Re: Cert hot-reloading

Viktor Dukhovni
On Tue, Sep 01, 2020 at 12:22:30AM -0500, David Arnold wrote:

> A SSL_CTX api seem like a good idea to provide additional guarantees to
> applications.
>
> Maybe Openssl - used as a library - can return to the other legacy
> applications that the certificate is "deemed not valid any more" whenever
> they try to use an outdated pointer?
>
> This ought to be a transparent scenario for a legacy application which *at
> the same time* also do frequent cert rolling.
>
> Would it be appropriate to record some excerpts of this discussion in
> github gist? I can be the secretary, if that would be uncontroversial.
>

By all means, some (who don't follow the list, and in any case prefer
a long-term record of this sort of issue) would rather appreciate
you doing that.

--
    Viktor.

Re: Cert hot-reloading

OpenSSL - User mailing list
In reply to this post by Viktor Dukhovni
On 2020-09-01 04:26, Viktor Dukhovni wrote:

>> On Aug 31, 2020, at 10:57 PM, Jakob Bohm via openssl-users <[hidden email]> wrote:
>>
>> Given the practical impossibility of managing atomic changes to a single
>> POSIX file of variable-length data, it will often be more practical to
>> create a complete replacement file, then replace the filename with the
>> "mv -f" command or rename(3) function.  This would obviously only work
>> if the directory remains accessible to the application, after it drops
>> privileges and/or enters a chroot jail, as will already be the case
>> for hashed certificate/crl directories.
> There is no such "impossibility", indeed that's what the rename(2) system
> call is for.  It atomically replaces files.  Note that mv(1) can hide
> non-atomic copies across file-system boundaries and should be used with
> care.
Note that rename(2) and link(2) replace the file name, by making the
replaced name point to a new inode, thus it would not work with calls
that monitor an inode for content or status change.

There is no basic series of I/O calls that completely replaces file
contents in one step; in particular, write(2) doesn't shorten the file
if the new contents are smaller than the old contents.

> And this is why I mentioned retaining an open directory handle, openat(2),
> ...
>
> There's room here to design a robust process, if one is willing to impose
> reasonable constraints on the external agents that orchestrate new cert
> chains.
>
> As for updating two files in a particular order, and reacting only to
> changes in the one that's updated second, this behaves poorly when
> updates are racing an application cold start.  The single file approach,
> by being more restrictive, is in fact more robust in ways that are not
> easy to emulate with multiple files.
What exactly is that "cold start" race you are talking about?

Obviously, code that reacts badly when only one of the two files is
present would not work under a rule that one of them must
arrive/change after the other.

>
> If someone implements a robust design with multiple files, great.  I for
> one don't know of an in principle decent way to do that without various
> races, other than somewhat kludgey retry loops in the application (or
> library) when it finds a mismatch between the cert and the key.
>


Enjoy

Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S.  https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark.  Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded


Re: Cert hot-reloading

OpenSSL - User mailing list
In reply to this post by Viktor Dukhovni
On 2020-09-01 06:57, Viktor Dukhovni wrote:

> On Mon, Aug 31, 2020 at 11:00:31PM -0500, David Arnold wrote:
>
>> 1. Construe symlinks to current certs in a folder (old or new / file by file)
>> 2. Symlink that folder
>> 3. Rename the current symlink to that new symlink atomically.
> This is fine, but does not provide atomicity of access across files in
> that directory.  It just lets you prepare the new directory with
> non-atomic operations on the list of published files or file content.
>
> But if clients need to see consistent content across files, this does
> not solve the problem, a client might read one file before the symlink
> is updated and another file after.  To get actual atomicity, the client
> would need to be sure to open a directory file descriptor, and then
> openat(2) to read each file relative to the directory in question.
>
> Most application code is not written that way, but conceivably OpenSSL
> could have an interface for loading a key and certchain from two (or
> perhaps even more for the cert chain) files relative to a given
> directory.  I know how to do this on modern Unix systems, no idea
> whether something similar is possible on Windows.
On NT-based Windows, the undocumented Zw family of file I/O syscalls
would do what you call "openat()"; "current dir" is in fact a directory
handle plus a string equivalent stored in a user-mode variable in one
of the core shared objects, which is why rmdir fails if it is the
current directory of any process.

> The above is *complicated*.  Requiring a single file for both key and
> cert is far simpler.  Either PEM with key + cert or perhaps (under
> duress) even PKCS#12.
>
>
>> Does it look like we are actually getting somewhere here?
> So far, not much, just some rough notes on the obvious obstacles.
> There's a lot more to do to design a usable framework for always fresh
> keys.  Keeping it portable between Windows and Unix (assuming MacOS will
> be sufficiently Unix-like) and gracefully handling processes that drop
> privs will be challenging.
>
> Not all applications will want the same approach, so there'd need to be
> various knobs to set to choose one of the supported modes.  Perhaps
> the sanest approach (but one that does nothing for legacy applications)
> is to provide an API that returns the *latest* SSL_CTX via some new
> handle that under the covers constructs a new SSL_CTX as needed.
>
>      SSL_CTX *SSL_Factory_get1_CTX(SSL_CTX_FACTORY *);
>
> This would yield a reference-counted SSL_CTX that each caller must
> ultimately release via SSL_CTX_free() to avoid a leak.
>
>      ... factory construction API calls ...
>      ctx = SSL_Factory_get1_CTX(factory);    -- ctx ref count >= 1
>      SSL *ssl = SSL_CTX_new(ctx);            -- ctx ref count >= 2
>      ...
>      SSL_free(ssl);                          -- ctx ref count >= 1
>      SSL_CTX_free(ctx);                      -- ctx may be freed here
>
> To address the needs of legacy clients is harder, because they
> expect an SSL_CTX "in hand" to be valid indefinitely, but now
> we want to be able to age out and free old contexts, so we want
> some mechanism by which it becomes safe to free old contexts
> that we're sure no thread is still using.  This is difficult
> to do right, because some thread may be blocked for a long
> time, before becoming active again and using an already known
> SSL_CTX pointer.
>
> It is not exactly clear how multi-threaded unmodified legacy software
> can be ensured crash free without memory leaks while behind the scenes
> we're constantly mutating the SSL_CTX.  Once a pointer to an SSL_CTX
> has been read, it might be squirreled away in all kinds of places, and
> there's just no way to know that it won't be used indefinitely.
>


Enjoy

Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S.  https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark.  Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded
