resumption problem

17 messages

resumption problem

Jeremy Harris
OpenSSL 1.1.1  on Centos 8
Ticket-based resumption


I'm getting a repeatable error from a client call to SSL_connect()
of "14228044:SSL routines:construct_ca_names:internal error".
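The packed number in that error string can be decoded by hand. Below is a minimal sketch. It assumes the OpenSSL 1.1.x packing of 8 bits of library, 12 bits of function and 12 bits of reason; OpenSSL 3.0 packs codes differently. The struct and function names are hypothetical and are not part of the OpenSSL API:

```c
#include <assert.h>

/* Hypothetical container for the three fields of a 1.1.x error code. */
struct err_parts {
    unsigned long lib;    /* library, e.g. 0x14 (20) = ERR_LIB_SSL */
    unsigned long func;   /* function code */
    unsigned long reason; /* reason code */
};

/* Split a packed code, assuming the 1.1.x layout lib<<24 | func<<12 | reason. */
static struct err_parts decode_err_111(unsigned long e)
{
    struct err_parts p;

    p.lib = (e >> 24) & 0xFFUL;
    p.func = (e >> 12) & 0xFFFUL;
    p.reason = e & 0xFFFUL;
    return p;
}
```

Applied to 0x14228044, the helper gives library 0x14 (ERR_LIB_SSL) and function 0x228, which is the construct_ca_names entry that the string already names. The reason is 0x44, i.e. 68, which is ERR_R_INTERNAL_ERROR (4 | ERR_R_FATAL).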

Packet capture shows an Alert being sent by the client before
anything is received from the server.

The error only occurs when the client is trying to resume
a previous session, and (here's the odd part) only when
the client is set up to offer a client certificate.

[I can change the client config to stop it offering this
client-cert, and the resumption works just fine]


I *think* possibly also the precise nature of that client cert
matters; a testcase I set up away from my production
system failed to induce the error.  The client cert
is loaded using SSL_CTX_use_certificate_chain_file();
the file contains a private-key and a 3-element chain
with a Let's Encrypt cert (leaf, signer, CA-root).
The CA is sha1/rsa, the other two are sha256/rsa.


The preceding TLS session is logged as using
"TLS1.2:ECDHE-RSA-AES256-GCM-SHA384:256"



Any ideas?
--
Thanks,
  Jeremy

Re: resumption problem

OpenSSL - User mailing list
On Mon, Mar 23, 2020 at 11:46:43PM +0000, Jeremy Harris wrote:

> OpenSSL 1.1.1  on Centos 8
> Ticket-based resumption
>
>
> I'm getting a repeatable error from a client call to SSL_connect()
> of "14228044:SSL routines:construct_ca_names:internal error".
>
> Packet capture shows an Alert being sent by the client before
> anything is received from the server.
>
> The error only occurs when the client is trying to resume
> a previous session, and (here's the odd part) only when
> the client is set up to offer a client certificate.
>
> [I can change the client config to stop it offering this
> client-cert, and the resumption works just fine]
>
>
> I *think* possibly also the precise nature of that client cert
> matters; a testcase I set up away from my production
> system failed to induce the error.  The client cert
> is loaded using SSL_CTX_use_certificate_chain_file();
> the file contains a private-key and a 3-element chain
> with a Let's Encrypt cert (leaf, signer, CA-root).
> The CA is sha1/rsa, the other two are sha256/rsa.

Try omitting the (sha1) CA from the file?

-Ben

Re: resumption problem

Viktor Dukhovni
In reply to this post by Jeremy Harris
On Mon, Mar 23, 2020 at 11:46:43PM +0000, Jeremy Harris wrote:

> OpenSSL 1.1.1  on Centos 8
> Ticket-based resumption

I'm testing posttls-finger with OpenSSL 1.1.1 on FreeBSD.

>
> I'm getting a repeatable error from a client call to SSL_connect()
> of "14228044:SSL routines:construct_ca_names:internal error".

Either issues allocating space, or an explicit NULL element
on the client CA list, or a DN that for some reason can't
be serialized.  Is there an issuer with an unusual subject DN?

> The error only occurs when the client is trying to resume
> a previous session, and (here's the odd part) only when
> the client is set up to offer a client certificate.

My test with "posttls-finger" does not exhibit this issue:

    Verified TLS connection established to [...] TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256
    Reconnecting after 2 seconds
    looking for session [...] in memory cache
    reloaded session [...] from memory cache
    [...] Reusing old session
    Verified TLS connection established to [...] TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)

> Any ideas?

The entire function body is below.  There are three places where that
error could be raised:

    int construct_ca_names(SSL *s, const STACK_OF(X509_NAME) *ca_sk, WPACKET *pkt)
    {
        /* Start sub-packet for client CA list */
        if (!WPACKET_start_sub_packet_u16(pkt)) {
-->         SSLfatal(s, SSL_AD_INTERNAL_ERROR, SSL_F_CONSTRUCT_CA_NAMES,
                     ERR_R_INTERNAL_ERROR);
            return 0;
        }

        if (ca_sk != NULL) {
            int i;

            for (i = 0; i < sk_X509_NAME_num(ca_sk); i++) {
                unsigned char *namebytes;
                X509_NAME *name = sk_X509_NAME_value(ca_sk, i);
                int namelen;

                if (name == NULL
                        || (namelen = i2d_X509_NAME(name, NULL)) < 0
                        || !WPACKET_sub_allocate_bytes_u16(pkt, namelen,
                                                           &namebytes)
                        || i2d_X509_NAME(name, &namebytes) != namelen) {
-->                 SSLfatal(s, SSL_AD_INTERNAL_ERROR, SSL_F_CONSTRUCT_CA_NAMES,
                             ERR_R_INTERNAL_ERROR);
                    return 0;
                }
            }
        }

        if (!WPACKET_close(pkt)) {
-->         SSLfatal(s, SSL_AD_INTERNAL_ERROR, SSL_F_CONSTRUCT_CA_NAMES,
                     ERR_R_INTERNAL_ERROR);
            return 0;
        }

        return 1;
    }

I'm guessing it is not the first.  The second would be an issue with a
particular issuer on the CA list (does Exim configure a list of CAs to
send to the server?), or the list of CAs is too long to fit in 2^16
bytes, which might fail on the packet close if not while extending
the subpacket.
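When a TLS 1.3 client sends this list, it travels in the certificate_authorities extension (RFC 8446, section 4.2.4, extension type 47). There, the list, each name and the extension itself all carry 16-bit lengths. The standalone sketch below reproduces only that framing, to make the 2^16 arithmetic concrete. It is not OpenSSL's WPACKET code, and the function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write a 16-bit big-endian length or type field. */
static void put_u16(uint8_t *p, size_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xFF);
}

/*
 * Hypothetical encoder for the TLS 1.3 certificate_authorities extension:
 *   uint16 extension_type (47)
 *   uint16 extension_data length
 *     uint16 authorities length
 *       per name: uint16 length, then the DER-encoded DistinguishedName
 * Returns the number of bytes written to out, or 0 if the list is empty,
 * a length field would overflow 16 bits, or out is too small.
 */
static size_t encode_cert_authorities(const uint8_t *const *dns,
                                      const size_t *dn_lens, size_t n,
                                      uint8_t *out, size_t cap)
{
    size_t body = 0;            /* bytes inside the authorities vector */
    uint8_t *p;

    if (n == 0)
        return 0;               /* RFC 8446 requires at least one name */
    for (size_t i = 0; i < n; i++) {
        if (dn_lens[i] == 0 || dn_lens[i] > 0xFFFF)
            return 0;           /* each name has its own u16 length */
        body += 2 + dn_lens[i];
        if (body > 0xFFFF - 2)
            return 0;           /* extension_data = 2 + body must fit a u16 */
    }
    if (cap < 6 + body)
        return 0;

    p = out;
    put_u16(p, 47);             /* certificate_authorities */
    put_u16(p + 2, 2 + body);   /* extension_data length */
    put_u16(p + 4, body);       /* authorities length */
    p += 6;
    for (size_t i = 0; i < n; i++) {
        put_u16(p, dn_lens[i]);
        memcpy(p + 2, dns[i], dn_lens[i]);
        p += 2 + dn_lens[i];
    }
    return (size_t)(p - out);
}
```

For example, with names of 172 bytes, each entry costs 174 bytes. At most 376 entries fit, since 376 * 174 = 65424, while 377 * 174 = 65598 exceeds 65533. In an actual ClientHello, all extensions share one 16-bit extensions block, so the practical ceiling is lower still.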

It'd be useful to single-step through that function with gdb and
see what happened.  How long is the CA list?  Which condition caused
the error?

--
    Viktor.

Re: resumption problem

Viktor Dukhovni
In reply to this post by OpenSSL - User mailing list
On Mon, Mar 23, 2020 at 05:27:55PM -0700, Benjamin Kaduk via openssl-users wrote:

> > I *think* possibly also the precise nature of that client cert
> > matters; a testcase I set up away from my production
> > system failed to induce the error.  The client cert
> > is loaded using SSL_CTX_use_certificate_chain_file();
> > the file contains a private-key and a 3-element chain
> > with a Let's Encrypt cert (leaf, signer, CA-root).
> > The CA is sha1/rsa, the other two are sha256/rsa.
>
> Try omitting the (sha1) CA from the file?

That's not plausibly related to a failure to construct
the list of CA distinguished names.  The signatures
are not looked at by the function reporting the error.

--
    Viktor.

Re: resumption problem

hamedsalini

On Tuesday, 24 March 2020, 5:20, Viktor Dukhovni <[hidden email]> wrote:

Re: resumption problem

Jeremy Harris
In reply to this post by Viktor Dukhovni
On 24/03/2020 20:25, Viktor Dukhovni wrote:
>>> I'm guessing it is not the first.  The second would be an issue with a
>>> particular issuer on the CA list (does Exim configure a list of CAs to
>>> send to the server?),
>>
>> I don't think so

Looks like I'm wrong, from the behaviour.

It's the second of the possible places, and "i" is 129.
It appears to be failing the WPACKET_sub_allocate_bytes_u16()
call.  %rsi before the call, which I think should be
the "namelen" arg, has value 172.
--
Cheers,
  Jeremy

Re: resumption problem

Viktor Dukhovni
On Thu, Mar 26, 2020 at 12:40:08AM +0000, Jeremy Harris wrote:

> Looks like I'm wrong, from the behaviour.
>
> It's the second of the possible places, and "i" is 129.
> It appears to be failing the WPACKET_sub_allocate_bytes_u16()
> call.  %rsi before the call, which I think should be
> the "namelen" arg, has value 172.

Right, you're running out of space by trying to send too many
CA names.  It is better to have this fail, so you can figure out
what is trying to dump your entire trusted CA list (of names)
to the server, than to actually have that silently "work".

Now you need to find out why that's happening.  Perhaps your
"openssl.cnf" (Linux distro mistake?) causes the damage
for all applications even without explicit code to that
end in Exim?  Or you're calling something to make it happen.

--
    Viktor.

Re: resumption problem

Jeremy Harris
On 26/03/2020 00:58, Viktor Dukhovni wrote:

> On Thu, Mar 26, 2020 at 12:40:08AM +0000, Jeremy Harris wrote:
>
>> Looks like I'm wrong, from the behaviour.
>>
>> It's the second of the possible places, and "i" is 129.
>> It appears to be failing the WPACKET_sub_allocate_bytes_u16()
>> call.  %rsi before the call, which I think should be
>> the "namelen" arg, has value 172.
>
> Right, you're running out of space by trying to send too many
> CA names.  It is better to have this fail, so you can figure
> what is trying to dump your entire trusted CA list (of names)
> to the server, than to actually have that silently "work".

It's /etc/pki/tls/certs/ca-bundle.crt (on Centos 8) and
it's being loaded using SSL_CTX_set_client_CA_list()
into the client, supposedly to be the acceptable CAs
for the cert sent by the server.

The load call doesn't have a return value to check. The
list presented had 133 entries.

>  Perhaps your
> "openssl.cnf" (Linux distro mistake?) causes the damage
> for all applications even without explicit code to that
> end in Exim?  Or you're calling something to make it happen.

The Exim config is saying to use that file, so I do have a
workaround point there.  However, the question is: what is [1] the
proper fix?  We want to accept "the usual, pretty big, set of
well-known CAs".  Is that an incorrect way of telling OpenSSL
that list?  And how come it only fails under resumption and
when a client-cert is also used?

[1] We also need the code to be compatible across a wide range
of OpenSSL versions.  Minimising #ifdeffery is a factor.
--
Cheers,
  Jeremy

Re: resumption problem

Viktor Dukhovni
On Fri, Mar 27, 2020 at 08:20:55PM +0000, Jeremy Harris wrote:

> > Right, you're running out of space by trying to send too many
> > CA names.  It is better to have this fail, so you can figure out
> > what is trying to dump your entire trusted CA list (of names)
> > to the server, than to actually have that silently "work".
>
> It's /etc/pki/tls/certs/ca-bundle.crt (on Centos 8) and
> it's being loaded using SSL_CTX_set_client_CA_list()
> into the client, supposedly to be the acceptable CAs
> for the cert sent by the server.

That function should only affect the server -> client direction.
Briefly, in OpenSSL 1.1.1 it affected both the client and server
directions, but this was fixed in OpenSSL 1.1.1a.

If the distro started with 1.1.1 and only backported security fixes, you
could be running an OpenSSL version with the unintentional bidirectional
setting.

Another possibility is that your system-wide openssl.cnf file has a
"RequestCAFile" or "ClientCAFile" setting.  Unfortunately, both set not
only the server->client direction but also the client->server
direction.  This may be a bug (an oversight): the second of these
should not touch the bidirectional list.

  "ssl/ssl_conf.c" line 500:

    static int cmd_RequestCAFile(SSL_CONF_CTX *cctx, const char *value)
    {
        if (cctx->canames == NULL)
            cctx->canames = sk_X509_NAME_new_null();
        if (cctx->canames == NULL)
            return 0;
        return SSL_add_file_cert_subjects_to_stack(cctx->canames, value);
    }

    static int cmd_ClientCAFile(SSL_CONF_CTX *cctx, const char *value)
    {
        return cmd_RequestCAFile(cctx, value);
    }

  "ssl/ssl_conf.c" line 883:

    int SSL_CONF_CTX_finish(SSL_CONF_CTX *cctx)
    {
        ...

        if (cctx->canames) {
            if (cctx->ssl)
                SSL_set0_CA_list(cctx->ssl, cctx->canames);
            else if (cctx->ctx)
                SSL_CTX_set0_CA_list(cctx->ctx, cctx->canames);
            else
                sk_X509_NAME_pop_free(cctx->canames, X509_NAME_free);
            cctx->canames = NULL;
        }
        return 1;
    }

Can you confirm whether any such directive is present in your
system-wide openssl.cnf (possibly via .include)?

> >  Perhaps your "openssl.cnf" (Linux distro mistake?) causes the
> >  damage for all applications even without explicit code to that end
> >  in Exim?  Or you're calling something to make it happen.
>
> The Exim config is saying to use that file, so I do have a workaround
> point there.

It is actually not easy to avoid using the system-wide config.
It is loaded by default during library initialization, and
you need custom initialization calls to avoid loading it.

> However, the question is: what is [1] the proper fix?  We want to
> accept "the usual, pretty big, set of well-known CAs".  Is that an
> incorrect way of telling OpenSSL that list?

It makes some sense to send a CA list to clients, but ideally one that
is not too large.  In Postfix I recommend a small or no CAfile (from
which the CAlist is initialized), and the bulk of the trust store
configured via a CApath (which is not used to send CA names to clients).

Most clients (real MTAs, not random Java apps) handle an empty CA list
just fine, and use the (at most) one client cert they have.

> And how come it only fails under resumption and when a client-cert is
> also used?

Here, I am not sure, but perhaps this is because the extension is only
valid with TLS 1.3, and on resumption we already know the server
protocol version.  On initial handshakes the protocol might be
TLS 1.2, and perhaps as a result the list is not sent???

> [1] We also need the code to be compatible across a wide range
> of OpenSSL versions.  Minimising #ifdeffery is a factor.

Understood, I'm in the same boat with Postfix.

--
    Viktor.

Re: resumption problem

Jeremy Harris
On 27/03/2020 21:07, Viktor Dukhovni wrote:
> That function should only affect the server -> client direction.
> Briefly, in OpenSSL 1.1.1 it affected both the client and server
> directions, but this was fixed in OpenSSL 1.1.1a.

If Centos is following the same pattern in 8 as they did in 7,
they do list the letter when there is one; I have a 7 system
claiming "1.0.2k-fips".  So:

> If the distro started with 1.1.1 and only backported security fixes, you
> could be running an OpenSSL version with the unintentional bidirectional
> setting.

... either this, or even an unpatched basic 1.1.1.

A simple code addition to avoid that call in the client case sounds
in order.  Would the above likely explain the error I'm getting?


> Another possibility is that your system-wide openssl.cnf file has a
> "RequestCAFile" or "ClientCAFile" setting.

Neither appears to be present in /etc/pki/tls/openssl.cnf
--
Cheers,
  Jeremy

Re: resumption problem

Viktor Dukhovni
On Fri, Mar 27, 2020 at 09:25:28PM +0000, Jeremy Harris wrote:

> > If the distro started with 1.1.1 and only backported security fixes, you
> > could be running an OpenSSL version with the unintentional bidirectional
> > setting.
>
> ... either this, or even an unpatched basic 1.1.1.
>
> A simple code addition to avoid that call in the client case sounds
> in order.  Would the above likely explain the error I'm getting?

You could explicitly clear the CA list sent by the client (setting it
to NULL) as a final step in initializing the SSL_CTX:

    SSL_CTX_set0_CA_list(ctx, NULL);

> > Another possibility is that your system-wide openssl.cnf file has a
> > "RequestCAFile" or "ClientCAFile" setting.
>
> Neither appears to be present in /etc/pki/tls/openssl.cnf

And neither has any ".include" directives?

--
    Viktor.

Re: resumption problem

Jeremy Harris
On 27/03/2020 21:52, Viktor Dukhovni wrote:

> On Fri, Mar 27, 2020 at 09:25:28PM +0000, Jeremy Harris wrote:
>
>>> If the distro started with 1.1.1 and only backported security fixes, you
>>> could be running an OpenSSL version with the unintentional bidirectional
>>> setting.
>>
>> ... either this, or even an unpatched basic 1.1.1.
>>
>> A simple code addition to avoid that call in the client case sounds
>> in order.

Testing, it appears to work - I get resumption and not that error.
And the Exim testsuite shows no regressions, at least on my laptop
(which is Fedora 31, with 1.1.1d).

>>> Another possibility is that your system-wide openssl.cnf file has a
>>> "RequestCAFile" or "ClientCAFile" setting.
>>
>> Neither appears to be present in /etc/pki/tls/openssl.cnf
>
> And neither has any ".include" directives?

One, but that file doesn't contain either string.
--
Cheers,
  Jeremy

Re: resumption problem

Viktor Dukhovni
On Fri, Mar 27, 2020 at 10:10:16PM +0000, Jeremy Harris wrote:

> >> A simple code addition to avoid that call in the client case sounds
> >> in order.
>
> Testing, it appears to work - I get resumption and not that error.
> And the Exim testsuite shows no regressions, at least on my laptop
> (which is Fedora 31, with 1.1.1d).

On a Fedora 31 system I also don't see those directives in the system
openssl.cnf or includes.  Mind you, closer inspection of the code
suggests that "RequestCAPath" and "ClientCAPath" in the config file
would also result in setting the bidirectional CA list.  But I don't
find those either.


> >>> Another possibility is that your system-wide openssl.cnf file has a
> >>> "RequestCAFile" or "ClientCAFile" setting.
> >>
> >> Neither appears to be present in /etc/pki/tls/openssl.cnf
> >
> > And neither has any ".include" directives?

So my best guess is that you were testing with approximately a stock
1.1.1 that predates 1.1.1a, modulo security fixes.  Otherwise, it
is unclear how the client CA list (server -> client) ended up being
sent from client -> server.

--
    Viktor.
Reply | Threaded
Open this post in threaded view
|

Re: resumption problem

Dan Fulger
In reply to this post by Jeremy Harris
Indeed, CentOS 8.0 has OpenSSL 1.1.1 with very few updates.
But CentOS 8.1 was released in January, with OpenSSL 1.1.1c.

Re: resumption problem

Jeremy Harris
On 30/03/2020 08:41, Dan Fulger wrote:
> Indeed, CentOS 8.0 has OpenSSL 1.1.1 with very few updates.
> But CentOS 8.1 was released in January, with OpenSSL 1.1.1c.

Fortunately, with Viktor's help, the application fix is a
one-liner and is compatible across versions.
--
Cheers,
  Jeremy

Re: resumption problem

Viktor Dukhovni
On Mon, Mar 30, 2020 at 09:37:51AM +0100, Jeremy Harris wrote:

> On 30/03/2020 08:41, Dan Fulger wrote:
> > Indeed, CentOS 8.0 has OpenSSL 1.1.1 with very few updates.
> > But CentOS 8.1 was released in January, with OpenSSL 1.1.1c.
>
> Fortunately, with Viktor's help, the application fix is a
> one-liner and is compatible across versions.

I am glad you have a work-around, but remain puzzled.  On a FreeBSD 12.1
system with OpenSSL 1.1.1d, I just built a version of "posttls-finger"
patched (hastily, with inadequate error checks) to also load a client CA
list into the client->server SSL context:

--- a/src/tls/tls_client.c
+++ b/src/tls/tls_client.c
@@ -432,6 +432,18 @@ TLS_APPL_STATE *tls_client_init(const TLS_CLIENT_INIT_PROPS *props)
  SSL_CTX_free(client_ctx); /* 200411 */
  return (0);
     }
+    if (props->CAfile) {
+        STACK_OF(X509_NAME) *calist = SSL_load_client_CA_file(props->CAfile);
+
+        SSL_CTX_set_client_CA_list(client_ctx, calist);
+        msg_info("loaded %d CA names", sk_X509_NAME_num(calist));
+    }
 
     /*
      * We do not need a client certificate, so the certificates are only

When I run this, and resume a TLS 1.3 session, it logs that 150 CA names
have been loaded, but none are sent in the resumption client hello,
which remains modestly sized:

    posttls-finger: loaded 150 CA names
    posttls-finger: SSL_connect:before SSL initialization
    posttls-finger: write to 80127A100 [80136B000] (517 bytes => 517 (0x205))
    posttls-finger: SSL_connect:SSLv3/TLS write client hello
    posttls-finger: SSL_connect:SSLv3/TLS write client hello
    posttls-finger: SSL_connect:SSLv3/TLS read server hello
    posttls-finger: SSL_connect:TLSv1.3 read encrypted extensions
    posttls-finger: SSL_connect:SSLv3/TLS read server certificate
    posttls-finger: SSL_connect:TLSv1.3 read server certificate verify
    posttls-finger: SSL_connect:SSLv3/TLS read finished
    posttls-finger: SSL_connect:SSLv3/TLS write change cipher spec
    posttls-finger: write to 80127A100 [80136B000] (80 bytes => 80 (0x50))
    posttls-finger: SSL_connect:SSLv3/TLS write finished
    posttls-finger: SSL_connect:SSL negotiation finished successfully
    posttls-finger: SSL_connect:SSL negotiation finished successfully
    posttls-finger: save session ... to memory cache
    posttls-finger: SSL_connect:SSLv3/TLS read server session ticket

    posttls-finger: reloaded session ... from memory cache
    posttls-finger: SSL_connect:before SSL initialization
    posttls-finger: write to 80127A100 [80136B000] (638 bytes => 638 (0x27E))
    posttls-finger: SSL_connect:SSLv3/TLS write client hello
    posttls-finger: SSL_connect:SSLv3/TLS write client hello
    posttls-finger: SSL_connect:SSLv3/TLS read server hello
    posttls-finger: SSL_connect:TLSv1.3 read encrypted extensions
    posttls-finger: SSL_connect:SSLv3/TLS read finished
    posttls-finger: SSL_connect:SSLv3/TLS write change cipher spec
    posttls-finger: write to 80127A100 [80136B000] (80 bytes => 80 (0x50))
    posttls-finger: SSL_connect:SSLv3/TLS write finished
    posttls-finger: Untrusted TLS connection established to ... TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)

As expected, CA names loaded via SSL_CTX_set_client_CA_list() are not
sent in the client->server direction, either initially, or on
resumption.

--
    Viktor.

Re: resumption problem

Viktor Dukhovni


> On Mar 30, 2020, at 6:12 AM, Jeremy Harris <[hidden email]> wrote:
>
> On 30/03/2020 10:12, Viktor Dukhovni wrote:
>> On Mon, Mar 30, 2020 at 09:37:51AM +0100, Jeremy Harris wrote:
>>
>>> On 30/03/2020 08:41, Dan Fulger wrote:
>>>> Indeed, CentOS 8.0 has OpenSSL 1.1.1 with very few updates.
>>>> But CentOS 8.1 was released in January, with OpenSSL 1.1.1c.
>>>
>>> Fortunately, with Viktor's help, the application fix is a
>>> one-liner and is compatible across versions.
>>
>> I am glad you have a work-around, but remain puzzled.  On a FreeBSD 12.1
>> system with OpenSSL 1.1.1d
> 1.1.1d != 1.1.1

Yes, if it was truly 1.1.1 without the 1.1.1a patches, then your symptoms
are expected.  The message from Dan Fulger suggested that perhaps it was
1.1.1c, but maybe that's a different system than the one on which you
observed the problem.

Likely you're all set.  Good luck.

--
        Viktor.