SSL_read() returning SSL_ERROR_SYSCALL with errno 11 (EAGAIN)


John Unsworth-3

We are using OpenSSL 1.1.0h on Linux to send operations to LDAP servers. We use SSL_read() to receive the replies on a non-blocking socket. The vast majority of the time, SSL_read() either returns >0 or fails with SSL_ERROR_WANT_READ or SSL_ERROR_WANT_WRITE (as reported by SSL_get_error()), as per the spec. However, we are very occasionally seeing SSL_ERROR_SYSCALL with errno 11 (EAGAIN), which would seem to be the result of a platform socket read() or write() when blocking would occur. Is this expected behaviour? We have changed our code to treat this as SSL_ERROR_WANT_READ or SSL_ERROR_WANT_WRITE depending on the result of SSL_want_write(). Are we correct?

 

Regards,

John.


Re: SSL_read() returning SSL_ERROR_SYSCALL with errno 11 (EAGAIN)

Michael Wojcik
> From: openssl-users <[hidden email]> on behalf of John Unsworth <[hidden email]>
> Sent: Monday, April 29, 2019 10:54

> We are using OpenSSL 1.1.0h on Linux to send operations to LDAP servers. We use SSL_read()
> to receive the replies on a non-blocking socket. The vast majority of times SSL_read()  returns >0,
> SSL_ERROR_WANT_READ or SSL_ERROR_WANT_WRITE as per the spec. However we are very
> occasionally seeing SSL_ERROR_SYSCALL with errno 11 (EAGAIN) which would seem to be the
> result of a platform socket read() or write() when blocking would occur. Is this expected
> behaviour? We have changed our code to treat this as SSL_ERROR_WANT_READ or
> SSL_ERROR_WANT_WRITE depending on the result of SSL_want_write(). Are we correct?
 
I haven't seen a reply to this, so I'll take a stab...

I haven't looked at the code, but my impression is that WANT_READ and WANT_WRITE are returned in two cases: when OpenSSL has received or sent a partial record and needs to complete it; or when the TLS state is such that OpenSSL needs to perform the associated operation and it hasn't been requested by the application - for example, if the application is trying to receive data but OpenSSL needs to send renegotiation information.

If you do a non-blocking receive at a record boundary (so you don't have an incomplete record) and OpenSSL doesn't currently need to send for TLS reasons, OpenSSL will see the EAGAIN (or EWOULDBLOCK, depending on platform). I think in this case it does just return SSL_ERROR_SYSCALL, because OpenSSL itself doesn't "want" to receive. If OpenSSL had already received a partial record, then you'd get WANT_READ.

I suspect you could always treat this as WANT_READ, which typically means using a mechanism such as select or poll to determine when the socket is readable, then trying the OpenSSL receive again. But looking at the return value of SSL_want_write() seems safe enough.

That's my understanding. Someone else may know better.

--
Michael Wojcik

Re: SSL_read() returning SSL_ERROR_SYSCALL with errno 11 (EAGAIN)

Viktor Dukhovni
> On Apr 30, 2019, at 12:31 PM, Michael Wojcik <[hidden email]> wrote:
>
> I haven't seen a reply to this, so I'll take a stab...
>
> I haven't looked at the code, but my impression is that WANT_READ and WANT_WRITE are returned in two cases: when OpenSSL has received or sent a partial record and needs to complete it; or when the TLS state is such that OpenSSL needs to perform the associated operation and it hasn't been requested by the application - for example, if the application is trying to receive data but OpenSSL needs to send renegotiation information.
>
> If you do a non-blocking receive at a record boundary (so you don't have an incomplete record) and OpenSSL doesn't currently need to send for TLS reasons, OpenSSL will see the EAGAIN (or EWOULDBLOCK, depending on platform). I think in this case it does just return SSL_ERROR_SYSCALL, because OpenSSL itself doesn't "want" to receive. If OpenSSL had already received a partial record, then you'd get WANT_READ.

I think the above guess is not correct.  A cursory look at the
code suggests that even user-initiated reads normally return
SSL_ERROR_WANT_READ when the network bio signals a retriable
failure.

The OP has not provided much detail about how the connections in
question are created.  Is the connection made by the
application, and SSL negotiated over an existing socket, or
is the connection established by OpenSSL over a "connect bio"?

Is the handshake explicit, or does the application just call
SSL_read(), with OpenSSL performing the handshake as needed?

In any case, I would not expect SSL_ERROR_SYSCALL under normal
conditions.  The documentation says:

       SSL_ERROR_SYSCALL
           Some non-recoverable, fatal I/O error occurred. The OpenSSL error
           queue may contain more information on the error. For socket I/O on
           Unix systems, consult errno for details. If this error occurs then
           no further I/O operations should be performed on the connection and
           SSL_shutdown() must not be called.

           This value can also be returned for other errors, check the error
           queue for details.

--
        Viktor.


Re: SSL_read() returning SSL_ERROR_SYSCALL with errno 11 (EAGAIN)

Erik Forsberg-11
I can add some of my own observations to this below ...

>> I haven't looked at the code, but my impression is that WANT_READ and WANT_WRITE are returned in two cases: when OpenSSL has received or sent a partial record and needs to complete it; or when the TLS state is such that OpenSSL needs to perform the associated operation and it hasn't been requested by the application - for example, if the application is trying to receive data but OpenSSL needs to send renegotiation information.
>>
>> If you do a non-blocking receive at a record boundary (so you don't have an incomplete record) and OpenSSL doesn't currently need to send for TLS reasons, OpenSSL will see the EAGAIN (or EWOULDBLOCK, depending on platform). I think in this case it does just return SSL_ERROR_SYSCALL, because OpenSSL itself doesn't "want" to receive. If OpenSSL had already received a partial record, then you'd get WANT_READ.
>
>I think the above guess is not correct.  A cursory look at the
>code suggests that even user-initiated reads normally return
>SSL_ERROR_WANT_READ when the network bio signals a retriable
>failure.
>
>The OP has not provided much detail about how the connections in
>question are created.  Is the connection made by the
>application, and SSL negotiated over an existing socket, or
>is the connection established by OpenSSL over a "connect bio"?
>
>Is the handshake explicit, or does the application just call
>SSL_read(), with OpenSSL performing the handshake as needed?
>

I occasionally (somewhat rarely) see the issue mentioned by the OP.
Ignoring the error, or mapping it and doing what WANT_READ/WANT_WRITE
would do, effectively hides the issue and the connection works fine. I predominantly
run on Solaris 11. In my case, I open the socket myself, set non-blocking
mode and associate it with an SSL object using SSL_set_fd().
The initial handshake is done explicitly.



Re: SSL_read() returning SSL_ERROR_SYSCALL with errno 11 (EAGAIN)

Viktor Dukhovni
On Tue, Apr 30, 2019 at 03:23:23PM -0700, Erik Forsberg wrote:

> >Is the handshake explicit, or does the application just call
> >SSL_read(), with OpenSSL performing the handshake as needed?
>
> I occasionally (somewhat rarely) see the issue mentioned by the OP.
> Ignoring the error, or mapping it and doing what WANT_READ/WANT_WRITE
> would do, effectively hides the issue and the connection works fine. I predominantly
> run on Solaris 11. In my case, I open the socket myself, set non-blocking
> mode and associate it with an SSL object using SSL_set_fd().
> The initial handshake is done explicitly.

Recoverable errors should not result in SSL_ERROR_SYSCALL.  This
feels like a bug.  I'd like to hear from Matt Caswell on this one.
Perhaps someone should open an issue on Github...

--
        Viktor.

Re: SSL_read() returning SSL_ERROR_SYSCALL with errno 11 (EAGAIN)

Erik Forsberg-11

>-- Original Message --
>
>On Tue, Apr 30, 2019 at 03:23:23PM -0700, Erik Forsberg wrote:
>
>> >Is the handshake explicit, or does the application just call
>> >SSL_read(), with OpenSSL performing the handshake as needed?
>>
>> I occasionally (somewhat rarely) see the issue mentioned by the OP.
>> Ignoring the error, or mapping it and doing what WANT_READ/WANT_WRITE
>> would do, effectively hides the issue and the connection works fine. I predominantly
>> run on Solaris 11. In my case, I open the socket myself, set non-blocking
>> mode and associate it with an SSL object using SSL_set_fd().
>> The initial handshake is done explicitly.
>
>Recoverable errors should not result in SSL_ERROR_SYSCALL.  This
>feels like a bug.  I'd like to hear from Matt Caswell on this one.
>Perhaps someone should open an issue on Github...
>
I will scan my logs later this evening and see if this is still an issue.
The last time I remember seeing it was quite a long time ago (a couple of years).



Re: SSL_read() returning SSL_ERROR_SYSCALL with errno 11 (EAGAIN)

Erik Forsberg-11

>-- Original Message --
>
>
>>-- Original Message --
>>
>>On Tue, Apr 30, 2019 at 03:23:23PM -0700, Erik Forsberg wrote:
>>
>>> >Is the handshake explicit, or does the application just call
>>> >SSL_read(), with OpenSSL performing the handshake as needed?
>>>
>>> I occasionally (somewhat rarely) see the issue mentioned by the OP.
>>> Ignoring the error, or mapping it and doing what WANT_READ/WANT_WRITE
>>> would do, effectively hides the issue and the connection works fine. I predominantly
>>> run on Solaris 11. In my case, I open the socket myself, set non-blocking
>>> mode and associate it with an SSL object using SSL_set_fd().
>>> The initial handshake is done explicitly.
>>
>>Recoverable errors should not result in SSL_ERROR_SYSCALL.  This
>>feels like a bug.  I'd like to hear from Matt Caswell on this one.
>>Perhaps someone should open an issue on Github...
>>
>I will scan my logs later this evening and see if this is still an issue.
>The last time I remember seeing it was quite a long time ago (a couple of years).
>
>

OK, I checked my logs (3+ years' worth of them), and I have not seen this error in that time frame.
So it must have been a much older OpenSSL version that I was using back when I remember applying this workaround.
It doesn't seem to be needed for me anymore.



Re: SSL_read() returning SSL_ERROR_SYSCALL with errno 11 (EAGAIN)

Matt Caswell-2
In reply to this post by Viktor Dukhovni


On 30/04/2019 23:37, Viktor Dukhovni wrote:

> On Tue, Apr 30, 2019 at 03:23:23PM -0700, Erik Forsberg wrote:
>
>>> Is the handshake explicit, or does the application just call
>>> SSL_read(), with OpenSSL performing the handshake as needed?
>>
>> I occasionally (somewhat rarely) see the issue mentioned by the OP.
>> Ignoring the error, or mapping it and doing what WANT_READ/WANT_WRITE
>> would do, effectively hides the issue and the connection works fine. I predominantly
>> run on Solaris 11. In my case, I open the socket myself, set non-blocking
>> mode and associate it with an SSL object using SSL_set_fd().
>> The initial handshake is done explicitly.
>
> Recoverable errors should not result in SSL_ERROR_SYSCALL.  This
> feels like a bug.  I'd like to hear from Matt Caswell on this one.
> Perhaps someone should open an issue on Github...
>

SSL_ERROR_SYSCALL should not be raised as a result of a recoverable error. This
should always be considered fatal. If you are getting this but errno says EAGAIN,
then a number of possibilities spring to mind:

1) If a fatal error has occurred, SSL_get_error() checks to see if there is an
error on the OpenSSL error queue. If there is, it returns SSL_ERROR_SSL (unless
the error type is ERR_LIB_SYS). If there is no error at all, but libssl doesn't
think the error is recoverable, then it will return SSL_ERROR_SYSCALL by default.
It is possible that libssl has encountered some non-syscall related error but
neglected to push an error onto the error queue. Thus the return value
incorrectly indicates SSL_ERROR_SYSCALL when it should have been SSL_ERROR_SSL.
This would be an OpenSSL bug - but quite tricky to find since we'd have to
locate the spot where no error is being pushed...but because there is no error
we don't have a lot to go on!

2) A second possibility is that it really was a syscall that failed but
something (either in libssl or possibly in application code) made some
subsequent syscall that changed errno in the meantime. If that "something" was
in libssl, then that's probably also a libssl bug (and also quite tricky to track down).

3) A third possibility is that it really is a retryable error but libssl failed
to properly set its state to note that. I think this is quite a lot less likely
than (1) or (2) but would also be a libssl bug.


So my guess is, except in the case where the application itself has accidentally
changed errno, this most likely indicates an OpenSSL bug. The safest thing to do
in such circumstances is to treat this as a fatal error. It is very unwise to
retry a connection where the library has indicated a fatal error (e.g. see
CVE-2019-1559).

What OpenSSL version is this?

Matt

RE: SSL_read() returning SSL_ERROR_SYSCALL with errno 11 (EAGAIN)

John Unsworth-3
OpenSSL 1.1.0h.
We have implemented the workaround: if SSL_ERROR_SYSCALL and errno == EAGAIN, then treat it as WANT_READ/WANT_WRITE. This (seems to) work fine. There have been no subsequent problems; everything continues correctly.

Regards,
John

-----Original Message-----
From: openssl-users <[hidden email]> On Behalf Of Matt Caswell
Sent: 01 May 2019 08:42
To: [hidden email]
Subject: Re: SSL_read() returning SSL_ERROR_SYSCALL with errno 11 (EAGAIN)
