Problems with SSL_shutdown() and non blocking socket


Problems with SSL_shutdown() and non blocking socket

Victor STINNER
Hi,

I'm trying to fix a bug in Python which is specific to OpenSSL 0.9.8m. The
problem is in an FTP test using a blocking socket (client) and a non-blocking
socket (server). There are several tests; some of them use a timeout of 2
seconds on the client socket.

Pseudo-code of Python's low-level shutdown function:

        err = SSL_shutdown(self->ssl);
        if (err == 0)
                err = SSL_shutdown(self->ssl);
        if (err < 0)
           <raise an exception>
        else
           <ok>

Using OpenSSL 0.9.8m, SSL_shutdown() sometimes returns -1 and SSL_get_error()
gives SSL_ERROR_WANT_READ. If I understood correctly, I have to read some
bytes from the socket using SSL_read() to make OpenSSL happy. But how many
bytes? And can I read the bytes directly, or should I first make sure that
bytes are available using select() (or something else)?

I wrote a patch using a loop:

   while 1:
       try:
           self._sslobj.shutdown()
           break
       except SSLError as err:
           if err.args[0] == SSL_ERROR_WANT_READ:
               try:
                   self.read()
               except SSLError as read_err:
                   if read_err.args[0] == SSL_ERROR_ZERO_RETURN:
                       # connection closed: done
                       break
                   else:
                       # non blocking socket
                       raise err
               else:
                   continue
           else:
               raise
       except socket_error as err:
           if err.errno == EPIPE:
               # connection closed: done
               break
           else:
               raise

The code is written in Python; don't hesitate to ask me if you don't
understand something.

I don't understand why I'm getting SSL_ERROR_ZERO_RETURN or EPIPE errors.

---

I tried to call SSL_shutdown() in a loop, but if the first or the second call
returns the SSL_ERROR_WANT_READ error, the next call always returns the same
error (I tried waiting a few seconds, but nothing changes). Does this mean
that SSL_shutdown() is not compatible between 0.9.8l and 0.9.8m for
non-blocking sockets?

--
Victor Stinner
http://www.haypocalc.com/
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
User Support Mailing List                    [hidden email]
Automated List Manager                           [hidden email]

Re: Problems with SSL_shutdown() and non blocking socket

Darryl Miles
Victor Stinner wrote:

> I'm trying to fix a bug in Python which is specific to OpenSSL 0.9.8m. The
> problem is in an FTP test using a blocking socket (client) and a non-blocking
> socket (server). There are several tests; some of them use a timeout of 2
> seconds on the client socket.
>
> Pseudo-code of Python's low-level shutdown function:
>
>         err = SSL_shutdown(self->ssl);
>         if (err == 0)
>                 err = SSL_shutdown(self->ssl);
>         if (err < 0)
>            <raise an exception>
>         else
>            <ok>
>
> Using OpenSSL 0.9.8m, SSL_shutdown() sometimes returns -1 and SSL_get_error()
> gives SSL_ERROR_WANT_READ. If I understood correctly, I have to read some
> bytes from the socket using SSL_read() to make OpenSSL happy. But how many
> bytes? And can I read the bytes directly, or should I first make sure that
> bytes are available using select() (or something else)?

The change in behavior was introduced by a patch I submitted to fix a
long-standing bug with SSL_shutdown() and the handling of non-blocking
sockets.  Previously SSL_shutdown() never returned -1/WANT_WRITE or
-1/WANT_READ at all; it internally mapped those conditions back to a
return value of 0.

Please take a look at the following threads for background info on the bug:

http://marc.info/?t=119109061500001&r=1&w=2
http://marc.info/?t=119246586800001&r=1&w=2




Short answer
============

For all intents and purposes, to convert older code that does not expect
these two error returns, simply check for them explicitly and follow the
same execution path in your code as you do for a 0 return value from
SSL_shutdown().

For any other kind of -1 error return, follow the same execution path as
before, for example your <raise an exception>.

However, without fully understanding what is going on, you could end up with
a hung connection in situations where you shouldn't (unless you implement an
external timeout) if you don't call SSL_read() to sink the application data
that may sit in the stream ahead of the inbound end-of-stream notify.

To port older code, try the following C code snippet. It is also
compatible with older versions of the OpenSSL library, so it doesn't
matter if you add this snippet to your application but use an older
version of OpenSSL at runtime:

int rc = SSL_shutdown(ssl);
/***** BEGIN - INSERT THIS CODE AFTER EVERY SSL_shutdown() INVOCATION IN
YOUR CODE *****/
if(rc == -1) {
        int ssl_errno;
        SSL_get_error(ssl, ssl_errno);
        if(ssl_errno == SSL_ERROR_WANT_READ || ssl_errno == SSL_ERROR_WANT_WRITE)
                rc = 0;
}
/***** END - INSERT THIS CODE AFTER EVERY SSL_shutdown() INVOCATION IN
YOUR CODE *****/


With this, the observable behavior should be consistent with what you got before.

This doesn't necessarily mean your code is correctly performing a graceful
SSL stream shutdown.  For that you need to understand your application,
the context in which you use SSL, etc.; hence the long answer.
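For readers coming from the Python side of the thread, the check in the C snippet can be sketched as a pure function. It assumes the caller already has `rc` from SSL_shutdown() and the error code from `SSL_get_error(ssl, rc)` for that same call. The name `legacy_shutdown_rc` is hypothetical. The `ssl.SSL_ERROR_*` constants are the ones Python's `ssl` module exports.

```python
import ssl

def legacy_shutdown_rc(rc, ssl_error):
    """Map an SSL_shutdown() result onto the pre-0.9.8m convention (hypothetical helper).

    rc:        value returned by SSL_shutdown(): 1, 0 or -1
    ssl_error: value returned by SSL_get_error(ssl, rc) for that same call
    """
    if rc == -1 and ssl_error in (ssl.SSL_ERROR_WANT_READ, ssl.SSL_ERROR_WANT_WRITE):
        # Not a failure: the shutdown simply has not finished yet, so take
        # the same execution path as a 0 return.
        return 0
    # 1 (finished), 0 (not finished) or a genuine -1 error the caller must handle.
    return rc
```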



Long answer
===========

* SSL_read() is responsible for reading application data (i.e. the data
that is encrypted for transport and then decrypted)

* Application payload data can only be received while the receiving half
of the SSL stream is still open.

* The other end voluntarily controls whether the receiving half of the
SSL stream is still open.  To put it another way, the sending side
controls when the end-of-stream notify is sent to securely close that
half of the stream.

Contrast this with simply dropping the TCP network connection, which is
not a cryptographically secure stream shutdown: how do you know there
wasn't some other piece of data the sending side sent that an attacker
doesn't want you to receive?  You need to know securely when the
end-of-stream has been reached in order to be sure the stream has not
been tampered with in any way.


So the first time you call SSL_shutdown(), what you are in effect saying
is, "I have no more data to send to the other side, so I'm going to
write the end-of-stream notify packet into the SSL stream so the other
end knows this."

The next action is then for your side to finish processing all inbound
application data.  Now just because you decided you were not going to
send any more data to the other end, this doesn't mean the far end has
finished sending data to you.

So between zero and an infinite amount of application data may still
need to be received and removed from the stream via SSL_read().  It is
also possible that, for whatever reason, the far end is keeping the
stream open but has nothing to send right at this moment.  Both
situations are valid SSL protocol scenarios.

Eventually the far end should finish sending and will then send its own
end-of-stream notify packet for you to receive.

Only once your end receives this notify packet does the OpenSSL API
function SSL_shutdown() return a value of 1.


You must also consider that due to buffering an inbound end-of-stream
packet may not be processed while there is application data still in
transit and not yet removed from buffers with SSL_read().

Only once this zero-to-infinite amount of data has been removed will the
inbound end-of-stream marker be visible to the inbound SSL protocol
stack to process.  Until that happens, as far as the local end is
concerned, the receiving half of the stream is still open for business.
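To make the ordering concrete, here is a toy Python model of the shutdown bookkeeping. It is not OpenSSL's implementation; `ToySSLStream` and its record format are invented for illustration. The two flags are named after OpenSSL's SSL_SENT_SHUTDOWN and SSL_RECEIVED_SHUTDOWN. For simplicity, `shutdown()` returns 0 in places where a non-blocking 0.9.8m build could instead report -1/SSL_ERROR_WANT_READ.

```python
from collections import deque

class ToySSLStream:
    """Toy model of one SSL connection's shutdown state (not OpenSSL).

    `inbound` holds records the peer has already sent but we have not yet
    processed, in order: ("data", b"...") or ("close_notify", None).
    """

    def __init__(self, inbound):
        self.inbound = deque(inbound)
        self.sent_shutdown = False      # cf. SSL_SENT_SHUTDOWN
        self.received_shutdown = False  # cf. SSL_RECEIVED_SHUTDOWN

    def _process_close_notify(self):
        # The peer's close_notify only becomes visible once every record
        # queued ahead of it has been read.
        if self.inbound and self.inbound[0][0] == "close_notify":
            self.inbound.popleft()
            self.received_shutdown = True

    def read(self):
        """Return the next data record, or None at end of stream."""
        self._process_close_notify()
        if self.received_shutdown:
            return None                            # like SSL_ERROR_ZERO_RETURN
        if not self.inbound:
            raise BlockingIOError("no data yet")   # like SSL_ERROR_WANT_READ
        _, payload = self.inbound.popleft()
        return payload

    def shutdown(self):
        """Mirror SSL_shutdown(): 1 once both halves are closed, else 0."""
        self.sent_shutdown = True                  # our close_notify goes out
        self._process_close_notify()
        return 1 if self.received_shutdown else 0
```

While data records sit ahead of the close_notify, `shutdown()` keeps returning 0. It can only reach 1 after those records are read, which is why skipping SSL_read() can leave the close hanging.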



So you ask how much data the local side needs to read.  Well, this data
might still be valid and relevant to the application, so if the library
supports half-open streams it should continue to pass it up the line to
the application.

If, however, you are implementing a "close" kind of function to attempt to
close the stream completely in a neighborly fashion, then you should sink
this data (i.e. use SSL_read() to read whatever you see but immediately
discard it; say, use a stack-based buffer to read it into and read
somewhere between 512 bytes and 16 KB at a time).
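Here is a sketch of the "sink" step in Python. It assumes an object with the `read(len, buffer)` method that Python's `ssl.SSLSocket` and `ssl.SSLObject` provide. A single reusable `bytearray` plays the role of the stack buffer. The 16 KB size and the `max_reads` cap are illustrative values picked from the ranges suggested here, and `sink_pending` is a hypothetical name.

```python
import ssl

def sink_pending(sslobj, bufsize=16 * 1024, max_reads=100):
    """Read and discard application data queued ahead of the peer's close_notify.

    Returns "eof" once end-of-stream is seen, "want_read" when nothing more
    is available right now, or "limit" after max_reads reads.
    """
    buf = bytearray(bufsize)  # one reusable buffer, overwritten on every read
    for _ in range(max_reads):
        try:
            n = sslobj.read(bufsize, buf)
        except ssl.SSLWantReadError:
            return "want_read"   # wait until the socket is readable again
        except ssl.SSLZeroReturnError:
            return "eof"         # the peer's close_notify has been processed
        if n == 0:
            return "eof"         # Python's ssl module may report EOF as 0 bytes
        # The n bytes now in buf are deliberately ignored: we are closing.
    return "limit"
```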

Every time you see -1/WANT_READ, you put your application to sleep and
wake up when there is something to read (this is normal non-blocking
sleep/wakeup mechanics for OpenSSL).  At each wakeup you can loop
SSL_read() multiple times if you wish (it is probably sane to apply a fixed
upper limit on the number of iterations per wakeup, say on the
order of 1 to 100) until no more data is found; then you must call
SSL_shutdown() again to check for 1.

If you see the return value 1 then that's it: you can free the OpenSSL
objects and close the underlying BSD socket.
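The wakeup loop described above might look like this for a non-blocking Python `ssl.SSLSocket`. There, `unwrap()` calls SSL_shutdown() and raises `ssl.SSLWantReadError` or `ssl.SSLWantWriteError` when it cannot finish yet. This is a sketch under assumptions. The `wait` hook, the 5-second timeout and the return strings are hypothetical additions so the loop can be exercised without a real peer. By default, `wait` sleeps in `select.select()`.

```python
import select
import ssl

def graceful_close(sslsock, wait=None, max_reads=100, timeout=5.0):
    """Shut down TLS on a non-blocking SSLSocket.

    Returns "clean" once both close_notify alerts have been exchanged,
    "aborted" if the peer dropped TCP first, or "timeout" if it went quiet.
    """
    if wait is None:
        def wait(kind):
            # Sleep until the socket is readable ("read") or writable ("write").
            rlist = [sslsock] if kind == "read" else []
            wlist = [sslsock] if kind == "write" else []
            readable, writable, _ = select.select(rlist, wlist, [], timeout)
            return bool(readable or writable)

    buf = bytearray(16 * 1024)
    while True:
        try:
            sslsock.unwrap()       # SSL_shutdown(); succeeds once the peer's close_notify is in
            return "clean"
        except ssl.SSLWantWriteError:
            if not wait("write"):
                return "timeout"
            continue               # retry unwrap() now that we can write
        except ssl.SSLWantReadError:
            if not wait("read"):
                return "timeout"   # the external timeout: the peer never finished
        except BrokenPipeError:
            return "aborted"       # EPIPE: peer hung up without a TLS shutdown

        # Woken up readable: sink application data queued ahead of the peer's
        # close_notify (bounded per wakeup), then loop to call unwrap() again.
        for _ in range(max_reads):
            try:
                if sslsock.read(len(buf), buf) == 0:
                    break          # end of stream
            except (ssl.SSLWantReadError, ssl.SSLZeroReturnError):
                break              # nothing more right now, or close_notify seen
```

Waiting before draining follows the sleep/wakeup advice above. The bounded inner loop stops one chatty peer from monopolizing an event loop.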



>
> I wrote a patch using a loop:
>
>    while 1:
>        try:
>            self._sslobj.shutdown()
>            break
>        except SSLError as err:
>            if err.args[0] == SSL_ERROR_WANT_READ:
>                try:
>                    self.read()
>                except SSLError as read_err:
>                    if read_err.args[0] == SSL_ERROR_ZERO_RETURN:
>                        # connection closed: done
>                        break
>                    else:
>                        # non blocking socket
>                        raise err
>                else:
>                    continue
>            else:
>                raise
>        except socket_error as err:
>            if err.errno == EPIPE:
>                # connection closed: done
>                break
>            else:
>                raise
>
> The code is written in Python; don't hesitate to ask me if you don't
> understand something.
>
> I don't understand why I'm getting SSL_ERROR_ZERO_RETURN or EPIPE errors.

ZERO_RETURN is the end-of-stream return.  It is not really an error; it
is an expected, natural condition when there is never going to be any
more data to receive.  It means you have received the end-of-stream
notify packet from the far end.  It is possible for BOTH ends of an SSL
stream to decide to send their end-of-stream notify packets at the same
time.  There is no master/slave relationship; as I said before, the
sending side is in control of its half of the connection.

EPIPE is because the other end of the SSL connection hung up on you
abruptly (at the BSD socket level).  For example, the far-end application
received a "QUIT" command from you and just closed the BSD socket
(without doing any of the graceful SSL stream shutdown).  This is not a
"secure stream shutdown" but nevertheless achieves the same thing.
It is not a neighborly thing to do, but some server implementations do
it anyway.  So EPIPE occurs because OpenSSL made a write() system call
to send more data to the BSD socket, but the other end had already hung
up at the TCP level.  The data it may be trying to send is your
end-of-stream notify packet.
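In Python's `ssl` module (3.3 and later), these conditions surface as distinct exception types rather than codes in `err.args[0]`. A hypothetical classifier could look like the sketch below. The `ssl.SSLEOFError` branch is an addition not discussed above. It covers the case where the connection ends abruptly without any close_notify, which is the insecure "dropped connection" case mentioned earlier.

```python
import errno
import ssl

def close_outcome(exc):
    """Classify an exception raised while closing a TLS connection (hypothetical helper)."""
    if isinstance(exc, ssl.SSLZeroReturnError):
        # SSL_ERROR_ZERO_RETURN: the peer's close_notify arrived. This is the
        # expected end of stream, not a failure.
        return "peer_closed_cleanly"
    if isinstance(exc, ssl.SSLEOFError):
        # Connection terminated abruptly with no close_notify: possible truncation.
        return "truncated"
    if isinstance(exc, OSError) and exc.errno == errno.EPIPE:
        # write() failed because the peer already hung up at the TCP level;
        # the data being written was most likely our own close_notify.
        return "peer_hung_up"
    return "error"
```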


> I tried to call SSL_shutdown() in a loop, but if the first or the second call
> returns the SSL_ERROR_WANT_READ error, the next call always returns the same
> error (I tried waiting a few seconds, but nothing changes). Does this mean
> that SSL_shutdown() is not compatible between 0.9.8l and 0.9.8m for
> non-blocking sockets?

I think my C code snippet should address this.


Regards,

Darryl

Re: Problems with SSL_shutdown() and non blocking socket

Claus Assmann
On Fri, Mar 12, 2010, Darryl Miles wrote:

> int rc = SSL_shutdown(ssl);
> /***** BEGIN - INSERT THIS CODE AFTER EVERY SSL_shutdown()
> INVOCATION IN YOUR CODE *****/
> if(rc == -1) {
> int ssl_errno;
> SSL_get_error(ssl, ssl_errno);
> if(ssl_errno == SSL_ERROR_WANT_READ || ssl_errno == SSL_ERROR_WANT_WRITE)
> rc = 0;
> }
> /***** END - INSERT THIS CODE AFTER EVERY SSL_shutdown() INVOCATION
> IN YOUR CODE *****/

> With this the observable behavior that you got before should be consistent.

It should probably be

        ssl_errno = SSL_get_error(ssl, rc);

but even then I get SSL_ERROR_SYSCALL and errno=EBADF using sendmail
8, while previously it didn't complain about errors.

So where is the error? In the application (if so: what is the correct
handling of the new code?) or in OpenSSL 0.9.8m?

Re: Problems with SSL_shutdown() and non blocking socket

Darryl Miles
Claus Assmann wrote:
> It should probably be
>
> ssl_errno = SSL_get_error(ssl, rc);

Ah yes, you could be correct on that; please consult the SSL_get_error()
documentation for correct usage.


> but even then I get SSL_ERROR_SYSCALL and errno=EBADF using sendmail
> 8, while previously it didn't complain about errors.
>
> So where is the error? In the application (if so: what is the correct
> handling of the new code?) or in OpenSSL 0.9.8m?

That is a different issue, outside the scope of what I know.

EBADF means you have attempted to use a file descriptor after the
close() system call has been made on it.  That file descriptor is now
invalid in the kernel, so any system call made using that FD will fail
with the EBADF error.
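This is easy to reproduce outside OpenSSL. The following minimal Python sketch uses a pipe instead of a socket, and the helper name is hypothetical. It closes a descriptor and then makes a system call on the stale number.

```python
import errno
import os

def errno_after_close():
    """Return the errno reported when writing to a descriptor we already closed."""
    r, w = os.pipe()
    os.close(w)              # w is now invalid in the kernel...
    try:
        os.write(w, b"x")    # ...so this system call fails
    except OSError as exc:
        return exc.errno     # EBADF (unless another thread reused the number meanwhile)
    finally:
        os.close(r)
    return None
```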


Maybe you need to audit the ownership of, and responsibility for closing, the
socket within the application.  It is likely your application called
close() on the file descriptor too early, while OpenSSL had not finished
using it, or rather while you had not finished using OpenSSL's "SSL *" handle
which has that file descriptor associated with it.

A simple series of printf() calls inserted in and around the use of the
following API calls may highlight the out-of-order problem.  I'd advise
you to also print the %p or %d value of their return values and/or first
arguments:

  %p = SSL_new(...)
  SSL_set_fd(%p, %d)
  SSL_set_rfd(%p, %d)
  SSL_set_wfd(%p, %d)
  SSL_shutdown(%p, ...)
  SSL_free(%p, ...)
  close(%d)
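Once such a trace is captured, checking it can be mechanical. The Python sketch below assumes the printf() output has been turned into tuples such as ("SSL_set_fd", "0x1", 5) and ("close", 5). That format and `find_out_of_order` are hypothetical. The function flags any SSL_shutdown() or SSL_free() on a handle whose descriptor was already close()d.

```python
def find_out_of_order(events):
    """Return SSL_shutdown/SSL_free events that occur after the handle's fd was closed."""
    fds_of = {}      # SSL handle -> fds attached via SSL_set_fd/rfd/wfd
    closed = set()   # fd numbers that have been close()d and not reused since
    problems = []
    for ev in events:
        op = ev[0]
        if op == "SSL_new":
            fds_of.pop(ev[1], None)        # a fresh handle (pointer may be reused)
        elif op in ("SSL_set_fd", "SSL_set_rfd", "SSL_set_wfd"):
            handle, fd = ev[1], ev[2]
            fds_of.setdefault(handle, set()).add(fd)
            closed.discard(fd)             # the number is live again
        elif op == "close":
            closed.add(ev[1])
        elif op in ("SSL_shutdown", "SSL_free"):
            if fds_of.get(ev[1], set()) & closed:
                problems.append(ev)        # handle still uses a closed fd
    return problems
```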


Note there is more than one way to skin a rabbit when associating an FD
with an 'SSL *'; for example, see the BIO_s_socket man page.


In short, you should never see EBADF caused by OpenSSL itself.  If you
can write a simple test case in which a legitimate sequence of OpenSSL
API calls causes the EBADF system call error to be returned, with no
outside interference with the socket/fd, then that would be an OpenSSL
bug.

However, I do suspect that an out-of-order sequence of API calls is causing
outside interference with the state of the fd.


Darryl


Re: Problems with SSL_shutdown() and non blocking socket

Victor Duchovni
In reply to this post by Claus Assmann
On Mon, Mar 22, 2010 at 04:23:53PM -0700, Claus Assmann wrote:

> It should probably be
>
> ssl_errno = SSL_get_error(ssl, rc);
>
> but even then I get SSL_ERROR_SYSCALL and errno=EBADF using sendmail
> 8, while previously it didn't complain about errors.

For what it's worth, Postfix calls SSL_shutdown via a biopair state
machine that handles SSL_ERROR_WANT_READ and SSL_ERROR_WANT_WRITE.

    /*      The TLS layer to network interface is realized with a BIO pair:
    /*
    /*      Postfix SMTP layer   |   TLS layer
    /*                           |
    /*      smtp/smtpd           |
    /*       /\    ||            |
    /*       ||    \/            |
    /*      vstream read/write <===> TLS read/write/etc
    /*                           |     /\    ||
    /*                           |     ||    \/
    /*                           |   BIO pair (internal_bio)
    /*                           |   BIO pair (network_bio)
    /*      Postfix socket layer |     /\    ||
    /*                           |     ||    \/
    /*      socket read/write  <===> BIO read/write
    /*       /\    ||            |
    /*       ||    \/            |
    /*       network             |

This state machine is used for handshake, read and write I/O, so if/when
SSL_shutdown returns SSL_ERROR_WANT_READ or SSL_ERROR_WANT_WRITE, the
appropriate I/O ops are issued and the call is retried.
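Python's closest counterpart to this BIO-pair design is `ssl.MemoryBIO` with `SSLContext.wrap_bio()`. In that setup the TLS engine reads and writes memory buffers, and the application moves the bytes to and from the network itself. The following minimal sketch of the "issue the I/O and retry" loop is not Postfix's code. It assumes caller-supplied `send`/`recv` callables, which are hypothetical. Because a MemoryBIO's outgoing buffer grows as needed, only `SSLWantReadError` is handled.

```python
import ssl

def drive(op, incoming, outgoing, send, recv):
    """Retry op() until it completes, moving bytes between memory BIOs and the network.

    op:       a bound SSLObject method, e.g. sslobj.do_handshake or sslobj.unwrap
    incoming: ssl.MemoryBIO feeding network bytes into the TLS engine
    outgoing: ssl.MemoryBIO holding TLS bytes waiting to go to the network
    send, recv: the caller's network I/O (hypothetical callables)
    """
    while True:
        try:
            result = op()
        except ssl.SSLWantReadError:
            # Flush whatever the engine produced (handshake records, our
            # close_notify, ...) before waiting on the peer.
            pending = outgoing.read()
            if pending:
                send(pending)
            data = recv()
            if data:
                incoming.write(data)
            else:
                incoming.write_eof()   # peer closed the TCP connection
            continue                   # retry the same operation
        pending = outgoing.read()
        if pending:
            send(pending)
        return result
```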

--
        Viktor.

Re: Problems with SSL_shutdown() and non blocking socket

Dr. Stephen Henson
In reply to this post by Darryl Miles
On Tue, Mar 23, 2010, Darryl Miles wrote:

> Claus Assmann wrote:
>> but even then I get SSL_ERROR_SYSCALL and errno=EBADF using sendmail
>> 8, while previously it didn't complain about errors.
> [...]
> Maybe you need to audit the ownership and responsibility for closing the
> socket within the application.  It is likely your application called
> close() on the file descriptor too early as OpenSSL had not finished using
> it, or rather you had not finished using OpenSSL's "SSL *" handle which has
> that file descriptor associated with it.
> [...]
> However I do suspect out-of-order sequence of API calls is causing outside
> interference on the state of the fd.
>

Another possible cause is multiple closes on the same file descriptor in a
multi-threaded application. I saw this once myself where SSL_free() closed
the file descriptor and the application itself closed it as well.

Often the second close will just use an invalid file descriptor and is
harmless. In some circumstances, however, the just-closed descriptor number
gets reallocated to another thread and you end up closing that prematurely.

This BTW took *ages* to track down when I first saw it.
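One defensive pattern is to give each descriptor exactly one owner that closes it at most once. In C that means deciding whether SSL_free() or the application closes the fd. SSL_set_fd() attaches the fd with BIO_NOCLOSE, so SSL_free() leaves it open, whereas a socket BIO created with BIO_CLOSE hands the close to SSL_free(). The Python sketch below illustrates the "close at most once" half with a hypothetical `OwnedFD` wrapper.

```python
import os
import threading

class OwnedFD:
    """A descriptor with exactly one owner that closes it at most once (hypothetical helper)."""

    def __init__(self, fd):
        self._fd = fd
        self._lock = threading.Lock()

    def fileno(self):
        fd = self._fd
        if fd is None:
            raise ValueError("descriptor already closed")
        return fd

    def close(self):
        # Swap the number out under the lock so two racing close() calls
        # cannot both reach os.close(): the loser sees None and does nothing,
        # so it can never close a number the kernel has handed out again.
        with self._lock:
            fd, self._fd = self._fd, None
        if fd is not None:
            os.close(fd)
```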

Steve.
--
Dr Stephen N. Henson. OpenSSL project core developer.
Commercial tech support now available see: http://www.openssl.org

Re: Problems with SSL_shutdown() and non blocking socket

Claus Assmann
On Tue, Mar 23, 2010, Dr. Stephen Henson wrote:

> Another possible cause is multiple closes on the same file descriptor in a
> multi-threaded application. I saw this once myself where the SSL_free() closed
> the file descriptor and the application itself closed it as well.

The application is sendmail; it's not multi-threaded.

It seems the new OpenSSL version uncovered an error in the shutdown
section of the MTA; I'll have to figure out how to fix that (without
rewriting the I/O section...)

Thanks for the feedback.