Server application hangs on SSL_read, even when client disconnects

Server application hangs on SSL_read, even when client disconnects

Brice André-2
Hello,

I have developed a client-server application with OpenSSL, and I have a recurrent bug where, sometimes, a server instance seems to be permanently stuck in an SSL_read call.

I have put more details of the problem below, but it seems that in some rare cases the server performs an SSL_read, the client disconnects in the meantime, and the server never detects the disconnection and remains stuck in the SSL_read operation.

My server runs on Debian 6.3, and my version of OpenSSL is 1.1.0l.

Here is an extract of the code that manages the SSL connection on the server side:

   ctx = SSL_CTX_new(SSLv23_server_method());

   BIO* bio = BIO_new_file("dhkey.pem", "r");
   if (bio == NULL) ...
   DH* ret = PEM_read_bio_DHparams(bio, NULL, NULL, NULL);
   BIO_free(bio);
   if (SSL_CTX_set_tmp_dh(ctx, ret) != 1) ...  /* returns 1 on success, 0 on failure */

   SSL_CTX_set_default_passwd_cb_userdata(ctx, (void*)key);
   if (SSL_CTX_use_PrivateKey_file(ctx, "server.key", SSL_FILETYPE_PEM) <= 0) ...
   if (SSL_CTX_use_certificate_file(ctx, "server.crt", SSL_FILETYPE_PEM) <= 0) ...
   if (SSL_CTX_check_private_key(ctx) == 0) ...
   SSL_CTX_set_cipher_list(ctx, "ALL");

   ssl_in = SSL_new(ctx);
   BIO* sslclient_in = BIO_new_socket(in_sock, BIO_NOCLOSE);
   SSL_set_bio(ssl_in, sslclient_in, sslclient_in);
   int r_in = SSL_accept(ssl_in);
   if (r_in != 1) ...
   
   ...
   
   /* Place where program hangs : */
   int read = SSL_read(ssl_in, &(((char*)ptr)[nb_read]), size-nb_read);
   
Here is the full stack-trace where the program hangs :

#0  0x00007f836575d210 in __read_nocancel () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007f8365c8ccec in ?? () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
#2  0x00007f8365c8772b in BIO_read () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
#3  0x00007f83659879a2 in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
#4  0x00007f836598b70d in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
#5  0x00007f8365989113 in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
#6  0x00007f836598eff6 in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
#7  0x00007f8365998dc9 in SSL_read () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
#8  0x000055b7b3e98289 in Socket::SslRead (this=0x7ffdc6131900, size=4, ptr=0x7ffdc613066c)
    at ../../Utilities/Database/Sync/server/Communication/Socket.cpp:80

Here is the result of "netstat -natp | grep <pid of hanging process>" :

tcp       32      0 5.196.111.132:5412      109.133.193.70:51822    CLOSE_WAIT  19218/./MabeeServer
tcp       32      0 5.196.111.132:5412      109.133.193.70:51696    CLOSE_WAIT  19218/./MabeeServer
tcp       32      0 5.196.111.132:5412      109.133.193.70:51658    CLOSE_WAIT  19218/./MabeeServer
tcp        0      0 5.196.111.132:5413      85.27.92.8:25856        ESTABLISHED 19218/./MabeeServer
tcp       32      0 5.196.111.132:5412      109.133.193.70:51818    CLOSE_WAIT  19218/./MabeeServer
tcp       32      0 5.196.111.132:5412      109.133.193.70:51740    CLOSE_WAIT  19218/./MabeeServer
tcp        0      0 5.196.111.132:5412      85.27.92.8:26305        ESTABLISHED 19218/./MabeeServer
tcp6       0      0 ::1:36448               ::1:5432                ESTABLISHED 19218/./MabeeServer

From this output, I can see that I have two established connections with the remote client machine at IP 109.133.193.70. Note that it is normal to have two connections, because my client-server protocol relies on two distinct TCP connections.

From this, I logged the result of "tcpdump -i any -nn host 85.27.92.8" for two days (and during those two days, my server instance remained stuck in SSL_read...). In this log, I see no packet exchange on ports 85.27.92.8:25856 or 85.27.92.8:26305. I do see some bursts of packets on other client TCP ports, but those are probably due to the client performing other requests to the server (which then forks new instances with connections on other client ports).

This leads me to think that the connection on which the SSL_read is listening is definitively dead (no more TCP keepalive), and that, for a reason I do not understand, the SSL_read remains blocked on it.

Note that the normal behavior of my application is: the client connects, the server daemon forks a new instance, communication continues for a few seconds with the forked server instance, the client disconnects, and the forked process finishes.

Note also that normally the client performs a proper disconnection (SSL_shutdown, etc.), but I cannot guarantee that it never terminates in a more abrupt way (lost connection, client crash, etc.).

Any advice on what is going wrong?

Many thanks,

Brice
RE: Server application hangs on SSL_read, even when client disconnects

Michael Wojcik
> From: openssl-users <[hidden email]> On Behalf Of Brice André
> Sent: Friday, 13 November, 2020 05:06

> ... it seems that in some rare execution cases, the server performs a SSL_read,
> the client disconnects in the meantime, and the server never detects the
> disconnection and remains stuck in the SSL_read operation.

...

> #0  0x00007f836575d210 in __read_nocancel () from /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x00007f8365c8ccec in ?? () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
> #2  0x00007f8365c8772b in BIO_read () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1

So OpenSSL is in a blocking read of the socket descriptor.

> tcp        0      0 5.196.111.132:5413      85.27.92.8:25856        ESTABLISHED 19218/./MabeeServer
> tcp        0      0 5.196.111.132:5412      85.27.92.8:26305        ESTABLISHED 19218/./MabeeServer

> From this log, I can see that I have two established connections with remote
> client machine on IP 109.133.193.70. Note that it's normal to have two connexions
> because my client-server protocol relies on two distinct TCP connexions.

So the client has not, in fact, disconnected.

When a system closes one end of a TCP connection, the stack will send a TCP packet
with either the FIN or the RST flag set. (Which one you get depends on whether the
stack on the closing side was holding data for the conversation which the application
hadn't read.)

The sockets are still in ESTABLISHED state; therefore, no FIN or RST has been
received by the local stack.

There are various possibilities:

- The client system has not in fact closed its end of the conversation. Sometimes
this happens for reasons that aren't immediately apparent; for example, if the
client forked and allowed the descriptor for the conversation socket to be inherited
by the child, and the child still has it open.

- The client system shut down suddenly (crashed) and so couldn't send the FIN/RST.

- There was a failure in network connectivity between the two systems, and consequently
the FIN/RST couldn't be received by the local system.

- The connection is in a state where the peer can't send the FIN/RST, for example
because the local side's receive window is zero. That shouldn't be the case, since
OpenSSL is (apparently) blocked in a receive on the connection, but as I don't have
the complete picture I can't rule it out.

> This let me think that the connexion on which the SSL_read is listening is
> definitively dead (no more TCP keepalive)

"definitely dead" doesn't have any meaning in TCP. That's not one of the TCP states,
or part of the other TCP or IP metadata associated with the local port (which is
what matters).

Do you have keepalives enabled?

> and that, for a reason I do not understand, the SSL_read keeps blocked into it.

The reason is simple: The connection is still established, but there's no data to
receive. The question isn't why SSL_read is blocking; it's why you think the
connection is gone, but the stack thinks otherwise.

> Note that the normal behavior of my application is : client connects, server
> daemon forks a new instance,

Does the server parent process close its copy of the conversation socket?

--
Michael Wojcik
Re: Server application hangs on SSL_read, even when client disconnects

Brice André-2
Hello,

And many thanks for the answer.

"Does the server parent process close its copy of the conversation socket?" : I checked in my code, but it seems that no. Is it needed  ? May it explain my problem ?

" Do you have keepalives enabled?" To be honest, I did not know it was possible to not enable them. I checked with command "netstat -tnope" and it tells me that it is not enabled.

I suppose that, if for some reason, the communication with the client is lost (crash of client, loss of network, etc.) and keepalive is not enabled, this may fully explain my problem ?

If yes, do you have an idea of why keepalive is not enabled ? I thought that by default on linux it was ?

Many thanks,
Brice


RE: Server application hangs on SSL_read, even when client disconnects

Michael Wojcik
> From: Brice André <[hidden email]>
> Sent: Friday, 13 November, 2020 09:13

> "Does the server parent process close its copy of the conversation socket?"
> I checked in my code, but it seems that no. Is it needed?

You'll want to do it, for a few reasons:

- You'll be leaking descriptors in the server, and eventually it will hit its limit.
- If the child process dies without cleanly closing its end of the conversation,
the parent will still have an open descriptor for the socket, so the network stack
won't terminate the TCP connection.
- A related problem: If the child just closes its socket without calling shutdown,
no FIN will be sent to the client system (because the parent still has its copy of
the socket open). The client system will have the connection in one of the termination
states (FIN_WAIT, maybe? I don't have my references handy) until it times out.
- A bug in the parent process might cause it to operate on the connected socket,
causing unexpected traffic on the connection.
- All such sockets will be inherited by future child processes, and one of them might
erroneously perform some operation on one of them. Obviously there could also be a
security issue with this, depending on what your application does.

Basically, when a descriptor is "handed off" to a child process by forking, you
generally want to close it in the parent, unless it's used for parent-child
communication. (There are some cases where the parent wants to keep it open for
some reason, but they're rare.)
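A minimal sketch of this pattern (the handler and its name are illustrative, not from the original server; in the real code the handler would run the TLS conversation):

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

/* Illustrative handler: in the real server this is where the TLS
 * conversation (SSL_accept, SSL_read, ...) would run on conn_fd. */
static void handle_client(int conn_fd)
{
    char buf[64];
    (void)read(conn_fd, buf, sizeof(buf));
    close(conn_fd);
}

/* Fork a child to run the conversation; the parent closes its copy
 * of the socket either way, so the child holds the only open
 * descriptor for the connection. Returns the child's pid, or -1. */
static pid_t spawn_handler(int conn_fd)
{
    pid_t pid = fork();
    if (pid == 0) {              /* child */
        handle_client(conn_fd);
        _exit(0);
    }
    close(conn_fd);              /* parent drops its copy */
    return pid;
}
```

With the parent's copy closed, the child's eventual close (or death) is enough for the stack to send the FIN.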

On a similar note, if you exec a different program in the child process (I wasn't
sure from your description), it's a good idea for the parent to set the FD_CLOEXEC
option (with fcntl) on its listening socket and any other descriptors that shouldn't
be passed along to child processes. You could close these manually in the child
process between the fork and exec, but FD_CLOEXEC is often easier to maintain.
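A small helper for setting the flag with fcntl, as a sketch:

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Mark a descriptor close-on-exec so it is not leaked across exec()
 * into child programs. Returns 0 on success, -1 on error. */
static int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1 ? -1 : 0;
}
```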

For some applications, you might just dup2 the socket over descriptor 0 or
descriptor 3, depending on whether the child needs access to stdio, and then close
everything higher.

Closing descriptors not needed by the child process is a good idea even if you
don't exec, since it can prevent various problems and vulnerabilities that result
from certain classes of bugs. It's a defensive measure.

The best source for this sort of recommendation, in my opinion, remains W. Richard
Stevens' /Advanced Programming in the UNIX Environment/. The book is old, and Linux
isn't UNIX, but I don't know of any better explanation of how and why to do things
in a UNIX-like OS.

And my favorite source of TCP/IP information is Stevens' /TCP/IP Illustrated/.

> May it explain my problem?

In this case, I don't offhand see how it does, but I may be overlooking something.

> I suppose that, if for some reason, the communication with the client is lost
> (crash of client, loss of network, etc.) and keepalive is not enabled, this may
> fully explain my problem ?

It would give you those symptoms, yes.

> If yes, do you have an idea of why keepalive is not enabled?

The Host Requirements RFC (RFC 1122) mandates that it be disabled by default. I think the
primary reasoning for that was to avoid re-establishing virtual circuits (e.g.
dial-up connections) for long-running connections that had long idle periods.

Linux may well have a kernel tunable or similar to enable TCP keepalive by
default, but it seems to be switched off on your system. You'd have to consult
the documentation for your distribution, I think.

By default (again per the Host Requirements RFC), it takes quite a long time for
TCP keepalive to detect a broken connection. It doesn't start probing until the
connection has been idle for 2 hours, and then you have to wait for the TCP
retransmit timer times the retransmit count to be exhausted - typically over 10
minutes. Again, some OSes let you change these defaults, and some let you change
them on an individual connection.
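On Linux, the per-connection knobs are the TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT socket options (Linux-specific; other systems differ). A sketch:

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Enable keepalive on one socket and tighten the Linux-specific
 * timers: first probe after `idle` seconds of silence, `intvl`
 * seconds between probes, give up after `cnt` unanswered probes.
 * Returns 0 on success, -1 on error. */
static int enable_keepalive(int fd, int idle, int intvl, int cnt)
{
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)) < 0)
        return -1;
    return setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt));
}
```

With, say, `enable_keepalive(fd, 60, 10, 5)`, a dead peer would be detected roughly 60 + 5×10 seconds after the connection goes idle, instead of the multi-hour default.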

--
Michael Wojcik

Re: Server application hangs on SSL_read, even when client disconnects

Brice André-2
Hello Michael,

Thanks for all those information.

I corrected the point you suggested (closing the parent process's copies of the conversation sockets). I also activated keepalive, with values adapted to my application.

I hope this will solve my issue, but as the problem may take several weeks to occur, I will not know immediately whether this was the cause :-)

Many thanks for your help.

Regards,
Brice



Re: Server application hangs on SSL_read, even when client disconnects

OpenSSL - User mailing list
In reply to this post by Brice André-2
(Top posting to match what Mr. André does):

TCP without keepalive will time out the connection a few minutes after
sending any data that doesn't get acknowledged.

TCP without keepalive, with no outstanding send (so only a blocking
recv) and nothing outstanding at the other end, will probably hang
almost forever, as there is nothing to indicate that data has actually
been lost in transit.
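One common application-level guard against such an indefinite hang is to wait for readability with a timeout before each blocking read. A sketch on plain sockets (not from the original code), with one TLS-specific caveat noted in the comment:

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Wait up to `seconds` for fd to become readable. Returns 1 if
 * readable, 0 on timeout, -1 on error. With TLS, check
 * SSL_pending() first: OpenSSL may already hold buffered plaintext
 * that select() cannot see. */
static int wait_readable(int fd, int seconds)
{
    fd_set rfds;
    struct timeval tv;
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = seconds;
    tv.tv_usec = 0;
    return select(fd + 1, &rfds, NULL, NULL, &tv);
}
```

On timeout the application can decide for itself that the peer is gone, rather than depending on the stack to notice.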


Enjoy

Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S.  https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark.  Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded

Re: Server application hangs on SSL_read, even when client disconnects

Kyle Hamilton
In reply to this post by Michael Wojcik
There's another reason why you'll want to close your socket with
SSL_close(): SSL (and TLS) view a prematurely-closed stream as an
exceptional condition to be reported to the application. This is to
prevent truncation attacks against the data communication layer.
While your application may not need that level of protection, it helps
to keep the state of your application in lockstep with the state of
the TLS protocol.  If your application doesn't expect to send any more
data, SSL_close() sends another record across the TCP connection to
tell the remote side that it should not keep the descriptor open.

-Kyle H

RE: Server application hangs on SSL_read, even when client disconnects

Michael Wojcik
> From: Kyle Hamilton <[hidden email]>
> Sent: Tuesday, 17 November, 2020 02:37
> On Fri, Nov 13, 2020 at 11:51 AM Michael Wojcik
> <[hidden email]> wrote:
> >
> > > From: Brice André <[hidden email]>
> > > Sent: Friday, 13 November, 2020 09:13
> >
> > > "Does the server parent process close its copy of the conversation socket?"
> > > I checked in my code, but it seems that no. Is it needed?
> >
> > You'll want to do it, for a few reasons: ...
>
> There's another reason why you'll want to close your socket with
> SSL_close(): SSL (and TLS) view a prematurely-closed stream as an
> exceptional condition to be reported to the application. This is to
> prevent truncation attacks against the data communication layer.
> While your application may not need that level of protection, it helps
> to keep the state of your application in lockstep with the state of
> the TLS protocol.  If your application doesn't expect to send any more
> data, SSL_close() sends another record across the TCP connection to
> tell the remote side that it should not keep the descriptor open.

This is true, but not what we're talking about here. When the
application is done with the conversation, it should use SSL_close
to terminate the conversation.

Here, though, we're talking about the server parent process closing
its descriptor for the socket after forking the child process. At that
point the application is not done with the conversation, and calling
SSL_close in the server would be a mistake.

Now, if the server is unable to start a child process (e.g. fork fails
because the user's process limit has been reached), or if for whatever
other reason it decides to terminate the conversation without further
processing, SSL_close would be appropriate.

--
Michael Wojcik
Re: Server application hangs on SSL_read, even when client disconnects

Matt Caswell-2



Just for clarity, there is no such function as SSL_close. I assume
SSL_shutdown is what people mean.

Matt