Find size of available data prior to ssl_read

Find size of available data prior to ssl_read

counterpoint
Is there a way to obtain the amount of data available to be read?

I'm working with a system that operates in non-blocking mode using
epoll. When an EPOLLIN event is received the aim is to read the data.
For the non-SSL case, the amount of data can be obtained using ioctl
FIONREAD.  This is used to malloc a suitably sized buffer, and the data
is then read into the buffer.
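
For reference, the non-SSL approach described above might look roughly like the following. This is only a sketch: read_available() is a hypothetical helper name, and it assumes a POSIX system where FIONREAD is supported on the descriptor.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel how many bytes are queued on fd,
 * allocate a buffer of that size, and read into it.
 * Returns the buffer (caller frees) and stores the byte count in *len.
 * Returns NULL if nothing is queued or on any error. */
unsigned char *read_available(int fd, size_t *len)
{
    int queued = 0;

    assert(len != NULL);
    *len = 0;

    if (ioctl(fd, FIONREAD, &queued) < 0 || queued <= 0)
        return NULL;

    unsigned char *buf = malloc((size_t)queued);
    if (buf == NULL)
        return NULL;

    /* read() may still return fewer bytes than FIONREAD reported */
    ssize_t n = read(fd, buf, (size_t)queued);
    if (n <= 0) {
        free(buf);
        return NULL;
    }

    *len = (size_t)n;
    return buf;
}
```

With a plain socket the count from FIONREAD is the application data itself; with SSL it counts raw TLS record bytes, which is why it looks too large.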

How should the SSL version of our code work?  At present it is using the
sum of the number obtained from ioctl FIONREAD (which seems suspect when
SSL is in use and appears to be always too large) and the number from
ssl_pending (which seems to be zero).  The buffer then has to be truncated.

Can this approach work?  Could it be improved?  Or is there some
fundamental problem with operating in this way?


_______________________________________________
openssl-users mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-users

Re: Find size of available data prior to ssl_read

Michael Wojcik
> From: openssl-users [mailto:[hidden email]] On Behalf
> Of Martin Brampton
> Sent: Wednesday, December 16, 2015 13:23
>
> Is there a way to obtain the amount of data available to be read?
>
> I'm working with a system that operates in non-blocking mode using
> epoll. When an EPOLLIN event is received the aim is to read the data.
> For the non-SSL case, the amount of data can be obtained using ioctl
> FIONREAD.  This is used to malloc a suitable sized buffer, followed by
> read the data into the buffer.
>
> How should the SSL version of our code work?  At present it is using the
> sum of the number obtained from ioctl FIONREAD (which seems suspect when
> SSL is in use and appears to be always too large) and the number from
> ssl_pending (which seems to be zero).  The buffer then has to be truncated.

TCP is a stream service. It may deliver (to the application, which in this case means to OpenSSL) part of an SSL/TLS record, a single complete record, multiple records...

In some situations, you may reliably receive one TLS record at a time. You can't assume that will be the general case, particularly for application protocols that aren't simple alternating request-response pairs, or over long network paths, or with large blocks of application data, or if the recipient's stack is squeezed for resources.

FIONREAD will show the amount of data available from the stack. SSL_pending will show the amount of application data from complete records OpenSSL has already received and processed that the application has not read from OpenSSL yet. Per above, the former can represent less than one record to multiple records and possibly a partial one at the end. The latter may well not be zero, for example if the peer does multiple sends, or sends a block of data large enough that it gets chunked into multiple TLS records; then OpenSSL may read data from the stack and get multiple complete records, in which case SSL_pending will be > 0.

Note that nothing in the OpenSSL API gives you the number of bytes of a partial record that OpenSSL has received from the stack.

Even in the ideal case where exactly a single TLS record is sitting in the stack's buffers, FIONREAD will be larger than the size of the application data, because it's a TLS record, which has non-zero overhead. Specifically it has a header containing type, version, and length, and a footer with MAC and padding. The application only gets the application data, so it must get fewer than FIONREAD bytes.
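
To make the overhead concrete, here is a minimal sketch of decoding the 5-byte record header (type, version, length) that precedes every TLS record on the wire. The struct and function names are hypothetical and are not part of the OpenSSL API. The length field covers everything after the header, including any MAC and padding, so it is an upper bound on the application data and not its size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TLS_RECORD_HEADER_LEN 5

/* Hypothetical view of the fixed TLS record header */
struct tls_record_header {
    uint8_t  content_type;   /* 23 means application_data */
    uint8_t  version_major;  /* 3 for SSL 3.0 and all TLS versions */
    uint8_t  version_minor;
    uint16_t length;         /* bytes that follow the header */
};

/* Decode the header from raw bytes as they would arrive from the socket.
 * Returns 1 and fills *out if at least 5 bytes are available, else 0. */
int parse_tls_record_header(const unsigned char *p, size_t avail,
                            struct tls_record_header *out)
{
    assert(p != NULL && out != NULL);

    if (avail < TLS_RECORD_HEADER_LEN)
        return 0;

    out->content_type  = p[0];
    out->version_major = p[1];
    out->version_minor = p[2];
    out->length = (uint16_t)((p[3] << 8) | p[4]);  /* big-endian */
    return 1;
}
```

An application using OpenSSL never sees these bytes; the sketch only illustrates why the socket-level byte count always exceeds what SSL_read hands back.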

Unless I'm forgetting something, since OpenSSL will only deliver application data to the caller, and only from a complete record:

- If, at the moment the application obtains the values from FIONREAD + SSL_pending (call this sum N), at least one complete TLS record has been received by the stack and not yet read by the application, then the amount of data the application gets from SSL_read will be strictly less than N.
- Otherwise, the application has obtained those values too early: N is less than the size of the record OpenSSL will eventually assemble, and the amount of application data *may* be greater than, equal to (unlikely), or less than N. In this case there's simply no way for the application to know.

> Can this approach work?

No. OpenSSL doesn't know how much data is in a TLS record until it's processed it, and it doesn't know that until it has the complete record. (It could assume the record is valid before it has the complete record and look at the length field, but it doesn't know how long the padding is until it has the very last byte. And assuming the record is valid is a Bad Idea.)

Consequently, your application can't know that either.

Looking at the amount of data buffered by the stack is pointless, for the reasons discussed above.

>  Could it be improved?  Or is there some
> fundamental problem with operating in this way?

The fundamental problem is that you don't know how much data is going to be available from whatever complete records OpenSSL has received, and you don't even know that OpenSSL has received a complete record. The sender could be dribbling data to you one byte at a time. (This would be perverse, but what if some MITM is mucking about with your window announcements? Note those are at the TCP protocol level and so are not protected by TLS.)

You might want to look at something like this:

- Use non-blocking sockets. When you get a POLLIN event, try SSL_read with a small fixed buffer. If it fails and SSL_get_error reports SSL_ERROR_WANT_READ, you don't have a complete record yet.
- Set the read-ahead flag with SSL_CTX_set_read_ahead (before creating your SSL objects), so that OpenSSL will grab all available data off the wire when you call SSL_read; that will reduce useless POLLIN events.
- When you have a successful SSL_read, use SSL_pending to get the number of application-data bytes remaining. Allocate a buffer of fixed-small-buffer-size + value-from-SSL_pending. Copy in the small fixed buffer, then SSL_read into the tail of the allocated buffer.
- If SSL_read fails with SSL_ERROR_WANT_READ, loop back to poll. The call to SSL_read (with read-ahead set in the SSL object via the context) should have grabbed the available data from the socket, so the socket will no longer be readable unless something else has arrived in the meantime.
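
The steps above could be sketched as follows. To keep the example compilable without OpenSSL, the connection is hidden behind two callbacks: in real code read_fn would wrap SSL_read and pending_fn would wrap SSL_pending. All names here (tls_reader, read_record_data, fake_conn and so on) are hypothetical, and the fake connection exists only to exercise the logic.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SMALL_BUF 512   /* size of the fixed first-read buffer (assumed) */

/* Hypothetical wrapper: read_fn stands in for SSL_read,
 * pending_fn stands in for SSL_pending. */
struct tls_reader {
    int (*read_fn)(void *conn, void *buf, int len);  /* >0 bytes, <=0 retry/error */
    int (*pending_fn)(void *conn);                   /* decrypted bytes still buffered */
    void *conn;
};

/* Read with a small fixed buffer, then size the real buffer from pending.
 * Returns a malloc'd buffer and its length, or NULL if no data is ready. */
unsigned char *read_record_data(struct tls_reader *r, size_t *len)
{
    unsigned char small[SMALL_BUF];

    assert(r != NULL && len != NULL);
    *len = 0;

    int n = r->read_fn(r->conn, small, (int)sizeof small);
    if (n <= 0)
        return NULL;            /* caller inspects the error and polls again */

    int more = r->pending_fn(r->conn);
    if (more < 0)
        more = 0;

    unsigned char *out = malloc((size_t)n + (size_t)more);
    if (out == NULL)
        return NULL;

    memcpy(out, small, (size_t)n);
    size_t total = (size_t)n;
    if (more > 0) {
        /* Read the rest of the already-decrypted data into the tail */
        int m = r->read_fn(r->conn, out + total, more);
        if (m > 0)
            total += (size_t)m;
    }
    *len = total;
    return out;
}

/* Fake connection holding one already-decrypted record, for demonstration */
struct fake_conn { const unsigned char *data; int size; int off; };

int fake_read(void *c, void *buf, int len)
{
    struct fake_conn *f = c;
    int left = f->size - f->off;
    if (left <= 0)
        return -1;              /* nothing ready, like a WANT_READ failure */
    if (len > left)
        len = left;
    memcpy(buf, f->data + f->off, (size_t)len);
    f->off += len;
    return len;
}

int fake_pending(void *c)
{
    struct fake_conn *f = c;
    return f->size - f->off;
}
```

Like the suggestion it illustrates, this has not been tried against a real SSL object.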

Disclaimer: I haven't tried this.

--
Michael Wojcik
Technology Specialist, Micro Focus



Re: Find size of available data prior to ssl_read

Kurt Roeckx
In reply to this post by counterpoint
On Wed, Dec 16, 2015 at 06:23:25PM +0000, Martin Brampton wrote:

> Is there a way to obtain the amount of data available to be read?
>
> I'm working with a system that operates in non-blocking mode using epoll.
> When an EPOLLIN event is received the aim is to read the data. For the
> non-SSL case, the amount of data can be obtained using ioctl FIONREAD.  This
> is used to malloc a suitable sized buffer, followed by read the data into
> the buffer.
>
> How should the SSL version of our code work?  At present it is using the sum
> of the number obtained from ioctl FIONREAD (which seems suspect when SSL is
> in use and appears to be always too large) and the number from ssl_pending
> (which seems to be zero).  The buffer then has to be truncated.

Please note that SSL_pending() returns the data about already
processed / decrypted TLS records.  If the record is not complete
it's not processed and we won't tell you how big it is.  This means
that it's possible for SSL_pending() to return 0 and that
receiving a single byte from the kernel might make the whole record
available.

If you then go and only read 1 byte, calling SSL_pending() will
actually tell you how many other bytes it still has available for
you that already passed all the checks.

So the library can have unprocessed bytes from a TLS record in
its internal buffer, but it's not going to tell you much about
it.

SSL / TLS also has overhead, the data might also not even be
application data.  Also, some ciphers work in blocks so there
might be added padding for those blocks.  So there are various
reasons why you might receive less data too.

If you always call SSL_read() on the boundaries of the records
you'll always get less data, but there is really no way for you to
see that.  It might be that in your application this is always
what happens, but I wouldn't rely on it.

If you don't call it on the boundaries there is little
you can predict about the size you're going to get.


Kurt


Re: Find size of available data prior to ssl_read

counterpoint
Thanks to Michael and Kurt for explanatory comments.

Is there an available setting that gives the upper limit on the amount of data that will be obtained by a single ssl_read()?

The data stream is SQL requests, and often these are quite small, but they can run to megabytes. I need to malloc a buffer for the data. If it is too small, that will impose extra processing overheads in the rest of the system. If it is too large, it will impose memory wastage on the rest of the system.  The system has an upper limit of 32 KB on the initial size of a buffer for reading, but that is way above the typical SQL request.

So, accepting that I can't set the size precisely, if there is a limit for SSL data reads that is significantly lower than 32 KB then that might be a feasible fixed buffer size.  If that isn't possible, maybe it will have to be a tunable configuration value.  Any comments?

Re: Find size of available data prior to ssl_read

counterpoint
Although maybe the simple answer is to read into a temporary 32 KB buffer and then malloc and copy.

Re: Find size of available data prior to ssl_read

Michael Wojcik
> From: openssl-users [mailto:[hidden email]] On Behalf
> Of counterpoint
> Sent: Thursday, December 17, 2015 04:51
>
> Although maybe the simple answer is to read into a temporary 32 KB buffer and
> then malloc and copy.

That, more or less, was my recommendation in my previous post.

The optimal size of the temporary buffer depends on factors we don't know. If most of your messages fit in 32KB, then that may save you extra calls to SSL_read. On the other hand, it could mean excessive copying - it might be better to use a smaller buffer to reduce the size of the additional copy operation, even at the cost of an extra call to SSL_read. (Obviously some copying is happening in the SSL/TLS processing anyway, and the cost of such copying is small relative to the cost of decryption and other compute-intensive operations. But if your application deals with a high transaction rate then cutting down that extra copy may be worthwhile anyway.)

If your application is single-threaded, you can make that a static buffer; if not, it needs to go on the stack, which could be a problem if your threads are stack-constrained. That's another argument (if it applies to your case) for using a smaller initial buffer.

If the first chunk of your message tells you how large the entire message will be, then this approach means only one call to the allocator per message received, which is good. And it means the same code path for every message regardless of size, which is good for program correctness and maintainability.

Based on what you've told us, this is the approach I'd recommend. The only question is the size of that initial buffer, and you're in a better position to determine that.
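
A minimal sketch of that approach, assuming a 32 KB limit as mentioned earlier in the thread. read_exact_copy() is a hypothetical name, and plain read() on a descriptor stands in for SSL_read so the example compiles without OpenSSL.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define READ_BUF_MAX (32 * 1024)   /* the 32 KB limit mentioned in the thread */

/* Read once into a temporary stack buffer, then allocate exactly as many
 * bytes as arrived and copy them over.
 * Returns the buffer (caller frees) and stores its size in *len.
 * Returns NULL if nothing was read. In real code the read() call would
 * be SSL_read() on the connection's SSL object. */
unsigned char *read_exact_copy(int fd, size_t *len)
{
    unsigned char tmp[READ_BUF_MAX];   /* lives on this thread's stack */

    assert(len != NULL);
    *len = 0;

    ssize_t n = read(fd, tmp, sizeof tmp);
    if (n <= 0)
        return NULL;

    unsigned char *out = malloc((size_t)n);
    if (out == NULL)
        return NULL;

    memcpy(out, tmp, (size_t)n);       /* the one extra copy discussed above */
    *len = (size_t)n;
    return out;
}
```

Shrinking READ_BUF_MAX trades a smaller stack frame and a smaller copy for more read calls on large messages.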

--
Michael Wojcik
Technology Specialist, Micro Focus



Re: Find size of available data prior to ssl_read

Jakob Bohm-7
In reply to this post by counterpoint
On 17/12/2015 10:36, counterpoint wrote:
> Thanks to Michael and Kurt for explanatory comments.
>
> Is there an available setting that gives the upper limit on the amount of
> data that will be obtained by a single ssl_read()?
>
> The data stream is SQL requests, and often these are quite small, but they
> can run to megabytes. I need to malloc a buffer for the data. If it is too
> small, that will impose extra processing overheads in the rest of the
> system. If it is too large, it will impose memory wastage on the rest of the
> system.  The system has an upper limit of 32 KB on the initial size of a
> buffer for reading, but that is way above the typical SQL request.
>
> So, accepting that I can't set the size precisely, if there is a limit for
> SSL data reads that is significantly lower than 32 KB then that might be a
> feasible fixed buffer size.  If that isn't possible, maybe it will have to
> be a tunable configuration value.  Any comments?

The current SSL/TLS standards limit the per-record data
size to exactly 16 KiB (2^14 bytes) of plaintext; see for
example RFC 5246 section 6.2.1.
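
As a rough illustration of what that limit implies for buffer sizing, the sketch below assumes that a single SSL_read returns data from at most one record, and that the sender fills records completely (which it is not obliged to do). Both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* 2^14 bytes, the plaintext limit per record (RFC 5246 section 6.2.1) */
#define TLS_MAX_PLAINTEXT ((size_t)1 << 14)

/* Largest byte count one read could return for a buffer of buf_len,
 * assuming a read never spans more than one record. */
size_t max_single_read(size_t buf_len)
{
    assert(buf_len > 0);
    return buf_len < TLS_MAX_PLAINTEXT ? buf_len : TLS_MAX_PLAINTEXT;
}

/* Minimum number of records needed to carry msg_len bytes of data.
 * A sender may use more, smaller records. */
size_t tls_min_records(size_t msg_len)
{
    return msg_len / TLS_MAX_PLAINTEXT + (msg_len % TLS_MAX_PLAINTEXT != 0);
}
```

Under those assumptions a 32 KB read buffer would never be more than half filled by one read, and a 1 MB SQL statement would arrive as at least 64 records.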


However the data you want in your (higher level) code
probably has a completely different natural size
limit/unit which may be larger or smaller.  For SQL there
is no natural limit however, unless your SQL parser
happens to fail on statements above some arbitrary size.



Enjoy and Merry Christmas

Jakob
-- 
Jakob Bohm, CIO, Partner, WiseMo A/S.  https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark.  Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded 


Re: Find size of available data prior to ssl_read

counterpoint
In reply to this post by Michael Wojcik
Thanks, that makes sense. My ability to optimise is constrained - the system is a product so I do not know what the actual pattern of usage will be. But there is a limit on buffer size within the system. It's a defined symbol, so can be altered from the default of 32 KB, but only by recompiling the system. I rely on a working assumption that people who change definitions and recompile know what they're doing.

The system is threaded, but it is designed to operate with a relatively small number of highly active threads, so grabbing 32 KB on the stack for a short period shouldn't be too much of an issue. It would be much harder to figure out the actual message size because the calls to SSL are taking place in a generic core, whereas the protocol is in a different layer of code. There are ways it could be done, but I'm inclined to leave that for a future optimisation.

That leaves me feeling that the fixed buffer on the stack is the cleanest solution, involving simple code. The copying overhead is there, but looks hard to eliminate, and as you say there is plenty of other overhead. I'm not sure that the small initial buffer offers me much gain, although it might help in some situations. (Personally I'm inclined to use SSH tunnels rather than SSL for SQL traffic, but that's another story!).

One remaining point leaves me uncertain. Supposing an SSL write gets the response SSL_ERROR_WANT_READ. Then there is a POLLIN event. I take it the first thing that must happen is a retry of the write. Assuming that works, do I need to assume that there could be data to be read?  Or will a further event occur, so that I should return to looking out for events?  I guess the answer to the last question is probably no, but am unsure.


Re: Find size of available data prior to ssl_read

Michael Wojcik
> From: openssl-users [mailto:[hidden email]] On Behalf
> Of counterpoint
> Sent: Thursday, December 17, 2015 11:35
>
> Thanks, that makes sense. My ability to optimise is constrained - the system
> is a product so I do not know what the actual pattern of usage will be. But
> there is a limit on buffer size within the system. It's a defined symbol, so
> can be altered from the default of 32 KB, but only by recompiling the
> system. I rely on a working assumption that people who change definitions
> and recompile know what they're doing.

Fair enough.

> The system is threaded, but it is designed to operate with a relatively
> small number of highly active threads, so grabbing 32 KB on the stack for a
> short period shouldn't be too much of an issue.

It's not really a matter of how many threads there are (except indirectly), or of how long the item is on the stack. It's a question of how much space is available on the thread's stack when you try to allocate the buffer (which, assuming we're talking C or C++, is when you enter the function / method).

A thread's stack size is typically set at creation time, with a default that may be fixed in the threading implementation or set at link time. How much space is available when you allocate that 32 KB buffer depends on how deep your call chain is and how much data each of those frames adds to the stack.

If the stack is too small to accommodate the buffer and can't be expanded, you'll get some kind of run-time failure, like a Windows exception or a UNIX signal.

Note that stack space is an address-space resource, not (generally) a virtual memory one - that is, stack-space is unlikely to be constrained because the system is running short on virtual memory. It'll happen because most language implementations use contiguous stacks for performance (rather than, say, displays or other non-contiguous structures), and if the stack runs into something else in the process address space, it can't grow any further. So if your process is 64-bit, you should be able to specify ridiculously large thread stacks and not worry about it.

If the process is 32-bit, take a look at your thread stack sizes and do a quick estimate on how much space you expect will be there. You can determine this for a specific thread, in a specific run, in a debugger by looking at the address of an automatic variable at the bottom of the thread's stack (in the thread's initial function) and the address of one in your data-receiving function. (Technically comparing those addresses isn't authorized by the language standard, but it's valid on most of the platforms OpenSSL supports.)
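
The same estimate can be made from code instead of a debugger. This is a sketch only: the helper names are hypothetical, a real multi-threaded program would need one base value per thread (for example in thread-local storage), and converting the addresses to integers is implementation-defined, as noted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Address of a local in the thread's initial function.
 * Hypothetical: a single global is only correct for one thread. */
static uintptr_t stack_base;

/* Call from the thread's initial function with the address of a local */
void mark_stack_base(const void *addr_of_local)
{
    stack_base = (uintptr_t)addr_of_local;
}

/* Call from the data-receiving function with the address of a local.
 * Returns the approximate number of stack bytes between the two frames;
 * the absolute difference is used so stack growth direction is irrelevant. */
size_t stack_used_from(const void *addr_of_local)
{
    uintptr_t here = (uintptr_t)addr_of_local;

    assert(stack_base != 0);
    return (size_t)(here > stack_base ? here - stack_base
                                      : stack_base - here);
}
```

Logging the value from the receive function during a few test runs, and comparing it with the configured thread stack size, gives a feel for how much headroom a 32 KB buffer leaves.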

So I'd say try it in some test runs and see if it looks like stack space might be getting tight; if so, you can likely increase the stack size you specify when creating your threads, since you don't have many of them.

> One remaining point leaves me uncertain. Supposing an SSL write gets the
> response SSL_ERROR_WANT_READ. Then there is a POLLIN event. I take it the
> first thing that must happen is a retry of the write. Assuming that works,
> do I need to assume that there could be data to be read?  Or will a further
> event occur, so that I should return to looking out for events?  I guess the
> answer to the last question is probably no, but am unsure.

There could be data to be read. Consider this scenario:

1. The peer decides it wants to renegotiate during the conversation.
2. In the middle of the handshake, you call SSL_write. The handshake hasn't completed, and the local side is waiting for a message from the peer, so SSL_write returns SSL_ERROR_WANT_READ.
3. You wait for POLLIN, then call SSL_write again.
4. Before SSL_write returns, the peer has time to respond to the request you just sent. Or it sends something else immediately after completing the handshake, if your application doesn't use a strict switched-duplex request-response protocol.

So I'd recommend going ahead and trying a non-blocking SSL_read at that point. The overhead is tiny and you won't miss any inbound-data events.

--
Michael Wojcik
Technology Specialist, Micro Focus



Re: Find size of available data prior to ssl_read

counterpoint
Thanks, very helpful. We only support 64 bit.