writev over OpenSSL

writev over OpenSSL

Eran Borovik
Hi all,
I am in the process of integrating OpenSSL with my application. My application uses scatter-gather unencrypted buffers. Without OpenSSL, I would use writev with no issues. Is there a way to do the equivalent over OpenSSL? I understand that I can split the vector into multiple SSL_write/SSL_read operations but that defeats the purpose and has a large overhead. Creating a temporary buffer and then consolidating the vector is a problem because of the performance cost associated with memory copy.
Is there a clean way to achieve this without the performance overhead? Perhaps by dealing with BIOs directly?

Many thanks,
Eran

Re: writev over OpenSSL

Marian Beermann
> Creating a temporary buffer and then consolidating the
> vector is a problem because of the performance cost associated with
> memory copy.

Did you actually benchmark this or do you just think this is the case?
Consider that SSL_write/read will normally do something like AES or
Chapoly on your CPU at a throughput of 2-4 GB/s, which is about an order
of magnitude slower than streaming memory throughput.

-Marian

On 02.02.20 at 15:27, Eran Borovik wrote:

> Hi all,
> I am in the process of integrating OpenSSL with my application. My
> application uses scatter-gather unencrypted buffers. Without OpenSSL, I
> would use writev with no issues. Is there a way to do the equivalent
> over OpenSSL? I understand that I can split the vector into multiple
> SSL_write/SSL_read operations but that defeats the purpose and has a
> large overhead. Creating a temporary buffer and then consolidating the
> vector is a problem because of the performance cost associated with
> memory copy.
> Is there a clean way to achieve this without the performance overhead.
> Perhaps dealing with BIOs directly?
>
> Many thanks,
> Eran


Re: writev over OpenSSL

Eran Borovik
Hi Marian,
Thank you for the prompt response. I do understand that the cost of encryption overshadows the cost of the memory copy, but lost cycles are still lost cycles, and they could have been used by other logic (at the end of the day, the application does much more than send and receive). After all, writev was invented for a reason.
Anyhow, even the temporary buffer solution is problematic. My application might use thousands of sockets. Since I don't want to allocate a buffer per socket, I will have no recourse but to reuse a small number of buffers (perhaps one per thread) and to re-create the buffer content after every socket blocking condition. You might argue that this is negligible as well, since blocking conditions are relatively rare, but these are still extra wasted cycles.
Anyhow, if there isn't any other viable solution, temporary buffer it is.

Regards,
Eran

On Sun, Feb 2, 2020 at 5:47 PM Marian Beermann <[hidden email]> wrote:
> > Creating a temporary buffer and then consolidating the
> > vector is a problem because of the performance cost associated with
> > memory copy.
>
> Did you actually benchmark this or do you just think this is the case?
> Consider that SSL_write/read will normally do something like AES or
> Chapoly on your CPU at a throughput of 2-4 GB/s, which is about an order
> of magnitude slower than streaming memory throughput.
>
> -Marian
>
> On 02.02.20 at 15:27, Eran Borovik wrote:
> > Hi all,
> > I am in the process of integrating OpenSSL with my application. My
> > application uses scatter-gather unencrypted buffers. Without OpenSSL, I
> > would use writev with no issues. Is there a way to do the equivalent
> > over OpenSSL? I understand that I can split the vector into multiple
> > SSL_write/SSL_read operations but that defeats the purpose and has a
> > large overhead. Creating a temporary buffer and then consolidating the
> > vector is a problem because of the performance cost associated with
> > memory copy.
> > Is there a clean way to achieve this without the performance overhead.
> > Perhaps dealing with BIOs directly?
> >
> > Many thanks,
> > Eran


RE: writev over OpenSSL

Michael Wojcik
This has of course come up before - there was an energetic discussion on this list back in May 2001, and then again in August of that year. Eric Rescorla was one of the participants (as was I).

And the answer has always been that given the minuscule performance gain,[1] and portability issues for platforms that don't have scatter/gather I/O, no one has been motivated to implement it. The OpenSSL core team have better things to do; and clearly no one else has found it sufficiently rewarding to implement it, submit a pull request, and advocate for its inclusion in the official distribution.

OpenSSL is open source, after all. There's nothing to stop anyone from adding SSL_writev to their own fork, testing the result, and submitting a PR.

Regarding the "many temporary buffers" problem - traditionally this has been solved with a buffer pool, such as the BSD mbuf architecture. A disadvantage of a single buffer pool is serialization for obtaining and releasing buffers; that can be relieved somewhat by using multiple buffer pools, with threads selecting a pool based on e.g. a hash of the thread ID. That gives you multiple, smaller lock domains.
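
A minimal sketch of the multiple-pool idea described above, not a production allocator. All names and sizes here (NUM_POOLS, BUFS_PER_POOL, BUF_SIZE, buf_acquire, buf_release) are hypothetical, the thread key is assumed to be any stable per-thread integer supplied by the caller, and a C11 atomic_flag spinlock stands in for whatever lock the application already uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_POOLS     4      /* hypothetical: number of independent lock domains */
#define BUFS_PER_POOL 8      /* hypothetical: buffers owned by each pool */
#define BUF_SIZE      16384  /* hypothetical: capacity of one temporary buffer */

struct pool {
    atomic_flag   lock;                      /* guards only this pool */
    int           nfree;                     /* entries in free_list */
    int           free_list[BUFS_PER_POOL];  /* indices of free buffers */
    unsigned char bufs[BUFS_PER_POOL][BUF_SIZE];
};

static struct pool pools[NUM_POOLS];

/* Mark every buffer in every pool as free. Call once before use. */
void pools_init(void)
{
    for (int p = 0; p < NUM_POOLS; p++) {
        atomic_flag_clear(&pools[p].lock);
        pools[p].nfree = BUFS_PER_POOL;
        for (int i = 0; i < BUFS_PER_POOL; i++)
            pools[p].free_list[i] = i;
    }
}

/* Select a pool from a hash of the caller-supplied thread key. */
static struct pool *pool_for(uintptr_t thread_key)
{
    uint64_t h = (uint64_t)thread_key * 0x9E3779B97F4A7C15ull;
    return &pools[(h >> 32) % NUM_POOLS];
}

static void pool_lock(struct pool *p)
{
    while (atomic_flag_test_and_set_explicit(&p->lock, memory_order_acquire))
        ; /* spin: contention is limited to threads hashing to this pool */
}

static void pool_unlock(struct pool *p)
{
    atomic_flag_clear_explicit(&p->lock, memory_order_release);
}

/* Returns a buffer of BUF_SIZE bytes, or NULL if this pool is exhausted. */
unsigned char *buf_acquire(uintptr_t thread_key)
{
    struct pool *p = pool_for(thread_key);
    unsigned char *b = NULL;

    pool_lock(p);
    if (p->nfree > 0)
        b = p->bufs[p->free_list[--p->nfree]];
    pool_unlock(p);
    return b;
}

/* Must be called with the same thread_key that was used to acquire b. */
void buf_release(uintptr_t thread_key, unsigned char *b)
{
    struct pool *p = pool_for(thread_key);
    ptrdiff_t idx = (b - &p->bufs[0][0]) / BUF_SIZE;

    assert(idx >= 0 && idx < BUFS_PER_POOL);
    pool_lock(p);
    assert(p->nfree < BUFS_PER_POOL);
    p->free_list[p->nfree++] = (int)idx;
    pool_unlock(p);
}
```

Threads that hash to different pools never touch the same lock, which is the "multiple, smaller lock domains" effect. The trade-off in this sketch is that one pool can run dry while others still have free buffers.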


[1] Yes, "wasted" cycles are wasted cycles. But by Amdahl's Law, optimizing a part of the system where performance is dominated by other considerations that are two or more orders of magnitude larger can never gain you even a single percentage point of improvement. Is it really that useful to improve your application's capacity from, say, 100,000 clients to 100,100? What's the value of that relative to the cost of implementing and testing a new API?
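
The arithmetic behind the footnote can be sketched as follows. The function names are hypothetical, and the 0.1% share used in the note after the code is an assumed figure chosen to match the 100,000 to 100,100 example, not a measurement.

```c
#include <assert.h>

/* Amdahl's Law: overall speedup when a fraction p of the total work
 * is made s times faster and the rest is unchanged. */
double amdahl_speedup(double p, double s)
{
    assert(p >= 0.0 && p <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - p) + p / s);
}

/* Upper bound on the speedup: the fraction p is eliminated entirely. */
double amdahl_limit(double p)
{
    assert(p >= 0.0 && p < 1.0);
    return 1.0 / (1.0 - p);
}
```

If the extra copy accounts for an assumed 0.1% of the per-connection cost, amdahl_limit(0.001) is roughly 1.001, which corresponds to going from 100,000 clients to about 100,100 even if the copy were removed completely.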

--
Michael Wojcik
Distinguished Engineer, Micro Focus



Re: writev over OpenSSL

Richard Levitte - VMS Whacker-2
In reply to this post by Eran Borovik
So if I understand correctly, the desirable advantages with writev(2)
are atomicity across the set of buffers passed as well as minimum
system call overhead.

I can't see that we have support for this kind of construct.  We
*could* simulate something like that with smartly written BIOs, but
it would be just that, a simulation, and I'm quite sceptical that it
would gain you much more than the mere comfort of having an interface
that you're used to dealing with.

Cheers,
Richard

On Sun, 02 Feb 2020 15:27:52 +0100,
Eran Borovik wrote:

> I am in the process of integrating OpenSSL with my application. My application uses scatter-gather unencrypted
> buffers. Without OpenSSL, I would use writev with no issues. Is there a way to do the equivalent over OpenSSL?
> I understand that I can split the vector into multiple SSL_write/SSL_read operations but that defeats the
> purpose and has a large overhead. Creating a temporary buffer and then consolidating the vector is a problem
> because of the performance cost associated with memory copy.
> Is there a clean way to achieve this without the performance overhead. Perhaps dealing with BIOs directly?
>
> Many thanks,
> Eran
>
>
--
Richard Levitte         [hidden email]
OpenSSL Project         http://www.openssl.org/~levitte/

Re: writev over OpenSSL

OpenSSL - User mailing list
In reply to this post by Eran Borovik

SSL/TLS will take your data and wrap it inside its own record structure.  It has to, that’s the nature of the protocol.  Thinking that a single writev() is “encrypt buffers and then do analogous syscall” is wrong.


Re: writev over OpenSSL

Viktor Dukhovni
On Sun, Feb 02, 2020 at 05:28:19PM +0000, Salz, Rich via openssl-users wrote:

> SSL/TLS will take your data and wrap it inside its own record
> structure.  It has to, that’s the nature of the protocol.  Thinking
> that a single writev() is “encrypt buffers and then do analogous
> syscall” is wrong.

Right, the encryption is not in place, the user's data is copied for
encryption, by which point there's no incentive for a writev between
OpenSSL and the socket.

What could be useful to the OP is some equivalent to "cork" and
"uncork", that tell OpenSSL to not send anything until it has
accumulated a maximal size TLS record or the user "uncorks"
first.

This could allow the OP to do multiple SSL_write calls from his
iovec, that would be buffered internally in OpenSSL, removing the
need for the user to copy the data before OpenSSL copies it again.
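
To make the proposed semantics concrete, here is a minimal application-level sketch of "cork" and "uncork". OpenSSL does not provide these calls; every name below (struct cork, cork_write, cork_flush, record_sink, count_sink) is hypothetical, and the sink callback merely stands in for whatever would encrypt and send one record. Because it lives outside the library, this sketch still performs the copy that an in-library version would avoid; it only illustrates the buffering behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_RECORD 16384  /* TLS maximum plaintext fragment length (2^14) */

/* Hypothetical sink: consumes one record's worth of plaintext, 0 on success. */
typedef int (*record_sink)(void *ctx, const unsigned char *data, size_t len);

struct cork {
    record_sink   sink;
    void         *ctx;
    size_t        len;              /* bytes currently held back */
    unsigned char buf[MAX_RECORD];
};

void cork_init(struct cork *c, record_sink sink, void *ctx)
{
    assert(c != NULL && sink != NULL);
    c->sink = sink;
    c->ctx = ctx;
    c->len = 0;
}

/* "Uncork": emit whatever is buffered. Returns 0 on success, -1 on failure. */
int cork_flush(struct cork *c)
{
    if (c->len == 0)
        return 0;
    if (c->sink(c->ctx, c->buf, c->len) != 0)
        return -1;
    c->len = 0;
    return 0;
}

/* Buffer data; emit only when a maximal-size record has accumulated. */
int cork_write(struct cork *c, const void *data, size_t len)
{
    const unsigned char *p = data;

    while (len > 0) {
        size_t room = MAX_RECORD - c->len;
        size_t n = len < room ? len : room;

        memcpy(c->buf + c->len, p, n);
        c->len += n;
        p += n;
        len -= n;
        if (c->len == MAX_RECORD && cork_flush(c) != 0)
            return -1;
    }
    return 0;
}

/* Demonstration sink that only counts records and bytes. */
struct sink_stats { size_t records; size_t bytes; };

int count_sink(void *ctx, const unsigned char *data, size_t len)
{
    struct sink_stats *st = ctx;

    (void)data;
    st->records++;
    st->bytes += len;
    return 0;
}
```

Ten 100-byte writes followed by one cork_flush reach the sink as a single 1000-byte record rather than ten short ones.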

If the OP is actually looking for in-place encryption, that's not
generally possible with every block cipher mode.  OCB can do in-place
encryption, but OpenSSL presents a general-purpose API.  And one
should, it seems, avoid OCB2:

    https://en.wikipedia.org/wiki/OCB_mode#Attacks

--
    Viktor.

Re: writev over OpenSSL

Eran Borovik
In reply to this post by Michael Wojcik
I truly appreciate all the answers. Makes sense!!
Most of my background is from systems where reducing (or even eliminating) memory copy by the CPU was the holy grail (using RDMA and other such techniques). I do realize that compared to all the other overheads in the network and OpenSSL path, we can argue that the extra memory copy isn't as prohibitive as it first seemed to me. I still think that in case OpenSSL turns out to use some hardware to offload the processing, then the extra memory copy might be noticeable, but I totally agree that in the general case this will not be an issue.
So, temporary buffer it is. Thank you very much for all the help.

On Sun, Feb 2, 2020 at 6:59 PM Michael Wojcik <[hidden email]> wrote:
> This has of course come up before - there was an energetic discussion on this list back in May 2001, and then again in August of that year. Eric Rescorla was one of the participants (as was I).
>
> And the answer has always been that given the minuscule performance gain,[1] and portability issues for platforms that don't have scatter/gather I/O, no one has been motivated to implement it. The OpenSSL core team have better things to do; and clearly no one else has found it sufficiently rewarding to implement it, submit a pull request, and advocate for its inclusion in the official distribution.
>
> OpenSSL is source, after all. There's nothing to stop anyone from adding SSL_writev to their own fork, testing the result, and submitting a PR.
>
> Regarding the "many temporary buffers" problem - traditionally this has been solved with a buffer pool, such as the BSD mbuf architecture. A disadvantage of a single buffer pool is serialization for obtaining and releasing buffers; that can be relieved somewhat by using multiple buffer pools, with threads selecting a pool based on e.g. a hash of the thread ID. That gives you multiple, smaller lock domains.
>
> [1] Yes, "wasted" cycles are wasted cycles. But by Amdahl's Law, optimizing a part of the system where performance is dominated by other considerations that are two or more orders of magnitude larger can never gain you even a single percentage point of improvement. Is it really that useful to improve your application's capacity from, say, 100,000 clients to 100,100? What's the value of that relative to the cost of implementing and testing a new API?
>
> --
> Michael Wojcik
> Distinguished Engineer, Micro Focus


RE: writev over OpenSSL

Michael Wojcik
In reply to this post by Viktor Dukhovni
> From: openssl-users [mailto:[hidden email]] On Behalf Of
> Viktor Dukhovni
> Sent: Sunday, February 02, 2020 11:10
>
> On Sun, Feb 02, 2020 at 05:28:19PM +0000, Salz, Rich via openssl-users wrote:
>
> > SSL/TLS will take your data and wrap it inside its own record
> > structure.  It has to, that’s the nature of the protocol.  Thinking
> > that a single writev() is “encrypt buffers and then do analogous
> > syscall” is wrong.
>
> Right, the encryption is not in place, the user's data is copied for
> encryption, by which point there's no incentive for a writev between
> OpenSSL and the socket.

True. There's still an argument to be made for a gather-write at the application level, though. That would let the application say "here are multiple buffers of application data which should be coalesced into as few TLS records as possible, then encrypted and transmitted". It saves either a temporary buffer and memory copy prior to calling SSL_write at the application level, or sending short TLS records.
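
A rough sketch of what such an application-level gather-write could look like, under stated assumptions: gather_write, write_fn and tally_write are hypothetical names, and the callback stands in for a thin wrapper around SSL_write (whose real signature is int SSL_write(SSL *ssl, const void *buf, int num)). Error reporting is reduced to a single failure code and retry handling for non-blocking sockets is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>   /* struct iovec (POSIX) */

#define MAX_RECORD 16384  /* TLS maximum plaintext fragment length (2^14) */

/* Hypothetical writer: sends len bytes as one unit, returns 0 on success. */
typedef int (*write_fn)(void *ctx, const void *buf, size_t len);

/* Coalesce the iovec into as few writes of at most MAX_RECORD bytes as
 * possible. Returns the number of writes issued, or -1 on failure. */
long gather_write(write_fn wr, void *ctx, const struct iovec *iov, int iovcnt)
{
    unsigned char rec[MAX_RECORD];
    size_t used = 0;
    long calls = 0;

    assert(wr != NULL && iovcnt >= 0);
    for (int i = 0; i < iovcnt; i++) {
        const unsigned char *p = iov[i].iov_base;
        size_t left = iov[i].iov_len;

        while (left > 0) {
            size_t n = MAX_RECORD - used;

            if (n > left)
                n = left;
            memcpy(rec + used, p, n);
            used += n;
            p += n;
            left -= n;
            if (used == MAX_RECORD) {        /* full record: send it */
                if (wr(ctx, rec, used) != 0)
                    return -1;
                calls++;
                used = 0;
            }
        }
    }
    if (used > 0) {                          /* send the final short record */
        if (wr(ctx, rec, used) != 0)
            return -1;
        calls++;
    }
    return calls;
}

/* Demonstration writer that only tallies the bytes it is given. */
int tally_write(void *ctx, const void *buf, size_t len)
{
    (void)buf;
    *(size_t *)ctx += len;
    return 0;
}
```

Three buffers of 10,000, 10,000 and 5 bytes go out as two writes (16,384 and 3,621 bytes) instead of three. The temporary buffer is one record on the stack, so its size does not grow with the number of sockets.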

Back in '01 I suggested it would also be useful for applications using the BIO abstraction for both TLS conversations and for plaintext stream sockets. Eighteen and a half years later, I suspect that's not a common use case.

But in any case, as I noted in my previous message, if this enhancement is sufficiently valuable to someone they can always implement it and submit a PR.

--
Michael Wojcik
Distinguished Engineer, Micro Focus