PKEY CMAC timings

Hal Murray
Intel(R) Core(TM) i5-3570 CPU @ 3.40GHz

After Kurt's improvement, with our usage pattern (48-byte messages), PKEY mode on
3.0.0-alpha3 takes about 2x as many cycles as 1.1.1g.

That factor probably depends on how good the hardware AES support in your
CPU is.  I think hardware AES is significantly faster on newer CPU chips.

1.1.1g:
     AES-128  16 48 16    434   0.434  475ac1c053379e7dbd4ce80b87d2178e
     AES-192  24 48 16    442   0.442  c906422bfe0963de6df50e022b4aa7d4
     AES-256  32 48 16    453   0.453  991f4017858de97515260dd9ae440b06

1.1.1g improved:
     AES-128  16 48 16    230   0.230  475ac1c053379e7dbd4ce80b87d2178e
     AES-192  24 48 16    252   0.252  c906422bfe0963de6df50e022b4aa7d4
     AES-256  32 48 16    252   0.252  991f4017858de97515260dd9ae440b06

3.0.0 alpha3:
     AES-128  16 48 16    815   0.815  475ac1c053379e7dbd4ce80b87d2178e
     AES-192  24 48 16    831   0.831  c906422bfe0963de6df50e022b4aa7d4
     AES-256  32 48 16    846   0.846  991f4017858de97515260dd9ae440b06

3.0.0-alpha3 improved:
     AES-128  16 48 16    500   0.500  475ac1c053379e7dbd4ce80b87d2178e
     AES-192  24 48 16    515   0.515  c906422bfe0963de6df50e022b4aa7d4
     AES-256  32 48 16    530   0.530  991f4017858de97515260dd9ae440b06
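To check the "2x" figure against the tables, the ratios can be computed per cipher. This is a minimal sketch that re-enters the 48-byte numbers by hand; it assumes the fifth column is nanoseconds per CMAC operation, which the post does not state explicitly.

```python
# 48-byte timings copied from the tables above.
# Assumption: the fifth column is nanoseconds per CMAC operation.
v111 = {"AES-128": 434, "AES-192": 442, "AES-256": 453}
v111_improved = {"AES-128": 230, "AES-192": 252, "AES-256": 252}
v300 = {"AES-128": 815, "AES-192": 831, "AES-256": 846}
v300_improved = {"AES-128": 500, "AES-192": 515, "AES-256": 530}


def slowdown(new, old):
    """Ratio new/old for each cipher, rounded to two decimal places."""
    return {name: round(new[name] / old[name], 2) for name in old}


before = slowdown(v300, v111)
after = slowdown(v300_improved, v111_improved)
for name in v111:
    print(f"{name}: {before[name]}x before, {after[name]}x after")
```

On these numbers the ratio is about 1.9x before Kurt's improvement and about 2.0x to 2.2x after it, because 1.1.1g gained proportionally more from the change (roughly 47% faster versus roughly 39% for 3.0.0-alpha3).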

Thanks again.


--
These are my opinions.  I hate spam.




Re: PKEY CMAC timings

Dr Paul Dale
How does it look for large input?  As in many kilobytes or megabytes?


Pauli
-- 
Dr Paul Dale | Distinguished Architect | Cryptographic Foundations 
Phone +61 7 3031 7217
Oracle Australia




On 18 Jun 2020, at 1:18 pm, Hal Murray <[hidden email]> wrote:

Intel(R) Core(TM) i5-3570 CPU @ 3.40GHz

After Kurt's improvement, with our usage patterns (48 bytes), PKEY mode on
3.0.0 takes 2x as many cycles as 1.1.1


Re: PKEY CMAC timings

Hal Murray
> How does it look for large input?  As in many kilobytes or megabytes?

16K is all I was willing to wait for.  Timing for really long blocks turns
into a memory test.  The right unit is ns/byte.  If that's an interesting
case, I'll hack some code to do longer blocks.

1.1.1g
     AES-128  16 48 16    225   0.225  475ac1c053379e7dbd4ce80b87d2178e
     AES-128  16 1024 16   1682   1.682  159d6d5c13f35d37c72efc5f6dbf40ad
     AES-128  16 16384 16  24566  24.566  581f7b133ad6f3697f33c3f836fdb6e6

3.0.0 alpha3
     AES-128  16 48 16    496   0.496  475ac1c053379e7dbd4ce80b87d2178e
     AES-128  16 1024 16   1953   1.953  159d6d5c13f35d37c72efc5f6dbf40ad
     AES-128  16 16384 16  24820  24.820  581f7b133ad6f3697f33c3f836fdb6e6
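Since ns/byte is suggested as the right unit, one way to read these two tables is to split each timing into a fixed per-message cost plus a per-byte cost. The sketch below does a two-point fit through the 1 KiB and 16 KiB rows; it assumes the time column is nanoseconds per MAC and that cost is roughly linear in message length, neither of which the measurements themselves establish.

```python
# AES-128 rows from the tables above: (message bytes, time per MAC).
# Assumption: the time column is nanoseconds per CMAC operation.
rows_111 = [(48, 225), (1024, 1682), (16384, 24566)]
rows_300 = [(48, 496), (1024, 1953), (16384, 24820)]


def fit_fixed_plus_per_byte(rows):
    """Two-point fit of t(n) = fixed + per_byte * n through the two largest sizes."""
    (n1, t1), (n2, t2) = rows[-2], rows[-1]
    per_byte = (t2 - t1) / (n2 - n1)
    fixed = t1 - per_byte * n1
    return fixed, per_byte


for label, rows in (("1.1.1g", rows_111), ("3.0.0-alpha3", rows_300)):
    fixed, per_byte = fit_fixed_plus_per_byte(rows)
    predicted_48 = fixed + per_byte * 48
    print(f"{label}: fixed ~{fixed:.0f} ns, ~{per_byte:.2f} ns/byte, "
          f"predicts {predicted_48:.0f} ns at 48 bytes (measured {rows[0][1]})")
    for n, t in rows:
        print(f"  {n:6d} bytes: {t / n:.3f} ns/byte")
```

With these numbers both releases cost about 1.49 ns/byte in bulk, and the difference is mostly a fixed per-message overhead (about 156 ns versus about 429 ns). That fixed cost dominates at 48 bytes and fades at 16 KiB. Two points cannot prove the cost is linear; more message sizes would be needed for that.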

-----------

3.0.0 alpha3:
CMAC
     AES-128  16 16384 16  25270  25.270  581f7b133ad6f3697f33c3f836fdb6e6
PKEY
     AES-128  16 16384 16  24839  24.839  581f7b133ad6f3697f33c3f836fdb6e6
EVP_MAC
     AES-128  16 16384 16  25462  25.462  581f7b133ad6f3697f33c3f836fdb6e6
EVP_MAC with preloaded cipher and key
     AES-128  16 16384 16  24567  24.567  581f7b133ad6f3697f33c3f836fdb6e6
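To put the four 3.0.0-alpha3 API paths side by side at 16 KiB, each can be expressed as a percentage over the fastest one. A minimal sketch, again assuming the time column is nanoseconds per MAC; the values are copied from the table.

```python
# AES-128, 16384-byte messages, 3.0.0-alpha3, copied from the table above.
# Assumption: values are nanoseconds per MAC operation.
api_timings = {
    "CMAC": 25270,
    "PKEY": 24839,
    "EVP_MAC": 25462,
    "EVP_MAC with preloaded cipher and key": 24567,
}


def percent_over_fastest(timings):
    """Percentage by which each entry exceeds the fastest one, to one decimal."""
    fastest = min(timings.values())
    return {name: round(100 * (t - fastest) / fastest, 1)
            for name, t in timings.items()}


for name, pct in percent_over_fastest(api_timings).items():
    print(f"{name:40s} +{pct}%")
```

All four paths land within about 4% of each other at this size, so at 16 KiB the choice of API matters far less than it does for 48-byte messages.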




Re: PKEY CMAC timings

Richard Levitte - VMS Whacker-2
I think 16k was enough to demonstrate that the timing difference
becomes more marginal the more data is processed in the same session.

This makes me think that we might want to rethink the reset functions,
i.e. the likes of EVP_CIPHER_CTX_reset()...  could we change that
function to become a call down to provider code?  We do allow that for
the non-provider back-ends: they can implement a 'cleanup' function.
Right now, EVP_CIPHER_CTX_reset() just calls the provider's function
to free its operation context, which forces us to re-initialize
everything with a restarted session, i.e. pass the key anew, and so on.

Cheers,
Richard

On Thu, 18 Jun 2020 06:50:45 +0200,
Hal Murray wrote:
>
> > How does it look for large input?  As in many kilobytes or megabytes?
>
> 16K is all I was willing to wait for.  Timing for really long blocks turns
> into a memory test.  The right unit is ns/byte.  If that's an interesting
> case, I'll hack some code to do longer blocks.
--
Richard Levitte         [hidden email]
OpenSSL Project         http://www.openssl.org/~levitte/

Re: PKEY CMAC timings

Richard Levitte - VMS Whacker-2
On Thu, 18 Jun 2020 08:27:13 +0200,
Richard Levitte wrote:

>
> I think 16k was enough to demonstrate that the timing difference
> becomes more marginal the larger the amount of data to encrypt in the
> same session is.
>
> This makes me think that we might want to rethink the reset functions,
> i.e. the likes of EVP_CIPHER_CTX_reset()...  could we change that
> function to become a call down to provider code?  We do allow that for
> the non-provider back-ends, they can implement a 'cleanup' function.
> Right now, EVP_CIPHER_CTX_reset() just calls the provider's function
> to free its operation context, which forces us to re-initialize
> everything with a restarted session, i.e. pass the key anew, etc etc
> etc.

Never mind what I just said: simply calling EVP_EncryptInit_ex(ctx,
NULL, NULL, NULL, NULL) performs (or should perform) the reset I was
thinking of.

Cheers,
Richard


Re: PKEY CMAC timings

Dr Paul Dale
I honestly believe that the various contexts should be reusable.
Without this, the recent provider additions will impose a significant overhead.

Pauli

On 18 Jun 2020, at 4:27 pm, Richard Levitte <[hidden email]> wrote:

I think 16k was enough to demonstrate that the timing difference
becomes more marginal the larger the amount of data to encrypt in the
same session is.

This makes me think that we might want to rethink the reset functions,
i.e. the likes of EVP_CIPHER_CTX_reset()...  could we change that
function to become a call down to provider code?  We do allow that for
the non-provider back-ends, they can implement a 'cleanup' function.
Right now, EVP_CIPHER_CTX_reset() just calls the provider's function
to free its operation context, which forces us to re-initialize
everything with a restarted session, i.e. pass the key anew, etc etc
etc.


Re: PKEY CMAC timings

Hal Murray
In the context of making things go fast/clean, do I need a reset?  If so, why?

My straw man is that setup has 3 stages:
  1: get storage and whatever else the cipher needs
  2: set up tables and such for a key
  3: init internal data

In the same-key case, the basic operation is:
  Init (does step 3)
  Update
  Final

I think setup steps 1 and 2 can be done with something like
  Setup(ctx, cipher, key+length)

A NULL cipher means keep using the current one - no allocs.

With something like that, I'd be happy to have a ctx per cipher.

Setup and Init can be merged into one function if a NULL key means keep using
the old one.  I think it's slightly cleaner (and faster) to leave them split.
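A minimal sketch of the three-stage straw man above, in Python for brevity. Everything here is hypothetical: `StrawManMac`, `setup()`, `init()`, `update()` and `final()` are illustrative names, not OpenSSL APIs. Because Python's standard library has no AES-CMAC, HMAC-SHA-256 stands in for it; stage 2 is modelled by keeping a keyed HMAC object, and stage 3 by copying that object, so starting a new message with the same key does no new key setup.

```python
import hmac


class StrawManMac:
    """Hypothetical context following the three-stage straw man.

    HMAC-SHA-256 stands in for AES-CMAC, which the stdlib lacks.
    """

    def __init__(self):
        self.digest = None   # stage 1: chosen algorithm
        self._keyed = None   # stage 2: precomputed keyed state
        self._mac = None     # stage 3: per-message state
        self.allocs = 0      # how many times stage 1 ran
        self.key_setups = 0  # how many times stage 2 ran

    def setup(self, digest=None, key=None):
        """Stages 1 and 2. None means 'keep using the current one'."""
        if digest is not None and digest != self.digest:
            self.digest = digest
            self._keyed = None  # old keyed state belongs to the old algorithm
            self.allocs += 1
        if key is not None:
            if self.digest is None:
                raise ValueError("choose an algorithm before supplying a key")
            self._keyed = hmac.new(key, digestmod=self.digest)
            self.key_setups += 1

    def init(self):
        """Stage 3: cheap per-message reset that reuses the keyed state."""
        if self._keyed is None:
            raise ValueError("setup() has not been given a key")
        self._mac = self._keyed.copy()

    def update(self, data):
        self._mac.update(data)

    def final(self):
        tag = self._mac.digest()
        self._mac = None
        return tag


# Same key, many messages: only Init/Update/Final run per message.
ctx = StrawManMac()
ctx.setup("sha256", b"k" * 16)
tags = []
for msg in (b"a" * 48, b"b" * 48, b"c" * 48):
    ctx.init()
    ctx.update(msg)
    tags.append(ctx.final())
```

Passing None to setup() for the algorithm or the key keeps the current one, mirroring the NULL convention in the straw man, and keeping init() separate means the per-message path never touches stages 1 or 2.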




Re: PKEY CMAC timings

Richard Levitte - VMS Whacker-2
On Thu, 18 Jun 2020 09:25:43 +0200,
Hal Murray wrote:
>
> In the context of making things go fast/clean, do I need a reset?  If so, why?

No.  I sent another message where I pointed out that I made a mistake
when saying so.

