[openssl-dev] [openssl.org #3843] OpenSSL 1.0.1* and below: incorrect use of _lrotl()

Rich Salz via RT
Hi,

Lei Zhang (re)discovered that OpenSSL 1.0.1* and below gets miscompiled,
resulting in incorrect computation of at least SHA-1 hashes (and probably
SHA-0, MD4, MD5) when it's compiled with icc for 64-bit Linux (x86_64 or
mic), but not for Windows.  The problem is already fixed in 1.0.2 and in
LibreSSL.

The problem is that OpenSSL uses the _lrotl() intrinsic to rotate 32-bit
integers, whereas it is defined to operate on "unsigned long", which
obviously is 64-bit on many platforms.
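
To illustrate the mismatch described above, here is a minimal sketch in C. It
does not call the compiler intrinsic itself: rotl64_truncated() is a
hypothetical stand-in that models a rotate carried out on a 64-bit
"unsigned long" and then truncated to 32 bits, and rotl32() is one portable
way to write the intended 32-bit rotate. Both names are assumptions made for
this example and are not taken from OpenSSL.

```c
#include <stdint.h>

/* Portable 32-bit left rotate. The count is reduced modulo 32 so that
 * n == 0 does not produce an undefined shift by 32. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return (x << n) | (x >> ((32 - n) & 31));
}

/* Hypothetical model of the failure mode: the 32-bit value is widened to
 * 64 bits, rotated as a 64-bit quantity, and truncated back to 32 bits.
 * Bits shifted out of the low half land in the upper half and are lost
 * instead of wrapping around to bit 0. */
static uint32_t rotl64_truncated(uint32_t x, unsigned n)
{
    uint64_t wide = x;            /* upper 32 bits are zero */
    n &= 63;
    wide = (wide << n) | (wide >> ((64 - n) & 63));
    return (uint32_t)wide;
}
```

For example, rotl32(0x80000001, 1) yields 0x00000003, whereas
rotl64_truncated(0x80000001, 1) yields 0x00000002: the top bit never wraps
around, which is enough to corrupt every hash round that uses the rotate.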

Lei's report:

http://www.openwall.com/lists/john-dev/2015/03/26/1

A previous report (from 2011):

https://software.intel.com/en-us/articles/openssl-generates-incorrect-shamd5-value-if-built-with-icc-compiler

I suggest that this be fixed for all currently supported branches of
OpenSSL.  For now, Lei switched to using LibreSSL in our John the Ripper
-jumbo builds for Xeon Phi, but we'd like to (re-)include instructions
for building with OpenSSL as well.

Thanks,

Alexander


_______________________________________________
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev

Re: [openssl-dev] [openssl.org #3843] OpenSSL 1.0.1* and below: incorrect use of _lrotl()

Rich Salz via RT
Hi,

For reference, icc has not been cared for for quite some time. Initially
it was possible for me, a university employee at the time, to use it, but
then they changed the terms and it became impossible for me to maintain
it. But I've just noticed they provide some starter version of something,
so I'll see...

> Lei Zhang (re)discovered that OpenSSL 1.0.1* and below gets miscompiled,
> resulting in incorrect computation of at least SHA-1 hashes (and probably
> SHA-0, MD4, MD5) when it's compiled with icc for 64-bit Linux (x86_64 or
> mic), but not for Windows. The problem is already fixed in 1.0.2 and in
> LibreSSL.
>
> The problem is that OpenSSL uses the _lrotl() intrinsic to rotate 32-bit
> integers, whereas it is defined to operate on "unsigned long", which
> obviously is 64-bit on many platforms.
>
> Lei's report:
>
> http://www.openwall.com/lists/john-dev/2015/03/26/1
>
> A previous report (from 2011):
>
> https://software.intel.com/en-us/articles/openssl-generates-incorrect-shamd5-value-if-built-with-icc-compiler
>
> I suggest that this be fixed for all currently supported branches of
> OpenSSL.  For now, Lei switched to using LibreSSL in our John the Ripper
> -jumbo builds for Xeon Phi, but we'd like to (re-)include instructions
> for building with OpenSSL as well.

But linux-x86_64-icc is not present in, and was never supported in,
pre-1.0.2, so you must have provided a custom config line. This remark
doesn't mean that the fix can't be backported, but out of curiosity,
what's your config line? Is assembly engaged? If so, how fast is it? Or
are you counting on the compiler to produce vector code that would
process multiple inputs in parallel with SIMD?

On a related note, what's Xeon Phi in this context? I mean, are we
talking about Knights Corner (which features its own
compatible-with-nothing SIMD instruction set) or Knights Landing (which
features AVX512)? If the latter, it might be interesting to extend
multi-block SHA support(*), which should make it possible to achieve
pretty cool results (with vector rotate and ternary logic instructions,
not to mention 16 lanes:-). [As for "interesting": it's possible but not
really interesting in the Knights Corner case, because the effort is too
specific, just a single obscure and hardly available CPU, while AVX512 is
planned even for other processors, so that code will be reusable.]

(*) BTW, did you try existing one?



Re: [openssl-dev] [openssl.org #3843] OpenSSL 1.0.1* and below: incorrect use of _lrotl()

Rich Salz via RT
Hi Andy,

Thank you for your reply!  I am CC'ing Lei on mine.

On Wed, May 20, 2015 at 12:55:10PM +0200, Andy Polyakov via RT wrote:
> For reference. icc was not cared for for quite some time. Initially it
> was possible for me, by then university employee, to use it, but then
> they changes terms and it became impossible for me to maintain it. But
> I've just noticed they provide some starter version of something, I'll
> see...

Yes, this might be usable for you:

https://software.intel.com/en-us/qualify-for-free-software/opensourcecontributor

"Intel provides select Intel Software Development Products at no cost to
qualified open source contributors who are working on open source
projects compliant with the Open Source Initiative (OSI)."

> But linux-x86_64-icc is not present in and was never supported in
> pre-1.0.2.

Oh, I didn't realize that.  Like I mentioned, we're actually building
with icc for MIC.  When we build with icc for x86_64 host, we typically
simply link against the distro's gcc-built OpenSSL, so didn't run into
this issue ourselves until we started building for MIC and thus had to
make our own OpenSSL build with icc.  (Indeed, I've been building
OpenSSL from source on many other occasions, and as part of a distro
too, but that's not with icc and unrelated to JtR project.)

> So you ought to provide custom line. This remark doesn't mean
> that fix can't be backported, but out of curiosity, what's your config
> line?

Currently, Lei put this into JtR -jumbo README-MIC:

Build LibreSSL (version 2.1.6):
$ cd libressl-2.1.6
$ ./configure CC="icc -mmic" --host=k1om-linux --prefix=$MIC
$ make && make install

The previous instructions were:

Build OpenSSL (version 1.0.0q):
$ cd openssl-1.0.0q
$ patch Configure < $JOHN/src/unused/openssl.patch
$ ./Configure linux-mic shared --prefix=$MIC
$ make && make install

I'm not sure what was in $JOHN/src/unused/openssl.patch - I guess it had
to add linux-mic support.  Lei, please reply to all.

> Is assembly engaged? If so, how fast is it? Or is it so that you
> count on compiler to produce vector code that would process multiple
> inputs in parallel with SIMD?

We're using OpenSSL (or LibreSSL) as an easy but slower option,
replacing it with our own SIMD code right in JtR tree whenever we can
and where this makes sense.  So we're not trying to optimize OpenSSL's
code.  It remains scalar and unmodified, and our use of it is just to
have things working where we do not have optimized code yet or where we
prefer simpler rather than faster code (such as for some lightweight
precomputation in some rare cases where this makes sense).

This varies by crypto primitive, but overall we currently have SIMD
intrinsics code for MMX, SSE2+/AVX, XOP, AVX2, MIC/AVX-512, and for
bitslice DES also for AltiVec and NEON.

One thing for which we still use OpenSSL's code in performance-critical
manner is SSH key passphrase cracking (which involves RSA).  There are
probably many more examples like this, but this is a prominent one that
comes to mind.  There must be a lot of room for optimization here.

As to compiler auto-vectorization - no, we are not relying on it.

> On related note. What's Xeon Phi in this context? I mean are we talking
> about Knights Corner

Unfortunately, yes.  BTW, you're welcome to play with it if you like:

http://openwall.info/wiki/HPC/Village

> (that features own compatible-with-nothing SIMD instruction set)

Yes, but at source code level many intrinsics match AVX-512.  So we use
it as a way to prepare for AVX-512.  In many cases, it's just a
recompile away.  There are some notable exceptions to this, though - in
fact, you happened to list some below.

> or Knights Landing (that features AVX512)? If latter,
> it might be interesting to extend multi-block SHA support(*), which
> should allow to achieve pretty cool results (with vector rotate and
> ternary logic instructions, not to mention 16 lanes:-). [As for
> "interesting". It's possible but not really interesting in Knights
> Corner case, because effort is too specific, just a single obscure and
> hardly available CPU, while AVX512 is planned even for other processors
> so that code will be reusable.]

This will take some #ifdef's to provide vector rotates as a macro when
building for MIC and to use the ternary logic intrinsics only when
building for true AVX-512 - nasty, but I think reasonable.  For now,
we're simply using the common subset between MIC and AVX-512:

https://github.com/magnumripper/JohnTheRipper/blob/bleeding-jumbo/src/pseudo_intrinsics.h
https://github.com/magnumripper/JohnTheRipper/blob/bleeding-jumbo/src/sse-intrinsics.c

> (*) BTW, did you try existing one?

No, totally missed it!  Found it now, good work!

$ find -name 'sha*-mb*'
./crypto/sha/asm/sha256-mb-x86_64.pl
./crypto/sha/asm/sha1-mb-x86_64.pl

How is an application using OpenSSL supposed to access this
functionality?  Is there documentation?  So far, I only found uses in
OpenSSL's own e_aes_cbc_hmac_sha*.c and no export of these symbols.

You might want to add optional use of XOP there - rotates and vcmov.
For SHA-1, F() is just one vcmov and H() is vcmov/andnot/xor (see
sse-intrinsics.c above).  For SHA-2, we use:

#define Maj(x,y,z) vcmov(x, y, vxor(z, y))
#define Ch(x,y,z) vcmov(y, z, x)
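
As a sanity check of the two macros above, the following scalar C sketch
models vcmov as a bitwise select and compares the results with the textbook
definitions of Ch and Maj. The name bitselect() and its argument order
(value-if-1, value-if-0, mask) are assumptions inferred from how the macros
are written; the real vcmov operates on SIMD vectors, not on a single
uint32_t.

```c
#include <stdint.h>

/* Scalar stand-in for a vector conditional move: for every bit position,
 * take the bit from a where mask is 1 and from b where mask is 0. */
static uint32_t bitselect(uint32_t a, uint32_t b, uint32_t mask)
{
    return (a & mask) | (b & ~mask);
}

/* Ch and Maj written the same way as the macros quoted above. */
static uint32_t ch_select(uint32_t x, uint32_t y, uint32_t z)
{
    return bitselect(y, z, x);        /* x chooses between y and z */
}

static uint32_t maj_select(uint32_t x, uint32_t y, uint32_t z)
{
    return bitselect(x, y, z ^ y);    /* where y and z differ, x decides */
}

/* Textbook definitions, used only as a reference for comparison. */
static uint32_t ch_ref(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) ^ (~x & z);
}

static uint32_t maj_ref(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) ^ (x & z) ^ (y & z);
}
```

Because all five functions are purely bitwise, comparing them on
x = 0xF0F0F0F0, y = 0xCCCCCCCC, z = 0xAAAAAAAA covers all eight input
combinations per bit position.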

We're also experimenting with instruction interleaving.  Sometimes,
especially when running only 1 thread/core (such as on cheaper Intel
CPUs without HT, or when there's no thread-level parallelism in the
application - not our case, though), it's optimal to interleave several
SIMD computations, for even wider virtual SIMD vectors than the CPU
supports natively.  For example, for MD5 on AVX (64-bit builds only, since
we need 16 registers for interleaving), we currently interleave 3 of
those (so 12 MD5's in parallel per thread).
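
The idea of interleaving can be sketched in plain C. This is not the JtR
code: toy_round() is a made-up mixing step with a serial dependency (it is
not MD5), and each uint32_t state stands in for what would be a whole SIMD
vector in the real thing. The point is only that three independent chains
advanced in the same loop body give the CPU three instructions it can issue
side by side, where a single chain gives it one.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical round function: each call depends on the previous state,
 * so a single stream forms one long dependency chain. */
static uint32_t toy_round(uint32_t state, uint32_t word)
{
    state += word;
    return (state << 7) | (state >> 25);
}

/* One stream at a time: every iteration waits for the previous one. */
static uint32_t toy_hash(const uint32_t *msg, size_t n)
{
    uint32_t s = 0x67452301u;
    for (size_t i = 0; i < n; i++)
        s = toy_round(s, msg[i]);
    return s;
}

/* Three independent streams interleaved in one loop. The three updates
 * in the body do not depend on each other, so they can overlap. */
static void toy_hash_x3(const uint32_t *m0, const uint32_t *m1,
                        const uint32_t *m2, size_t n, uint32_t out[3])
{
    uint32_t s0 = 0x67452301u, s1 = 0x67452301u, s2 = 0x67452301u;
    for (size_t i = 0; i < n; i++) {
        s0 = toy_round(s0, m0[i]);
        s1 = toy_round(s1, m1[i]);
        s2 = toy_round(s2, m2[i]);
    }
    out[0] = s0;
    out[1] = s1;
    out[2] = s2;
}
```

The interleaved version computes exactly the same three results as three
calls to toy_hash(); whether it is faster depends on the register budget and
on how much parallelism the round function already exposes.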

Is it OK that we went quite off-topic on this RT issue?

Alexander



Re: [openssl-dev] [openssl.org #3843] OpenSSL 1.0.1* and below: incorrect use of _lrotl()

Rich Salz via RT

> On May 24, 2015, at 3:09 PM, Solar Designer <[hidden email]> wrote:
>
>> So you ought to provide custom line. This remark doesn't mean
>> that fix can't be backported, but out of curiosity, what's your config
>> line?
>
> Currently, Lei put this into JtR -jumbo README-MIC:
>
> Build LibreSSL (version 2.1.6):
> $ cd libressl-2.1.6
> $ ./configure CC="icc -mmic" --host=k1om-linux --prefix=$MIC
> $ make && make install
>
> The previous instructions were:
>
> Build OpenSSL (version 1.0.0q):
> $ cd openssl-1.0.0q
> $ patch Configure < $JOHN/src/unused/openssl.patch
> $ ./Configure linux-mic shared --prefix=$MIC
> $ make && make install
>
> I'm not sure what was in $JOHN/src/unused/openssl.patch - I guess it had
> to add linux-mic support.  Lei, please reply to all.

Yes, I added a new target "linux-mic" into Configure, which is slightly modified from "linux-generic64".

From the original patch:

(...)
 "linux-generic64","gcc:-DTERMIO -O3 -Wall::-D_REENTRANT::-ldl:SIXTY_FOUR_BIT_LONG RC4_CHAR RC4_CHUNK DES_INT DES_UNROLL BF_PTR:${no_asm}:dlfcn:linux-shared:-fPIC::.so.\$(SHLIB_MAJOR).\$(SHLIB_MINOR)",
+"linux-mic","icc:-mmic -DTERMIO -O3 -Wall::-D_REENTRANT::-ldl:SIXTY_FOUR_BIT_LONG RC4_CHAR RC4_CHUNK DES_INT DES_UNROLL BF_PTR:${no_asm}:dlfcn:linux-shared:-fPIC::.so.\$(SHLIB_MAJOR).\$(SHLIB_MINOR)",
(...)


Lei





Re: [openssl-dev] [openssl.org #3843] OpenSSL 1.0.1* and below: incorrect use of _lrotl()

Rich Salz via RT
Hi,

Thanks for the tips and pointers. As for getting off-topic, I'm the one
to blame anyway. So I'm going to strip most of the message and comment
on the points that might still be of public interest.

>> (*) BTW, did you try existing [multi-block SHA]?
>
> No, totally missed it!  Found it now, good work!
>
> $ find -name 'sha*-mb*'
> ./crypto/sha/asm/sha256-mb-x86_64.pl
> ./crypto/sha/asm/sha1-mb-x86_64.pl
>
> How is an application using OpenSSL supposed to access this
> functionality?  Is there documentation?  So far, I only found uses in
> OpenSSL's own e_aes_cbc_hmac_sha*.c and no export of these symbols.

Well, you have to admit that it's a bit too special to warrant a
general-purpose interface. That is why an application-specific
interface is provided instead, the TLS-oriented one in
e_aes_cbc_hmac_sha*.c. The mention of multi-block SHA was not really of
the "go ahead and use it" kind, but rather "is it interesting?" with an
implied "if it is interesting, then we can discuss how to interface your
application to it". Note that it's even possible to take those modules
out of the OpenSSL context...

> You could want to add optional use of XOP there - rotates and vcmov.
> For SHA-1, F() is just one vcmov and H() is vcmov/andnot/xor (see
> sse-intrinsics.c above).  For SHA-2, we use:
>
> #define Maj(x,y,z) vcmov(x, y, vxor(z, y))
> #define Ch(x,y,z) vcmov(y, z, x)

As for XOP, the motto is to provide near-optimal performance with minimum
code. That means that if some processor-specific optimization provides
just a little improvement, then it's likely to be omitted. I don't recall
attempting XOP specifically in multi-block SHA256, but it was attempted
in SHA1 and it wasn't impressive. I even recall XOP rotates delivering
worse performance in some cases. It was likely some instruction alignment
issue (at least I ran into an anomaly with ChaCha code where merely
flipping the order of instruction input arguments affected performance).
Another case of XOP omission is plain SHA256. The point there is that
execution is dominated by the scalar part, and reducing the number of
vector instructions has no effect whatsoever. Anyway, XOP has been
considered, but so far was not found "worthy". But it makes sense to
double-check multi-block SHA256 specifically...

> We're also experimenting with instruction interleaving.  Sometimes,
> especially when running only 1 thread/core (such as on cheaper Intel
> CPUs without HT, or when there's no thread-level parallelism in the
> application - not our case, though), it's optimal to interleave several
> SIMD computations, for even wider virtual SIMD vectors than the CPU
> supports natively.  e.g. for MD5 on AVX (64-bit builds only, since need
> 16 registers for interleaving), we currently interleave 3 of those (so
> 12 MD5's in parallel per thread).

It's not uncommon that cryptographic algorithms have short dependency
chains and consequently limited ILP, instruction-level parallelism. But
then processors have limited resources too, and the question is whether
those resources are sufficient to sustain the algorithmic ILP. Or rather
vice versa: if the processor has more resources than the algorithm has
ILP, then the resources will be underutilized, and naturally only then
does it make sense to interleave instructions. Processor resources can be
characterized by the IPC (instructions per cycle) limit, and the maximum
possible improvement would be IPC/ILP. But one should remember that IPC
is not just the number of execution ports, for example 4 on Haswell. Some
instructions are port-specific, and if an algorithm uses such
instructions a lot, you'll be limited by that port. Anyway, MD5 is known
for its low ILP and it does make sense to interleave it (with itself or
another algorithm). This doesn't apply to SHA. It has higher ILP and no
contemporary processor has the capacity to fully utilize this
parallelism. Actually it's a bit worse in practice, because the thing
about multi-block is that it's limited by shifts, which are
port-specific. This is why you observe virtually no difference among
"desktop/server" processors.
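
The IPC/ILP bound mentioned above can be written down as a tiny helper. This
is only a restatement of that rule of thumb; the function name and the
clamping to 1.0 (no gain when the algorithm already exposes at least as much
parallelism as the processor can issue) are assumptions made for the
example.

```c
/* Upper bound on the speedup from interleaving independent computations.
 * ipc_limit: instructions per cycle the processor can sustain for the
 *            instruction mix in question.
 * ilp:       instructions per cycle a single computation can supply. */
static double max_interleave_gain(double ipc_limit, double ilp)
{
    if (ilp <= 0.0 || ipc_limit <= ilp)
        return 1.0;               /* nothing left to fill */
    return ipc_limit / ilp;
}
```

With purely illustrative figures of ipc_limit = 3.0 and ilp = 1.5, the bound
is 2.0. These numbers are not measurements of any particular CPU or
algorithm.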

As for the 4 Haswell ports: of the 4, only 3 can execute vector
instructions. So the absolute best results can be achieved when you mix
scalar integer-only and vector instructions, e.g. in addition to MD5 on
AVX, mix in a scalar "thread" as well. Well, the gain would have to be
divided by the ratio between how many blocks the vector part processes
and how many blocks the scalar part adds, so the gain would be too little
to care about. It's more of a fun fact in this context.



Re: [openssl-dev] [openssl.org #3843] OpenSSL 1.0.1* and below: incorrect use of _lrotl()

Rich Salz via RT
> Yes, I added a new target "linux-mic" into Configure, which is slightly modified from "linux-generic64".
>
> From the original patch:
>
> (...)
>  "linux-generic64","gcc:-DTERMIO -O3 -Wall::-D_REENTRANT::-ldl:SIXTY_FOUR_BIT_LONG RC4_CHAR RC4_CHUNK DES_INT DES_UNROLL BF_PTR:${no_asm}:dlfcn:linux-shared:-fPIC::.so.\$(SHLIB_MAJOR).\$(SHLIB_MINOR)",
> +"linux-mic","icc:-mmic -DTERMIO -O3 -Wall::-D_REENTRANT::-ldl:SIXTY_FOUR_BIT_LONG RC4_CHAR RC4_CHUNK DES_INT DES_UNROLL BF_PTR:${no_asm}:dlfcn:linux-shared:-fPIC::.so.\$(SHLIB_MAJOR).\$(SHLIB_MINOR)",
> (...)

But what prevents you from 'env CC=icc ./Configure linux-generic64
-mmic'? Or same with linux-x86_64? Can you confirm if './Configure
linux-x86_64-icc -mmic' works in 1.0.2?



Re: [openssl-dev] [openssl.org #3843] OpenSSL 1.0.1* and below: incorrect use of _lrotl()

Rich Salz via RT

> On May 25, 2015, at 6:01 PM, Andy Polyakov <[hidden email]> wrote:
>
>> Yes, I added a new target "linux-mic" into Configure, which is slightly modified from "linux-generic64".
>>
>> From the original patch:
>>
>> (...)
>> "linux-generic64","gcc:-DTERMIO -O3 -Wall::-D_REENTRANT::-ldl:SIXTY_FOUR_BIT_LONG RC4_CHAR RC4_CHUNK DES_INT DES_UNROLL BF_PTR:${no_asm}:dlfcn:linux-shared:-fPIC::.so.\$(SHLIB_MAJOR).\$(SHLIB_MINOR)",
>> +"linux-mic","icc:-mmic -DTERMIO -O3 -Wall::-D_REENTRANT::-ldl:SIXTY_FOUR_BIT_LONG RC4_CHAR RC4_CHUNK DES_INT DES_UNROLL BF_PTR:${no_asm}:dlfcn:linux-shared:-fPIC::.so.\$(SHLIB_MAJOR).\$(SHLIB_MINOR)",
>> (...)
>
> But what prevents you from 'env CC=icc ./Configure linux-generic64
> -mmic'? Or same with linux-x86_64? Can you confirm if './Configure
> linux-x86_64-icc -mmic' works in 1.0.2?

'CC="icc -mmic" ./Configure shared linux-generic64' works in 1.0.0. It's better than modifying Configure. I just didn't think of it.

But it doesn't work in 1.0.2, getting some link error:
../libcrypto.so: undefined reference to `rc4_md5_enc'

And linux-x86_64 won't work here, since it uses some instructions not supported by MIC.


Lei


Re: [openssl-dev] [openssl.org #3843] OpenSSL 1.0.1* and below: incorrect use of _lrotl()

Rich Salz via RT
>>> Yes, I added a new target "linux-mic" into Configure, which is slightly modified from "linux-generic64".
>>>
>>> From the original patch:
>>>
>>> (...)
>>> "linux-generic64","gcc:-DTERMIO -O3 -Wall::-D_REENTRANT::-ldl:SIXTY_FOUR_BIT_LONG RC4_CHAR RC4_CHUNK DES_INT DES_UNROLL BF_PTR:${no_asm}:dlfcn:linux-shared:-fPIC::.so.\$(SHLIB_MAJOR).\$(SHLIB_MINOR)",
>>> +"linux-mic","icc:-mmic -DTERMIO -O3 -Wall::-D_REENTRANT::-ldl:SIXTY_FOUR_BIT_LONG RC4_CHAR RC4_CHUNK DES_INT DES_UNROLL BF_PTR:${no_asm}:dlfcn:linux-shared:-fPIC::.so.\$(SHLIB_MAJOR).\$(SHLIB_MINOR)",
>>> (...)
>>
>> But what prevents you from 'env CC=icc ./Configure linux-generic64
>> -mmic'? Or same with linux-x86_64? Can you confirm if './Configure
>> linux-x86_64-icc -mmic' works in 1.0.2?
>
> 'CC="icc -mmic" ./Configure shared linux-generic64' works in 1.0.0. It's better than modifying Configure. I just didn't think of it.
>
> But it doesn't work in 1.0.2, getting some link error:
> ../libcrypto.so: undefined reference to `rc4_md5_enc'

Yes, a similar issue was reported in another context and it will be
resolved. Meanwhile, could you pass an explicit no-asm to confirm that
it's *in general* a viable option for you?

> And linux-x86_64 won't work here, since it uses some instructions not supported by MIC.

But all x86_64 modules feature a run-time switch: processor
capabilities are detected [with cpuid], and code that can't be executed
on a particular processor won't execute. Or do you mean that it fails to
*compile* with -mmic? Or do you mean that cpuid doesn't work on MIC?
But I recall that there is cpuid...
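
The general shape of such a run-time switch can be sketched as follows. This
is not OpenSSL's actual mechanism: the capability bit CAP_SSE2, the function
names and the summing routine are all hypothetical, and the capability mask
is passed in as a parameter where real code would fill it in from cpuid.

```c
#include <stddef.h>
#include <stdint.h>

#define CAP_SSE2 (1u << 0)   /* hypothetical capability bit */

typedef uint32_t (*sum_fn)(const uint32_t *, size_t);

/* Baseline path that every processor can run. */
static uint32_t sum_generic(const uint32_t *p, size_t n)
{
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += p[i];
    return s;
}

/* Placeholder for a path that would use vector instructions. It simply
 * reuses the generic code so that the sketch stays portable. */
static uint32_t sum_vector_placeholder(const uint32_t *p, size_t n)
{
    return sum_generic(p, n);
}

/* Pick an implementation once, based on the detected capabilities. */
static sum_fn select_sum(unsigned caps)
{
    return (caps & CAP_SSE2) ? sum_vector_placeholder : sum_generic;
}
```

Note that a switch like this only decides which path executes. Every path
still has to compile and assemble for the target, so it offers no protection
against a toolchain that rejects an instruction outright.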



Re: [openssl-dev] [openssl.org #3843] OpenSSL 1.0.1* and below: incorrect use of _lrotl()

Rich Salz via RT

> On May 26, 2015, at 12:01 AM, Andy Polyakov <[hidden email]> wrote:
>
>>>> Yes, I added a new target "linux-mic" into Configure, which is slightly modified from "linux-generic64".
>>>>
>>>> From the original patch:
>>>>
>>>> (...)
>>>> "linux-generic64","gcc:-DTERMIO -O3 -Wall::-D_REENTRANT::-ldl:SIXTY_FOUR_BIT_LONG RC4_CHAR RC4_CHUNK DES_INT DES_UNROLL BF_PTR:${no_asm}:dlfcn:linux-shared:-fPIC::.so.\$(SHLIB_MAJOR).\$(SHLIB_MINOR)",
>>>> +"linux-mic","icc:-mmic -DTERMIO -O3 -Wall::-D_REENTRANT::-ldl:SIXTY_FOUR_BIT_LONG RC4_CHAR RC4_CHUNK DES_INT DES_UNROLL BF_PTR:${no_asm}:dlfcn:linux-shared:-fPIC::.so.\$(SHLIB_MAJOR).\$(SHLIB_MINOR)",
>>>> (...)
>>>
>>> But what prevents you from 'env CC=icc ./Configure linux-generic64
>>> -mmic'? Or same with linux-x86_64? Can you confirm if './Configure
>>> linux-x86_64-icc -mmic' works in 1.0.2?
>>
>> 'CC="icc -mmic" ./Configure shared linux-generic64' works in 1.0.0. It's better than modifying Configure. I just didn't think of it.
>>
>> But it doesn't work in 1.0.2, getting some link error:
>> ../libcrypto.so: undefined reference to `rc4_md5_enc'
>
> Yes, similar issue was reported in another context and it will be
> resolved. Meanwhile could you pass explicit no-asm to confirm that it's
> in *general* viable option for you.
>
>> And linux-x86_64 won't work here, since it uses some instructions not supported by MIC.
>
> But all x86_64 modules feature run-time switch, when processor
> capabilities are detected [with cpuid] and code that can't be executed
> on any particular processor won't execute. Or do you mean that fails to
> *compile* it with -mmic? Or do you mean that cpuid doesn't work on mic?
> But I recall that there is cpuid...

It fails to compile with -mmic:
x86_64cpuid.s:165: Error: `pxor' is not supported on `k1om'
(...)

Here 'pxor' is an MMX instruction, but MIC doesn't support MMX. MIC has its own 512-bit SIMD instruction set, which, unlike AVX512, is not backward-compatible with older SIMD instructions.


Lei



Re: [openssl-dev] [openssl.org #3843] OpenSSL 1.0.1* and below: incorrect use of _lrotl()

Rich Salz via RT
>>> And linux-x86_64 won't work here, since it uses some instructions not supported by MIC.
>>
>> But all x86_64 modules feature run-time switch, when processor
>> capabilities are detected [with cpuid] and code that can't be executed
>> on any particular processor won't execute. Or do you mean that fails to
>> *compile* it with -mmic? Or do you mean that cpuid doesn't work on mic?
>> But I recall that there is cpuid...
>
> It fails to compile with -mmic:
> x86_64cpuid.s:165: Error: `pxor' is not supported on `k1om'

I see, thanks. In other words, as it turns out, my suggestion about the
run-time switch does not apply in this case, because a minimum of SSE2
is actually *assumed* for the x86_64 platform. And this doesn't hold
true for Knights Corner. But it does hold true for Knights Landing,
doesn't it? I see no point in attempting to accommodate assembler
support for Knights Corner (too rare a processor) and would appreciate
it if you could confirm whether the following works with 1.0.2:

./Configure linux-x86_64-icc no-asm -mmic

BTW, the _lrotl fix has been applied to 1.0.1, but not to earlier
versions, which are open for security fixes only.



Re: [openssl-dev] [openssl.org #3843] OpenSSL 1.0.1* and below: incorrect use of _lrotl()

Rich Salz via RT

> On May 26, 2015, at 4:57 PM, Andy Polyakov <[hidden email]> wrote:
>
>>>> And linux-x86_64 won't work here, since it uses some instructions not supported by MIC.
>>>
>>> But all x86_64 modules feature run-time switch, when processor
>>> capabilities are detected [with cpuid] and code that can't be executed
>>> on any particular processor won't execute. Or do you mean that fails to
>>> *compile* it with -mmic? Or do you mean that cpuid doesn't work on mic?
>>> But I recall that there is cpuid...
>>
>> It fails to compile with -mmic:
>> x86_64cpuid.s:165: Error: `pxor' is not supported on `k1om'
>
> I see, thanks. In other words, as it turns out my suggestion about
> run-time switch does not apply in this case, because minimum of SSE2 is
> actually *assumed* for x86_64 platform. And this doesn't hold true for
> Knights Corner. But it does hold true for Knights Landing, doesn't it?

Yes, Knights Landing supposedly implements AVX512, which is backward compatible with older SIMD instructions.

> I see no point in attempting to accommodate assembler support for Knights
> Corner (too rare processor) and would appreciate if you could confirm if
> following works with 1.0.2:
>
> ./Configure linux-x86_64-icc no-asm -mmic

Yes, it works.

Solar, should I update JtR's README-MIC to switch back to using OpenSSL? BTW, I'm not sure whether switching between OpenSSL and LibreSSL would cause any performance variation.


Lei



Re: [openssl-dev] [openssl.org #3843] OpenSSL 1.0.1* and below: incorrect use of _lrotl()

Jan Just Keijser-2
Hi,

[hidden email] via RT wrote:

>>>> And linux-x86_64 won't work here, since it uses some instructions not supported by MIC.
>>>>        
>>> But all x86_64 modules feature run-time switch, when processor
>>> capabilities are detected [with cpuid] and code that can't be executed
>>> on any particular processor won't execute. Or do you mean that fails to
>>> *compile* it with -mmic? Or do you mean that cpuid doesn't work on mic?
>>> But I recall that there is cpuid...
>>>      
>> It fails to compile with -mmic:
>> x86_64cpuid.s:165: Error: `pxor' is not supported on `k1om'
>>    
>
> I see, thanks. In other words, as it turns out my suggestion about
> run-time switch does not apply in this case, because minimum of SSE2 is
> actually *assumed* for x86_64 platform. And this doesn't hold true for
> Knights Corner. But it does hold true for Knights Landing, doesn't it? I
> see no point in attempting to accommodate assembler support for Knights
> Corner (too rare processor) and would appreciate if you could confirm if
> following works with 1.0.2:
>
> ./Configure linux-x86_64-icc no-asm -mmic
>
> BTW, _lrotl fix is applied to 1.0.1, but not earlier versions, which are
> open for security fixes only.
>
>  
I can confirm that a clean build of openssl 1.0.2a using the above
./Configure line works for me. The resulting binary runs without issues.

JJK
