sparcv9a-mont SIGBUS


Jan Engelhardt-3
Hi,


Since the inclusion of sparcv9a-mont.s/.pl, I get a SIGBUS error when
running bntest. Package is openssl 1.0.0 with sparcv9a on Linux 2.6.34
with a sparcv9 environment (64-bit kernel, 32-bit userspace/v8/v8plus)
on a sun4v US T1 CPU. I am aware of the FPU implications - openssl just
chooses sparcv9a by default, so I stumbled across this.
Any insight would be appreciated.


Program received signal SIGBUS, Bus error.
bn_mul_mont_fpu () at sparcv9a-mont.s:80
80              ldda    [%o4+2]%asi,%f0
Current language:  auto
The current source language is "auto; currently asm".
(gdb) bt
#0  bn_mul_mont_fpu () at sparcv9a-mont.s:80
#1  0xf7e67744 in BN_mod_mul_montgomery (r=0xffffd8f0, a=0xffffd8c8,
    b=0xffffd8b4, mont=0x2c5a8, ctx=0x2c008) at bn_mont.c:140
#2  0x00013dd8 in test_mont (bp=0x2c038, ctx=0x2c008) at bntest.c:755
#3  0x00011e84 in main (argc=0, argv=0xffffda88) at bntest.c:243
(gdb) info registers
g0             0x0      0
g1             0x0      0
g2             0x22000  139264
g3             0x2c2b8  180920
g4             0x807ffff        134742015
g5             0x0      0
g6             0x10     16
g7             0xf7ff3620       -134269408
o0             0xaf2f975d       -1355835555
o1             0xe0b68eb5       -524906827
o2             0xffffd8b4       -10060
o3             0x2c920  182560
o4             0x2c2a8  180904
o5             0x2c200  180736
sp             0xffffd000       0xffffd000
o7             0xf0     240
l0             0xffffd0c0       -12096
l1             0xffffd0e0       -12064
l2             0xffffd0f0       -12048
l3             0xffffd100       -12032
l4             0xffffd110       -12016
l5             0xfffffff0       -16
l6             0xfffffff0       -16
l7             0xffff   65535
i0             0x2c2d0  180944
i1             0x2c930  182576
i2             0x2c2b8  180920
i3             0x2c210  180752
i4             0x2c5e8  181736
i5             0x10     16
fp             0xffffd7e0       0xffffd7e0
i7             0xf7e6773c       -135891140
y              0x396423c        60179004
psr            0xff440082       [ #1 S #18 #22 #24 #25 #26 #27 #28 #29 #30 #31 ]
wim            0x0      0
tbr            0x0      0
pc             0xf7e66468       0xf7e66468 <bn_mul_mont_fpu+232>
npc            0xf7e6646c       0xf7e6646c <bn_mul_mont_fpu+236>
fsr            0x0      [ ]
csr            0x0      0
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [hidden email]
Automated List Manager                           [hidden email]

Re: sparcv9a-mont SIGBUS

Andy Polyakov
> Since the inclusion of sparcv9a-mont.s/.pl, I get a SIGBUS error when
> running bntest. Package is openssl 1.0.0 with sparcv9a on Linux 2.6.34
> with a sparcv9 environment (64-bit kernel, 32-bit userspace/v8/v8plus)
> on a sun4v US T1 CPU. I am aware of the FPU implications - openssl just
> chooses sparcv9a by default, so I stumbled across this.
> Any insight would be appreciated.

T1 on Linux is the keyword here; it wouldn't have happened under
Solaris. CPU detection on Solaris is more elaborate than on Linux, so
bn_mul_mont_fpu wouldn't have been called on T1; bn_mul_mont_int would.
For details see crypto/sparcv9cap.c. As you can see, on Linux the
capability vector is assumed to be fixed (with "for now assume that the
rest supports UltraSPARC-I* only" resolution) and the only way to
override it is to set the OPENSSL_sparcv9cap environment variable. So to
quickly verify this, run 'env OPENSSL_sparcv9cap=0 make test' and see if
it passes. Actually it's strange that it fails with SIGBUS... I'd rather
expect suboptimal performance, but not SIGBUS... SIGBUS normally denotes
unaligned access, but the instruction in question pulls a 16-bit value
and the effective address is 16-bit aligned...
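
The alignment claim can be checked against the register dump in the
first message, where %o4 is 0x2c2a8 and the faulting load uses %o4+2.
The sketch below is illustrative only and is not code from OpenSSL; the
helper names are made up for this example.

```c
#include <stdint.h>

/* Largest power of two that divides addr (0 when addr == 0).
 * Isolating the lowest set bit gives the natural alignment. */
uint64_t natural_alignment(uint64_t addr)
{
    return addr & (~addr + 1);
}

/* Nonzero when addr is a multiple of size; size must be a power of two. */
int is_aligned(uint64_t addr, uint64_t size)
{
    return (addr & (size - 1)) == 0;
}
```

With addr = 0x2c2a8 + 2, is_aligned(addr, 2) holds while
is_aligned(addr, 4) does not, so the access is 16-bit aligned but
nothing stricter.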

Anyway, the solution is to refine CPU detection, and the question is how
to *programmatically* detect whether it's a sun4v system. uname(2)
returns sparc64 in the utsname.machine field (right?), which doesn't tell
the story... One can parse /proc/cpuinfo (looking for the type: line),
but then you depend on /proc being mounted... Finally, one can probe
whether the instruction in question fails, similar to the way it's done
in crypto/ppccap.c...
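
A rough sketch of the /proc/cpuinfo option, for illustration only. It
assumes the file has already been read into a string and that the line
of interest looks like "type : sun4v"; the function name and that exact
layout are assumptions, not something taken from OpenSSL.

```c
#include <string.h>

/* Scan cpuinfo-style text for a line that starts with "type".
 * Returns 1 if its value starts with "sun4v", 0 for any other value,
 * and -1 if no usable type line exists. */
int cpuinfo_is_sun4v(const char *text)
{
    const char *line = text;

    while (line != NULL && *line != '\0') {
        if (strncmp(line, "type", 4) == 0) {
            const char *eol = strchr(line, '\n');
            const char *val = strchr(line, ':');

            /* The colon must belong to this line, not a later one. */
            if (val != NULL && (eol == NULL || val < eol)) {
                val++;
                while (*val == ' ' || *val == '\t')
                    val++;
                return strncmp(val, "sun4v", 5) == 0;
            }
        }
        line = strchr(line, '\n');
        if (line != NULL)
            line++;
    }
    return -1;
}
```

A caller would have to treat -1 (or an unreadable /proc) as "unknown"
and fall back to some other method, which is exactly the dependency
noted above.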

Something will be done shortly; meanwhile, set the OPENSSL_sparcv9cap
environment variable to 0. A.
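
For readers wondering how such an override usually works, here is a
minimal sketch. It is not a copy of crypto/sparcv9cap.c; the function
names and the idea of passing in a default are assumptions made for
this example.

```c
#include <stdlib.h>

/* Parse an override string. NULL means "not set", so the default wins.
 * Base 0 lets strtoul accept decimal, octal (0...) and hex (0x...). */
unsigned long cap_parse(const char *value, unsigned long dflt)
{
    if (value == NULL)
        return dflt;
    return strtoul(value, NULL, 0);
}

/* Capability vector: the environment variable if set, else dflt. */
unsigned long cap_from_env(const char *name, unsigned long dflt)
{
    return cap_parse(getenv(name), dflt);
}
```

Under this scheme a value of 0 clears every capability bit, which is
why setting the variable to 0 would steer the library away from the
FPU code path.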

Re: sparcv9a-mont SIGBUS

Jan Engelhardt-3

On Wednesday 2010-06-30 16:04, Andy Polyakov wrote:

>> Since the inclusion of sparcv9a-mont.s/.pl, I get a SIGBUS error when
>> running bntest. Package is openssl 1.0.0 with sparcv9a on Linux 2.6.34
>> with a sparcv9 environment (64-bit kernel, 32-bit userspace/v8/v8plus)
>> on a sun4v US T1 CPU. I am aware of the FPU implications - openssl just
>> chooses sparcv9a by default, so I stumbled across this.
>> Any insight would be appreciated.
>
>T1 on Linux is the keyword here. I mean it wouldn't have happened under
>Solaris. Because CPU detection on Solaris is more elaborate than on
>Linux and bn_mul_mont_fpu wouldn't have been called on T1,
>[...]
>variable. So to quickly verify this run 'env OPENSSL_sparcv9cap=0 make
>test' and see if it passes. Actually it's strange that it fails with
>SIGBUS... I'd rather expect suboptimal performance, but not SIGBUS...
>SIGBUS normally denotes unaligned access, but instruction in question
>pulls 16-bit value and effective address is 16-bit aligned...

Perhaps you can reproduce the SIGBUS on Solaris by forcing
FPU on T1. Or maybe it's another signal, which could point to
a trap translation issue in Linux.

>Anyway, the solution is to refine CPU detection and the question is how
>to *programmatically* detect if it's sun4v system. uname(2) returns
>sparc64 in utsname.machine field (right?) and doesn't tell the story...
>One can parse /proc/cpuinfo (looking for type: line), but then you
>depend on /proc being mounted... Finally one can probe if instruction in
>question fails the way similar to one in crypto/ppccap.c...

$ LD_SHOW_AUXV=1 /bin/true
AT_HWCAP:    flush stbar swap muldiv v9 v9v

Though that relies on glibc...
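
If one did go by that output, the flag names would need to be matched
as whole words, because "v9" is a prefix of "v9v". A small illustrative
sketch follows; the function name is made up, and what each flag means
for CPU selection is left open here.

```c
#include <string.h>

/* Return 1 if flag appears in line as a whole whitespace-separated word. */
int hwcap_line_has(const char *line, const char *flag)
{
    size_t n = strlen(flag);
    const char *p = line;

    if (n == 0)
        return 0;
    while ((p = strstr(p, flag)) != NULL) {
        int starts = (p == line) || p[-1] == ' ' || p[-1] == '\t';
        int ends = p[n] == '\0' || p[n] == ' ' || p[n] == '\t' ||
                   p[n] == '\n';

        if (starts && ends)
            return 1;
        p++;    /* keep searching past this partial match */
    }
    return 0;
}
```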

Re: sparcv9a-mont SIGBUS

Andy Polyakov
>>> Since the inclusion of sparcv9a-mont.s/.pl, I get a SIGBUS error when
>>> running bntest. Package is openssl 1.0.0 with sparcv9a on Linux 2.6.34
>>> with a sparcv9 environment (64-bit kernel, 32-bit userspace/v8/v8plus)
>>> on a sun4v US T1 CPU. I am aware of the FPU implications - openssl just
>>> chooses sparcv9a by default, so I stumbled across this.
>>> Any insight would be appreciated.
>> T1 on Linux is the keyword here. I mean it wouldn't have happened under
>> Solaris. Because CPU detection on Solaris is more elaborate than on
>> Linux and bn_mul_mont_fpu wouldn't have been called on T1,
>> [...]
>> variable. So to quickly verify this run 'env OPENSSL_sparcv9cap=0 make
>> test' and see if it passes. Actually it's strange that it fails with
>> SIGBUS... I'd rather expect suboptimal performance, but not SIGBUS...
>> SIGBUS normally denotes unaligned access, but instruction in question
>> pulls 16-bit value and effective address is 16-bit aligned...
>
> Perhaps you can reproduce the SIGBUS on Solaris by forcing
> FPU on T1. Or maybe it's another signal, which could point to
> a trap translation issue in Linux.

I have no access to a T1, so I have no opportunity to reproduce it. I had
a couple of hours of hands-on experience at the local Sun office back in
2006, but since then T1 support has been based on users' feedback, so
bear with me...

>> Anyway, the solution is to refine CPU detection and the question is how
>> to *programmatically* detect if it's sun4v system. uname(2) returns
>> sparc64 in utsname.machine field (right?) and doesn't tell the story...
>> One can parse /proc/cpuinfo (looking for type: line), but then you
>> depend on /proc being mounted... Finally one can probe if instruction in
>> question fails the way similar to one in crypto/ppccap.c...

I've committed http://cvs.openssl.org/chngview?cn=19727 to address this
and the other assembler issue you've reported. It's essential that you
test it and report how it went. You can wait till tomorrow morning and
download the most recent 1.0.0 snapshot at
ftp://ftp.openssl.org/snapshot/, or you can rsync the repository now. See
http://www.openssl.org/source/repos.html for further details.

> $ LD_SHOW_AUXV=1 /bin/true
> AT_HWCAP:    flush stbar swap muldiv v9 v9v

Thanks. A.

Re: sparcv9a-mont SIGBUS

Jan Engelhardt-3

On Thursday 2010-07-01 10:09, Andy Polyakov wrote:

>>> SIGBUS normally denotes unaligned access, but instruction in question
>>> pulls 16-bit value and effective address is 16-bit aligned...

I just tried a test .S file with

        ldda [%sp+0+16]%asi, %f0
        ldda [%sp+0+8]%asi, %f0
        ldda [%sp+0+4]%asi, %f0
        ldda [%sp+0+2]%asi, %f0

And +4 is the first one it SIGBUS'd on. So if the alignment in
sparcv9a-mont is increased to +8, it would also work on T1.
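
That result is what one would expect if the load were held to 8-byte
alignment. The sketch below walks the same offsets in the same order.
It assumes %sp itself was 8-byte aligned; the base value 0xffffd000 is
borrowed from the register dump in the first message purely as a
stand-in, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the first offset in offs[] for which base + offset is not a
 * multiple of align (a power of two), or -1 if every access is aligned. */
long first_misaligned(uint64_t base, const long *offs, size_t n,
                      uint64_t align)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (((base + (uint64_t)offs[i]) & (align - 1)) != 0)
            return offs[i];
    return -1;
}
```

For offsets {16, 8, 4, 2} and align = 8 the first failure is +4,
matching the observation; with align = 2 none of them would fail.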


Re: sparcv9a-mont SIGBUS

Andy Polyakov
>>>> SIGBUS normally denotes unaligned access, but instruction in question
>>>> pulls 16-bit value and effective address is 16-bit aligned...
>
> I just tried a test .S file with
>
> ldda [%sp+0+16]%asi, %f0
> ldda [%sp+0+8]%asi, %f0
> ldda [%sp+0+4]%asi, %f0
> ldda [%sp+0+2]%asi, %f0
>
> And +4 is the first one it SIGBUS'd on. So if the alignment in
> sparcv9a-mont is increased to +8, it would also work on T1.

Yes, but sparcv9a-mont *relies* on +2, +4 and even +6. The offsets are
used to pick the 16-bit words constituting a single [naturally aligned]
64-bit value, i.e. the words reside at adjacent +2n offsets [with n=0-3].
It does work on UltraSPARC-I-IV and SPARC64 V-VII.
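
To illustrate the layout being described (this is not the code from
sparcv9a-mont), the sketch below stores a 64-bit value in big-endian
byte order, as SPARC keeps it in memory, and reads back the 16-bit word
at each +2n offset. The function names are invented for this example.

```c
#include <stdint.h>

/* Store v into buf in big-endian byte order. */
void store_be64(unsigned char buf[8], uint64_t v)
{
    int i;

    for (i = 0; i < 8; i++)
        buf[i] = (unsigned char)(v >> (56 - 8 * i));
}

/* Read the big-endian 16-bit word at byte offset 2*n, n = 0..3. */
uint16_t word16_at(const unsigned char buf[8], int n)
{
    return (uint16_t)((buf[2 * n] << 8) | buf[2 * n + 1]);
}
```

For the value 0x0123456789abcdef the word at +0 is 0x0123 and the word
at +6 is 0xcdef, so only the +0 access is 8-byte aligned.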

But getting bn_mul_mont_fpu working on T1 is *not* the goal, because
performance would be *horrible* (1/10th or worse). The idea implemented
in the updated sparcv9cap.c is to use this SIGBUS to heuristically detect
T1 and to disable the FP code in favor of the pure IALU bn_mul_mont_int...

... But wait... The fact that I remember the 1/10th coefficient must mean
that sparcv9a-mont did work under Solaris on T1. The question is how.
Chances are that the Solaris kernel transparently fixes up the unaligned
ldda access in its trap handler, meaning that *if/when* Linux chooses to
do the same, the above-mentioned heuristic test will fail to detect
T1... A.
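
The general shape of such a signal-based probe can be sketched in
portable C as below. This is an illustration of the technique, not the
committed sparcv9cap.c code: all names are hypothetical, and
probe_bus() merely raises SIGBUS as a stand-in for the small assembler
routine that would execute the 16-bit ldda.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf probe_env;
static volatile sig_atomic_t probe_sig;

static void probe_handler(int sig)
{
    probe_sig = sig;
    siglongjmp(probe_env, 1);
}

/* Run probe() with SIGBUS and SIGILL trapped.
 * Returns 0 if probe() completed, else the signal that interrupted it. */
int run_probe(void (*probe)(void))
{
    struct sigaction sa, old_bus, old_ill;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = probe_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, &old_bus);
    sigaction(SIGILL, &sa, &old_ill);

    probe_sig = 0;
    /* savemask = 1 so the signal mask is restored after siglongjmp */
    if (sigsetjmp(probe_env, 1) == 0)
        probe();

    /* Put the previous handlers back whether or not the probe faulted. */
    sigaction(SIGBUS, &old_bus, NULL);
    sigaction(SIGILL, &old_ill, NULL);
    return probe_sig;
}

/* Stand-ins for real probes. */
void probe_ok(void) { }
void probe_bus(void) { raise(SIGBUS); }
```

A nonzero result from run_probe() would be the cue to clear the FPU
capability bit. As noted above, the heuristic stops working as soon as
the kernel emulates the access instead of delivering the signal.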




Re: sparcv9a-mont SIGBUS

Jan Engelhardt-3
On Thursday 2010-07-01 11:31, Andy Polyakov wrote:

>>>>> SIGBUS normally denotes unaligned access, but instruction in question
>>>>> pulls 16-bit value and effective address is 16-bit aligned...
>>
>> I just tried a test .S file with
>>
>> ldda [%sp+0+16]%asi, %f0
>> ldda [%sp+0+8]%asi, %f0
>> ldda [%sp+0+4]%asi, %f0
>> ldda [%sp+0+2]%asi, %f0
>>
>> And +4 is the first one it SIGBUS'd on. So if the alignment in
>> sparcv9a-mont is increased to +8, it would also work on T1.
>
>Yes, but sparcv9a-mont *relies* on +2, +4 and even +6. Offsets are used
>to pick 16-bit words constituting single [naturally aligned] 64-bit

Hm. If I read the SPARC quick reference at
http://docs.sun.com/app/docs/doc/816-1681/sparcv9-15322?a=view right,
ldd(a) loads a floating-point doubleword rather than a 16-bit word, which
would explain why it rejects addresses that are not 8-byte aligned.

>value, i.e. words reside on adjacent +2n offsets [with n=0-3]. It does
>work on UltraSPARC-I-IV and SPARC64 V-VII.

Re: sparcv9a-mont SIGBUS

Andy Polyakov
>>>>>> SIGBUS normally denotes unaligned access, but instruction in question
>>>>>> pulls 16-bit value and effective address is 16-bit aligned...
>>> I just tried a test .S file with
>>>
>>> ldda [%sp+0+16]%asi, %f0
>>> ldda [%sp+0+8]%asi, %f0
>>> ldda [%sp+0+4]%asi, %f0
>>> ldda [%sp+0+2]%asi, %f0
>>>
>>> And +4 is the first one it SIGBUS'd on. So if the alignment in
>>> sparcv9a-mont is increased to +8, it would also work on T1.
>> Yes, but sparcv9a-mont *relies* on +2, +4 and even +6. Offsets are used
>> to pick 16-bit words constituting single [naturally aligned] 64-bit
>
> Hm. If I read the SPARC quick reference at
> http://docs.sun.com/app/docs/doc/816-1681/sparcv9-15322?a=view right,
> ldd(a) loads a floating-point doubleword rather than a 16-bit word, which
> would explain why it rejects addresses that are not 8-byte aligned.

With the magic value 0xD2 in the %asi register, ldda []%asi reads a
16-bit value and allows 16-bit alignment. Quoting the UltraSPARC User's
Manual:

"These ASIs allow 8- and 16-bit loads or stores to be performed to the
floating-point registers. Eight-bit loads can be performed to arbitrary
byte addresses. For sixteen bit loads, the least significant bit of the
address must be zero, or a mem_not_aligned trap is taken."

A.

Re: sparcv9a-mont SIGBUS

Andy Polyakov
> But getting bn_mul_mont_fpu working on T1 is *not* the goal, because
> performance would be *horrible* (1/10th or worse). Idea implemented in
> updated sparcv9cap.c is to use this SIGBUS to heuristically detect T1
> and to disable FP code in favor of pure IALU bn_mul_mont_int...
>
> ... But wait... The fact that I remember 1/10th coefficient must mean
> that sparcv9a-mont did work under Solaris on T1. Question is how.
> Chances are that Solaris kernel transparently fixes the ldda unaligned
> access in trap handler. Meaning that *if/when* Linux chooses to do the
> same, the above mentioned heuristic test will fail to detect T1...

As it turned out, 16-bit ldda is emulated by the Solaris kernel [but
apparently not by the Linux one]. Secondly [and most importantly], 16-bit
ldda is documented to be implemented in hardware by UltraSPARC T2,
meaning that the test in question will fail to flag T2. But
bn_mul_mont_fpu performance is suboptimal even on T2, so the procedure
should detect it too, not only T1...

I've examined the glibc code responsible for printing the AT_HWCAP vector
(with the earlier suggested 'env LD_SHOW_AUXV=1 /bin/true'). There is a
_dl_auxv vector filled by kernel/fs/binfmt_elf.c, but it's totally
private to ld-linux.so.2 and not accessible to me...
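
For later readers: glibc 2.16 and newer, released after this thread,
export getauxval() in <sys/auxv.h>, which returns the same AT_HWCAP
word without touching _dl_auxv. A minimal sketch, with hypothetical
wrapper names and with the mask left to the caller, since no particular
bit assignment is asserted here:

```c
#include <sys/auxv.h>

/* Raw AT_HWCAP word as passed by the kernel; needs glibc >= 2.16. */
unsigned long read_hwcap(void)
{
    return getauxval(AT_HWCAP);
}

/* Nonzero when every bit in mask is set in hwcap. */
int hwcap_has_all(unsigned long hwcap, unsigned long mask)
{
    return (hwcap & mask) == mask;
}
```

This still relies on glibc, so it would not remove the portability
concern raised earlier in the thread.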

As a result I've chosen to settle for instrumenting a pair of VIS1
instructions to detect Tx. See http://cvs.openssl.org/chngview?cn=19738
for further details. A.