8a bn_sub_words dumps core, Sol8

Martin Carpenter

Hello list,

I'm using 0.9.8a with the latest OpenSSH 4.2 on Solaris 8. Sometimes the ssh
client will dump core. Typically, I see this when connecting to the native sshd
running on a Solaris 9 machine.

(gdb) where
#0  0xff1fe914 in bn_sub_words () from /usr/local/ssl/lib/libcrypto.so.0.9.8
#1  0xff1f7e18 in bn_sub_part_words ()
   from /usr/local/ssl/lib/libcrypto.so.0.9.8
#2  0xff1f89d8 in bn_mul_recursive ()
   from /usr/local/ssl/lib/libcrypto.so.0.9.8
#3  0xff1f88ac in bn_mul_recursive ()
   from /usr/local/ssl/lib/libcrypto.so.0.9.8
(gdb)

The archives only reveal some old x86 asm problems. I've googled up this
reference:

  http://msgs.securepoint.com/cgi-bin/get/openssh-unix-dev-0509/17.html

but unfortunately nothing further. Since my own SPARC assembler knowledge is,
erm, limited, I'm a bit stuck as to how to progress with this. Can anyone
provide any assistance? My backups are failing :-(

Thanks,

Martin.
--
Martin Carpenter          <[hidden email]>
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [hidden email]
Automated List Manager                           [hidden email]

Re: 8a bn_sub_words dumps core, Sol8

Andy Polyakov
> I'm using 0.9.8a with the latest OpenSSH 4.2 on Solaris 8. Sometimes the ssh
> client will dump core.

There was a report on the OpenSSH list last year about intermittent
failures on a multi-CPU UltraSPARC system. It was not as fatal as a core
dump, but it was sporadic. They asserted that ssh worked reliably if one
CPU was disabled. The consensus was that the failure was caused by a
hardware deficiency. What's your hardware?

> (gdb) where
> #0  0xff1fe914 in bn_sub_words () from /usr/local/ssl/lib/libcrypto.so.0.9.8
> #1  0xff1f7e18 in bn_sub_part_words ()
>    from /usr/local/ssl/lib/libcrypto.so.0.9.8
> #2  0xff1f89d8 in bn_mul_recursive ()
>    from /usr/local/ssl/lib/libcrypto.so.0.9.8
> #3  0xff1f88ac in bn_mul_recursive ()
>    from /usr/local/ssl/lib/libcrypto.so.0.9.8
> (gdb)

You need to:
- state which platform line was used; run 'openssl version -a' to find
  out;
- state which signal caused the core dump; gdb reports it when it loads
  the core file;
- supply the disassembly; run 'disassemble' at the gdb prompt;
- supply the register bank contents; run 'info reg' at the gdb prompt.

> The archives only reveal some old x86 asm problems. I've googled up this
> reference:
>
>   http://msgs.securepoint.com/cgi-bin/get/openssh-unix-dev-0509/17.html

The referenced URL describes what sounds very much like an identical
problem. At least the libcrypto.so location suggests that both are
UltraSPARC-based systems. So I don't quite understand what you mean by
"old x86 asm problems," unless of course it refers to something other
than the URL above. A.

Re: 8a bn_sub_words dumps core, Sol8

Martin Carpenter
Quoting Andy Polyakov <[hidden email]>:

> The consensus was that the failure was caused by a hardware
> deficiency. What's your hardware?

Mostly Sun Ultra 10s, running Solaris 8. I can produce the error on more than
one host, too.


> - state which platform line was used; run 'openssl version -a' to find
>   out;

OpenSSL 0.9.8a 11 Oct 2005
built on: Mon Oct 17 12:52:43 WEST 2005
platform: solaris-sparcv9-gcc
options:  bn(64,32) md2(int) rc4(ptr,char) des(idx,cisc,16,long) idea(int)
blowfish(ptr)
compiler: gcc -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN
-DHAVE_DLFCN_H -m32 -mcpu=ultrasparc -O3 -fomit-frame-pointer -Wall -DB_ENDIAN
-DBN_DIV2W -DMD5_ASM
OPENSSLDIR: "/usr/local/ssl"



> - state which signal caused the core dump; gdb reports it when it loads
>   the core file;

Program terminated with signal 11, Segmentation fault.


> - supply the disassembly; run 'disassemble' at the gdb prompt;

#0  0xff1fe914 in bn_sub_words () from /usr/local/ssl/lib/libcrypto.so.0.9.8
(gdb) disassemble
Dump of assembler code for function bn_sub_words:
0xff1fe8b8 <bn_sub_words+0>:    unknown
0xff1fe8bc <bn_sub_words+4>:    ld  [ %o1 ], %o4
0xff1fe8c0 <bn_sub_words+8>:    retl
0xff1fe8c4 <bn_sub_words+12>:   mov  %g0, %o0
0xff1fe8c8 <bn_sub_words+16>:   btst  -4, %o3
0xff1fe8cc <bn_sub_words+20>:   unknown
0xff1fe8d0 <bn_sub_words+24>:   addcc  %g0, 0, %g0
0xff1fe8d4 <bn_sub_words+28>:   nop
0xff1fe8d8 <bn_sub_words+32>:   sub  %o3, 4, %o3
0xff1fe8dc <bn_sub_words+36>:   ld  [ %o2 ], %o5
0xff1fe8e0 <bn_sub_words+40>:   ld  [ %o1 + 4 ], %g1
0xff1fe8e4 <bn_sub_words+44>:   ld  [ %o2 + 4 ], %g2
0xff1fe8e8 <bn_sub_words+48>:   ld  [ %o1 + 8 ], %g3
0xff1fe8ec <bn_sub_words+52>:   ld  [ %o2 + 8 ], %g4
0xff1fe8f0 <bn_sub_words+56>:   subxcc  %o4, %o5, %o5
0xff1fe8f4 <bn_sub_words+60>:   st  %o5, [ %o0 ]
0xff1fe8f8 <bn_sub_words+64>:   ld  [ %o1 + 0xc ], %o4
0xff1fe8fc <bn_sub_words+68>:   ld  [ %o2 + 0xc ], %o5
0xff1fe900 <bn_sub_words+72>:   add  %o1, 0x10, %o1
0xff1fe904 <bn_sub_words+76>:   subxcc  %g1, %g2, %g2
0xff1fe908 <bn_sub_words+80>:   st  %g2, [ %o0 + 4 ]
0xff1fe90c <bn_sub_words+84>:   add  %o2, 0x10, %o2
0xff1fe910 <bn_sub_words+88>:   subxcc  %g3, %g4, %g4
0xff1fe914 <bn_sub_words+92>:   st  %g4, [ %o0 + 8 ]
0xff1fe918 <bn_sub_words+96>:   add  %o0, 0x10, %o0
0xff1fe91c <bn_sub_words+100>:  subxcc  %o4, %o5, %o5
0xff1fe920 <bn_sub_words+104>:  st  %o5, [ %o0 + -4 ]
0xff1fe924 <bn_sub_words+108>:  and  %o3, -4, %g1
0xff1fe928 <bn_sub_words+112>:  unknown
0xff1fe92c <bn_sub_words+116>:  ld  [ %o1 ], %o4
0xff1fe930 <bn_sub_words+120>:  unknown
0xff1fe934 <bn_sub_words+124>:  ld  [ %o1 ], %o4
0xff1fe938 <bn_sub_words+128>:  mov  %g0, %o0
0xff1fe93c <bn_sub_words+132>:  retl
0xff1fe940 <bn_sub_words+136>:  unknown
0xff1fe944 <bn_sub_words+140>:  nop
0xff1fe948 <bn_sub_words+144>:  ld  [ %o2 ], %o5
0xff1fe94c <bn_sub_words+148>:  dec  %o3
0xff1fe950 <bn_sub_words+152>:  subxcc  %o4, %o5, %o5
0xff1fe954 <bn_sub_words+156>:  unknown
0xff1fe958 <bn_sub_words+160>:  st  %o5, [ %o0 ]
0xff1fe95c <bn_sub_words+164>:  ld  [ %o1 + 4 ], %o4
0xff1fe960 <bn_sub_words+168>:  ld  [ %o2 + 4 ], %o5
0xff1fe964 <bn_sub_words+172>:  dec  %o3
0xff1fe968 <bn_sub_words+176>:  subxcc  %o4, %o5, %o5
0xff1fe96c <bn_sub_words+180>:  unknown
0xff1fe970 <bn_sub_words+184>:  st  %o5, [ %o0 + 4 ]
0xff1fe974 <bn_sub_words+188>:  ld  [ %o1 + 8 ], %o4
0xff1fe978 <bn_sub_words+192>:  ld  [ %o2 + 8 ], %o5
0xff1fe97c <bn_sub_words+196>:  subxcc  %o4, %o5, %o5
0xff1fe980 <bn_sub_words+200>:  st  %o5, [ %o0 + 8 ]
0xff1fe984 <bn_sub_words+204>:  mov  %g0, %o0
0xff1fe988 <bn_sub_words+208>:  retl
0xff1fe98c <bn_sub_words+212>:  unknown
0xff1fe990 <bn_sub_words+216>:  unimp  0
0xff1fe994 <bn_sub_words+220>:  unimp  0
0xff1fe998 <bn_sub_words+224>:  unimp  0
0xff1fe99c <bn_sub_words+228>:  unimp  0
End of assembler dump.


> - supply the register bank contents; run 'info reg' at the gdb prompt.

(gdb) info reg
g0             0x0      0
g1             0x2660b08e       643870862
g2             0x9ce70afc       -1662579972
g3             0x2937ebe4       691530724
g4             0xc61ad6af       -971319633
g5             0x0      0
g6             0x0      0
g7             0x0      0
o0             0xcdff8  843768
o1             0xcde18  843288
o2             0xcde38  843320
o3             0xffffd24b       -11701
o4             0x2b1f43a4       723469220
o5             0xf535303d       -181063619
sp             0xffbef0b0       4290703536
o7             0xff1f7e10       -14713328
l0             0xfa97d2d9       -90713383
l1             0x6bd0dc02       1808849922
l2             0xf20264bd       -234724163
l3             0x779c03d8       2006713304
l4             0xf11effe3       -249626653
l5             0xfb5c9ccd       -77816627
l6             0x9e069b72       -1643734158
l7             0x5e5ff26        98959142
i0             0xc2918  796952
i1             0xc2728  796456
i2             0xc2748  796488
i3             0x7      7
i4             0x1      1
i5             0xffffff84       -124
fp             0xffbef120       4290703648
i7             0xff1f89d0       -14710320
y              0xffffffff       -1
psr            0xfe901005       -24113147       icc:N--C, pil:0, s:0, ps:0, et:0, cwp:5
wim            0x0      0
tbr            0x0      0
pc             0xff1fe914       4280281364
npc            0xff1fe918       -14685928
fpsr           0x0      0       rd:N, tem:0, ns:0, ver:0, ftt:0, qne:0, fcc:=, aexc:0, cexc:0
cpsr           0x0      0
(gdb)


> >   http://msgs.securepoint.com/cgi-bin/get/openssh-unix-dev-0509/17.html
>
> The referenced URL describes what sounds very much like an identical
> problem. At least the libcrypto.so location suggests that both are
> UltraSPARC-based systems. So I don't quite understand what you mean by
> "old x86 asm problems," unless of course it refers to something other
> than the URL above. A.

Sorry for the confusion - two separate things. The old x86 problems that I found
are not related AFAIK to the problem described by the URL.

Thanks for your help,

Martin.
--
Martin Carpenter          <[hidden email]>

Re: 8a bn_sub_words dumps core, Sol8

Andy Polyakov
>>The consensus was that the failure was caused by a hardware
>>deficiency. What's your hardware?
>
> Mostly Sun Ultra 10s, running Solaris 8. I can produce the error on more than
> one host, too.

Then it can't be hardware...

> platform: solaris-sparcv9-gcc
> compiler: gcc -fPIC ...
> Program terminated with signal 11, Segmentation fault.
> #0  0xff1fe914 in bn_sub_words () from /usr/local/ssl/lib/libcrypto.so.0.9.8
> 0xff1fe914 <bn_sub_words+92>:   st  %g4, [ %o0 + 8 ]
> o0             0xcdff8  843768
> o3             0xffffd24b       -11701

The key question is how %o3 managed to become negative. I'm really
baffled as to how that could happen, when the module has been known to
work for *years*. Can you test if
http://cvs.openssl.org/chngview?cn=14621 fixes the problem? A.

Re: 8a bn_sub_words dumps core, Sol8

Kurt Roeckx
On Fri, Nov 11, 2005 at 09:20:58PM +0100, Andy Polyakov wrote:

> >>The consensus was that the failure was caused by a hardware
> >>deficiency. What's your hardware?
> >
> >Mostly Sun Ultra 10s, running Solaris 8. I can produce the error on more
> >than one host, too.
>
> Then it can't be hardware...
>
> >platform: solaris-sparcv9-gcc
> >compiler: gcc -fPIC ...
> >Program terminated with signal 11, Segmentation fault.
> >#0  0xff1fe914 in bn_sub_words () from
> >/usr/local/ssl/lib/libcrypto.so.0.9.8
> >0xff1fe914 <bn_sub_words+92>:   st  %g4, [ %o0 + 8 ]
> >o0             0xcdff8  843768
> >o3             0xffffd24b       -11701

We had someone report about the same thing on linux on a sparc:
http://bugs.debian.org/335912


Kurt


Re: 8a bn_sub_words dumps core, Sol8

Andy Polyakov
>>>> What's your hardware?
>>>
>>>Mostly Sun Ultra 10s, running Solaris 8. I can produce the error on more
>>>than one host, too.

>>Then it can't be hardware...

I mean not the kind of hardware deficiency that qualifies for CPU replacement.
But can you, Martin, say whether the problem occurs only on USIIi CPUs
[those found in the Ultra 10]?

>>>platform: solaris-sparcv9-gcc
>>>compiler: gcc -fPIC ...
>>>Program terminated with signal 11, Segmentation fault.
>>>#0  0xff1fe914 in bn_sub_words () from
>>>/usr/local/ssl/lib/libcrypto.so.0.9.8
>>>0xff1fe914 <bn_sub_words+92>:   st  %g4, [ %o0 + 8 ]
>>>o0             0xcdff8  843768
>>>o3             0xffffd24b       -11701
>
> We had someone report about the same thing on linux on a sparc:
> http://bugs.debian.org/335912

Do suggest that they test http://cvs.openssl.org/chngview?cn=14621. BTW, what's
*your* hardware? I mean the hardware on which you, Kurt, failed to
reproduce the problem. A.

Re: 8a bn_sub_words dumps core, Sol8

Kurt Roeckx
On Sat, Nov 12, 2005 at 04:15:23PM +0100, Andy Polyakov wrote:
> >
> >We had someone report about the same thing on linux on a sparc:
> >http://bugs.debian.org/335912
>
> Do suggest that they test http://cvs.openssl.org/chngview?cn=14621. BTW, what's
> *your* hardware? I mean the hardware on which you, Kurt, failed to
> reproduce the problem. A.

I've tried it on one of the Debian developer machines
(vore.debian.org), which is an UltraSPARC II 300MHz.


Kurt


Re: 8a bn_sub_words dumps core, Sol8

Andy Polyakov
>>>We had someone report about the same thing on linux on a sparc:
>>>http://bugs.debian.org/335912
>>
>>Do suggest that they test http://cvs.openssl.org/chngview?cn=14621. BTW, what's
>>*your* hardware? I mean the hardware on which you, Kurt, failed to
>>reproduce the problem.
>
> I've tried it on one of the Debian developer machines
> (vore.debian.org), which is an UltraSPARC II 300MHz.

Do ask the originator of the bug report to test on his hardware, and ask
what hardware it is. A.

Re: 8a bn_sub_words dumps core, Sol8

Andy Polyakov
>>>> We had someone report about the same thing on linux on a sparc:
>>>> http://bugs.debian.org/335912
>>>
>>> Do suggest that they test http://cvs.openssl.org/chngview?cn=14621.
>
> Do ask the originator of the bug report to test on his hardware, and ask
> what hardware it is.

Just for the record: on http://bugs.debian.org/335912 you mention that
"Upstream seems to think this might be related to type of processor
you're using." The problem is very strange, as the code has been working
for years. Being open to all options, I'm just collecting information. A.

Re: 8a bn_sub_words dumps core, Sol8

Martin Carpenter
In reply to this post by Andy Polyakov
Quoting Andy Polyakov <[hidden email]>:

> Can you test if
> http://cvs.openssl.org/chngview?cn=14621 fixes the problem?

Initial tests look good. I haven't seen a failure yet; I'm running a script to
see if I can stress the problem into reappearing.


--
Martin Carpenter          <[hidden email]>

Re: 8a bn_sub_words dumps core, Sol8

Andy Polyakov
>>Can you test if
>>http://cvs.openssl.org/chngview?cn=14621 fixes the problem?
>
>
> Initial tests look good. I haven't seen a failure yet; I'm running a script to
> see if I can stress the problem into reappearing.

http://cvs.openssl.org/chngview?cn=14624 is the "official" resolution for
the problem. The difference between the first draft above and this one is
that the "better safe than sorry" approach is applied to the rest of the
routines in the module. A.

Re: 8a bn_sub_words dumps core, Sol8

Kurt Roeckx
On Tue, Nov 15, 2005 at 09:57:59AM +0100, Andy Polyakov wrote:

> >>Can you test if
> >>http://cvs.openssl.org/chngview?cn=14621 fixes the problem?
> >
> >
> >Initial tests look good. I haven't seen a failure yet; I'm running a
> >script to
> >see if I can stress the problem into reappearing.
>
> http://cvs.openssl.org/chngview?cn=14624 is the "official" resolution for
> the problem. The difference between the first draft above and this one is
> that the "better safe than sorry" approach is applied to the rest of the
> routines in the module. A.

We had someone else report the problem and say that the patch
fixed it:
http://bugs.debian.org/339532


Kurt


Re: 8a bn_sub_words dumps core, Sol8

Martin Carpenter
Quoting Kurt Roeckx <[hidden email]>:

> We had someone else report the problem and say that the patch
> fixed it:
> http://bugs.debian.org/339532

It certainly works for me. Backups have been running faultlessly here again for two days.

Thanks!
