Fwd: PPC bn_div_words routine rewrite

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
15 messages Options
Reply | Threaded
Open this post in threaded view
|

Fwd: PPC bn_div_words routine rewrite

David Ho-2
This is the second confirmed report of the same problem on the ppc8xx.

After reading my email.  I must say I was the unfriendly one, I
apologize for that.

More debugging evidence to come.  

---------- Forwarded message ----------
From: Murch, Christopher <[hidden email]>
Date: Jul 1, 2005 9:46 AM
Subject: RE: PPC bn_div_words routine rewrite
To: David Ho <[hidden email]>


David,
I had observed the same issue on ppc 8xx machines after upgrading to the asm
version of the BN routines.  Thank you very much for your work for the fix.
My question is, do you have high confidence in the other new asm ppc BN
routines after observing this issue or do you think they might have similiar
problems?
Thanks.
Chris

-----Original Message-----
From: David Ho [mailto:[hidden email]]
Sent: Thursday, June 30, 2005 6:22 PM
To: [hidden email]; [hidden email]
Subject: Re: PPC bn_div_words routine rewrite


The reason I had to redo this routine, in case anyone is wondering, is
because ssh-keygen  segfaults when this assembly routine returns junk
to the BN_div_word function. On a ppc, if you issue the command

ssh-keygen -t rsa1 -f /etc/ssh/ssh_host_key -N ""

The program craps out when it tries to write the public key in ascii
decimal.

Regards,
David

On 6/30/05, David Ho <[hidden email]> wrote:

> Hi all,
>
> This is a rewrite of the bn_div_words routine for the PowerPC arch,
> tested on a MPC8xx processor.
> I initially thought there is maybe a small mistake in the code that
> requires a one-liner change but it turns out I have to redo the
> routine.
> I guess this routine is not called very often as I see that most other
> routines are hand-crafted, whereas this routine is compiled from a C
> function that apparently has not gone through a whole lot of testing.
>
> I wrote a C function to confirm correctness of the code.
>
> unsigned long div_words (unsigned long h,
>                          unsigned long l,
>                          unsigned long d)
> {
>   unsigned long i_h; /* intermediate dividend */
>   unsigned long i_q; /* quotient of i_h/d */
>   unsigned long i_r; /* remainder of i_h/d */
>
>   unsigned long i_cntr;
>   unsigned long i_carry;
>
>   unsigned long ret_q; /* return quotient */
>
>   /* cannot divide by zero */
>   if (d == 0) return 0xffffffff;
>
>   /* do simple 32-bit divide */
>   if (h == 0) return l/d;
>
>   i_q = h/d;
>   i_r = h - (i_q*d);
>   ret_q = i_q;
>
>   i_cntr = 32;
>
>   while (i_cntr--)
>   {
>     i_carry = (l & 0x80000000) ? 1:0;
>     l = l << 1;
>
>     i_h = (i_r << 1) | i_carry;
>     i_q = i_h/d;
>     i_r = i_h - (i_q*d);
>
>     ret_q = (ret_q << 1) | i_q;
>   }
>
>   return ret_q;
> }
>
>
> Then I handcrafted the routine in PPC assembly.
> The result is a 26 line assembly that is easy to understand and
> predictable as opposed to a 81liner that I am still trying to
> decipher...
> If anyone is interested in incorporating this routine to the openssl
> code I'll be happy to assist.
> At this point I think I will be taking a bit of a break from this 3
> day debugging/fixing marathon.
>
> Regards,
> David Ho
>
>
> #
> #       Handcrafted version of bn_div_words
> #
> #       r3 = h
> #       r4 = l
> #       r5 = d
>
>         cmplwi  0,r5,0                  # compare r5 and 0
>         bc      BO_IF_NOT,CR0_EQ,.Lppcasm_div1  # proceed if d!=0
>         li      r3,-1                   # d=0 return -1
>         bclr    BO_ALWAYS,CR0_LT
> .Lppcasm_div1:
>         cmplwi  0,r3,0                  # compare r3 and 0
>         bc      BO_IF_NOT,CR0_EQ,.Lppcasm_div2  # proceed if h != 0
>         divwu   r3,r4,r5                # ret_q = l/d
>         bclr    BO_ALWAYS,CR0_LT        # return result in r3
> .Lppcasm_div2:
>         divwu   r9,r3,r5                # i_q = h/d
>         mullw   r10,r9,r5               # i_r = h - (i_q*d)
>         subf    r10,r10,r3
>         mr      r3,r9                   # req_q = i_q
> .Lppcasm_set_ctr:
>         li      r12,32                  # ctr = bitsizeof(d)
>         mtctr   r12
> .Lppcasm_div_loop:
>         addc    r4,r4,r4                # l = l << 1 -> i_carry
>         adde    r11,r10,r10             # i_h = (i_r << 1) | i_carry
>         divwu   r9,r11,r5               # i_q = i_h/d
>         mullw   r10,r9,r5               # i_r = i_h - (i_q*d)
>         subf    r10,r10,r11
>         add     r3,r3,r3                # ret_q = ret_q << 1 | i_q
>         add     r3,r3,r9
>         bc      BO_dCTR_NZERO,CR0_EQ,.Lppcasm_div_loop
> .Lppc_div_end:
>         bclr    BO_ALWAYS,CR0_LT        # return result in r3
>         .long   0x00000000
>
_______________________________________________
Linuxppc-embedded mailing list
[hidden email]
https://ozlabs.org/mailman/listinfo/linuxppc-embedded
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [hidden email]
Automated List Manager                           [hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: PPC bn_div_words routine rewrite

David Ho-2
First pass debugging results from gdb on ppc8xx.  Executing ssh-keygen
with following arguments.

(gdb) show args
Argument list to give program being debugged when it is started is
    "-t rsa1 -f /etc/ssh/ssh_host_key -N """.

Program received signal SIGSEGV, Segmentation fault.
BN_bn2dec (a=0x1002d9f0) at bn_print.c:136
136                             *lp=BN_div_word(t,BN_DEC_CONV);

(gdb) i r
r0             0x0      0
r1             0x7fffd580       2147472768
r2             0x30012868       805382248
r3             0x80000000       2147483648
r4             0xfef33fc        267334652
r5             0x25     37
r6             0xfccdef8        265084664
r7             0x7fffd4c0       2147472576
r8             0xfbad2887       4222429319
r9             0x84044022       2214871074
r10            0x0      0
r11            0x2      2
r12            0xfef2054        267329620
r13            0x10030bc8       268635080
r14            0x0      0
r15            0x0      0
r16            0x0      0
r17            0x0      0
r18            0x0      0
r19            0x0      0
r20            0x0      0
r21            0x0      0
r22            0x0      0
r23            0x64     100
r24            0x5      5
r25            0x1002d438       268620856
r26            0x1002d9f0       268622320
r27            0x1002c578       268617080
r28            0x1      1
r29            0x10031000       268636160
r30            0xffbf7d0        268171216
r31            0x1002d9f0       268622320
pc             0xfef2058        267329624
ps             0xd032   53298
cr             0x24044022       604258338
lr             0xfef2054        267329620
ctr            0xfccefa0        265088928
xer            0x20000000       536870912
fpscr          0x0      0
vscr           0x0      0
vrsave         0x0      0

(gdb) p/x $pc
$1 = 0xfef2058

0x0fef2058 <BN_bn2dec+472>:     stw     r3,0(r29)

(gdb) x 0x10031000
0x10031000:     Cannot access memory at address 0x10031000










On 7/5/05, David Ho <[hidden email]> wrote:

> This is the second confirmed report of the same problem on the ppc8xx.
>
> After reading my email.  I must say I was the unfriendly one, I
> apologize for that.
>
> More debugging evidence to come.
>
> ---------- Forwarded message ----------
> From: Murch, Christopher <[hidden email]>
> Date: Jul 1, 2005 9:46 AM
> Subject: RE: PPC bn_div_words routine rewrite
> To: David Ho <[hidden email]>
>
>
> David,
> I had observed the same issue on ppc 8xx machines after upgrading to the asm
> version of the BN routines.  Thank you very much for your work for the fix.
> My question is, do you have high confidence in the other new asm ppc BN
> routines after observing this issue or do you think they might have similiar
> problems?
> Thanks.
> Chris
>
> -----Original Message-----
> From: David Ho [mailto:[hidden email]]
> Sent: Thursday, June 30, 2005 6:22 PM
> To: [hidden email]; [hidden email]
> Subject: Re: PPC bn_div_words routine rewrite
>
>
> The reason I had to redo this routine, in case anyone is wondering, is
> because ssh-keygen  segfaults when this assembly routine returns junk
> to the BN_div_word function. On a ppc, if you issue the command
>
> ssh-keygen -t rsa1 -f /etc/ssh/ssh_host_key -N ""
>
> The program craps out when it tries to write the public key in ascii
> decimal.
>
> Regards,
> David
>
> On 6/30/05, David Ho <[hidden email]> wrote:
> > Hi all,
> >
> > This is a rewrite of the bn_div_words routine for the PowerPC arch,
> > tested on a MPC8xx processor.
> > I initially thought there is maybe a small mistake in the code that
> > requires a one-liner change but it turns out I have to redo the
> > routine.
> > I guess this routine is not called very often as I see that most other
> > routines are hand-crafted, whereas this routine is compiled from a C
> > function that apparently has not gone through a whole lot of testing.
> >
> > I wrote a C function to confirm correctness of the code.
> >
> > unsigned long div_words (unsigned long h,
> >                          unsigned long l,
> >                          unsigned long d)
> > {
> >   unsigned long i_h; /* intermediate dividend */
> >   unsigned long i_q; /* quotient of i_h/d */
> >   unsigned long i_r; /* remainder of i_h/d */
> >
> >   unsigned long i_cntr;
> >   unsigned long i_carry;
> >
> >   unsigned long ret_q; /* return quotient */
> >
> >   /* cannot divide by zero */
> >   if (d == 0) return 0xffffffff;
> >
> >   /* do simple 32-bit divide */
> >   if (h == 0) return l/d;
> >
> >   i_q = h/d;
> >   i_r = h - (i_q*d);
> >   ret_q = i_q;
> >
> >   i_cntr = 32;
> >
> >   while (i_cntr--)
> >   {
> >     i_carry = (l & 0x80000000) ? 1:0;
> >     l = l << 1;
> >
> >     i_h = (i_r << 1) | i_carry;
> >     i_q = i_h/d;
> >     i_r = i_h - (i_q*d);
> >
> >     ret_q = (ret_q << 1) | i_q;
> >   }
> >
> >   return ret_q;
> > }
> >
> >
> > Then I handcrafted the routine in PPC assembly.
> > The result is a 26 line assembly that is easy to understand and
> > predictable as opposed to a 81liner that I am still trying to
> > decipher...
> > If anyone is interested in incorporating this routine to the openssl
> > code I'll be happy to assist.
> > At this point I think I will be taking a bit of a break from this 3
> > day debugging/fixing marathon.
> >
> > Regards,
> > David Ho
> >
> >
> > #
> > #       Handcrafted version of bn_div_words
> > #
> > #       r3 = h
> > #       r4 = l
> > #       r5 = d
> >
> >         cmplwi  0,r5,0                  # compare r5 and 0
> >         bc      BO_IF_NOT,CR0_EQ,.Lppcasm_div1  # proceed if d!=0
> >         li      r3,-1                   # d=0 return -1
> >         bclr    BO_ALWAYS,CR0_LT
> > .Lppcasm_div1:
> >         cmplwi  0,r3,0                  # compare r3 and 0
> >         bc      BO_IF_NOT,CR0_EQ,.Lppcasm_div2  # proceed if h != 0
> >         divwu   r3,r4,r5                # ret_q = l/d
> >         bclr    BO_ALWAYS,CR0_LT        # return result in r3
> > .Lppcasm_div2:
> >         divwu   r9,r3,r5                # i_q = h/d
> >         mullw   r10,r9,r5               # i_r = h - (i_q*d)
> >         subf    r10,r10,r3
> >         mr      r3,r9                   # req_q = i_q
> > .Lppcasm_set_ctr:
> >         li      r12,32                  # ctr = bitsizeof(d)
> >         mtctr   r12
> > .Lppcasm_div_loop:
> >         addc    r4,r4,r4                # l = l << 1 -> i_carry
> >         adde    r11,r10,r10             # i_h = (i_r << 1) | i_carry
> >         divwu   r9,r11,r5               # i_q = i_h/d
> >         mullw   r10,r9,r5               # i_r = i_h - (i_q*d)
> >         subf    r10,r10,r11
> >         add     r3,r3,r3                # ret_q = ret_q << 1 | i_q
> >         add     r3,r3,r9
> >         bc      BO_dCTR_NZERO,CR0_EQ,.Lppcasm_div_loop
> > .Lppc_div_end:
> >         bclr    BO_ALWAYS,CR0_LT        # return result in r3
> >         .long   0x00000000
> >
> _______________________________________________
> Linuxppc-embedded mailing list
> [hidden email]
> https://ozlabs.org/mailman/listinfo/linuxppc-embedded
>
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [hidden email]
Automated List Manager                           [hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: PPC bn_div_words routine rewrite

David Ho-2
I can tell you with certainty, with reference to the function
BN_bn2dec, that since lp is a pointer, and within the while loop
around bn_print.c:136 lp is being incremented.  Because the test
BN_is_zero(t) is always false, you have a pointer that is going off
into the stratosphere, hence the segfault on ppc8xx.

More analysis to come.

On 7/5/05, David Ho <[hidden email]> wrote:

> First pass debugging results from gdb on ppc8xx.  Executing ssh-keygen
> with following arguments.
>
> (gdb) show args
> Argument list to give program being debugged when it is started is
>     "-t rsa1 -f /etc/ssh/ssh_host_key -N """.
>
> Program received signal SIGSEGV, Segmentation fault.
> BN_bn2dec (a=0x1002d9f0) at bn_print.c:136
> 136                             *lp=BN_div_word(t,BN_DEC_CONV);
>
> (gdb) i r
> r0             0x0      0
> r1             0x7fffd580       2147472768
> r2             0x30012868       805382248
> r3             0x80000000       2147483648
> r4             0xfef33fc        267334652
> r5             0x25     37
> r6             0xfccdef8        265084664
> r7             0x7fffd4c0       2147472576
> r8             0xfbad2887       4222429319
> r9             0x84044022       2214871074
> r10            0x0      0
> r11            0x2      2
> r12            0xfef2054        267329620
> r13            0x10030bc8       268635080
> r14            0x0      0
> r15            0x0      0
> r16            0x0      0
> r17            0x0      0
> r18            0x0      0
> r19            0x0      0
> r20            0x0      0
> r21            0x0      0
> r22            0x0      0
> r23            0x64     100
> r24            0x5      5
> r25            0x1002d438       268620856
> r26            0x1002d9f0       268622320
> r27            0x1002c578       268617080
> r28            0x1      1
> r29            0x10031000       268636160
> r30            0xffbf7d0        268171216
> r31            0x1002d9f0       268622320
> pc             0xfef2058        267329624
> ps             0xd032   53298
> cr             0x24044022       604258338
> lr             0xfef2054        267329620
> ctr            0xfccefa0        265088928
> xer            0x20000000       536870912
> fpscr          0x0      0
> vscr           0x0      0
> vrsave         0x0      0
>
> (gdb) p/x $pc
> $1 = 0xfef2058
>
> 0x0fef2058 <BN_bn2dec+472>:     stw     r3,0(r29)
>
> (gdb) x 0x10031000
> 0x10031000:     Cannot access memory at address 0x10031000
>
>
>
>
>
>
>
>
>
>
> On 7/5/05, David Ho <[hidden email]> wrote:
> > This is the second confirmed report of the same problem on the ppc8xx.
> >
> > After reading my email.  I must say I was the unfriendly one, I
> > apologize for that.
> >
> > More debugging evidence to come.
> >
> > ---------- Forwarded message ----------
> > From: Murch, Christopher <[hidden email]>
> > Date: Jul 1, 2005 9:46 AM
> > Subject: RE: PPC bn_div_words routine rewrite
> > To: David Ho <[hidden email]>
> >
> >
> > David,
> > I had observed the same issue on ppc 8xx machines after upgrading to the asm
> > version of the BN routines.  Thank you very much for your work for the fix.
> > My question is, do you have high confidence in the other new asm ppc BN
> > routines after observing this issue or do you think they might have similiar
> > problems?
> > Thanks.
> > Chris
> >
> > -----Original Message-----
> > From: David Ho [mailto:[hidden email]]
> > Sent: Thursday, June 30, 2005 6:22 PM
> > To: [hidden email]; [hidden email]
> > Subject: Re: PPC bn_div_words routine rewrite
> >
> >
> > The reason I had to redo this routine, in case anyone is wondering, is
> > because ssh-keygen  segfaults when this assembly routine returns junk
> > to the BN_div_word function. On a ppc, if you issue the command
> >
> > ssh-keygen -t rsa1 -f /etc/ssh/ssh_host_key -N ""
> >
> > The program craps out when it tries to write the public key in ascii
> > decimal.
> >
> > Regards,
> > David
> >
> > On 6/30/05, David Ho <[hidden email]> wrote:
> > > Hi all,
> > >
> > > This is a rewrite of the bn_div_words routine for the PowerPC arch,
> > > tested on a MPC8xx processor.
> > > I initially thought there is maybe a small mistake in the code that
> > > requires a one-liner change but it turns out I have to redo the
> > > routine.
> > > I guess this routine is not called very often as I see that most other
> > > routines are hand-crafted, whereas this routine is compiled from a C
> > > function that apparently has not gone through a whole lot of testing.
> > >
> > > I wrote a C function to confirm correctness of the code.
> > >
> > > unsigned long div_words (unsigned long h,
> > >                          unsigned long l,
> > >                          unsigned long d)
> > > {
> > >   unsigned long i_h; /* intermediate dividend */
> > >   unsigned long i_q; /* quotient of i_h/d */
> > >   unsigned long i_r; /* remainder of i_h/d */
> > >
> > >   unsigned long i_cntr;
> > >   unsigned long i_carry;
> > >
> > >   unsigned long ret_q; /* return quotient */
> > >
> > >   /* cannot divide by zero */
> > >   if (d == 0) return 0xffffffff;
> > >
> > >   /* do simple 32-bit divide */
> > >   if (h == 0) return l/d;
> > >
> > >   i_q = h/d;
> > >   i_r = h - (i_q*d);
> > >   ret_q = i_q;
> > >
> > >   i_cntr = 32;
> > >
> > >   while (i_cntr--)
> > >   {
> > >     i_carry = (l & 0x80000000) ? 1:0;
> > >     l = l << 1;
> > >
> > >     i_h = (i_r << 1) | i_carry;
> > >     i_q = i_h/d;
> > >     i_r = i_h - (i_q*d);
> > >
> > >     ret_q = (ret_q << 1) | i_q;
> > >   }
> > >
> > >   return ret_q;
> > > }
> > >
> > >
> > > Then I handcrafted the routine in PPC assembly.
> > > The result is a 26 line assembly that is easy to understand and
> > > predictable as opposed to a 81liner that I am still trying to
> > > decipher...
> > > If anyone is interested in incorporating this routine to the openssl
> > > code I'll be happy to assist.
> > > At this point I think I will be taking a bit of a break from this 3
> > > day debugging/fixing marathon.
> > >
> > > Regards,
> > > David Ho
> > >
> > >
> > > #
> > > #       Handcrafted version of bn_div_words
> > > #
> > > #       r3 = h
> > > #       r4 = l
> > > #       r5 = d
> > >
> > >         cmplwi  0,r5,0                  # compare r5 and 0
> > >         bc      BO_IF_NOT,CR0_EQ,.Lppcasm_div1  # proceed if d!=0
> > >         li      r3,-1                   # d=0 return -1
> > >         bclr    BO_ALWAYS,CR0_LT
> > > .Lppcasm_div1:
> > >         cmplwi  0,r3,0                  # compare r3 and 0
> > >         bc      BO_IF_NOT,CR0_EQ,.Lppcasm_div2  # proceed if h != 0
> > >         divwu   r3,r4,r5                # ret_q = l/d
> > >         bclr    BO_ALWAYS,CR0_LT        # return result in r3
> > > .Lppcasm_div2:
> > >         divwu   r9,r3,r5                # i_q = h/d
> > >         mullw   r10,r9,r5               # i_r = h - (i_q*d)
> > >         subf    r10,r10,r3
> > >         mr      r3,r9                   # req_q = i_q
> > > .Lppcasm_set_ctr:
> > >         li      r12,32                  # ctr = bitsizeof(d)
> > >         mtctr   r12
> > > .Lppcasm_div_loop:
> > >         addc    r4,r4,r4                # l = l << 1 -> i_carry
> > >         adde    r11,r10,r10             # i_h = (i_r << 1) | i_carry
> > >         divwu   r9,r11,r5               # i_q = i_h/d
> > >         mullw   r10,r9,r5               # i_r = i_h - (i_q*d)
> > >         subf    r10,r10,r11
> > >         add     r3,r3,r3                # ret_q = ret_q << 1 | i_q
> > >         add     r3,r3,r9
> > >         bc      BO_dCTR_NZERO,CR0_EQ,.Lppcasm_div_loop
> > > .Lppc_div_end:
> > >         bclr    BO_ALWAYS,CR0_LT        # return result in r3
> > >         .long   0x00000000
> > >
> > _______________________________________________
> > Linuxppc-embedded mailing list
> > [hidden email]
> > https://ozlabs.org/mailman/listinfo/linuxppc-embedded
> >
>
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [hidden email]
Automated List Manager                           [hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: PPC bn_div_words routine rewrite

David Ho-2
Let's take first call to BN_div_word for example from BN_bn2dec, the
parameter being passed to BN_div_word is (a=35, w=1000000000) (decimal
numbers).  It then calls the bn_div_words with (h=0, l=35,
d=1000000000)  if you examine the code in linux_ppc32.s it will exit
early on because h is 0.  the routine returns a divide by 0, which is
undefined according to the manual.  In the case of ppc8xx the result
is 0x80000000.  So this is the return value from bn_div_words, as seen
in register R3.

So what happens next is BN_div_word modifies "a" (1st parameter) with
the result (0x80000000) and returns 23 as the remainder of the
division. So "a" is never zero as a result and hence the test for
BN_is_zero is always false.  The problem fails the very first time it
uses bn_div_words.

The next thing I did naturally was to fix the case when you have h=0,
which you can quite easy do it with the native divwu instruction.  Lo
and behold I was once again disappointed when h is not equal to 0.

More to come...


On 7/5/05, David Ho <[hidden email]> wrote:

> I can tell you with certainty, with reference to the function
> BN_bn2dec, that since lp is a pointer, and within the while loop
> around bn_print.c:136 lp is being incremented.  Because the test
> BN_is_zero(t) is always false, you have a pointer that is going off
> into the stratosphere, hence the segfault on ppc8xx.
>
> More analysis to come.
>
> On 7/5/05, David Ho <[hidden email]> wrote:
> > First pass debugging results from gdb on ppc8xx.  Executing ssh-keygen
> > with following arguments.
> >
> > (gdb) show args
> > Argument list to give program being debugged when it is started is
> >     "-t rsa1 -f /etc/ssh/ssh_host_key -N """.
> >
> > Program received signal SIGSEGV, Segmentation fault.
> > BN_bn2dec (a=0x1002d9f0) at bn_print.c:136
> > 136                             *lp=BN_div_word(t,BN_DEC_CONV);
> >
> > (gdb) i r
> > r0             0x0      0
> > r1             0x7fffd580       2147472768
> > r2             0x30012868       805382248
> > r3             0x80000000       2147483648
> > r4             0xfef33fc        267334652
> > r5             0x25     37
> > r6             0xfccdef8        265084664
> > r7             0x7fffd4c0       2147472576
> > r8             0xfbad2887       4222429319
> > r9             0x84044022       2214871074
> > r10            0x0      0
> > r11            0x2      2
> > r12            0xfef2054        267329620
> > r13            0x10030bc8       268635080
> > r14            0x0      0
> > r15            0x0      0
> > r16            0x0      0
> > r17            0x0      0
> > r18            0x0      0
> > r19            0x0      0
> > r20            0x0      0
> > r21            0x0      0
> > r22            0x0      0
> > r23            0x64     100
> > r24            0x5      5
> > r25            0x1002d438       268620856
> > r26            0x1002d9f0       268622320
> > r27            0x1002c578       268617080
> > r28            0x1      1
> > r29            0x10031000       268636160
> > r30            0xffbf7d0        268171216
> > r31            0x1002d9f0       268622320
> > pc             0xfef2058        267329624
> > ps             0xd032   53298
> > cr             0x24044022       604258338
> > lr             0xfef2054        267329620
> > ctr            0xfccefa0        265088928
> > xer            0x20000000       536870912
> > fpscr          0x0      0
> > vscr           0x0      0
> > vrsave         0x0      0
> >
> > (gdb) p/x $pc
> > $1 = 0xfef2058
> >
> > 0x0fef2058 <BN_bn2dec+472>:     stw     r3,0(r29)
> >
> > (gdb) x 0x10031000
> > 0x10031000:     Cannot access memory at address 0x10031000
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > On 7/5/05, David Ho <[hidden email]> wrote:
> > > This is the second confirmed report of the same problem on the ppc8xx.
> > >
> > > After reading my email.  I must say I was the unfriendly one, I
> > > apologize for that.
> > >
> > > More debugging evidence to come.
> > >
> > > ---------- Forwarded message ----------
> > > From: Murch, Christopher <[hidden email]>
> > > Date: Jul 1, 2005 9:46 AM
> > > Subject: RE: PPC bn_div_words routine rewrite
> > > To: David Ho <[hidden email]>
> > >
> > >
> > > David,
> > > I had observed the same issue on ppc 8xx machines after upgrading to the asm
> > > version of the BN routines.  Thank you very much for your work for the fix.
> > > My question is, do you have high confidence in the other new asm ppc BN
> > > routines after observing this issue or do you think they might have similiar
> > > problems?
> > > Thanks.
> > > Chris
> > >
> > > -----Original Message-----
> > > From: David Ho [mailto:[hidden email]]
> > > Sent: Thursday, June 30, 2005 6:22 PM
> > > To: [hidden email]; [hidden email]
> > > Subject: Re: PPC bn_div_words routine rewrite
> > >
> > >
> > > The reason I had to redo this routine, in case anyone is wondering, is
> > > because ssh-keygen  segfaults when this assembly routine returns junk
> > > to the BN_div_word function. On a ppc, if you issue the command
> > >
> > > ssh-keygen -t rsa1 -f /etc/ssh/ssh_host_key -N ""
> > >
> > > The program craps out when it tries to write the public key in ascii
> > > decimal.
> > >
> > > Regards,
> > > David
> > >
> > > On 6/30/05, David Ho <[hidden email]> wrote:
> > > > Hi all,
> > > >
> > > > This is a rewrite of the bn_div_words routine for the PowerPC arch,
> > > > tested on a MPC8xx processor.
> > > > I initially thought there is maybe a small mistake in the code that
> > > > requires a one-liner change but it turns out I have to redo the
> > > > routine.
> > > > I guess this routine is not called very often as I see that most other
> > > > routines are hand-crafted, whereas this routine is compiled from a C
> > > > function that apparently has not gone through a whole lot of testing.
> > > >
> > > > I wrote a C function to confirm correctness of the code.
> > > >
> > > > unsigned long div_words (unsigned long h,
> > > >                          unsigned long l,
> > > >                          unsigned long d)
> > > > {
> > > >   unsigned long i_h; /* intermediate dividend */
> > > >   unsigned long i_q; /* quotient of i_h/d */
> > > >   unsigned long i_r; /* remainder of i_h/d */
> > > >
> > > >   unsigned long i_cntr;
> > > >   unsigned long i_carry;
> > > >
> > > >   unsigned long ret_q; /* return quotient */
> > > >
> > > >   /* cannot divide by zero */
> > > >   if (d == 0) return 0xffffffff;
> > > >
> > > >   /* do simple 32-bit divide */
> > > >   if (h == 0) return l/d;
> > > >
> > > >   i_q = h/d;
> > > >   i_r = h - (i_q*d);
> > > >   ret_q = i_q;
> > > >
> > > >   i_cntr = 32;
> > > >
> > > >   while (i_cntr--)
> > > >   {
> > > >     i_carry = (l & 0x80000000) ? 1:0;
> > > >     l = l << 1;
> > > >
> > > >     i_h = (i_r << 1) | i_carry;
> > > >     i_q = i_h/d;
> > > >     i_r = i_h - (i_q*d);
> > > >
> > > >     ret_q = (ret_q << 1) | i_q;
> > > >   }
> > > >
> > > >   return ret_q;
> > > > }
> > > >
> > > >
> > > > Then I handcrafted the routine in PPC assembly.
> > > > The result is a 26 line assembly that is easy to understand and
> > > > predictable as opposed to a 81liner that I am still trying to
> > > > decipher...
> > > > If anyone is interested in incorporating this routine to the openssl
> > > > code I'll be happy to assist.
> > > > At this point I think I will be taking a bit of a break from this 3
> > > > day debugging/fixing marathon.
> > > >
> > > > Regards,
> > > > David Ho
> > > >
> > > >
> > > > #
> > > > #       Handcrafted version of bn_div_words
> > > > #
> > > > #       r3 = h
> > > > #       r4 = l
> > > > #       r5 = d
> > > >
> > > >         cmplwi  0,r5,0                  # compare r5 and 0
> > > >         bc      BO_IF_NOT,CR0_EQ,.Lppcasm_div1  # proceed if d!=0
> > > >         li      r3,-1                   # d=0 return -1
> > > >         bclr    BO_ALWAYS,CR0_LT
> > > > .Lppcasm_div1:
> > > >         cmplwi  0,r3,0                  # compare r3 and 0
> > > >         bc      BO_IF_NOT,CR0_EQ,.Lppcasm_div2  # proceed if h != 0
> > > >         divwu   r3,r4,r5                # ret_q = l/d
> > > >         bclr    BO_ALWAYS,CR0_LT        # return result in r3
> > > > .Lppcasm_div2:
> > > >         divwu   r9,r3,r5                # i_q = h/d
> > > >         mullw   r10,r9,r5               # i_r = h - (i_q*d)
> > > >         subf    r10,r10,r3
> > > >         mr      r3,r9                   # req_q = i_q
> > > > .Lppcasm_set_ctr:
> > > >         li      r12,32                  # ctr = bitsizeof(d)
> > > >         mtctr   r12
> > > > .Lppcasm_div_loop:
> > > >         addc    r4,r4,r4                # l = l << 1 -> i_carry
> > > >         adde    r11,r10,r10             # i_h = (i_r << 1) | i_carry
> > > >         divwu   r9,r11,r5               # i_q = i_h/d
> > > >         mullw   r10,r9,r5               # i_r = i_h - (i_q*d)
> > > >         subf    r10,r10,r11
> > > >         add     r3,r3,r3                # ret_q = ret_q << 1 | i_q
> > > >         add     r3,r3,r9
> > > >         bc      BO_dCTR_NZERO,CR0_EQ,.Lppcasm_div_loop
> > > > .Lppc_div_end:
> > > >         bclr    BO_ALWAYS,CR0_LT        # return result in r3
> > > >         .long   0x00000000
> > > >
> > > _______________________________________________
> > > Linuxppc-embedded mailing list
> > > [hidden email]
> > > https://ozlabs.org/mailman/listinfo/linuxppc-embedded
> > >
> >
>
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [hidden email]
Automated List Manager                           [hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: PPC bn_div_words routine rewrite

Andy Polyakov
> Let's take first call to BN_div_word for example from BN_bn2dec, the
> parameter being passed to BN_div_word is (a=35, w=1000000000) (decimal
> numbers).  It then calls the bn_div_words with (h=0, l=35,
> d=1000000000)  if you examine the code in linux_ppc32.s it will exit
> early on because h is 0.  the routine returns a divide by 0,  which is
> undefined according to the manual.  In the case of ppc8xx the result
> is 0x80000000.

And on the PPC machine I have access to it returns 0. This is
explanation why I never experienced any SEGV, but a sparse decimal
output. And it does explain why BN_is_zero condition never met on your
system and you hit sbrk(0) limit and suffer the penalty. However! Note
that updated routine,
http://cvs.openssl.org/getfile/openssl/crypto/bn/asm/ppc.pl?v=1.4 never
issues divide by 0 [it traps instead, but the condition is never met now
when called from BN_div_words] and it does return correct answer to me.
Can you really confirm that updated subroutine doesn't work for you? And
if so, how does problem manifest? Still SEGV? At same point?

It should pointed out that bug in ppc.pl is encountered only in 0.9.7
context, as 0.9.8 avoids it by normalizing divisor [and adjusting
dividend accordingly]. BTW, I can confirm that 0.9.7 produces correct
decimal ASCII with your routine [but no luck with make test_bn], but in
0.9.8 context decimal printout comes out truncated [not sparse with some
significant digits there and there, but truncated] if your routine is
pasted in. A.
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [hidden email]
Automated List Manager                           [hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: PPC bn_div_words routine rewrite

David Ho-2
In reply to this post by David Ho-2
Okay, having actually did what Andy suggested, i.e. the one liner fix
in the assembly code, bn_div_words returns the correct results.

At this point, my conclusion is, up to openssl-0.9.8-beta6,  the ppc32
bn_div_words routine generated from crypto/bn/ppc.pl is still busted.
Your solution is either to use the one liner fix andy provided in his
reply or my routine.  My solution is a complete rewrite of the
routine, which is tested to work on ppc8xx.  I personally do not know
why it would not work on other ppc32 systems but if you have time to
spare you are welcome to give it a try on your ppc32, because it is by
far the most straight-forward algorithm for division.  If you have
done long division on paper from your primary school days, this
function mimics exactly the long division steps in binary (as opposed
to decimal).  Way easier to fix if you know what the algorithm is
trying to do.

Comments I have for the old routine are:

Why do you signal an overflow condition when it appears functions that
call bn_div_words do not check for overflow conditions?  That's why in
my routine I just let the register overflow.

Why the complexity, does it offer performance advantage?  Obviously
code size is huge compared to mine.

David










On 7/5/05, David Ho <[hidden email]> wrote:

> Let's take first call to BN_div_word for example from BN_bn2dec, the
> parameter being passed to BN_div_word is (a=35, w=1000000000) (decimal
> numbers).  It then calls the bn_div_words with (h=0, l=35,
> d=1000000000)  if you examine the code in linux_ppc32.s it will exit
> early on because h is 0.  the routine returns a divide by 0, which is
> undefined according to the manual.  In the case of ppc8xx the result
> is 0x80000000.  So this is the return value from bn_div_words, as seen
> in register R3.
>
> So what happens next is BN_div_word modifies "a" (1st parameter) with
> the result (0x80000000) and returns 23 as the remainder of the
> division. So "a" is never zero as a result and hence the test for
> BN_is_zero is always false.  The problem fails the very first time it
> uses bn_div_words.
>
> The next thing I did naturally was to fix the case when you have h=0,
> which you can quite easy do it with the native divwu instruction.  Lo
> and behold I was once again disappointed when h is not equal to 0.
>
> More to come...
>
>
> On 7/5/05, David Ho <[hidden email]> wrote:
> > I can tell you with certainty, with reference to the function
> > BN_bn2dec, that since lp is a pointer, and within the while loop
> > around bn_print.c:136 lp is being incremented.  Because the test
> > BN_is_zero(t) is always false, you have a pointer that is going off
> > into the stratosphere, hence the segfault on ppc8xx.
> >
> > More analysis to come.
> >
> > On 7/5/05, David Ho <[hidden email]> wrote:
> > > First pass debugging results from gdb on ppc8xx.  Executing ssh-keygen
> > > with following arguments.
> > >
> > > (gdb) show args
> > > Argument list to give program being debugged when it is started is
> > >     "-t rsa1 -f /etc/ssh/ssh_host_key -N """.
> > >
> > > Program received signal SIGSEGV, Segmentation fault.
> > > BN_bn2dec (a=0x1002d9f0) at bn_print.c:136
> > > 136                             *lp=BN_div_word(t,BN_DEC_CONV);
> > >
> > > (gdb) i r
> > > r0             0x0      0
> > > r1             0x7fffd580       2147472768
> > > r2             0x30012868       805382248
> > > r3             0x80000000       2147483648
> > > r4             0xfef33fc        267334652
> > > r5             0x25     37
> > > r6             0xfccdef8        265084664
> > > r7             0x7fffd4c0       2147472576
> > > r8             0xfbad2887       4222429319
> > > r9             0x84044022       2214871074
> > > r10            0x0      0
> > > r11            0x2      2
> > > r12            0xfef2054        267329620
> > > r13            0x10030bc8       268635080
> > > r14            0x0      0
> > > r15            0x0      0
> > > r16            0x0      0
> > > r17            0x0      0
> > > r18            0x0      0
> > > r19            0x0      0
> > > r20            0x0      0
> > > r21            0x0      0
> > > r22            0x0      0
> > > r23            0x64     100
> > > r24            0x5      5
> > > r25            0x1002d438       268620856
> > > r26            0x1002d9f0       268622320
> > > r27            0x1002c578       268617080
> > > r28            0x1      1
> > > r29            0x10031000       268636160
> > > r30            0xffbf7d0        268171216
> > > r31            0x1002d9f0       268622320
> > > pc             0xfef2058        267329624
> > > ps             0xd032   53298
> > > cr             0x24044022       604258338
> > > lr             0xfef2054        267329620
> > > ctr            0xfccefa0        265088928
> > > xer            0x20000000       536870912
> > > fpscr          0x0      0
> > > vscr           0x0      0
> > > vrsave         0x0      0
> > >
> > > (gdb) p/x $pc
> > > $1 = 0xfef2058
> > >
> > > 0x0fef2058 <BN_bn2dec+472>:     stw     r3,0(r29)
> > >
> > > (gdb) x 0x10031000
> > > 0x10031000:     Cannot access memory at address 0x10031000
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > On 7/5/05, David Ho <[hidden email]> wrote:
> > > > This is the second confirmed report of the same problem on the ppc8xx.
> > > >
> > > > After reading my email.  I must say I was the unfriendly one, I
> > > > apologize for that.
> > > >
> > > > More debugging evidence to come.
> > > >
> > > > ---------- Forwarded message ----------
> > > > From: Murch, Christopher <[hidden email]>
> > > > Date: Jul 1, 2005 9:46 AM
> > > > Subject: RE: PPC bn_div_words routine rewrite
> > > > To: David Ho <[hidden email]>
> > > >
> > > >
> > > > David,
> > > > I had observed the same issue on ppc 8xx machines after upgrading to the asm
> > > > version of the BN routines.  Thank you very much for your work for the fix.
> > > > My question is, do you have high confidence in the other new asm ppc BN
> > > > routines after observing this issue or do you think they might have similiar
> > > > problems?
> > > > Thanks.
> > > > Chris
> > > >
> > > > -----Original Message-----
> > > > From: David Ho [mailto:[hidden email]]
> > > > Sent: Thursday, June 30, 2005 6:22 PM
> > > > To: [hidden email]; [hidden email]
> > > > Subject: Re: PPC bn_div_words routine rewrite
> > > >
> > > >
> > > > The reason I had to redo this routine, in case anyone is wondering, is
> > > > because ssh-keygen  segfaults when this assembly routine returns junk
> > > > to the BN_div_word function. On a ppc, if you issue the command
> > > >
> > > > ssh-keygen -t rsa1 -f /etc/ssh/ssh_host_key -N ""
> > > >
> > > > The program craps out when it tries to write the public key in ascii
> > > > decimal.
> > > >
> > > > Regards,
> > > > David
> > > >
> > > > On 6/30/05, David Ho <[hidden email]> wrote:
> > > > > Hi all,
> > > > >
> > > > > This is a rewrite of the bn_div_words routine for the PowerPC arch,
> > > > > tested on a MPC8xx processor.
> > > > > I initially thought there is maybe a small mistake in the code that
> > > > > requires a one-liner change but it turns out I have to redo the
> > > > > routine.
> > > > > I guess this routine is not called very often as I see that most other
> > > > > routines are hand-crafted, whereas this routine is compiled from a C
> > > > > function that apparently has not gone through a whole lot of testing.
> > > > >
> > > > > I wrote a C function to confirm correctness of the code.
> > > > >
> > > > > unsigned long div_words (unsigned long h,
> > > > >                          unsigned long l,
> > > > >                          unsigned long d)
> > > > > {
> > > > >   unsigned long i_h; /* intermediate dividend */
> > > > >   unsigned long i_q; /* quotient of i_h/d */
> > > > >   unsigned long i_r; /* remainder of i_h/d */
> > > > >
> > > > >   unsigned long i_cntr;
> > > > >   unsigned long i_carry;
> > > > >
> > > > >   unsigned long ret_q; /* return quotient */
> > > > >
> > > > >   /* cannot divide by zero */
> > > > >   if (d == 0) return 0xffffffff;
> > > > >
> > > > >   /* do simple 32-bit divide */
> > > > >   if (h == 0) return l/d;
> > > > >
> > > > >   i_q = h/d;
> > > > >   i_r = h - (i_q*d);
> > > > >   ret_q = i_q;
> > > > >
> > > > >   i_cntr = 32;
> > > > >
> > > > >   while (i_cntr--)
> > > > >   {
> > > > >     i_carry = (l & 0x80000000) ? 1:0;
> > > > >     l = l << 1;
> > > > >
> > > > >     i_h = (i_r << 1) | i_carry;
> > > > >     i_q = i_h/d;
> > > > >     i_r = i_h - (i_q*d);
> > > > >
> > > > >     ret_q = (ret_q << 1) | i_q;
> > > > >   }
> > > > >
> > > > >   return ret_q;
> > > > > }
> > > > >
> > > > >
> > > > > Then I handcrafted the routine in PPC assembly.
> > > > > The result is a 26 line assembly that is easy to understand and
> > > > > predictable as opposed to a 81liner that I am still trying to
> > > > > decipher...
> > > > > If anyone is interested in incorporating this routine to the openssl
> > > > > code I'll be happy to assist.
> > > > > At this point I think I will be taking a bit of a break from this 3
> > > > > day debugging/fixing marathon.
> > > > >
> > > > > Regards,
> > > > > David Ho
> > > > >
> > > > >
> > > > > #
> > > > > #       Handcrafted version of bn_div_words
> > > > > #
> > > > > #       r3 = h
> > > > > #       r4 = l
> > > > > #       r5 = d
> > > > >
> > > > >         cmplwi  0,r5,0                  # compare r5 and 0
> > > > >         bc      BO_IF_NOT,CR0_EQ,.Lppcasm_div1  # proceed if d!=0
> > > > >         li      r3,-1                   # d=0 return -1
> > > > >         bclr    BO_ALWAYS,CR0_LT
> > > > > .Lppcasm_div1:
> > > > >         cmplwi  0,r3,0                  # compare r3 and 0
> > > > >         bc      BO_IF_NOT,CR0_EQ,.Lppcasm_div2  # proceed if h != 0
> > > > >         divwu   r3,r4,r5                # ret_q = l/d
> > > > >         bclr    BO_ALWAYS,CR0_LT        # return result in r3
> > > > > .Lppcasm_div2:
> > > > >         divwu   r9,r3,r5                # i_q = h/d
> > > > >         mullw   r10,r9,r5               # i_r = h - (i_q*d)
> > > > >         subf    r10,r10,r3
> > > > >         mr      r3,r9                   # req_q = i_q
> > > > > .Lppcasm_set_ctr:
> > > > >         li      r12,32                  # ctr = bitsizeof(d)
> > > > >         mtctr   r12
> > > > > .Lppcasm_div_loop:
> > > > >         addc    r4,r4,r4                # l = l << 1 -> i_carry
> > > > >         adde    r11,r10,r10             # i_h = (i_r << 1) | i_carry
> > > > >         divwu   r9,r11,r5               # i_q = i_h/d
> > > > >         mullw   r10,r9,r5               # i_r = i_h - (i_q*d)
> > > > >         subf    r10,r10,r11
> > > > >         add     r3,r3,r3                # ret_q = ret_q << 1 | i_q
> > > > >         add     r3,r3,r9
> > > > >         bc      BO_dCTR_NZERO,CR0_EQ,.Lppcasm_div_loop
> > > > > .Lppc_div_end:
> > > > >         bclr    BO_ALWAYS,CR0_LT        # return result in r3
> > > > >         .long   0x00000000
> > > > >
> > > > _______________________________________________
> > > > Linuxppc-embedded mailing list
> > > > [hidden email]
> > > > https://ozlabs.org/mailman/listinfo/linuxppc-embedded
> > > >
> > >
> >
>
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [hidden email]
Automated List Manager                           [hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: PPC bn_div_words routine rewrite

Andy Polyakov
> Okay, having actually did what Andy suggested, i.e. the one liner fix
> in the assembly code, bn_div_words returns the correct results.

Note that the final version, one committed to all relevant OpenSSL
branches since couple of days ago and one which actually made to just
released 0.9.8, is a bit different from originally suggested one-line
fix, see for example http://cvs.openssl.org/chngview?cn=14199.

> At this point, my conclusion is, up to openssl-0.9.8-beta6,  the ppc32
> bn_div_words routine generated from crypto/bn/ppc.pl is still busted.

Yes. Though it should be noted that 0.9.8 was inadvertently avoiding the
bug condition. Recall that original problem report was for 0.9.7.

> Why do you signal an overflow condition when it appears functions that
> call bn_div_words do not check for overflow conditions?

That's question to IBM. By the time they submitted the code, I've
explicitly asked what would be appropriate way to generate *fatal*
condition at that point, i.e. one which would result in a core dump, and
it came out as division by 0 instruction. By that time I had no access
to any PPC machine and had to just go with it. Now it actually came as
surprise that division by 0 does not raise an exception, but silently
returns implementation-specific value... A.
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [hidden email]
Automated List Manager                           [hidden email]
Reply | Threaded
Open this post in threaded view
|

debug version of 0.9.8

David Ho-2
How do I build a debug version of 0.9.8?

David

On 7/5/05, Andy Polyakov <[hidden email]> wrote:

> > Okay, having actually did what Andy suggested, i.e. the one liner fix
> > in the assembly code, bn_div_words returns the correct results.
>
> Note that the final version, one committed to all relevant OpenSSL
> branches since couple of days ago and one which actually made to just
> released 0.9.8, is a bit different from originally suggested one-line
> fix, see for example http://cvs.openssl.org/chngview?cn=14199.
>
> > At this point, my conclusion is, up to openssl-0.9.8-beta6,  the ppc32
> > bn_div_words routine generated from crypto/bn/ppc.pl is still busted.
>
> Yes. Though it should be noted that 0.9.8 was inadvertently avoiding the
> bug condition. Recall that original problem report was for 0.9.7.
>
> > Why do you signal an overflow condition when it appears functions that
> > call bn_div_words do not check for overflow conditions?
>
> That's question to IBM. By the time they submitted the code, I've
> explicitly asked what would be appropriate way to generate *fatal*
> condition at that point, i.e. one which would result in a core dump, and
> it came out as division by 0 instruction. By that time I had no access
> to any PPC machine and had to just go with it. Now it actually came as
> surprise that division by 0 does not raise an exception, but silently
> returns implementation-specific value... A.
> _______________________________________________
> Linuxppc-embedded mailing list
> [hidden email]
> https://ozlabs.org/mailman/listinfo/linuxppc-embedded
>
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [hidden email]
Automated List Manager                           [hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: debug version of 0.9.8

David Ho-2
My configure script

./Configure shared debug \
        --prefix=/usr \
        --openssldir=/usr/share/ssl \
        linux-ppc

no longer works on 0.9.8

David

On 7/6/05, David Ho <[hidden email]> wrote:

> How do I build a debug version of 0.9.8?
>
> David
>
> On 7/5/05, Andy Polyakov <[hidden email]> wrote:
> > > Okay, having actually did what Andy suggested, i.e. the one liner fix
> > > in the assembly code, bn_div_words returns the correct results.
> >
> > Note that the final version, one committed to all relevant OpenSSL
> > branches since couple of days ago and one which actually made to just
> > released 0.9.8, is a bit different from originally suggested one-line
> > fix, see for example http://cvs.openssl.org/chngview?cn=14199.
> >
> > > At this point, my conclusion is, up to openssl-0.9.8-beta6,  the ppc32
> > > bn_div_words routine generated from crypto/bn/ppc.pl is still busted.
> >
> > Yes. Though it should be noted that 0.9.8 was inadvertently avoiding the
> > bug condition. Recall that original problem report was for 0.9.7.
> >
> > > Why do you signal an overflow condition when it appears functions that
> > > call bn_div_words do not check for overflow conditions?
> >
> > That's question to IBM. By the time they submitted the code, I've
> > explicitly asked what would be appropriate way to generate *fatal*
> > condition at that point, i.e. one which would result in a core dump, and
> > it came out as division by 0 instruction. By that time I had no access
> > to any PPC machine and had to just go with it. Now it actually came as
> > surprise that division by 0 does not raise an exception, but silently
> > returns implementation-specific value... A.
> > _______________________________________________
> > Linuxppc-embedded mailing list
> > [hidden email]
> > https://ozlabs.org/mailman/listinfo/linuxppc-embedded
> >
>
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [hidden email]
Automated List Manager                           [hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: debug version of 0.9.8

David Ho-2
Once again, for the newcomers who has no choice but to debug openssl
for whatever reasons.  ppc32 assumed.

./Configure -g shared [linux-ppc]

will get you a debug version of 0.9.8.  Correct me if I am wrong.

David


On 7/6/05, David Ho <[hidden email]> wrote:

> My configure script
>
> ./Configure shared debug \
>         --prefix=/usr \
>         --openssldir=/usr/share/ssl \
>         linux-ppc
>
> no longer works on 0.9.8
>
> David
>
> On 7/6/05, David Ho <[hidden email]> wrote:
> > How do I build a debug version of 0.9.8?
> >
> > David
> >
> > On 7/5/05, Andy Polyakov <[hidden email]> wrote:
> > > > Okay, having actually did what Andy suggested, i.e. the one liner fix
> > > > in the assembly code, bn_div_words returns the correct results.
> > >
> > > Note that the final version, one committed to all relevant OpenSSL
> > > branches since couple of days ago and one which actually made to just
> > > released 0.9.8, is a bit different from originally suggested one-line
> > > fix, see for example http://cvs.openssl.org/chngview?cn=14199.
> > >
> > > > At this point, my conclusion is, up to openssl-0.9.8-beta6,  the ppc32
> > > > bn_div_words routine generated from crypto/bn/ppc.pl is still busted.
> > >
> > > Yes. Though it should be noted that 0.9.8 was inadvertently avoiding the
> > > bug condition. Recall that original problem report was for 0.9.7.
> > >
> > > > Why do you signal an overflow condition when it appears functions that
> > > > call bn_div_words do not check for overflow conditions?
> > >
> > > That's question to IBM. By the time they submitted the code, I've
> > > explicitly asked what would be appropriate way to generate *fatal*
> > > condition at that point, i.e. one which would result in a core dump, and
> > > it came out as division by 0 instruction. By that time I had no access
> > > to any PPC machine and had to just go with it. Now it actually came as
> > > surprise that division by 0 does not raise an exception, but silently
> > > returns implementation-specific value... A.
> > > _______________________________________________
> > > Linuxppc-embedded mailing list
> > > [hidden email]
> > > https://ozlabs.org/mailman/listinfo/linuxppc-embedded
> > >
> >
>
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [hidden email]
Automated List Manager                           [hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: debug version of 0.9.8

Andy Polyakov
To: <[hidden email]>
Cc: <[hidden email]>

By addressing me personally you discourage others from replying and
openssl-dev is *more* than enough.

> Once again, for the newcomers who has no choice but to debug openssl
> for whatever reasons.  ppc32 assumed.
>
> ./Configure -g shared [linux-ppc]
>
> will get you a debug version of 0.9.8.  Correct me if I am wrong.

./config -g [shared] or ./Configure linux-ppc -g [shared]. I mean you
can't say that linux-ppc is optional with ./Configure by placing it in
square brackets, as it shall fail without platform identification line. A.
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [hidden email]
Automated List Manager                           [hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: PPC bn_div_words routine rewrite

David Ho-2
In reply to this post by Andy Polyakov
Please do not use previously mentioned routine, it missed 1 corner
case where 32=num_bits_word(d)

Revised routine that passes (cd test; make bntest).  
All I had to do is add one more instruction to the routine.

Please test on your ppc32 machines.

Once we are all happy, it's a matter of adding the core dump at the beginning.  
Thus you have a fast, easy to understand, predictable bn_div_words, as
opposed to that monster in 0.9.8.

#
#       Handcrafted version of bn_div_words
#
#       r3 = h
#       r4 = l
#       r5 = d

        cmplwi  0,r5,0                  # compare r5 and 0
        bc      BO_IF_NOT,CR0_EQ,.Lppcasm_div1  # proceed if d!=0
        li      r3,-1                   # d=0 return -1
        bclr    BO_ALWAYS,CR0_LT
.Lppcasm_div1:
        cmplwi  0,r3,0                  # compare r3 and 0
        bc      BO_IF_NOT,CR0_EQ,.Lppcasm_div2  # proceed if h != 0
        divwu   r3,r4,r5                # ret_q = l/d
        bclr    BO_ALWAYS,CR0_LT        # return result in r3
.Lppcasm_div2:
        divwu   r9,r3,r5                # i_q = h/d
        mullw   r10,r9,r5               # i_r = h - (i_q*d)
        subf    r10,r10,r3
        mr      r3,r9                   # req_q = i_q
.Lppcasm_set_ctr:
        li      r12,32                  # ctr = bitsizeof(d)
        mtctr   r12
.Lppcasm_div_loop:
        addc    r4,r4,r4                # l = l << 1 -> i_carry
        adde    r11,r10,r10             # i_h = (i_r << 1) | i_carry
        divwu   r9,r11,r5               # i_q = i_h/d
        addze   r9,r9                   # very important! - DKWH
        mullw   r10,r9,r5               # i_r = i_h - (i_q*d)
        subf    r10,r10,r11
        add     r3,r3,r3                # ret_q = ret_q << 1 | i_q
        add     r3,r3,r9
        bc      BO_dCTR_NZERO,CR0_EQ,.Lppcasm_div_loop
.Lppc_div_end:
        bclr    BO_ALWAYS,CR0_LT        # return result in r3
        .long   0x00000000


Regards,
David


On 7/5/05, Peter Waltenberg <[hidden email]> wrote:

>  
> Thanks for finding and fixing this.  Particularly for finding and fixing it
> before 0.9.8 hit the streets.
>  
> Peter
>  
> Peter Waltenberg
>  Architect
>  IBM Crypto for C Team
>  IBM/Tivoli Gold Coast Office
>  
>  
>  
>  
>  Andy Polyakov <[hidden email]>
> Sent by: [hidden email]
>
> 06/07/2005 07:49 AM
>  
> Please respond to
>  openssl-dev
>  
>  
> To [hidden email]
>  
> cc [hidden email]
>  
> Subject Re: PPC bn_div_words routine rewrite
>  
>  
>  
>  
>  
> > Okay, having actually did what Andy suggested, i.e. the one liner fix
>  > in the assembly code, bn_div_words returns the correct results.
>  
>  Note that the final version, one committed to all relevant OpenSSL
>  branches since couple of days ago and one which actually made to just
>  released 0.9.8, is a bit different from originally suggested one-line
>  fix, see for example
> http://cvs.openssl.org/chngview?cn=14199.
>  
>  > At this point, my conclusion is, up to openssl-0.9.8-beta6,  the ppc32
>  > bn_div_words routine generated from crypto/bn/ppc.pl is still busted.
>  
>  Yes. Though it should be noted that 0.9.8 was inadvertently avoiding the
>  bug condition. Recall that original problem report was for 0.9.7.
>  
>  > Why do you signal an overflow condition when it appears functions that
>  > call bn_div_words do not check for overflow conditions?
>  
>  That's question to IBM. By the time they submitted the code, I've
>  explicitly asked what would be appropriate way to generate *fatal*
>  condition at that point, i.e. one which would result in a core dump, and
>  it came out as division by 0 instruction. By that time I had no access
>  to any PPC machine and had to just go with it. Now it actually came as
>  surprise that division by 0 does not raise an exception, but silently
>  returns implementation-specific value... A.
> ______________________________________________________________________
>  OpenSSL Project                                 http://www.openssl.org
>  Development Mailing List                       [hidden email]
>  Automated List Manager                           [hidden email]
>  
>
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [hidden email]
Automated List Manager                           [hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: PPC bn_div_words routine rewrite

David Ho-2
C function corresponding to assembly routine below.
It's provided to ease review of the assembly.
Other architectures will benefit if this C function is used in bn_asm.c

Regards,
David

unsigned long div_words (unsigned long h,
                         unsigned long l,
                         unsigned long d)
{
  unsigned long i_h; /* intermediate dividend */
  unsigned long i_q; /* quotient of i/d */
  unsigned long i_r; /* remainder of i/d */

  unsigned long i_cntr;
  unsigned long i_carry;
  unsigned long i_overflow;

  unsigned long ret_q; /* return quotient */

  /* cannot divide by zero */
  if (d == 0) return 0xffffffff;

  /* do simple 32-bit divide */
  if (h == 0) return l/d;
     
  i_q = h/d;
  i_r = h - (i_q*d);
  ret_q = i_q;

  i_cntr = 32;

  while (i_cntr--)
  {
    i_carry = (l & 0x80000000) ? 1:0;
    l = l << 1;

    i_overflow = (i_r & 0x80000000) ? 1:0;
    i_h = (i_r << 1) | i_carry;
    i_q = i_h/d;
    i_q = i_q + i_overflow;
    i_s = i_q*d;
    i_r = i_h - (i_q*d);

    ret_q = (ret_q << 1) | i_q;

  }

  return ret_q;
}

On 7/7/05, David Ho <[hidden email]> wrote:

> Please do not use previously mentioned routine, it missed 1 corner
> case where 32=num_bits_word(d)
>
> Revised routine that passes (cd test; make bntest).
> All I had to do is add one more instruction to the routine.
>
> Please test on your ppc32 machines.
>
> Once we are all happy, it's a matter of adding the core dump at the beginning.
> Thus you have a fast, easy to understand, predictable bn_div_words, as
> opposed to that monster in 0.9.8.
>
> #
> #       Handcrafted version of bn_div_words
> #
> #       r3 = h
> #       r4 = l
> #       r5 = d
>
>         cmplwi  0,r5,0                  # compare r5 and 0
>         bc      BO_IF_NOT,CR0_EQ,.Lppcasm_div1  # proceed if d!=0
>         li      r3,-1                   # d=0 return -1
>         bclr    BO_ALWAYS,CR0_LT
> .Lppcasm_div1:
>         cmplwi  0,r3,0                  # compare r3 and 0
>         bc      BO_IF_NOT,CR0_EQ,.Lppcasm_div2  # proceed if h != 0
>         divwu   r3,r4,r5                # ret_q = l/d
>         bclr    BO_ALWAYS,CR0_LT        # return result in r3
> .Lppcasm_div2:
>         divwu   r9,r3,r5                # i_q = h/d
>         mullw   r10,r9,r5               # i_r = h - (i_q*d)
>         subf    r10,r10,r3
>         mr      r3,r9                   # req_q = i_q
> .Lppcasm_set_ctr:
>         li      r12,32                  # ctr = bitsizeof(d)
>         mtctr   r12
> .Lppcasm_div_loop:
>         addc    r4,r4,r4                # l = l << 1 -> i_carry
>         adde    r11,r10,r10             # i_h = (i_r << 1) | i_carry
>         divwu   r9,r11,r5               # i_q = i_h/d
>         addze   r9,r9                   # very important! - DKWH
>         mullw   r10,r9,r5               # i_r = i_h - (i_q*d)
>         subf    r10,r10,r11
>         add     r3,r3,r3                # ret_q = ret_q << 1 | i_q
>         add     r3,r3,r9
>         bc      BO_dCTR_NZERO,CR0_EQ,.Lppcasm_div_loop
> .Lppc_div_end:
>         bclr    BO_ALWAYS,CR0_LT        # return result in r3
>         .long   0x00000000
>
>
> Regards,
> David
>
>
> On 7/5/05, Peter Waltenberg <[hidden email]> wrote:
> >
> > Thanks for finding and fixing this.  Particularly for finding and fixing it
> > before 0.9.8 hit the streets.
> >
> > Peter
> >
> > Peter Waltenberg
> >  Architect
> >  IBM Crypto for C Team
> >  IBM/Tivoli Gold Coast Office
> >
> >
> >
> >
> >  Andy Polyakov <[hidden email]>
> > Sent by: [hidden email]
> >
> > 06/07/2005 07:49 AM
> >
> > Please respond to
> >  openssl-dev
> >
> >
> > To [hidden email]
> >
> > cc [hidden email]
> >
> > Subject Re: PPC bn_div_words routine rewrite
> >
> >
> >
> >
> >
> > > Okay, having actually did what Andy suggested, i.e. the one liner fix
> >  > in the assembly code, bn_div_words returns the correct results.
> >
> >  Note that the final version, one committed to all relevant OpenSSL
> >  branches since couple of days ago and one which actually made to just
> >  released 0.9.8, is a bit different from originally suggested one-line
> >  fix, see for example
> > http://cvs.openssl.org/chngview?cn=14199.
> >
> >  > At this point, my conclusion is, up to openssl-0.9.8-beta6,  the ppc32
> >  > bn_div_words routine generated from crypto/bn/ppc.pl is still busted.
> >
> >  Yes. Though it should be noted that 0.9.8 was inadvertently avoiding the
> >  bug condition. Recall that original problem report was for 0.9.7.
> >
> >  > Why do you signal an overflow condition when it appears functions that
> >  > call bn_div_words do not check for overflow conditions?
> >
> >  That's question to IBM. By the time they submitted the code, I've
> >  explicitly asked what would be appropriate way to generate *fatal*
> >  condition at that point, i.e. one which would result in a core dump, and
> >  it came out as division by 0 instruction. By that time I had no access
> >  to any PPC machine and had to just go with it. Now it actually came as
> >  surprise that division by 0 does not raise an exception, but silently
> >  returns implementation-specific value... A.
> > ______________________________________________________________________
> >  OpenSSL Project                                 http://www.openssl.org
> >  Development Mailing List                       [hidden email]
> >  Automated List Manager                           [hidden email]
> >
> >
>
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [hidden email]
Automated List Manager                           [hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: PPC bn_div_words routine rewrite

Andy Polyakov
In reply to this post by David Ho-2
> Please do not use previously mentioned routine, it missed 1 corner
> case where 32=num_bits_word(d)
>
> Revised routine that passes (cd test; make bntest).  

Does it mean that previous version didn't actually pass the test? I mean
if it did on your CPU, but not mine, probably we could learn something
else about ways PPC can be implemented...

> All I had to do is add one more instruction to the routine.
>
> Please test on your ppc32 machines.
>
> Once we are all happy,

Is this your agenda? Make everybody happy:-):-):-) Good luck:-):-):-)

> it's a matter of adding the core dump at the beginning.  
> Thus you have a fast,

32*(div latency + mul latency) is fast? If I call BN_bn2dec in loop it
spins 4 times slower than with current implementation. Well, at least on
computer I have access to...

> easy to understand, predictable bn_div_words, as
> opposed to that monster in 0.9.8.

Hostility again? Are you saying that nobody understands current
implementation and that it produces unpredictable results? I disagree:-)

> Other architectures will benefit if this C function is used in bn_asm.c

How? And which architectures exactly? Virtually all 32-bit
architectures, including PPC32, opt for
(BN_ULONG)(((((BN_ULLONG)h)<<BN_BITS2)|l)/(BN_ULLONG)d). A.
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [hidden email]
Automated List Manager                           [hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: PPC bn_div_words routine rewrite

David Ho-2
Forgive my lack of knowledge in your existing code.  But it is really
designed with optimization in mind?  What was the driving force for
the C function?

If it is optimized what is the time required?

I jumped way to early at the "fast" conclusion I must admit. Because I
really never had speed in mind.  As I explained my goal is to make it
easy to understand.  If it has any performance advantage it is purely
a side effect. (You never answer my comment about performance in my
last email so I can only guess what the design intent was for you
code).

I mean if you choose to optimize my code for speed, it's perfectly
doable and I have full comfidence anyone else who have read this email
thread can do it.  But again, I have no idea how much time you spend
on your routine so I guess I should refrain from dissing it.  My
mistake once again.

What else will you be teaching me today? =)

David

On 7/8/05, Andy Polyakov <[hidden email]> wrote:

> > Please do not use previously mentioned routine, it missed 1 corner
> > case where 32=num_bits_word(d)
> >
> > Revised routine that passes (cd test; make bntest).
>
> Does it mean that previous version didn't actually pass the test? I mean
> if it did on your CPU, but not mine, probably we could learn something
> else about ways PPC can be implemented...
>
> > All I had to do is add one more instruction to the routine.
> >
> > Please test on your ppc32 machines.
> >
> > Once we are all happy,
>
> Is this your agenda? Make everybody happy:-):-):-) Good luck:-):-):-)
>
> > it's a matter of adding the core dump at the beginning.
> > Thus you have a fast,
>
> 32*(div latency + mul latency) is fast? If I call BN_bn2dec in loop it
> spins 4 times slower than with current implementation. Well, at least on
> computer I have access to...
>
> > easy to understand, predictable bn_div_words, as
> > opposed to that monster in 0.9.8.
>
> Hostility again? Are you saying that nobody understands current
> implementation and that it produces unpredictable results? I disagree:-)
>
> > Other architectures will benefit if this C function is used in bn_asm.c
>
> How? And which architectures exactly? Virtually all 32-bit
> architectures, including PPC32, opt for
> (BN_ULONG)(((((BN_ULLONG)h)<<BN_BITS2)|l)/(BN_ULLONG)d). A.
> ______________________________________________________________________
> OpenSSL Project                                 http://www.openssl.org
> Development Mailing List                       [hidden email]
> Automated List Manager                           [hidden email]
>
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [hidden email]
Automated List Manager                           [hidden email]