PPC bn_div_words routine rewrite

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

PPC bn_div_words routine rewrite

David Ho-2
Hi all,

This is a rewrite of the bn_div_words routine for the PowerPC arch,
tested on a MPC8xx processor.
I initially thought there is maybe a small mistake in the code that
requires a one-liner change but it turns out I have to redo the
routine.
I guess this routine is not called very often as I see that most other
routines are hand-crafted, whereas this routine is compiled from a C
function that apparently has not gone through a whole lot of testing.

I wrote a C function to confirm correctness of the code.

unsigned long div_words (unsigned long h,
                         unsigned long l,
                         unsigned long d)
{
  unsigned long i_h; /* intermediate dividend */
  unsigned long i_q; /* quotient of i_h/d */
  unsigned long i_r; /* remainder of i_h/d */

  unsigned long i_cntr;
  unsigned long i_carry;

  unsigned long ret_q; /* return quotient */

  /* cannot divide by zero */
  if (d == 0) return 0xffffffff;

  /* do simple 32-bit divide */
  if (h == 0) return l/d;
     
  i_q = h/d;
  i_r = h - (i_q*d);
  ret_q = i_q;

  i_cntr = 32;

  while (i_cntr--)
  {
    i_carry = (l & 0x80000000) ? 1:0;
    l = l << 1;

    i_h = (i_r << 1) | i_carry;
    i_q = i_h/d;
    i_r = i_h - (i_q*d);

    ret_q = (ret_q << 1) | i_q;
  }

  return ret_q;
}


Then I handcrafted the routine in PPC assembly.
The result is a 26 line assembly that is easy to understand and
predictable as opposed to a 81liner that I am still trying to
decipher...
If anyone is interested in incorporating this routine to the openssl
code I'll be happy to assist.
At this point I think I will be taking a bit of a break from this 3
day debugging/fixing marathon.

Regards,
David Ho


#
#       Handcrafted version of bn_div_words
#
#       r3 = h
#       r4 = l
#       r5 = d

        cmplwi  0,r5,0                  # compare r5 and 0
        bc      BO_IF_NOT,CR0_EQ,.Lppcasm_div1  # proceed if d!=0
        li      r3,-1                   # d=0 return -1
        bclr    BO_ALWAYS,CR0_LT
.Lppcasm_div1:
        cmplwi  0,r3,0                  # compare r3 and 0
        bc      BO_IF_NOT,CR0_EQ,.Lppcasm_div2  # proceed if h != 0
        divwu   r3,r4,r5                # ret_q = l/d
        bclr    BO_ALWAYS,CR0_LT        # return result in r3
.Lppcasm_div2:
        divwu   r9,r3,r5                # i_q = h/d
        mullw   r10,r9,r5               # i_r = h - (i_q*d)
        subf    r10,r10,r3
        mr      r3,r9                   # req_q = i_q
.Lppcasm_set_ctr:
        li      r12,32                  # ctr = bitsizeof(d)
        mtctr   r12
.Lppcasm_div_loop:
        addc    r4,r4,r4                # l = l << 1 -> i_carry
        adde    r11,r10,r10             # i_h = (i_r << 1) | i_carry
        divwu   r9,r11,r5               # i_q = i_h/d
        mullw   r10,r9,r5               # i_r = i_h - (i_q*d)
        subf    r10,r10,r11
        add     r3,r3,r3                # ret_q = ret_q << 1 | i_q
        add     r3,r3,r9
        bc      BO_dCTR_NZERO,CR0_EQ,.Lppcasm_div_loop
.Lppc_div_end:
        bclr    BO_ALWAYS,CR0_LT        # return result in r3
        .long   0x00000000
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [hidden email]
Automated List Manager                           [hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: PPC bn_div_words routine rewrite

David Ho-2
The reason I had to redo this routine, in case anyone is wondering, is
because ssh-keygen  segfaults when this assembly routine returns junk
to the BN_div_word function. On a ppc, if you issue the command

ssh-keygen -t rsa1 -f /etc/ssh/ssh_host_key -N ""

The program craps out when it tries to write the public key in ascii decimal.

Regards,
David

On 6/30/05, David Ho <[hidden email]> wrote:

> Hi all,
>
> This is a rewrite of the bn_div_words routine for the PowerPC arch,
> tested on a MPC8xx processor.
> I initially thought there is maybe a small mistake in the code that
> requires a one-liner change but it turns out I have to redo the
> routine.
> I guess this routine is not called very often as I see that most other
> routines are hand-crafted, whereas this routine is compiled from a C
> function that apparently has not gone through a whole lot of testing.
>
> I wrote a C function to confirm correctness of the code.
>
> unsigned long div_words (unsigned long h,
>                          unsigned long l,
>                          unsigned long d)
> {
>   unsigned long i_h; /* intermediate dividend */
>   unsigned long i_q; /* quotient of i_h/d */
>   unsigned long i_r; /* remainder of i_h/d */
>
>   unsigned long i_cntr;
>   unsigned long i_carry;
>
>   unsigned long ret_q; /* return quotient */
>
>   /* cannot divide by zero */
>   if (d == 0) return 0xffffffff;
>
>   /* do simple 32-bit divide */
>   if (h == 0) return l/d;
>
>   i_q = h/d;
>   i_r = h - (i_q*d);
>   ret_q = i_q;
>
>   i_cntr = 32;
>
>   while (i_cntr--)
>   {
>     i_carry = (l & 0x80000000) ? 1:0;
>     l = l << 1;
>
>     i_h = (i_r << 1) | i_carry;
>     i_q = i_h/d;
>     i_r = i_h - (i_q*d);
>
>     ret_q = (ret_q << 1) | i_q;
>   }
>
>   return ret_q;
> }
>
>
> Then I handcrafted the routine in PPC assembly.
> The result is a 26 line assembly that is easy to understand and
> predictable as opposed to a 81liner that I am still trying to
> decipher...
> If anyone is interested in incorporating this routine to the openssl
> code I'll be happy to assist.
> At this point I think I will be taking a bit of a break from this 3
> day debugging/fixing marathon.
>
> Regards,
> David Ho
>
>
> #
> #       Handcrafted version of bn_div_words
> #
> #       r3 = h
> #       r4 = l
> #       r5 = d
>
>         cmplwi  0,r5,0                  # compare r5 and 0
>         bc      BO_IF_NOT,CR0_EQ,.Lppcasm_div1  # proceed if d!=0
>         li      r3,-1                   # d=0 return -1
>         bclr    BO_ALWAYS,CR0_LT
> .Lppcasm_div1:
>         cmplwi  0,r3,0                  # compare r3 and 0
>         bc      BO_IF_NOT,CR0_EQ,.Lppcasm_div2  # proceed if h != 0
>         divwu   r3,r4,r5                # ret_q = l/d
>         bclr    BO_ALWAYS,CR0_LT        # return result in r3
> .Lppcasm_div2:
>         divwu   r9,r3,r5                # i_q = h/d
>         mullw   r10,r9,r5               # i_r = h - (i_q*d)
>         subf    r10,r10,r3
>         mr      r3,r9                   # req_q = i_q
> .Lppcasm_set_ctr:
>         li      r12,32                  # ctr = bitsizeof(d)
>         mtctr   r12
> .Lppcasm_div_loop:
>         addc    r4,r4,r4                # l = l << 1 -> i_carry
>         adde    r11,r10,r10             # i_h = (i_r << 1) | i_carry
>         divwu   r9,r11,r5               # i_q = i_h/d
>         mullw   r10,r9,r5               # i_r = i_h - (i_q*d)
>         subf    r10,r10,r11
>         add     r3,r3,r3                # ret_q = ret_q << 1 | i_q
>         add     r3,r3,r9
>         bc      BO_dCTR_NZERO,CR0_EQ,.Lppcasm_div_loop
> .Lppc_div_end:
>         bclr    BO_ALWAYS,CR0_LT        # return result in r3
>         .long   0x00000000
>
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [hidden email]
Automated List Manager                           [hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: PPC bn_div_words routine rewrite

Andy Polyakov
> The reason I had to redo this routine, in case anyone is wondering, is
> because ssh-keygen  segfaults when this assembly routine returns junk
> to the BN_div_word function. On a ppc, if you issue the command
>
> ssh-keygen -t rsa1 -f /etc/ssh/ssh_host_key -N ""
>
> The program craps out when it tries to write the public key in ascii decimal.

If would help if you provide evidence such as debugger stack trace and
program output. Provided description makes no sense. "seg-faults when
routine returns junk to BN_div_word"? Seg-fault [segmentation violation]
can occur when you write something to memory and nothing gets written to
memory upon result return. BN_div_word does write to memory, but I fail
to see how a bogus value could possibly trigger seg-fault. The only
possibility is that assembler doesn't follow ABI convention and corrupts
registers, which caller is using/expects to be preserved by callee.
There're several PPC ABI flavors in use, but OpenSSL routines were
designed ABI-neutral, Well, "neutrality" really means "common
denominator for ABI specs examined at the moment of coding," so there is
a window of opportunity that it won't be "neutral" to future ABI, but is
it really case? That your system uses some newly designed PPC ABI? You
never mentioned what's your system...

But you're apparently right about a bug being present in PPC assembler.
I too have got insane [with very few significant digits] decimal
printout of public key generated by ssh-keygen...

>>This is a rewrite of the bn_div_words routine for the PowerPC arch,
>>tested on a MPC8xx processor.

Well, suggested routine apparently sends ssh-keygen on the PPC-based
32-bit system I have access to to an end-less loop... And (cd test; make
test_bn) fails early in BN_sqr... And test/exptest fails miserably with
"bad reciprocal"...

>>I initially thought there is maybe a small mistake in the code that
>>requires a one-liner change

But apparently this appears to be the case! Please verify following:

--- crypto/bn/asm/ppc.pl.orig        2004-04-28 00:05:50.000000000 +0200
+++ crypto/bn/asm/ppc.pl      2005-07-01 18:58:21.105656512 +0200
@@ -1717,7 +1717,7 @@
         li      r9,1                    # r9=1
         $SHL    r10,r9,r8               # r9<<=r8
         $UCMP   0,r3,r10                #
-       bc      BO_IF,CR0_GT,Lppcasm_div2       #or if (h > (1<<r8))
+       bc      BO_IF_NOT,CR0_GT,Lppcasm_div2   #or if (h > (1<<r8))
         $UDIV   r3,r3,r0                #if not assert(0) divide by 0!
                                         #that's how we signal overflow
         bclr    BO_ALWAYS,CR0_LT        #return. NEVER REACHED.

A.
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [hidden email]
Automated List Manager                           [hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: PPC bn_div_words routine rewrite

David Ho-2
Good morning gentleman,  

Let's start the week off with less hostility and more productive
criticism on this topic.  As I see it, the popular Linux distributions
are not that interested on ppc32 since neither Novell nor Redhat
openly support this arch.  We will have to put our heads together to
make ppc32 a stable Linux platform.

On 7/1/05, Andy Polyakov <[hidden email]> wrote:

> BN_div_word does write to memory, but I fail
> to see how a bogus value could possibly trigger seg-fault.

Assuming that you are a maintainer of the openssl code, which I gather
you are, would you even care to take the time to examine the code in
suspect, or find someone who is capable of doing that, I'm sure you
must have a lot of friends, at least one who is knowledgeable in the
ppc32 arch.

> The only
> possibility is that assembler doesn't follow ABI convention and corrupts
> registers, which caller is using/expects to be preserved by callee.
> There're several PPC ABI flavors in use, but OpenSSL routines were
> designed ABI-neutral, Well, "neutrality" really means "common
> denominator for ABI specs examined at the moment of coding," so there is
> a window of opportunity that it won't be "neutral" to future ABI, but is
> it really case?

I don't understand the terminology you use here.  As I understand it
ABI is there so binaries can inter-operate in the case of dynamic
loading, when the ABI is consistent you can use any compiler that
conforms to the ABI and those binaries can work together.  As I see it
I'm building openssl and openssh with the same compiler so what is the
real issue here?  Or is it an issue at all?

If you are referring to C calling convention, then I can only guess
you have figured it out or otherwise none of the assembly routine will
work.

> That your system uses some newly designed PPC ABI? You
> never mentioned what's your system...

In my very first email, I have already said quote "tested on a MPC8xx
processor"  and I am using 2.4.24 which is nothing close to the
bleeding edge so I reckon the ABI is fairly standard.

Do you have any experience with the PowerPC arch?  

> But you're apparently right about a bug being present in PPC assembler.

So you are saying there is a bug in the GCC assembler?  How confident
are you in that?  Is the first correct step to examine the assembly
code for errors before jumping to any conclusion that the GCC
assembler is bad?

> >>This is a rewrite of the bn_div_words routine for the PowerPC arch,
> >>tested on a MPC8xx processor.
>
> Well, suggested routine apparently sends ssh-keygen on the PPC-based
> 32-bit system I have access to to an end-less loop...

If you care to read the c function I supplied or if you don't believe
it:  If you understand ppc 32-bit instructions, as specified in the
PowerPC Microprocessor Family: Programming Environments for 32-Bit
Microprocessors.  My routine would not be able to find a condition
that will make it go into an end-less loop,unless you messed up bad
somewhere.

> And (cd test; make
> test_bn) fails early in BN_sqr... And test/exptest fails miserably with
> "bad reciprocal"...

I don't know what you did but (make test_bn) works for me.  So I
question how diligently you are in setting up the tests.  If you are
busy, please get one of your ppc friends to do it.

> >>I initially thought there is maybe a small mistake in the code that
> >>requires a one-liner change
>
> But apparently this appears to be the case! Please verify following:
>
> --- crypto/bn/asm/ppc.pl.orig        2004-04-28 00:05:50.000000000 +0200
> +++ crypto/bn/asm/ppc.pl      2005-07-01 18:58:21.105656512 +0200
> @@ -1717,7 +1717,7 @@
>          li      r9,1                    # r9=1
>          $SHL    r10,r9,r8               # r9<<=r8
>          $UCMP   0,r3,r10                #
> -       bc      BO_IF,CR0_GT,Lppcasm_div2       #or if (h > (1<<r8))
> +       bc      BO_IF_NOT,CR0_GT,Lppcasm_div2   #or if (h > (1<<r8))
>          $UDIV   r3,r3,r0                #if not assert(0) divide by 0!
>                                          #that's how we signal overflow
>          bclr    BO_ALWAYS,CR0_LT        #return. NEVER REACHED.
>
You don't think I have gone in and fix that before realizing it's
worse than that?  I am sure you didn't think I spend 3 days out in the
beach and somehow come up with an idea of clobbering some useful
routine because I think my routine is better?

In summary, what I am trying to provide the community is an
alternative to do something, the current implementation of which is
very questionable.  You might resist change which is understandable.
But I were a diligent maintainer and I realize something is broken in
my code, then I will put the time to investigate the problem.  If I
don't have the time, I will delagate it to a friend.  If I don't have
a friend with that expertise then, I will try to make friends with the
reporter of the bug since he will most likely have spent hours fixing
the problem.

Regards,
David Ho.

test_bn.txt (4K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: PPC bn_div_words routine rewrite

Andy Polyakov
> Let's start the week off with less hostility and more productive
> criticism on this topic.

If you want productivity, then provide real evidence in form of stack
backtrace at segmentation violation point, disassemble output in the
vicinity of segmentation violation point and 'info registers' output at
the same point. As for hostility I leave it without comment, as you're
apparently can outrank anybody in that area:-)

>>But you're apparently right about a bug being present in PPC assembler.
>
>
> So you are saying there is a bug in the GCC assembler? How confident
> are you in that?  Is the first correct step to examine the assembly
> code for errors before jumping to any conclusion that the GCC
> assembler is bad?

Did I say GCC assembler? I said PPC assembler, which refers to
crypto/bn/asm/ppc.pl.

>>>>This is a rewrite of the bn_div_words routine for the PowerPC arch,
>>>>tested on a MPC8xx processor.
>>
>>Well, suggested routine apparently sends ssh-keygen on the PPC-based
>>32-bit system I have access to to an end-less loop...
>
>
> If you care to read the c function I supplied or if you don't believe
> it:  If you understand ppc 32-bit instructions, as specified in the
> PowerPC Microprocessor Family: Programming Environments for 32-Bit
> Microprocessors.  My routine would not be able to find a condition
> that will make it go into an end-less loop,unless you messed up bad
> somewhere.

I didn't say that suggested routine goes into an end-less loop, but that
it "*apparently* sends ssh-keygen into end-less loop." I made no claims
about which routine exactly loops, and I even admit that I don't know
for sure if it was in fact end-less loop, because I've chosen to kill
the process after couple of minutes. Note that normally it takes just
few seconds on the machine I've tested on, so that couple of minutes is
essentially unacceptable and by all means *appears* as end-less loop.

> In summary, what I am trying to provide the community is an
> alternative to ... the current implementation of which is
> very questionable.

crypto/bn/asm/ppc.pl distributed with OpenSSL was designed for and
explicitly tested by IBM under 32- and 64-bit PPC Linux, 32- and 64-bit
AIX, as well as 32-bit MacOS X. Special care was taken to make sure that
neither of ABIs/calling conventions used by above listed platforms are
violated, so that module can be safely invoked by compiler-generated
code for above mentioned OSes. Afterwards there were reports that it was
successfully used on unspecified [in report] embedded PPC-based
platform. Despite this on Friday I could personally confirm on 32-bit
MacOS X that there admittedly was a bug in ppc.pl, which manifests as
failure to generate sane decimal ASCII presentation of a BIGNUM, which
is exactly the kind of operation taking place when you run ssh-keygen -t
rsa1 [it should be noted decimal ASCII is unfortunately not covered by
'make test_bn' suite]. But under no circumstances segmentation violation
was observed. At the same time I could personally confirm that if pasted
into osx32_ppc.s, suggested implementation induces 'make test_bn'
failure on 32-bit MacOS X. In particular test/bntest terminates with

print "test BN_sqr\n"
-FF554CAEAE * -FF554CAEAE - FEAB0B30019BBA80FE44
Square test failed!
1

while test/exptest:

BN_mod_exp_recp() problems
14482:error:03082065:bignum routines:BN_div_recp:bad
reciprocal:bn_recp.c:194:

For me it's enough reasons to become sceptical to submission and conduct
trouble-shooting of my own. Currently available ppc.pl was verified to
pass 'make test_bn' on 32-bit MacOS X, 32- and 64-bit AIX [tested by
IBM], as well as to generate correct decimal ASCII presentation on the
mentioned platforms. If it doesn't work for you, then submit information
listed in the beginning of the letter. A.
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [hidden email]
Automated List Manager                           [hidden email]