#GP happens in do_sse3_after_all

Yan, Shaopu

Dear OpenSSL maintainers,

I have run into an issue in crypto/chacha/chacha-x86_64.S. Could you please take a look? Thanks very much.

Execution currently gets stuck in do_sse3_after_all, where a #GP occurs at the instruction “movdqa %xmm0,0(%rsp)”, which requires a 16-byte-aligned memory operand. Going through the code, I found that RSP has already been adjusted by “subq $64+8,%rsp”. When I simply changed that to “subq $64,%rsp”, the code worked correctly.

Is this a bug? If I have made a mistake, please correct me. :)

I assume that “subq $64+8,%rsp” is meant to align the stack to 16 bytes. In my case, however, the incoming RSP is already 16-byte aligned, so after the subtraction the stack is only 8-byte aligned and the #GP occurs. :( Could you please check this?

 

 

ChaCha20_4x:
.LChaCha20_4x:
        movq    %rsp,%r9
        movq    %r10,%r11
        shrq    $32,%r10
        testq   $32,%r10
        jnz     .LChaCha20_8x
        cmpq    $192,%rdx
        ja      .Lproceed4x

        andq    $71303168,%r11
        cmpq    $4194304,%r11
        je      .Ldo_sse3_after_all

.LChaCha20_8x:
        movq    %rsp,%r9
        subq    $0x280+8,%rsp
        andq    $-32,%rsp
        vzeroupper

.Lproceed4x:
        subq    $0x140+8,%rsp
        movdqa  .Lsigma(%rip),%xmm11
        movdqu  (%rcx),%xmm15
        movdqu  16(%rcx),%xmm7
        movdqu  (%r8),%xmm3
        leaq    256(%rsp),%rcx
        leaq    .Lrot16(%rip),%r10
        leaq    .Lrot24(%rip),%r11

.Ldo_sse3_after_all:
        subq    $64+8,%rsp              # the frame allocation in question
        movdqa  .Lsigma(%rip),%xmm0
        movdqu  (%rcx),%xmm1
        movdqu  16(%rcx),%xmm2
        movdqu  (%r8),%xmm3
        movdqa  .Lrot16(%rip),%xmm6
        movdqa  .Lrot24(%rip),%xmm7

        movdqa  %xmm0,0(%rsp)           # #GP here if %rsp is not 16-byte aligned
        movdqa  %xmm1,16(%rsp)
        movdqa  %xmm2,32(%rsp)
        movdqa  %xmm3,48(%rsp)
        movq    $10,%r8
        jmp     .Loop_ssse3

 

/Best Regards!

--Shaopu

 


--
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev

Re: #GP happens in do_sse3_after_all

Andy Polyakov-2
Hi,

> I have run into an issue in crypto/chacha/chacha-x86_64.S. Could you
> please take a look? Thanks very much.
>
> Execution currently gets stuck in *do_sse3_after_all*, where a #GP
> occurs at the instruction “movdqa %xmm0,0(%rsp)”, which requires a
> 16-byte-aligned memory operand. Going through the code, I found that
> RSP has already been adjusted by “subq $64+8,%rsp”. When I simply
> changed that to “subq $64,%rsp”, the code worked correctly.
>
> Is this a bug? If I have made a mistake, please correct me. :)
>
> I assume that “subq $64+8,%rsp” is meant to align the stack to 16
> bytes. In my case, however, the incoming RSP is already 16-byte
> aligned, so after the subtraction the stack is only 8-byte aligned and
> the #GP occurs. :( Could you please check this?

All known x86_64 ABIs specify that the top of the stack is to be aligned
to 16 bytes. Obviously it can't be aligned at every given moment, not on
x86_64, so the question is *when* does it have to be aligned? It has to
be aligned at least at the moment of a call to another subroutine. Since
the x86_64 call instruction pushes the 8-byte return address onto the
stack, this means that upon entry to a function the stack is actually
misaligned. Hence a compliant function has to allocate a 16*n+8 frame.
And that's what we see in the code: 64+8 in the case referred to.

Now, if you experience a crash at the point in question, it can only
mean one thing: the caller is not compliant with the ABI. There is an
ambiguity, though, and it might be wrong to blame the direct caller, for
the following reason. Customarily compilers don't explicitly align the
stack in each subroutine, but instead assume that the caller aligned it.
In other words, stack alignment is a kind of collective effort, with
each subroutine relying on its caller. So all subroutines can be
compliant and there would still be a problem. This would be the case if
the stack was *initially* misaligned [upon its creation].

To summarize, either one of the subroutines in the chain of calls
leading to ChaCha20_ctr32 is not compliant with the ABI, or the stack
was initially seeded misaligned.
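
The 16*n+8 arithmetic above can be checked with a small model. The following is only a sketch in Python, not code from OpenSSL: the stack addresses and helper names are hypothetical, and it assumes, as in the quoted listing, that nothing is pushed between function entry and the subq. It shows why 64+8 is correct for an ABI-compliant caller, and why changing it to 64 only appears to help when the incoming stack is already off by 8.

```python
# Model of x86_64 stack-pointer arithmetic; addresses are plain integers.
# The starting addresses and helper names below are hypothetical.

RETURN_ADDRESS_SIZE = 8   # bytes pushed by the call instruction
MOVDQA_ALIGNMENT = 16     # movdqa raises #GP on a misaligned memory operand


def rsp_on_entry(rsp_before_call):
    """Stack pointer seen by the callee right after call pushed the return address."""
    return rsp_before_call - RETURN_ADDRESS_SIZE


def rsp_after_frame(rsp_entry, frame_size):
    """Stack pointer after the callee executes `subq $frame_size,%rsp`."""
    return rsp_entry - frame_size


def movdqa_would_fault(rsp):
    """True if `movdqa %xmm0,0(%rsp)` would raise #GP at this stack pointer."""
    return rsp % MOVDQA_ALIGNMENT != 0


# ABI-compliant caller: %rsp is 16-byte aligned at the call instruction.
compliant_entry = rsp_on_entry(0x7FFFFFFFE000)
# Non-compliant caller (or misaligned initial stack): off by 8 at the call.
broken_entry = rsp_on_entry(0x7FFFFFFFE008)

for name, entry in (("compliant", compliant_entry), ("non-compliant", broken_entry)):
    for frame in (64 + 8, 64):
        rsp = rsp_after_frame(entry, frame)
        print(f"{name:13s} caller, subq ${frame}: rsp % 16 = {rsp % 16}, "
              f"#GP = {movdqa_would_fault(rsp)}")
```

In this model the reported behaviour (a fault with 64+8, none with 64) shows up only in the non-compliant case, which is consistent with the diagnosis that the misalignment comes from somewhere up the call chain rather than from the frame size itself.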