Tuesday, July 29, 2014

A little bug in Linux 3.16-rc6...

Hello everybody, here I am again to report an interesting bug that I came across a little while ago.
The problem is that the GCC compiler produced a buggy kernel when compiling linux-3.16-rc.
You can find the details on the kernel mailing list (thread "Random panic in load_balance() with 3.16-rc").
Here is a short extract from the mail:
...
movq $load_balance_mask, -136(%rbp) #, %sfp
subq $184, %rsp #,
movq (%rdx), %rax # sd_22(D)->parent, sd_parent
movl %edi, -144(%rbp) # this_cpu, %sfp
movl %ecx, -140(%rbp) # idle, %sfp
movq %r8, -200(%rbp) # continue_balancing, %sfp
movq %rax, -184(%rbp) # sd_parent, %sfp
movq -136(%rbp), %rax # %sfp, tcp_ptr__
#APP
add %gs:this_cpu_off, %rax # this_cpu_off, tcp_ptr__
#NO_APP
...

Note the contents of -136(%rbp). Seriously. That's an
_immediate_constant_ that the compiler is spilling.

Somebody needs to raise that as a gcc bug. Because it damn well is
some seriously crazy s**t.

However, that constant spilling part just counts as "too stupid to
live". The real bug is this:

movq $load_balance_mask, -136(%rbp) #, %sfp
subq $184, %rsp #,

where gcc creates the stack frame *after* having already used it to save that constant *deep* below the stack frame.
The x86-64 ABI specifies a 128-byte red-zone under the stack pointer, and this is ok by that limit. It looks like it's illegal (136 > 128), but the fact is, we've had four "pushq"s to update %rsp since loading the frame pointer, so it's just *barely* legal with the red-zoning.
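
To make the arithmetic in the quote concrete, here is a small sketch in C (my own illustration, not code from the mail or from the kernel). It assumes %rbp was loaded from %rsp and that each "pushq" afterwards moves %rsp down by 8 bytes. The names bytes_below_rsp and in_red_zone are hypothetical helpers I made up for this post.

```c
#include <assert.h>
#include <stdbool.h>

/* The x86-64 ABI reserves 128 bytes below %rsp as the red zone. */
#define RED_ZONE_BYTES 128
/* Each pushq moves %rsp down by 8 bytes. */
#define PUSHQ_BYTES 8

/*
 * Hypothetical helper: how far below %rsp is a slot addressed as
 * -rbp_offset(%rbp), assuming %rbp was copied from %rsp and `pushes`
 * pushq instructions have run since then?
 */
static long bytes_below_rsp(long rbp_offset, long pushes)
{
    assert(rbp_offset >= 0 && pushes >= 0);
    return rbp_offset - pushes * PUSHQ_BYTES;
}

/* Hypothetical helper: true if that slot lies inside the red zone. */
static bool in_red_zone(long rbp_offset, long pushes)
{
    long below = bytes_below_rsp(rbp_offset, pushes);
    return below > 0 && below <= RED_ZONE_BYTES;
}
```

With the numbers from the listing, bytes_below_rsp(136, 4) gives 136 - 4*8 = 104, which is under 128, so the store to -136(%rbp) stays inside the red zone. That only helps code that is allowed to use a red zone at all; as far as I know the kernel is built with -mno-red-zone, which is why writing below %rsp before the "subq" is a real problem there.
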
Linus Torvalds also reported this on the GCC Bugzilla.
It seems that rc7 has now introduced a workaround for the buggy versions of the GCC compiler.
So that's all folks, I hope this will be fixed soon!

