Bug #4062
closedvty tests fails on arm (raspberry pi)
100%
Description
The debian packages for raspbian fails on OBS because of a vty test failure.
Attached the debian source package.
https://build.opensuse.org/package/show/network:osmocom:nightly/libosmocore
Files
Related issues
Updated by laforge almost 5 years ago
- Assignee set to laforge
according to the logs, there's actually a segfault during vty test execution, which is quite troubling.
Updated by laforge almost 5 years ago
- Status changed from New to In Progress
- % Done changed from 0 to 30
I can reproduce the problem on a Raspi 3 with raspbian: vty_test segfaults. gdb doesn't show a backtrace but indicates the crash is in 'memcmp'
When compiling with "-g -O0", the test passes. Will try to debug further.
Updated by laforge almost 5 years ago
It's not the optimization, but it's the "-g" which makes the problem vanish. weird.
strace shows:
mmap2(NULL, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x76f3f000 set_tls(0x76f3f4c0, 0x76f41e68, 0x76f4e050, 0x76f3f4c0, 0x76f4e050) = 0 mprotect(0x76e84000, 8192, PROT_READ) = 0 mprotect(0x76d49000, 4096, PROT_READ) = 0 mprotect(0x76ed6000, 4096, PROT_READ) = 0 mprotect(0x76f05000, 4096, PROT_READ) = 0 mprotect(0x76eb3000, 4096, PROT_READ) = 0 mprotect(0x76f07000, 20480, PROT_READ|PROT_WRITE) = 0 mprotect(0x76f07000, 20480, PROT_READ|PROT_EXEC) = 0 cacheflush(0x76f07000, 0x76f0c000, 0, 0x15, 0) = 0 mprotect(0x76f1b000, 4096, PROT_READ) = 0 mprotect(0x23000, 4096, PROT_READ) = 0 mprotect(0x76f4d000, 4096, PROT_READ) = 0 munmap(0x76f43000, 30037) = 0 brk(NULL) = 0x1f9f000 brk(0x1fc0000) = 0x1fc0000 getcwd("/home/pi/osmo/libosmocore/tests", 4096) = 32 --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x76f43618} --- +++ killed by SIGSEGV +++ Segmentation fault
Updated by laforge almost 5 years ago
- File valgrind-vty_test.txt valgrind-vty_test.txt added
valgrind also makes the program work, but shows an impressive list of problems during startup of the program (See attachment)
Updated by laforge almost 5 years ago
stepping through the program in gdb points at gen_logging_level_cmd_strs(), and within that add_category_strings() and
osmo_str_tolower (src=0x76fa508d "LGLOBAL") at utils.c:912 912 osmo_str_tolower_buf(buf, sizeof(buf), src); (gdb) 910 { (gdb) 912 osmo_str_tolower_buf(buf, sizeof(buf), src); (gdb) osmo_str_tolower_buf (dest=0x76de9618 "-linux-armhf.so.3", dest_len=128, src=0x76fa508d "LGLOBAL") at utils.c:886
for sure we don't want ot write to somewhere that holds "-linux-armhf.so.3" ?!?
Updated by laforge almost 5 years ago
git bisect states 171ef826e1489031bc48745f29fa2d4657bf165f is the culprit. This is what introduces thread-local storage to libosmocore static buffers. But why would those be different on raspbian?
Updated by laforge almost 5 years ago
so the test executes a chain of functions ending with osmo_str_tolower_buf(), which wants to use a thread-local static buffer for string lower case conversion and that fails. but why?
Updated by laforge almost 5 years ago
It's a mystery to me, why the __thread
annotation for thread-local storage would fail to work on raspbian, but work with Debian on the same hardware, and also work with any other of the distributions/versions that we're working with.
It's unlikely that we're the first program on the planet wanting to use thread-local storage, either?
Also fascinating, when trying to use gdb:
(gdb) p buf Cannot find thread-local storage for process 25857, shared library /home/pi/osmo/libosmocore/src/.libs/libosmocore.so.12: Cannot find thread-local variables on this target
Updated by laforge almost 5 years ago
- Status changed from In Progress to Stalled
I'm giving up on this. It's ridiculous that something as basic as __thread is failing on something considered a "stable" distribution.
Updated by laforge almost 5 years ago
- the bug is gone when building with clang instead of gcc
- vty_test passes
- valgrind doesn't complain at all anymore
Updated by laforge over 4 years ago
getting back to this with some more information:
- manually trying to reproduce this with a small binary and a small shared library outside of libosmocore failed for some reason
- libosmocore contains the expected .tbss ELF section with "ALLOC, THREAD_LOCAL" flags
Updated by laforge over 4 years ago
for the clang-built libosmocore:
pi@rpi3-bplus:~/osmo/libosmocore-clang/src/.libs $ objdump -h ./libosmocore.so.12 ./libosmocore.so.12: file format elf32-littlearm Sections: Idx Name Size VMA LMA File off Algn ... 16 .tbss 000022c8 00032b40 00032b40 00022b40 2**3 ALLOC, THREAD_LOCAL <pre> for the gcc-built one: <pre> 15 .tbss 000022c8 0002db74 0002db74 0001db74 2**2 ALLOC, THREAD_LOCAL </pre>
Updated by laforge over 4 years ago
[earlier comments about which combination worked or didn't work were wrong].
The following steps result in a working binary:- building the main program with gcc against a gcc-built libosmocore
- later swapping libosmocore.so.12 against a clang-built one
Updated by laforge over 4 years ago
Adding the "objdump -d -S" output for the relevant function for both clang and gcc:
clang:
const char *osmo_str_tolower(const char *src) { 17ec: e92d4830 push {r4, r5, fp, lr} 17f0: e28db008 add fp, sp, #8 17f4: e1a04000 mov r4, r0 static __thread char buf[128]; printf("buf=%p, sizeof(buf)=%u\n", buf, sizeof(buf)); 17f8: e59f0044 ldr r0, [pc, #68] ; 1844 <osmo_str_tolower+0x58> 17fc: e08f0000 add r0, pc, r0 1800: ebfffffe bl 0 <__tls_get_addr> 1804: e1a05000 mov r5, r0 1808: e59f0038 ldr r0, [pc, #56] ; 1848 <osmo_str_tolower+0x5c> 180c: e1a01005 mov r1, r5 1810: e3a02080 mov r2, #128 ; 0x80 1814: e08f0000 add r0, pc, r0 1818: ebfffffe bl 0 <printf> memset(buf, 0, sizeof(buf)); 181c: e1a00005 mov r0, r5 1820: e3a01000 mov r1, #0 printf("buf=%p, sizeof(buf)=%u\n", buf, sizeof(buf)); 1824: e3a02080 mov r2, #128 ; 0x80 memset(buf, 0, sizeof(buf)); 1828: ebfffffe bl 0 <memset> osmo_str_tolower_buf(buf, sizeof(buf), src); 182c: e1a00005 mov r0, r5 printf("buf=%p, sizeof(buf)=%u\n", buf, sizeof(buf)); 1830: e3a01080 mov r1, #128 ; 0x80 osmo_str_tolower_buf(buf, sizeof(buf), src); 1834: e1a02004 mov r2, r4 1838: ebfffffe bl 1698 <osmo_str_tolower_buf> return buf; 183c: e1a00005 mov r0, r5 1840: e8bd8830 pop {r4, r5, fp, pc} 1844: 00000040 .word 0x00000040 1848: 0000002c .word 0x0000002c
gcc:
const char *osmo_str_tolower(const char *src) { static __thread char buf[128]; printf("buf=%p, sizeof(buf)=%u\n", buf, sizeof(buf)); 152c: e59f3054 ldr r3, [pc, #84] ; 1588 <osmo_str_tolower+0x5c> { 1530: e92d4070 push {r4, r5, r6, lr} 1534: e1a06000 mov r6, r0 printf("buf=%p, sizeof(buf)=%u\n", buf, sizeof(buf)); 1538: e08f0003 add r0, pc, r3 153c: ebfffffe bl 0 <__tls_get_addr> 1540: e59f4044 ldr r4, [pc, #68] ; 158c <osmo_str_tolower+0x60> 1544: e3a02080 mov r2, #128 ; 0x80 memset(buf, 0, sizeof(buf)); 1548: e1a05002 mov r5, r2 printf("buf=%p, sizeof(buf)=%u\n", buf, sizeof(buf)); 154c: e0804004 add r4, r0, r4 1550: e59f0038 ldr r0, [pc, #56] ; 1590 <osmo_str_tolower+0x64> 1554: e1a01004 mov r1, r4 1558: e08f0000 add r0, pc, r0 155c: ebfffffe bl 0 <printf> memset(buf, 0, sizeof(buf)); 1560: e1a02005 mov r2, r5 1564: e1a00004 mov r0, r4 1568: e3a01000 mov r1, #0 156c: ebfffffe bl 0 <memset> osmo_str_tolower_buf(buf, sizeof(buf), src); 1570: e1a02006 mov r2, r6 1574: e1a01005 mov r1, r5 1578: e1a00004 mov r0, r4 157c: ebfffffe bl 1400 <osmo_str_tolower_buf> return buf; }
Updated by laforge over 4 years ago
building libosmocore with gcc-5 doesn't change the behavior; it fails just like with gcc-6.
Updated by laforge over 4 years ago
- File tls-demonstrator.tar.bz2 tls-demonstrator.tar.bz2 added
attaching my small "demonstrator" showing that tls works with a self-built shared library, but fails against libosocore.
Updated by laforge over 4 years ago
one difference is the the 'struct tls_index' offset when comparing the gcc vs. clang version:
working clang:
trying to use TLS in own library Breakpoint 2, __GI___tls_get_addr (ti=0x76fb7024) at dl-tls.c:831 831 dl-tls.c: No such file or directory. (gdb) p *ti $1 = {ti_module = 2, ti_offset = 0} (gdb) cont Continuing. tls_foo: buf=0x76e436d0 bar1='hello shlib' trying to use TLS in libosmocore Breakpoint 2, __GI___tls_get_addr (ti=0x23080) at dl-tls.c:831 831 in dl-tls.c (gdb) p *ti $2 = {ti_module = 1, ti_offset = 4360} (gdb) cont Continuing. buf=0x76e435d0, sizeof(buf)=128 osmo_str_tolower_buf(0x76e435d0, 128, 0x12a4c) i=0 i=1 i=2 i=3 i=4 i=5 i=6 i=7 i=8 i=9 bar2='hello osmo' [Inferior 1 (process 19031) exited normally]
non-working gcc:
trying to use TLS in own library Breakpoint 1, __GI___tls_get_addr (ti=0x76fb7024) at dl-tls.c:831 831 dl-tls.c: No such file or directory. (gdb) p *ti $1 = {ti_module = 2, ti_offset = 0} (gdb) cont Continuing. tls_foo: buf=0x76e436d0 bar1='hello shlib' trying to use TLS in libosmocore Breakpoint 1, __GI___tls_get_addr (ti=0x23064) at dl-tls.c:831 831 in dl-tls.c (gdb) p *ti $2 = {ti_module = 1, ti_offset = 0} (gdb) cont Continuing. buf=0x76e455c8, sizeof(buf)=128 Program received signal SIGSEGV, Segmentation fault.
The binaries used in bothe examples above use static linking of libosmocore/src/.libs/utils.o to exclude any influence from other parts of libosmocore.
Updated by laforge over 4 years ago
further differences:
non-working gcc compiled libosmocore:
pi@rpi3-bplus:~/osmo/libosmocore-gcc/src/.libs $ readelf -r ./libosmocore.so | grep TLS 0002e480 00000011 R_ARM_TLS_DTPMOD3
working clang compiled libosmocore:
pi@rpi3-bplus:~/osmo/libosmocore-clang/src/.libs $ readelf -r ./libosmocore.so | grep TLS 0003334c 00000011 R_ARM_TLS_DTPMOD3 00033354 00000011 R_ARM_TLS_DTPMOD3 0003335c 00000011 R_ARM_TLS_DTPMOD3 00033364 00000011 R_ARM_TLS_DTPMOD3 0003336c 00000011 R_ARM_TLS_DTPMOD3 00033374 00000011 R_ARM_TLS_DTPMOD3 0003337c 00000011 R_ARM_TLS_DTPMOD3 00033384 00000011 R_ARM_TLS_DTPMOD3 0003338c 00000011 R_ARM_TLS_DTPMOD3 00033394 00000011 R_ARM_TLS_DTPMOD3 0003339c 00000011 R_ARM_TLS_DTPMOD3 000333a4 00000011 R_ARM_TLS_DTPMOD3
Those entries each correspond to one 'struct tls_index', and I'm somewhat surprised that the count differs from gcc to clang ?!?
Updated by laforge over 4 years ago
specifically, looking at utils.o:
non-working gcc object:
0000009c 00005a69 R_ARM_TLS_LDM32 00000000 .LANCHOR0 000000a4 00005a6a R_ARM_TLS_LDO32 00000000 .LANCHOR0 0000054c 00005a69 R_ARM_TLS_LDM32 00000000 .LANCHOR0 00000550 00005a6a R_ARM_TLS_LDO32 00000000 .LANCHOR0 000005ac 00005a69 R_ARM_TLS_LDM32 00000000 .LANCHOR0 000005b4 00005a6a R_ARM_TLS_LDO32 00000000 .LANCHOR0 00000674 00005a69 R_ARM_TLS_LDM32 00000000 .LANCHOR0 0000067c 00005a6a R_ARM_TLS_LDO32 00000000 .LANCHOR0 000008e8 00005b6a R_ARM_TLS_LDO32 00001ff8 .LANCHOR2 000008ec 00005b69 R_ARM_TLS_LDM32 00001ff8 .LANCHOR2 000008f0 00005b69 R_ARM_TLS_LDM32 00001ff8 .LANCHOR2 000010f8 00005a69 R_ARM_TLS_LDM32 00000000 .LANCHOR0 000010fc 00005a6a R_ARM_TLS_LDO32 00000000 .LANCHOR0 00001304 00005a69 R_ARM_TLS_LDM32 00000000 .LANCHOR0 00001308 00005a6a R_ARM_TLS_LDO32 00000000 .LANCHOR0 00001588 00005b69 R_ARM_TLS_LDM32 00001ff8 .LANCHOR2 0000158c 00005b6a R_ARM_TLS_LDO32 00001ff8 .LANCHOR2 000016a0 00005b69 R_ARM_TLS_LDM32 00001ff8 .LANCHOR2 000016a4 00005b6a R_ARM_TLS_LDO32 00001ff8 .LANCHOR2 000003cf 00005c6a R_ARM_TLS_LDO32 00000000 namebuf 000003f2 00005d6a R_ARM_TLS_LDO32 00000100 hexd_buff 00000663 0000606a R_ARM_TLS_LDO32 00001188 buf.7531 00000840 00005f6a R_ARM_TLS_LDO32 00001108 buf.7509 00001597 00005e6a R_ARM_TLS_LDO32 00001100 buf.7331 00001bcc 00005e6a R_ARM_TLS_LDO32 00001100 buf.7331
working clang object:
pi@rpi3-bplus:~/osmo/libosmocore-clang/src/.libs $ readelf -r ./utils.o | grep TLS 0000008c 0000fb68 R_ARM_TLS_GD32 00000000 namebuf 0000059c 0000fa68 R_ARM_TLS_GD32 00000100 hexd_buff 000005a0 0000fa68 R_ARM_TLS_GD32 00000100 hexd_buff 000005a4 0000fa68 R_ARM_TLS_GD32 00000100 hexd_buff 00000680 0000fa68 R_ARM_TLS_GD32 00000100 hexd_buff 00000684 0000fa68 R_ARM_TLS_GD32 00000100 hexd_buff 00000690 0000fa68 R_ARM_TLS_GD32 00000100 hexd_buff 0000081c 0000fa68 R_ARM_TLS_GD32 00000100 hexd_buff 00000820 0000fa68 R_ARM_TLS_GD32 00000100 hexd_buff 00000828 0000fa68 R_ARM_TLS_GD32 00000100 hexd_buff 00000ab4 0000fc68 R_ARM_TLS_GD32 00001100 osmo_encode_big_endian 00000ab8 0000fc68 R_ARM_TLS_GD32 00001100 osmo_encode_big_endian 00001374 0000fb68 R_ARM_TLS_GD32 00000000 namebuf 0000158c 0000fb68 R_ARM_TLS_GD32 00000000 namebuf 00001844 0000fe68 R_ARM_TLS_GD32 00001108 osmo_str_tolower.buf 00001a5c 0000ff68 R_ARM_TLS_GD32 00001188 osmo_str_toupper.buf 00001a60 0000ff68 R_ARM_TLS_GD32 00001188 osmo_str_toupper.buf 00001a64 0000ff68 R_ARM_TLS_GD32 00001188 osmo_str_toupper.buf 00001a68 0000ff68 R_ARM_TLS_GD32 00001188 osmo_str_toupper.buf 00001a6c 0000ff68 R_ARM_TLS_GD32 00001188 osmo_str_toupper.buf 00001a70 0000ff68 R_ARM_TLS_GD32 00001188 osmo_str_toupper.buf 00001a74 0000ff68 R_ARM_TLS_GD32 00001188 osmo_str_toupper.buf 0000004a 0000fc6a R_ARM_TLS_LDO32 00001100 osmo_encode_big_endian 000000d5 0000fe6a R_ARM_TLS_LDO32 00001108 osmo_str_tolower.buf 00000122 0000ff6a R_ARM_TLS_LDO32 00001188 osmo_str_toupper.buf 000001c8 0000fb6a R_ARM_TLS_LDO32 00000000 namebuf 000001e6 0000fa6a R_ARM_TLS_LDO32 00000100 hexd_buff
and finally, from a working gcc object compiled with -O0
instead of -O2
:
readelf -r ./utils.o | grep TLS 00000138 00000968 R_ARM_TLS_GD32 00000000 namebuf 00000140 00000968 R_ARM_TLS_GD32 00000000 namebuf 00000144 00000968 R_ARM_TLS_GD32 00000000 namebuf 00000a4c 00000e68 R_ARM_TLS_GD32 00000100 hexd_buff 00000ac0 00000e68 R_ARM_TLS_GD32 00000100 hexd_buff 00000ac8 00000e68 R_ARM_TLS_GD32 00000100 hexd_buff 00000bdc 00000e68 R_ARM_TLS_GD32 00000100 hexd_buff 00000be4 00000e68 R_ARM_TLS_GD32 00000100 hexd_buff 00000f80 00001c68 R_ARM_TLS_GD32 00001100 buf.7143 00000f84 00001c68 R_ARM_TLS_GD32 00001100 buf.7143 00002328 00000968 R_ARM_TLS_GD32 00000000 namebuf 00002824 00000968 R_ARM_TLS_GD32 00000000 namebuf 00002c3c 00003868 R_ARM_TLS_GD32 00001108 buf.7318 00002c44 00003868 R_ARM_TLS_GD32 00001108 buf.7318 00002c48 00003868 R_ARM_TLS_GD32 00001108 buf.7318 00002c4c 00003868 R_ARM_TLS_GD32 00001108 buf.7318 00002e04 00003e68 R_ARM_TLS_GD32 00001188 buf.7337 00002e08 00003e68 R_ARM_TLS_GD32 00001188 buf.7337 000003cf 0000096a R_ARM_TLS_LDO32 00000000 namebuf 000003f2 00000e6a R_ARM_TLS_LDO32 00000100 hexd_buff 000005dc 00003e6a R_ARM_TLS_LDO32 00001188 buf.7337 000006d6 0000386a R_ARM_TLS_LDO32 00001108 buf.7318 00000ff5 00001c6a R_ARM_TLS_LDO32 00001100 buf.7143
We can judge by the size that only the lines with 1108 as size are relevant for osmo_str_tolower()
which gives us the following summary:
working gcc-O0:
00002c3c 00003868 R_ARM_TLS_GD32 00001108 buf.7318 00002c44 00003868 R_ARM_TLS_GD32 00001108 buf.7318 00002c48 00003868 R_ARM_TLS_GD32 00001108 buf.7318 00002c4c 00003868 R_ARM_TLS_GD32 00001108 buf.7318 00002c4c 00003868 R_ARM_TLS_GD32 00001108 buf.7318 000006d6 0000386a R_ARM_TLS_LDO32 00001108 buf.7318
non-working gcc-O2:
00000840 00005f6a R_ARM_TLS_LDO32 00001108 buf.7509
So somehow, in the non-working case there is no R_ARM_TLS_GD32 entry at all. I don't know what it means but it's for sure a noticable difference.
Updated by laforge over 4 years ago
According to https://static.docs.arm.com/ihi0044/e/IHI0044E_aaelf.pdf
R_ARM_TLS_GD32 causes two adjacent entries to be added to the dynamically relocated section (the Global
Offset Table, or GOT). The first of these is dynamically relocated by R_ARM_TLS_DTPMOD32, the second by
R_ARM_TLS_DTPOFF32. The place resolves to the offset of the first of the GOT entries from the place.
this sounds a lot like what's required with tls_index (ti_module + ti_offset).
R_ARM_TLS_LDO32 resolves to the offset of the referenced data object (which must be local to the module) fromthe origin of the TLS block for the current module.
Updated by laforge over 4 years ago
gcc (Raspbian 6.3.0-18+rpi1+deb9u1) 6.3.0 20170516 also doesn't generate the GD32 relocations when "-O1" is enabled. So -O0 is the only option to generate working code with gcc.
This is 2019, TLS existed since late 2002...
Updated by laforge over 4 years ago
- Status changed from Stalled to In Progress
- % Done changed from 30 to 50
Ok, so the relocations might have been a dead end. I now have a relatively small, self-contained reproducer (see attached tls-reproducer2.tar.bz2). It has no external dependencies, not even libosmocore or talloc.
If you compile this with -O2 on raspbian9 (likely also other 32bit little-endian ARM architectures), it will segfault.
- if you change to -O0, the resulting binary will work.
- if you shrink hexd_buff from 4096 to 4092 bytes or lower, the resulting binary will work
- if you build with clang, the resulting binary will work.
- if you build without -fPIC, the resulting binary will work.
WTF is going on here?
Updated by laforge over 4 years ago
- File tls-reproducer2.tar.bz2 tls-reproducer2.tar.bz2 added
Updated by laforge over 4 years ago
- Assignee changed from laforge to pespin
assigning to pespin, please report upstream to gcc and/or analyze further.
Updated by laforge over 4 years ago
might still be relocation related:
non-working gcc-O2 binary:
pi@rpi3-bplus:~/tls $ objdump -x utils.o | grep buf 00000000 l .tbss 00001000 hexd_buff 00001000 l .tbss 00000080 buf.6264 00000337 R_ARM_TLS_LDO32 hexd_buff 00000397 R_ARM_TLS_LDO32 buf.6264
working gcc-O0 binary:
pi@rpi3-bplus:~/tls $ objdump -x utils.o | grep buf 00000000 l .tbss 00001000 hexd_buff 00001000 l .tbss 00000080 buf.6111 0000027c R_ARM_TLS_GD32 hexd_buff 00000284 R_ARM_TLS_GD32 hexd_buff 00000558 R_ARM_TLS_GD32 buf.6111 00000560 R_ARM_TLS_GD32 buf.6111 00000564 R_ARM_TLS_GD32 buf.6111 00000568 R_ARM_TLS_GD32 buf.6111
working clang binary:
pi@rpi3-bplus:~/tls $ objdump -x utils.o | grep buf 00000000 l .tbss 00001000 hexd_buff 00001000 l .tbss 00000080 osmo_str_tolower.buf 00000200 R_ARM_TLS_GD32 hexd_buff 00000204 R_ARM_TLS_GD32 hexd_buff 00000210 R_ARM_TLS_GD32 hexd_buff 00000428 R_ARM_TLS_GD32 osmo_str_tolower.buf 00000048 R_ARM_TLS_LDO32 osmo_str_tolower.buf 00000084 R_ARM_TLS_LDO32 hexd_buff
Updated by laforge over 4 years ago
in terms of potential ugly interim workarounds, we could think about shrinking the size of our large __thread buffers. There's only two: the 4096 byte hexdump buffer in utils.c, and the 4100 byte buffer that's part of msgb_hexdump() in msgb.c. The point is without understanding the details of this bug: How can we be sure we shrink them sufficiently?
Updated by pespin over 4 years ago
- Assignee changed from pespin to laforge
- % Done changed from 50 to 30
I think it makes sense for now to change both buffers to 4092 for now as a quick workaround, and later on find a better solution once we report the bug and receive more feedback from gcc guys.
Updated by laforge over 4 years ago
Hi Pau,
On Wed, Jul 31, 2019 at 10:44:56PM +0000, pespin [REDMINE] wrote:
Assignee changed from pespin to laforge
% Done changed from 50 to 30
were those two intentional changes?
I think it makes sense for now to change both buffers to 4092 for now as a quick workaround, and later on find a better solution once we report the bug and receive more feedback from gcc guys.
I'm not sure that 4092 will do it for msgb, this still needs to be validated. I played a bit
around and it seems there is a "global" maximum, most likely per source code file.
Regards,
Harald
Updated by pespin over 4 years ago
- Assignee changed from laforge to pespin
- % Done changed from 30 to 50
Hi Harald, no they were no intentional, I have no idea how those ended up being changed.
Updated by pespin over 4 years ago
Checking at osmo-bsc runtime crash from same issue (osmo_str_to_lower()), here's some conclusions (and tests below):
I couldn't find the exact culprit of the issue, but some hints on requirements for the issue to happen:- It seems the file must have more than 1 TLS variable for bug to appear (so compiler applies some optimizations in that case, which are not worth if there's less than 2 variables).
- TLS variable being inside or outside of function doesn't seem to make any difference to me (other than maybe ordering in memory, which I don't know).
- Behavior seems to be OK as long as all variables fit inside the first mem page (4096 bytes).
- Problems start when sum of TLS variables starts getting bigger than 1st mem page aprox, so to me it looks like issues with mem page boundaries.
- It seems order of variables matter. For instance having only "hexd_buff=1 && buf=4094" is fine, but having "hexd_buff=4094 && buf=1" crashes.
- Bug doesn't appear starting from a single threshold size. For instance "hexd_buff=4094 && buf=9000" is fine, but "hexd_buff=4094 && buf=1" crashes. So to me it's either memory page bounded related or there's some range of sizes affected somehow.
Let's start with changing only value for hexd_buf: 4096 crash 4092 crash 3900 crash 3850 crash 3830 crash 3829 crash <--- first value crashing, used next to see if it still can work by droppng size of other buffers (to verify relations). ---------- 3828 works 3824 works 3820 works 3810 works 3800 works 3000 works 2000 works Let's remove the uint64_t __thread variable (removing __thread attr), that's 8 bytes less: no-uint64_t + hexd_buff=3829 + buf=128 + buf=128 + namebuf=255 works no-uint64_t + hexd_buff=3832 + buf=128 + buf=128 + namebuf=255 works no-uint64_t + hexd_buff=3833 + buf=128 + buf=128 + namebuf=255 works no-uint64_t + hexd_buff=3836 + buf=128 + buf=128 + namebuf=255 works no-uint64_t + hexd_buff=3837 + buf=128 + buf=128 + namebuf=255 crash <-- so in this case it's really the 8 bytes we removed, they can be passed from one var to another (so they are related). relation 1-1 bytes. uint64_t + hexd_buff=3829 + buf=128 + buf=128 + namebuf=255 crash uint64_t + hexd_buff=3829 + buf=128 + buf=128 + namebuf=254 crash uint64_t + hexd_buff=3829 + buf=128 + buf=128 + namebuf=253 crash uint64_t + hexd_buff=3829 + buf=128 + buf=128 + namebuf=252 works <-- so here it theoretically should be 1-3 bytes relation with hexd_buff and namebuf. uint64_t + hexd_buff=3830 + buf=128 + buf=128 + namebuf=252 works <-- let's increase the big buffer by 1, to see if relation is kept uint64_t + hexd_buff=3831 + buf=128 + buf=128 + namebuf=252 works <-- let's increase the big buffer by 1, to see if relation is kept. uint64_t + hexd_buff=3832 + buf=128 + buf=128 + namebuf=252 works <-- let's increase the big buffer by 1, to see if relation is kept. So it seems the initial 1-3 relation might be related to padding between 2 buffers (to be 4-byte/word aligned?). uint64_t + hexd_buff=3833 + buf=128 + buf=128 + namebuf=252 crash <-- Let's increase by 1 again, we break the 4-byte alignemnt so total space increases and we cross the file size "threshlold" line again. uint64_t + hexd_buff=3829 + buf=128 + buf=128 + namebuf=250 works <-- here we continue dropping several buffers to check everything works fine. uint64_t + hexd_buff=3829 + buf=128 + buf=128 + namebuf=240 works uint64_t + hexd_buff=3829 + buf=128 + buf=128 + namebuf=100 works uint64_t + hexd_buff=3833 + buf=32 + buf=128 + namebuf=252 ? <-- Crashing situation, but checking if 1nd buf affects it: it doesn't uint64_t + hexd_buff=3833 + buf=128 + buf=32 + namebuf=252 crash <-- Crashing situation, but checking if 2nd buf affects it: it doesn't uint64_t + hexd_buff=3833 + buf=32 + buf=32 + namebuf=252 crash <-- Crashing situation, but checking if both buf affects it: they don't uint64_t + hexd_buff=3833 + buf=1 + buf=1 + namebuf=252 crash <-- Crashing situation, but checking if both buf affects it: they don't uint64_t + hexd_buff=3833 + no-buf + no-buf + namebuf=252 works <-- Crashing situation, but cremoving __thread from both buf affects it: it does!!! Let's try moving stuff around: no-uint64_t + hexd_buff=3837 + buf=1 + buf=1 + namebuf=255 crash <-- Crashing situation from above, modified (sum with 4-byte padding: (3837 + 255(+1) + 4 + 4 = 4101?) no-uint64_t + hexd_buff=3837 + buf=1 + buf=1 + namebuf=252 works <-- Crashing situation from above, modified (sum with 4-byte padding: (3837 + 252 + 4 + 4 = 4097?) no-uint64_t + hexd_buff=3837 + buf=1 + buf=1 + namebuf=254 crash <-- Crashing situation from above, modified (sum with 4-byte padding: (3837 + 254 + 4 + 4 = 4099?) no-uint64_t + hexd_buff=3836 + buf=1 + buf=1 + namebuf=254 works <-- Crashing situation from above, modified (sum with 4-byte padding: (3836 + 254 + 4 + 4 = 4098?) Let's try moving stuff around (to be 4-bytes without padding): no-uint64_t + hexd_buff=3836 + buf=1 + buf=1 + namebuf=256 works <-- (sum with 4-byte padding: (3836 + 256 + 2 + 2 = 4096) no-uint64_t + hexd_buff=3840 + buf=1 + buf=1 + namebuf=252 works <-- Let's move 4 bytes from namebuf to hexd_buff (sum with 4-byte padding: (3840 + 252 + 2 + 2 = 4096) no-uint64_t + hexd_buff=3844 + buf=1 + buf=1 + namebuf=252 crash <-- Let's add 4 bytes to hexd_buff (sum with 4-byte padding: (3844 + 252 + 2 + 2 = 4100) no-uint64_t + hexd_buff=4088 + buf=1 + buf=1 + namebuf=4 works <-- Let's go back to working situation, but moving most bytes to hexd_buff (sum with 4-byte padding: (4088 +2 +2 + 4 = 4096) no-uint64_t + hexd_buff=4090 + buf=1 + buf=1 + namebuf=2 crash <-- Let's go back to working situation, but moving most bytes to hexd_buff (sum with 4-byte padding: (4090 + 2 +2 + 2 = 4096). no-uint64_t + hexd_buff=4090 + buf=1 + buf=1 + namebuf=1 crash <-- Let's go back to working situation, but moving most bytes to hexd_buff, seems still padding issues?. no-uint64_t + hexd_buff=4090 + buf=1 + buf=1 + no-namebuf works <-- works, so probably namebuf had padding to 4 bytes? no-uint64_t + hexd_buff=4094 + buf=1 + buf=1 + no-namebuf crash <-- Let's increase for bytes to be higher than 4096, then we see it crashes and namebug had 4-byte padding. no-uint64_t + hexd_buff=4094 + no-buf + buf=1 + no-namebuf crash <-- Let's remove the buf we can get rid of to simplify. We are under 4096 but still crashes (maybe it takes 4-byte boundaries?). no-uint64_t + no-hexd_buff + no-buff + buf=4096 + no-namebuf works <-- Getting rid of hexd_buf.... works! no-uint64_t + no-hexd_buff + no-buff + buf=9000 + no-namebuf works <-- Let's go higher than 4096... works! wtf no-uint64_t + hexd_buff=4 + no-buf + buf=9000 + no-namebuf works <-- Going back to have 2 variables, but most of the size is on the second one. Moving buf out of function still makes it work. no-uint64_t + hexd_buff=4000 + no-buf + buf=9000 + no-namebuf works <-- Adding 4000 to hexd_buf, still works. no-uint64_t + hexd_buff=6000 + no-buf + buf=9000 + no-namebuf works <-- Adding 2000 to hexd_buf, still works!!! (moving buf inside the variable too, so being inside or outside doesn't seem to help there). no-uint64_t + hexd_buff=4094 + no-buf + buf=1 + no-namebuf crash <-- Going back to known to crash values. It crashes. So it seems there's no simple threshold at which point it crashes. no-uint64_t + hexd_buff=4094 + no-buf + buf=4 + no-namebuf crah no-uint64_t + hexd_buff=6000 + no-buf + buf=2100 + no-namebuf crash <-- Let's increase hexd_buff to previous working condition. Still crashes. (2nd variable still ends up in 2nd page). no-uint64_t + hexd_buff=6000 + no-buf + buf=6296 + no-namebuf crash <-- Let's increase hexd_buff to previous working condition. Still crashes. (2nd variable now ends up in 4th page). Crash happens in different place! exit()->__call_tls_dtors(). no-uint64_t + hexd_buff=4 + no-buf + buf=9000 + no-namebuf works <-- Let's drop most of size from first var. works! (it also works with buf=6296). no-uint64_t + hexd_buff=4 + no-buf + buf=4092 + no-namebuf works <-- Let's drop most of size from first var. works! (it also works with buf=4098). no-uint64_t + hexd_buff=1 + no-buf + buf=4094 + no-namebuf works <-- So it works, contrary to the hexd_buff=4094 + buf=1, so order matters no-uint64_t + hexd_buff=4094 + no-buf + buf=9000 + no-namebuf works no-uint64_t + hexd_buff=4094 + no-buf + buf=4 + no-namebuf crahes
Updated by pespin over 4 years ago
Submitted a couple related workaround patches to have it working for now:
remote: https://gerrit.osmocom.org/c/libosmocore/+/15024 utils: share static buffer in osmo_str_to{lower,upper}()
remote: https://gerrit.osmocom.org/c/libosmocore/+/15025 Workaround TLS gcc compiler bug in ARM
I tested they work fine (osmo-bsc doesn't crash upon startup like it used to, after applying the patch).
I will document better the 2nd one if needed. I still need to report the issue upstream (not sure if it really make sense other than for inforamtional purposes since it seems to be fixed on newer toolchains).
Updated by Hoernchen over 4 years ago
This looks like https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81142
Updated by laforge over 4 years ago
"gcc-6 (Raspbian 6.5.0-1+rpi1+b1) 6.5.0 20181026" can reproduce the problem, while "gcc (Raspbian 8.3.0-6+rpi1) 8.3.0" cannot.
Updated by laforge over 4 years ago
"gcc-7 (Raspbian 7.3.0-19) 7.3.0" also cannot reproduce the problem. So it seems anything < 7.x for sure has the bug.
Updated by laforge over 4 years ago
"-mtls-dialect=gnu2" seems to be a viable way to avoid the segfault, as indicated in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81142#c12
I just built libosmocore using gcc-6.5 on raspbian 10 using "-O2 -g -mtls-dialect=gnu2" and linked main programs compiled with normal "-O2 -g" and the resulting programs worked as expected.
Updated by laforge over 4 years ago
So now the challenge is to find or write autoconf macros which detect the combination of 32bit arm and the compiler version < 7.1.x, and then tune the CFLAGS accordingly if that is the case.
For the gcc version, there's a suggestion in https://stackoverflow.com/questions/7067385/find-the-gcc-version.
Updated by Hoernchen over 4 years ago
The gnu2 aka TLSDESC model has been marked experimental for ~10 years in https://static.docs.arm.com/ihi0044/g/aaelf32.pdf and clang/lld does not support it except for aarch64 where it's the default - so being a rather arcane non-standard option it's hard to tell how well it works, but I guess we will find that out the same way we found this bug, by just using it.
Updated by pespin over 4 years ago
Patch to automatically workaround this bug (ARM32 and gcc < 7.3.0):
https://gerrit.osmocom.org/c/libosmocore/+/15037 configure: Autodetect TLS bug on ARM with old gcc and apply workaround
BTW, it seems sysmobts SDK's ld doesn't like the new CFLAG, it segfaults.
Updated by pespin over 4 years ago
- Status changed from In Progress to Feedback
- % Done changed from 50 to 90
toolchain we use for sysmobts and oc2g in OE crashes when using the -mtls-dialect=gnu2 CFLAG. However, the toolchain seems to produce correct binaries which don't crash at runtime.
So I added a new commit to libosmocore which allows disabling the automatic workaround we added:
https://gerrit.osmocom.org/c/libosmocore/+/15061 configure: Allow disabling workaround for TLS bug in old ARM gcc versions
I already submitted a patch against meta-telephony laforge/nightly to use it, and also another patch for osmo-gsm-tester to fix the build scripts using SDKs from that toolchain:
https://gerrit.osmocom.org/c/osmo-gsm-tester/+/15067 contrib: Avoid sysmobts/oc2g toolchain ld crashes building new libosmocore
Once those are merged and we cross-check that everything's fine, we can close the ticket.
Updated by pespin over 4 years ago
- Status changed from Feedback to Resolved
- % Done changed from 90 to 100
Everything's merged and included in release from this week, closing this ticket.
Updated by laforge about 4 years ago
- Related to Bug #4456: SIGABRT while opening USB device in osmo-remsim-client-st2 on OpenWRT with musl added
Updated by fixeria almost 3 years ago
- Related to Bug #5079: osmo_str_tolower() segfaults on sysmoBTS (arm-poky-linux-gnueabi) added