Project

General

Profile

Actions

Bug #6439

closed

ttcn3-sgsn-test: SGSN_Tests.TC_attach_rau_a_b crashes osmo-sgsn

Added by fixeria 12 days ago. Updated 7 days ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
-
Start date:
04/21/2024
Due date:
% Done:

100%

Spec Reference:
Tags:

Description

Starting from build 2319 we see a significant regression in ttcn3-sgsn-test:

https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-sgsn-test/2319/

Interestingly enough, this is also affecting the -latest:

https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-sgsn-test-latest/2022/

I was able to reproduce the problem locally by running SGSN_Tests.TC_attach_rau_a_b two times in a row.


Files


Related issues

Related to OsmoSGSN - Feature #6294: Support GN/Gp interoperation procedures between SGSN and MMENewdaniel12/07/2023

Actions
Related to OsmoSGSN - Bug #6441: use-after-free on RAU with invalid Old RAINew04/22/2024

Actions
Actions #1

Updated by fixeria 12 days ago

20240421165147281 DLGSUP INFO GSUP link to 127.0.0.103:4222 DOWN (gsup_client.c:151)
20240421165148281 DLGSUP NOTICE GSUP connecting to 127.0.0.103:4222 (gsup_client.c:74)
20240421165148281 DLGSUP INFO GSUP link to 127.0.0.103:4222 DOWN (gsup_client.c:151)
20240421165149282 DLGSUP NOTICE GSUP connecting to 127.0.0.103:4222 (gsup_client.c:74)
20240421165149282 DLGSUP INFO GSUP link to 127.0.0.103:4222 DOWN (gsup_client.c:151)
20240421165149841 DLGLOBAL INFO Accept()ed new telnet connection r=127.0.0.1:43888<->l=127.0.0.10:4245 (telnet_interface.c:192)
20240421165149843 DMM INFO MM(262420000000038/e95c24ac) Cancelled, deleting context silently (gprs_gmm.c:1056)
20240421165149843 DMM INFO MM(262420000000038/e95c24ac) Cleaning MM context due to access cancelled (gprs_gmm.c:195)
20240421165149843 DMM DEBUG GMM(gmm_fsm)[0x55555574c700]{Registered.NORMAL}: Received Event E_GMM_CLEANUP (gprs_gmm.c:198)
20240421165149843 DMM DEBUG GMM(gmm_fsm)[0x55555574c700]{Registered.NORMAL}: state_chg to Deregistered (gprs_gmm_fsm.c:223)
20240421165149843 DMM DEBUG MM_STATE_Gb(0)[0x55555574c960]{Ready}: Received Event E_MM_GPRS_DETACH (gprs_gmm.c:205)
20240421165149843 DMM DEBUG MM_STATE_Gb(0)[0x55555574c960]{Ready}: state_chg to Idle (gprs_mm_state_gb_fsm.c:76)
20240421165149843 DLLC NOTICE LLME(527b5d30/36396334){(null)} LLGM Assign pre (36396334 => ffffffff) (gprs_llc.c:1079)
20240421165149843 DLLC NOTICE LLME(00000000/00000000){UNASSIGNED} LLGM Assign post (36396334 => ffffffff) (gprs_llc.c:1125)

Program received signal SIGABRT, Aborted.
0x00007ffff75e332c in ?? () from /usr/lib/libc.so.6
(gdb) bt
#0  0x00007ffff75e332c in ?? () from /usr/lib/libc.so.6
#1  0x00007ffff75926c8 in raise () from /usr/lib/libc.so.6
#2  0x00007ffff757a4b8 in abort () from /usr/lib/libc.so.6
#3  0x00007ffff78270d3 in ?? () from /usr/lib/libtalloc.so.2
#4  0x000055555557dd74 in llme_free (llme=0x555555753550) at ../../../../src/osmo-sgsn/src/sgsn/gprs_llc.c:605
#5  gprs_llgmm_assign (llme=0x555555753550, old_tlli=909730612, new_tlli=new_tlli@entry=4294967295) at ../../../../src/osmo-sgsn/src/sgsn/gprs_llc.c:1129
#6  0x000055555557e07d in gprs_llgmm_unassign (llme=<optimized out>) at ../../../../src/osmo-sgsn/src/sgsn/gprs_llc.c:1137
#7  0x00005555555673f5 in st_mm_idle_on_enter (fi=<optimized out>, prev_state=<optimized out>) at ../../../../src/osmo-sgsn/src/sgsn/gprs_mm_state_gb_fsm.c:51
#8  0x00007ffff797f7e0 in state_chg (fi=fi@entry=0x55555574c960, new_state=new_state@entry=0, keep_timer=keep_timer@entry=false, timeout_ms=timeout_ms@entry=0, T=0, 
    file=file@entry=0x55555558d838 "../../../../src/osmo-sgsn/src/sgsn/gprs_mm_state_gb_fsm.c", line=76) at ../../../../src/libosmocore/src/core/fsm.c:697
#9  0x00007ffff7980180 in _osmo_fsm_inst_state_chg (fi=fi@entry=0x55555574c960, new_state=new_state@entry=0, timeout_secs=timeout_secs@entry=0, T=<optimized out>, 
    file=file@entry=0x55555558d838 "../../../../src/osmo-sgsn/src/sgsn/gprs_mm_state_gb_fsm.c", line=line@entry=76) at ../../../../src/libosmocore/src/core/fsm.c:746
#10 0x00007ffff799b090 in _osmo_tdef_fsm_inst_state_chg (fi=fi@entry=0x55555574c960, state=state@entry=0, timeouts_array=timeouts_array@entry=0x55555559a8c0 <mm_state_gb_fsm_timeouts>, tdefs=<optimized out>, 
    default_timeout=93824992461304, default_timeout@entry=-1, file=file@entry=0x55555558d838 "../../../../src/osmo-sgsn/src/sgsn/gprs_mm_state_gb_fsm.c", line=76) at ../../../../src/libosmocore/src/core/tdef.c:344
#11 0x0000555555567358 in st_mm_ready (fi=0x55555574c960, event=<optimized out>, data=<optimized out>) at ../../../../src/osmo-sgsn/src/sgsn/gprs_mm_state_gb_fsm.c:76
#12 0x00007ffff79803bc in _osmo_fsm_inst_dispatch (fi=0x55555574c960, event=event@entry=1, data=data@entry=0x0, file=file@entry=0x55555558c1f8 "../../../../src/osmo-sgsn/src/sgsn/gprs_gmm.c", line=line@entry=205)
    at ../../../../src/libosmocore/src/core/fsm.c:875
#13 0x000055555555efbc in mm_ctx_cleanup_free (ctx=0x55555574c060, log_text=0x55555559768a "access cancelled") at ../../../../src/osmo-sgsn/src/sgsn/gprs_gmm.c:205
#14 0x0000555555575eaa in reset_sgsn_state (self=<optimized out>, vty=0x5555556d5f90, argc=<optimized out>, argv=<optimized out>) at ../../../../src/osmo-sgsn/src/sgsn/sgsn_vty.c:1052
#15 0x00007ffff79c7445 in cmd_execute_command_real (vline=<optimized out>, vty=<optimized out>, cmd=cmd@entry=0x101508d7ae252000) at ../../../../src/libosmocore/src/vty/command.c:2671
#16 0x00007ffff79c7f1d in cmd_execute_command (vline=<optimized out>, vty=<optimized out>, cmd=0x101508d7ae252000, vtysh=<optimized out>) at ../../../../src/libosmocore/src/vty/command.c:2723
#17 0x00007ffff79ca696 in vty_command (vty=0x5555556d5f90) at ../../../../src/libosmocore/src/vty/vty.c:464
#18 vty_execute (vty=0x5555556d5f90) at ../../../../src/libosmocore/src/vty/vty.c:729
#19 vty_read (vty=<optimized out>) at ../../../../src/libosmocore/src/vty/vty.c:1471
#20 0x00007ffff79cd3ae in client_data (fd=0x55555574ba68, what=1) at ../../../../src/libosmocore/src/vty/telnet_interface.c:161
#21 0x00007ffff798f94f in poll_disp_fds (n_fd=<optimized out>) at ../../../../src/libosmocore/src/core/select.c:419
#22 _osmo_select_main (polling=polling@entry=0) at ../../../../src/libosmocore/src/core/select.c:457
#23 0x00007ffff798fa2e in osmo_select_main (polling=polling@entry=0) at ../../../../src/libosmocore/src/core/select.c:496
#24 0x000055555555d4e7 in main (argc=<optimized out>, argv=<optimized out>) at ../../../../src/osmo-sgsn/src/sgsn/sgsn_main.c:498
Actions #2

Updated by fixeria 12 days ago

  • Status changed from New to In Progress

git-bisect tells me that one of my recent commits is the culprit:

1ee1edd20e7e9d8e844de1e4826cfa9d9cba10ca is the first bad commit
commit 1ee1edd20e7e9d8e844de1e4826cfa9d9cba10ca
Author: Vadim Yanitskiy <vyanitskiy@sysmocom.de>
Date:   Sat Apr 6 05:35:39 2024 +0700

    sgsn: fix unused param in f_routing_area_update()

    Change-Id: I582b2836eba3188cfd4c640b1febbd25c50001e0
    Related: SYS#6603, OS#6294

 sgsn/SGSN_Tests.ttcn | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/36587

Actions #3

Updated by fixeria 12 days ago

  • Related to Feature #6294: Support GN/Gp interoperation procedures between SGSN and MME added
Actions #4

Updated by fixeria 12 days ago

fixeria wrote in #note-2:

git-bisect tells me that one of my recent commits is the culprit:

[...]

https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/36587

  • SGSN_Tests.TC_attach_rau_a_b is the only testcase passing a different RAI than g_pars.ra to f_routing_area_update().
  • Before my commit, that different RAI would be ignored and g_pars.ra would be used internally by f_routing_area_update() (bug).
    • ... this was making SGSN_Tests.TC_attach_rau_a_b identical to SGSN_Tests.TC_attach_rau_a_a.
  • So my commit actually made the testcase behave the way it was supposed to behave! And uncovered a problem in osmo-sgsn.

I am attaching two PCAPs:

  • bogus (before the regression)
  • fixed (osmo-sgsn crashes)

As can be seen from the bogus PCAP (check frames 399 and 433), the RAI being sent is the same in both RAUs:

Routing Area Identification - Old routing area identification - RAI: 262-42-13135-0
    Routing area identification: 262-42-13135-0
        Mobile Country Code (MCC): Germany (262)
        Mobile Network Code (MNC): Vodafone GmbH (42)
        Location Area Code (LAC): 0x334f (13135)
        Routing Area Code (RAC): 0x00 (0)

In the "fixed" PCAP, the first and second RAUs carry different RAI values:

Routing Area Identification - Old routing area identification - RAI: 262-42-13135-0
    Routing area identification: 262-42-13135-0
        Mobile Country Code (MCC): Germany (262)
        Mobile Network Code (MNC): Vodafone GmbH (42)
        Location Area Code (LAC): 0x334f (13135)
        Routing Area Code (RAC): 0x00 (0)

Routing Area Identification - Old routing area identification - RAI: 262-42-13200-0
    Routing area identification: 262-42-13200-0
        Mobile Country Code (MCC): Germany (262)
        Mobile Network Code (MNC): Vodafone GmbH (42)
        Location Area Code (LAC): 0x3390 (13200)
        Routing Area Code (RAC): 0x00 (0)

So, most likely, the problem is in osmo-sgsn and needs to be fixed there.

Actions #5

Updated by fixeria 12 days ago

  • Status changed from In Progress to Feedback
  • % Done changed from 20 to 100

fixeria wrote in #note-4:

As can be seen from the bogus PCAP (check frames 399 and 433), the RAI being sent is the same in both RAUs:

[...]

In the "fixed" PCAP, the first and second RAUs carry different RAI values:

[...]

I actually misunderstood the testcase scenario. What I was talking above is the Old RAI IE (GMM layer), which should be the same in both RAUs.
There's also RAI at the BSSGP layer, which unlike the Old RAI should obviously indicate the new RAI.

So, most likely, the problem is in osmo-sgsn and needs to be fixed there.

It's actually both:

  • regression in the testsuite due to my patch, and
  • use-after-free problem in osmo-sgsn, which was uncovered by broken testcase scenario.

Here is a fixup-patch fixing the testsuite:

https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/36613 fixup: sgsn: fix unused param in f_routing_area_update() [NEW]

I'll prepare a separate testcase triggering the use-after-free problem and open a separate ticket for it.

Actions #6

Updated by fixeria 11 days ago

  • Related to Bug #6441: use-after-free on RAU with invalid Old RAI added
Actions #7

Updated by fixeria 7 days ago

  • Status changed from Feedback to Resolved
Actions

Also available in: Atom PDF

Add picture from clipboard (Maximum size: 48.8 MB)