Bug #3740
openosmo-gsm-tester: Figure out best fix for iperf3m4.py where some MS fails to register
0%
Description
By looking at osmo-gsm-tester_run_prod history of "gprs" testsuite when used with "sysmobts" in all its variant, it can be seen that test "iperf3m4" fails sporadically with at least 1 MS being unable to register to the network:
"osmo_gsm_tester.log.Error: Failed to find Network Operator {attempts=4, mcc_mnc=('901', '70')}"
Attached all required information to showcase what's going on: pcap file, osmo-gsm-tester log, etc.
These are the MS being used and their IMSIs:
gobi_1 901700000010978 gobi_2 901700000023847 gobi_3 901700000023848 gobi_4 901700000023849
Events as seen from pcap:
1- Some MS try to register with TMSI instead of IMSI and their are Identity Requested but they don't answer and close the channel. We don't care about this right now.
1- gobi_1 LU accepted (fine)
2- gobi_3 LU accepted (but the MS doesn't send back on the uplink a "TMSI Reallocation Complete" message, so channel is left open for 5 second, .45 to .50)
3- gobi_2 LU accepted (fine)
4- gobi_3 LU rejected (cause=congestion, a reject is sent at .50, frame-nr 1353, probably due to 5 seconds timeout without "TMSI Reallocation Complete")
5- gobi_4 accepted
6- gobi_3 tries LU again (probably after congestion period), but it's too late (test is already finishing) and never gets a response.
Why "TMSI Reallocation Complete" is never sent MS->BTS for gobi_3 modem:
According to timestamps in osmo-gsm-tester and pcap file, this is what happens:
1- ogt: 03:26:43.328617 - 03:26:44.403392: during this period of time, gobi_3 modem is set "Online", which means (found experimentally) it can start scanning and/or registering to any network starting from this point
2- pcap: 03:26:45.014113: RACH request from gobi_3 is received on Um (GSMTAP), then channel is allocated
3- pcap: 03:26:45.319336: LU Request from gobi_3 is received on Um (GSMTAP)
4- ogt: 03:26:45.414694: gobi_3 is set by ofono (by osmo-gsm-tester) to start scanning, which means it will start jumping arfcns.
5- pcap: 03:26:45.319336: LU Accept is sent to gobi_3 on Um (GSMTAP, fr-nr 1216). MS probably never receives it or doesn't send a TMSI Reallocation Update because it's already in scan mode.
6- pcap: 03:26:50.336382: After 5 seconds, MSC decides to LU Reject and tears down the channel
Interestingly, after start scanning, /gobi_3 is unable to find the network but all others modems can find it at roughtly the same time (before and after):
03:27:00.935287 tst /gobi_3: DBG: scanned operators: [('/gobi_3/operator/26201', {'Status': 'forbidden', 'Technologies': ['gsm', 'umts', 'lte'], 'Name': 'TDG', 'MobileNetworkCode': '01', 'MobileCountryCode': '262'}), ('/gobi_3/operator/26202', {'Status': 'forbidden', 'Technologies': ['gsm', 'umts', 'lte'], 'Name': 'Vodafone.de', 'MobileNetworkCode': '02', 'MobileCountryCode': '262'})] [trial-1698↪gprs:sysmo↪/gobi_3] [modem.py:512] 03:27:08.901940 tst /gobi_3: DBG: scanned operators: [('/gobi_3/operator/26201', {'Status': 'forbidden', 'Technologies': ['gsm', 'umts', 'lte'], 'Name': 'TDG', 'MobileNetworkCode': '01', 'MobileCountryCode': '262'}), ('/gobi_3/operator/26202', {'Status': 'forbidden', 'Technologies': ['gsm', 'umts', 'lte'], 'Name': 'Vodafone.de', 'MobileNetworkCode': '02', 'MobileCountryCode': '262'})] [trial-1698↪gprs:sysmo↪/gobi_3] [modem.py:512] 03:27:17.157086 tst /gobi_3: DBG: scanned operators: [('/gobi_3/operator/26201', {'Status': 'forbidden', 'Technologies': ['gsm', 'umts', 'lte'], 'Name': 'TDG', 'MobileNetworkCode': '01', 'MobileCountryCode': '262'}), ('/gobi_3/operator/26202', {'Status': 'forbidden', 'Technologies': ['gsm', 'umts', 'lte'], 'Name': 'Vodafone.de', 'MobileNetworkCode': '02', 'MobileCountryCode': '262'})] [trial-1698↪gprs:sysmo↪/gobi_3] [modem.py:512] 03:27:25.285414 tst /gobi_3: DBG: scanned operators: [('/gobi_3/operator/26201', {'Status': 'forbidden', 'Technologies': ['gsm', 'umts', 'lte'], 'Name': 'TDG', 'MobileNetworkCode': '01', 'MobileCountryCode': '262'}), ('/gobi_3/operator/26202', {'Status': 'forbidden', 'Technologies': ['gsm', 'umts', 'lte'], 'Name': 'Vodafone.de', 'MobileNetworkCode': '02', 'MobileCountryCode': '262'})] [trial-1698↪gprs:sysmo↪/gobi_3] [modem.py:512]
However, it finally finds the network afterwards, at 03:27:27.537501 it sends a RACH req and at 03:27:27.688234000 it sends an LU Request (IMSI can be checked here).
So, after whole explanation, some questions regarding what's the best way to improve this:
1- Is behavior of the network correct? sending a Reject after an Accept if TMSI Reallocation Complete is not received.
2- As finally the MS seems to be able to find the network ,probably increasing the amount of scan+regsiter attempts in modem.py should be enough to have the MS finally registering to the network (see NETREG_MAX_REGISTER_ATTEMPTS = 3 there).
Files
Updated by pespin about 1 year ago
- Subject changed from Figure out best fix for iperf3m4.py where some MS fails to register to osmo-gsm-tester: Figure out best fix for iperf3m4.py where some MS fails to register
- Priority changed from Normal to Low