Bug #6414
closedttcn3-msc-test: TC_ho_inter_bsc_a5_[134] are failing with io_uring
Added by fixeria about 2 months ago. Updated about 1 month ago.
100%
Description
Since recently, we started executing ttcn3-msc-test (among with some other testsuites) with LIBOSMO_IO_BACKEND=IO_URING
in a separate job. Compared to the "normal" job, the IO_URING one exhibits a regression since build 5 (Mar 15, 2024):
- TC_ho_inter_bsc_a5_3 is failing "reliably",
- TC_ho_inter_bsc_a5_14 are alternating between pass and fail.
I was unable to reproduce the problem, neither locally on my machine nor on the dedicated build server in Docker env.
Files
MSC_Tests.TC_ho_inter_bsc_a5_3_pass.pcap | MSC_Tests.TC_ho_inter_bsc_a5_3_pass.pcap | 104 KB | fixeria, 03/22/2024 08:04 AM | ||
MSC_Tests.TC_ho_inter_bsc_a5_3_fail.pcap | MSC_Tests.TC_ho_inter_bsc_a5_3_fail.pcap | 105 KB | fixeria, 03/22/2024 08:04 AM | ||
MSC_Tests.TC_ho_inter_bsc_a5_3.merged | MSC_Tests.TC_ho_inter_bsc_a5_3.merged | 557 KB | fixeria, 03/24/2024 09:16 AM | ||
MSC_Tests.TC_ho_inter_bsc.pcap.gz | MSC_Tests.TC_ho_inter_bsc.pcap.gz | 92.4 KB | fixeria, 03/27/2024 06:05 AM |
Updated by fixeria about 2 months ago
- File MSC_Tests.TC_ho_inter_bsc_a5_3_pass.pcap MSC_Tests.TC_ho_inter_bsc_a5_3_pass.pcap added
- File MSC_Tests.TC_ho_inter_bsc_a5_3_fail.pcap MSC_Tests.TC_ho_inter_bsc_a5_3_fail.pcap added
- Status changed from New to In Progress
I am attaching two PCAP files:
MSC_Tests.TC_ho_inter_bsc_a5_3_fail.pcap
generated by failing io_uring job, build 11MSC_Tests.TC_ho_inter_bsc_a5_3_pass.pcap
generated by passing normal job, build 2335
Comparing them side-by-side, I found an interesting diff:
MSC_Tests.TC_ho_inter_bsc_a5_3_fail.pcap
:- frame 2216 BSSMAP Handover Detect
- frame 2283 BSSMAP Handover Succeeded
- frame 2303 BSSMAP Handover Complete
MSC_Tests.TC_ho_inter_bsc_a5_3_pass.pcap
:- frame 2216 BSSMAP Handover Detect + BSSMAP Handover Complete
- frame 2347 BSSMAP Handover Succeeded
so in the passing scenario, frame 2216 carries two SCTP chunks.
Updated by fixeria about 2 months ago
fixeria wrote in #note-1:
Comparing them side-by-side, I found an interesting diff:
MSC_Tests.TC_ho_inter_bsc_a5_3_fail.pcap
:
- frame 2216 BSSMAP Handover Detect
- frame 2283 BSSMAP Handover Succeeded
- frame 2303 BSSMAP Handover Complete
This is also what I am seeing when running the testcase locally, and it's passing. So it's unlikely the problem here.
There is another nitpick though:
MSC_Tests.TC_ho_inter_bsc_a5_3_pass.pcap
:- frame 2673 BSSMAP Clear Command
- frame 2686 DTAP (CC) Release Complete
- frame 2719 BSSMAP Clear Complete (!)
MSC_Tests.TC_ho_inter_bsc_a5_3_fail.pcap
:- frame 2675 BSSMAP Clear Command
- frame 2678 DTAP (CC) Release Complete
Note that in the failing scenario, there is no BSSMAP Clear Complete
.
Updated by fixeria about 2 months ago
According to the logging, the testcase gets stuck waiting for some message to be received on a port. Most likely, it's somewhere in f_call_hangup()
, because the last line logged is "f_call_hangup 2: rx MNCC REL ind" before the Tguard timeout happens.
1822 log("f_call_hangup 1: rx DTAP CC DISC"); 1823 1824 if (release_by_ms) { 1825 var BIT1 tid_remote := '1'B; 1826 if (cpars.mo_call) { 1827 tid_remote := '0'B; 1828 } 1829 /* B-side (MS) Release of call */ 1830 BSSAP.send(ts_PDU_DTAP_MO(ts_ML3_MO_CC_RELEASE(cpars.transaction_id, tid_remote, '0000000'B))); 1831 MNCC.receive(tr_MNCC_REL_ind(cpars.mncc_callref)); 1832 log("f_call_hangup 2: rx MNCC REL ind"); 1833 BSSAP.receive(tr_PDU_DTAP_MT(tr_ML3_MT_CC_REL_COMPL(cpars.transaction_id))); // <-- (!) here? 1834 log("f_call_hangup 3: rx DTAP CC REL COMPL"); 1835 } else {
Updated by fixeria about 2 months ago
fixeria wrote in #note-3:
According to the logging, the testcase gets stuck waiting for some message to be received on a port. Most likely, it's somewhere in
f_call_hangup()
, because the last line logged is "f_call_hangup 2: rx MNCC REL ind" before the Tguard timeout happens.
I managed to reproduce the problem on the build server by running the whole testsuite (not the problematic testcases directly). With some modifications, I confirmed my assumption about getting stuck in f_call_hangup()
.
As it turns out, even though the CC Release Complete message is present in both PCAPs, in the failing scenario it never reaches the testsuite. Somehow we receive something else on the BSSAP port, not even matching PDU_DTAP_MT:?
.
Updated by fixeria about 2 months ago
- % Done changed from 0 to 20
fixeria wrote in #note-4:
As it turns out, even though the CC Release Complete message is present in both PCAPs, in the failing scenario it never reaches the testsuite. Somehow we receive something else on the BSSAP port, not even matching
PDU_DTAP_MT:?
.
After hours of debugging and trying to understand why do we get stuck in f_call_hangup()
, I finally made a progress and some interesting discoveries.
Running one of the affected testcases, e.g. MSC_Tests.TC_ho_inter_bsc_a5_3
with this patch (purely logging, no logic changes) applied:
diff --git a/msc/BSC_ConnectionHandler.ttcn b/msc/BSC_ConnectionHandler.ttcn
index 92ec3b15..f03cfcec 100644
--- a/msc/BSC_ConnectionHandler.ttcn
+++ b/msc/BSC_ConnectionHandler.ttcn
@@ -1829,9 +1829,15 @@ runs on BSC_ConnHdlr {
/* B-side (MS) Release of call */
BSSAP.send(ts_PDU_DTAP_MO(ts_ML3_MO_CC_RELEASE(cpars.transaction_id, tid_remote, '0000000'B)));
MNCC.receive(tr_MNCC_REL_ind(cpars.mncc_callref));
- log("f_call_hangup 2: rx MNCC REL ind");
- BSSAP.receive(tr_PDU_DTAP_MT(tr_ML3_MT_CC_REL_COMPL(cpars.transaction_id)));
- log("f_call_hangup 3: rx DTAP CC REL COMPL");
+ log("======================== f_call_hangup 2: rx MNCC REL ind");
+ var PDU_DTAP_MT dtap_mt;
+ BSSAP.receive(tr_PDU_DTAP_MT(tr_ML3_MT_CC_REL_COMPL(cpars.transaction_id))) -> value dtap_mt;
+ if (isbound(dtap_mt)) {
+ log("======================== f_call_hangup 3: got CC REL COMPL: ", dtap_mt);
+ } else {
+ log("======================== f_call_hangup 3: got NOTHING");
+ }
+ log("======================== f_call_hangup 3: continue");
} else {
/* A-side (PLMN) Release of call */
MNCC.send(ts_MNCC_REL_req(cpars.mncc_callref, valueof(ts_MNCC_cause(42))));
diff --git a/msc/MSC_Tests.ttcn b/msc/MSC_Tests.ttcn
index 7f993c4f..c38145a4 100644
--- a/msc/MSC_Tests.ttcn
+++ b/msc/MSC_Tests.ttcn
@@ -179,13 +179,16 @@ private altstep as_Tguard_direct() runs on MTC_CT {
private altstep as_optional_cc_rel(CallParameters cpars, boolean respond := false) runs on BSC_ConnHdlr {
[] BSSAP.receive(tr_PDU_DTAP_MT(tr_ML3_MT_CC_RELEASE(cpars.transaction_id))) {
- if (respond) {
- var BIT1 tid_remote := '1'B;
- if (cpars.mo_call) {
- tid_remote := '0'B;
- }
- BSSAP.send(ts_PDU_DTAP_MO(ts_ML3_MO_CC_REL_COMPL(cpars.transaction_id, tid_remote)));
+ log("================================ as_optional_cc_rel(): got CC RELEASE");
+ if (respond) {
+ var BIT1 tid_remote := '1'B;
+ if (cpars.mo_call) {
+ tid_remote := '0'B;
}
+ log("================================ as_optional_cc_rel(): sending CC REL COMPL");
+ BSSAP.send(ts_PDU_DTAP_MO(ts_ML3_MO_CC_REL_COMPL(cpars.transaction_id, tid_remote)));
+ }
+ log("================================ as_optional_cc_rel(): not doing repeat!");
}
}
I am seeing the following output when the testcase is passing:
TC_ho_inter_bsc_a5_30(15)@LEGION: ======================== f_call_hangup 2: rx MNCC REL ind TC_ho_inter_bsc_a5_30(15)@LEGION: ================================ as_optional_cc_rel(): got CC RELEASE TC_ho_inter_bsc_a5_30(15)@LEGION: ================================ as_optional_cc_rel(): sending CC REL COMPL TC_ho_inter_bsc_a5_30(15)@LEGION: ================================ as_optional_cc_rel(): not doing repeat! TC_ho_inter_bsc_a5_30(15)@LEGION: ======================== f_call_hangup 3: got NOTHING TC_ho_inter_bsc_a5_30(15)@LEGION: ======================== f_call_hangup 3: continue
This output means that in the successful scenario we never receive CC RELEASE COMPLETE
. The standalone receive statement gets bypassed by altstep as_optional_cc_rel()
running in background.
This definitely sounds confusing, and took me some time to realize, but the important thing here is that altstep as_optional_cc_rel()
does not repeat
.
The standalone BSSAP.receive
statement we have in f_call_hangup()
:
log("f_call_hangup 2: rx MNCC REL ind"); BSSAP.receive(tr_PDU_DTAP_MT(tr_ML3_MT_CC_REL_COMPL(cpars.transaction_id))); log("f_call_hangup 3: rx DTAP CC REL COMPL");
actually evaluates to the following code, given that the altstep as_optional_cc_rel()
is activated in background:
log("f_call_hangup 2: rx MNCC REL ind"); alt { [] BSSAP.receive(tr_PDU_DTAP_MT(tr_ML3_MT_CC_REL_COMPL(cpars.transaction_id))); [] as_optional_cc_rel(cpars, true); } log("f_call_hangup 3: rx DTAP CC REL COMPL");
which expands to:
log("f_call_hangup 2: rx MNCC REL ind"); alt { [] BSSAP.receive(tr_PDU_DTAP_MT(tr_ML3_MT_CC_REL_COMPL(cpars.transaction_id))); [] BSSAP.receive(tr_PDU_DTAP_MT(tr_ML3_MT_CC_RELEASE(cpars.transaction_id))) { BSSAP.send(ts_PDU_DTAP_MO(ts_ML3_MO_CC_REL_COMPL(cpars.transaction_id, tid_remote))); // no repeat statement here! } } log("f_call_hangup 3: rx DTAP CC REL COMPL");
With this code, we continue if either of the two DTAP messages is received! Now let's see what happens with io_uring...
Updated by fixeria about 2 months ago
As was discussed with neels in Jabber, the idea is that only one of the two scenarios should happen:
- we receive MT DTAP with
CC RELEASE
and respond with MO DTAPCC RELEASE COMPLETE
, - we send MO DTAP with
CC RELEASE
and expect to receive MT DTAPCC RELEASE COMPLETE
.
I find this a bit weird, specifically this permissive uncertainty on who initiates the call release. Anyways.
Below is my speculative (not confirmed yet) understanding of the failing scenario:
1804 function f_call_hangup(inout CallParameters cpars, boolean release_by_ms, boolean is_csfb := false)
1805 runs on BSC_ConnHdlr {
1806
1807 var MobileIdentityLV mi;
1808 var MNCC_PDU mncc;
1809 var MgcpCommand mgcp_cmd;
1810 var boolean respond_to_dlcx;
1811 var boolean dlcx_contained_ci := false;
1812 var template PDU_BSSAP t_clear := tr_BSSMAP_ClearCommand;
1813
1814 if (is_csfb) {
1815 t_clear := tr_BSSMAP_ClearCommandCSFB;
1816 }
1817
1818 log("f_call_hangup 0: tx MNCC_DISC_REQ");
1819 MNCC.send(ts_MNCC_DISC_req(cpars.mncc_callref, valueof(ts_MNCC_cause(23))));
1820 BSSAP.receive(tr_PDU_DTAP_MT(tr_ML3_MT_CC_DISC(cpars.transaction_id)));
1821 // ^^^^^^^^^^ this receive() operation gets skipped if as_optional_cc_rel() triggers here
1822 // may happen if MT DTAP with CC RELEASE is received earlier than this message
1823 // as a result, MT DTAP with CC DISONNECT remains in the queue of port BSSAP
1824
1825 log("f_call_hangup 1: rx DTAP CC DISC");
1826
1827 if (release_by_ms) {
1828 var BIT1 tid_remote := '1'B;
1829 if (cpars.mo_call) {
1830 tid_remote := '0'B;
1831 }
1832 /* B-side (MS) Release of call */
1833 BSSAP.send(ts_PDU_DTAP_MO(ts_ML3_MO_CC_RELEASE(cpars.transaction_id, tid_remote, '0000000'B)));
1834 MNCC.receive(tr_MNCC_REL_ind(cpars.mncc_callref));
1835 // ^^^^^^^^^ this receive() may also be skipped due to as_optional_cc_rel() triggering here
1836 // but this is unlikely to affect the outcome, because we do not receive() on port MNCC anymore
1837 log("f_call_hangup 2: rx MNCC REL ind");
1838 BSSAP.receive(tr_PDU_DTAP_MT(tr_ML3_MT_CC_REL_COMPL(cpars.transaction_id)));
1839 // ^^^^^^^^^ this receive() blocks the testcase execution until Tguard expires
1840 // no mater if we receive MT DTAP with CC RELEASE COMPLETE or not,
1841 // the port queue is jammed due to CC DISONNECT was not dequeued above
1842 log("f_call_hangup 3: rx DTAP CC REL COMPL");
1843 } else {
1844 /* A-side (PLMN) Release of call */
1845 MNCC.send(ts_MNCC_REL_req(cpars.mncc_callref, valueof(ts_MNCC_cause(42))));
1846 BSSAP.receive(tr_PDU_DTAP_MT(tr_ML3_MT_CC_RELEASE(cpars.transaction_id)));
1847 log("f_call_hangup 2.a: rx DTAP CC RELEASE");
1848 BSSAP.send(ts_PDU_DTAP_MO(ts_ML3_MO_CC_REL_COMPL(cpars.transaction_id)));
1849 log("f_call_hangup 3.a: rx MNCC REL cnf");
1850 MNCC.receive(tr_MNCC_REL_cnf(cpars.mncc_callref, cause := *));
1851 }
Updated by fixeria about 2 months ago
- File MSC_Tests.TC_ho_inter_bsc_a5_3.merged MSC_Tests.TC_ho_inter_bsc_a5_3.merged added
- Status changed from In Progress to Feedback
- % Done changed from 20 to 30
fixeria wrote in #note-6:
Below is my speculative (not confirmed yet) understanding of the failing scenario: [...]
Interestingly enough, below I assumed that as_optional_cc_rel()
triggering at line 1834 (near MNCC.receive
) would unlikely affect the testcase outcome, but it does!
I reproduced the problem with some additional logging, and below is a brief sequence of events (see attached log file):
* [1572] "17:12:59.867039 1384 MSC_Tests.ttcn:5859 Altstep as_optional_cc_rel was activated as default, id 4" * [1594] "17:12:59.871612 1384 BSC_ConnectionHandler.ttcn:1818 f_call_hangup 0: tx MNCC_DISC_REQ" * [1611] "17:12:59.876589 1384 BSC_ConnectionHandler.ttcn:1820 Message enqueued on BSSAP from MSC_Test_0-RAN(1374) @RAN_Emulation.PDU_DTAP_MT ..." -- Rx CC DISCONNECT, id 21 * [1614] "17:12:59.877007 1384 BSC_ConnectionHandler.ttcn:1820 Message with id 21 was extracted from the queue of BSSAP." * [1615] "17:12:59.877042 1384 BSC_ConnectionHandler.ttcn:1822 f_call_hangup 1: rx DTAP CC DISC"
So far so good. While I assumed that as_optional_cc_rel()
triggers here, it does not, and CC DISCONNECT
gets dequeued as expected.
* [1616] "17:12:59.877642 1384 BSC_ConnectionHandler.ttcn:1830 Sent on BSSAP to MSC_Test_0-RAN(1374) @RAN_Emulation.PDU_DTAP_MO ..." -- Tx CC RELEASE (cause=0) * [1656] "17:13:19.878345 1384 BSC_ConnectionHandler.ttcn:1831 Message enqueued on BSSAP from MSC_Test_0-RAN(1374) @RAN_Emulation.PDU_DTAP_MT ..." -- Rx CC RELEASE (cause=47), id 22 * [1659] "17:13:19.878505 1379 MNCC_Emulation.ttcn:308 Sent on MNCC_CLIENT to TC_ho_inter_bsc_a5_30(1384) @MNCC_Types.MNCC_PDU ..." -- Rx MNCC_REL_IND (not enqueued yet) * [1662] "17:13:19.878657 1384 MSC_Tests.ttcn:181 Receive operation on port BSSAP succeeded, message from MSC_Test_0-RAN(1374): @RAN_Emulation.PDU_DTAP_MT ..." -- CC RELEASE (cause=47), id 22 * [1664] "17:13:19.878697 1384 MSC_Tests.ttcn:181 Message with id 22 was extracted from the queue of BSSAP." * [1665] "17:13:19.878732 1384 MSC_Tests.ttcn:182 ================================ as_optional_cc_rel(): got CC RELEASE" * [1672] "17:13:19.878944 1384 MSC_Tests.ttcn:188 ================================ as_optional_cc_rel(): sending CC REL COMPL" * [1673] "17:13:19.879275 1384 MSC_Tests.ttcn:189 Sent on BSSAP to MSC_Test_0-RAN(1374) @RAN_Emulation.PDU_DTAP_MO ..." -- Tx CC RELEASE COMPLETE * [1674] "17:13:19.879382 1384 MSC_Tests.ttcn:191 ================================ as_optional_cc_rel(): not doing repeat!" * [1675] "17:13:19.879415 1384 BSC_ConnectionHandler.ttcn:1831 Default with id 4 (altstep as_optional_cc_rel) finished. Skipping current alt statement or receiving operation." -- (!) skipping MNCC.receive * [1676] "17:13:19.879445 1384 BSC_ConnectionHandler.ttcn:1832 ======================== f_call_hangup 2: rx MNCC REL ind" -- this is false, we have not received anything on MNCC port yet
Here we see that as_optional_cc_rel()
triggers while we're waiting for MNCC_REL_IND
on port MNCC
(see BSC_ConnectionHandler.ttcn:1831), and this receive operation gets skipped.
* [1678] "17:13:19.879620 1384 BSC_ConnectionHandler.ttcn:1834 Message enqueued on BSSAP from MSC_Test_0-RAN(1374) @BSSAP_Types.PDU_BSSAP" -- Rx BSSAP CLEAR COMMAND * [1680] "17:13:19.879760 1384 BSC_ConnectionHandler.ttcn:1834 Message enqueued on MNCC from MSC_Test-MNCC(1379) @MNCC_Types.MNCC_PDU" -- Rx MNCC_REL_IND
Here we get a BSSAP CLEAR COMMAND
from osmo-msc, which is likely triggered by CC RELEASE COMPLETE
, which was sent from as_optional_cc_rel()
?
Then the MNCC_REL_IND
is received on port MNCC, which we will never dequeue because the receive operation was skipped by as_optional_cc_rel()
above.
* [1693] "17:13:49.878418 1374 RAN_Emulation.ttcnpp:1246 Message enqueued on BSSAP from MSC_Test_0-SCCP(1373) @SCCPasp_Types.ASP_SCCP_N_DISCONNECT_ind" * [1703] "17:13:49.878981 1384 BSC_ConnectionHandler.ttcn:1834 Message enqueued on BSSAP from MSC_Test_0-RAN(1374) @RAN_Emulation.RAN_Conn_Prim : MSC_CONN_PRIM_DISC_IND (0) id 24" * [1706] "17:13:49.958098 1384 BSC_ConnectionHandler.ttcn:203 Timeout g_Tguard: 60 s" * [1707] "17:13:49.958277 1384 BSC_ConnectionHandler.ttcn:204 setverdict(fail): pass -> fail reason: "Tguard timeout", new component reason: "Tguard timeout""
Looks like the MSC terminates the SCCP connection, and finally the Tguard
expires.
As can be seen, we never receive CC RELEASE COMPLETE
, neither in successful nor in the failing scenario. It's just a lucky coincidence that in the successful scenario the as_optional_cc_rel()
"unblocks" the flow by skipping BSSAP.receive
operation. In the failing scenario (with io_uring) it simply happens earlier, so the execution flow gets blocked until the Tguard
expires.
neels I believe we need to rework the testcase, since we cannot rely on pure luck and hope that the altstep triggers in the right time. If you have ideas how to do this properly, please let me know! And feel free to take this ticket over if you wish.
Updated by neels about 2 months ago
On Sun, Mar 24, 2024 at 10:31:49AM +0000, redmine@osmocom.org wrote:
neels I believe we need to rework the testcase, since we cannot rely on pure luck and hope that the altstep triggers in the right time. If you have ideas how to do this properly, please let me know! And feel free to take this ticket over if you wish.
That code feels distant to me now, and while I might remember a thing or two, I think it's much more of an advantage that you already looked at tall the details. So if you have any ideas to solve it, they are probably good, and please go ahead! I would have to first read everything again and wrap my head around the symptoms -- or if you have a specific idea we can chat about it?
Updated by fixeria about 2 months ago
- Status changed from Feedback to In Progress
neels wrote in #note-8:
That code feels distant to me now, and while I might remember a thing or two, I think it's much more of an advantage that you already looked at tall the details.
I know this feeling. I just hoped that you would remember your own code quicker than me understanding and finding problems in it :P
So if you have any ideas to solve it, they are probably good, and please go ahead! I would have to first read everything again and wrap my head around the symptoms [...]
I didn't have any ideas at the moment of writing this, and all I had to do is dig into the code and try to understand what goes wrong. And I think I did.
Updated by fixeria about 2 months ago
- File MSC_Tests.TC_ho_inter_bsc.pcap.gz MSC_Tests.TC_ho_inter_bsc.pcap.gz added
- % Done changed from 30 to 60
Below is logging that piqued my curiosity:
msc_a(IMSI-262420000000053:MSISDN-491230000053:TMSI-0x80C2D1DC:GERAN-A-2347:CM_SERVICE_REQ)[0x55f012dee8e0]{MSC_A_ST_COMMUNICATING}: Duplicate DTAP: bin=0, expected n_sd == 0, got 3 (ran_msg.c:159) msc_a(IMSI-262420000000053:MSISDN-491230000053:TMSI-0x80C2D1DC:GERAN-A-2347:CM_SERVICE_REQ)[0x55f012dee8e0]{MSC_A_ST_COMMUNICATING}: Dropping duplicate message (3GPP TS 24.007 11.2.3.2 Message Type Octet / Duplicate Detection) (msc_a.c:1363)
This can be seen in both successful and failing scenarios when running TC_ho_inter_bsc_a5_*
, but not when running other non-A5 testcases:
- TC_ho_inter_bsc
- TC_ho_inter_bsc_ipv6
- TC_ho_inter_bsc_csd
What I also noticed [later] is that the execution time is different: ~12s for non-A5 testcases vs ~32s for A5 testcases. And the BSSAP message flow looks slightly different!
This brings us to the following code snippet in MSC_Tests.ttcn
:
5857 var default ccrel := activate(as_optional_cc_rel(cpars, true));
5858
5859 /* blatant cheating */
5860 var N_Sd_Array last_n_sd := f_bssmap_last_n_sd();
5861 last_n_sd[0] := 3;
5862 f_bssmap_continue_after_n_sd(last_n_sd);
5863
5864 f_call_hangup(cpars, true);
5865 f_sleep(1.0);
5866 deactivate(ccrel);
This code orders the RAN emulation component to patch sequence number in DTAP messages starting from 3.
If I remove this "blatant cheating" block, then I am observing the following situation:
- non-A5 testcases take longer to execute and make osmo-msc complain about unexpected
n_sd
, - the A5 testcases run quicker and trigger no such warnings anymore.
So what's so special about the A5 testcases? Answer: the authentication! What?!?
Let's take a look at another PCAP from passing normal job, build 2335 MSC_Tests.TC_ho_inter_bsc
and compare it against MSC_Tests.TC_ho_inter_bsc_a5_3_pass.pcap
:
$ tshark -T fields -e frame.number -e gsm_a.dtap.seq_no -e _ws.col.Info -Y gsm_a.dtap.seq_no -r MSC_Tests.TC_ho_inter_bsc.pcap.gz 386 0 SACK (Ack=5, Arwnd=106496) CR (BSSMAP) Complete Layer 3 Information (DTAP) (MM) Location Updating Request 612 0 SACK (Ack=4, Arwnd=106496) DT1 (DTAP) (MM) Location Updating Accept 677 1 SACK (Ack=8, Arwnd=106496) DT1 (DTAP) (MM) TMSI Reallocation Complete 859 0 CR (BSSMAP) Complete Layer 3 Information (DTAP) (MM) CM Service Request 1006 0 DT1 (DTAP) (MM) CM Service Accept 1013 1 SACK (Ack=13, Arwnd=106496) DT1 (DTAP) (CC) Setup 1235 0 SACK (Ack=10, Arwnd=106496) DT1 (DTAP) (CC) Call Proceeding 1275 0 DT1 (DTAP) (CC) Alerting 1322 0 DT1 (DTAP) (CC) Connect 1342 2 SACK (Ack=17, Arwnd=106496) DT1 (DTAP) (CC) Connect Acknowledge 2208 0 DT1 (DTAP) (CC) Disconnect 2210 3 SACK (Ack=24, Arwnd=106496) DT1 (DTAP) (CC) Release 2337 0 SACK (Ack=19, Arwnd=106496) DT1 (DTAP) (CC) Release Complete $ tshark -T fields -e frame.number -e gsm_a.dtap.seq_no -e _ws.col.Info -Y gsm_a.dtap.seq_no -r MSC_Tests.TC_ho_inter_bsc_a5_3_pass.pcap 375 0 SACK (Ack=4, Arwnd=106496) CR (BSSMAP) Complete Layer 3 Information (DTAP) (MM) Location Updating Request 551 0 SACK (Ack=4, Arwnd=106496) DT1 (DTAP) (MM) Authentication Request 553 1 SACK (Ack=6, Arwnd=106496) DT1 (DTAP) (MM) Authentication Response 802 0 DT1 (DTAP) (MM) Location Updating Accept 805 2 DT1 (DTAP) (MM) TMSI Reallocation Complete 987 0 CR (BSSMAP) Complete Layer 3 Information (DTAP) (MM) CM Service Request 1126 0 DT1 (DTAP) (MM) Authentication Request 1129 1 DT1 (DTAP) (MM) Authentication Response 1259 2 SACK (Ack=15, Arwnd=106496) DT1 (DTAP) (CC) Setup 1495 0 SACK (Ack=14, Arwnd=106496) DT1 (DTAP) (CC) Call Proceeding 1521 0 DT1 (DTAP) (CC) Alerting 1568 0 DT1 (DTAP) (CC) Connect 1588 3 SACK (Ack=19, Arwnd=106496) DT1 (DTAP) (CC) Connect Acknowledge 2480 0 DT1 (DTAP) (CC) Disconnect 2482 3 SACK (Ack=26, Arwnd=106496) DT1 (DTAP) (CC) Release 2671 0 DT1 (DTAP) (CC) Release 2686 0 DT1 (DTAP) (CC) Release Complete
Note presence of the Authentication Request/Response
, and how this affects sequence numbers (second column) in the messages.
- in
MSC_Tests.TC_ho_inter_bsc.pcap.gz
no authentication is performed- [frame 2210]
CC RELEASE
is sent with the correct sequence number (3), so that - [frame 2337] the network immediately responds with
CC RELEASE COMPLETE
and clears connection as expected.
- [frame 2210]
- in
MSC_Tests.TC_ho_inter_bsc_a5_3_pass.pcap
the authentication is performed- [frame 1129]
MM Authentication Response
does shift sequence number of all pending messages (basically +1), - [frame 1588]
CC Connect Acknowledge
is the last message with correct sequence number, - [frame 2482]
CC Release
gets discarded by osmo-msc because it should have sequence number 0:(3 + 1) % 4 == 0
- [frame 2671] osmo-msc releases the call itself due to timeout of timeout of T306
- [frame 2686]
as_optional_cc_rel()
triggers and responds withCC Release Complete
- [frame 1129]
So now that I have the full picture, my proposal is to bring those A5 and non-A5 testcases into consistence by having authentication enabled or disabled for them all.
Updated by fixeria about 2 months ago
- Status changed from In Progress to Feedback
- % Done changed from 60 to 100
fixeria wrote in #note-10:
So now that I have the full picture, my proposal is to bring those A5 and non-A5 testcases into consistence by having authentication enabled or disabled for them all.
I found a better solution, fixing the "blatant cheating":
https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/36467 msc: fix f_tc_ho_inter_bsc0(): properly patch n_sd [NEW]
https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/36468 msc: cosmetic: fix formatting in altsteps [NEW]
https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/36469 msc: fix missing 'repeat' in as_optional_cc_rel() [NEW]
This fixes the problems with sequence counter, while keeping ciphering & authentication enabled only for the A5-testcases.
Updated by fixeria about 1 month ago
- Status changed from Feedback to Resolved
All patches have been merged, the problem is resolved \o/.
https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-msc-test-io_uring/19/
Updated by neels about 1 month ago
All patches have been merged, the problem is resolved \o/.
\o/
suggestion: OsmoDevCon shirt with just the word 'repeat' on it.