Bug #5366
closedttcn3-{bsc,msc,sgsn,smlc,sccp}-test regressions due to timeout waiting for RESET-ACK
Added by fixeria over 2 years ago. Updated over 2 years ago.
100%
Description
Since Dec 17 2021 we started to see massive regressions in the recent test results on Jenkins:
https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-bsc-test/1594/
https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-msc-test/1500/
https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-sccp-test/707/
https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-sgsn-test/1457/
https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-smlc-test/431/
Interestingly enough, -latest is generally not affected, except ttcn3-msc-test:
https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-msc-test-latest/1167/ (+151 failures)
https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-msc-test-latest/1168/ (-152 failures, back to normal)
https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-msc-test-latest/1169/ (+152 failures)
Related issues
Updated by fixeria over 2 years ago
I managed to reproduce the failures in a local docker environment. What brought my attention is this message:
M3UA_Emulation.ttcn:554 Warning: Setsockopt error!
which may be originated from several places:
$ grep -r "Setsockopt error" . ./deps/titan.TestPorts.SCTPasp/src/SCTPasp_PT.cc: TTCN_warning("Setsockopt error!"); ./deps/titan.TestPorts.SCTPasp/src/SCTPasp_PT.cc: TTCN_warning("Setsockopt error!"); ./deps/titan.TestPorts.SCTPasp/src/SCTPasp_PT.cc: TTCN_warning("Setsockopt error!"); ./deps/titan.TestPorts.SCTPasp/src/SCTPasp_PT.cc: TTCN_warning("Setsockopt error!"); ./deps/titan.TestPorts.SCTPasp/src/SCTPasp_PT.cc: TTCN_warning("Setsockopt error!"); ./deps/titan.TestPorts.SCTPasp/doc/SCTPasp_Description.adoc:`*Setsockopt error!*` ./deps/titan.TestPorts.IPL4asp/src/IPL4asp_PT.cc: TTCN_error("Setsockopt error: SO_BROADCAST"); ./deps/titan.TestPorts.IPL4asp/src/IPL4asp_PT.cc: TTCN_warning("Setsockopt error!"); ./deps/titan.TestPorts.UDPasp/src/UDPasp_PT.cc: TTCN_error("Setsockopt error: SO_BROADCAST"); ./deps/titan.TestPorts.UDPasp/src/UDPasp_PT.cc: TTCN_error("Setsockopt error: SO_REUSEADDR"); ./deps/titan.TestPorts.UDPasp/src/UDPasp_PT.cc: TTCN_error("UDP test port: Setsockopt error: SO_BROADCAST"); ./deps/titan.TestPorts.UDPasp/src/UDPasp_PT.cc: TTCN_error("UDP test port: Setsockopt error: SO_BROADCAST");
Most likely, it comes from deps/titan.TestPorts.SCTPasp/src/SCTPasp_PT.cc. I'll continue the investigation...
Updated by fixeria over 2 years ago
- % Done changed from 0 to 10
fixeria wrote in #note-1:
Most likely, it comes from deps/titan.TestPorts.SCTPasp/src/SCTPasp_PT.cc.
Indeed it does. I confirmed this by enabling debug logging:
MSC_Test_0-M3UA(18)@0b2d69031686: SCTPasp Test Port (SCTP_PORT): Calling user_start(). MSC_Test_0-M3UA(18)@0b2d69031686: SCTPasp Test Port (SCTP_PORT): Leaving user_start(). MSC_Test_0-M3UA(18)@0b2d69031686: SCTPasp Test Port (SCTP_PORT): Calling user_map(sctp). MSC_Test_0-M3UA(18)@0b2d69031686: SCTPasp Test Port (SCTP_PORT): Running in NORMAL MODE. MSC_Test_0-M3UA(18)@0b2d69031686: SCTPasp Test Port (SCTP_PORT): Leaving user_map(). MTC@0b2d69031686: Connecting BSSAP RAN_Emulation to SCCP_SP_PORT MTC@0b2d69031686: Starting RAN_Emulation MSC_Test_0-M3UA(18)@0b2d69031686: ************************************************* MSC_Test_0-M3UA(18)@0b2d69031686: M3UA emulation initiated, the test can be started MSC_Test_0-M3UA(18)@0b2d69031686: ************************************************* MSC_Test_0-M3UA(18)@0b2d69031686: SCTPasp Test Port (SCTP_PORT): Calling outgoing_send (ASP_SCTP_CONNECTFROM). MSC_Test_0-SCCP(16)@0b2d69031686: v_sccp_pdu_maxlen:268 MSC_Test_0-M3UA(18)@0b2d69031686: SCTPasp Test Port (SCTP_PORT): Creating SCTP socket. MSC_Test_0-M3UA(18)@0b2d69031686: SCTPasp Test Port (SCTP_PORT): Setting SCTP socket options (initmsg). MSC_Test_0-M3UA(18)@0b2d69031686: SCTPasp Test Port (SCTP_PORT): Setting SCTP socket options (events). MSC_Test_0-M3UA(18)@0b2d69031686: Warning: Setsockopt error!
This narrows down the scope to SCTPasp__PT_PROVIDER::create_socket():
log("Setting SCTP socket options (initmsg).");
if (setsockopt(local_fd, IPPROTO_SCTP, SCTP_INITMSG, &initmsg,
sizeof(struct sctp_initmsg)) < 0)
{
TTCN_warning("Setsockopt error!");
errno = 0;
}
log("Setting SCTP socket options (events).");
if (setsockopt(local_fd, IPPROTO_SCTP, SCTP_EVENTS, &events, sizeof (events)) < 0)
{
TTCN_warning("Setsockopt error!");
errno = 0;
}
Updated by fixeria over 2 years ago
- Related to Bug #4573: [centos] ttcn3-msc-test: 177 failures! added
Updated by fixeria over 2 years ago
- Status changed from In Progress to New
- Assignee changed from fixeria to laforge
- % Done changed from 10 to 40
I think I know what's going on: we're dealing with the same problem as in #4573 - kernel ABI breakage.
This patch was useful during debugging:
commit 6ed17697914e74410004ebd8588d03ee04b97426
Author: Dock Er <docker@dock.er>
Date: Tue Dec 21 17:24:07 2021 +0000
HACK: log() generates event TTCN_ERROR regardless of debug flag
diff --git a/src/SCTPasp_PT.cc b/src/SCTPasp_PT.cc
index 552946c..4ce343d 100644
--- a/src/SCTPasp_PT.cc
+++ b/src/SCTPasp_PT.cc
@@ -1302,17 +1302,13 @@ void SCTPasp__PT_PROVIDER::handle_event(void *buf)
void SCTPasp__PT_PROVIDER::log(const char *fmt, ...)
{
- if(debug)
- {
va_list ap;
va_start(ap, fmt);
- TTCN_Logger::begin_event(TTCN_DEBUG);
+ TTCN_Logger::begin_event(TTCN_ERROR);
TTCN_Logger::log_event("SCTPasp Test Port (%s): ", get_name());
TTCN_Logger::log_event_va_list(fmt, ap);
TTCN_Logger::end_event();
va_end(ap);
-
- }
}
Applying this patch makes the test cases behave as usual:
commit 3046518e09fd9eef8de2ef17c2803420cdef5045 (HEAD)
Author: Dock Er <docker@dock.er>
Date: Tue Dec 21 17:25:07 2021 +0000
create_socket(): workaround kernel ABI breakage
diff --git a/src/SCTPasp_PT.cc b/src/SCTPasp_PT.cc
index 4ce343d..e0c1cec 100644
--- a/src/SCTPasp_PT.cc
+++ b/src/SCTPasp_PT.cc
@@ -1566,8 +1566,11 @@ int SCTPasp__PT_PROVIDER::create_socket(int addr_family)
log("Setting SCTP socket options (events).");
if (setsockopt(local_fd, IPPROTO_SCTP, SCTP_EVENTS, &events, sizeof (events)) < 0)
{
- TTCN_warning("Setsockopt error!");
- errno = 0;
+ /* See https://gerrit.osmocom.org/c/libosmo-netif/+/18097 */
+ if (setsockopt(local_fd, IPPROTO_SCTP, SCTP_EVENTS, &events, 13) < 0)
+ {
+ TTCN_warning("Setsockopt error!");
+ errno = 0;
+ }
}
return local_fd;
}
laforge I believe you have more experience with the kernel API/ABI, so I hand over this task to you.
Updated by laforge over 2 years ago
- Assignee changed from laforge to daniel
- Priority changed from High to Immediate
This needs to be fixed immediately as it breaks most of our test suites. Unfortunately my backlog today alone has grown by another week, so I cannot really look into this anywhere near enough.
The work-around of libosmo* in #4573 needs to be ported into the SCTPasp_PT.
Updated by daniel over 2 years ago
- % Done changed from 40 to 70
I pushed a patch to https://github.com/osmocom/titan.TestPorts.SCTPasp (didn't look like we're doing pull requests there).
Gerrit change for TTCN3 here: https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/26674
I wasn't able to reproduce that locally, but I hacked up the jenkins SCCP test for one run to callOSMO_TTCN3_BRANCH=daniel/sctp_events ./jenkins.sh
See https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-sccp-test/713/
Updated by laforge over 2 years ago
- Related to Bug #5368: consider using SCTP_EVENT instead of SCTP_EVENTS added
Updated by daniel over 2 years ago
- Priority changed from Immediate to High
- % Done changed from 70 to 90
Looks like the tests are passing again.
I also created pull requests about this issue at the eclipse gitlab:
https://gitlab.eclipse.org/eclipse/titan/titan.TestPorts.IPL4asp/-/merge_requests/2
https://gitlab.eclipse.org/eclipse/titan/titan.TestPorts.SCTPasp/-/merge_requests/4
Updated by fixeria over 2 years ago
Great, thank you Daniel!
Interestingly enough, -latest is generally not affected, except ttcn3-msc-test:
I am still wondering why? Perhaps the build machines have different kernel versions in use:
- ttcn3-bsc-test: "Building remotely on build2-deb9build-ansible ..."
- ttcn3-bsc-test-latest: "Building remotely on gtp0-deb9build ..."
- ttcn3-msc-test: "Building remotely on build2-deb9build-ansible ..."
- ttcn3-msc-test-latest: "Building remotely on build2-deb9build-ansible ..."
Updated by daniel over 2 years ago
- Status changed from In Progress to Stalled
Waiting for upstream -> stalled
fixeria wrote in #note-10:
Interestingly enough, -latest is generally not affected, except ttcn3-msc-test:
I am still wondering why? Perhaps the build machines have different kernel versions in use:
- ttcn3-bsc-test: "Building remotely on build2-deb9build-ansible ..."
- ttcn3-bsc-test-latest: "Building remotely on gtp0-deb9build ..."
- ttcn3-msc-test: "Building remotely on build2-deb9build-ansible ..."
- ttcn3-msc-test-latest: "Building remotely on build2-deb9build-ansible ..."
Yeah, kernel version most likely.
Updated by fixeria over 2 years ago
- Status changed from Stalled to Resolved
- % Done changed from 90 to 100
daniel wrote in #note-9:
Looks like the tests are passing again.
I also created pull requests about this issue at the eclipse gitlab:
https://gitlab.eclipse.org/eclipse/titan/titan.TestPorts.IPL4asp/-/merge_requests/2
https://gitlab.eclipse.org/eclipse/titan/titan.TestPorts.SCTPasp/-/merge_requests/4
Both pull requests have been merged. Tests are passing, marking as resolved.