Bug #4573
closed
[centos] ttcn3-msc-test: 177 failures!
100%
Description
See https://jenkins.osmocom.org/jenkins/view/TTCN3-centos/job/TTCN3-centos-msc-test/2/.
Here is what I noticed in the logs of osmo-stp (build artifacts):
20200531003124852 DLGLOBAL <0000> telnet_interface.c:104 Available via telnet 127.0.0.1 4239
20200531003126073 DLINP <0002> stream.c:113 couldn't activate SCTP events on FD 8
20200531003126073 DLINP <0002> stream.c:113 couldn't activate SCTP events on FD 8
20200531003128331 DLINP <0002> stream.c:113 couldn't activate SCTP events on FD 8
20200531003131074 DLINP <0002> stream.c:113 couldn't activate SCTP events on FD 8
20200531003131075 DLINP <0002> stream.c:113 couldn't activate SCTP events on FD 8
20200531003136075 DLINP <0002> stream.c:113 couldn't activate SCTP events on FD 8
20200531003136076 DLINP <0002> stream.c:113 couldn't activate SCTP events on FD 8
20200531003138405 DLINP <0002> stream.c:113 couldn't activate SCTP events on FD 8
20200531003141077 DLINP <0002> stream.c:113 couldn't activate SCTP events on FD 8
20200531003141077 DLINP <0002> stream.c:113 couldn't activate SCTP events on FD 8
20200531003146078 DLINP <0002> stream.c:113 couldn't activate SCTP events on FD 8
and osmo-msc:
20200531003126073 DSGS <0011> sgs_server.c:186 SGs socket bound to r=NULL<->l=0.0.0.0:29118
20200531003126073 DMSC <0006> msc_main.c:697 A-interface: SCCP user OsmoMSC-A:RI=SSN_PC,PC=(no PC),SSN=BSSAP, cs7-instance 0 ((null))
20200531003126073 DMSC <0006> msc_main.c:716 Iu-interface: SCCP user OsmoMSC-IuCS:RI=SSN_PC,PC=(no PC),SSN=RANAP, cs7-instance 0 ((null))
20200531003126073 DLINP <0015> stream.c:113 couldn't activate SCTP events on FD 12
20200531003126073 DLSS7 <001f> xua_default_lm_fsm.c:354 xua_default_lm(asp-clnt-OsmoMSC-A)[0x831380]{WAIT_ASP_UP}: Ignoring primitive M-ASP_DOWN.indication
20200531003126073 DLINP <0015> stream.c:269 [WAIT_RECONNECT] osmo_stream_cli_write(): not connected, dropping data!
The origin of this error message is libosmo-netif's sctp_sock_activate_events():
    /* IMPORTANT: Do NOT enable sender_dry_event here, see
     * https://bugzilla.redhat.com/show_bug.cgi?id=1442784 */
    rc = setsockopt(fd, IPPROTO_SCTP, SCTP_EVENTS,
                    &event, sizeof(event));
    if (rc < 0)
        LOGP(DLINP, LOGL_ERROR, "couldn't activate SCTP events "
             "on FD %u\n", fd);
Related issues
Updated by fixeria almost 4 years ago
Huh, build#3 is ok (-173 failures). Still would be good to know what was the reason.
https://jenkins.osmocom.org/jenkins/view/TTCN3-centos/job/TTCN3-centos-msc-test/3/
Updated by laforge almost 4 years ago
On Sun, May 31, 2020 at 06:37:19PM +0000, fixeria [REDMINE] wrote:
> /* IMPORTANT: Do NOT enable sender_dry_event here, see
> * https://bugzilla.redhat.com/show_bug.cgi?id=1442784 */
> rc = setsockopt(fd, IPPROTO_SCTP, SCTP_EVENTS,
> &event, sizeof(event));
>
> if (rc < 0)
> LOGP(DLINP, LOGL_ERROR, "couldn't activate SCTP events "
> "on FD %u\n", fd);
>
sigh. This is indeed most likely a consequence of https://bugzilla.redhat.com/show_bug.cgi?id=1442784
which means that containers are no longer portable across kernels, if they are using
different definitions...
Updated by laforge almost 4 years ago
- Assignee set to laforge
We already introduced a work-around in https://gerrit.osmocom.org/c/libosmo-netif/+/18097.
I just checked:
- centos8 still has a kernel before 5.5, i.e. without the additional sctp_send_failure_event_event member of the struct.
- host2 has kernel 4.9.189, also without the additional sctp_send_failure_event_event
So I'm not quite sure what is causing the incompatibility here...
Updated by laforge almost 4 years ago
fixeria wrote:
Huh, build#3 is ok (-173 failures). Still would be good to know what was the reason.
build#3 was running on build2.osmocom.org, while build#2 was running on host2.osmocom.org
- build2: Debian 10 / Linux build2.osmocom.org 4.19.0-6-amd64 #1 SMP Debian 4.19.67-2+deb10u2 (2019-11-11) x86_64 GNU/Linux
- host2: Debian 9 / Linux host2.osmocom.org 4.9.0-11-amd64 #1 SMP Debian 4.9.189-3+deb9u2 (2019-11-11) x86_64 GNU/Linux
So there appears to be an incompatibility specifically with Centos8 containers on a Debian9 kernel?
Updated by fixeria almost 4 years ago
- Related to Bug #4570: TTCN3-centos-bsc-test: 159 failing tests added
Updated by laforge almost 4 years ago
laforge wrote:
So there appears to be an incompatibility specifically with Centos8 containers on a Debian9 kernel?
I've created a fresh debian9 qemu-kvm VM:
- running "Linux d9dc8sctp 4.9.0-12-amd64 #1 SMP Debian 4.9.210-1 (2020-01-20) x86_64 GNU/Linux"
- installed docker-ce
- built the ttcn3-msc-test container and the osmo-msc-master-centos8 container
- ran the test suite
And indeed:
DLINP <0015> stream.c:113 couldn't activate SCTP events on FD 12
it seems there has been even more ABI breakage over time:
Debian9:
struct sctp_event_subscribe {
    __u8 sctp_data_io_event;
    __u8 sctp_association_event;
    __u8 sctp_address_event;
    __u8 sctp_send_failure_event;
    __u8 sctp_peer_error_event;
    __u8 sctp_shutdown_event;
    __u8 sctp_partial_delivery_event;
    __u8 sctp_adaptation_layer_event;
    __u8 sctp_authentication_event;
    __u8 sctp_sender_dry_event;
};
centos8:
struct sctp_event_subscribe {
    __u8 sctp_data_io_event;
    __u8 sctp_association_event;
    __u8 sctp_address_event;
    __u8 sctp_send_failure_event;
    __u8 sctp_peer_error_event;
    __u8 sctp_shutdown_event;
    __u8 sctp_partial_delivery_event;
    __u8 sctp_adaptation_layer_event;
    __u8 sctp_authentication_event;
    __u8 sctp_sender_dry_event;
    __u8 sctp_stream_reset_event;
    __u8 sctp_assoc_reset_event;
    __u8 sctp_stream_change_event;
};
And current mainline linux / Debian unstable:
struct sctp_event_subscribe {
    __u8 sctp_data_io_event;
    __u8 sctp_association_event;
    __u8 sctp_address_event;
    __u8 sctp_send_failure_event;
    __u8 sctp_peer_error_event;
    __u8 sctp_shutdown_event;
    __u8 sctp_partial_delivery_event;
    __u8 sctp_adaptation_layer_event;
    __u8 sctp_authentication_event;
    __u8 sctp_sender_dry_event;
    __u8 sctp_stream_reset_event;
    __u8 sctp_assoc_reset_event;
    __u8 sctp_stream_change_event;
    __u8 sctp_send_failure_event_event;
};
so we have a 10, 13 or 14 byte version.
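To make the size difference concrete, here is a minimal sketch that rebuilds the three layouts quoted above from plain uint8_t members and checks their sizes at compile time. The struct names (events_debian9, events_centos8, events_mainline) are hypothetical stand-ins for illustration only; the real definition is struct sctp_event_subscribe in the kernel's SCTP header.

```c
#include <stdint.h>

/* Hypothetical local copies of the three layouts quoted in this ticket.
 * These names are NOT from the kernel or from libosmo-netif. */
struct events_debian9 {
	uint8_t sctp_data_io_event;
	uint8_t sctp_association_event;
	uint8_t sctp_address_event;
	uint8_t sctp_send_failure_event;
	uint8_t sctp_peer_error_event;
	uint8_t sctp_shutdown_event;
	uint8_t sctp_partial_delivery_event;
	uint8_t sctp_adaptation_layer_event;
	uint8_t sctp_authentication_event;
	uint8_t sctp_sender_dry_event;
};

/* centos8 layout: the Debian9 members plus three more */
struct events_centos8 {
	struct events_debian9 base;
	uint8_t sctp_stream_reset_event;
	uint8_t sctp_assoc_reset_event;
	uint8_t sctp_stream_change_event;
};

/* current mainline layout: one further member on top */
struct events_mainline {
	struct events_centos8 base;
	uint8_t sctp_send_failure_event_event;
};

/* All members are single bytes, so there is no padding and the
 * sizes are simply the member counts. */
_Static_assert(sizeof(struct events_debian9) == 10, "10 byte version");
_Static_assert(sizeof(struct events_centos8) == 13, "13 byte version");
_Static_assert(sizeof(struct events_mainline) == 14, "14 byte version");
```

A binary built against the 13-byte header passes sizeof(event) == 13 to setsockopt(), which is larger than the 10-byte struct a Debian9 kernel knows about.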
Updated by laforge almost 4 years ago
Ok, so
- sctp_stream_reset_event was added in commit 35ea82d611da59f8bea44a37996b3b11bb1d3fd7 (first released in kernel v4.11)
- sctp_assoc_reset_event was added in commit c95129d127c6d3d9fca189c6f94c539a7f086b1a (first released in kernel v4.12)
- sctp_stream_change_event was added in commit b444153fb5a647448c2080ad28656ad183cae4fc (first released in kernel v4.12)
- sctp_send_failure_event_event was added in commit b6e6b5f1da7e8d092f86a4351802c27c0170c5a5 (first released in kernel v5.5)
- kernels < 4.11 have 10 bytes
- kernel 4.11 has 11 bytes
- kernels newer than 4.11 but older than 5.5 have 13 bytes
- kernels >= 5.5 have 14 bytes
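The version-to-size mapping above can be sketched as a small lookup. This is only an illustration under stated assumptions: both function names are hypothetical, the mapping follows the mainline release versions listed in this comment (vendor kernels with backports may differ), and it is not necessarily what the libosmo-netif patch does.

```c
#include <stddef.h>

/* Hypothetical helper: size of struct sctp_event_subscribe as known to a
 * mainline kernel of the given version, per the list in this ticket. */
size_t sctp_event_subscribe_size(unsigned int major, unsigned int minor)
{
	if (major > 5 || (major == 5 && minor >= 5))
		return 14;	/* + sctp_send_failure_event_event */
	if (major > 4 || (major == 4 && minor >= 12))
		return 13;	/* + sctp_assoc_reset_event, sctp_stream_change_event */
	if (major == 4 && minor == 11)
		return 11;	/* + sctp_stream_reset_event */
	return 10;		/* original layout */
}

/* Hypothetical helper: option length that fits both the struct the program
 * was compiled against and the struct the running kernel knows about. */
size_t sctp_events_optlen(size_t compiled_size,
			  unsigned int major, unsigned int minor)
{
	size_t kernel_size = sctp_event_subscribe_size(major, minor);

	return compiled_size < kernel_size ? compiled_size : kernel_size;
}
```

For the failing setup in this ticket (centos8 userspace compiled with the 13-byte struct, running on a 4.9 host kernel), sctp_events_optlen(13, 4, 9) yields 10, i.e. only the first ten members would be passed to the kernel.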
Updated by laforge almost 4 years ago
proposed (untested): https://gerrit.osmocom.org/c/libosmo-netif/+/18628
Updated by laforge almost 4 years ago
- % Done changed from 40 to 70
second version of patch https://gerrit.osmocom.org/c/libosmo-netif/+/18628 now merged.
Updated by laforge almost 4 years ago
- Status changed from New to Resolved
- % Done changed from 70 to 100
libosmo-netif with that patch merged is now working in my debian9 VM with the centos8 docker container.
Updated by fixeria over 2 years ago
- Related to Bug #5366: ttcn3-{bsc,msc,sgsn,smlc,sccp}-test regressions due to timeout waiting for RESET-ACK added
Updated by laforge over 2 years ago
- Related to Bug #5368: consider using SCTP_EVENT instead of SCTP_EVENTS added