Bug #3715
closedosmo-gsm-tester: Mainloop.wait() hangs forever in Mainloop.poll()
0%
Description
It was spotted today that an osmo-gsm-tester run was running for more than a day. I actually checked that job the day before, and I could see that it was still exactly at same place the day after.
It was running following test: gprs:trx-umtrx+mod-bts0-dynts-ipa iperf3m4.py
osmo-gsm-tester extended (debug) log showed last lines:
05:33:54.235706 tst iperf3m4.py:8: waiting for modems to attach... 05:33:55.049684 tst /gobi_3: Already registered with network ('901', '70') 05:34:10.177197 tst /gobi_4: Already registered with network ('901', '70') 05:34:15.523556 tst iperf3m4.py:8: waiting for modems to attach to data services...
I attached to the python3 process and did a apply thread all bt
:
Thread 3 (Thread 0x7f3f1dcfb700 (LWP 23377)): #0 0x00007f3f2300767d in poll () at ../sysdeps/unix/syscall-template.S:84 #1 0x00007f3f1fe519f6 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0 #2 0x00007f3f1fe51d82 in g_main_loop_run () from /lib/x86_64-linux-gnu/libglib-2.0.so.0 #3 0x00007f3f1f72f656 in ?? () from /usr/lib/x86_64-linux-gnu/libgio-2.0.so.0 #4 0x00007f3f1fe793d5 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0 #5 0x00007f3f23e1d494 in start_thread (arg=0x7f3f1dcfb700) at pthread_create.c:333 #6 0x00007f3f23010acf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97 Thread 2 (Thread 0x7f3f1e5cf700 (LWP 23376)): #0 0x00007f3f2300767d in poll () at ../sysdeps/unix/syscall-template.S:84 #1 0x00007f3f1fe519f6 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0 #2 0x00007f3f1fe51b0c in g_main_context_iteration () from /lib/x86_64-linux-gnu/libglib-2.0.so.0 #3 0x00007f3f1fe51b51 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0 #4 0x00007f3f1fe793d5 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0 #5 0x00007f3f23e1d494 in start_thread (arg=0x7f3f1e5cf700) at pthread_create.c:333 #6 0x00007f3f23010acf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97 Thread 1 (Thread 0x7f3f24237700 (LWP 23375)): #0 0x00007f3f2300767d in poll () at ../sysdeps/unix/syscall-template.S:84 #1 0x00007f3f1fe519f6 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0 #2 0x00007f3f1fe51b0c in g_main_context_iteration () from /lib/x86_64-linux-gnu/libglib-2.0.so.0 #3 0x00007f3f1fc04038 in ffi_call_unix64 () from /usr/lib/x86_64-linux-gnu/libffi.so.6 #4 0x00007f3f1fc03a9a in ffi_call () from /usr/lib/x86_64-linux-gnu/libffi.so.6 #5 0x00007f3f207cf9cc in ?? () from /usr/lib/python3/dist-packages/gi/_gi.cpython-35m-x86_64-linux-gnu.so #6 0x00007f3f207d1478 in ?? () from /usr/lib/python3/dist-packages/gi/_gi.cpython-35m-x86_64-linux-gnu.so #7 0x00007f3f207c5209 in ?? () from /usr/lib/python3/dist-packages/gi/_gi.cpython-35m-x86_64-linux-gnu.so #8 0x000055e932850647 in PyObject_Call () #9 0x000055e9327c7ee1 in PyEval_EvalFrameEx () #10 0x000055e9327cc7bf in ?? () #11 0x000055e9327c80a9 in PyEval_EvalFrameEx () #12 0x000055e9327cc286 in ?? () #13 0x000055e9327c80a9 in PyEval_EvalFrameEx () #14 0x000055e9327c793f in PyEval_EvalFrameEx () #15 0x000055e9327cc7bf in ?? () #16 0x000055e9327c84c9 in PyEval_EvalFrameEx () #17 0x000055e9327cc7bf in ?? () #18 0x000055e9327c84c9 in PyEval_EvalFrameEx () #19 0x000055e9327c793f in PyEval_EvalFrameEx () #20 0x000055e9327c793f in PyEval_EvalFrameEx () #21 0x000055e9327cc286 in ?? () #22 0x000055e9327c80a9 in PyEval_EvalFrameEx () #23 0x000055e9327cc286 in ?? () ---Type <return> to continue, or q <return> to quit--- #24 0x000055e9327c80a9 in PyEval_EvalFrameEx () #25 0x000055e9327c793f in PyEval_EvalFrameEx () #26 0x000055e9327cc286 in ?? () #27 0x000055e9327ccf9f in PyEval_EvalCode () #28 0x000055e93289a8f2 in ?? () #29 0x000055e93289ce1d in PyRun_FileExFlags () #30 0x000055e93289d5be in PyRun_SimpleFileExFlags () #31 0x000055e9328cb4d7 in Py_Main () #32 0x000055e93275bc01 in main () (gdb)
After I dettached the process from gdb, it continued and log showed following error:
05:34:15.573785 tst iperf3m4.py:8: waiting for modems to attach to data services... [trial-1680↪gprs:trx-umtrx+mod-bts0-dynts-ipa↪iperf3m4.py:8] [testlib.py:92] 05:34:16.690121 tst /gobi_1: DBG: attached: True [trial-1680↪gprs:trx-umtrx+mod-bts0-dynts-ipa↪/gobi_1] [modem.py:589] 05:34:16.695481 tst /gobi_1: DBG: activate_context {apn='inet46', protocol='ip', user='ogt'} [trial-1680↪gprs:trx-umtrx+mod-bts0-dynts-ipa↪/gobi_1] [modem.py:609] 12:34:56.809299 tst /gobi_4: DBG: 'org.ofono.ConnectionManager'.PropertyChanged() -> Attached=True [trial-1680↪gprs:trx-umtrx+mod-bts0-dynts-ipa↪/gobi_4] [modem.py:759] 12:39:57.157456 tst iperf3m4.py:8: ERR: Error: Wait timeout {condition=<function Modem.activate_context.<locals>.<lambda> at 0x7f3f1c3a8048>, timeout=300, timestep=1} [trial-1680↪gprs:trx-umtrx+mod-bts0-dynts-ipa↪iperf3m4.py:8] [event_loop.py:111: raise log.Error('Wait timeout', condition=condition, timeout=timeout, timestep=timestep)] 12:39:57.168774 tst iperf3m4.py:8: TRACEBACK: Traceback (most recent call last): File "/home/jenkins/workspace/osmo-gsm-tester_run-prod/osmo-gsm-tester/src/osmo_gsm_tester/test.py", line 61, in run self.path) File "/home/jenkins/workspace/osmo-gsm-tester_run-prod/osmo-gsm-tester/src/osmo_gsm_tester/util.py", line 351, in run_python_file spec.loader.exec_module( importlib.util.module_from_spec(spec) ) File "<frozen importlib._bootstrap_external>", line 673, in exec_module File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed File "/home/jenkins/workspace/osmo-gsm-tester_run-prod/osmo-gsm-tester/suites/gprs/iperf3m4.py", line 8, in <module> setup_run_iperf3_test_parallel(4) File "/home/jenkins/workspace/osmo-gsm-tester_run-prod/osmo-gsm-tester/suites/gprs/lib/testlib.py", line 96, in setup_run_iperf3_test_parallel ctx_id_v4 = ms.activate_context(apn='inet46', protocol=ms.CTX_PROT_IPv4) File "/home/jenkins/workspace/osmo-gsm-tester_run-prod/osmo-gsm-tester/src/osmo_gsm_tester/modem.py", line 622, in activate_context MainLoop.wait(self, lambda: ctx.GetProperties()['Active'] == True) File "/home/jenkins/workspace/osmo-gsm-tester_run-prod/osmo-gsm-tester/src/osmo_gsm_tester/event_loop.py", line 111, in wait raise log.Error('Wait timeout', condition=condition, timeout=timeout, timestep=timestep) osmo_gsm_tester.log.Error: Wait timeout {condition=<function Modem.activate_context.<locals>.<lambda> at 0x7f3f1c3a8048>, timeout=300, timestep=1} [trial-1680↪gprs:trx-umtrx+mod-bts0-dynts-ipa↪iperf3m4.py:8] [test.py:104]
So it seems it was somehow blocked forever in Mainloop.poll() until probably detaching the process made self.gctx.iteration(may_block=True) (and poll() behind it) return an -EAGAIN or similar stuff. I'm not sure how can that happen since we add a timeout immediatelly before polling:
wait_req = WaitRequest(condition, condition_args, condition_kwargs, timeout, timestep) wait_id = GObject.timeout_add(timestep*1000, self._trigger_cb_func, wait_req.condition_check)_trigger_cb_func always returns True so it is always rearmed, and as a consequence either condition_ack or timeout_ack will become True. Only conditions I can think of:
- self.gctx.iteration(may_block=True) doesn't return sometimes for whataver reason after firing the timeout.
- Time returned by time.time() stops working sanely.
- Some bug in glib.
Updated by pespin over 5 years ago
- Priority changed from Normal to Low
Moving to low since it was only spotted once and thus not usually happening.
Updated by pespin almost 4 years ago
- Status changed from New to Closed
I haven't seen this in 1 year with lots of runs so closing it, a lot has been improved over this year. Let's reopen if we see it again.