Bug #4586
closedosmo-bts-trx leaks memory
Added by ipse almost 4 years ago. Updated almost 4 years ago.
100%
Description
After some operations, osmo-bts-trx
has an ever-growing list of records like this when you do show talloc-context application brief
:
struct trx_ctrl_msg contains 168 bytes in 1 blocks (ref 0) 0x55b7420379a0
I'm not sure of the source of this at the moment but it might be related to the use of dynamic TCH/F_PDCH
channels.
Files
trxc-leak.pcap | trxc-leak.pcap | 1.95 MB | TRXC capture | ipse, 06/06/2020 09:52 PM | |
Screenshot from 2020-06-07 00-56-19.png | View Screenshot from 2020-06-07 00-56-19.png | 46.2 KB | Used memory graph | ipse, 06/06/2020 09:56 PM | |
Screenshot from 2020-06-07 12-11-23.png | View Screenshot from 2020-06-07 12-11-23.png | 51.8 KB | ipse, 06/07/2020 09:12 AM | ||
Screenshot from 2020-06-11 21-06-16.png | View Screenshot from 2020-06-11 21-06-16.png | 41.9 KB | ipse, 06/11/2020 06:06 PM |
Related issues
Updated by fixeria almost 4 years ago
Would be good to investigate how to reproduce this memleak, but I think I found a potential problem:
https://gerrit.osmocom.org/c/osmo-bts/+/18704 osmo-bts-trx/trx_if: fix memleak in trx_ctrl_cmd_cb()
Unfortunately, ttcn3-bts-test is unable to catch such memleaks because osmo-bts-trx is restarted between the test case executions :/
Updated by ipse almost 4 years ago
fixeria All items I checked manually where related to NOHANDOVER TRXC message:
(gdb) p *((struct trx_ctrl_msg *) 0x55b742039090) $1 = {list = {next = 0x0, prev = 0x0}, cmd = "NOHANDOVER", '\000' <repeats 17 times>, params = "0 1", '\000' <repeats 96 times>, cmd_len = 10, params_len = 3, critical = 1, cb = 0x0}
Updated by fixeria almost 4 years ago
fixeria All items I checked manually where related to @NOHANDOVER TRXC message:
Interesting.
Can you please attach a TRXC packet capture, so we can see if there are response delays and retransmissions?
Updated by ipse almost 4 years ago
- File trxc-leak.pcap trxc-leak.pcap added
- File Screenshot from 2020-06-07 00-56-19.png Screenshot from 2020-06-07 00-56-19.png added
fixeria See attached a trace of a few hours of the capture. This has accidentally captured a restart but otherwise, memory has been leaking constantly (see the attached screenshot).
Updated by ipse almost 4 years ago
The patch indeed fixes the memory leak.
It's not clear to my though why do we keep sending duplicated NOHANDOVER commands at the first place, though.
Updated by fixeria almost 4 years ago
- Related to Bug #4592: osmo-bts-trx: make sure that handover detection works added
Updated by fixeria almost 4 years ago
It's not clear to my though why do we keep sending duplicated NOHANDOVER commands at the first place, though.
Such a nice bug! I checked your packet capture and immediately noticed that TRXC HANDOVER is never sent to the transceiver, and TRXC NOHANDOVER is sent twice. Most likely, in some rare cases an active channel is deactivated (so we send CMD NOHANDOVER due to a bug instead of CMD HANDOVER) and then activated again, even faster than we get RSP NOHANDOVER from transceiver. In this case we try to enqueue CMD NOHANDOVER again, and here it's leaked because the last command was the same, and we still did not get the response.
Updated by ipse almost 4 years ago
I'm glad that you like the bug :)
I'm attaching the screenshot of the used memory here - notice that it doesn't grow as fast as before after the restart last night when we applied the patch. But it looks like there is another memory leak since it's still growing even if slower.
Here is the current talloc-context
output. The scheduler_trx.c
items look suspicious:
OsmoBTS# show talloc-context all 2 talloc report on 'null_context' (total 2641062 bytes in 7723 blocks) lapd context contains 1153 bytes in 37 blocks (ref 0) 0x55e5198d3580 struct lapd_history contains 32 bytes in 1 blocks (ref 0) 0x55e5198e8700 struct lapd_history contains 32 bytes in 1 blocks (ref 0) 0x55e5199193a0 struct lapd_history contains 32 bytes in 1 blocks (ref 0) 0x55e519908190 struct lapd_history contains 32 bytes in 1 blocks (ref 0) 0x55e5198d22c0 struct lapd_history contains 32 bytes in 1 blocks (ref 0) 0x55e519917450 struct lapd_history contains 32 bytes in 1 blocks (ref 0) 0x55e5198e7b20 struct lapd_history contains 32 bytes in 1 blocks (ref 0) 0x55e5198ff700 struct lapd_history contains 32 bytes in 1 blocks (ref 0) 0x55e5198e8a80 struct lapd_history contains 32 bytes in 1 blocks (ref 0) 0x55e51993c440 struct lapd_history contains 32 bytes in 1 blocks (ref 0) 0x55e5198f13a0 struct lapd_history contains 32 bytes in 1 blocks (ref 0) 0x55e51991d600 struct lapd_history contains 32 bytes in 1 blocks (ref 0) 0x55e5198ea280 struct lapd_history contains 32 bytes in 1 blocks (ref 0) 0x55e519872210 struct lapd_history contains 32 bytes in 1 blocks (ref 0) 0x55e5199083a0 struct lapd_history contains 32 bytes in 1 blocks (ref 0) 0x55e5198d4660 struct lapd_history contains 32 bytes in 1 blocks (ref 0) 0x55e519872af0 struct lapd_history contains 32 bytes in 1 blocks (ref 0) 0x55e5198fd0a0 struct lapd_history contains 32 bytes in 1 blocks (ref 0) 0x55e5198e2220 struct lapd_history contains 32 bytes in 1 blocks (ref 0) 0x55e5198d6bf0 struct lapd_history contains 32 bytes in 1 blocks (ref 0) 0x55e519908430 struct lapd_history contains 32 bytes in 1 blocks (ref 0) 0x55e519925ca0 struct lapd_history contains 32 bytes in 1 blocks (ref 0) 0x55e51992f7f0 struct lapd_history contains 32 bytes in 1 blocks (ref 0) 0x55e5198d3380 struct lapd_history contains 32 bytes in 1 blocks (ref 0) 0x55e5198fdf00 struct lapd_history contains 32 bytes in 1 blocks (ref 0) 0x55e5198e1aa0 struct lapd_history contains 32 bytes in 1 blocks (ref 0) 0x55e5198ceee0 struct lapd_history contains 32 bytes in 1 blocks (ref 0) 0x55e5198f8f90 struct lapd_history contains 32 bytes in 1 blocks (ref 0) 0x55e5198ecbb0 struct lapd_history contains 32 bytes in 1 blocks (ref 0) 0x55e51992ea20 struct lapd_history contains 32 bytes in 1 blocks (ref 0) 0x55e519913880 struct lapd_history contains 32 bytes in 1 blocks (ref 0) 0x55e5198e5170 struct lapd_history contains 32 bytes in 1 blocks (ref 0) 0x55e519904830 struct lapd_history contains 32 bytes in 1 blocks (ref 0) 0x55e519900a40 struct lapd_history contains 32 bytes in 1 blocks (ref 0) 0x55e5198cd2e0 struct lapd_history contains 32 bytes in 1 blocks (ref 0) 0x55e5198e20d0 struct lapd_history contains 32 bytes in 1 blocks (ref 0) 0x55e5198e7bb0 struct signal_handler contains 40 bytes in 1 blocks (ref 0) 0x55e519872b80 struct pcu_sock_state contains 120 bytes in 1 blocks (ref 0) 0x55e5197df210 struct lookup_helper contains 24 bytes in 1 blocks (ref 0) 0x55e519871200 struct signal_handler contains 40 bytes in 1 blocks (ref 0) 0x55e519870d70 struct signal_handler contains 40 bytes in 1 blocks (ref 0) 0x55e519870c00 struct signal_handler contains 40 bytes in 1 blocks (ref 0) 0x55e5198705e0 struct signal_handler contains 40 bytes in 1 blocks (ref 0) 0x55e519860320 vty contains 138180 bytes in 7114 blocks (ref 0) 0x55e519785d40 struct vty contains 9104 bytes in 6 blocks (ref 0) 0x55e519870e00 save_cwd contains 2 bytes in 1 blocks (ref 0) 0x55e5197992f0 vty_command contains 79538 bytes in 4463 blocks (ref 0) 0x55e519785e20 vty_vector contains 49536 bytes in 2643 blocks (ref 0) 0x55e519785db0 OsmoBTS context contains 2505505 bytes in 565 blocks (ref 0) 0x55e51976c4b0 struct trx_ctrl_msg contains 168 bytes in 1 blocks (ref 0) 0x55e5198bb630 scheduler_trx.c:208 contains 464 bytes in 1 blocks (ref 0) 0x55e5198dda00 scheduler_trx.c:929 contains 464 bytes in 1 blocks (ref 0) 0x55e5198e0010 scheduler_trx.c:570 contains 928 bytes in 1 blocks (ref 0) 0x55e51991d1f0 scheduler_trx.c:1155 contains 928 bytes in 1 blocks (ref 0) 0x55e519918d40 scheduler_trx.c:208 contains 464 bytes in 1 blocks (ref 0) 0x55e5198f1160 scheduler_trx.c:208 contains 464 bytes in 1 blocks (ref 0) 0x55e5199196f0 scheduler_trx.c:929 contains 464 bytes in 1 blocks (ref 0) 0x55e51990ad70 scheduler_trx.c:929 contains 464 bytes in 1 blocks (ref 0) 0x55e5198e9c50 scheduler_trx.c:208 contains 464 bytes in 1 blocks (ref 0) 0x55e5198d63c0 scheduler_trx.c:929 contains 464 bytes in 1 blocks (ref 0) 0x55e519915500 scheduler_trx.c:276 contains 1392 bytes in 1 blocks (ref 0) 0x55e5198d0500 struct trx_ctrl_msg contains 168 bytes in 1 blocks (ref 0) 0x55e5198bb740 scheduler_trx.c:208 contains 464 bytes in 1 blocks (ref 0) 0x55e5198ccfc0 scheduler_trx.c:929 contains 464 bytes in 1 blocks (ref 0) 0x55e5198db270 scheduler_trx.c:208 contains 464 bytes in 1 blocks (ref 0) 0x55e5198e0390 scheduler_trx.c:929 contains 464 bytes in 1 blocks (ref 0) 0x55e5198e8440 scheduler_trx.c:276 contains 1392 bytes in 1 blocks (ref 0) 0x55e5198cc9e0 scheduler_trx.c:276 contains 1392 bytes in 1 blocks (ref 0) 0x55e5198c9bc0 scheduler_trx.c:208 contains 464 bytes in 1 blocks (ref 0) 0x55e519924d60 scheduler_trx.c:208 contains 464 bytes in 1 blocks (ref 0) 0x55e5198e7820 scheduler_trx.c:570 contains 928 bytes in 1 blocks (ref 0) 0x55e519932300 scheduler_trx.c:929 contains 464 bytes in 1 blocks (ref 0) 0x55e51992ba50 scheduler_trx.c:929 contains 464 bytes in 1 blocks (ref 0) 0x55e5199194b0 scheduler_trx.c:1155 contains 928 bytes in 1 blocks (ref 0) 0x55e519931ef0 scheduler_trx.c:570 contains 928 bytes in 1 blocks (ref 0) 0x55e5198dc720 scheduler_trx.c:1155 contains 928 bytes in 1 blocks (ref 0) 0x55e5198d6660 scheduler_trx.c:276 contains 1392 bytes in 1 blocks (ref 0) 0x55e5198ead70 scheduler_trx.c:208 contains 464 bytes in 1 blocks (ref 0) 0x55e5198ca1a0 scheduler_trx.c:929 contains 464 bytes in 1 blocks (ref 0) 0x55e519922370 scheduler_trx.c:570 contains 928 bytes in 1 blocks (ref 0) 0x55e519919950 scheduler_trx.c:1155 contains 928 bytes in 1 blocks (ref 0) 0x55e519916ed0 scheduler_trx.c:208 contains 464 bytes in 1 blocks (ref 0) 0x55e5198cc7a0 scheduler_trx.c:929 contains 464 bytes in 1 blocks (ref 0) 0x55e5198db4b0 scheduler_trx.c:570 contains 928 bytes in 1 blocks (ref 0) 0x55e51992d6f0 scheduler_trx.c:1155 contains 928 bytes in 1 blocks (ref 0) 0x55e5198e0910 scheduler_trx.c:276 contains 1392 bytes in 1 blocks (ref 0) 0x55e519917830 scheduler_trx.c:276 contains 1392 bytes in 1 blocks (ref 0) 0x55e5198ce900 scheduler_trx.c:276 contains 1392 bytes in 1 blocks (ref 0) 0x55e51992e360 scheduler_trx.c:276 contains 1392 bytes in 1 blocks (ref 0) 0x55e51992eae0 scheduler_trx.c:208 contains 464 bytes in 1 blocks (ref 0) 0x55e5198df8c0 scheduler_trx.c:570 contains 928 bytes in 1 blocks (ref 0) 0x55e519930460 scheduler_trx.c:929 contains 464 bytes in 1 blocks (ref 0) 0x55e51992c510 scheduler_trx.c:1155 contains 928 bytes in 1 blocks (ref 0) 0x55e5198ecec0 scheduler_trx.c:276 contains 1392 bytes in 1 blocks (ref 0) 0x55e5198e5350 scheduler_trx.c:276 contains 1392 bytes in 1 blocks (ref 0) 0x55e5198efba0 scheduler_trx.c:276 contains 1392 bytes in 1 blocks (ref 0) 0x55e5198dbc40 scheduler_trx.c:276 contains 1392 bytes in 1 blocks (ref 0) 0x55e5198c79e0 scheduler_trx.c:276 contains 1392 bytes in 1 blocks (ref 0) 0x55e5198edb50 scheduler_trx.c:1036 contains 1392 bytes in 1 blocks (ref 0) 0x55e5198da8f0 scheduler_trx.c:1036 contains 1392 bytes in 1 blocks (ref 0) 0x55e5198d89f0 scheduler_trx.c:1036 contains 1392 bytes in 1 blocks (ref 0) 0x55e5198d0ae0 scheduler_trx.c:1036 contains 1392 bytes in 1 blocks (ref 0) 0x55e5198d8090 scheduler_trx.c:1036 contains 1392 bytes in 1 blocks (ref 0) 0x55e5198d7ab0 scheduler_trx.c:1036 contains 1392 bytes in 1 blocks (ref 0) 0x55e5198d74d0 scheduler_trx.c:276 contains 1392 bytes in 1 blocks (ref 0) 0x55e5198d6ef0 scheduler_trx.c:276 contains 1392 bytes in 1 blocks (ref 0) 0x55e5198d5de0 scheduler_trx.c:276 contains 1392 bytes in 1 blocks (ref 0) 0x55e5198d5800 scheduler_trx.c:1036 contains 1392 bytes in 1 blocks (ref 0) 0x55e5198d2a60 scheduler_trx.c:1036 contains 1392 bytes in 1 blocks (ref 0) 0x55e5198c7fc0 scheduler_trx.c:208 contains 464 bytes in 1 blocks (ref 0) 0x55e5198bbf80 scheduler_trx.c:208 contains 464 bytes in 1 blocks (ref 0) 0x55e5198bbd40 telnet_connection contains 89 bytes in 2 blocks (ref 0) 0x55e5198715a0 struct trx_l1h contains 53064 bytes in 1 blocks (ref 0) 0x55e519899d40 struct trx_l1h contains 53064 bytes in 1 blocks (ref 0) 0x55e51987fe00 struct phy_link contains 105748 bytes in 5 blocks (ref 0) 0x55e5197dfa00 struct osmo_stats_reporter contains 207 bytes in 3 blocks (ref 0) 0x55e5197fcc70 abis contains 49481 bytes in 17 blocks (ref 0) 0x55e519870ab0 UNNAMED contains 102499 bytes in 196 blocks (ref 0) 0x7fe9ace7a090 struct gsm_bts contains 2068795 bytes in 238 blocks (ref 0) 0x55e5197a4490 logging contains 5566 bytes in 11 blocks (ref 0) 0x55e5197855b0 msgb contains 10975 bytes in 28 blocks (ref 0) 0x55e51976c540
Btw, notice that we don't have handover enabled which might be the reason for not sending HANDOVER
command.
Here is our BTS config:
phy 0 instance 0 osmotrx rx-gain 8 osmotrx tx-attenuation oml osmotrx maxdly 20 instance 1 osmotrx rx-gain 8 osmotrx tx-attenuation oml osmotrx maxdly 20 osmotrx ip local 127.0.0.1 osmotrx ip remote 127.0.0.1 bts 0 band ${BAND} ipa unit-id 10453 0 oml remote-ip 10.7.12.108 gsmtap-sapi ccch gsmtap-sapi pdtch trx 0 phy 0 instance 0 trx 1 phy 0 instance 1
and the relevant BSC config:
network ... neci 1 paging any use tch 0 handover 0 handover algorithm 1 handover1 window rxlev averaging 10 handover1 window rxqual averaging 1 handover1 window rxlev neighbor averaging 10 handover1 power budget interval 6 handover1 power budget hysteresis 3 handover1 maximum distance 9999 periodic location update 30 ... bts 6 type sysmobts band DCS1800 cell_identity 10453 location_area_code 39 base_station_id_code 63 ms max power 33 cell reselection hysteresis 4 neighbor-list mode manual rxlev access min 0 radio-link-timeout 32 channel allocator ascending rach tx integer 9 rach max transmission 7 channel-description attach 1 channel-description bs-pa-mfrms 5 channel-description bs-ag-blks-res 1 early-classmark-sending forbidden ipa unit-id 10453 0 oml ipa stream-id 255 line 0 codec-support fr efr gprs mode egprs gprs routing area 39 gprs cell bvci 10453 gprs nsei 10453 gprs nsvc 0 nsvci 10453 gprs nsvc 0 local udp port 23000 gprs nsvc 0 remote udp port 24000 gprs nsvc 0 remote ip 10.7.12.108 trx 0 rf_locked 0 arfcn 773 nominal power 37 max_power_red 0 rsl e1 tei 0 timeslot 0 phys_chan_config CCCH+SDCCH4 hopping enabled 0 timeslot 1 phys_chan_config SDCCH8 hopping enabled 0 timeslot 2 phys_chan_config TCH/F hopping enabled 0 timeslot 3 phys_chan_config TCH/F hopping enabled 0 timeslot 4 phys_chan_config TCH/F hopping enabled 0 timeslot 5 phys_chan_config TCH/F hopping enabled 0 timeslot 6 phys_chan_config TCH/F hopping enabled 0 timeslot 7 phys_chan_config TCH/F hopping enabled 0 trx 1 rf_locked 0 arfcn 779 nominal power 37 max_power_red 0 rsl e1 tei 0 timeslot 0 phys_chan_config TCH/F_PDCH hopping enabled 0 timeslot 1 phys_chan_config TCH/F_PDCH hopping enabled 0 timeslot 2 phys_chan_config TCH/F_PDCH hopping enabled 0 timeslot 3 phys_chan_config TCH/F_PDCH hopping enabled 0 timeslot 4 phys_chan_config TCH/F_PDCH hopping enabled 0 timeslot 5 phys_chan_config TCH/F_PDCH hopping enabled 0 timeslot 6 phys_chan_config TCH/F_PDCH hopping enabled 0 timeslot 7 phys_chan_config TCH/F_PDCH hopping enabled 0
Updated by fixeria almost 4 years ago
But it looks like there is another memory leak since it's still growing even if slower.
Here is the current talloc-context output.
It would be much more informative if you attached two full reports (better as files):
- one created just after the process is started,
- another created a few hours/days after,
so we could do differential analysis.
scheduler_trx.c:208 contains 464 bytes in 1 blocks (ref 0) 0x55e5198dda00
scheduler_trx.c:929 contains 464 bytes in 1 blocks (ref 0) 0x55e5198db4b0
Those are buffers for 4 GMSK bursts, allocated by rx_data_fn(), tx_data_fn() for each logical channel (including BCCH).
scheduler_trx.c:570 contains 928 bytes in 1 blocks (ref 0) 0x55e51991d1f0
scheduler_trx.c:1155 contains 928 bytes in 1 blocks (ref 0) 0x55e519918d40
Those are buffers for 8 GMSK bursts, allocated for TCH/F logical channels.
scheduler_trx.c:276 contains 1392 bytes in 1 blocks (ref 0) 0x55e5198d0500
scheduler_trx.c:1036 contains 1392 bytes in 1 blocks (ref 0) 0x55e5198da8f0
Those are buffers for 4 8-PSK bursts, allocated for PDTCH logical channels.
The scheduler_trx.c items look suspicious:
... so I don't see anything suspicious.
Btw, notice that we don't have handover enabled which might be the reason for not sending HANDOVER command.
As I figured out in #4592, osmo-bts-trx sends NOHANDOVER regardless of the previous state, i.e. no matter was it enabled or not.
Updated by ipse almost 4 years ago
Actually, observing it for a couple of days, it looks like there is no memory leak. The memory grows slowly for the first few hours but then stabilizes at ~7.0MiB RSS.
Sorry for the false alarm. I believe we can close this ticket. Thank you for the quick fix!
Updated by ipse almost 4 years ago
Updated by fixeria almost 4 years ago
- Status changed from New to Resolved
- Assignee set to fixeria
- % Done changed from 0 to 100
Actually, observing it for a couple of days, it looks like there is no memory leak.
I think I fixed another memory leak even before this one was reported ;)
https://git.osmocom.org/osmo-bts/commit/?id=b8200203d8c650123c7e19cef195ad287f53812f
Please reopen if you see anything suspicious again.