Bug #5564
open
blocking database I/O by SMS database
Start date:
05/15/2022
Due date:
% Done:
20%
Resolution:
Spec Reference:
Description
When OsmoMSC was split from OsmoNITB, we externalized the HLR database and removed the database-stored counters. This leaves the internal SMS queue / database code as the only remaining part of the code which performs potentially blocking disk I/O.
As seen in #5563 this is a real issue.
I spent half a day reviewing the code in detail and playing with different ideas, including
- ripping out the sms_queue.c / db.c code completely into an external osmo-smsc which then uses GSUP
- just moving db.c into a separate thread; make DB operations asynchronous
- move sms_queue + db.c into a separate thread
moving sms_queue + DB code to new osmo-smsc, interfaced via GSUP
osmo-msc already contains code to do SMS via GSUP, so there's no mandatory modification to osmo-msc expected in this approach.
The major disadvantages of this approach are:
- SMPP code would have to move to SMSC, and it is more tied into the MSC/VLR codebase -> extra effort
- GSUP SMS interface is at a lower level than current sms_queue interface -> extra effort of migrating/reimplementing that stuff in SMSC
SMS related VTY commands (not an issue, SMSC would have its own VTY)
this would cover the following API parts
- sms_queue_clear
- sms_queue_set_max_failure
- sms_queue_set_max_pending
- sms_queue_stats
- sms_queue_sms_is_pending
- sms_queue_trigger
- vty_out
incoming signals into sms_queue
- SS_SUBSCR / S_SUBSCR_ATTACHED
- FIXME: unclear how this is handled in the GSUP case?
- SS_SMS / S_SMS_DELIVERED
- -> gsm411_gsup_mt_fwd_sm_res()
- SS_SMS / S_SMS_MEM_EXCEEDED
- -> gsm411_gsup_mt_fwd_sm_err()
- SS_SMS / S_SMS_UNKNOWN_ERROR
- -> gsm411_gsup_mt_fwd_sm_err()
- SS_SMS / S_SMS_SUBMITTED
- -> gsm411_gsup_mo_fwd_sm_req()
- SS_SMS / S_SMS_SMMA
- -> gsm411_gsup_mo_ready_for_sm_req()
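The signal-to-GSUP mapping listed above can be written down as a small lookup table. This is only an illustrative sketch: the enum is a local, hypothetical copy of the signal names (the real definitions live in the osmo-msc headers), `S_SMS_COUNT` and `sms_signal_gsup_equivalent()` are made-up names, and the table stores function names as strings rather than calling anything. S_SUBSCR_ATTACHED is left out because its GSUP handling is still marked FIXME above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical local copy of the SS_SMS signal names from the list above. */
enum sms_signal {
	S_SMS_DELIVERED,
	S_SMS_MEM_EXCEEDED,
	S_SMS_UNKNOWN_ERROR,
	S_SMS_SUBMITTED,
	S_SMS_SMMA,
	S_SMS_COUNT /* hypothetical sentinel, not a real signal */
};

/* Name of the GSUP-side function that would take over for each signal. */
static const char *const gsup_equivalent[] = {
	[S_SMS_DELIVERED]     = "gsm411_gsup_mt_fwd_sm_res",
	[S_SMS_MEM_EXCEEDED]  = "gsm411_gsup_mt_fwd_sm_err",
	[S_SMS_UNKNOWN_ERROR] = "gsm411_gsup_mt_fwd_sm_err",
	[S_SMS_SUBMITTED]     = "gsm411_gsup_mo_fwd_sm_req",
	[S_SMS_SMMA]          = "gsm411_gsup_mo_ready_for_sm_req",
};

/* Fails to compile if a signal is appended without a table entry. */
static_assert(sizeof(gsup_equivalent) / sizeof(gsup_equivalent[0]) == S_SMS_COUNT,
	      "every signal needs a GSUP equivalent");

/* Returns the GSUP function name for a signal, or NULL if out of range. */
const char *sms_signal_gsup_equivalent(enum sms_signal sig)
{
	if ((unsigned int)sig >= S_SMS_COUNT)
		return NULL;
	return gsup_equivalent[sig];
}
```

Both MT error signals collapse onto the same GSUP error function, so the distinction between "memory exceeded" and "unknown error" would have to travel in the message contents instead.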
DB (not an issue, DB code would then run in SMSC)
- db_sms_delete_oldest_expired_message
- db_sms_delete_sent_message_by_id
- db_sms_get
- db_sms_get_next_unsent_rr_msisdn
- db_sms_get_unsent_for_subscr
- db_sms_inc_deliver_attempts
SMS transmission
- gsm411_send_sms calls by sms_queue
- would have to be mapped to OSMO_GSUP_MSGT_MT_FORWARD_SM_REQUEST
- sms_free
- FIXME: what about vsub pointer/references?
- vlr_subscr_msisdn_or_name
- just for logging, can be avoided
making just the DB code async / run in separate thread
This is not easy, as all of the call sites assume synchronous return values/results.
- db_sms_get
- sms_resend_pending
- resend_pending timer
- sms_queue_start
- => can be executed from separate thread
- sms_queue_start
- resend_pending timer
- smsq_take_next_sms
- sms_submit_pending
- sms_send_next
- sms_sms_cb / S_SMS_DELIVERED
- => happens from the send_next it_Q completion handler
- sms_sms_cb / S_SMS_DELIVERED
- push_queue_timer
- sms_queue_start
- => can be executed from separate thread
- sms_queue_start
- sms_send_next
- sms_submit_pending
- sms_send_next
- sms_sms_cb / S_SMS_DELIVERED
- => request to it_Q; completion then might add SMS to pending + gsm411_send_sms
- sms_sms_cb / S_SMS_DELIVERED
- sub_ready_for_sm
- sms_subscr_cb / S_SUBSCR_ATTACHED
- => request to it_Q; completion then might add SMS to pending + gsm411_send_sms
- sms_subscr_cb / S_SUBSCR_ATTACHED
- sms_sms_cb / S_SMS_DELIVERED
- => no return value, no success check: async it_Q
- sms_sms_cb / S_SMS_UNKNOWN_ERROR
- => no return value, no success check: async it_Q
- sms_sms_cb / any signal
- => no return value, no success check: async it_Q
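To illustrate why the synchronous call sites are the hard part: each caller that today uses a DB result directly has to be split into a request and a completion handler that runs later. The sketch below shows that split in a single thread, with a plain FIFO standing in for the it_q. All names (`db_req`, `db_req_submit`, `db_req_process_one`, `fake_db_sms_get`, `send_if_found`) are hypothetical and not part of osmo-msc.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request record: what to look up and where to continue. */
struct db_req {
	unsigned long long sms_id;
	/* completion handler; in the real design it would run on the main thread */
	void (*done)(struct db_req *req, int found);
	void *ctx;
};

#define REQ_MAX 16 /* power of two, so the free-running indices wrap cleanly */
static struct db_req *req_fifo[REQ_MAX];
static size_t req_head, req_tail;

/* Queue a request instead of calling the DB directly. Returns -1 if full. */
int db_req_submit(struct db_req *req)
{
	assert(req && req->done);
	if (req_tail - req_head == REQ_MAX)
		return -1;
	req_fifo[req_tail++ % REQ_MAX] = req;
	return 0;
}

/* Stand-in for the blocking lookup: pretend even IDs exist in the DB. */
static int fake_db_sms_get(unsigned long long sms_id)
{
	return sms_id % 2 == 0;
}

/* Handle one queued request and run its completion.
 * Returns 1 if a request was processed, 0 if the queue was empty. */
int db_req_process_one(void)
{
	struct db_req *req;

	if (req_head == req_tail)
		return 0;
	req = req_fifo[req_head++ % REQ_MAX];
	req->done(req, fake_db_sms_get(req->sms_id));
	return 1;
}

/* Example completion: stand-in for "add SMS to pending + gsm411_send_sms". */
void send_if_found(struct db_req *req, int found)
{
	int *sent = req->ctx;

	if (found)
		(*sent)++;
}
```

The caller gets no result at submit time; anything that used to follow the DB call has to move into the completion handler, which is the per-call-site rework listed above.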
moving sms_queue + DB code to separate thread
access to pending_sms linked list
There are quite a number of accesses to the pending_sms linked list. Given that most are reads and only some are writes, we might use a rwlock?
- sms_find_pending [R]
- sms_sms_cb
- sms_queue_sms_is_pending
- sms_queue_sms_is_pending [R]
- sms_submit_pending
- timer
- vty
- sms_submit_pending
- sms_subscriber_find_pending [R]
- sub_ready_for_sm
- SS_SUBSCR / S_SUBSCR_ATTACHED
- sms_subscriber_is_pending
- sms_submit_pending
- timer
- sms_send_next
- sms_sms_cb / S_SMS_DELIVERED
- sms_submit_pending
- sub_ready_for_sm
- sms_pending_from [R]
- sms_submit_pending
- timer
- sms_send_next
- sms_sms_cb / S_SMS_DELIVERED
- sms_submit_pending
- sms_pending_free [W]
- sms_pending_failed
- sms_sms_cb / S_SMS_UNKNOWN_ERROR
- sms_resend_pending
- sms_sms_cb / S_SMS_DELIVERED
- sms_sms_cb / S_SMS_MEM_EXCEEDED
- sms_queue_clear
- vty
- sms_pending_failed
- sms_resend_pending [R]
- timer
- sms_queue_stats [R]
- vty
- sms_queue_clear [W]
- vty
Conclusion
I think the following approach is best:
- have a separate "SMS" thread
- all database access happens from that thread only
- inter-thread message queues (libosmocore it_q) between main thread and SMS thread
- sms_queue timers (push_queue_timer, resend_pending_timer) run in that thread
- other input (mainly signals today) is serialized via it_q in main -> SMS direction
- other output (mainly gsm411_send_sms) is serialized via it_q in SMS -> main direction
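The overall shape of this design (one SMS thread, one queue per direction) can be sketched with plain pthreads. This is not libosmocore it_q: the `msg_q` type, the message types and `sms_thread_main()` are hypothetical stand-ins, and the queue uses a condition variable where a real implementation would rather wake the main thread's event loop than block it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical message types for the two directions. */
enum msg_type { MSG_SMS_SIGNAL, MSG_SEND_SMS, MSG_QUIT };

struct msg {
	enum msg_type type;
	unsigned long long sms_id;
};

#define Q_CAP 32
struct msg_q {
	pthread_mutex_t lock;
	pthread_cond_t nonempty;
	struct msg slots[Q_CAP];
	unsigned int head, count;
};

void msg_q_init(struct msg_q *q)
{
	pthread_mutex_init(&q->lock, NULL);
	pthread_cond_init(&q->nonempty, NULL);
	q->head = q->count = 0;
}

/* Returns false if the queue is full; the caller decides what to do then. */
bool msg_q_put(struct msg_q *q, struct msg m)
{
	bool ok = false;

	pthread_mutex_lock(&q->lock);
	if (q->count < Q_CAP) {
		q->slots[(q->head + q->count) % Q_CAP] = m;
		q->count++;
		ok = true;
		pthread_cond_signal(&q->nonempty);
	}
	pthread_mutex_unlock(&q->lock);
	return ok;
}

/* Blocks until a message is available. */
struct msg msg_q_get(struct msg_q *q)
{
	struct msg m;

	pthread_mutex_lock(&q->lock);
	while (q->count == 0)
		pthread_cond_wait(&q->nonempty, &q->lock);
	m = q->slots[q->head];
	q->head = (q->head + 1) % Q_CAP;
	q->count--;
	pthread_mutex_unlock(&q->lock);
	return m;
}

struct sms_thread_ctx {
	struct msg_q to_sms;  /* main -> SMS: serialized signals */
	struct msg_q to_main; /* SMS -> main: e.g. "please gsm411_send_sms" */
};

/* SMS thread body: the only place that would touch the database. */
void *sms_thread_main(void *arg)
{
	struct sms_thread_ctx *ctx = arg;

	for (;;) {
		struct msg in = msg_q_get(&ctx->to_sms);
		struct msg out = { .type = MSG_SEND_SMS, .sms_id = in.sms_id };

		if (in.type == MSG_QUIT)
			break;
		/* blocking DB work for in.sms_id would happen here */
		msg_q_put(&ctx->to_main, out);
	}
	return NULL;
}
```

Because only the SMS thread touches the pending list and the database, neither needs a lock in this variant; the two queues are the only shared state.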
Serialize SS_SMS signals
- we really only need to serialize paging_result and sms->id
- submit them into it_q to SMS thread
serialize SS_SUBSCR signal
- sms_subscriber_find_pending() can be done in main thread before serialization
- check for vsub->lu_complete and zero MSISDN before serialization
- we really only need to serialize the MSISDN
- db_sms_get_unsent_for_subscr() then happens from SMS thread
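As a sketch of the SS_SUBSCR case: the main thread does the cheap checks and copies only the MSISDN into a message, so the SMS thread never dereferences the vsub. The struct and function names below are hypothetical, `vsub_view` stands in for the two vlr_subscr fields mentioned above, and "zero MSISDN" is assumed to mean an empty MSISDN string. The sms_subscriber_find_pending() check is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MSISDN_MAXLEN 15 /* hypothetical buffer size for this sketch */

/* Hypothetical stand-in for the two vlr_subscr fields the checks need. */
struct vsub_view {
	bool lu_complete;
	char msisdn[MSISDN_MAXLEN + 1];
};

/* Hypothetical main -> SMS thread message: just the MSISDN, by value. */
struct subscr_attached_msg {
	char msisdn[MSISDN_MAXLEN + 1];
};

/* Runs on the main thread. Fills *out and returns true if a message should
 * be sent to the SMS thread, returns false if the signal can be dropped. */
bool subscr_attached_serialize(const struct vsub_view *vsub,
			       struct subscr_attached_msg *out)
{
	assert(vsub && out);

	if (!vsub->lu_complete)
		return false; /* location update not finished yet */
	if (vsub->msisdn[0] == '\0')
		return false; /* assumed meaning of "zero MSISDN" */

	/* Copy by value so no vsub pointer crosses the thread boundary. */
	memcpy(out->msisdn, vsub->msisdn, sizeof(out->msisdn));
	return true;
}
```

Passing the MSISDN by value sidesteps the vsub reference-counting question, at the cost of a second subscriber lookup if the SMS thread's answer later needs the vsub again on the main thread.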
move push_queue_timer + resend_pending_timer to SMS thread
serialize db_sms_store() (MO-SMS, SMPP)
- failure to store in database would only be known asynchronously!
- we can probably just ignore that.
serialize db_sms_mark_delivered()
- we don't care about success right now anyway, so async is no problem
VTY
- remove 'sms send pending' or implement different command via it_Q
- remove 'sms delete expired' or implement different command via it_Q
- serialize 'subscriber ... sms ...' via it_Q