Feature #6394

Basic URR (Usage Reporting Rule) support for tunnel mapping

Added by laforge about 2 months ago. Updated about 1 month ago.

Status: Stalled
Priority: Low
Assignee: -
Target version: -
Start date: 03/08/2024
Due date: -
% Done: 0%
Spec Reference: -

Description

Implement basic support for URR (Usage Reporting Rule).

The goal here is to
  • count the number of packets and bytes within each TEID (ul/dl separately)
  • periodically report those counters via PFCP to the control plane
  • report the final counters when the tunnel is closed

pespin had implemented something like this for open5gs-upf in

commit fb8ebcdbeae0648e30d04fd016a956642131dddd
Author: Pau Espin Pedrol <pespin@sysmocom.de>
Date:   Fri Apr 8 16:10:42 2022 +0200

    [UPF] Add initial support for URR Usage Report (#1476)

and following commits.

Sadly, open5gs-upf is more of a lab-grade UPF that does not scale at all, which is why we have users of osmo-upf relying on its fast kernel path.

The primary goal for this feature is the tunnel mapping case in an osmo-upf co-located with osmo-hnbgw. The packet and byte counters will hence have to be added to the nftables rules.


Files

upf-add-tunnels.sh 549 Bytes pablo, 03/19/2024 10:54 AM
upf-initial-ruleset.nft 483 Bytes pablo, 03/19/2024 10:54 AM
upf-add-tunnels2.sh 523 Bytes neels, 03/21/2024 01:07 AM

Related issues

Related to OsmoHNBGW - Feature #6395: PFCP URR support in osmo-hnbgw (New, 03/08/2024)

Actions #1

Updated by laforge about 2 months ago

Actions #2

Updated by pespin about 2 months ago

neels you can see this in action in PGW_Tests.TC_gy_charging_cc_time.

Basically, open5gs-smfd connects to our emulated OCS (TTCN-3) using the Gy interface (Diameter based), which then grants temporary permissions based on time/data buckets.

Related tickets at open5gs:
https://github.com/open5gs/open5gs/issues/1492
https://github.com/open5gs/open5gs/pull/1479

Actions #3

Updated by laforge about 2 months ago

pespin wrote in #note-2:

neels you can see this in action in PGW_Tests.TC_gy_charging_cc_time.

Basically, open5gs-smfd connects to our emulated OCS (TTCN-3) using the Gy interface (Diameter based), which then grants temporary permissions based on time/data buckets.

just to be clear: The Gy interface is out of scope here. What's relevant is just the PFCP with URR that the SMF communicates to the UPF, and the related responses.

Actions #4

Updated by neels about 2 months ago

The first step is to clarify where to get the metrics:
Can we get packet counts from nftables efficiently?

(Does the kernel GTP module report the number of packets? Though the focus is on
tunmap.)

Actions #5

Updated by laforge about 2 months ago

On Mon, Mar 11, 2024 at 04:54:39AM +0000, neels wrote:

The first step is to clarify where to get the metrics:
Can we get packet counts from nftables efficiently?

Well, you can have rather efficient in-kernel counters; you have to add counters to each of the relevant
rules, and then dump those counters whenever you need. See
https://wiki.nftables.org/wiki-nftables/index.php/Counters

How efficient the reading of those counters is, is likely something that you'll have to find out.

I couldn't immediately find a counter-specific API in libnftnl; maybe pablo can help advise
on the best strategy. With many tunnels, it's also a question of whether it's wise to read
all of them at the same time or to submit small batches of counter reads - as well as the
question whether named counters are faster or slower than anonymous counters, etc.

(Does the kernel GTP module report the number of packets? Though the focus is on
tunmap.)

No; drivers/net/gtp.c:struct pdp_ctx does not contain any counters. It should be relatively trivial to add some (__percpu) counters to it, though. However, that is out of scope for the current use case.

Actions #6

Updated by neels about 2 months ago

Yes, I enquired with Pablo, and he confirms the way to get the counters is to
run a text command to nftables, the same way we set up tunnels. The result is
JSON.

I was hoping for something more bare metal. It seems we'll have to do a large
volume of text parsing with lots of atoi().


We have two options to get the counters from nft:

(1) add named counters, then list only the counters.

We can retrieve either one counter at a time: 'list counter tunmap-pre-123'
Or all at once: 'list counters'

Named counters are separate entities -- all chains could feed to a single
counter -- I think that is the intention behind the named counter feature, to
aggregate various counter sources.

But we need separate counters per subscriber and direction: for 50k tunmaps, we
add 100k separate counter entities to the 100k chain entities we already have.

Example of adding a named counter for a tunmap:

add counter inet osmo-upf n-tunmap-pre-123;
add rule inet osmo-upf tunmap-pre-123 ip saddr ... counter name "n-tunmap-pre-123";
add counter inet osmo-upf n-tunmap-post-123;
add rule inet osmo-upf tunmap-post-123 ip saddr ... counter name "n-tunmap-post-123";

How to query such counter:

▶ sudo nft list counter inet osmo-upf n-tunmap-pre-20
table inet osmo-upf {
        counter n-tunmap-pre-20 {
                packets 0 bytes 0
        }
}

(This 'sudo nft' prompt is just me invoking the rules manually. In osmo-upf, these would be char buffers fed to nft_run_cmd_from_buffer().)

▶ sudo nft list counters table inet osmo-upf
table inet osmo-upf {
        counter n-tunmap-pre-1 {
                packets 0 bytes 0
        }
        counter n-tunmap-post-1 {
                packets 0 bytes 0
        }
        counter n-tunmap-pre-2 {
                packets 5 bytes 5440
                ^^^^^^^^^^^^^^^^^^^^
                PARSE THIS
        }
        counter n-tunmap-post-2 {
                packets 0 bytes 0
        }
        counter n-tunmap-pre-3 {
                packets 0 bytes 0
        }
        ...
}

(2) The other option is to not add named counters, and just list the chain rules themselves, because they also barf out the counter values:

We can query them one by one:

▶ sudo nft list chain inet osmo-upf tunmap-pre-7
table inet osmo-upf {
        chain tunmap-pre-7 {
                ip daddr set 10.99.0.1 meta mark set 0x00000007 counter packets 5 bytes 5440 accept
                                                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                                                                PARSE THIS
        }
}

or all at once:

▶ sudo nft list table inet osmo-upf
table inet osmo-upf { # progname osmo-upf
        flags owner

        map tunmap-pre {
                typeof ip daddr . @ih,32,32 : verdict
        }

        map tunmap-post {
                typeof meta mark : verdict
        }

        chain pre {
                type filter hook prerouting priority raw; policy accept;
                udp dport 2152 ip daddr . @ih,32,32 vmap @tunmap-pre
        }

        chain post {
                type filter hook postrouting priority 400; policy accept;
                meta mark vmap @tunmap-post
        }

        chain tunmap-pre-1 {
                ip daddr set 10.99.0.1 meta mark set 0x00000001 counter packets 0 bytes 0 accept
        }

        chain tunmap-post-1 {
                ip saddr set 127.0.0.1 @ih,32,32 set 0x1 counter packets 0 bytes 0 accept
        }

        chain tunmap-pre-2 {
                ip daddr set 10.99.0.2 meta mark set 0x00000002 counter packets 5 bytes 5440 accept
                                                                        ^^^^^^^^^^^^^^^^^^^^
                                                                        PARSE THIS 
        }

        chain tunmap-post-2 {
                ip saddr set 127.0.0.1 @ih,32,32 set 0x0 counter packets 0 bytes 0 accept
        }

        chain tunmap-pre-3 {
                ip daddr set 10.99.0.1 meta mark set 0x00000003 counter packets 0 bytes 0 accept
        }
        [... 999997 more entries...]
}

Apart from the above choice, we have two potential complications:

(A) 'list' commands are not allowed to be run in a batch.

For the nft 'list' commands, batching is not an option:
we need to dispatch a single 'list' command and then retrieve the result.

For example, to create a tunmap, we can push a bunch of lines en bloc, like:

add chain inet osmo-upf tunmap-pre-123;
add rule inet osmo-upf tunmap-pre-123 ip daddr set 3.3.3.3 meta mark set 123 counter accept;
add chain inet osmo-upf tunmap-post-123;
add rule inet osmo-upf tunmap-post-123 ip saddr set 2.2.2.3 @ih,32,32 set 0x302 counter accept;
add element inet osmo-upf tunmap-pre { 2.2.2.1 . 0x201 : jump tunmap-pre-123 };
add element inet osmo-upf tunmap-post { 123 : jump tunmap-post-123 };

(We can pack up any number of these for N distinct tunnel setups; a rough C sketch of doing that follows.)
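
To illustrate (a rough sketch only, not the actual osmo-upf code: the add_tunmap_batch() helper, the buffer size and the example address are made up here, and I assume the current two-argument nft_run_cmd_from_buffer() of libnftables):

#include <stdio.h>
#include <nftables/libnftables.h>

/* Sketch: concatenate the per-tunmap setup lines for several tunnels
 * into one buffer and submit them with a single nft_run_cmd_from_buffer()
 * call, i.e. one libnftables invocation for the whole batch. */
static int add_tunmap_batch(struct nft_ctx *nft, int first_id, int count)
{
        char buf[65536];
        size_t off = 0;
        int i;

        for (i = first_id; i < first_id + count; i++) {
                int n = snprintf(buf + off, sizeof(buf) - off,
                                 "add chain inet osmo-upf tunmap-pre-%d;\n"
                                 "add rule inet osmo-upf tunmap-pre-%d"
                                 " ip daddr set 3.3.3.3 meta mark set %d counter accept;\n",
                                 i, i, i);
                if (n < 0 || off + (size_t)n >= sizeof(buf))
                        return -1; /* buffer full: submit smaller batches instead */
                off += n;
        }
        return nft_run_cmd_from_buffer(nft, buf); /* 0 on success */
}

Only the tunmap-pre chain/rule lines are shown to keep it short; the post chain and the vmap elements would be appended to the same buffer in the same way.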

But for 'list', we cannot get a single response with four counters from a batch like this:

list chain inet osmo-upf tunmap-pre-1;
list chain inet osmo-upf tunmap-post-1;
list chain inet osmo-upf tunmap-pre-2;
list chain inet osmo-upf tunmap-post-2;

We need to pass these to the nftables API one by one.

It is an open nft feature request to allow list batching, but it's not implemented yet.
(Also, I hear that people do ask for a bare metal API now and then.)

So...

On the one hand, I don't want to get all of the tunnels' state in one go. This
potentially does not scale well: it seems bad to stop and query+parse 200k lines
of text (megabytes of it) at regular intervals.

On the other hand, if I don't fetch all of the state at once, I have to fetch
them one by one, as single entries, with no batching allowed. That is the other
extreme, also a worst-case scaling option.

There seems to be no in-between option available ATM.

I almost thought of a separate process to collect all the counters and pass them
back to osmo-upf efficiently... then I realized how insane that would be: adding
tooling instead of removing artificial bottlenecks.

(B) The other thing: so far, osmo-upf doesn't parse responses from nft.

We only check for a successful rc, and otherwise we batch commands.
At first I thought we need an async correlation of a batched command's response
back to the caller. But since batching isn't an option anyway...

We need to implement code that parses responses and takes action based on
those. All doable, just a brand new corner in the code.


For both of the above, we need to bypass the nft cmd queue for retrieving the
counters (named or unnamed). So to fetch counters, we need to run single nft
commands immediately, alongside the queue of tunnel maintenance commands (see the
sketch below).

Remember that the slowest part of osmo-upf is passing nft commands to the
kernel. Now we further load this bottleneck. This is very significant: So far,
we were struggling to quickly get something like 10k tunmap creations through
to the kernel. It takes minutes. You could argue that this is exceptional load,
and normal scenarios have far less PFCP load. Now, for counters, because we
want to query all of the tunnels regularly, that means we have this
exceptional load all of the time, at regular intervals. When we have 10k
subscribers, it means that it may take MINUTES to get back all of the counters?
(I still need to test, maybe getting counters is much faster than tunmap
creation.)

This is what I'm worried about: that the text based interface with its
batching limitations just doesn't scale.

What could save our butts is if it actually turns out fast enough to get
all of the counter state in one megabytes-large JSON chunk.

This is what I'll test next.


Not there yet, but if it turns out to be just too slow, we can maybe explore these options:

  • how often do we need a tunnel's counters? Is it ok to get them only on tunnel destruction, and otherwise once every 10 minutes or so?
  • can we implement 'list' cmd batching in nftables? (and would it help at all?)
  • is there an alternative, easy-to-implement way to get the counter data streamed out of the kernel? ('/sys/kernel/nft/counters' ?????)
Actions #7

Updated by pespin about 2 months ago

The result is JSON.

I was hoping for something more bare metal. It seems we'll have to do a large
volume of text parsing with lots of atoi()

I'd go for cJSON (https://github.com/DaveGamble/cJSON) or similar, which is an embeddable JSON parser occupying a single file.
The main problem is that it loads and parses the whole thing into memory AFAIR, which for a 10k-entry JSON list can be challenging.
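
For illustration, a minimal cJSON sketch along those lines (the parse_counters() helper is made up, and the field names assume the libnftables JSON schema where each counter appears as an object under a "counter" key with "name", "packets" and "bytes" members):

#include <stdio.h>
#include <inttypes.h>
#include <cjson/cJSON.h>   /* or the single cJSON.h file dropped into the tree */

/* Sketch: walk the "nftables" array of a JSON 'list counters' result and
 * print name/packets/bytes of every counter object, skipping metainfo
 * and other entry types. Note that cJSON stores numbers as double, so
 * counters beyond 2^53 would lose precision. */
static void parse_counters(const char *json)
{
        cJSON *root = cJSON_Parse(json);
        cJSON *list, *elem;

        if (!root)
                return;
        list = cJSON_GetObjectItemCaseSensitive(root, "nftables");
        cJSON_ArrayForEach(elem, list) {
                cJSON *ctr = cJSON_GetObjectItemCaseSensitive(elem, "counter");
                cJSON *name, *pkts, *bytes;

                if (!ctr)
                        continue; /* metainfo, table, chain, ... entries */
                name = cJSON_GetObjectItemCaseSensitive(ctr, "name");
                pkts = cJSON_GetObjectItemCaseSensitive(ctr, "packets");
                bytes = cJSON_GetObjectItemCaseSensitive(ctr, "bytes");
                if (cJSON_IsString(name) && cJSON_IsNumber(pkts) && cJSON_IsNumber(bytes))
                        printf("%s: packets=%" PRIu64 " bytes=%" PRIu64 "\n",
                               name->valuestring,
                               (uint64_t)pkts->valuedouble,
                               (uint64_t)bytes->valuedouble);
        }
        cJSON_Delete(root);
}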

Regarding architecture, I'd definitely go for doing all of that in a separate thread with its own event loop, which runs the nft code, gets the answers back and then submits batches of updates to the main thread using osmo_itq. You can even break the data sent over osmo_itq into batches to avoid starving the main event loop thread (e.g. PFCP, VTY, stats, etc.).
You can get some ideas from osmo-uecups with its HTTP JSON server running on a separate thread.

(A) 'list' commands are not allowed to be run in a batch.

That's indeed bad. The ideal way to do this would be to process the counters in a number of batches of configurable size, as a tradeoff between spending too much overhead on each request+answer and starving kernel network operations for too long by having to process large amounts of data at the same time.

Maybe the easiest is to first give it a try by fetching all counters at once (and please log the time spent doing so every time; it will be tremendously useful when debugging).

how often do we need a tunnel's counters? Is it ok to get them only on tunnel destruction, and otherwise once every 10 minutes or so?

PFCP URRs can be configured both in volume-based and time-based basis.

I would say we make the poll-counters frequency a VTY configuration option, in seconds. If we notice that we are not able to process the load within those intervals, print big fat error messages.
For volume-based reporting, we do best effort, i.e. trigger once we reach or exceed the threshold and quota volumes.

For time-based reporting, if the requested time threshold/quota is lower than the poll-counter frequency, then reject the URR so that the user knows to modify the UPF config or the SMF/OCS config, or simply figures out that there is too much load and more UPFs need to be added.

PS1: For PFCP URR, see 3GPP TS 29.244 (5.2.2 Usage Reporting Rule Handling, C.2 Charging Support).
PS2: neels, also have a look at SYS#5276; it may provide further context regarding the usual architecture.

Actions #8

Updated by laforge about 1 month ago

On Wed, Mar 13, 2024 at 12:13:17AM +0000, neels wrote:

Yes, I enquired with Pablo, and he confirms the way to get the counters is to
run a text command to nftables, the same way we set up tunnels. The result is
JSON.

I think we should find a better way to do this. It's super inefficient if we dump
things that are binary numbers, convert them to strings and then parse them again.

I would expect the epdg is not the only software on the planet that would benefit
from programmatic binary access to the counters. pablo ?

Actions #9

Updated by pablo about 1 month ago

laforge wrote in #note-8:

On Wed, Mar 13, 2024 at 12:13:17AM +0000, neels wrote:

Yes, I enquired with Pablo, and he confirms the way to get the counters is to
run a text command to nftables, the same way we set up tunnels. The result is
JSON.

I think we should find a better way to do this. It's super inefficient if we dump
things that are binary numbers, convert them to strings and then parse them again.

I would expect the epdg is not the only software on the planet that would benefit
from programmatic binary access to the counters. pablo ?

There are more people asking for libnftables to provide such a binary interface, yes; it is increasingly becoming a requirement.

Actions #10

Updated by neels about 1 month ago

I've written an osmo-upf VTY command to dump the current ruleset containing the counters.
Getting the ruleset dump from nft is a lot slower than evaluating the returned data.

I ran a tunmap.vty script creating 10k tunmaps by PFCP.

(Just out of curiosity, I dumped a timestamp after each 1000 completed;
the results show a significant slowdown with the number of tunnels:
the first 1k takes 2 seconds, around the middle 10 seconds, at the end 28 seconds.

1:46.347
1:48.571
1:52.923
2:00.172
2:10.273
2:23.227
2:39.131
2:58.069
3:19.833
3:44.937
4:12.986

In total about 2:26. This is with nft command batching active, set to send batches of up to 64 nft command buffers in one go.
With the queue disabled (set to 1), this is a lot slower -- long enough to have tea, get impatient and abort.
Ok, but back to the counters, the actual question here:
)

Retrieving a single full dump of the entire ruleset with all counters takes 157 seconds!!
Notably, during all this time, handling of PFCP traffic completely stops
because osmo-upf is single-threaded and select() driven.

This is certainly not an option.

I observe heavy exponential load increase in retrieving a full ruleset dump.
The more entries there are, the worse the effect of each additional entry:
  • when I only setup 100 tunmaps, the ruleset result is instant.
  • For 1000 tunmaps, it takes ~ 2 seconds.
  • For 2000 tunmaps, it takes ~ 7 seconds.
  • For 10k, 157 seconds.

It is far better to retrieve the counters one by one:
issuing 40k individual nft_run_cmd_from_buffer() calls to get all the counters, 2 chains (pre and post) per tunnel direction.
Seems insane, but it only takes 46 seconds to complete all of the 40k chains.
Also, these are 40k single calls of about 1ms each, so we can easily spread them out,
not blocking the select() loop for the other tasks.

Still, the turnaround seems slow. For 10k subscribers, if I go for full load, I get a counter snapshot every minute or so.
Imagine 50k subscribers: the best we can do is 5 minutes between counter snapshots.

I think we can halve these numbers by only getting one counter per tunnel direction, not both pre and post.
But the real problem is elsewhere, and it needs to be solved.

One way (short of a new API) could be to both
  • identify and alleviate the reasons for exponential load increase.
  • allow some sort of list batching, to allow getting 4 counters per nft call, or 40.

Especially after seeing the throughput that eUPF has from userland to kernel,
it seems to me that there is huge potential for nft with a more efficient data transmission.
IOW, this could be really fun to implement and then see in action =)

Once the tunmaps are in place, the GTP-U performance is apparently phenomenal.
So we have a super high bandwidth channel between remote machines for high volume GTP-U,
but a very high latency low bandwidth channel within one operating system for mostly binary information
-- we "just" need to take a few middle men out of the way between the userland rule semantics and the kernel netfilter,
and then we could be as fast as eUPF!

Actions #12

Updated by pablo about 1 month ago

Hi Neels,

neels wrote in #note-10:

I've written an osmo-upf VTY command to dump the current ruleset containing the counters.
Getting the ruleset dump from nft is a lot slower than evaluating the returned data.

I ran a tunmap.vty script creating 10k tunmaps by PFCP.

(Just out of curiosity, I dumped a timestamp after each 1000 completed;
the results show a significant slowdown with the number of tunnels:
the first 1k takes 2 seconds, around the middle 10 seconds, at the end 28 seconds.
[...]
In total about 2:26. This is with nft command batching active, set to send batches of up to 64 nft command buffers in one go.
With the queue disabled (set to 1), this is a lot slower -- long enough to have tea, get impatient and abort.

I am failing to reproduce the issue you report. I made these scripts, but I suspect they are not exactly what you are doing?

Ok, but back to the counters, the actual question here:
)

Retrieving a single full dump of the entire ruleset with all counters takes 157 seconds!!

This below, I have to make a proposal of a binary API for libnftables. If you can help me sort out the issue you experience, I'd appreciate it.

Thanks.

Actions #13

Updated by pablo about 1 month ago

pablo wrote in #note-12:

I am failing to reproduce the issue you report. I made these scripts, but I suspect they are not exactly what you are doing?

I am attaching scripts:

- upf-initial-ruleset.nft adds the initial ruleset skeleton.
- upf-add-tunnels.sh adds 100k tunnels.

This is using th instead of ih, because I have also been using this to test the issue on older kernels (5.10-stable), where ih (inner header) is not available.

Actions #15

Updated by neels about 1 month ago

At the moment, we are mostly interested in retrieving counters from nft.
Creating sessions would also be great to speed up, so I am also discussing that aspect here.
But the current focus is on reading 'list ruleset' or 'list counters' from nft.

pablo wrote in #note-12:

I am failing to reproduce the issue you report. I made these scripts

What are the timings you get from running the scripts?

I get

▶ sudo ./upf-add-tunnels.sh 

real    0m11.703s
user    0m8.979s
sys    0m2.675s

This is indeed very much faster than the other timings. But read on.

Now the currently interesting part is this:

▶ time sudo nft list table inet osmo-upf > /dev/null

real    0m26.860s
user    0m0.008s
sys    0m0.012s

That's already 2.5 times longer than creating the chains.

But testing with osmo-upf, this was 157 seconds for only 10k tunnels -- we should clarify where this stark difference is coming from.

So, 26 seconds is good, but still, blocking all nft I/O from osmo-upf for half a minute is not a good idea.
When osmo-upf retrieves the chains one by one, it took 45 seconds to cycle through all of them.
This is more doable, because each call blocks for a short time.

I think the goal is to allow retrieving something like 100 or 1000 chain counters per nft call.
An individual call hopefully takes only a short time, and it should be pretty fast to cycle through all counters.

but I suspect they are not exactly what you are doing?

The differences between the script and osmo-upf that I can identify:

  • The script tests the "wrong" thing =)
  • When osmo-upf creates sessions, there are repeated calls to nft_run_cmd_from_buffer(), instead of a single large call.
  • osmo-upf is not run as root; it has CAP_NET_ADMIN. I ran this scripted test with sudo. (Does that make a difference?)

Just out of curiosity, for the session creation case I modified your script to call individual 'nft' commands, and that more closely resembles the exponential increase in time taken.
It takes much, much longer than osmo-upf, because of course this, again, is not what osmo-upf is doing.
(This script creates a new process for each created chain; osmo-upf is one process calling nft_run_cmd_from_buffer() N times.)
Using upf-add-tunnels2.sh, creating 1k tunnels takes 6 seconds, and creating 10k tunnels takes 305 seconds.
So there is an exponential increase, and you could probably examine this effect using upf-add-tunnels2.sh.
But the best way to reproduce the creation of sessions would be using osmo-upf with osmo-pfcp-tool.
I'll explain a bit later; first let me look at reading counters only...

This below, I have to make a proposal of a binary API for libnftables. If you can help me sort out the issue you experience, I'd appreciate it.

Am I missing something? I cannot find "This below" =)
I'd gladly give feedback on a binary API.

I would like to get your ideas on this aspect:
Can we somehow get a listing of only the counters that have changed since the last invocation?
This may also be a good perf improvement. Not all tunnels are active all the time.

Actions #16

Updated by neels about 1 month ago

Attaching the script mentioned above...

Actions #17

Updated by neels about 1 month ago

neels wrote in #note-15:

  • osmo-upf is not run as root; it has CAP_NET_ADMIN. I ran this scripted test with sudo. (Does that make a difference?)

For the record, running 'sudo osmo-upf' doesn't change the timings I measure.

Actions #18

Updated by neels about 1 month ago

Loosely related: this patch gets per-hNodeB counters of GTP-U traffic from nft and publishes them as rate counters:
https://gerrit.osmocom.org/c/osmo-hnbgw/+/36385

It would still be great to implement URR in osmo-upf.

Actions #19

Updated by laforge about 1 month ago

  • Status changed from New to Stalled
  • Assignee deleted (neels)
  • Priority changed from High to Low