Ask Your Question
2

Messages getting dropped [ros melodic + industrial camera]

asked 2021-02-18 08:03:33 -0500

affi gravatar image

updated 2022-05-23 09:13:34 -0500

lucasw gravatar image

Hey everyone,

I am trying to perform a ground truth validation, having an Industrial camera and Ouster lidar (os1-64) where both are connected to Nvidia Agx via PoE switch. The bandwidth rate of the camera is 49 MByte/sec and Lidar Mode is 1024*20.

If both the camera and ouster run at the same time, my messages are getting dropped. Whereas, if I only run the ouster client I am getting all the messages. You can see the comparisons below:

I am trying to run with following command to record the messages:

rosbag record -b 1024 --chunksize=1024 --duration=1h  /os_cloud_node/points __name:=recordLidarData &

This is the output of the rosbag file which was recorded without the camera presence on the network: You can see the messages are 71991 and size of the recorded bag for 1 hour is almost 210.9 GB:

path:        _2018-01-28-17-35-57.bag
version:     2.0
duration:    59:59s (3599s)
start:       Jan 28 2018 17:35:57.88 (1517157357.88)
end:         Jan 28 2018 18:35:57.60 (1517160957.60)
size:        210.9 GB
messages:    71991
compression: none [71991/71991 chunks]
types:       sensor_msgs/PointCloud2 [1158d486dd51d683ce2f1be655c3c181]
topics:      /os_cloud_node/points   71991 msgs    : sensor_msgs/PointCloud2

When simultaneously, both the camera and ouster clinet runs, I am getting the correct FPS for the camera but all the messages are getting dropped for ouster and ended up in very small bag file. You can see the output of the rosbag file: Here you can see the messages I am getting for 1 hour only 6294 and the size is of rosbag is 18.04 GB.

path:        _2018-01-28-19-15-51.bag
version:     2.0
duration:    59:58s (3598s)
start:       Jan 28 2018 19:15:52.19 (1517163352.19)
end:         Jan 28 2018 20:15:50.64 (1517166950.64)
size:        18.4 GB
messages:    6294
compression: none [6294/6294 chunks]
types:       sensor_msgs/PointCloud2 [1158d486dd51d683ce2f1be655c3c181]
topics:      /os_cloud_node/points   6294 msgs    : sensor_msgs/PointCloud2

Can anyone nudge me in the right direction how to solve this issue. I mean in terms of, for example. what will be correct chunksize or buffsize so my messages should not drop. Is there anyway, where I can check how many messages are dropped kind of statistical information?

Or there is a bottleneck, because the switch might be under stress [too many packets]? Any suggestions would be appreciated.

Regards, Affi

edit retag flag offensive close merge delete

1 Answer

Sort by ยป oldest newest most voted
1

answered 2021-04-16 20:22:54 -0500

peci1 gravatar image

How glad I am we're not the only ones seeing this bug! We've just spent about 20+ manhours debugging it. With the help of a network tap device, we pinpointed the issue to the timing of the packets sent by the lidar.

It seems the lidar sends data in bursts (8 UDP fragments, pause 8 UDP fragments, pause, 1 UDP fragment). These 17 fragments form one large fragmented UDP datagram. And that's what the ROS driver sends furter as the lidar packet. When you connect the lidar directly to a computer, you don't observe the packet loss as there is noone else on the cable and a Gbit link is good enough for the lidar. However, when the lidar traffic is mixed in a switch with traffic from other devices, it shows the bursts of data are just too fast and switches start throwing Ouster packets away as their buffers are half-full of the other traffic and can no longer contain all packets belonging to the Ouster bursts. We haven't found any parameter of the switches we tested (about 5 types) that would tell us the size of the buffers, but we assume the store-and-forward technology they use has a buffer for just a few packets. Ouster sends 10000 UDP packets/s in 1024x10 mode, and one of our switches specifies 17 Mpackets/s, which is way more. However, the bursting behavior goes beyond the limit of even this switch on short time intervals.

We plan to report this as an issue to Ouster, as it basically renders the sensor unusable when connected to a switch.

Until it is fixed (if!), there are some things you can do to make things a bit better.

  1. Switch to 512x10 mode and connect Ouster to switch via a 4-wire 100 Mbit Ethernet cable. This will force it into 100 Mbit mode, in which the bursts are much smaller. The bandwidth of 512x10 mode is 60 Mbps which is easy to handle on the 100 Mbit link. This option fully resolves the issue for us, but you have to work only with 512x10 mode.

  2. Connect the lidar via a dedicated network card (e.g. a USB-Ethernet adapter or a PCIe card). This fully resolves the issue (if you manage to find a reliable USB-Ethernet adapter, which is not easy).

  3. If your receiving machine is Linux, set the following in /etc/systctl.d/60-ouster-fix.conf:

net.ipv4.ipfrag_high_thresh = 26000000 net.ipv4.ipfrag_low_thresh = 13000000 net.ipv4.ipfrag_time = 5

You can play with the values. This basically doesn't fix the lost packets problem, but it makes it less serious for the receiving computer. As said earlier, Ouster sends 20k UDP frames fragmented to 17 MTU-sized frames (1500 B). As frames get lost on the switch, the receiving computer adds any received fragment to a buffer where it waits for its other buddies to form the large UDP datagram. However, with 20% of them missing, most datagrams are never completed (reassembled). So ... (more)

edit flag offensive delete link more

Comments

Jumbo frames might also help. It doesn't decrease data rate, but it will decrease packet rate. Especially for busy networks, that might help reduce fragmentation and lost packets.

gvdhoorn gravatar image gvdhoorn  ( 2021-04-17 08:54:54 -0500 )edit

Yeah, we'd like to use them. But it seems the lidar is limited to the standard MTU of 1500 B :(

peci1 gravatar image peci1  ( 2021-04-17 09:29:20 -0500 )edit

Is there any follow up solution for that? We have been facing the same problem here.

By running only the LiDAR everything is fine, but when we start the cameras we notice the packet drops on the LiDAR. We tried setting the LiDAR mode to 512x10 but we still saw the same behavior. Have you received any answer from Ouster?

lucas.nunes gravatar image lucas.nunes  ( 2021-11-19 03:30:03 -0500 )edit

If 512x10 mode is okay for you, connect the lidar via a 100 Mbps link and it will get much better (or even fixed). This connection can be forced either by putting some 100 Mbps-only switch in the way, or just by connecting only 4 of the 8 ethernet cables from Ouster (you need to figure out which ones).

peci1 gravatar image peci1  ( 2021-11-19 11:17:12 -0500 )edit

Otherwise, Ouster told that they will try to address this issue in a new firmware released around Q1 2022.

peci1 gravatar image peci1  ( 2021-11-19 11:17:50 -0500 )edit

Last, if you can open an issue with Ouster support, it would be great. Until now, they believed we are the only ones with this problem and thus it is a non-problem.

peci1 gravatar image peci1  ( 2021-11-19 11:18:27 -0500 )edit

I don't know if you are aware but ouster released a new firmware with configurable UDP packet here: https://data.ouster.io/downloads/chan...

Hopefully this will help.

lucas.nunes gravatar image lucas.nunes  ( 2022-01-18 04:43:47 -0500 )edit

Unfortunately, the new firmware does not help (yet). The configurability is only about supporting the new dual returns mode. There is nothing allowing configuration of how many columns are sent at once. But looking at the changes in the ros driver, it seems Ouster is preparing for that. So let's hope it will happen soon.

peci1 gravatar image peci1  ( 2022-01-18 05:25:12 -0500 )edit

Your Answer

Please start posting anonymously - your entry will be published after you log in or create a new account.

Add Answer

Question Tools

1 follower

Stats

Asked: 2021-02-18 05:06:32 -0500

Seen: 272 times

Last updated: Apr 16 '21