Slam Toolbox: Message Filter dropping message for reason 'discarding message because the queue is full'

asked 2022-01-05 10:43:06 -0500

Pedro Leal gravatar image

updated 2022-01-05 16:26:24 -0500

Mike Scheutzow gravatar image

Hello, (I'll start by saying I've checked already if it's a tf tree problem and it's not)

I'm currently attempting to perform SLAM and navigation utilizing a Jetson Xavier NX, and an Intel's D455. The Jetson's OS is the Ubuntu 20.04 (done by https://qengineering.eu/install-ubunt...) and I'm using Galactic built from source. The Jetson's mode is the one with 20W and 6 cores.

I initiate a launch file consisting of: D455 node, robot_localization, depth_image_to_laserscan (I plan to use a 2D lidar but I'm using the D455 for fast tests), and ros2_laser_scan_matcher (from https://github.com/AlexKaravaev/ros2_...), and finally a static transform between base_link and camera_link. Up until here, everything works exceptionally well. The tf tree looks as one would expect (odom -> base_link average rate of 20 as I set in robot_localization configuration, base_link -> camera link is a static transform set by me, and the rest is the camera's frames). (Sorry, I can't upload images yet.)

By using rviz, I can also see that everything is working as expected, meaning that I set the static frame to odom, and as I move the camera the scans don't change their position, only the camera's frames do. Great. Now I plan to use slam_toolbox as well as Nav2.

I'm using slam_toolbox default configuration (online_async), only remapping the robot's base frame to base_link. Here starts the problems: The terminal is spamming these:

[1641398181.499569062] [slam_toolbox]: Message Filter dropping message: frame 'camera_depth_frame' at time 1641398181.448 for reason 'discarding message because the queue is full'

In rviz I set the static frame to odom, the scans are being shown as well every frame and updated with decent frequency, I can see the map being built right at start, and then it only updates every now and then when I move the D455 significantly. If I set the static frame to the map, I can't see the scans, neither the frames, only every now and then when the map is updated. (I'm talking every 5 seconds or more and I have to move the camera). The tf tree shows the map -> odom transformation, however, the average rate is 2590000, buffer length 0. Meanwhile, every other transformation is as expected (as before initiating slam_toolbox).

At first I didn't mind it (before not realizing it wasn't only the map updating slowly, but the map->odom tf was messed up), and attempted to test Nav2 (again using default launch file navigation_launch, only changing every set_sim_time to false.). The terminal spams these:

[global_costmap.global_costmap]: Timed out waiting for transform from base_link to map to become available, tf error: Lookup would require extrapolation into the past. Requested time 1641398077.397696 but the earliest data is at time 1641398167.204133, when looking up transform from frame [base_link] to frame [map]

Unsurprisingly. And if I send a goal, controller/planner crashes.

Most solutions I see are regarding ... (more)

edit retag flag offensive close merge delete

Comments

1

That board has "low power" modes (== low performance.) What power-mode are you using? Link to Nvidia power management

Mike Scheutzow gravatar image Mike Scheutzow  ( 2022-01-05 21:46:58 -0500 )edit

Hi, I'm using the one with 20W and 6 cores. Which for some reason isn't listed on that link for the Xavier NX. This one should be the best performance-wise.

Pedro Leal gravatar image Pedro Leal  ( 2022-01-06 07:42:10 -0500 )edit
1

While slam is running, does top show any processes using 100% cpu?

Mike Scheutzow gravatar image Mike Scheutzow  ( 2022-01-06 08:11:21 -0500 )edit

The laser_scan_matcher is taking up 95-100%; realsense2 around 40-50%; Xorg around 35%; lxterminal is 20%; depthimage_to_laserscan is 15%; and slam_toolbox is at 10%. The rest is all under 10%. So for now, I should: switch the scan_matcher algorithm for another alternative (any suggestions)? And probably just use ssh, since the Xorg taking up so much is likely thanks to be using monitor/kb/mouse together with the camera, right?

Pedro Leal gravatar image Pedro Leal  ( 2022-01-06 10:24:39 -0500 )edit
1

Yes, shutting down Xorg and using ssh is a good idea (although I don't recall Xorg using so much cpu.)

  1. How many scans/second is laser_scan_matcher processing? Have you tried reducing the depthimage frame rate?
  2. Have you used rqt_console GUI app to identify which node is complaining about data loss?
Mike Scheutzow gravatar image Mike Scheutzow  ( 2022-01-06 10:44:57 -0500 )edit

Alright, I've reduced the FPS in the realsense wrapper configuration file to 15 (from 30) and this reduced the CPU usage of the laser_scan_matcher to around 75%. It is now almost working as expected, both slam_toolbox as is Nav2. Slam_toolbox now publishes both map and transform (default 50hz) consistently, and Nav2 accepts goals. However, Slam_toolbox spams the same message (at a reduced rate (5hz)). And Nav2 spams:

[global_costmap.global_costmap_rclcpp_node]: Message Filter dropping message: frame 'camera_depth_frame' at time * for reason 'the timestamp on the message is earlier than all the data in the transform cache'

and after sending a goal: [controller_server-1] [WARN] [controller_server]:Unable to transform robot pose into global plan's frame

[controller_server-1] [ERROR] [tf_help]:Transform data too old when converting from odom to map

Pedro Leal gravatar image Pedro Leal  ( 2022-01-06 13:48:05 -0500 )edit

I'll further attempt to reduce unnecessary CPU usage, since I plan to use a LIDAR like Velodyne's which might require even more effort from the ICP node. I'll also play around with it's parameters(laser_scan_matcher's), as well as with both slam_toolbox and Nav2 parameters regarding buffers and timeouts.

By the way, which rqt_console plugin were you mentioning that would tell me about the data loss? The logging console, right?

Thanks for the help.

Pedro Leal gravatar image Pedro Leal  ( 2022-01-06 13:55:07 -0500 )edit

Regarding rqt_console, I forgot you are using ros2; I don't know if tool is the same as in ros1.

Regarding the message timestamp complaint, I don't have any suggestions. That message seems very strange if all your nodes are running on the same host.

Mike Scheutzow gravatar image Mike Scheutzow  ( 2022-01-06 14:52:27 -0500 )edit