Slam Toolbox: Message Filter dropping message for reason 'discarding message because the queue is full'
Hello, (I'll start by saying I've checked already if it's a tf tree problem and it's not)
I'm currently attempting to perform SLAM and navigation using a Jetson Xavier NX and an Intel D455. The Jetson runs Ubuntu 20.04 (installed via https://qengineering.eu/install-ubuntu-20.04-on-jetson-nano.html) and I'm using Galactic built from source. The Jetson is set to the 20W, 6-core power mode.
I initiate a launch file consisting of: the D455 node, robot_localization, depthimage_to_laserscan (I plan to use a 2D lidar but I'm using the D455 for quick tests), ros2_laser_scan_matcher (from https://github.com/AlexKaravaev/ros2_laser_scan_matcher), and finally a static transform between base_link and camera_link. Up until here, everything works exceptionally well. The tf tree looks as one would expect (odom -> base_link at an average rate of 20 Hz as I set in the robot_localization configuration, base_link -> camera_link is the static transform I set, and the rest are the camera's frames). (Sorry, I can't upload images yet.)
In rviz I can also see that everything is working as expected: with the fixed frame set to odom, the scans don't change their position as I move the camera, only the camera's frames do. Great. Now I want to add slam_toolbox as well as Nav2.
I'm using slam_toolbox's default configuration (online_async), only remapping the robot's base frame to base_link. Here the problems start: the terminal keeps spamming these:
[1641398181.499569062] [slam_toolbox]: Message Filter dropping message: frame 'camera_depth_frame' at time 1641398181.448 for reason 'discarding message because the queue is full'
In rviz, with the fixed frame set to odom, the scans are shown and updated at a decent frequency, I can see the map being built right at the start, and then it only updates every now and then when I move the D455 significantly. If I set the fixed frame to map, I can't see the scans nor the frames, only every now and then when the map updates (I'm talking every 5 seconds or more, and only if I move the camera). The tf tree shows the map -> odom transformation, however, the average rate is 2590000, buffer length 0. Meanwhile, every other transformation is as expected (same as before launching slam_toolbox).
At first I didn't mind it (not yet realizing that it wasn't only the map updating slowly; the map->odom tf was messed up too), and attempted to test Nav2 (again using the default launch file navigation_launch.py, only changing every use_sim_time to false). The terminal spams these:
[global_costmap.global_costmap]: Timed out waiting for transform from base_link to map to become available, tf error: Lookup would require extrapolation into the past. Requested time 1641398077.397696 but the earliest data is at time 1641398167.204133, when looking up transform from frame [base_link] to frame [map]
Unsurprisingly. And if I send a goal, the controller/planner crashes.
Most solutions I see involve a badly constructed tf tree, but I don't think that's the case here. I've also seen it mentioned that the Jetson simply might not be keeping up with the processing, but I've seen SLAM run on Raspberry Pis. Any solutions, or extra information I can send?
Asked by Pedro Leal on 2022-01-05 11:43:06 UTC
Answers
Hi,
as you mentioned, most problems are related to the tf tree, as they were in my case, BUT I saw a consistent reduction of the message you mention by synchronizing the clocks of my devices with chrony: the Create 3 refers to the Raspberry Pi (they communicate over USB), which in turn refers to the machine I am using to visualize data and perform navigation. I understand yours might be a computational-load issue, but I would give it a try, since in my case it mitigated that message quite a lot. Good luck
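A minimal sketch of what that chrony setup could look like on the device that should follow another machine's clock (the server address here is hypothetical; adjust to your network):

```
# /etc/chrony/chrony.conf on the downstream device
server 192.168.1.10 iburst    # hypothetical address of the reference machine
makestep 1.0 3                # step the clock if the initial offset is large

# On the reference machine, allow clients from the subnet:
# allow 192.168.1.0/24
```

You can check the resulting offset with `chronyc tracking` on either machine.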
Asked by maurizio on 2023-02-18 07:29:59 UTC
Comments
Hello, I was also facing a similar problem. Then I decided to use the turtlebot 4 SLAM package, and here are the commands that helped me solve this problem. I hope this helps you:
The first step was to install the odometry package, which is rf2o_laser_odometry
The second step was to start the Lidar node using: ros2 launch rplidar_ros rplidar.launch.py
The third step is to start odometry from the ros2 workspace: ros2 launch rf2o_laser_odometry rf2o_laser_odometry.launch.py laser_scan_topic:=scan
The fourth step is to do static transformation:
ros2 run tf2_ros static_transform_publisher 0 0 0 0 0 0 base_link laser
ros2 run tf2_ros static_transform_publisher 0 0 0 0 0 0 base_link odom
The fifth step was to start the SLAM algorithm:
ros2 launch turtlebot4_navigation slam.launch.py params:=/full/path/to/slam.yaml
The final step is to Bring up the robot:
ros2 launch turtlebot4_viz view_robot.launch.py
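For reference, a minimal slam.yaml for the slam.launch.py command above could look like the sketch below; the frame and topic names are assumptions and need to match your own setup:

```yaml
slam_toolbox:
  ros__parameters:
    mode: mapping          # build a map online
    odom_frame: odom
    map_frame: map
    base_frame: base_link
    scan_topic: /scan
```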
Asked by kathan on 2023-04-21 02:19:32 UTC
Comments
That board has "low power" modes (== low performance.) What power-mode are you using? Link to Nvidia power management
Asked by Mike Scheutzow on 2022-01-05 22:46:58 UTC
Hi, I'm using the one with 20W and 6 cores. Which for some reason isn't listed on that link for the Xavier NX. This one should be the best performance-wise.
Asked by Pedro Leal on 2022-01-06 08:42:10 UTC
While slam is running, does top show any processes using 100% cpu?
Asked by Mike Scheutzow on 2022-01-06 09:11:21 UTC
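One quick way to take such a snapshot (a sketch using standard Linux ps rather than interactive top):

```shell
# List the ten heaviest CPU consumers, highest %CPU first
ps -eo pid,pcpu,comm --sort=-pcpu | head -n 10
```

Running this while SLAM is active shows which nodes dominate the CPU.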
The laser_scan_matcher is taking up 95-100%; realsense2 around 40-50%; Xorg around 35%; lxterminal is 20%; depthimage_to_laserscan is 15%; and slam_toolbox is at 10%. The rest is all under 10%. So for now, I should switch the scan_matcher algorithm for another alternative (any suggestions?). And probably just use ssh, since Xorg taking up so much is likely due to using a monitor/keyboard/mouse together with the camera, right?
Asked by Pedro Leal on 2022-01-06 11:24:39 UTC
Yes, shutting down Xorg and using ssh is a good idea (although I don't recall Xorg using so much cpu.) Have you used the rqt_console GUI app to identify which node is complaining about data loss?
Asked by Mike Scheutzow on 2022-01-06 11:44:57 UTC
Alright, I've reduced the FPS in the realsense wrapper configuration file to 15 (from 30), and this reduced the CPU usage of the laser_scan_matcher to around 75%. It is now almost working as expected, both slam_toolbox and Nav2. slam_toolbox now publishes both the map and the transform (default 50hz) consistently, and Nav2 accepts goals. However, slam_toolbox still spams the same message (at a reduced rate, ~5hz). And Nav2 spams:
[global_costmap.global_costmap_rclcpp_node]: Message Filter dropping message: frame 'camera_depth_frame' at time * for reason 'the timestamp on the message is earlier than all the data in the transform cache'
and after sending a goal: [controller_server-1] [WARN] [controller_server]: Unable to transform robot pose into global plan's frame
[controller_server-1] [ERROR] [tf_help]: Transform data too old when converting from odom to map
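One parameter that can mitigate these "data too old" / "earlier than all the data in the transform cache" errors is the costmap's transform_tolerance, which controls how stale a transform may be before it is rejected. A hypothetical excerpt from a Nav2 parameters file (the value is illustrative):

```yaml
global_costmap:
  global_costmap:
    ros__parameters:
      transform_tolerance: 0.5   # seconds; accept slightly stale tf data
```

Raising this trades a little pose accuracy for robustness on a loaded CPU.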
Asked by Pedro Leal on 2022-01-06 14:48:05 UTC
I'll further attempt to reduce unnecessary CPU usage, since I plan to use a LIDAR like Velodyne's, which might require even more effort from the ICP node. I'll also play around with its parameters (laser_scan_matcher's), as well as with both slam_toolbox's and Nav2's parameters regarding buffers and timeouts.
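The slam_toolbox parameters in that area include, for example, the following (the values shown are illustrative, close to the online_async defaults):

```yaml
slam_toolbox:
  ros__parameters:
    transform_publish_period: 0.02   # 50 Hz map->odom broadcast
    transform_timeout: 0.2           # how long to wait for tf lookups
    tf_buffer_duration: 30.0         # seconds of tf history to keep
    scan_buffer_size: 10             # scans queued for processing
```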
By the way, which rqt_console plugin were you mentioning that would tell me about the data loss? The logging console, right?
Thanks for the help.
Asked by Pedro Leal on 2022-01-06 14:55:07 UTC
Regarding rqt_console, I forgot you are using ros2; I don't know if the tool is the same as in ros1.
Regarding the message timestamp complaint, I don't have any suggestions. That message seems very strange if all your nodes are running on the same host.
Asked by Mike Scheutzow on 2022-01-06 15:52:27 UTC
Were optimisations enabled? If not, that could cause a significant increase in CPU utilisation.
Asked by gvdhoorn on 2022-01-07 03:22:54 UTC
I simply followed the tutorial (https://docs.ros.org/en/galactic/Installation/Ubuntu-Development-Setup.html) to build ROS2. I have no idea if optimizations are enabled, and to be honest I don't know what you are referring to, since I'm quite the newbie at both ROS2 and Colcon. I'd appreciate a link to what you are mentioning.
Asked by Pedro Leal on 2022-01-07 06:30:34 UTC
This isn't ROS or Colcon specific.
I was simply referring to the level of optimisation the compiler should perform while it's compiling your packages. This can be configured using CMAKE_BUILD_TYPE (obviously only for CMake-based packages, which would be quite a few in a typical ROS 2 workspace).
The command given in the tutorial you link (
colcon build --symlink-install
) doesn't seem to do that, which leaves it up to individual packages to configure a build type. The default CMake build type is 'empty', which doesn't set any flags IIRC.
You could test whether building your workspace with the following command improves performance:
colcon build --symlink-install --cmake-args -DCMAKE_BUILD_TYPE=RelWithDebInfo
That would configure CMake to use RelWithDebInfo for all packages in that workspace. I'd suggest doing this in a new workspace, so caching doesn't affect the results.
Asked by gvdhoorn on 2022-01-07 06:45:00 UTC
Eigen is notorious for showing bad performance if no optimisations are enabled. It can't use its SIMD acceleration properly in most cases without them.
As to why the ROS 2 build tutorial / instructions don't set a build type I don't know.
Asked by gvdhoorn on 2022-01-07 06:46:45 UTC
I'll definitely rebuild a new workspace with the same src folder and check for differences in CPU usage. My background is physics, and I lack what might be some basic knowledge; it shows in this thread.
Regarding the specific case of Eigen, in my workspace I compile an updated laser_scan_matcher package (https://github.com/AlexKaravaev/ros2_laser_scan_matcher) for ROS2, as well as CSM (https://github.com/AlexKaravaev/csm). However, to obtain Eigen itself I believe I used apt-get install eigen3.
Would this method of obtaining Eigen result in it not being well optimized, and if so, should I alternatively compile Eigen from source using that flag you mentioned?
Asked by Pedro Leal on 2022-01-07 08:46:34 UTC
Eigen is header-only. The package you've installed provides those headers and the compiler will still be able to optimise how they get used by other packages.
I would recommend you just rebuild your workspace and see what happens.
You can always make things more complex later.
Asked by gvdhoorn on 2022-01-07 08:52:52 UTC