For Nav2, LIDAR timestamp on the message is earlier than all the data in the transform cache

asked 2021-12-31 08:29:46 -0500

billymccafferty
1 ●1 ●2 ●1

I am trying to bringup Nav2 on a "server" Ubuntu machine hosting ROS2 Rolling, using messages published from a physical robot "client" via Raspberry Pi hosting ROS2 Foxy. The URDF description is at https://github.com/billymccafferty/xerbo .

When I bring up Nav2 via ros2 launch nav2_bringup navigation_launch.py on the server, I'm seeing the following error:

[controller_server-1] [INFO] [1640913301.944847057] [local_costmap.local_costmap_rclcpp_node]: Message Filter dropping message: frame 'laser_frame' at time 1640913301.637 for reason 'the timestamp on the message is earlier than all the data in the transform cache'

I have chrony running on both the server and client with successful clock sync on the client. E.g., if I run ntpdate from the client to the server, I see the following, showing that they're almost perfectly synced:

server 192.168.x.x, stratum 8, offset -0.000500, delay 0.03116
30 Dec 18:26:02 ntpdate[7778]: adjust time server 192.168.x.x offset -0.000500 sec

Furthermore, if I echo /scan coming from the client and compare that to echo /tf on the server, I see that the header timestamps are identical each other; e.g.,:

/tf

transforms:
- header:
    stamp:
      sec: 1640913147
      nanosec: 522616833
    frame_id: odom

/scan

header:
  stamp:
    sec: 1640913147
    nanosec: 60040883
  frame_id: laser_frame

I can get everything working fine in the demos and such, but am running into this issue with Nav2 using my physical robot.

Thank you!

Billy McCafferty

edit retag flag offensive close merge delete

Comments

Hey, did you solved your issue ? I have s similar one that appeared after migrating from Foxy to Galactic, and even if all my stack runs on the same single physical computer. I use 2 Hokuyos lidars that are merged. My warning message comes from global_costmap.global_costmap_rclcpp_node.

PatoInso ( 2022-01-31 04:32:13 -0500 )edit

Not yet. :(

billymccafferty ( 2022-01-31 16:16:55 -0500 )edit

I have other issues that may be related (like this), that bend me toward the map <-> odom transform. I use the slam_toolbox in localization mode to provide this TF. If I disable TF publication from slam toolbox and instead publish map <-> odom TF from command line with ros2 run tf2_ros static_transform_publisher 0 0 0 0 0 0 1 odom map, the error goes away...

PatoInso ( 2022-02-01 09:10:14 -0500 )edit

ok, this issue and my other one are somehow linked. Under Galactic, the slam_toolbox timestamps the TF map-odom with the timestamp of the last laserscan + transform_timeout offset (see here). Under Foxy, it was timestamped with now() + transform_timeout offset (see here).

I replaced in Galactic the laser timestamp by now(), and my two issues got resolved... definitively something to look here.

PatoInso ( 2022-02-02 10:20:09 -0500 )edit

I'm very happy that I'm not the only one experiencing this! I will try your suggested action for the map <-> odom TF to see if that resolves the issue in the short term, as well as the now() fix you suggested. Thank you very much for your input!

billymccafferty ( 2022-02-03 14:21:39 -0500 )edit

Ok, I'm lost. The slam_toolbox seems involved but not sure how, I have two issues intricated or that give the same error. However I could isolate this: both issues started when I corrected the timestamp of the TF base_link <-> odom provided by rf2o laser odometry, to use the scan's timestamps instead of now(). There probably some timing issue here...

PatoInso ( 2022-02-04 11:25:53 -0500 )edit

add a comment

2 Answers

Sort by » oldest newest most voted

answered 2023-08-03 05:41:43 -0500

Sönke Niemann
1

I had the exact same problem on ROS2 Humble with the nav2 stack running on a PC connected to the network over ethernet and the actual robot connected to the same network but over Wifi. First I installed chrony on all PCs and set the Robot PC as a local time server to which all other PCs in the network get synced to. However this didn't help.

But then I found this issue on the nav2 github, where the maintainer suggests to use Eclipse CylconeDDS as the RMW instead of the default eProsima fastDDS. By following the instructions to install Cyclone DDS which are simply

sudo apt install ros-humble-rmw-cyclonedds-cpp to install Cyclone DDS on Humble
and putting this line export RMW_IMPLEMENTATION=rmw_cyclonedds_cpp into your bashrc (nano .bashrc or gedit .bashrc if you want a GUI tool and then insert at the bottom and save)

For me this solved all problems and I have the impression the ROS2 network overall runs faster and more stable.

edit flag offensive delete link

add a comment

answered 2022-02-08 09:28:55 -0500

PatoInso
109 ●8 ●19 ●21

Ok, may be very specific but for my case, it was because of delays in my laserscan pipeline:

A first delay introduced by a Message Filter that cached some laserscan message to sync them in my scan merger. So when a new scan was received, the previous scan was typically outputed (so 1 sample of delay)
The struct of my laser_odometry package was so that the the TF odom <-> base_link was computed and timestamped with the latest laserscan, but was only published 100ms later (a structure with a process() then a spin_some() that processed obsolete data)

Solving this two issues and reducing the delay, so that the timestamp of the TF remains close of now(), solved this behaviour for me.

edit flag offensive delete link

Comments

PatoInso, if convenient, would you mind sharing any techniques you leveraged to pin down how long the delay was and where it was occurring? Thank you!

billymccafferty ( 2022-02-08 11:53:56 -0500 )edit

Nothing terrible :)

1) I removed the message filter, and just wait for one laserscan from each sensor before merging them and immediately publish the merged scan

2) My laser odometry node was like:

// Loop computing and publishing odometry at 0.1s
while (rclcpp::ok()){
     myLaserOdomNode->process();           // Compute and publish odom TF, from the last laserscan
     rclcpp::spin_some(myLaserOdomNode);   // Publish TF and process latest laserscan callback (retrieve laserscan message <- here was the 100ms delay)
     rate.sleep();
  }

I just added a spin_some() before process() to grab the latest scans befor doing computations.

PatoInso ( 2022-02-08 12:18:49 -0500 )edit

To detect it I printed the time at which the TF and scans were published an received in each nodes of the pieline, and compared it with the timestamps contained in the messages themselves

PatoInso ( 2022-02-08 12:20:28 -0500 )edit

add a comment

For Nav2, LIDAR timestamp on the message is earlier than all the data in the transform cache

Comments

2 Answers

Comments

Question Tools

Stats

Related questions

For Nav2, LIDAR timestamp on the message is earlier than all the data in the transform cache edit

Comments

2 Answers

Comments

Question Tools

Stats

Related questions

For Nav2, LIDAR timestamp on the message is earlier than all the data in the transform cache