robot_localization (ekf_se) fails
I have two launch files that are identical except that one starts one additional node. robot_localization fails in one case and not in the other. The error makes me think of uninitialized memory somewhere in the robot_localization node.
[ERROR] [1557801978.336296726]: Client [/ekf_se] wants topic /imu to have datatype/md5sum [sensor_msgs/Imu/6a62c6daae103f4ff57a132d6f95cec2], but our version has [std_msgs/String/992ce8a1687cec8c8bd883ec73ca41d1]. Dropping connection.
^C[ekf_se-5]
I get the above if I start the rosserial server before robot_localization. There is no error if the serial server is started later.
The error does not seem to make sense. Is there really a topic called "992ce8a1687cec8c8bd883ec73ca41d1"?
Asked by chrisalbertson on 2019-05-13 22:01:29 UTC
Answers
I found the problem. Basically, a chain reaction caused by a buffer overflow.
I have a base controller that runs on an STM32 microcontroller and connects via USB to a Raspberry Pi running rosserial. The microcontroller was publishing on /odom, but because of a buffer error it was also sending random garbage over the USB serial connection in addition to the correct odometry. rosserial interpreted this random gibberish as a standard "string" message, containing just an ASCII space character, being published on the /imu topic. This was just random luck.
I increased the size of a statically allocated buffer in the microcontroller, and robot_localization (running on the Pi 3) stopped writing error messages.
So, what did I learn? ROS nodes are more tightly coupled than I would have thought. A buffer overflow on a microcontroller caused an unrelated node on another computer to fail. We need to better validate all incoming data to better contain errors.
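As a rough illustration of what "validate all incoming data" could look like on the host side, here is a minimal sketch of a checksummed frame decoder that rejects anything it does not recognise. The frame layout, the `SYNC` marker, the function names and the topic id table are all assumptions made up for this example. This is not the code of rosserial or of my firmware.

```python
SYNC = b"\xff\xfe"  # assumed two-byte start-of-frame marker (hypothetical)


def checksum(payload: bytes) -> int:
    """Return 255 minus the low byte of the sum of all payload bytes."""
    return 255 - (sum(payload) % 256)


def encode_frame(topic_id: int, data: bytes) -> bytes:
    """Build a frame: sync, length, length checksum, topic id, data, body checksum."""
    length = len(data).to_bytes(2, "little")
    body = topic_id.to_bytes(2, "little") + data
    return SYNC + length + bytes([checksum(length)]) + body + bytes([checksum(body)])


def decode_frame(frame: bytes, known_topics: dict):
    """Return (topic_name, data), or raise ValueError if any check fails."""
    if len(frame) < 8 or frame[:2] != SYNC:
        raise ValueError("bad sync marker or short frame")
    length_bytes = frame[2:4]
    if frame[4] != checksum(length_bytes):
        raise ValueError("length checksum mismatch")
    length = int.from_bytes(length_bytes, "little")
    # 5 header bytes + 2 topic id bytes + data + 1 checksum byte
    if len(frame) != 5 + 2 + length + 1:
        raise ValueError("frame size does not match declared length")
    body = frame[5:-1]
    if frame[-1] != checksum(body):
        raise ValueError("body checksum mismatch")
    topic_id = int.from_bytes(body[:2], "little")
    # Only accept topics the host was told to expect (an allowlist).
    if topic_id not in known_topics:
        raise ValueError(f"unknown topic id {topic_id}")
    return known_topics[topic_id], body[2:]
```

With a table such as `{125: "/odom"}` (the id is arbitrary), a frame for any other topic id, or a frame with a flipped bit, raises `ValueError` instead of being passed on. The allowlist is the part that would have mattered in my case, since the garbage ended up looking like traffic for a topic the controller was never supposed to publish.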
Asked by chrisalbertson on 2019-05-14 02:34:54 UTC
Comments
So, what did I learn? ROS nodes are more tightly coupled than I would have thought. A buffer overflow on a microcontroller caused an unrelated node on another computer to fail. We need to better validate all incoming data to better contain errors.
In this case it was rosserial that misinterpreted data. The rest of the system essentially never saw any of that data because, due to the md5 mismatches, no connections could be set up.
rosserial could probably be extended with more extensive checking.
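To make the md5 point concrete, here is a simplified sketch of the kind of check that produced the error in the question. The hex string in the log is the MD5 of a message definition, not a topic name. A subscriber and a publisher only connect when both the datatype and the md5sum agree. The function and class names are invented for this illustration and this is not the actual roscpp code. The datatype and md5sum values are the ones from the log above.

```python
from typing import NamedTuple, Optional


class TopicType(NamedTuple):
    datatype: str  # e.g. "sensor_msgs/Imu"
    md5sum: str    # MD5 of the message definition text


def check_connection(client: str, topic: str,
                     wanted: TopicType, offered: TopicType) -> Optional[str]:
    """Return None if the types match, else an error string similar to the log."""
    if wanted == offered:
        return None
    return (
        f"Client [{client}] wants topic {topic} to have datatype/md5sum "
        f"[{wanted.datatype}/{wanted.md5sum}], but our version has "
        f"[{offered.datatype}/{offered.md5sum}]. Dropping connection."
    )


# What ekf_se subscribes to, and what the bogus publisher advertised.
wanted = TopicType("sensor_msgs/Imu", "6a62c6daae103f4ff57a132d6f95cec2")
offered = TopicType("std_msgs/String", "992ce8a1687cec8c8bd883ec73ca41d1")

error = check_connection("/ekf_se", "/imu", wanted, offered)
```

Here `error` is a non-empty string, so the connection is refused and no message bytes ever reach the subscriber. That is why the garbage stayed contained to the /imu advertisement.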
Asked by gvdhoorn on 2019-05-14 02:39:25 UTC