
nodelet load process does message (de)serialization?

asked 2020-06-11 11:27:58 -0500 by peci1

updated 2020-06-12 04:36:54 -0500

I've just run a profiler on my nodelets and found something interesting. It seems that message (de)serialization is done by the processes launched by nodelet load, and not in the manager. Is that right? How could it then pass pointers around to the nodelet manager? There also seems to be some serialization effort even though the nodelet doesn't publish anything that is consumed outside the manager.

The nodelet subscribes to slow pointclouds (0.3 Hz, about 1M points) and publishes similar ones (it is based on the pcl_ros nodelets). Each nodelet load process takes about 10% of one core (mobile i7) just to deserialize and serialize the data. That seems like quite a lot given the size and frequency of the pointclouds. The only other subscribed topic is /tf (which I subsampled for these nodelets, so it isn't anything big). Am I doing something wrong? The publisher publishes sensor_msgs::PointCloud2Ptr and the subscriber's callback argument type is const sensor_msgs::PointCloud2ConstPtr&. All nodes participating in this are compiled in release mode.

A profiler view of the nodelet manager shows just the "effective work", i.e. the pointcloud transformation. Here's a view from the profiler of the nodelet load process (it isn't perfect, but you should be able to make out the big parts dedicated to (de)serialization).

(flamegraph of the nodelet load process; large blocks correspond to serialization and deserialization)

Note to future self and others: the perf data can be recorded with perf record --freq=997 --call-graph dwarf -q -o /tmp/perf -p $PID_OF_THE_PROCESS. I used Hotspot to view the perf file.


Comments

It seems that message (de)serialization is done by the processes launched by nodelet load,

This confuses me: nodelet load does not start any processes, unless you are referring to the nodelet binary which calls a few services to load the shared library and start the nodelet.

Additionally: could you clarify where in your flamegraph you are seeing (de)serialisation happen?

Each nodelet load process is taking about 10% of one core (mobile i7) to just deserialize and serialize the data.

If you're doing things right, that would not be possible.

Again: nodelet load will only call a few services and then exit. Unless you are starting a manager or invoking it with standalone.

Please show the commands you're using to start things.

gvdhoorn ( 2020-06-11 14:25:49 -0500 )

@gvdhoorn: As you can read in my answer, I've found that the culprit is the /clock topic. However, I have to correct you: it is not true that the nodelet load node exits when it finishes loading the nodelet. It actually sits there (in a ros::spin() call) and waits for either a broken bond or SIGINT, in which case it tries to call unload_nodelet on the manager before exiting.

peci1 ( 2020-06-12 04:35:17 -0500 )

1 Answer


answered 2020-06-12 04:32:24 -0500 by peci1

Found the culprit! It is the simulated time from Gazebo, published on /clock at 1 kHz, that causes this performance degradation: about 15% of a core per node! But as long as it only happens in simulation, it's more or less okay for me.

I verified these findings: with rosbag play, /clock is published at only 100 Hz, and the utilization is about 4% of a core per node. With real (wall-clock) time, the nodelet load CPU utilization is almost zero.

Here's a flamegraph of the same nodelet setup running with real time (use_sim_time set to false and rosbag play without --clock). Notice the timeline: it is almost empty. The only thing going on is the bond heartbeat, which is quite cheap.

(flamegraph of the same nodelet setup with real time; the timeline is almost empty)



Stats

Asked: 2020-06-11 11:27:58 -0500

Seen: 230 times

Last updated: Jun 12 '20