
getMTNodeHandle/getMTPrivateNodeHandle doesn't result in additional callbacks, but does take more cpu

asked 2018-07-26 10:52:06 -0500 by lucasw, updated 2018-07-28 09:21:43 -0500

I've made an example nodelet https://github.com/lucasw/nodelet_demo with a callback that takes more time to complete than is available before the next message arrives. For non-nodelets I've successfully used AsyncSpinners to handle that case in the past (TODO: make an example of that in the same repo), but with nodelets I see the following behavior:
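For reference, the non-nodelet AsyncSpinner setup I mean is roughly this (a minimal sketch, not the actual repo code; the node name, topic name, and delay are made up):

```cpp
#include <ros/ros.h>
#include <std_msgs/Float32.h>

// Slow callback: deliberately takes longer than the period between messages.
void slowCallback(const std_msgs::Float32ConstPtr& msg)
{
  ros::Duration(1.0).sleep();
  ROS_INFO_STREAM("finished " << msg->data);
}

int main(int argc, char** argv)
{
  ros::init(argc, argv, "async_demo");
  ros::NodeHandle nh;
  ros::Subscriber sub = nh.subscribe("input", 10, slowCallback);
  // Service the global callback queue from four threads instead of
  // blocking in ros::spin().
  ros::AsyncSpinner spinner(4);
  spinner.start();
  ros::waitForShutdown();
  return 0;
}
```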

The MTNodeHandle callback gets called no more often than the regular NodeHandle one. The nodelet manager in my case declares it has 8 worker threads, and in a more complex situation (not shown in the example, TODO add it later) I can see that multiple nodelets do spread across multiple threads.

The CPU usage goes way up (top shows 500% CPU instead of 100%, with 800% being the max for my CPU) when using the MTNodeHandle, as if the messages that would otherwise be dropped in the single-threaded case are still eating up CPU somewhere, but there is no external indication: no additional log messages from the callback, and no publishes from within the callback are seen.

void NodeletDemo::callback(const std_msgs::Float32ConstPtr& msg)
{
  ROS_INFO_STREAM("cpu id " << get_cpu_id() << ", count " << msg->data);
  // busy-wait rather than sleep so the artificial delay shows up as cpu usage
  bool busy_wait = true;
  if (!busy_wait)
  {
    ros::Duration(callback_delay_).sleep();
  }
  else
  {
    ros::Time t0 = ros::Time::now();
    while ((ros::Time::now() - t0).toSec() < callback_delay_)
    {
    }
  }
  pub_.publish(msg);
}
...
nh = getMTNodeHandle();
sub_ = nh.subscribe("input", input_queue_size,
    &NodeletDemo::callback, this);

The default delay is 1.0 seconds; these launches result in ten times as many messages being sent as a single-threaded callback can handle:

roslaunch nodelet_demo demo.launch period:=0.1 mt_callback:=false
roslaunch nodelet_demo demo.launch period:=0.1 mt_callback:=true

Should I load more nodelet instances of the same type into the manager, subscribing to the same topics, so that messages get distributed among them properly? This works as long as information doesn't need to be shared between the instances; that could be handled with additional topics, but it is more cumbersome than using an AsyncSpinner (though the mutex locks the spinner would likely require could be equally or more cumbersome). Update: Trying this crashes the nodelet manager.

Multiple subscriptions to the same topic within the same nodelet? (I don't think that works elsewhere, but maybe here?) Update: Trying this crashes the nodelet manager.

Update: Trying to create an AsyncSpinner within the nodelet results in the error AsyncSpinnerImpl: Attempt to spin a callback queue from two spinners, one of them being single-threaded. In the future this will throw an exception! That is maybe a clue to what is going wrong: is the nodelet locked into being single-threaded at a higher level?

Is there a good online example of using getMTNodeHandle()?



2 Answers


answered 2018-07-28 11:14:58 -0500 by lucasw, updated 2018-07-29 11:26:28 -0500

With additional tests it looks like the answer is that a single subscriber will be single-threaded; the MT node handles just allow other subscribers to run concurrently. (And I was probably misremembering AsyncSpinner; is it the same?)

That doesn't explain the cpu usage increase though.

So the solution would be to demux the input and then have a number of corresponding subscribers inside the nodelet that can run in parallel. This is inflexible, but that is good in one sense: it limits the ability of too many incoming messages to consume every available thread.

The nodelet_topic_tools demux for a std_msgs type would look like this (except demux doesn't do what I want it to https://github.com/ros/nodelet_core/i... ):

Two subscribers on different topics going to the same callback:

sub1_ = nh.subscribe("input1", input_queue_size,
    &NodeletDemo::callback, this);

sub2_ = nh.subscribe("input2", input_queue_size,
    &NodeletDemo::callback, this);

...
#include <nodelet_topic_tools/nodelet_demux.h>
#include <pluginlib/class_list_macros.h>
typedef nodelet::NodeletDEMUX<std_msgs::Float32> Float32NodeletDemux;
PLUGINLIB_EXPORT_CLASS(Float32NodeletDemux, nodelet::Nodelet);

and in the plugin xml:

<class name="Float32NodeletDemux"
  type="Float32NodeletDemux"
  base_class_type="nodelet::Nodelet">
<description>
  NodeletDEMUX Float32 instance
</description>
</class>

and the launch:

 <node pkg="nodelet" type="nodelet" name="data_demux"
      args="load Float32NodeletDemux manager" output="screen">
   <rosparam>
      output_topics: [input1, input2]
   </rosparam>
 </node>

but it isn't working; it fails with:

typename boost::detail::sp_dereference<T>::type boost::shared_ptr<T>::operator*() const [with T = ros::Publisher; typename boost::detail::sp_dereference<T>::type = ros::Publisher&]: Assertion `px != 0' failed

( https://github.com/ros/nodelet_core/i... ). I'll update this when it does work. (The demux wiki page should also be updated once it works; it still has the old PLUGINLIB_DECLARE_CLASS.)


answered 2018-11-23 19:04:05 -0500 by peci1

One of your problems has a solution described here: https://stackoverflow.com/a/48544551/... . In short, you're looking for SubscribeOptions.allow_concurrent_callbacks, which effectively allows more threads to process callbacks for a single subscriber. You just have to make sure to correctly handle shared data access (e.g. accessing class member variables).

Borrowing code from Fruchtzwerg:

ros::SubscribeOptions ops;
// (the original answer's `ops.template init` is only required inside a
// template; a plain init<> call is enough here)
ops.init<std_msgs::String>("chatter", 1000, chatterCallback);
ops.transport_hints = ros::TransportHints();
ops.allow_concurrent_callbacks = true;
ros::Subscriber sub = nh.subscribe(ops);

I haven't tested that, but it should work.
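For the nodelet case here, I'd guess (equally untested) it would be wired up in onInit() something like this, with NodeletDemo, callback, and sub_ standing in for your class's names:

```cpp
// Untested sketch: allow_concurrent_callbacks inside a nodelet, so that
// several manager worker threads may service the one subscriber.
ros::NodeHandle nh = getMTNodeHandle();
ros::SubscribeOptions ops;
ops.init<std_msgs::Float32>("input", 10,
    boost::bind(&NodeletDemo::callback, this, _1));
ops.allow_concurrent_callbacks = true;
sub_ = nh.subscribe(ops);
```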

However, I faced the very same problem you have with high CPU usage when the callback blocks for longer than the interval between messages. I'd expect queue_size to kick in and start throwing away the messages that did not make it, but instead the Async/Multithreaded spinner just spins all its threads in a busy-wait for some reason. The only case where it doesn't happen is when you start these spinners with a single thread. I looked for the difference, and it seems to be calling callAvailable() in the 1-threaded case versus callOne() in the multi-threaded case. So I'd look for the bug somewhere around there, but I haven't succeeded yet. Maybe https://github.com/ros/ros_comm/issue... is also relevant (however, I am debugging this problem on Indigo, which is not affected by the mentioned bug, or at least a fix for it hasn't been backported there; and I do not have a custom build of ros_comm).


Comments

I created an issue for that: https://github.com/ros/ros_comm/issue... .

peci1 ( 2018-11-23 19:31:00 -0500 )



Stats


Seen: 338 times

Last updated: Nov 23 '18