Ask Your Question
1

What can cause spinOnce to stop processing messages?

asked 2017-04-20 13:07:32 -0500

cpagravel gravatar image

updated 2017-04-20 13:11:06 -0500

I'm trying to debug a very rare instance (that happens randomly) in my node when I will be waiting for a specific message and it never gets received. I've created a very rough code example to help illustrate how things are happening. My actual code has a lot more subscribers and classes, but this should get the point across.

Key Points

  • spinOnce() is being called inside a createTimer() callback
  • Messages are not being added to subscribers queue. (verified)
  • Messages are being published by publisher. (verified)
  • After the TimerCallback() function completes, messages will get received again.

Publisher Node:

int main(int argc, char** argv)
{
  ros::init(argc, argv, "Publisher");
  ros::NodeHandle node;
  ros::Rate r(1);
  publisher = node->advertise<Message_type>(topic_string, 1, true);
  int count = 0;
  while (ros::ok())
  {
    count++;
    publisher.publish(Message_type(count));
    r.sleep();
  }
}

Subscriber Node:

bool message_received_ = false;

void ImportantCallback(const Message_type& msg)
{
  ROS_INFO_STREAM("count: " << msg.count);
  message_received_ = true;
} 

void TimerCallback(const ros::TimerEvent &timer_event)
{
  // wait for a particular message to be received
  while(message_received_ == false)
  {
    ros::spinOnce();
    ros::Duration(0.01).sleep();
  }
  // do stuff
  ROS_INFO("Message received");
}


int main(int argc, char** argv)
{
  ros::init(argc, argv, "Subscriber");
  ros::NodeHandle node;
  ros::NodeHandle pr_node("~");
  ros::Rate r(10);
  // Other subscribers initialized here
  subscriber = node->subscribe(topic_string, 1, ImportantCallback, this);
  InitializeSubscribers(node);

  plan_timer_ = node->createTimer
    (ros::Duration(0.01), &TimerCallback, this);

  while (ros::ok())
  {
    ros::spinOnce();
    r.sleep();
  }
}

Any ideas are greatly appreciated.

edit retag flag offensive close merge delete

2 Answers

Sort by ยป oldest newest most voted
1

answered 2017-04-20 22:26:35 -0500

Geoff gravatar image

Recursively calling spinOnce() can be dangerous. You might end up calling your TimerCallback() recursively, particularly with the timer's duration being so short.

As an alternative to recursively calling spinOnce(), you could try using a separate callback queue for the timer. See here for how this works:

http://wiki.ros.org/roscpp/Overview/C...

If you want to ignore all data received on other topics until a particular message is received, there might be better ways to do it. For example removing subscribers (sub.shutdown()) and re-creating them when ready to handle data again - although this is costly in destroying and setting up connections. Another would be a flag that tells other subscriber callbacks to ignore all received data until that particular message is received.

edit flag offensive delete link more

Comments

I had performed some experiments and found out that the timer callbacks are specially handled. I.e. once you're in a timer callback function, a call to spinOnce() will not trigger another time callback until your timer callback function has completed.

cpagravel gravatar imagecpagravel ( 2017-04-21 10:03:11 -0500 )edit

It seems the problem might have been fixed by avoiding the use of the timer callbacks, but I don't understand why. I also did some experiments to test the limit on recursively calling spinOnce() inside my callbacks (with the use of infinite loops). It seemed that, aside from the timer callback...

cpagravel gravatar imagecpagravel ( 2017-04-21 10:05:01 -0500 )edit

spinOnce() functioned as normal.

cpagravel gravatar imagecpagravel ( 2017-04-21 10:05:13 -0500 )edit

Yeah, I made a mistake when I was reading the code and docs. It turns out that the CallbackQueue class is actually designed to handle things like callbacks calling the queue again. See #9 in http://wiki.ros.org/roscpp/Internals . So the first part of my answer is wrong, I think.

Geoff gravatar imageGeoff ( 2017-04-23 19:35:13 -0500 )edit
0

answered 2017-04-22 09:14:15 -0500

kramer gravatar image

updated 2017-04-22 20:10:25 -0500

[Edit: looking into this, it's not clear to me why non-timer messages are being dropped, but this can be avoided by removing the loop/spinOnce in TimerCallback. Since there doesn't seem to be a strike-through markup and in the interest of some measure of clarity, I apologize, but changes to my original answer text will not be apparent.]

Here's the root of the problem: you should never put a loop in a callback (that is not explicitly bounded/controlled). In your case, it's not that timer callbacks are "specially handled", it's that TimerCallback doesn't complete. I think that's the issue, coupled with the spinOnce it contains and with the subscriber's limited queue size.

It appears to me that (as @Geoff has called them) the "recursive" spinOnce calls are not clearing the subscriber's queue; so further messages are being dropped, message_received_ is never being set to true, and the loop never exits. You can see some aspect of this with the following:

void TimerCallback(const ros::TimerEvent &timer_event) {
  int wait_count_ = 0;
  while(message_received_ == false) {
    ros::spinOnce();
    ros::Duration(0.01).sleep();
    if (wait_count_++ > 10) {
      break;
    }
  }
}

In my testing (which matches your original description of the situation, but not your comment that other messages are processed), exiting TimerCallback will then return to expected processing, in that calling of ImportantCallback will resume. More evidence: there will be a gap in the count from the publisher, indicating to me that messages are being dropped. (Also, increasing the queue size to 2 was beneficial, but I think would exhibit the same problem if let go longer than I'm willing to watch.) This might also explain what I see as a discrepancy in what you've said about processing other messages -- since I'm only looking at a single subscriber, I don't know how others would behave...perhaps their queue limits are not being reached and their messages are not being dropped.

While the above isn't a complete answer, I think there is a solution: reduce the sleep duration in main and remove the timer completely, moving whatever logic is currently there to an if (message_received_) { // do stuff } after the spinOnce in main (obviously, setting message_received_ to false when appropriate). Assuming single-threaded operation, the callback queue inherently synchronizes message reception, so the logic should be straightforward.

edit flag offensive delete link more

Comments

The actual loop is bounded by a time limit. The code sample is just for communication and I didn't crowd it with handling. I don't agree with your statement: The next message in the callback queue is not processed until the current one completes. I've conducted experiments and...

cpagravel gravatar imagecpagravel ( 2017-04-22 13:14:12 -0500 )edit

found that this is not the case. At least for any messages besides the TimerCallback messages. I.e., if I enter a infinite while loop in a callback, I can still process other messages even though the callback has not been completed. I have heard someone also say that callbacks should be as fast...

cpagravel gravatar imagecpagravel ( 2017-04-22 13:16:01 -0500 )edit

as possible, but I don't understand why. These aren't interrupt service routines, so what's the reason for that? Does it say that in the docs somewhere with an explanation?

cpagravel gravatar imagecpagravel ( 2017-04-22 13:17:20 -0500 )edit

OK, new information for me as to other messages processing. I'll edit my answer shortly. Edit: ok, not as shortly as I had hoped. :(

kramer gravatar imagekramer ( 2017-04-22 15:07:37 -0500 )edit

If the node is single-threaded nothing else will be processed while the callback is running. If the callback takes long enough then this might result in messages being lost or timers being missed due to queues filling up. That's probably the reason some people say make callbacks short.

Geoff gravatar imageGeoff ( 2017-04-23 19:37:47 -0500 )edit

A possible reason is the timer is very short relative to the callback processing time (actually probably shorter). If the timer fires again before that callback returns the new timer event might get on the queue before any messages. If this repeats enough, unprocessed messages might be dropped.

Geoff gravatar imageGeoff ( 2017-04-23 19:41:44 -0500 )edit

Your Answer

Please start posting anonymously - your entry will be published after you log in or create a new account.

Add Answer

Question Tools

1 follower

Stats

Asked: 2017-04-20 13:07:32 -0500

Seen: 713 times

Last updated: Apr 22 '17