ROS Resources: Documentation | Support | Discussion Forum | Index | Service Status | ros @ Robotics Stack Exchange
Ask Your Question

Revision history [back]

click to hide/show revision 1
initial version

Here's the root of the problem: you should never put a loop in a callback (that is not explicitly bounded/controlled). In your case, it's not that timer callbacks are "specially handled", it's that TimerCallback doesn't complete. The next message in the callback queue is not processed until the current one completes.

Callbacks should be fast and always complete (e.g., a point cloud handler copies message data to be processed elsewhere later; note that even that sometimes causes performance issues).

An example analogous to your situation is a GUI system whose main thread becomes unresponsive while doing heavy processing -- the next GUI event cannot be processed until the current one finishes (like when RViz or another Qt application "greys out").

[Edit: looking into this, it's not clear to me why non-timer messages are being dropped, but this can be avoided by removing the loop/spinOnce in TimerCallback. Since there doesn't seem to be a strike-through markup and in the interest of some measure of clarity, I apologize, but changes to my original answer text will not be apparent.]

Here's the root of the problem: you should never put a loop in a callback (that is not explicitly bounded/controlled). In your case, it's not that timer callbacks are "specially handled", it's that TimerCallback doesn't complete. The next message in I think that's the issue, coupled with the spinOnce it contains and with the subscriber's limited queue size.

It appears to me that (as @Geoff has called them) the "recursive" spinOnce calls are not clearing the subscriber's queue; so further messages are being dropped, message_received_ is never being set to true, and the loop never exits. You can see some aspect of this with the following:

void TimerCallback(const ros::TimerEvent &timer_event) {
  int wait_count_ = 0;
  while(message_received_ == false) {
    ros::spinOnce();
    ros::Duration(0.01).sleep();
    if (wait_count_++ > 10) {
      break;
    }
  }
}

In my testing (which matches your original description of the situation, but not your comment that other messages are processed), exiting TimerCallback will then return to expected processing, in that calling of ImportantCallback will resume. More evidence: there will be a gap in the count from the publisher, indicating to me that messages are being dropped. (Also, increasing the queue size to 2 was beneficial, but I think would exhibit the same problem if let go longer than I'm willing to watch.) This might also explain what I see as a discrepancy in what you've said about processing other messages -- since I'm only looking at a single subscriber, I don't know how others would behave...perhaps their queue limits are not being reached and their messages are not being dropped.

While the above isn't a complete answer, I think there is a solution: reduce the sleep duration in main and remove the timer completely, moving whatever logic is currently there to an if (message_received_) { // do stuff } after the spinOnce in main (obviously, setting message_received_ to false when appropriate). Assuming single-threaded operation, the callback queue is not processed until the current one completes.

Callbacks inherently synchronizes message reception, so the logic should be fast and always complete (e.g., a point cloud handler copies message data to be processed elsewhere later; note that even that sometimes causes performance issues).straightforward.

An example analogous to your situation is a GUI system whose main thread becomes unresponsive while doing heavy processing -- the next GUI event cannot be processed until the current one finishes (like when RViz or another Qt application "greys out").