Publish messages immediately within a subscribe callback?
What's the cleanest way to publish a message immediately, without triggering any subscriber callbacks? I would expect calling ros::spinOnce() inside a callback to (potentially) result in stack overflow as the callback calls itself.
Use case: I am trying to make a heartbeat function that "just works" in callbacks and such. Here is a broken initial design:
#include <algorithm>
#include <string>
#include <unordered_map>

#include <ros/ros.h>
#include <std_msgs/Empty.h>

class Heart {
private:
    ros::NodeHandle nh;
    std::unordered_map<std::string, ros::Publisher> publishers;
    const char* file_;
public:
    Heart(const char* file) : file_(file) {}

    void beat(int line) {
        // Topic name derived from the call site, e.g. "heartbeat_foo_cpp_42".
        std::string keystring = std::string("heartbeat_") + file_ + "_" + std::to_string(line);
        std::replace(keystring.begin(), keystring.end(), '.', '_');
        // Lazily create one publisher per call site.
        if (publishers.find(keystring) == publishers.end()) {
            publishers[keystring] = nh.advertise<std_msgs::Empty>(keystring, 1);
        }
        publishers[keystring].publish(std_msgs::Empty());
        ros::spinOnce();
    }
};
// Usage (after ros::init(), so the NodeHandle can be created):
Heart heart(__FILE__);
heart.beat(__LINE__);
The idea is that you spray a bunch of:
heart.beat(__LINE__);
all over your code, record with:
rosbag record --regex "/heartbeat_(.*)" -O heartbeats.bag
And look at the gloriously clearly sorted lines in:
rqt_bag heartbeats.bag
It is broken in the sense that spinOnce() isn't enough to result in even remotely reliable timings.
Asked by drewm1980 on 2015-10-20 02:19:43 UTC
Answers
What's the cleanest way to publish a message immediately, without triggering any subscriber callbacks?
Just use ros::Publisher::publish(..)? As long as you're not running multithreaded or asynchronous spinners, callback queue processing is single-threaded, so you should not run into issues.
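For concreteness, something like this is what I mean (topic and node names here are just made up for illustration): a plain single-threaded node that publishes a timing marker from inside its subscriber callback.

#include <ros/ros.h>
#include <std_msgs/Empty.h>
#include <std_msgs/String.h>

ros::Publisher timing_pub;

// With the default single-threaded spinner, this callback is never
// re-entered while it runs, so publishing from inside it is fine.
void workCallback(const std_msgs::String::ConstPtr& msg)
{
    timing_pub.publish(std_msgs::Empty());
    // ... long running work ...
}

int main(int argc, char** argv)
{
    ros::init(argc, argv, "timing_example");
    ros::NodeHandle nh;
    timing_pub = nh.advertise<std_msgs::Empty>("heartbeat", 1);
    ros::Subscriber sub = nh.subscribe("work_topic", 10, workCallback);
    ros::spin();  // single-threaded: callbacks are processed one at a time
    return 0;
}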
I would expect calling ros::spinOnce() inside a callback to (potentially) result in stack overflow as the callback calls itself.
Calling spinOnce() inside a callback is not a good idea in any case, for whatever reason. It generally points to bad data / control flow design.
Use case: Publishing messages as timing statements in different parts of a long running callback. rosbag record just the topics for the timing statements, look at their timing in rqt_bag
Could you describe your design for this a bit more? If you're publishing msg type X for your timing info, and your long running callback is triggered by publications of type Y, where do you expect there to be an issue with self-triggering?
edit:
It is broken in the sense that spinOnce() isn't enough to result in even remotely reliable timings.
I'd expect this to be the case because the infrastructure was never designed for this use case (the stamp field is there for a reason).
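As a sketch of what the stamp-based approach could look like (the helper and topic name are invented for illustration, with the publisher advertised as nh.advertise<std_msgs::Header>("timing_markers", 100)): publish any message with a std_msgs/Header, fill in stamp at the instrumentation point, and read that field during analysis instead of the bag's receive time.

#include <string>
#include <ros/ros.h>
#include <std_msgs/Header.h>

// Publish a timestamped marker; stamp records when the call site was hit,
// independent of when the message is actually received or bagged.
void emitMarker(ros::Publisher& pub, const std::string& file, int line)
{
    std_msgs::Header marker;
    marker.stamp = ros::Time::now();                      // time at the call site
    marker.frame_id = file + ":" + std::to_string(line);  // where it was emitted
    pub.publish(marker);
}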
As to your issue:
- you could probably get around the "spinOnce() calls my callback while I'm in the callback" by some judicious use of multiple callback queues and associated spinners. You are responsible for spinning custom queues, so that should allow you to call spinOnce() without triggering execution of callback_Y (see the sketch after this list)
- perhaps (this is really a guess) see if using UDPROS improves anything. Since it uses UDP, it should not do any buffering like TCP does, which might reduce the latency you observe
- use the stamp field and accept that for analysis / visualisation you can't directly use rqt_bag, but would need to process the messages and extract the timestamp
- use proper profiling tools (like perf, oprofile, gprof, etc.)
- see if you can use rqt_graphprofiler and rosprofiler (but really only for messaging analysis). Random presentation about this: ROS System Profiling and Visualization.
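A rough sketch of the first bullet (all names are invented): put the subscription for Y on its own queue, spin that queue yourself, and then the ros::spinOnce() inside the callback only services the global queue and cannot re-enter callback_Y.

#include <ros/ros.h>
#include <ros/callback_queue.h>
#include <std_msgs/Empty.h>
#include <std_msgs/String.h>

ros::Publisher heartbeat_pub;

// Long running callback for topic Y. Its subscription lives on a custom
// queue, so the spinOnce() below (global queue) can never re-enter it.
void callbackY(const std_msgs::String::ConstPtr& msg)
{
    heartbeat_pub.publish(std_msgs::Empty());
    ros::spinOnce();  // services the global queue only
    // ... long running work ...
}

int main(int argc, char** argv)
{
    ros::init(argc, argv, "callback_queue_example");

    // NodeHandle whose subscriptions go to a custom queue.
    ros::CallbackQueue y_queue;
    ros::NodeHandle nh_y;
    nh_y.setCallbackQueue(&y_queue);
    ros::Subscriber sub_y = nh_y.subscribe("topic_Y", 10, callbackY);

    // Ordinary NodeHandle on the global queue for the heartbeat publisher.
    ros::NodeHandle nh;
    heartbeat_pub = nh.advertise<std_msgs::Empty>("heartbeat", 1);

    // You are responsible for spinning the custom queue yourself.
    ros::Rate rate(100);
    while (ros::ok())
    {
        y_queue.callAvailable(ros::WallDuration(0));  // dispatch callbackY if pending
        ros::spinOnce();                              // dispatch global-queue callbacks
        rate.sleep();
    }
    return 0;
}

The trade-off is that callback_Y is now only dispatched from the manual callAvailable() call, so the loop rate bounds how quickly it reacts to incoming messages.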
In general I'd say that if your bottleneck is in messaging, then yes, you could use bags and their visualisation to get an idea of that, and your approach may make sense (but I'd use other tools for that). If the bottleneck is in the callbacks, then it is an algorithm issue, and you should be profiling those algorithms, without the middleware interfering.
Asked by gvdhoorn on 2015-10-20 04:00:26 UTC
Comments
If I publish type X in a callback triggered by publications of type Y, and then call spinOnce() in the callback, the callback for type Y could be called again. Calling publish() alone isn't enough to push the message out immediately on the network socket, which is what I want for timing.
Asked by drewm1980 on 2015-10-20 04:08:57 UTC
I'm not sure I follow: why would callback_Y be called again? Unless you're expecting there to be more outstanding msg_Y instances in your queue?
Asked by gvdhoorn on 2015-10-20 04:11:56 UTC
As to timing: publishing is an asynchronous affair through-and-through. Once you let go of the msg, you don't have any control over it anymore. If capturing timing info is important, use the stamp field in the header.
Asked by gvdhoorn on 2015-10-20 04:13:18 UTC
I'm trying to make something that isn't terribly accurate, but that is really easy to use, for debugging delays on the order of a tenth of a second. I updated the question with code.
Asked by drewm1980 on 2015-10-20 04:18:41 UTC
and yes, another node is publishing messages of type Y in our discussion, and faster than my callback for type Y can handle them. I am manually instrumenting this callback to try to fix the problem. (really I am hoping to come up with a general solution to this problem, though)
Asked by drewm1980 on 2015-10-20 04:21:29 UTC
I'm looking into your first couple suggestions. I don't want to have to write a separate analysis tool (again). Sampling based profiling tools don't give you a good idea where your latencies are, or what your control flow was during the recorded history, especially if you're dropping data.
Asked by drewm1980 on 2015-10-21 09:00:38 UTC
Using a separate CallbackQueue doesn't seem to work. And upon reflection, the delays I am seeing are way too big to be just a TCP vs. UDP thing. Here's a version with a separate callback queue that wasn't better: http://paste.ubuntu.com/12892878/
Asked by drewm1980 on 2015-10-22 03:59:48 UTC