Why is spinOnce() right after publish() flaky?
I have a simple C++ test code which contains a subscriber, a publisher and nothing else. I do the following:
- Set up subscriber
sub
with queue size 1. - Set up publisher
pub
with queue size 1. - Wait until
sub.getNumPublishers()
is 1 to make sure the two guys talked (wait up to 1 second). pub.publish(msg)
std::thread t(&ros::spinOnce);
What's weird is that the publish call is flaky, i.e. in about 10% of cases, the subscriber callback isn't fired (regardless of how long I wait).
I can't understand why that happens, because:
- I'm running with sim_time set to true and nobody publishes the time, so time should not play a role.
- There are no other nodes running on this rosmaster.
- Both the subscriber and publisher are in the same process, so they should use intraprocess communication. Thus nothing like Nagle should play a role.
My understanding of what publish()
does is the following:
- Serialize the data
- Find all subscriber links
- Store the serialized data together with some other goodies in the subscriber link's callback queue.
There's nothing obviously asynchronous here. Am I missing something? Or is it possible that sub.getNumPublishers()
shows 1 while the publisher and subscriber are not really yet connected? (but I wait 1 second for this to happen...).
The only other explanation I can think of is that something else gets to the subscriber's queue. But what would that be? It is definitely not "converted" into calling the subscriber callback I registered because that's what I'm checking and what fails.
MWE:
#include <ros/ros.h>
#include <std_msgs/Header.h>
#include <thread>
int main(int argc, char **argv)
{
ros::init(argc, argv, "test_flaky_publish");
ros::Time::setNow(ros::Time(0));
bool called = false;
ros::NodeHandle nh;
auto cb = [&](const std_msgs::Header&){called = true;};
ros::Subscriber sub = nh.subscribe<std_msgs::Header, const std_msgs::Header&>("test", 1, cb);
ros::Publisher pub = nh.advertise<std_msgs::Header>("test", 1);
while (sub.getNumPublishers() == 0 && ros::ok())
ros::WallDuration(0, 1000).sleep();
pub.publish(std_msgs::Header());
std::thread t(&ros::spinOnce);
ros::WallDuration(0.5).sleep();
if (called)
ROS_INFO("called");
else
ROS_ERROR("not called");
if (t.joinable())
t.join();
return called ? 0 : 1;
}
is that actually true in ROS 1? IIRC, that's something only nodelets do.
I think that's true only for no-serialization comms. But no-tcp inside one process should work everywhere, even outside nodelets.