ros_tutorials roscpp talker/listener loses first message or two

asked 2018-04-04 18:35:16 -0500

PaulBouchier

updated 2018-04-05 09:55:43 -0500

This must have been seen before... I ran into it while trying to track down a lost message from a command-line app which publishes one message and then exits. I'm running ROS Kinetic on Ubuntu 16.04.4. My colleague has duplicated the problem using the ros_tutorials, and even found it happens on Indigo!

I downloaded the ros_tutorials and when I run the listener then the talker, I see the first one or two messages are not received by the listener. Note: I am careful to start the listener many seconds before the talker, so I know it's sitting there waiting to receive messages on the topic.


terminal 1:

$ rosrun roscpp_tutorials listener
[ INFO] [1522884464.985198808]: I heard: [hello world 3]
[ INFO] [1522884465.084941089]: I heard: [hello world 4]
[ INFO] [1522884465.184948624]: I heard: [hello world 5]

terminal 2:

$ rosrun roscpp_tutorials talker
[ INFO] [1522884464.684117550]: hello world 0
[ INFO] [1522884464.784338065]: hello world 1
[ INFO] [1522884464.884241317]: hello world 2
[ INFO] [1522884464.984259097]: hello world 3
[ INFO] [1522884465.084324610]: hello world 4
[ INFO] [1522884465.184348827]: hello world 5
^C[ INFO] [1522884465.284327998]: hello world 6

EDIT: The use case is that I'm using a command-line utility to inject a fault-notification DiagnosticStatus message onto /diagnostics, where the diagnostic_aggregator is already up and running, and other publishers have been publishing for some time. In this case it is important not to lose messages: it is not an "emit sensor-data" use case, and the service model cannot be used.

FWIW the python talker/listener do not lose initial messages. The problem goes away if I put ros::Duration(1).sleep() between the call to advertise and the call to publish, but does not go away if I put a ros::Rate sleep() for 10 seconds between the advertise and the publish.

Once the messages start flowing they all come through - it's the first one or two that get lost.

EDIT2: The following code snippet in the talker seems to prevent initial packet loss. Advertising:

  ros::Publisher chatter_pub = n.advertise<std_msgs::String>("chatter", 10, true);

  ros::Duration(0.5).sleep();
  ros::spinOnce();
  ros::Duration(0.5).sleep();

Inside the loop:

  chatter_pub.publish(msg);
  ros::spinOnce();
  ros::Duration(0.5).sleep();

Something seems broken. Does anyone have any clues?

EDIT3: Per suggestion from @gvdhoorn I added chatter_pub.getNumSubscribers() between the ROS_INFO that prints what the talker is about to publish, and the actual publish.

Talker output:

$ rosrun experiments talker1
[ INFO] [1522935255.630339354]: hello world 0
[ INFO] [1522935255.630375560]: Number of subscribers before publishng: 0
[ INFO] [1522935255.730489884]: hello world 1
[ INFO] [1522935255.730574901]: Number of subscribers before publishng: 0
[ INFO] [1522935255.830478047]: hello world 2
[ INFO] [1522935255.830555734]: Number of subscribers before publishng: 0
[ INFO] [1522935255.930438125]: hello world 3
[ INFO] [1522935255.930517228]: Number of subscribers before publishng: 1
[ INFO] [1522935256.030550391]: hello world 4
[ INFO] [1522935256.030636388]: Number of subscribers before publishng: 1

Listener output:

$ rosrun experiments listener2 
[ INFO] [1522935255.931268459]: I heard: [hello world 3]
[ INFO] [1522935256.031134685]: I heard: [hello world ...

Comments

"I am careful to start the listener many seconds before the talker, so I know it's sitting there waiting to receive messages on the topic"

Subscriptions still take time to establish, and msgs could be published before they are registered. If you add a getNumSubscribers() call to the tutorial node, does it change?

gvdhoorn (2018-04-05 07:31:53 -0500)

Thanks for the suggestion, gvdhoorn. I updated the post (EDIT3) with a code snippet that waits until getNumSubscribers() is non-zero, and that fixed it. I wonder if I should put a note on the wiki page for the pub/sub tutorial.

PaulBouchier (2018-04-05 08:50:04 -0500)

I believe this is not specific to this tutorial, but a general characteristic of how pub-sub works (or: is implemented in ROS). I wouldn't know where to put this on the wiki so that it gets the attention it deserves though, so perhaps adding a note to the tutorial would be ok.

gvdhoorn (2018-04-05 08:53:57 -0500)

Note also that the answer by @knxa is the answer here. I only provided one possible way to "work around" this characteristic of pub-sub.

gvdhoorn (2018-04-05 08:54:45 -0500)

2 Answers


answered 2018-04-05 01:42:24 -0500

knxa

This is not something broken. The published data is not stored anywhere. It's just published. If no one has subscribed yet, the message is lost. So it all depends on the order and timing of your nodes: when does the talker allow connections (advertise) and when does the listener actually subscribe.

Normally you will want a design where it is not a problem if the first message is lost. Maybe it helps to think of the published data as the momentary status of a sensor, say a temperature reading. The interesting thing is the actual temperature, which is published at some interval, and the first temperature reading is usually not important.

However, to some extent you can actually cache some data to help clients (listeners) keep up with the message stream. Read up on queue sizes and the latch option.
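
To make the two behaviours above concrete, here is a toy in-process model in plain C++. It is only a sketch and is not roscpp: ToyTopic, its methods and lateJoinerReceives are hypothetical names made up for this illustration, and real ROS connections are set up over the network rather than by a direct function call. It shows that a message published with nobody connected is simply dropped, and that a latched topic keeps only the most recent message for late joiners.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy model of a topic. NOT roscpp: every name here is hypothetical.
class ToyTopic {
public:
    using Callback = std::function<void(const std::string&)>;

    explicit ToyTopic(bool latch = false) : latch_(latch), has_last_(false) {}

    // Deliver to whoever is connected right now. Nothing is stored for
    // subscribers that connect later, unless the topic is latched.
    void publish(const std::string& msg) {
        if (latch_) {
            last_ = msg;  // a latched topic remembers only the newest message
            has_last_ = true;
        }
        for (const auto& cb : subscribers_) {
            cb(msg);
        }
    }

    // A new subscriber first gets the latched message (if any), then live traffic.
    void subscribe(Callback cb) {
        if (latch_ && has_last_) {
            cb(last_);
        }
        subscribers_.push_back(std::move(cb));
    }

    std::size_t numSubscribers() const { return subscribers_.size(); }

private:
    bool latch_;
    bool has_last_;
    std::string last_;
    std::vector<Callback> subscribers_;
};

// Publish two messages, connect a subscriber, publish a third, and return
// everything the subscriber received.
std::vector<std::string> lateJoinerReceives(bool latch) {
    ToyTopic topic(latch);
    std::vector<std::string> heard;
    topic.publish("hello world 0");
    topic.publish("hello world 1");
    assert(topic.numSubscribers() == 0);  // nobody was connected yet
    topic.subscribe([&heard](const std::string& m) { heard.push_back(m); });
    topic.publish("hello world 2");
    return heard;
}
```

In this model, lateJoinerReceives(false) yields only "hello world 2", while lateJoinerReceives(true) yields "hello world 1" followed by "hello world 2". Message 0 is gone in both cases, which is why latching helps a one-shot publisher but does not turn a topic into a message store.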


Comments

I would also note the utility of ros::Publisher::getNumSubscribers() in these cases.

If it's important that at least N subscribers receive a msg, check for N subscribers.

gvdhoorn (2018-04-05 01:45:12 -0500)

@PaulBouchier: as @knxa writes, this is probably "by design". Pub-sub is anonymous by nature and there is no persistent msg store anywhere (by default), so late joiners will always miss out on msgs published before they connected. Latching can mitigate that to some extent, but it's fundamental. ..

gvdhoorn (2018-04-05 01:47:02 -0500)

.. If loss of messages is unacceptable, it might make more sense to use a synchronous interaction pattern (ie: one that requires both sender and receiver to be on-line at the same time, and known). Services or Actions could be alternatives then.

gvdhoorn (2018-04-05 01:47:58 -0500)

I edited the question to clarify the use case - the listener has been up a long time (diagnostic_aggregator). I can't use a service because I'm injecting a fault onto /diagnostics from a command-line utility. Messages are lost even if listeners are up and a different pub has been sending to them.

PaulBouchier (2018-04-05 08:06:35 -0500)

Data exchange between nodes is p2p, so other datastreams probably do not factor in (the fact that they are 'active' does not matter; your subscribers will still need to establish a connection with the new publisher).

If you can, please try adding the getNumSubscribers() before publishing, ..

gvdhoorn (2018-04-05 08:08:04 -0500)

.. just to see if that changes things. Using a latched publisher could also work, but you'll have to have your node stick around "long enough", until all intended recipients have received it (not sure how to do that).

All of this is speculation btw, so it'd be good to check.

gvdhoorn (2018-04-05 08:08:56 -0500)

Guys you saved my day!!! Thanks

pse18_10 (2019-03-21 03:05:31 -0500)

answered 2018-04-05 10:31:07 -0500

PaulBouchier

updated 2018-04-05 10:34:04 -0500

As @gvdhoorn and @knxa noted, this is expected behavior (though perhaps surprising), because the advertise() call returns before the subscriber has made the socket connection to the publisher (which takes maybe 300ms).

If you are writing a command-line app that has to publish something to a listener that is already running, there are two things you have to do:

  1. Insert the following code snippet between advertise() and publish()

      while (0 == chatter_pub.getNumSubscribers()) {
          ROS_INFO("Waiting for subscribers to connect");
          ros::Duration(0.1).sleep();
      }
  2. Call ros::shutdown() before your program exits, to flush the published message to the subscriber
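
One caveat with the wait loop in step 1 is that it never returns if no subscriber ever shows up. The sketch below is a bounded variant that also covers the "wait for at least N subscribers" case mentioned in the comments. It is written as a template so it can be tried without a ROS master: waitForSubscribers and FakePublisher are hypothetical names, not part of roscpp, and the only thing assumed of a real publisher is the getNumSubscribers() method that ros::Publisher provides. It uses std::this_thread::sleep_for instead of ros::Duration, so it waits in wall-clock time.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

// Hypothetical helper (not a roscpp function): poll until `pub` reports at
// least `min_subscribers` connections, or give up once `timeout` has passed.
// Pub may be any type with a const getNumSubscribers() returning an integer.
template <typename Pub>
bool waitForSubscribers(const Pub& pub,
                        std::uint32_t min_subscribers,
                        std::chrono::milliseconds timeout,
                        std::chrono::milliseconds poll = std::chrono::milliseconds(100)) {
    assert(poll.count() > 0);  // a zero interval would turn this into a busy loop
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (pub.getNumSubscribers() < min_subscribers) {
        if (std::chrono::steady_clock::now() >= deadline) {
            return false;  // timed out; the caller decides how to report it
        }
        std::this_thread::sleep_for(poll);
    }
    return true;
}

// Stand-in for trying the helper without ROS: reports zero subscribers for
// the first `polls_until_connected` calls, then one.
struct FakePublisher {
    explicit FakePublisher(int polls_until_connected)
        : remaining(polls_until_connected) {}

    std::uint32_t getNumSubscribers() const {
        if (remaining > 0) {
            --remaining;
            return 0;
        }
        return 1;
    }

    mutable int remaining;
};
```

In the talker this would presumably be called as waitForSubscribers(chatter_pub, 1, std::chrono::seconds(2)) between advertise() and publish(), with the node logging an error and skipping the publish when it returns false. A real node would likely also want to check ros::ok() inside the loop so that Ctrl-C still works while waiting.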