When exactly is a connect callback of a publisher called?

asked 2019-10-21 08:52:02 -0500


updated 2019-10-23 06:13:46 -0500

Hi all,

I would like to know precisely when the connect_cb specified in the following signature is invoked:

Publisher ros::NodeHandle::advertise(const std::string& topic,
    uint32_t queue_size,
    const SubscriberStatusCallback& connect_cb,
    const SubscriberStatusCallback& disconnect_cb = SubscriberStatusCallback(),
    const VoidConstPtr& tracked_object = VoidConstPtr(),
    bool latch = false)

(see http://docs.ros.org/melodic/api/roscp...). More specifically, I know that if I call NodeHandle::advertise(), the underlying (TCP) connection is not established immediately, i.e., upon returning from advertise(), subscribers may not yet be connected even if they are already "there" (= known to the master), since connecting clearly takes time. So, publishing messages right after advertise() can cause messages not to be received by all "relevant nodes" at that time. (IIRC, this issue has been discussed frequently here on answers.ros.org, and one nasty workaround is a delay after the advertise.) Assume I know about the existence of another node that will eventually subscribe to the advertised topic. (Please, I don't want to discuss here that this may not be the idea of a decoupled communication concept, i.e., what pub/sub was designed for.)

TL;DR: Is it safe to assume that once the above mentioned connect_cb is triggered, the underlying TCP connection _must_ have been established so that publishing messages using the returned ros::Publisher object will definitely deliver the message to the node denoted by ros::SingleSubscriberPublisher::getSubscriberName() (= argument of the callback)?

I have already grepped through the ROS source code but was not able to find the location where the callbacks are invoked.

Thank you very much!


2 Answers


answered 2019-10-23 06:56:45 -0500

CodeFinder


updated 2019-10-23 06:58:09 -0500

TL;DR: the connect_cb is called AFTER the underlying TCP connection was created. So once the callback is triggered, a publisher knows for sure that messages being published will be received by the node denoted by the callback-provided ros::SingleSubscriberPublisher object. :-)

I was following a similar path through the ROS code but was unsure whether the peerConnect() function really triggers the callback. After a second look, also thanks to your efforts, it seems that it "just" adds the callbacks to the callback queue so that some idle spinning thread will pick them up later. And it seems (according to your findings) that this is done after the connection is established. Indeed, the onConnectionHeaderReceived() callback is invoked upon tcprosAcceptConnection(), which looks like evidence for my assumption. Also, according to my further investigations, this cannot fail or cause the just-created TCP connection to fail anymore. (As a side note, the ros::SingleSubscriberPublisher object provided to the callback also supports this, since one can use it to send messages to the other node that just subscribed.)
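The "queued now, run later" behaviour described above can be illustrated with a toy model. This is only a sketch of the general pattern, not roscpp's actual implementation: DeferredCallbackQueue, enqueue() and spinOnce() are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>

// Toy model (NOT roscpp code): callbacks are stored when an event occurs
// and only executed later, when some thread drains the queue.
class DeferredCallbackQueue {
public:
  // Called at "event time", e.g. when a new subscriber link was added.
  void enqueue(std::function<void()> cb) {
    assert(cb && "callback must be callable");
    pending_.push(std::move(cb));
  }

  // Called by a spinning thread; runs the callbacks that were pending
  // when the call started and returns how many were run.
  std::size_t spinOnce() {
    std::size_t ran = 0;
    for (std::size_t n = pending_.size(); n > 0; --n) {
      std::function<void()> cb = std::move(pending_.front());
      pending_.pop();
      cb();
      ++ran;
    }
    return ran;
  }

private:
  std::queue<std::function<void()>> pending_;
};
```

If roscpp indeed behaves like this model, the practical consequence is that a connect callback never runs before the event that queued it, but it may run noticeably later, depending on when the node spins.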

I know that the connect_cb is invoked once for every new subscription. It's exactly what I need. :-)

Now, regarding your question about the underlying application architecture that requires this kind of callback: I am doing research on a local planning algorithm that entirely prevents collisions given some reasonable constraints. This requires that every robot knows about all others in the system. I am using a topic to allow robots to discover each other, but once discovered, I require rather tightly coupled communication between the robots in order to guarantee safety (collision-free motions). It is somewhat in contrast to all these DWA, VFH, etc. planners out there. ;-) And the decoupled nature of pub/sub made it difficult to send messages while ensuring that a designated receiver is actually receiving them. (Does this answer your question?)

Ah, and yes, sure: state-based conditions are always better (= more precise) than time-based ones. That's why I intend to use these callbacks. The problem with Publisher::getNumSubscribers() is that, e.g., some rostopic echo ... also counts as a subscriber and somewhat "distorts" the actual number of subscribers. :-)
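One way to avoid counting unrelated subscribers is to track peers by name from within the connect and disconnect callbacks. The following is a minimal sketch under that assumption; ExpectedSubscriberTracker and the node names used with it are hypothetical and not part of roscpp.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>

// Hypothetical helper (NOT part of roscpp): remembers which of a known set
// of peer nodes currently subscribe, ignoring every other subscriber.
class ExpectedSubscriberTracker {
public:
  explicit ExpectedSubscriberTracker(std::set<std::string> expected)
      : expected_(std::move(expected)) {
    assert(!expected_.empty() && "tracking zero peers is likely a bug");
  }

  // Intended to be called from the connect callback.
  void onConnect(const std::string& name) {
    if (expected_.count(name) > 0) connected_.insert(name);
  }

  // Intended to be called from the disconnect callback.
  void onDisconnect(const std::string& name) { connected_.erase(name); }

  bool isConnected(const std::string& name) const {
    return connected_.count(name) > 0;
  }

  // True once every expected peer has an active subscription.
  bool allConnected() const { return connected_.size() == expected_.size(); }

private:
  std::set<std::string> expected_;
  std::set<std::string> connected_;
};
```

In a roscpp node, the name passed to onConnect() and onDisconnect() would presumably come from ros::SingleSubscriberPublisher::getSubscriberName() inside the respective callback. If the callbacks and the reader of the tracker run on different threads, the sets would additionally need a mutex.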

Thanks again for your super fast and detailed reply!


Comments

Also according to my further investigations, this cannot fail or cause the just created TCP connection to fail anymore. (As a sidenote, the ros::SingleSubscriberPublisher object provided to the callback also underpins the fact since one can send messages to the other node that just subscribed.)

It can certainly fail. This is all based on TCP/IP, so if the remote side goes offline, becomes unresponsive or for some other reason does not keep up its end of the connection, all subsequent writes (and reads) will fail.

So, pedantically, I would say the answer to your question should be "no" (we cannot be certain there is a connection at all). But for practical purposes you can probably assume that there is one after this callback is invoked.

gvdhoorn (2019-10-23 07:04:41 -0500)

Your post is also not really an answer, btw. It is more a further clarification of your question combined with responses to my answer + comment.

gvdhoorn (2019-10-23 07:07:23 -0500)

Sure, connections can always fail. But this should cause the disconnect_cb to be called (which may be a topic of its own). Pedantically, no, okay. But since my disconnect_cb will catch the case when the connection fails, it shouldn't be an issue. Generally, though, this is a problem of all coupled systems (relying on others creates somewhat "dangerous" dependencies, but IMHO this is inevitable anyway).

I was asking whether the connect_cb is called after the connection was established. One phase of establishing a TCP connection is listening for it, and finally (if there is one) accepting it. So, tcprosAcceptConnection() is the correct location to conclude that this is true, which answers my question (disregarding the fact that the connection can fail at any time anyway). Accepting my answer was just to let others know that they will find this information in this thread (and that it's worth reading). If you prefer to ...

CodeFinder (2019-10-23 07:20:10 -0500)

answered 2019-10-21 09:54:44 -0500

gvdhoorn

updated 2019-10-21 10:51:46 -0500

I'm pretty sure those callbacks are in the end called in Publication::peerConnect(..) and Publication::peerDisconnect(..). Those are in turn called in Publication::addSubscriberLink(..) and Publication::removeSubscriberLink(..), respectively. The callbacks you pass to NodeHandle::advertise<>(..) are transferred to the actual Publisher object in Publisher::Publisher(..).

Publication::addSubscriberLink(..) gets called in TransportSubscriberLink::handleHeader(..), which is an event handler that gets invoked (via ConnectionManager::onConnectionHeaderReceived(..)) whenever a new connection is negotiated between a Subscriber and a Publisher (directly, so the master is already out of the picture here).

I've stopped there; you should be able to follow it from there yourself.

Assume I know about the existence of another node that will eventually subscribe to the advertised topic. [..] (Please, I don't want to discuss here that this may not be the idea of a decoupled communication concept, i.e., what pub/sub was designed for.)

You may not want to, but I'd still be interested to hear what application architecture necessitates using those callbacks. I've not seen them used very often. Nodes expecting other nodes to "be present" is certainly an anti-pattern (see “Runtime-configurable” Parameters in Autoware.ai on ROS Discourse for a good example of this in Autoware.ai, and see @Geoff's response there).

Also note (but you may already be aware) that the connect_cb will be called for every new Subscriber. Not just the first.

So, publishing messages right after advertise() can cause messages not to be received by all "relevant nodes" at that time. (IIRC, this issue has been discussed frequently here on answers.ros.org, and one nasty workaround is a delay after the advertise.)

Actually, using Publisher::getNumSubscribers() is best practice there, not a delay. State-based is always better than time-based.
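As a rough illustration of "state-based instead of time-based", the sketch below polls a condition until it holds or a timeout expires, rather than sleeping for a fixed duration. waitUntil() is a hypothetical helper written for this example, not a roscpp function, and the default poll interval is an arbitrary choice.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper (NOT part of roscpp): polls `ready` until it returns
// true or `timeout` has elapsed. Returns whether the condition was met.
bool waitUntil(const std::function<bool()>& ready,
               std::chrono::milliseconds timeout,
               std::chrono::milliseconds poll_interval =
                   std::chrono::milliseconds(10)) {
  assert(poll_interval.count() > 0);
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (!ready()) {
    if (std::chrono::steady_clock::now() >= deadline) return false;
    std::this_thread::sleep_for(poll_interval);
  }
  return true;
}
```

With a ros::Publisher named pub, the predicate could for instance be a lambda such as [&pub] { return pub.getNumSubscribers() >= 1; }. A real node would probably also want to stop waiting once ros::ok() returns false.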

Finally:

Is it safe to assume that once the above mentioned connect_cb is triggered, the underlying TCP connection _must_ have been established so that publishing messages using the returned ros::Publisher object will definitely deliver the message to the node denoted by ros::SingleSubscriberPublisher::getSubscriberName() (= argument of the callback)?

Seeing as the callbacks appear to be only called at the very end of TransportSubscriberLink::handleHeader(..), your assumption seems like one that could be true. The code suggests that after the connection is made and the connection header is exchanged, all registered connect_cb callbacks are called. I'm not sure whether this can fail sufficiently for the connection not to be considered "established" though.


Comments

PS: as I'm not the author of any of the code involved here, I cannot say with authority whether or not my analysis here is correct. But this should get you at least very close to figuring out things for yourself.

gvdhoorn (2019-10-21 10:22:10 -0500)

Thank you VERY MUCH for your efforts! Really appreciated! :-)

I will reply with an additional/separate answer since these comments are too restricted in terms of max. character count.

CodeFinder (2019-10-23 06:30:29 -0500)
