No tf received inside Docker
Hello,
I am trying to run Apollo 3.0 with the CARLA simulator via a ROS bridge.
Currently I am stuck on the following issue:
To run the perception module, Apollo needs to receive certain transforms (tf). These transforms are published by static transform publishers. However, the Apollo perception module throws the following error:
E0908 23:30:11.273775 6725 transform_input.cc:44] Cannot transform frame: novatel to frame velodyne64 , err: . canTransform returned after 10 timeout was 10.. Frames: Frame velodyne16 exists with parent novatel.
Frame radar exists with parent short_camera.
Frame short_camera exists with parent velodyne64.
Frame radar_front exists with parent velodyne64.
Frame velodyne64 exists with parent novatel.
Frame long_camera exists with parent short_camera.
Frame localization exists with parent world.
I have had a closer look into this issue. While the static transform publisher nodes and the tf topics are visible inside the Docker environment (with rosnode info ... and rostopic info ...), no tf data is received. I tried running tf view_frames, tf_echo and tf_monitor both inside and outside the Docker container. Outside the container all tf data was received correctly, while inside the container no tf data was received.
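To make the inside/outside comparison repeatable, each tool can be wrapped in a time limit so that the only question is whether it printed anything. This is a minimal sketch, assuming a shell with GNU coreutils timeout; the helper name receives_data is hypothetical, and the usage lines assume a sourced ROS 1 environment (the static publishers may use /tf or /tf_static depending on which package provides them).

```shell
#!/usr/bin/env bash
# Sketch: report whether a command prints at least one line within a time limit.
# receives_data is a hypothetical helper name, not part of ROS.

# receives_data SECONDS CMD [ARGS...]
# Succeeds if CMD writes at least one non-empty line to stdout before SECONDS elapse.
receives_data() {
  local secs=$1
  shift
  # timeout kills CMD after $secs; grep -q . succeeds on the first non-empty line.
  timeout "$secs" "$@" | head -n 1 | grep -q .
}

# Example usage: run once inside and once outside the container.
#   receives_data 5 rostopic echo -n 1 /tf        && echo "/tf: data"        || echo "/tf: nothing"
#   receives_data 5 rostopic echo -n 1 /tf_static && echo "/tf_static: data" || echo "/tf_static: nothing"
```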
I am starting the docker environment like so:
${DOCKER_CMD} run -it \
-d \
--privileged \
--name apollo_dev \
${MAP_VOLUME_CONF} \
--volumes-from ${LOCALIZATION_VOLUME} \
--volumes-from ${YOLO3D_VOLUME} \
-e ROS_MASTER_URI=http://172.17.0.1:11311 \
-e ROS_IP=172.17.0.1 \
-e DISPLAY=$display \
-e DOCKER_USER=$USER \
-e USER=$USER \
-e DOCKER_USER_ID=$USER_ID \
-e DOCKER_GRP="$GRP" \
-e DOCKER_GRP_ID=$GRP_ID \
-e DOCKER_IMG=$IMG \
${EXTRA_VOLUMES} \
$(local_volumes) \
--net host \
-w /apollo \
--add-host in_dev_docker:127.0.0.1 \
--add-host ${LOCAL_HOST}:127.0.0.1 \
--hostname in_dev_docker \
--shm-size 2G \
--pid=host \
-v /dev/null:/dev/raw1394 \
$IMG \
/bin/bash
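Since this command passes ROS_MASTER_URI=http://172.17.0.1:11311 into the container, one basic sanity check is whether that master port accepts connections from inside the container at all. A minimal sketch, assuming bash is installed (for its /dev/tcp redirection) and GNU timeout is available; master_host_port and master_reachable are hypothetical helper names, and the URI is assumed to include an explicit port.

```shell
#!/usr/bin/env bash
# Sketch: check from inside the container that the ROS master port is reachable.
# master_host_port and master_reachable are hypothetical helpers.

# master_host_port URI -> prints "HOST PORT" for a URI such as http://172.17.0.1:11311
# (assumes the URI contains an explicit ":PORT").
master_host_port() {
  rest=${1#*://}        # strip the scheme ("http://")
  rest=${rest%%/*}      # drop any trailing path
  printf '%s %s\n' "${rest%:*}" "${rest##*:}"
}

# master_reachable URI -> succeeds if a TCP connection to HOST:PORT opens within 2 s.
master_reachable() {
  set -- $(master_host_port "$1")   # now $1=HOST, $2=PORT
  timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Example (inside the container):
#   master_reachable "$ROS_MASTER_URI" && echo "master reachable" || echo "master NOT reachable"
```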
At this point I really have no idea why I can't receive any tf data inside the Docker container while still being able to see the publishing nodes and the topics. Does anybody have an idea what the source of this issue is?
If you need any additional information, just let me know.
Thanks in advance!
Asked by udeto on 2019-09-08 16:57:58 UTC
Comments
Can you ping the host from inside the Docker container by name? If not, that could be the problem (i.e., nodes running on the host report a hostname that nodes in the container cannot resolve, so no traffic flows).
Asked by gvdhoorn on 2019-09-09 02:09:58 UTC
What exactly do you mean by "ping the host by name"? Do you mean the roscore?
Asked by udeto on 2019-09-09 05:09:10 UTC
"The host" == the machine running docker.
You set ROS_MASTER_URI=http://172.17.0.1:11311, which will work for the nodes inside the container, as the master will probably bind on all IPs, or Docker routing will take care of reaching the host's IP from within the container. But nodes running outside the container will receive connection requests from nodes inside the container, and they may be returning a hostname that nodes inside the container cannot resolve.
By trying to ping the "host machine" (i.e., the one running docker), you could get an indication of whether DNS is working (sufficiently) for nodes inside your container to resolve the hostname that nodes outside the container may be returning. ROS nodes connect directly to each other, not through the master, so nodes must either be able to resolve each other's hostnames or use only IP addresses.
You're using --net host, so it may be that this doesn't matter, but I'd check it anyway.
Asked by gvdhoorn on 2019-09-09 05:13:53 UTC
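The ping test above is really a check of whether one side can resolve the name the other side advertises. In ROS 1 a node advertises ROS_HOSTNAME if it is set, otherwise ROS_IP, otherwise the machine's hostname. A minimal sketch of both checks, assuming a glibc-based system where getent is available; resolves and advertised_name are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: check name resolution between the host and the container.
# resolves and advertised_name are hypothetical helpers; getent comes with glibc.

# resolves NAME -> succeeds if NAME resolves via the system resolver (/etc/hosts, DNS, ...).
resolves() {
  getent hosts "$1" > /dev/null
}

# advertised_name -> the address a ROS 1 node started from this shell would advertise:
# ROS_HOSTNAME if set, else ROS_IP, else the machine's hostname.
advertised_name() {
  echo "${ROS_HOSTNAME:-${ROS_IP:-$(uname -n)}}"
}

# Example: on each side, check that the other side's advertised name resolves.
#   echo "this shell advertises: $(advertised_name)"
#   resolves "<name advertised by the other side>" && echo ok || echo "cannot resolve"
```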
OK, I understand, thank you! I checked, and I am able to ping the machine running the Docker environment by its hostname from inside the container. So I figure the nodes should be able to resolve each other's hostnames, right?
Asked by udeto on 2019-09-09 05:31:26 UTC
There is a good chance it should work, yes, but I've seen stranger things.
You could test whether setting ROS_IP outside your Docker container makes any difference. Set it to the IP of the PC running docker.
Docker containers can essentially be considered "other hosts" when it comes to networking, so all the problems with DNS, routes, discoverability and communication that come up with multiple hosts can also affect Docker containers. Running --privileged and with --net=host makes things somewhat easier, but there's still enough that may not work.
Asked by gvdhoorn on 2019-09-09 05:34:48 UTC
I tried setting ROS_IP to the IP address I got when I ran hostname -I outside of the Docker container. But then I wasn't able to contact any nodes inside the container (i.e., they were not visible when running rqt_graph outside the container). Therefore I changed ROS_MASTER_URI as well and tried to run a roscore outside of the container, but that didn't work either, as no ROS-related commands inside the container produced any output. So I have now changed both back to the IP address of the Docker environment.
I additionally checked whether I am able to ping the Docker environment from outside using its hostname. It turns out that while I am able to ping the host running Docker from inside the container, I am not able to ping the Docker environment from outside. Is that how it should be, or might that be the source of the issue?
Asked by udeto on 2019-09-09 11:07:48 UTC
Just to make sure: you're not setting the -e ROS_IP of your container to the IP of the host, are you? That would not be what I meant. If you did, then set ROS_IP in the environment of the host (so not the container) to the IP of the host, and leave the ROS_IP of the Docker container as you already show it.
Asked by gvdhoorn on 2019-09-09 11:42:30 UTC
Well, yes I did do that :D
Now I have tried setting ROS_IP in the host environment to the host IP address with the command:
Is that what you meant?
Asked by udeto on 2019-09-09 12:19:37 UTC
If 192.168.178.27 is the IP of the machine running docker, then yes, that is what I meant.
Be sure to set it in all shells that you start your container from (or add it to your .bashrc) and in all shells that start ROS nodes (when not starting them from Docker containers).
But again: this is only a test. The actual cause could be something completely different.
Asked by gvdhoorn on 2019-09-09 12:25:16 UTC
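One way to follow the advice to set ROS_IP in every host shell is to append the export to ~/.bashrc once. A minimal sketch, intended for the host only (not inside the container); persist_ros_ip is a hypothetical helper, and 192.168.178.27 is the host address quoted above.

```shell
#!/usr/bin/env bash
# Sketch: persist ROS_IP for every new shell on the HOST (not inside the container).
# persist_ros_ip is a hypothetical helper name.

# persist_ros_ip IP [RCFILE] -> appends "export ROS_IP=IP" to RCFILE (default ~/.bashrc)
# unless an identical line is already there, so running it twice is harmless.
persist_ros_ip() {
  local ip=$1 rc=${2:-$HOME/.bashrc}
  local line="export ROS_IP=$ip"
  touch "$rc"
  grep -qxF "$line" "$rc" || printf '%s\n' "$line" >> "$rc"
}

# Example (on the host):
#   persist_ros_ip 192.168.178.27
#   . ~/.bashrc && echo "$ROS_IP"
```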
I tested setting ROS_IP to the host IP, but that didn't change anything.
Asked by udeto on 2019-09-10 16:43:00 UTC
Next I tried to publish the tf data with a static transform publisher node via a launch file inside the Docker container. For example:
After launching the node I can see the tf by running rosrun tf view_frames; however, when I run rosrun tf tf_echo /novatel /velodyne64 inside the container, nothing happens. There is no error of any kind and no data. If I run tf_echo outside the container, I receive the published tf data.
So to me that means that even if I publish the tf inside the container, I am not able to read the published data there, even though I am able to see the tf connection. I really do not understand what's going on here. I thought I was dealing with a communication issue between the container and the host machine, but if that were the case, everything should work when I publish and subscribe inside the container, especially as the roscore is running there as well.
Asked by udeto on 2019-09-10 16:43:54 UTC
Let's take a step back: can you subscribe to any topics in a Docker container and receive messages (published on the outside)?
Have you tried running rostopic pub outside the container and rostopic echo inside one, and then checking whether messages are received? And the other way around?
It's best to approach this sort of thing step by step instead of 'randomly' focusing on one specific aspect or node (in this case: treating TF as if it's special).
Asked by gvdhoorn on 2019-09-11 02:20:40 UTC
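The step-by-step test suggested here can be scripted with the standard rostopic tool. A minimal sketch, assuming a sourced ROS 1 environment on both sides and GNU timeout; the topic /docker_net_test and the function names publish_test and listen_test are hypothetical. Run the publisher in one terminal and the listener in another, then swap which side runs inside the container.

```shell
#!/usr/bin/env bash
# Sketch: basic publish/subscribe test across the container boundary.
# /docker_net_test, publish_test and listen_test are hypothetical names.

TEST_TOPIC=/docker_net_test

# Publish a std_msgs/String once per second until interrupted (Ctrl+C).
publish_test() {
  rostopic pub -r 1 "$TEST_TOPIC" std_msgs/String "data: 'hello from $(uname -n)'"
}

# Wait up to 10 s for a single message and report the outcome.
listen_test() {
  if timeout 10 rostopic echo -n 1 "$TEST_TOPIC"; then
    echo "received a message on $TEST_TOPIC"
  else
    echo "no message on $TEST_TOPIC within 10 s" >&2
    return 1
  fi
}

# Typical sequence:
#   host:      publish_test    container: listen_test
#   container: publish_test    host:      listen_test
```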
So you set -e ROS_IP=172.17.0.1, but have you made sure your container is given this IP address? AFAIK .1 is given to the host, and that IP is used to reach the host from within a container.
Containers get other addresses from the same range, but in any case they are handed out by DHCP (sort of), so if the IP changes, you need to update -e ROS_IP=172.17.0.1 as well.
I had sort-of assumed that you had this covered, but perhaps this is what is going wrong.
Asked by gvdhoorn on 2019-09-11 02:23:21 UTC
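To test this last point, compare the ROS_IP passed with -e against the addresses the container actually has. With --net host the container shares the host's network stack, so hostname -I inside it lists the host's addresses (including the docker0 bridge address, typically 172.17.0.1, when that bridge exists). A minimal sketch; ip_in_list is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: is the ROS_IP given to this shell one of this machine's own addresses?
# ip_in_list is a hypothetical helper name.

# ip_in_list IP "ADDR1 ADDR2 ..." -> succeeds if IP equals one of the listed addresses.
ip_in_list() {
  local ip=$1 addr
  for addr in $2; do          # unquoted on purpose: split the list on whitespace
    [ "$addr" = "$ip" ] && return 0
  done
  return 1
}

# Example (inside the container):
#   if ip_in_list "$ROS_IP" "$(hostname -I)"; then
#     echo "ROS_IP=$ROS_IP is a local address"
#   else
#     echo "ROS_IP=$ROS_IP is not assigned here; other nodes would connect to the wrong address" >&2
#   fi
```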