ROS2 nodes on Windows always hang

asked 2022-06-07 05:28:24 -0500

alberto

updated 2022-06-23 08:54:01 -0500

Hi all,

These days I'm having a very strange problem with my application. Until last week everything worked, but this Monday I noticed that I can't run more than 2 or 3 nodes (or ROS CLI tools) simultaneously. For example: I run 2 nodes, then I run ros2 node list, and the command prompt gives no output and hangs. Moreover, I can't stop it with Ctrl+C; I have to end the task from the Task Manager. The same goes for ros2 topic list and ros2 topic echo.

Another example: I run the first node, then ros2 topic echo .. and I can see the data; then I run the second node and it hangs.

To add some information: the 2 main nodes work properly together, but to develop my application I need to add another node, which in this state will never run.

Have you ever faced a similar error? Could it be something about my code?

EDIT: more info

Right now I still have the same problem, but this time I may have found the cause. I'm writing a node that receives images from Coppelia and sends them over a topic. Yesterday and this morning I was trying to fix it because I wasn't able to see the image, but after some runs the nodes stopped working. I still have the same problem as before (the third node hangs, even with the tutorial code).

For the image I set some buffers:

simxUChar *eyeBuffers[2];
dim = resolution[0] * resolution[1] * sizeof(simxUChar);
eyeBuffers[0] = new simxUChar[dim];
eyeBuffers[1] = new simxUChar[dim];

and then fill the sensor_msgs/msg/Image message:

int resolution[2] = { 320, 240 };
for (int i = 0; i < 2; i++) {
    simxGetVisionSensorImage(clientID, eyes[i], resolution, &eyeBuffers[i], 0, simx_opmode_buffer);
    arr[i] = cv::Mat(resolution[1], resolution[0], CV_8UC3, eyeBuffers[i]);
    flip(arr[i], arr_flip[i], 1);
}
.. and then cv_bridge to fill the msg.

I don't have any build errors and I can correctly send and receive the images.

After a few days, I've done some more tests and now I can run only 1 node; all the others hang. I thought it was related to dynamic memory allocation, but I do delete[] the buffers.

What I can't understand is why this makes ROS2 malfunction. I have the impression that something inside ROS gets saturated, because the problem only appeared after a few runs (around 20); before that everything was fine and I could run as many nodes as I wanted.


Comments


Something you could check to see whether it's about your code: try to run 5-6 default nodes, i.e. several instances of the tutorial code at the same time. If that causes the same problem, it's not your code.

Joe28965 (2022-06-07 06:18:17 -0500)

Thank you @Joe28965, I've tried running the simple talker and listener from the tutorial. Sadly, I can reproduce exactly the same errors :(

I have also reinstalled ROS2 and created a new ws (still keeping the old one), sourcing the new setup.bat and building, but I still face the bug with the tutorial code.

alberto (2022-06-07 07:03:00 -0500)

Honestly, I do not know what it might be. I have only ever used Ubuntu to run ROS.

Joe28965 (2022-06-07 07:48:16 -0500)

Could you please stop bumping your message with trivial edits?

gvdhoorn (2022-06-08 08:37:44 -0500)

Yes, sorry, I wanted to find a more suitable title :P. Anyway, using WinDbg to compare the working and non-working nodes, I found some differences. In particular, the node that hangs doesn't load these .dll files:

ModLoad: .. C:\dev_new\ros2-windows\bin\rmw_dds_common__rosidl_typesupport_fastrtps_cpp.dll
ModLoad: .. C:\dev_new\ros2-windows\bin\rcl_interfaces__rosidl_typesupport_fastrtps_c.dll
ModLoad: .. C:\dev_new\ros2-windows\bin\builtin_interfaces__rosidl_typesupport_fastrtps_c.dll
ModLoad: .. C:\dev_new\ros2-windows\bin\rcl_interfaces__rosidl_typesupport_fastrtps_cpp.dll
ModLoad: .. C:\dev_new\ros2-windows\bin\builtin_interfaces__rosidl_typesupport_fastrtps_cpp.dll
ModLoad: .. C:\dev_new\ros2-windows\bin\std_msgs__rosidl_typesupport_fastrtps_cpp.dll

But obviously the cmd doesn't return any errors. Can this cause my bug?

alberto (2022-06-08 09:26:21 -0500)

No idea. We also don't yet know whether this is "a bug".

You mentioned you've "reinstalled ROS2": how exactly?

And did you also clean out your workspace and rebuild it completely after reinstalling ROS?

gvdhoorn (2022-06-09 01:55:15 -0500)

I wanted to keep the old underlay and overlay, so to reinstall ROS2 I did this: I went here, downloaded ros2-foxy-20220208-windows-release-amd64.zip again and extracted it into a new folder (underlay). Then I created a new dev_ws (overlay), put cpp_pubsub in src and built it. This didn't change my problem, so I tried re-downloading the other components (without removing the old ones): Python, VC++, OpenSSL, and catkin_pkg, cryptography etc., using the commands suggested in the link (I didn't re-download choco). Obviously they were already installed, so I don't think I changed anything; in fact the cmd always said something like "Already up to date".

alberto (2022-06-09 03:35:24 -0500)

Adding some info. I ran ros2 run cpp_pubsub listener --ros-args --log-level debug. On a non-working node I only get these 3 lines:

[DEBUG] [1654776889.921857300] [rclcpp]: signal handler installed
[DEBUG] [1654776889.922523400] [rcl]: Initializing node 'minimal_subscriber' in namespace ''
[DEBUG] [1654776889.922739800] [rcl]: Using domain ID of '0'

On a working node, the next lines should be:

[DEBUG] [1654776963.071939700] [rcl]: Initializing publisher for topic name '/rosout'
[DEBUG] [1654776963.072044000] [rcl]: Expanded topic name '/rosout'
[DEBUG] [1654776963.074039400] [rcl]: Publisher initialized
[DEBUG] [1654776963.074124300] [rcl]: Node initialized

etc.

alberto (2022-06-09 07:17:24 -0500)

2 Answers


answered 2022-06-13 03:59:55 -0500

alberto

updated 2022-06-23 08:53:06 -0500

LAST EDIT: solution

I've found that changing the RMW implementation solves the problem. I was using the default one (rmw_fastrtps_cpp) and switched to rmw_cyclonedds_cpp. Now, without reinstalling anything, ROS seems to work normally.
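For readers who want to try the same switch: ROS 2 picks the RMW implementation from the RMW_IMPLEMENTATION environment variable, so in a Windows command prompt this is typically set RMW_IMPLEMENTATION=rmw_cyclonedds_cpp before ros2 run (assuming the Cyclone DDS RMW package is available in the installation). The C++ sketch below shows the same idea from inside a launcher process. The helper names set_env and selected_rmw are hypothetical and not part of ROS 2, and the fallback value simply mirrors the default mentioned above.

```cpp
#include <cstdlib>
#include <string>

// Set an environment variable for the current process; child processes
// started afterwards inherit it. (Hypothetical helper, not a ROS 2 API.)
bool set_env(const std::string& name, const std::string& value) {
#ifdef _WIN32
    return _putenv_s(name.c_str(), value.c_str()) == 0;
#else
    return setenv(name.c_str(), value.c_str(), 1) == 0;
#endif
}

// Report which RMW implementation the environment currently selects.
// Treating "unset or empty" as the fallback is a choice made in this sketch.
std::string selected_rmw(const std::string& fallback = "rmw_fastrtps_cpp") {
    const char* value = std::getenv("RMW_IMPLEMENTATION");
    return (value != nullptr && *value != '\0') ? std::string(value) : fallback;
}
```

A launcher could call set_env("RMW_IMPLEMENTATION", "rmw_cyclonedds_cpp") and then start the node, for example with std::system, so that the node inherits the setting. All nodes that need to talk to each other should generally use the same setting.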

Temporary solution:

I decided to reset my PC, and after reinstalling everything the problem seems to be gone. I guess something gets corrupted between fastrtps and Windows. However, running the same code with rmw_fastrtps_cpp will lead to the same situation as before.


Comments


My comment was hidden under "see more comments", but I am also experiencing this on Windows. I'll add that while the issue was occurring on Windows, I switched over to a separate machine running Ubuntu and was able to run ros2 topic list, subscribe, etc. for nodes running on that Windows machine. So the issue occurred only on the "host" Windows machine; external connections were fine.

I don't know where to begin to diagnose it.

genevanmeter (2022-06-18 09:16:46 -0500)

Hi, may I ask what kind of operations your code is doing? Are you using dynamic allocation like me? Are you sending sensor_msgs.Image[n]? From what I've experienced, it seems that some code (probably badly written by me) causes the problem and then affects the whole ROS system. In fact, on a different PC or after a fresh Windows installation (note that a fresh ROS2 installation alone doesn't resolve the problem), I have no problems running even 8 nodes of the cpp_pubsub tutorial simultaneously. However, I managed to corrupt the fresh Windows installation again with the code above. My guess is that ROS2 on Windows lacks some safeguards for this type of situation.

alberto (2022-06-18 10:01:14 -0500)

New to this, but as I understand it, yes. I am creating and destroying topics at runtime via service calls. The service call takes an int, which is appended to the topic name. The msg format also contains a variable-length array, so the total size of the msg changes within any given topic.

genevanmeter (2022-06-19 13:07:24 -0500)

answered 2022-07-15 00:14:11 -0500

When using rmw_fastrtps_cpp, check if deleting the files under C:\ProgramData\eprosima\fastrtps_interprocess solves this.
