ROS_INFO "Deadlocks"
Hello, I am developing a multi-threaded roscpp application and I am getting an error that I have never seen before:
Two different threads get stuck on each other at ROS_INFO calls.
- One of them is waiting on a boost::unique_lock inside ros::console::print(ros::console::FilterBase*....
- The other one is not waiting on a lock. It is waiting in a plain write operation inside ros::console::print(ros::console::FilterBase*..... In fact, it is blocked by the operating system in the write function in syscall-template.S.
So technically it is not a "deadlock", but for some reason the operating system does not finish the write operation.
Has anyone had this problem before? Do you know how to solve it?
Could it be related to the buffer size? Could it be that I am not calling ros::spinOnce() frequently enough?
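To make the situation above concrete, here is a minimal sketch with no ROS dependency. It assumes a logger that serializes output with a single mutex and writes to a file descriptor that nobody is draining (a full POSIX pipe stands in for the console). `SerializedLogger`, `fill_pipe` and `second_thread_waits_while_write_is_blocked` are hypothetical names for illustration only, not rosconsole internals. The first thread blocks in `write()` while holding the lock, so any second thread that wants to log ends up waiting on the lock, which would look like the two stack traces described above.

```cpp
#include <atomic>
#include <cassert>
#include <cerrno>
#include <chrono>
#include <cstddef>
#include <mutex>
#include <thread>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical stand-in for a logger that guards its output with one mutex.
struct SerializedLogger {
    std::mutex mtx;
    int fd;

    explicit SerializedLogger(int out_fd) : fd(out_fd) { assert(out_fd >= 0); }

    void print(const char* msg, std::size_t len) {
        std::lock_guard<std::mutex> lock(mtx);  // held for the whole write
        std::size_t off = 0;
        while (off < len) {
            ssize_t n = write(fd, msg + off, len - off);  // may block in the kernel
            if (n > 0) off += static_cast<std::size_t>(n);
            else if (n < 0 && errno == EINTR) continue;
            else break;
        }
    }
};

// Fill the pipe so that the next blocking write has to wait for a reader.
void fill_pipe(int write_fd) {
    int flags = fcntl(write_fd, F_GETFL, 0);
    fcntl(write_fd, F_SETFL, flags | O_NONBLOCK);
    char chunk[4096] = {0};
    while (write(write_fd, chunk, sizeof(chunk)) > 0) {}
    fcntl(write_fd, F_SETFL, flags);  // restore blocking mode
}

bool second_thread_waits_while_write_is_blocked() {
    int fds[2];
    if (pipe(fds) != 0) return false;
    fill_pipe(fds[1]);

    SerializedLogger logger(fds[1]);
    std::atomic<bool> first_done(false);
    std::thread first([&logger, &first_done] {
        logger.print("x", 1);  // blocks in write() with the mutex held
        first_done = true;
    });

    // Crude wait so the first thread can take the lock and block.
    std::this_thread::sleep_for(std::chrono::milliseconds(200));

    // A second logging thread would now be stuck on the mutex.
    bool lock_was_held = !logger.mtx.try_lock();
    if (!lock_was_held) logger.mtx.unlock();
    bool still_blocked = !first_done;

    // Drain the pipe so the blocked write can complete.
    char sink[4096];
    ssize_t drained = read(fds[0], sink, sizeof(sink));
    (void)drained;
    first.join();

    close(fds[0]);
    close(fds[1]);
    return lock_was_held && still_blocked;
}
```

If the function returns true, the lock was held by a thread that was itself only waiting for the kernel, which is consistent with "not technically a deadlock": once something reads from the other end, both threads continue.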
Edit: The method I follow to say the app is "deadlocked" is the following:
First I check that the node does not behave externally as usual (via topics and services). It is frozen.
Then I attach the debugger to the running application and see that it is stopped in a system write operation (ROS_INFO).
Next, I check the state of the rest of the threads in the process. One of them is also in a ROS_INFO call, but waiting on the unique_lock.
I try to step over in both, but both freeze waiting for the OS to respond.
These are the threads I have:
- ROS_INFO call in the main thread (single-threaded spinner), in this case handling a service call guarded by a lock.
- ros::ROSOutAppender
- ros::InternalCallbackThreadFunc -> ros::callAvailable
- ros::XMLRPCManager::serverThreadFunc
- Second ROS_INFO thread. That thread runs in a loop and uses boost::thread. I guess this should be implemented using a timer, but it is a third-party embedded library (hector_mapping). It does not use spinOnce(), only r.sleep() and ros::ok() (ref )
- ros::TimerManager (the application also uses a timer)
- the tf thread
- the ros::PollManager thread
To me it looks like the second ROS_INFO thread should run on a timer. But it is still weird, because I think it has been working like this for a while... (though now I am not sure)
Asked by Pablo Iñigo Blasco on 2017-01-25 11:33:56 UTC
Answers
I'd suggest looking into why the write is not finishing. If that's actually blocking, whatever you do at the higher level is not going to get very far.
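One way to look into this, sketched under the assumption of a POSIX system: ask the kernel with `poll()` whether a descriptor can accept a write right now. `writable_now`, `fill_pipe` and `demo_full_pipe_is_not_writable` are hypothetical helper names; the pipe only stands in for whatever stdout is actually connected to (a terminal, a pipe to a launcher, a log collector). If the same check on the node's real output descriptor reports "not writable" for long stretches, the consumer on the other end is probably not reading.

```cpp
#include <cassert>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

// Returns true if a write to fd could make progress right now without blocking.
bool writable_now(int fd) {
    assert(fd >= 0);
    struct pollfd p;
    p.fd = fd;
    p.events = POLLOUT;
    p.revents = 0;
    int rc = poll(&p, 1, 0);  // timeout 0: just sample the current state
    return rc > 0 && (p.revents & POLLOUT) != 0;
}

// Fill a pipe's buffer using non-blocking writes, then restore the flags.
void fill_pipe(int write_fd) {
    int flags = fcntl(write_fd, F_GETFL, 0);
    fcntl(write_fd, F_SETFL, flags | O_NONBLOCK);
    char chunk[4096] = {0};
    while (write(write_fd, chunk, sizeof(chunk)) > 0) {}
    fcntl(write_fd, F_SETFL, flags);
}

// An empty pipe is writable; a full pipe with no reader activity is not.
bool demo_full_pipe_is_not_writable() {
    int fds[2];
    if (pipe(fds) != 0) return false;
    bool before = writable_now(fds[1]);
    fill_pipe(fds[1]);
    bool after = writable_now(fds[1]);
    close(fds[0]);
    close(fds[1]);
    return before && !after;
}
```

In a real node the interesting call would be something like `writable_now(STDOUT_FILENO)`, sampled from a thread that does not log.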
Are you sure it's blocking, and not just overloaded, so that every time you sample it's in the same write call but a different instance of it? It's quite easy for console output to become the limiting factor for how fast things turn over if you have a lot of output.
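If output volume turns out to be the problem, rate-limiting the messages is one option. rosconsole ships throttled macros such as ROS_INFO_THROTTLE for this; the sketch below only illustrates the idea in plain C++ and is not how rosconsole implements it. `LogThrottle` is a hypothetical helper, and it is not thread-safe as written.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: allows at most one message per period.
class LogThrottle {
public:
    typedef std::chrono::steady_clock::time_point TimePoint;

    explicit LogThrottle(std::chrono::milliseconds period)
        : period_(period), has_last_(false) {
        assert(period.count() > 0);
    }

    // Returns true if enough time has passed since the last accepted message.
    bool should_log(TimePoint now) {
        if (!has_last_ || now - last_ >= period_) {
            last_ = now;
            has_last_ = true;
            return true;
        }
        return false;
    }

private:
    std::chrono::milliseconds period_;
    TimePoint last_;
    bool has_last_;
};
```

A loop would call `should_log(std::chrono::steady_clock::now())` and only print when it returns true, which keeps a tight loop from flooding the console.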
If you can provide a small self contained example that would be very helpful in figuring out what's going on.
Asked by tfoote on 2017-01-25 21:19:02 UTC
Comments
Hello tfoote. Thanks for your response. I have replied in a new answer because my reply is quite long.
Asked by Pablo Iñigo Blasco on 2017-01-26 05:51:27 UTC
Comments
Please avoid posting updates as answers; this is not a forum. I've moved the contents of your post to the OP, but please keep it in mind for next time.
Asked by gvdhoorn on 2017-01-26 06:29:50 UTC
I agree. Thanks.
Asked by Pablo Iñigo Blasco on 2017-01-27 04:09:46 UTC