
How to avoid/manage process/thread starvation?

asked 2012-07-29 23:58:17 -0500

ipso

updated 2012-07-30 03:08:21 -0500

Short version: how to keep one (cpu bound) node from starving all other nodes without explicit sleep()?

I have a system in which one node (in particular) is a cpu hog and also publishes a lot of messages. This node is a source node, with the rest of the nodes waiting and only acting on messages they receive (from the cpu hog, or other nodes).

What I'd like to do is give this source node a lower priority, so that it only gets the cpu if the other nodes are idle (or at least, not very busy). The idea is that the source will still publish ('as fast as possible'), but without getting in the way of the processing nodes.

AFAIK, nice should do this on Linux, and it can be used as a launch-prefix to run a node at a specific nice level, but it "doesn't seem to work" in my case. I've verified the nice level with ps axl | grep -i nodename, and it is at 19 (with PRIO at 39), but the other nodes (at NI 0 and PRIO 20) are still starved of CPU and are losing many messages (note that processes with higher PRIO values are less likely to be given CPU time).
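For reference, starting a process at a fixed nice level (what nice -n 19 as a launch-prefix does for a node started from a nice-0 shell) and reading the value back (the NI column of ps axl) can be sketched with Python's os and subprocess modules. This is a minimal sketch assuming Linux and Python 3.3+; the function names are hypothetical, and the sleeping child is only a stand-in for a real node.

```python
import os
import subprocess
import sys


def launch_with_niceness(cmd, niceness=19):
    """Start `cmd` (an argv list) with the given absolute nice value.

    preexec_fn runs in the forked child just before exec, so only the child
    is reniced; the calling process keeps its own nice value. Without
    privileges, a process may raise its niceness but generally not lower it.
    """
    def lower_priority():
        # who=0 means "the calling process", i.e. the forked child here.
        os.setpriority(os.PRIO_PROCESS, 0, niceness)

    return subprocess.Popen(cmd, preexec_fn=lower_priority)


def niceness_of(pid):
    """Return the nice value of `pid` (the NI column of `ps axl`); 0 = self."""
    return os.getpriority(os.PRIO_PROCESS, pid)


if __name__ == "__main__":
    # Hypothetical stand-in for a node: a Python child that sleeps briefly.
    child = launch_with_niceness([sys.executable, "-c", "import time; time.sleep(0.2)"])
    try:
        print("child NI:", niceness_of(child.pid))
    finally:
        child.wait()
```

Since the child's NI already reads 19 in the ps output above, a check like this only confirms the launch-prefix is applied; it says nothing about how the scheduler then shares the CPU.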

I'm wondering if nice is the proper way to achieve the described behaviour, or should I use something else?
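One Linux alternative to nice, not mentioned elsewhere in this thread, is the SCHED_IDLE scheduling policy, which the sched(7) man page describes as running jobs at even lower priority than nice 19. Below is a minimal sketch, assuming Linux and Python 3.3+, of launching a process under it; from a shell, chrt --idle 0 <command> should do the same and could presumably be used as a launch-prefix in place of nice. SCHED_IDLE is still a weighted policy, not a hard "only when the CPU is idle" guarantee. The function name is hypothetical.

```python
import os
import subprocess
import sys


def launch_sched_idle(cmd):
    """Start `cmd` (an argv list) under the Linux SCHED_IDLE policy.

    Roughly what `chrt --idle 0 cmd` does from a shell. An unprivileged
    process is allowed to switch itself to SCHED_IDLE.
    """
    def make_idle():
        # pid 0 = the calling process (the forked child);
        # SCHED_IDLE requires a static priority of 0.
        os.sched_setscheduler(0, os.SCHED_IDLE, os.sched_param(0))

    return subprocess.Popen(cmd, preexec_fn=make_idle)


if __name__ == "__main__":
    # Hypothetical stand-in for the CPU-hungry source node.
    child = launch_sched_idle([sys.executable, "-c", "import time; time.sleep(0.2)"])
    try:
        print("SCHED_IDLE:", os.sched_getscheduler(child.pid) == os.SCHED_IDLE)
    finally:
        child.wait()
```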

PS: there is no sleep(..) anywhere in the source node; it's essentially an infinite 'busy' while loop. Sleeping 'solves' the described issue, but there is no explicit need for periodicity, and it also seems to incur overhead (i.e. fewer messages published than possible). For obvious reasons, a request-reply system (even a pub-sub acknowledgement scheme) is undesirable.



Does a usleep(1); maybe help?

dornhege (2012-07-30 02:01:40 -0500)

Well, 'yes': sleeping allows other processes/tasks to get some CPU time, but I'd like to avoid it, as it puts an (artificial) upper limit on the maximum number of messages published, even when no other nodes are using the CPU. Thanks for your comment though.

ipso (2012-07-30 02:17:45 -0500)

The usleep(1); hack shouldn't really reduce the number of messages much: it only gives up control to other processes for a very short time (nominally one microsecond). If that works, it might be worth measuring the message rate with no other processes running. Admittedly, I remember using this only for "nicing" within a single process.

dornhege (2012-07-30 03:08:50 -0500)
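Whether a one-microsecond sleep really costs much throughput is straightforward to measure. The sketch below (Python for brevity; iterations_in is a hypothetical helper, and the loop body is a placeholder for publishing one message) counts loop iterations in a fixed time window with and without time.sleep(1e-6), a rough equivalent of usleep(1). The requested interval is only a minimum: the actual pause includes a trip through the scheduler and the timer's granularity, so it can be much longer than 1 us. Python's own per-iteration overhead also means the absolute numbers won't match a C++ node; the ratio between the two runs is the interesting part.

```python
import time


def iterations_in(window_s, pause=None):
    """Count loop iterations completed within `window_s` seconds.

    Each iteration stands in for "build and publish one message"; `pause`
    (if given) is called after each one, mimicking a sleep or yield there.
    """
    count = 0
    deadline = time.perf_counter() + window_s
    while time.perf_counter() < deadline:
        count += 1  # placeholder for the real publish work
        if pause is not None:
            pause()
    return count


if __name__ == "__main__":
    busy = iterations_in(0.5)
    slept = iterations_in(0.5, lambda: time.sleep(1e-6))  # ~ usleep(1)
    print(f"pure busy loop: {busy} iterations, with 1 us sleep: {slept}")
```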

2 Answers


answered 2012-07-30 08:25:26 -0500

Lorenz

You can make your process explicitly relinquish the CPU by calling sched_yield(), declared in the header sched.h. It simply moves your process to the end of the scheduling queue for its priority.

I guess the reason why renicing the process doesn't have the expected effect is that the scheduler only runs once your thread's time slice is over: processes with a higher priority get the CPU only after your thread either makes a system call or the OS invokes the scheduler through a timer interrupt. That's really just a guess, though. Using sched_yield should work even if you don't change the priority.
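Applied to the source node, this suggestion amounts to calling sched_yield once per iteration of the publishing loop. A minimal sketch, assuming Linux and Python 3.3+, where os.sched_yield() wraps the same system call; run_source and the publish_one callback are hypothetical placeholders for the node's own loop and publish call. How much CPU the other nodes actually gain from each yield depends on the kernel scheduler.

```python
import os
import time


def run_source(publish_one, duration_s):
    """Busy source loop that offers the CPU to other runnable tasks after
    every message, via os.sched_yield() (the sched_yield(2) system call).

    If nothing else is waiting to run, the yield returns right away, so the
    loop is not throttled on an otherwise idle machine.
    Returns the number of messages "published".
    """
    sent = 0
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        publish_one()      # hypothetical: the node's real publish call
        sent += 1
        os.sched_yield()   # move to the back of the run queue
    return sent


if __name__ == "__main__":
    outbox = []
    n = run_source(lambda: outbox.append("msg"), 0.2)
    print(n, "messages published")
```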




Yes, I've tried that, but using boost::thread::yield() in the while loop. I guess I need to do some more testing. Yielding helps to a degree, but not as much as I'd like.

ipso (2012-07-30 09:15:29 -0500)

answered 2012-07-30 12:42:15 -0500

PerkinsJames

I am surprised that sched_yield doesn't work. Maybe you could restructure your code so that it is less threaded; then the code you don't want to run REALLY won't run until your processing sections are complete.



It does work, just not as much as I'd like (as far as I can tell now). And as these are separate nodes (so processes) and not threads, I can't really do anything about their threading. Also, I'd like this to work in the general case, with any CPU-bound node, not just my own code.

ipso (2012-07-30 20:48:05 -0500)



Asked: 2012-07-29 23:58:17 -0500

Seen: 2,463 times

Last updated: Jul 30 '12