How to avoid/manage process/thread starvation?
Short version: how to keep one (cpu bound) node from starving all other nodes without explicit sleep()?
I have a system in which one node (in particular) is a cpu hog and also publishes a lot of messages. This node is a source node, with the rest of the nodes waiting and only acting on messages they receive (from the cpu hog, or other nodes).
What I'd like to do is give this source node a lower priority, so that it only gets the cpu if the other nodes are idle (or at least, not very busy). The idea is that the source will still publish ('as fast as possible'), but without getting in the way of the processing nodes.
AFAIK, nice should do this on Linux, and can be used as a launch-prefix to run a node at a specific nice level, but it "doesn't seem to work" in my case. I've verified the nice level with ps axl | grep -i nodename, and it is at 19 (with PRIO at 39), but the other nodes (at NI 0 and PRIO 20) are still starved of cpu and are losing many messages (note that processes with higher PRIO values are less likely to be given cpu time).
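For reference, the same nice level can also be set from inside the node instead of through launch-prefix. This is only a minimal sketch, assuming a Linux system; set_and_read_nice is a hypothetical helper name and not part of ROS. As far as I know the nice value is effectively per-thread on Linux, so it would have to run before the node starts any other threads.

```cpp
#include <sys/resource.h>  // setpriority, getpriority, PRIO_PROCESS
#include <cerrno>
#include <cstdio>

// Hypothetical helper (not a ROS API): set the nice value of the caller
// and return the value the kernel reports afterwards.
// Returns -100 on failure, which lies outside the valid range -20..19.
int set_and_read_nice(int nice_value) {
    // who == 0 means "the caller". Raising the nice value (lowering the
    // priority) does not need special privileges.
    if (setpriority(PRIO_PROCESS, 0, nice_value) != 0) {
        std::perror("setpriority");
        return -100;
    }

    // getpriority can legitimately return -1, so errno has to be
    // cleared first and checked afterwards.
    errno = 0;
    const int current = getpriority(PRIO_PROCESS, 0);
    if (current == -1 && errno != 0) {
        std::perror("getpriority");
        return -100;
    }
    return current;
}
```

Calling set_and_read_nice(19) early in main() should leave the node at the same NI 19 that ps axl shows when nice is used as a launch-prefix, so on its own it would not be expected to behave any differently.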
I'm wondering if nice is the proper way to achieve the described behaviour, or should I use something else?
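One candidate for "something else" on Linux is the SCHED_IDLE scheduling policy, which is meant for very low priority background work and ranks below even nice 19. The sketch below is an assumption about what could be tried, not something verified against this system; switch_to_sched_idle is a hypothetical helper name. It does not guarantee that the other nodes are never delayed, and it will not help if the messages are being lost for a reason other than cpu time.

```cpp
#include <sched.h>  // sched_setscheduler, sched_getscheduler, SCHED_IDLE (Linux-specific)
#include <cstdio>

// Hypothetical helper (not a ROS API): move the calling thread to the
// SCHED_IDLE policy. Returns true on success, false otherwise.
bool switch_to_sched_idle() {
    sched_param param{};
    param.sched_priority = 0;  // must be 0 for non-realtime policies

    // pid == 0 means "the caller".
    if (sched_setscheduler(0, SCHED_IDLE, &param) != 0) {
        std::perror("sched_setscheduler");
        return false;
    }
    return true;
}
```

If this were tried, it would go at the start of the source node's main(), before the busy while loop, and the resulting policy could be checked with sched_getscheduler(0).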
PS: there is no sleep(..) anywhere in the source node, it's essentially an infinite 'busy' while loop. Sleeping 'solves' the described issue, but there is no explicit need for periodicity and it also seems to incur overhead (i.e. fewer messages published than possible). For obvious reasons a request-reply system (or even a pub-sub acknowledgement) is undesirable.
Would a usleep(1); maybe help?
Well 'yes': sleeping allows other processes/tasks to get some cpu time, but I'd like to avoid it, as it puts an (artificial) upper limit on the maximum nr of messages published, even if there are no other nodes using cpu. Thanks for your comment though.
The usleep(1); hack shouldn't really influence the number of messages much, as it is only one microsecond; or rather, it gives up control to other processes for a very short time. If that works, it might be worth a try with no other processes running. Although I do remember using this only for inner-process "nicing".
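Whether usleep(1); really costs only about a microsecond can be measured rather than guessed. The following is a rough sketch, assuming Linux and a C++11 compiler; both function names are hypothetical. It times a number of usleep(1) calls and, for comparison, the same number of sched_yield() calls, which give up the cpu without asking for a timed sleep. On a typical Linux setup the usleep(1) figure tends to come out well above 1 microsecond because of timer granularity, which would be the artificial upper limit on the publish rate mentioned above.

```cpp
#include <unistd.h>  // usleep
#include <sched.h>   // sched_yield
#include <chrono>

// Average wall-clock cost, in microseconds, of one usleep(1) call.
double average_usleep1_cost_us(int iterations) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int i = 0; i < iterations; ++i) {
        usleep(1);  // request a 1 microsecond sleep
    }
    const std::chrono::duration<double, std::micro> elapsed =
        clock::now() - start;
    return elapsed.count() / iterations;
}

// Average wall-clock cost, in microseconds, of one sched_yield() call.
double average_yield_cost_us(int iterations) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int i = 0; i < iterations; ++i) {
        sched_yield();  // let other runnable tasks go first, no timed sleep
    }
    const std::chrono::duration<double, std::micro> elapsed =
        clock::now() - start;
    return elapsed.count() / iterations;
}
```

Dividing the time of one publish iteration plus the measured usleep(1) cost into one second gives a rough estimate of the maximum publish rate with the sleep in place, which can then be compared against the rate without it on an otherwise idle machine.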