Efficiency issues when separating processing pipeline into nodes?

asked 2016-04-12 13:03:37 -0500

Alec

I am porting an existing codebase to ROS and have run into an efficiency issue that seems endemic to ROS's architecture. I am looking for the right way to address it.

My robot is driven by computer vision. When an image is captured, a series of processing steps is run in sequence, each building off of the data computed previously (e.g., "detect features", "compute location", "determine motion"), ending with motors being actuated. It is essential to minimize the delay between an image being captured and the robot driving its motors in reaction to that image.

In the original codebase, processing occurred in a single thread running a loop: it grabbed the latest available image, then stepped through the processing chain. If it couldn't keep up with the camera's full framerate, frames would be skipped, but processing would always start with the latest available image.
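To make the "always start from the latest image" behaviour concrete, here is a minimal sketch in plain Python (chosen for brevity; it is not the original code and has no ROS dependency). `LatestSlot`, `run_chain`, and the two placeholder steps are hypothetical names used only for illustration.

```python
import threading


class LatestSlot:
    """Holds only the newest item; an unread older item is overwritten."""

    def __init__(self):
        self._lock = threading.Lock()
        self._item = None
        self._fresh = False
        self.dropped = 0  # items overwritten before anyone read them

    def put(self, item):
        with self._lock:
            if self._fresh:
                self.dropped += 1
            self._item = item
            self._fresh = True

    def take(self):
        """Return the newest unread item, or None if nothing new arrived."""
        with self._lock:
            if not self._fresh:
                return None
            self._fresh = False
            return self._item


def run_chain(frame, steps):
    """Run the steps in order, each one consuming the previous result."""
    result = frame
    for step in steps:
        result = step(result)
    return result


slot = LatestSlot()
for frame_id in range(5):  # the camera delivers frames 0..4 while the loop is busy
    slot.put(frame_id)

steps = [lambda x: x + 100, lambda x: x * 2]  # placeholder processing steps
latest = slot.take()              # the loop picks up frame 4, not frame 0
output = run_chain(latest, steps)
```

Frames 0 to 3 are counted in `slot.dropped` and never processed, which is the intended trade-off: throughput is sacrificed so that latency stays bounded by one pass through the chain.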

In ROS, guided by the documentation, I have split every image processing step into a separate nodelet. Consider a case with two processing phases, Phase 1 and Phase 2, each in its own nodelet. Now imagine that the camera emits a frame every 10ms, Phase 1 takes 20ms, and Phase 2 takes 40ms.

The issues I see:

  • Phase 1 emits a result every 20ms but Phase 2 can accept one only every 40ms, so half of all the data Phase 1 produces will never get consumed. This wastes a lot of CPU and slows everything down.
  • If each nodelet does its processing in callbacks (even using multiple callback-processing threads), the buffers of images and of Phase 1 results will fill to capacity. Given the way buffers are implemented, each frame that is processed will then be a stale, old frame from the back of the buffer rather than the latest available one.
  • If each nodelet instead has its own processing thread, every callback needs extra code to stay thread-safe, and there is some thread-switching overhead (probably minimal). All of that seems unnecessary for such a straightforward pipeline, and it doesn't address the first issue either.
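The first two issues can be illustrated with a small simulation. This is an idealized single-stage model in plain Python, not a model of ROS's actual transport: it assumes a FIFO input queue that discards its oldest entry when full, a producer emitting every 10ms, and a consumer that needs 20ms per item. The function name `simulate` and its parameters are hypothetical.

```python
from collections import deque


def simulate(queue_size, period_ms=10, work_ms=20, total_ms=1000):
    """Return the age in ms of each frame at the moment its processing starts."""
    queue = deque(maxlen=queue_size)  # a full deque discards its oldest entry
    ages = []
    busy_until = 0
    for now in range(0, total_ms, period_ms):
        queue.append(now)  # each frame is stamped with its capture time
        if now >= busy_until:
            stamp = queue.popleft()  # FIFO: the oldest buffered frame goes first
            ages.append(now - stamp)
            busy_until = now + work_ms
    return ages


fifo_ages = simulate(queue_size=5)    # buffered FIFO input
latest_ages = simulate(queue_size=1)  # only the newest frame is kept

processed_fraction = len(fifo_ages) / (1000 // 10)  # frames processed / frames produced
worst_fifo_age = max(fifo_ages)
worst_latest_age = max(latest_ages)
```

In this model only half of the frames are ever processed regardless of queue size, and once the 5-entry queue has filled, every processed frame is 40ms old when work on it begins, i.e. (queue size - 1) × the frame period. Keeping only the newest frame removes that queueing delay in the model; whether a real subscriber queue behaves this way is a separate question.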

In reality there are actually more phases, multiplying the complexity.

It seems to me that this processing pipeline really wants to be a single nodelet with a single thread. But this seems to go against ROS's design philosophy. And if I implement it that way, I lose the wonderful ROSsian ability to run and debug each node(let) independently, play back intermediate data streams from bag files, etc.

So my current plan is to implement everything as separate nodelets for development, but deploy using a combo nodelet that ties the whole processing chain together, reusing inner implementation classes from the separated nodelets.
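Roughly, the structure I have in mind looks like the sketch below. It is plain Python for brevity (the real nodelets would be C++), and every class name here (`FeatureDetector`, `MotionEstimator`, `StageNode`, `ComboNode`) is hypothetical. The point is only that the processing logic lives in middleware-free classes, and two thin wrappers decide how those classes are wired together.

```python
class FeatureDetector:
    """Hypothetical phase 1: pure logic, no middleware dependency."""

    def process(self, image):
        return [px for px in image if px > 128]


class MotionEstimator:
    """Hypothetical phase 2: consumes the output of phase 1."""

    def process(self, features):
        return sum(features) / len(features) if features else 0.0


class StageNode:
    """Development wrapper: one phase per node, result handed to `publish`."""

    def __init__(self, impl, publish):
        self._impl = impl
        self._publish = publish

    def on_message(self, msg):
        self._publish(self._impl.process(msg))


class ComboNode:
    """Deployment wrapper: all phases run back to back in one callback."""

    def __init__(self, impls, publish):
        self._impls = list(impls)
        self._publish = publish

    def on_message(self, msg):
        result = msg
        for impl in self._impls:
            result = impl.process(result)
        self._publish(result)


image = [10, 200, 130, 90]  # stand-in for an image message

# Development wiring: two nodes connected through a publish callback.
separate_out = []
estimator_node = StageNode(MotionEstimator(), separate_out.append)
detector_node = StageNode(FeatureDetector(), estimator_node.on_message)
detector_node.on_message(image)

# Deployment wiring: the same implementation classes inside one node.
combo_out = []
combo_node = ComboNode([FeatureDetector(), MotionEstimator()], combo_out.append)
combo_node.on_message(image)
```

Both wirings produce the same result for the same input, so intermediate streams can still be inspected in the separated form while the combined form avoids buffering between phases.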

Does this sound reasonable? It seems like a common issue; are there other ways people have tried? Thanks in advance!

Comments

Some quick comments: 1) would using a buffer of length 1 avoid the stale img problem? 2) make sure you're publishing (and subscribing) ..ConstPtr msgs, otherwise you'll not get zero-copy behaviour. 3) locking + nodelets is not a good idea. It's a msg-passing system, even if it's single-process.

gvdhoorn ( 2016-04-13 02:27:40 -0500 )

Sorry, I just noticed this comment. (1) Apparently not, see https://github.com/ros/ros_comm/issue... (2) Yes, I am. (3) I'm not sure what locking you're referring to. But even if (1) were possible, it still wouldn't solve the first bullet point above.

Alec ( 2016-04-19 15:24:33 -0500 )