Ask Your Question
0

Pre-allocation of messages for nodelets

asked 2020-05-25 12:07:37 -0500

I'm working on an application where I need to remove all memory allocations from the critical path that affects the response time of my system. I'm using nodelets with zero-copy message passing. I noticed that there is the concept of borrowing messages from the middlewhare in ROS2. I assume this means you can have the system pre-allocate a buffer on messages on startup that can be used without allocations.

I'm looking for something similar in ROS1. Has anyone implemented this already? I'd like to avoid re-inventing the wheel.

edit retag flag offensive close merge delete

Comments

Conceptual question/observation: nodelets only support zero-copy message passing if you publish const pointers (we call them messages but it's really pointers). This means all subscribers share the same message. When would your publisher know when it's safe to change the message it has published (ie: for the next iteration)? The standard workflow for a nodelet would be:

  1. allocate new message
  2. set its fields
  3. publish
  4. goto 1

Step 1 is necessary as you can never know how long one of your downstream consumers holds on to the shared ptr.

Second:

I'm working on an application where I need to remove all memory allocations from the critical path that affects the response time of my system

is memory allocation such a bottleneck on your machine? What architecture is this? If you're doing this to make this "real-time safe", then there are bigger problems in roscpp than a ...(more)

gvdhoorn gravatar image gvdhoorn  ( 2020-05-25 12:16:03 -0500 )edit

Thank you for your reply. The issue is that I was using that workflow and profiled the loop rate of the system and over very long runs every so often the allocate new message step takes a long time to return and the system acts weird as a result.

The zero-copy part is less important to me than consistent response time (I have plenty of overhead for a few copies). I was able to do this by implementing my system as just separate threads that use a boost::lockfree:spsc_queue to communicate. This solves the problem of memory allocations as the input message is created on the stack before the loop starts, filled out, and then copied into the queue. This has the obvious limitation that if I fill the queue faster than I empty it I will lose messages. I'm ok with that as I can control my ...(more)

tyler-picknik gravatar image tyler-picknik  ( 2020-05-25 12:26:49 -0500 )edit

The issue is that I was using that workflow and profiled the loop rate of the system and over very long runs every so often the allocate new message step takes a long time to return and the system acts weird as a result.

wouldn't figuring out why this happens be the correct way to solve this (memory leak, or perhaps fragmentation?), instead of working around it by trying to change the system?

And from your description this sounds like what you did for jog_arm, is that correct?

gvdhoorn gravatar image gvdhoorn  ( 2020-05-25 12:28:51 -0500 )edit

Yes, this is my attempt at improving jog_arm and the determinism of the loop rates. I first tried using nodelets and ros messaging for message passing between the threads and when I profiled it I discovered the weird performance in the memory allocation. I'm just testing on my laptop which is some old-ish i5.

tyler-picknik gravatar image tyler-picknik  ( 2020-05-25 12:42:23 -0500 )edit

This is also a conceptual question. If I could get zero-copy and pre-allocation of a message pool it would be really nice to use that in many other places.

tyler-picknik gravatar image tyler-picknik  ( 2020-05-25 12:46:42 -0500 )edit
1

I'm not sure it'd be possible to use a different memory allocator with the code generated by gencpp et al. Perhaps placement new could be used.

The default allocator in C++ iirc does not deal very well with memory fragmentation, which you will run into with small-ish messages and frequent (de)allocation. There are allocators available which are much faster and better equipped to deal with that.

Placement new seems like it could approximate your memory pool idea (together with a pre-allocated block of storage and some accounting). Perhaps Boost's memory pool does this automatically (it's been too long, I don't remember).

Note: nodelets do not really exchange any messages. It's all pointers, and messages don't even leave the process. In the end it's threads + some shared memory and pointers with overlayed ROS msg semantics.

gvdhoorn gravatar image gvdhoorn  ( 2020-05-25 13:03:15 -0500 )edit

1 Answer

Sort by ยป oldest newest most voted
0

answered 2020-05-25 18:27:10 -0500

updated 2020-05-26 00:46:06 -0500

In case anyone comes back here looking for this I got what I wanted with boost memory pools. Here is a tiny code example creating my message that I would then publish (and would get automagically handled by a memory pool that allocates in blocks that are sized as a multiple of the message type+control_block for the shared ptr):

#include <boost/pool/pool_alloc.hpp>
#include <geometry_msgs/TwistStamped.h>
#include <memory>

template <typename T>
boost::shared_ptr<T> make_shared_from_pool() {
  using allocator_t = boost::fast_pool_allocator<boost::shared_ptr<T>>;
  return boost::allocate_shared<T, allocator_t>(allocator_t());
}

int main(int argc, char **argv) {
  auto msg = make_shared_from_pool<geometry_msgs::TwistStamped>();

  std::cout << *msg << std::endl;

  return 0;
}

Note: This does not solve the problem of how does the memory for variable-sized arrays inside a ROS message gets allocated (those are still normal vectors that use normal new). It just solves the issue of the message itself, which is fine if the message only includes PODs like TwistStamped.

I discovered this cool tool for printing whenever malloc is called that helped troubleshoot this: https://github.com/samsk/log-malloc2

One other thing to note is that doesn't completely get rid of malloc calls in the critical path, it just greatly reduces them and so long as I roughly free one message for everyone I create in a short period of time (what happens here) I shouldn't get any more calls to malloc while the system is running.

edit flag offensive delete link more

Comments

1

Now I just have to go back and try to implement jog_arm communication like this before anyone notices.

tyler-picknik gravatar image tyler-picknik  ( 2020-05-25 19:54:40 -0500 )edit

so long as I roughly free one message for everyone I create in a short period of time (what happens here) I shouldn't get any more calls to malloc while the system is running.

this is indeed a constraint here. In general I would say this is not satisfiable / guaranteed (as Subscribers could be hanging on to the shared ptrs they receive for an undetermined amount of time), but in your specific case it's probably OK.

Nice answer, and I like how it progressively got improved to be less intrusive and more efficient.

Note that in ROS 2, using different allocators for messages and other entities in rclcpp is actually supported, so that should allow you to even deal with the memory allocations for non-PODs (and non-POD fields).

gvdhoorn gravatar image gvdhoorn  ( 2020-05-26 02:05:15 -0500 )edit

Just re-read wiki/roscpp/Overview/MessagesCustomAllocators and it appears custom allocators are actually supported for messages generated by gencpp.

Might be interesting to take a look at.

It would not be as explicit (ie: clear from the code) a custom allocator is being used when you instantiate a message though (as no make_shared_from_pool(..) would be needed).

gvdhoorn gravatar image gvdhoorn  ( 2020-05-26 02:11:29 -0500 )edit

This would work. The thing is a memory pool like this could have much worse performance if it is used to allocate arrays of different types. One would have be really careful to give a separate tagged pool allocator to each message they are using per the types inside the vectors. If a message had two types of vector that wouldn't work.

tyler-picknik gravatar image tyler-picknik  ( 2020-05-26 07:58:45 -0500 )edit

this is indeed a constraint here. In general I would say this is not satisfiable / guaranteed (as Subscribers could be hanging on to the shared ptrs they receive for an undetermined amount of time), but in your specific case it's probably OK.

This concern also applies to normal malloc/free. If there is more memory being allocated than being freed the program will eventually run out of memory. If the subscriber hangs onto the message the worst that happens is the pool is increased in size with a new malloc as it hits the limit in size. Even in this case, a memory pool for that use case is making the tradeoff of fewer calls to malloc for generally more memory usage if they use a pool like this.

One thing you'll notice is that when this started I declared the allocator and then passed it to the ...(more)

tyler-picknik gravatar image tyler-picknik  ( 2020-05-26 08:04:38 -0500 )edit

@gvdhoorn: Do you know how do the custom message allocators work in the case of nodelets? What if I create a publisher with a custom allocator, and subscribe in another nodlet with the default allocator? Would that trigger roscpp to copy the value? Or would it be a runtime error? Or would it be okay as the subscriber does neither allocate nor deallocate the value?

peci1 gravatar image peci1  ( 2020-06-30 18:28:11 -0500 )edit

@peci1: no, I'm afraid I don't know.

I would expect nothing to really happen though. The subscriber only interacts with the message object, it does not allocate anything. And deallocation is done through the standard shared_ptr dtor, so my expectation would for this to just work. But assumptions and all ..

gvdhoorn gravatar image gvdhoorn  ( 2020-07-01 03:23:19 -0500 )edit
1

I did a test because I was really interested into what would really happen. And the result is - you will trigger a serialization and deserialization if your publisher and subscriber differ in allocators. I created a gist that proves it: https://gist.github.com/peci1/be78ffe... . I also added a note to http://wiki.ros.org/roscpp/Overview/M... so that others don't have to wrap their heads around this problem. The decision about deserialization is made by comparing typeid of the published message (set here) with the typeid of the subscription callback (set here).

peci1 gravatar image peci1  ( 2020-07-01 09:03:27 -0500 )edit

Your Answer

Please start posting anonymously - your entry will be published after you log in or create a new account.

Add Answer

Question Tools

2 followers

Stats

Asked: 2020-05-25 12:07:37 -0500

Seen: 59 times

Last updated: May 26