In case anyone comes back here looking for this: I got what I wanted with Boost memory pools. Here is a tiny code example that creates the message I would then publish. Allocation gets handled automagically by a memory pool that allocates in blocks sized as a multiple of the message type plus the control block for the shared ptr:
#include <boost/make_shared.hpp>        // boost::allocate_shared
#include <boost/pool/pool_alloc.hpp>    // boost::fast_pool_allocator
#include <geometry_msgs/TwistStamped.h>
#include <iostream>

// Return a message allocated from a boost pool instead of the general-purpose heap.
template <typename T>
boost::shared_ptr<T> make_shared_from_pool() {
  // allocate_shared rebinds the allocator internally, so the pool hands out
  // blocks sized for the message plus the shared_ptr control block.
  using allocator_t = boost::fast_pool_allocator<boost::shared_ptr<T>>;
  return boost::allocate_shared<T, allocator_t>(allocator_t());
}

int main(int argc, char **argv) {
  auto msg = make_shared_from_pool<geometry_msgs::TwistStamped>();
  std::cout << *msg << std::endl;
  return 0;
}
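For context, publishing the pooled message would look roughly like the snippet below; the publisher setup and topic are assumptions of mine for illustration. Handing roscpp the shared_ptr (rather than the message by value) is what enables the zero-copy path for nodelets:

#include <ros/ros.h>
#include <geometry_msgs/TwistStamped.h>

// Hypothetical usage: `pub` is assumed to have been advertised elsewhere for
// geometry_msgs::TwistStamped.
void publish_pooled(ros::Publisher& pub) {
  auto msg = make_shared_from_pool<geometry_msgs::TwistStamped>();
  msg->header.stamp = ros::Time::now();
  pub.publish(msg);  // roscpp keeps the shared_ptr; don't modify msg afterwards
}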
Note: This does not solve the problem of how the memory for variable-sized arrays inside a ROS message gets allocated (those are still normal vectors that use normal new). It only solves the allocation of the message object itself, which is fine if the message only includes PODs, like TwistStamped does.
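As an untested aside: the message classes gencpp generates are templates over a ContainerAllocator, so in principle the internal strings and vectors could be pointed at a pool allocator as well. A sketch only, with the allocator choice being my own assumption:

#include <boost/pool/pool_alloc.hpp>
#include <geometry_msgs/TwistStamped.h>

// Instantiate the generated message template with a pool allocator for its
// containers. Untested sketch; the allocator type is illustrative.
using pool_alloc_t = boost::fast_pool_allocator<void>;
using PooledTwistStamped = geometry_msgs::TwistStamped_<pool_alloc_t>;

int main() {
  PooledTwistStamped msg;             // containers rebind the pool allocator
  msg.header.frame_id = "base_link";  // string storage now comes from the pool
  return 0;
}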
I discovered a cool tool that logs whenever malloc is called, which helped troubleshoot this: https://github.com/samsk/log-malloc2
One other thing to note: this doesn't completely get rid of malloc calls in the critical path, it just greatly reduces them. As long as I free roughly one message for every one I create within a short period of time (which is what happens here), I shouldn't get any more calls to malloc while the system is running.
Conceptual question/observation: nodelets only support zero-copy message passing if you publish const pointers (we call them messages, but they're really pointers). This means all subscribers share the same message. How would your publisher know when it's safe to change the message it has published (i.e. for the next iteration)? The standard workflow for a nodelet would be:

1. allocate a new message (shared ptr)
2. fill it out
3. publish it

Step 1 is necessary as you can never know how long one of your downstream consumers holds on to the shared ptr. A sketch of this loop is shown below.
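A minimal sketch of that workflow, assuming a node that already has a ros::NodeHandle (the topic name, queue size, and rate are placeholders of mine):

#include <ros/ros.h>
#include <geometry_msgs/TwistStamped.h>
#include <boost/make_shared.hpp>

// Standard publish loop: a fresh message is allocated every iteration, so a
// previously published (and possibly still referenced) message is never modified.
void publish_loop(ros::NodeHandle& nh) {
  ros::Publisher pub = nh.advertise<geometry_msgs::TwistStamped>("twist", 1);
  ros::Rate rate(100);  // placeholder rate
  while (ros::ok()) {
    auto msg = boost::make_shared<geometry_msgs::TwistStamped>();  // step 1: allocate
    msg->header.stamp = ros::Time::now();                          // step 2: fill it out
    pub.publish(msg);                                              // step 3: publish the pointer
    rate.sleep();
  }
}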
Second: is memory allocation such a bottleneck on your machine? What architecture is this? If you're doing this to make this "real-time safe", then there are bigger problems in roscpp than a ...

Thank you for your reply. The issue is that I was using that workflow and profiled the loop rate of the system; over very long runs, every so often the allocate new message step takes a long time to return and the system acts weird as a result. The zero-copy part is less important to me than consistent response time (I have plenty of overhead for a few copies). I was able to do this by implementing my system as just separate threads that use a boost::lockfree::spsc_queue to communicate. This solves the problem of memory allocations, as the input message is created on the stack before the loop starts, filled out, and then copied into the queue (a minimal sketch of this setup follows below). This has the obvious limitation that if I fill the queue faster than I empty it, I will lose messages. I'm ok with that as I can control my ...
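A minimal sketch of that threads-plus-queue setup (the queue capacity, payload, and shutdown handling are placeholders of mine, not the actual code being described):

#include <boost/lockfree/spsc_queue.hpp>
#include <geometry_msgs/TwistStamped.h>
#include <atomic>
#include <chrono>
#include <thread>

using Msg = geometry_msgs::TwistStamped;

// Fixed-capacity single-producer/single-consumer ring buffer: push() and pop()
// never allocate.
boost::lockfree::spsc_queue<Msg, boost::lockfree::capacity<64>> queue;
std::atomic<bool> running{true};

void producer() {
  Msg msg;                      // created once, before the loop starts
  while (running.load()) {
    msg.twist.linear.x += 0.1;  // fill it out (placeholder payload)
    // The message is copied into pre-allocated ring-buffer storage; push()
    // returns false (i.e. the message is dropped) if the consumer fell behind.
    queue.push(msg);
  }
}

void consumer() {
  Msg msg;
  while (running.load()) {
    while (queue.pop(msg)) {
      // ... act on msg ...
    }
  }
}

int main() {
  std::thread prod(producer), cons(consumer);
  std::this_thread::sleep_for(std::chrono::seconds(1));
  running = false;
  prod.join();
  cons.join();
  return 0;
}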
Wouldn't figuring out why this happens be the correct way to solve this (memory leak, or perhaps fragmentation?), instead of working around it by trying to change the system? And from your description this sounds like what you did for jog_arm, is that correct?

Yes, this is my attempt at improving jog_arm and the determinism of the loop rates. I first tried using nodelets and ROS messaging for message passing between the threads, and when I profiled it I discovered the weird performance in the memory allocation. I'm just testing on my laptop, which is some old-ish i5. This is also a conceptual question: if I could get zero-copy and pre-allocation of a message pool, it would be really nice to use that in many other places.
I'm not sure it'd be possible to use a different memory allocator with the code generated by gencpp et al. Perhaps placement new could be used. The default allocator in C++ iirc does not deal very well with memory fragmentation, which you will run into with small-ish messages and frequent (de)allocation. There are allocators available which are much faster and better equipped to deal with that.
Placement new seems like it could approximate your memory pool idea (together with a pre-allocated block of storage and some accounting). Perhaps Boost's memory pool does this automatically (it's been too long, I don't remember).
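For illustration only, the bare mechanism being referred to could look something like this toy sketch (mine, with none of the bookkeeping a real pool needs):

#include <geometry_msgs/TwistStamped.h>
#include <new>  // placement new

int main() {
  using Msg = geometry_msgs::TwistStamped;

  // Pre-allocated, suitably aligned storage for one message (the "block of storage").
  alignas(Msg) static unsigned char storage[sizeof(Msg)];

  // Construct the message in-place: no heap allocation for the object itself.
  Msg* msg = new (storage) Msg();
  msg->twist.angular.z = 1.0;

  // With placement new, the destructor has to be invoked explicitly.
  msg->~Msg();
  return 0;
}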
Note: nodelets do not really exchange any messages. It's all pointers, and messages don't even leave the process. In the end it's threads + some shared memory and pointers with overlaid ROS msg semantics.