Runtime crash of ROS2 talker due to struct representations in C and C++ with clang
[I've modified this question into more of an explanation, because the issue appears to be in my compiler rather than in ROS2. However, it's an interesting trace through what I think happens when you call rclcpp::init(), so I thought I'd leave it up.]
I'm trying to run a simple ROS2 (dashing) talker demo on a research system that closely tracks pointer provenance. The system crashes at runtime with a trace that says I'm trying to execute data on the stack.
If I follow the trace, I end up in rcl's init_options.c, trying to allocate with a non-existent allocator. Here is the trace, starting from the main function:
From main in talker.cpp:
int main(int argc, char * argv[])
{
  rclcpp::init(argc, argv);
  ...
}
rclcpp::init is in utilities.cpp:
void init(int argc, char const * const argv[], const InitOptions & init_options)
{
  using contexts::default_context::get_global_default_context;
  get_global_default_context()->init(argc, argv, init_options);
  ...
}
rclcpp::init expects three arguments, but we only provide two (argc and argv). The prototype for rclcpp::init (in utilities.hpp) handles this by giving the third parameter a default value:
void init(int argc, char const * const argv[], const InitOptions & init_options = InitOptions());
InitOptions() is prototyped in init_options.hpp as:
explicit InitOptions(rcl_allocator_t allocator = rcl_get_default_allocator())
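The chain of defaults above can be sketched in isolation. This is a minimal sketch, not the rclcpp code: Allocator, get_default_allocator, g_default_calls and call_init_with_two_args are hypothetical stand-ins, and the allocator is reduced to a single integer. It only illustrates that a two-argument call makes the compiler build InitOptions() at the call site, which in turn evaluates the default allocator expression.

```cpp
#include <cassert>

// Hypothetical stand-in for rcl_allocator_t; the real struct holds function pointers.
struct Allocator
{
  int id;
};

// Counts how many times the default allocator expression is evaluated.
static int g_default_calls = 0;

// Hypothetical stand-in for rcl_get_default_allocator().
Allocator get_default_allocator()
{
  ++g_default_calls;
  return Allocator{42};
}

// Stand-in for rclcpp::InitOptions: the allocator parameter has a default.
class InitOptions
{
public:
  explicit InitOptions(Allocator allocator = get_default_allocator())
  : allocator_(allocator) {}

  int allocator_id() const {return allocator_.id;}

private:
  Allocator allocator_;
};

// Stand-in for rclcpp::init: the third parameter defaults to InitOptions().
int init(int argc, char const * const argv[], const InitOptions & init_options = InitOptions())
{
  (void)argc;
  (void)argv;
  return init_options.allocator_id();
}

// Calls init() with two arguments, as talker.cpp does.
int call_init_with_two_args()
{
  char const * const argv[] = {"talker"};
  const int before = g_default_calls;
  const int id = init(1, argv);
  // Both defaults were filled in at this call site, exactly once.
  assert(g_default_calls == before + 1);
  return id;
}
```

Default arguments are evaluated each time the call is made, so every two-argument call to init() constructs a fresh InitOptions and fetches the default allocator again.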
rcl_get_default_allocator() is in rcl's allocator.h:
#define rcl_get_default_allocator rcutils_get_default_allocator
which takes us to allocator.c in rcutils, which is where things get interesting:
rcutils_allocator_t rcutils_get_default_allocator()
{
  static rcutils_allocator_t default_allocator = {
    .allocate = __default_allocate,
    ...
    .state = NULL,
  };
  return default_allocator;
}
So this assigns the function __default_allocate() as the allocate function of the default allocator. __default_allocate() is also in allocator.c in rcutils:
static void *
__default_allocate(size_t size, void * state)
{
  RCUTILS_UNUSED(state);
  return malloc(size);
}
So now we have a default allocator, which is provided to rclcpp::InitOptions as part of the call to rclcpp::init shown above.
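The default allocator is just a struct of function pointers that gets copied around by value. Here is a minimal sketch of that pattern, assuming a trimmed-down struct: sketch_allocator_t, the sketch_* functions and the choice of members (allocate, deallocate, state) are hypothetical and stand in for the members elided in the listing above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical, trimmed-down analogue of rcutils_allocator_t.
struct sketch_allocator_t
{
  void * (*allocate)(std::size_t size, void * state);
  void (*deallocate)(void * pointer, void * state);
  void * state;
};

static void * sketch_default_allocate(std::size_t size, void * state)
{
  (void)state;  // the default allocator carries no state
  return std::malloc(size);
}

static void sketch_default_deallocate(void * pointer, void * state)
{
  (void)state;
  std::free(pointer);
}

// Returns a copy of a function-local static, like the rcutils function above.
sketch_allocator_t sketch_get_default_allocator()
{
  static sketch_allocator_t default_allocator = {
    sketch_default_allocate,
    sketch_default_deallocate,
    nullptr
  };
  return default_allocator;
}

// Allocates and frees through the copied function pointers.
bool sketch_round_trip(std::size_t size)
{
  sketch_allocator_t allocator = sketch_get_default_allocator();
  assert(allocator.allocate != nullptr);
  void * memory = allocator.allocate(size, allocator.state);
  if (memory == nullptr) {
    return false;
  }
  allocator.deallocate(memory, allocator.state);
  return true;
}
```

Because the struct is returned and passed by value, every caller works on its own copy of the function pointers. If a copy arrives with the wrong bytes, the later call through allocate jumps to whatever address those bytes happen to contain.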
InitOptions::InitOptions(rcl_allocator_t allocator)
: init_options_(new rcl_init_options_t)
{
  *init_options_ = rcl_get_zero_initialized_init_options();
  rcl_ret_t ret = rcl_init_options_init(init_options_.get(), allocator);
  if (RCL_RET_OK != ret) {
    rclcpp::exceptions::throw_from_rcl_error(ret, "failed to initialized rcl init options");
  }
}
InitOptions takes the default allocator and passes it to rcl_init_options_init() in rcl's init_options.c.
<<< C++ calling C with a structure >>>
rcl_ret_t
rcl_init_options_init(rcl_init_options_t * init_options, rcl_allocator_t allocator)
{
  RCL_CHECK_ARGUMENT_FOR_NULL(init_options, RCL_RET_INVALID_ARGUMENT);
  if (NULL != init_options->impl) {
    RCL_SET_ERROR_MSG("given init_options (rcl_init_options_t) is already initialized");
    return RCL_RET_ALREADY_INIT;
  }
  RCL_CHECK_ALLOCATOR(&allocator, return RCL_RET_INVALID_ARGUMENT);
  init_options->impl = allocator.allocate(sizeof(rcl_init_options_impl_t), allocator.state);
  ...
}
Here, it finally crashes when trying to call allocator.allocate().
This appears to be because our version of clang uses different conventions for passing structures by value in C and in C++. For C++, it appears to load a pointer to the whole structure into a register. For C, it appears to load the structure members. Therefore, when rcl_init_options_init() (a C function) calls allocator.allocate() on the structure that was passed in by InitOptions() (a C++ constructor), it doesn't find what it expects in memory and crashes.
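A small canary for this kind of mismatch might look like the sketch below. Everything named canary_* is hypothetical. It assumes the struct is standard-layout and trivially copyable, which, as far as I understand, is the usual precondition for C and C++ to pass it by value the same way. In this single file both caller and callee are compiled as C++, so it only shows the shape of the check. To actually exercise the C/C++ boundary, canary_callee would have to live in a separate .c file built with the C compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <type_traits>

extern "C" {

// Hypothetical two-member analogue of the allocator struct.
struct canary_allocator_t
{
  void * (*allocate)(std::size_t size, void * state);
  void * state;
};

static void * canary_allocate(std::size_t size, void * state)
{
  (void)state;
  return std::malloc(size);
}

// Callee with C linkage: receives the struct by value and reports whether
// the members it sees match what the caller says it passed.
int canary_callee(
  canary_allocator_t allocator,
  void * (*expected_allocate)(std::size_t, void *),
  void * expected_state)
{
  return allocator.allocate == expected_allocate && allocator.state == expected_state;
}

}  // extern "C"

// Compile-time checks on the properties this sketch relies on.
static_assert(
  std::is_standard_layout<canary_allocator_t>::value,
  "struct layout must be compatible with C");
static_assert(
  std::is_trivially_copyable<canary_allocator_t>::value,
  "struct must be copyable as plain bytes");

// C++ caller: passes the struct by value across the extern "C" boundary.
bool canary_by_value_ok()
{
  int tag = 0;
  canary_allocator_t allocator = {canary_allocate, &tag};
  assert(allocator.allocate != nullptr);
  return canary_callee(allocator, canary_allocate, &tag) == 1;
}
```

If the two compilers disagreed about how the struct is passed, the callee would compare garbage against the expected pointers and canary_by_value_ok() would return false instead of crashing later inside an indirect call.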
[Update] Before anyone spends any time on this: This appears to be a compiler issue with our version of Clang. We've patched it and I'm currently rebuilding everything. If it works, I'll close this out.