Intermittent segfault when running gtests

asked 2020-10-30 11:40:41 -0600

Pinknoise2077

updated 2020-11-03 10:39:31 -0600

Hi all,

I am trying to debug a strange intermittent issue, but I can't find the problem.

Platform:

  • OS: Ubuntu 16.04 with ROS Kinetic Kame
  • gtest: libgtest-dev:amd64 1.7.0-4ubuntu1
  • cmake: version 3.5.1
  • C++: 14
  • gcc: gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609

Symptom(s):

  • Segfault (Segmentation fault (core dumped)) when running a gtest defined with catkin_add_gtest().

Note: I've replaced the paths with ...

[ 32%] Built target _run_tests_...
Segmentation fault (core dumped)
-- run_tests.py: verify result "../test_results/.../gtest-test_....xml"
Cannot find results, writing failure results to '...MISSING-gtest-test_....xml'

  • It happens very rarely (~1 in 1500 builds), but once it does, it happens consistently.
  • It starts to happen after completely unrelated changes (i.e. changes that do not affect the package and that neither depend on the package nor are depended on by it).

Initial thoughts:

  • It might be an internal dependency issue within the package. We are importing a static library and thought it might not be imported before the test runs. I don't know how to add or enforce a dependency on an imported library. After reading the CMake documentation, it seems that you don't need to explicitly add dependencies on imported targets, so I am not sure what is going on.
  • It might be a gtest problem. I've read about some intermittent segfaults happening with gtest.
  • I believe this cannot be related to the code itself. There is no randomness in the tests (i.e. they use arbitrary but predefined values), so it's unlikely to be the cause of the segfault. If it were, I would expect a consistent (i.e. non-intermittent) issue that segfaults every single time the tests are run.
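
For context, the imported-library setup from the first bullet can be sketched roughly as follows. This is only a minimal sketch and not our actual CMakeLists.txt: the target names vendor_lib and test_example, the library path, and the test source file are all hypothetical placeholders.

```cmake
# Hypothetical sketch: every name and path below is a placeholder.

# Declare a prebuilt static library as an IMPORTED target.
# Nothing is compiled for it; CMake only records where the file lives.
add_library(vendor_lib STATIC IMPORTED)
set_target_properties(vendor_lib PROPERTIES
  IMPORTED_LOCATION ${CMAKE_CURRENT_SOURCE_DIR}/lib/libvendor.a)

if(CATKIN_ENABLE_TESTING)
  catkin_add_gtest(test_example test/test_example.cpp)
  # catkin_add_gtest() may not create the target (e.g. if gtest is not found),
  # so guard the link step.
  if(TARGET test_example)
    target_link_libraries(test_example vendor_lib ${catkin_LIBRARIES})
  endif()
endif()
```

As far as I understand the documentation, a prebuilt .a file like this has no build step of its own, so there is nothing for CMake to order before the test target. If the file were instead generated during the build (an assumption that would not apply to a prebuilt library), an explicit add_dependencies() on the target that produces it would be the usual way to enforce the order.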

Unfortunately, I cannot copy the code here because it's proprietary.

Any ideas about what could be going on? Has anybody experienced similar intermittent issues with gtest? If so, what was the cause?

Edit: here is what I see when I run valgrind on the libgtest.so file created for the package.

valgrind ./libgtest.so 
==2024== Memcheck, a memory error detector
==2024== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==2024== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==2024== Command: ./libgtest.so
==2024== 
==2024== Invalid read of size 8
==2024==    at 0x11FD59: testing::internal::FormatCxxExceptionMessage(char const*, char const*) (gtest.cc:2029)
==2024==  Address 0x28 is not stack'd, malloc'd or (recently) free'd
==2024== 
==2024== 
==2024== Process terminating with default action of signal 11 (SIGSEGV)
==2024==  Access not within mapped region at address 0x28
==2024==    at 0x11FD59: testing::internal::FormatCxxExceptionMessage(char const*, char const*) (gtest.cc:2029)
==2024==  If you believe this happened as a result of a stack
==2024==  overflow in your program's main thread (unlikely but
==2024==  possible), you can try to increase the size of the
==2024==  main thread stack using the --main-stacksize= flag.
==2024==  The main thread stack size used in this run was 8388608.
==2024== 
==2024== HEAP SUMMARY:
==2024==     in use at ...