
Autoware lidar_localizer not working with ndt_gpu on V100 gpu

asked 2019-09-10 01:36:46 -0500 by villie

This is a question regarding autoware localizer.

It works perfectly with the default PCL NDT library, i.e. with method_type set to 0 for the ndt_matching node. With the ndt_gpu library it also works well on one of my computers, which has a GTX 1070 GPU, but it fails on another machine with an Nvidia Tesla V100. Here is a detailed report:

I am using the latest Autoware Docker image from https://gitlab.com/autowarefoundation... with commit id: 0506e18f66834c8557aee43a64a0a87c1e8635f0

To reproduce the issue, I follow the instructions here: https://gitlab.com/autowarefoundation...

In the launch file install/lidar_localizer/share/lidar_localizer/launch/ndt_matching.launch, I set <arg name="method_type" default="2"/> to enable pcl_anh_gpu. As soon as I start the node, I get the following error:

Error: out of memory /home/autoware/Autoware/src/autoware/core_perception/ndt_gpu/src/VoxelGrid.cu 181
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::lock_error> >'
  what(): boost: mutex lock failed in pthread_mutex_lock: Invalid argument
[ndt_matching-6] process has died [pid 33002, exit code -6, cmd /home/autoware/Autoware/install/lidar_localizer/lib/lidar_localizer/ndt_matching __name:=ndt_matching __log:=/home/autoware/.ros/log/d4d41132-d390-11e9-9d98-ac1f6b4112c2/ndt_matching-6.log].
log file: /home/autoware/.ros/log/d4d41132-d390-11e9-9d98-ac1f6b4112c2/ndt_matching-6*.log

My output from nvidia-smi is:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:61:00.0 Off |                    0 |
| N/A   41C    P0    58W / 300W |   3292MiB / 16130MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  On   | 00000000:62:00.0 Off |                    0 |
| N/A   35C    P0    40W / 300W |     11MiB / 16130MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2...  On   | 00000000:89:00.0 Off |                    0 |
| N/A   35C    P0    38W / 300W |     11MiB / 16130MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2...  On   | 00000000:8A:00.0 Off |                    0 |
| N/A   36C    P0    38W / 300W |     11MiB / 16130MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

I have 4 GPUs with 16 GB of memory each, so I do not see how it could genuinely run out of memory.
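As a rough cross-check of those figures, here is a minimal Python sketch (not part of Autoware) that computes free memory per GPU. The helper names `parse_gpu_memory` and `query_gpu_memory` are made up for illustration, and the sample values are transcribed from the nvidia-smi table in this question. It assumes nvidia-smi reports plain integers in MiB for every GPU.

```python
import shutil
import subprocess


def parse_gpu_memory(csv_text):
    """Parse 'index, used, total' rows (MiB) and add the free amount per GPU."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, used, total = (int(field.strip()) for field in line.split(","))
        rows.append({"index": index, "used_mib": used,
                     "total_mib": total, "free_mib": total - used})
    return rows


def query_gpu_memory():
    """Query nvidia-smi for per-GPU memory; returns [] if nvidia-smi is not on PATH."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=index,memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        check=True, capture_output=True, text=True,
    )
    return parse_gpu_memory(result.stdout)


# Values transcribed from the nvidia-smi table in the question (MiB).
REPORTED = "0, 3292, 16130\n1, 11, 16130\n2, 11, 16130\n3, 11, 16130"

if __name__ == "__main__":
    # Use live data when available, otherwise the figures reported above.
    for gpu in query_gpu_memory() or parse_gpu_memory(REPORTED):
        print(f"GPU {gpu['index']}: {gpu['free_mib']} MiB free of {gpu['total_mib']} MiB")
```

With the reported figures, GPU 0 still has well over 12 GB free even though it is the only GPU with significant memory in use. A CUDA program that never selects a device runs on device 0 by default, so that is the number most likely to matter here, assuming ndt_gpu does not choose a device itself.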

The strange thing is that on my GTX-1070 with 8GB of memory, this runs without any issue.

The contents of the log file /home/autoware/.ros/log/d4d41132-d390-11e9-9d98-ac1f6b4112c2/ndt_matching-6.log:

Log file:
method_type: 2
use_gnss: 1
queue_size: 1
offset: linear
get_height: 1
use_local_transform: 0
use_odom: 0
use_imu: 0
imu_upside_down: 0
imu_topic: /imu_raw
localizer: velodyne

(tf_x,tf_y,tf_z,tf_roll,tf_pitch,tf_yaw): (1.2, 0, 2, 0, 0, 0)

Update points_map.

--

After analyzing the code and debugging different scenarios with different maps, this is what I have found so far:

The CMakeLists.txt in ndt_gpu sets the architecture incorrectly:

if ("${CUDA_CAPABILITY_VERSION}" MATCHES "^[1-9][0-9]+$")
  set(CUDA_ARCH "sm_${CUDA_CAPABILITY_VERSION}")
else ()
  set(CUDA_ARCH "sm_52")
endif ()

It should be sm_70 for the V100, but CUDA_CAPABILITY_VERSION is empty on my system, so the fallback sm_52 is used. I have set it to sm_70 and compute_52 manually, but the problem persists.
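To make the fallback behaviour concrete, here is a small Python sketch that mirrors the CMake branch quoted above. It is only an illustration of the regex logic, not code from Autoware, and the function name `select_cuda_arch` is hypothetical.

```python
import re

# Default hard-coded in the else() branch of the CMake snippet.
FALLBACK_ARCH = "sm_52"


def select_cuda_arch(capability_version):
    """Mirror the CMake check: two or more digits with a nonzero first digit,
    otherwise fall back to the hard-coded default."""
    if re.fullmatch(r"[1-9][0-9]+", capability_version):
        return f"sm_{capability_version}"
    return FALLBACK_ARCH


if __name__ == "__main__":
    for value in ["", "70", "61", "7.0"]:
        print(repr(value), "->", select_cuda_arch(value))
```

An empty string does not match the pattern, so the build silently targets sm_52. A value such as "7.0" also fails the match, so the detected capability would have to arrive in the form "70" to select sm_70.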

On the code side, the error occurs at different points depending on the map, but it always either fails by running out of memory or enters what looks like an infinite loop in the buildParent() kernel.

Not knowing cuda at ... (more)


Comments

I posted an answer below, but I just want to check: can you please run the following:

nvcc --version
/usr/local/cuda/bin/nvcc --version

And report the output?

Josh Whitley (2019-09-11 16:52:36 -0500)

1 Answer


answered 2019-09-11 16:51:30 -0500 by Josh Whitley

@villie - Please see the answer https://answers.ros.org/question/3294... and the issue https://gitlab.com/autowarefoundation.... I know these are not very helpful right now, but this is where we stand on the issue until a CUDA expert comes along and helps us resolve it.


Stats

Asked: 2019-09-10 01:36:46 -0500

Seen: 744 times

Last updated: Sep 11 '19