Getting error: The NVIDIA driver was unable to open 'libnvidia-glvkspirv.so.440.59'

asked 2020-03-30 15:08:03 -0500

tleyden2

I've followed these instructions, and I can run nvidia-smi inside the ade environment and run ade start --update --enter without errors.

However when I run this step:

In the same terminal window, start the LGSVL simulator:

/opt/lgsvl/simulator

I'm seeing this error when I run ade$ ./opt/lgsvl/simulator:

The NVIDIA driver was unable to open 'libnvidia-glvkspirv.so.440.59'.  This library is required at run time.

In the Player.log, I see:

Desktop is 1920 x 1080 @ 144 Hz
[Vulkan init] extensions: count=15
[Vulkan init] extensions: name=VK_KHR_device_group_creation, enabled=0
[Vulkan init] extensions: name=VK_KHR_display, enabled=1
[Vulkan init] extensions: name=VK_KHR_external_fence_capabilities, enabled=0
[Vulkan init] extensions: name=VK_KHR_external_memory_capabilities, enabled=0
[Vulkan init] extensions: name=VK_KHR_external_semaphore_capabilities, enabled=0
[Vulkan init] extensions: name=VK_KHR_get_physical_device_properties2, enabled=0
[Vulkan init] extensions: name=VK_KHR_get_surface_capabilities2, enabled=0
[Vulkan init] extensions: name=VK_KHR_surface, enabled=1
[Vulkan init] extensions: name=VK_KHR_xcb_surface, enabled=0
[Vulkan init] extensions: name=VK_KHR_xlib_surface, enabled=1
[Vulkan init] extensions: name=VK_EXT_acquire_xlib_display, enabled=0
[Vulkan init] extensions: name=VK_EXT_debug_report, enabled=0
[Vulkan init] extensions: name=VK_EXT_debug_utils, enabled=0
[Vulkan init] extensions: name=VK_EXT_direct_mode_display, enabled=0
[Vulkan init] extensions: name=VK_EXT_display_surface_counter, enabled=0
Vulkan error VK_ERROR_INCOMPATIBLE_DRIVER (-9) file: ./Runtime/GfxDevice/vulkan/VKContext.cpp, line: 333
Vulkan error./Runtime/GfxDevice/vulkan/VKContext.cpp:333
Vulkan detection: 0
No supported renderers found, exiting

(Filename: ./PlatformDependent/LinuxStandalone/main.cpp Line: 639)
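A log like the one above can be condensed into the few facts that matter for diagnosis: which Vulkan instance extensions were enabled, and which Vulkan error code was raised. The following is a minimal sketch in Python, not part of LGSVL or Unity; the function name summarize_player_log is hypothetical, and the regular expressions assume the line format shown in this Player.log.

```python
import re

# Matches lines such as:
#   [Vulkan init] extensions: name=VK_KHR_surface, enabled=1
EXT_RE = re.compile(r"\[Vulkan init\] extensions: name=([A-Za-z0-9_]+), enabled=([01])")

# Matches lines such as:
#   Vulkan error VK_ERROR_INCOMPATIBLE_DRIVER (-9) file: ...
ERR_RE = re.compile(r"Vulkan error (VK_\w+) \((-?\d+)\)")


def summarize_player_log(text):
    """Return (enabled, disabled, errors) parsed from Player.log text.

    enabled/disabled are lists of extension names in log order;
    errors is a list of (error_name, error_code) tuples.
    """
    enabled, disabled, errors = [], [], []
    for line in text.splitlines():
        ext = EXT_RE.search(line)
        if ext:
            target = enabled if ext.group(2) == "1" else disabled
            target.append(ext.group(1))
            continue
        err = ERR_RE.search(line)
        if err:
            errors.append((err.group(1), int(err.group(2))))
    return enabled, disabled, errors
```

Applied to the log above, this would report three enabled extensions (VK_KHR_display, VK_KHR_surface, VK_KHR_xlib_surface) and a single VK_ERROR_INCOMPATIBLE_DRIVER error, which points at Vulkan instance creation rather than at window-system setup.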

Comments

I was able to work around the issue by running the lgsvl simulator outside the docker container on the host itself. Since the docker container is launched with --net=host, the lgsvl running on the host can still reach the ros2 bridge on localhost:9090. But I'd still be interested in knowing why the simulator works on the host but not in the container.

tleyden2 (2020-03-30 15:53:40 -0500)

When you run nvidia-smi both inside and outside ade, what driver version do you see in the upper-right corner of the output? Does either of these match the version you see when you run grep "X Driver" /var/log/Xorg.0.log? How did you install your NVIDIA driver and libvulkan? There have been several updates recently, and if you install an update and try to run the simulator without restarting, the NVIDIA driver will throw errors similar to this one.

Josh Whitley (2020-04-02 21:07:30 -0500)
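The version comparison requested here can be scripted. Below is a minimal sketch in Python that pulls the driver version out of saved nvidia-smi output and out of the Xorg log, then checks that all sources agree. The function names are hypothetical, and the regular expressions assume the usual "Driver Version: X.Y" banner in nvidia-smi and an "X Driver  X.Y" line in Xorg.0.log; other driver releases may format these lines differently.

```python
import re

VERSION = r"([0-9]+(?:\.[0-9]+)+)"


def smi_driver_version(smi_output):
    """Extract the driver version from nvidia-smi's banner text, or None."""
    match = re.search(r"Driver Version:\s*" + VERSION, smi_output)
    return match.group(1) if match else None


def xorg_driver_version(xorg_log):
    """Extract the driver version from an 'X Driver' line in Xorg.0.log, or None."""
    match = re.search(r"X Driver\s+" + VERSION, xorg_log)
    return match.group(1) if match else None


def versions_match(*versions):
    """True only if at least one version was given, none is missing, and all are equal."""
    return all(v is not None for v in versions) and len(set(versions)) == 1
```

To use it, save the output of nvidia-smi on the host and inside ade to two strings, read /var/log/Xorg.0.log into a third, and pass the three extracted versions to versions_match.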

Both inside and outside of ade, I'm seeing Driver Version: 440.59, which matches the version in /var/log/Xorg.0.log. Unfortunately, I installed the drivers using a non-standard approach: apt-get install system76-driver-nvidia, based on these instructions. I also tried restarting the machine, and I'm still seeing the error: The NVIDIA driver was unable to open 'libnvidia-glvkspirv.so.440.59'. This library is required at run time. Again, the simulator does work on the host, and I'm using the lgsvlsimulator-linux64-2020.01 version. How do I get the simulator version inside ade to compare with this?

tleyden2 (2020-04-11 13:25:00 -0500)
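Since the driver versions match, one further check is whether the file named in the error message actually exists inside the container. The following is a minimal Python sketch that searches library directories for an exact file name. The function name find_library is hypothetical, and the default directories are an assumption about where a distribution installs driver libraries; they may need adjusting.

```python
import os

# Assumed library roots; adjust for the distribution in use.
DEFAULT_DIRS = ("/usr/lib", "/usr/lib64")


def find_library(filename, search_dirs=DEFAULT_DIRS):
    """Return a sorted list of paths under search_dirs whose basename is filename.

    Directories that do not exist are skipped silently (os.walk yields nothing).
    """
    hits = set()
    for root in search_dirs:
        for dirpath, _dirnames, filenames in os.walk(root):
            if filename in filenames:
                hits.add(os.path.join(dirpath, filename))
    return sorted(hits)
```

Running find_library("libnvidia-glvkspirv.so.440.59") once on the host and once inside ade would show whether the library is present in one environment and absent in the other.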

The version that's used in ADE is in the image name in .aderc-lgsvl. Currently, that should be 2020.01, the same that you are using outside the container. The Pop!_OS driver should be fine (I love Pop!_OS, BTW - the guys and girls from System76 are awesome!). We just got 2020.03-rc1 and I'm working on an update.

Based on this issue, it looks like this is a problem with Vulkan, which is not yet supported in containers. I'm checking to see if it can be disabled in LGSVL.

Josh Whitley (2020-04-11 15:36:46 -0500)

Can you try running sudo apt update && sudo apt install libvulkan1 inside ADE and see if this solves the problem?

Josh Whitley (2020-04-11 15:40:12 -0500)

Nice! I'm actually on Ubuntu 18 since I didn't want to diverge from what most people are using. I tried to install libvulkan1 inside ADE, but it told me it was already installed. If Vulkan doesn't work inside containers, why aren't most Autoware.Auto developers hitting the same issue? One update: after updating ADE to ade-lgsvl 2020.03-native-bridge, when I start /opt/lgsvl/simulator inside ADE it now fails immediately with "aborted", and the Unity3D player log shows: Caught fatal signal - signo:11 code:1 errno:0 addr:(nil). Obtained 1 stack frames. #0 (nil) in (Unknown).

tleyden2 (2020-05-18 23:58:35 -0500)

Please be aware that the instructions for how to launch the simulator have changed with the "native bridge" version. Check the LGSVL page for details.

Do you have the NVIDIA Docker driver installed? See these instructions. I only just learned that ade launches with GPU support enabled by default, something I intend to change with a merge request against ade-cli as soon as I can make time.

Josh Whitley (2020-05-19 08:59:02 -0500)

Yeah, I have the nvidia-container-toolkit installed and when I run docker run --gpus all nvidia/cuda:10.0-base nvidia-smi it shows the same output I see on the host. It's on Driver Version: 440.59 and CUDA Version: 10.2. Also, I did start the simulator with RMW_IMPLEMENTATION=rmw_cyclonedds_cpp /opt/lgsvl/simulator but it still crashes.

tleyden2 (2020-05-19 23:39:47 -0500)
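One possibility the thread does not confirm, but which may be worth ruling out: the NVIDIA container runtime mounts driver libraries according to the NVIDIA_DRIVER_CAPABILITIES environment variable, and nvidia-smi needs only the utility capability, so a working nvidia-smi does not by itself show that graphics libraries are available in the container. The Python sketch below checks whether graphics (or all) is requested. The function names are hypothetical, and whether this setting explains the failure in ADE is an assumption.

```python
import os


def driver_capabilities(env=None):
    """Return the set of capabilities listed in NVIDIA_DRIVER_CAPABILITIES.

    env is any mapping of environment variables; defaults to os.environ.
    An unset or empty variable yields an empty set.
    """
    env = os.environ if env is None else env
    raw = env.get("NVIDIA_DRIVER_CAPABILITIES", "")
    return {cap.strip() for cap in raw.split(",") if cap.strip()}


def graphics_enabled(env=None):
    """True if the variable requests the graphics capability, directly or via 'all'."""
    caps = driver_capabilities(env)
    return "all" in caps or "graphics" in caps
```

Calling graphics_enabled() from a Python interpreter inside ade reports what the container was started with; an unset variable returns False here, although the runtime applies its own default in that case.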