
Improving the allocation of a big Python list to a custom message data field

asked 2022-08-30 03:10:44 -0500 by fabbro, updated 2022-08-30 14:30:31 -0500

Short explanation:

I am trying to use ROS2 to publish sensor data that I have stored on disk, which I "acquired" from a simulator. The sensor data frame is in the form of a protobuf RepeatedScalarContainer. I have no problem parsing and deserializing the data from protobuf (.bin); I have already cast this data into a Python list (let's call it sensor_data_list), and this works just fine and is not my problem.

The problem arises when I try to allocate a sensor data frame (a quite big list of ~4 million elements, equivalent to 18 MB according to sys.getsizeof()) into a field of a custom message. I think this is not a ROS2 problem but rather a matter of how I handle data in Python. However, I thought someone in the ROS community might have had the same problem or some similar experience.

Detailed explanation:

I needed to create my own custom message to handle a specific kind of sensor data (no, I could not use the sensor data msgs already available in ROS, but this is not the point of my question anyway). Let's call this custom message sensor_data_custom_msg.

My data handling workflow in my ROS2 sensor node is the following:

  1. Deserialize the protobuf sensor data and preload it into a big Python list. This is done at the beginning of the node initialization on purpose, since we don't care about "spending/wasting" some time in the initialization phase of the node.
  2. Create the actual sensor publisher node, passing it the data preloaded in the previous step. In the timer_callback of the node I initialize a sensor_data_custom_msg and work on the preloaded data.
  3. The problem is at this step: at each time step I would like to fill the data field of sensor_data_custom_msg with a specific element of the list sensor_data_list, and then publish sensor_data_custom_msg filled with the sensor data stored on disk. This operation, however, takes a lot of time (~0.4 seconds), so from here I started investigating the problem a bit. What I found is:

If I print the length and type of the data I am trying to fill sensor_data_custom_msg.data with, I get: length: 4608000; type: <class 'list'>. You can see that my sensor data is quite big, and here comes my question: is there a better way to deal with such big lists in Python? Can I optimize or change the way I allocate such a big chunk of data in Python?
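For concreteness, here is a minimal sketch of the node structure I described (the package, message, and topic names are placeholders, not my actual code); the slow step is the msg.data assignment in the timer callback:

    import rclpy
    from rclpy.node import Node
    from my_interfaces.msg import SensorDataCustomMsg  # placeholder custom msg with "float32[] data"

    class SensorPublisher(Node):
        def __init__(self, preloaded_frames):
            super().__init__('sensor_publisher')
            # Frames preloaded once at initialization (step 1), each one a
            # plain Python list of ~4.6 million floats.
            self.frames = preloaded_frames
            self.frame_idx = 0
            self.pub = self.create_publisher(SensorDataCustomMsg, 'sensor_data', 10)
            self.timer = self.create_timer(0.1, self.timer_callback)

        def timer_callback(self):
            msg = SensorDataCustomMsg()
            # This assignment is the slow step (~0.4 s per frame):
            msg.data = self.frames[self.frame_idx]
            self.frame_idx = (self.frame_idx + 1) % len(self.frames)
            self.pub.publish(msg)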

Another thing that I would like to point out is about computational resources. I have a rather powerful machine: Ubuntu 20.04 (Pop!_OS), an i9 processor, and 32 GB of RAM. I am using ROS2 inside a Docker container, and the container is not the problem. Indeed, if you are asking yourself "does the container have enough resources from the host computer?", I think the answer is yes. Indeed ...


1 Answer


answered 2022-08-30 03:30:40 -0500 by ljaniec

I think you can try using a Python generator to get the elements from your list.
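A minimal sketch of the idea (parsed_frames is a placeholder for your already-deserialized data):

    # Sketch: hand out one frame at a time instead of materializing
    # everything up front (parsed_frames is a placeholder name).
    def frame_generator(parsed_frames):
        for frame in parsed_frames:
            yield list(frame)  # convert each RepeatedScalarContainer lazily

    frames = frame_generator(parsed_frames)
    first_frame = next(frames)  # pull one frame per timer tick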

The next thing to try would be NumPy: under its Python API, the implementation is in C, which is considerably faster than pure Python. It will take some time to learn, but the basics of numpy.array can be picked up in 1-2 hours if you know Python well. You should check it out :)
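For example, converting one frame once into a C-backed buffer (sensor_data_list is the name from your question):

    import numpy as np

    # Sketch: a frame as a contiguous float32 buffer instead of a Python list.
    frame = np.asarray(sensor_data_list[0], dtype=np.float32)
    # 4 bytes per float32: a 4,608,000-element frame is ~18 MB,
    # matching the size mentioned in the question.
    print(frame.nbytes)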


Comments

Thanks for your answer. Generators are indeed the first thing I am investigating, since that's what I found browsing the web; I should have mentioned it in my question. I also already tried NumPy: instead of using a list, I converted the parsed data to a NumPy array, but I did not see any improvement. I will try again to check whether I missed something.

fabbro (2022-08-30 03:44:45 -0500)

If you can add your code as chunks in the question edit, we could then check it and perhaps suggest some improvements. If you are familiar with pandas, you could try loading the sensor data in chunks (e.g. https://www.geeksforgeeks.org/how-to-...).
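For illustration, chunked loading with pandas could look like this (assuming the data were exported to CSV first; the file name and chunk size are placeholders):

    import pandas as pd

    # Sketch: read the file in chunks of 100k rows instead of all at once.
    for chunk in pd.read_csv('sensor_data.csv', chunksize=100_000):
        values = chunk.to_numpy(dtype='float32')  # per-chunk processing goes here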

ljaniec (2022-08-30 03:56:50 -0500)

After looking a bit more into generators, I understood that this is not the place to use them. Generators help when you have a memory problem and don't want to load the whole dataset into memory; in my case that is not the issue. My issue is the "copy" of a big list (<class 'list'> in Python) into a custom ROS2 message field defined as float32[] data in its .msg file. I confirm that I also tried changing the type of the preloaded data from list to NumPy array, and it did not improve performance either.
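One thing I still want to test (this is only an assumption about how rosidl_generator_py generates the Python message classes, I have not confirmed it): the generated class for a float32[] field may store the sequence internally as array.array('f'), with a setter fast path that accepts an array.array of the matching typecode without an element-by-element copy. Preconverting each frame once at load time would then move the cost out of the timer callback:

    from array import array

    # Assumption: the generated setter for a "float32[] data" field accepts
    # array.array('f') directly, avoiding a per-element conversion.
    # Convert every frame once, during the preload step:
    frames = [array('f', frame) for frame in sensor_data_list]

    # Later, in the timer callback (msg and frame_idx as in the node sketch
    # above), this should be close to a plain reference assignment:
    msg.data = frames[frame_idx]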

fabbro (2022-08-30 11:20:15 -0500)

Well, you could try implementing this particular node in C++ to see if it helps with speed.

In general, it seems that perhaps you need a different approach/architecture for your case: eliminating that single list with ~4.6 million elements would be a priority. Usually some kind of batching, multithreading, data fusion and/or compression is needed to process such large data faster.

ljaniec (2022-08-30 13:38:55 -0500)

Yes, this is another possibility indeed. I wanted to leave it as a last option since I am not super familiar with C++. I would also like to add another piece of information: the size of the data I am trying to allocate is 18 MB, so it is not that small. Another option is that maybe this is the maximum speed I can achieve with the architecture I have.

fabbro (2022-08-30 14:29:31 -0500)
