I managed to compile and run successfully, but I discourage this approach, so I will not close the question until I find a better way to solve the problem. A normal system update will undo these changes, and I'm not sure whether they will cause problems later when I need to use the Boost library. Anyway, here's my approach:

As I said, you first need to add this line at the beginning of your code, before the ROS include:

    #define __CUDACC__
    #include <ros/ros.h>

A normal catkin_make would give two errors with this approach. For the first one:

    sudo nano /usr/include/boost/type_traits/is_floating_point.hpp

Replace this line:

    #if defined(BOOST_HAS_FLOAT128)

with:

    #if defined(BOOST_HAS_FLOAT128) && !defined(__PGI)
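
As a rough illustration of what the extra `!defined(__PGI)` condition does, here is a minimal sketch. `DEMO_HAS_FLOAT128` and `float128_branch_enabled` are made-up names for this example, not part of Boost; `__PGI` is the macro the PGI compilers predefine. The guarded branch is only compiled when the feature macro is set and the compiler is not PGI:

```cpp
#include <cassert>

// Hypothetical stand-in for BOOST_HAS_FLOAT128 (not a real Boost macro).
#define DEMO_HAS_FLOAT128 1

// Returns true when the guarded branch was compiled in.
// Under a PGI compiler (__PGI defined) the #else branch is selected instead.
inline bool float128_branch_enabled()
{
#if defined(DEMO_HAS_FLOAT128) && !defined(__PGI)
  return true;   // feature macro set and compiler is not PGI
#else
  return false;  // branch skipped, as intended for PGI builds
#endif
}
```

With GCC or Clang the function returns true; with a PGI compiler it should return false, which is the effect the edited Boost header relies on.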

For the remaining errors:

    sudo nano /usr/include/boost/core/swap.hpp

Comment out every line containing:

    BOOST_GPU_ENABLED

There should be 3 lines.

With these changes the code compiled and ran. I added a pi calculation example to compare the speed on CPU and GPU, in case someone wants to test it:

    #define __CUDACC__
    #include <ros/ros.h>
    #include <cstdio>
    #include <iostream>
    #include <sstream>
    #include "std_msgs/String.h"

    #define N 2000000000
    #define vl 1024

    int main(int argc, char **argv)
    {
      ros::init(argc, argv, "pgi_test_node");
      ros::NodeHandle n;
      ros::Publisher chatter_pub = n.advertise<std_msgs::String>("chatter", 1000);
      ros::Rate loop_rate(10);

      int count = 0;
      while (ros::ok())
      {
        std_msgs::String msg;
        std::stringstream ss;
        ss << "hello world " << count;
        msg.data = ss.str();
        ROS_INFO("%s", msg.data.c_str());

        // Midpoint-rule estimate of pi; the pragmas offload this loop to the GPU.
        double pi = 0.0;
        long long i;

        #pragma acc parallel vector_length(vl)
        #pragma acc loop reduction(+:pi)
        for (i = 0; i < N; i++) {
          double t = (double)((i + 0.5) / N);
          pi += 4.0 / (1.0 + t * t);
        }

        printf("pi=%11.10f\n", pi / N);

        chatter_pub.publish(msg);
        ros::spinOnce();
        loop_rate.sleep();
        ++count;
      }

      return 0;
    }

Timers are not even necessary: if you just comment out the pragmas, the loop runs on the CPU and you can clearly see the difference.
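
If you do want numbers rather than an eyeball comparison, here is a minimal sketch of the same loop pulled out of the ROS node and timed with `std::chrono`. The function names `estimate_pi` and `time_estimate_pi` are made up for this example, and I am assuming the compiler defines the standard `_OPENACC` macro when OpenACC is enabled, so the same file builds for the GPU or, without the OpenACC flag, for the CPU only:

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstdio>

// Midpoint-rule estimate of pi as the integral of 4/(1+t^2) over [0,1].
// The pragmas are only seen when OpenACC is enabled (assumption: the
// compiler defines _OPENACC in that case); otherwise the loop is plain CPU code.
double estimate_pi(long long n)
{
  assert(n > 0);
  double sum = 0.0;
#ifdef _OPENACC
  #pragma acc parallel vector_length(1024)
  #pragma acc loop reduction(+:sum)
#endif
  for (long long i = 0; i < n; i++) {
    double t = (i + 0.5) / n;
    sum += 4.0 / (1.0 + t * t);
  }
  return sum / n;
}

// Hypothetical helper: runs estimate_pi once, stores the estimate in *result,
// and returns the elapsed wall-clock time in seconds.
double time_estimate_pi(long long n, double *result)
{
  std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
  *result = estimate_pi(n);
  std::chrono::steady_clock::time_point stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(stop - start).count();
}
```

Calling `time_estimate_pi(2000000000LL, &pi)` once in a build with OpenACC enabled and once without gives two timings to compare. Keep in mind that the first GPU call usually includes device start-up cost, so timing a second call is probably fairer.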
