Executive Summary

Use /diagnostics to publish hardware diagnostics data from every device driver.

Use diagnostic_aggregator to collect and group diagnostics data on any significant system.

Background: /diagnostics

To begin with, it is best practice to set up diagnostics on all robot hardware at a minimum. Most of the drivers that ship with ROS already publish some form of diagnostics messages. The ROS diagnostics toolchain is not a computation graph level concept (like parameters, nodes, or topics), but is instead built on top of the /diagnostics topic.

Hardware drivers publish diagnostic_msgs/DiagnosticArray messages to the /diagnostics topic.

A DiagnosticArray contains a header (sequence number, timestamp, and frame_id) and an array of diagnostic_msgs/DiagnosticStatus messages. Each DiagnosticStatus contains:

  • byte level - One of three states (OK, WARN, ERROR), which represents the overall hardware health.
  • string name - The name of the device this DiagnosticStatus represents.
  • string hardware_id - A unique hardware identifier, possibly a serial number or UUID.
  • diagnostic_msgs/KeyValue[] values - An array of key/value pairs used to represent any additional pertinent information about the device. (For example "temperature":"35C", "frequency":"100Hz", "voltage":"24V")
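
To make the layout of these messages concrete, here is a minimal sketch in plain Python. It uses stand-in dataclasses rather than the real diagnostic_msgs types so that it runs without a ROS installation; the helper make_status, the hardware ID "sick-0001", and the flattened stamp field are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import List

# Level constants for the three states described above.
OK, WARN, ERROR = 0, 1, 2


@dataclass
class KeyValue:
    key: str
    value: str


@dataclass
class DiagnosticStatus:
    level: int
    name: str
    hardware_id: str
    values: List[KeyValue] = field(default_factory=list)


@dataclass
class DiagnosticArray:
    # Stand-in for the message header; only the timestamp is modeled here.
    stamp: float
    status: List[DiagnosticStatus] = field(default_factory=list)


def make_status(level, name, hardware_id, **values):
    """Hypothetical helper: build one status with string key/value pairs."""
    pairs = [KeyValue(key, str(value)) for key, value in values.items()]
    return DiagnosticStatus(level, name, hardware_id, pairs)


# One array carrying a single WARN status for an example laser scanner.
array = DiagnosticArray(
    stamp=0.0,
    status=[
        make_status(WARN, "SICK Temperature", "sick-0001",
                    temperature="35C", frequency="100Hz"),
    ],
)
```

In a real driver you would fill in the equivalent diagnostic_msgs types and publish the array on /diagnostics instead of building it by hand like this.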

Any node subscribing to this /diagnostics topic will receive the raw diagnostics messages (which can be overwhelming on a large system like the PR2).

To visualize raw diagnostics messages in ROS, you can currently use the runtime_monitor by simply running:

rosrun runtime_monitor monitor

More Background: The diagnostic_updater

The diagnostic_updater is not directly relevant to the aggregator, but it is an often-overlooked tool. It provides convenience functions for working with DiagnosticArray messages in your C++ hardware drivers.

With the diagnostic_updater libraries, you can create an object for interacting with DiagnosticArray messages, monitor the publishing frequency of a device, and check critical values in your hardware (temperature, voltage, etc.) against upper and lower limits.

This was mainly included in this write-up so that no one tries to reinvent what is already written.

The Diagnostic Aggregator

diagnostic_aggregator is a package for aggregating and analyzing diagnostics data.

Assuming that you have a working robotic system publishing raw diagnostic data to /diagnostics, you will see that the raw data accumulates quickly and becomes cumbersome to sort through. For this reason, we use the diagnostic_aggregator. It allows us to group and sort data into namespaces (much like the ROS computational graph). It also rate-limits the aggregated diagnostics output to ~pub_rate (typically 1 Hz).

From the wiki page, this can transform something like:

Left Wheel
Right Wheel
SICK Frequency
SICK Temperature
SICK Connection Status
Stereo Left Camera
Stereo Right Camera
Stereo Analysis
Stereo Connection Status
Battery 1 Level
Battery 2 Level
Battery 3 Level
Battery 4 Level
Voltage Status

Into something that is more readable, like:

My Robot/Wheels/Left
My Robot/Wheels/Right
My Robot/SICK/Frequency
My Robot/SICK/Temperature
My Robot/SICK/Connection Status
My Robot/Stereo/Left Camera
My Robot/Stereo/Right Camera
My Robot/Stereo/Analysis
My Robot/Stereo/Connection Status
My Robot/Power System/Battery 1 Level
My Robot/Power System/Battery 2 Level
My Robot/Power System/Battery 3 Level
My Robot/Power System/Battery 4 Level
My Robot/Power System/Voltage Status
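
The renaming shown above can be sketched in a few lines of plain Python. This is only an illustration of the grouping idea, not the aggregator's actual implementation or configuration format; the RULES table, the aggregate_name function, and the "Other" fallback group are assumptions made for this example.

```python
# Hypothetical rules: (substring to match, group name, text to strip from the leaf).
RULES = [
    ("Wheel", "Wheels", " Wheel"),
    ("SICK", "SICK", "SICK "),
    ("Stereo", "Stereo", "Stereo "),
    ("Battery", "Power System", ""),
    ("Voltage", "Power System", ""),
]


def aggregate_name(raw_name, base="My Robot"):
    """Map a raw status name to a grouped path using the first matching rule."""
    for contains, group, strip in RULES:
        if contains in raw_name:
            leaf = raw_name.replace(strip, "").strip() if strip else raw_name
            return f"{base}/{group}/{leaf}"
    # Names that match no rule are collected under a catch-all group.
    return f"{base}/Other/{raw_name}"


raw_names = ["Left Wheel", "SICK Temperature", "Stereo Left Camera", "Battery 1 Level"]
grouped = [aggregate_name(name) for name in raw_names]
```

Here grouped holds the four paths "My Robot/Wheels/Left", "My Robot/SICK/Temperature", "My Robot/Stereo/Left Camera", and "My Robot/Power System/Battery 1 Level".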

Additionally, each group is given a level, which lets you see at a glance where the errors are on your machine. ERROR and WARN levels propagate up the tree. For instance, an ERROR on Left propagates up to an ERROR on Wheels, and an ERROR on My Robot.
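
As a rough sketch of how that propagation behaves, the plain-Python function below gives every parent group the worst level found among its descendants. The function name rollup and the dictionary layout are assumptions for this example; the real aggregator computes group levels inside its analyzers.

```python
# Levels ordered by severity, so max() picks the worst one.
OK, WARN, ERROR = 0, 1, 2


def rollup(leaf_levels):
    """Return levels for all leaves plus every parent group above them."""
    levels = dict(leaf_levels)
    for path, level in leaf_levels.items():
        parts = path.split("/")
        # Walk each ancestor: "My Robot", then "My Robot/Wheels", and so on.
        for depth in range(1, len(parts)):
            parent = "/".join(parts[:depth])
            levels[parent] = max(levels.get(parent, OK), level)
    return levels


levels = rollup({
    "My Robot/Wheels/Left": ERROR,
    "My Robot/Wheels/Right": OK,
    "My Robot/SICK/Temperature": WARN,
})
```

With these inputs, "My Robot/Wheels" and "My Robot" both end up at ERROR, while "My Robot/SICK" only reaches WARN.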

This can then be inspected using the robot_monitor tool.

Why Should I Use This

Using /diagnostics is best practice on a robotic system of any scale. It makes troubleshooting hardware (and software) easier in almost all cases.

Using /diagnostics_agg is good practice on any larger robotic system. It is also good practice on any sort of production system, as it allows more flexibility and clarity when looking at diagnostics data.

Additionally, if the system is already set up to use aggregated diagnostics, the user may choose to write additional analyzer plugins for their system, further customizing diagnostic analysis.

Helpful Resources
