Best practices for reproducibility

melodic

asked 2019-11-26 20:09:52 -0500

pitosalas
628 ●116 ●149 ●169

Working with half a dozen Turtlebots and 14 students, we have a real problem with reproducibility. That is to say, things work and then stop working without us knowing what might have changed. I can think of many possible culprits:

some package that was inadvertently or automatically updated (changed)
slight hardware differences we might not even know about
timing or ordering of launching different nodes
draining batteries
and so on and so forth

I am sure we're far from the only ones with this problem. I suspect it's a question of software and hardware "hygiene". We've thought of some techniques to address it but haven't implemented them yet:

Have one authorized Linux image and install it on bare metal (ok)
Followed by a shell script that installs very specific versions of everything (hard, but may be doable)
Prohibit (via passwords?) anyone from installing or deinstalling anything (not sure how to do this)
Turn off all automatic update mechanisms (not sure how to do this)

My question is, how do you avoid this problem? What are your best practices? What are your tools?


Comments

It would be good if you could give some examples of what you feel are "problems with reproducibility".

Right now you only list (what you have identified as) potential causes with a list of potential solutions, but you don't really describe what the problems are you are running into.

Working/not working is too vague, and rather binary.

Turtlebots are real systems, closed-loop-ish controlled. If, for instance, you'd like each and every one of them to reach exactly the same spot in a map, then without some serious tweaking and calibration that is not going to happen.

gvdhoorn ( 2019-11-27 01:40:25 -0500 )

Hoi! When I say "not working" I mean something pretty fundamental. As an example, a student tells me they got something to work, but when they run it again to show it to me it doesn't work at all. The lidar stopped spinning for no apparent reason; the new navigation destination in rviz doesn't do anything; some weird error that I've never seen before shows up in the log. Sometimes rebooting the robot and re-running roscore etc. fixes it; sometimes power cycling the whole robot makes the problem go away.

pitosalas ( 2019-11-27 21:03:31 -0500 )

When I say "not working" I mean something pretty fundamental.

That may be, and it may be perfectly clear to you, but if you don't write these things down, we can't know, and so can't help you.

gvdhoorn ( 2019-11-28 01:58:37 -0500 )

I know what you’re saying. The very nature of irreproducibility is that it’s different every time, and that it’s hard for me to pin down exactly what went wrong. Let me refer back to my original question, which was not about solving a particular problem, but about asking experts like you for their best practices. The previous response I received actually had some specific and actionable practices. So, what is your advice for “good hygiene” in a scenario like mine? (e.g. always brush your teeth at least once a day, don’t drink coffee after 9pm, always look twice before crossing the road)

pitosalas ( 2019-11-29 17:18:40 -0500 )


answered 2019-11-26 23:34:09 -0500

stevemacenski

8272 ●34 ●503 ●129 https://www.linkedin.c...

Updated packages: Docker should help with bullet 1 if you give everyone the same Docker image to work with (a colleague, I'm sure, will come by and mention Singularity, which is another option I haven't personally explored, but what I've read about it is incredibly tempting).
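Whether or not you containerize, a small check along these lines could at least tell you when a package changed underneath you. This is only a minimal sketch, assuming Debian/Ubuntu machines where `dpkg-query` is available; `parse_versions`, `diff_versions`, and `installed_versions` are hypothetical helper names, not part of ROS or Docker.

```python
import subprocess


def parse_versions(text):
    """Parse 'name version' lines into a {name: version} dict."""
    versions = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            versions[parts[0]] = parts[1]
    return versions


def diff_versions(baseline, current):
    """Return (added, removed, changed) packages relative to the baseline."""
    added = {p: v for p, v in current.items() if p not in baseline}
    removed = {p: v for p, v in baseline.items() if p not in current}
    changed = {
        p: (baseline[p], current[p])
        for p in baseline
        if p in current and baseline[p] != current[p]
    }
    return added, removed, changed


def installed_versions():
    """Ask dpkg for every installed package and its version (Debian/Ubuntu only)."""
    result = subprocess.run(
        ["dpkg-query", "-W", "--showformat=${binary:Package} ${Version}\\n"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_versions(result.stdout)
```

The idea would be to save the output of `installed_versions()` once on a known-good robot, then run `diff_versions(saved, installed_versions())` whenever something that used to work stops working.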

Slight hardware differences: Can you be more specific about how this is causing you issues? In most cases I'm not sure how to get around that. If it's calibration-like parameters, you can keep a file on each robot's computer with the calibration results for that specific robot (or version control the files, keyed by robot ID).
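As a rough illustration of the per-robot calibration idea, here is a minimal sketch that assumes a single version-controlled JSON file keyed by robot ID, with the hostname used as the ID. The parameter names (`wheel_radius_scale`, `gyro_scale`) and the function `load_calibration` are made up for the example; use whatever your robots actually need.

```python
import json
import socket

# Hypothetical defaults; the real parameters depend on your robot model.
DEFAULT_CALIBRATION = {"wheel_radius_scale": 1.0, "gyro_scale": 1.0}


def load_calibration(path, robot_id=None):
    """Return calibration values for one robot, falling back to the defaults.

    The file is expected to map robot IDs to dicts of overrides.
    If robot_id is not given, the machine's hostname is used.
    """
    if robot_id is None:
        robot_id = socket.gethostname()
    with open(path) as f:
        per_robot = json.load(f)
    calibration = dict(DEFAULT_CALIBRATION)
    calibration.update(per_robot.get(robot_id, {}))
    return calibration
```

A robot with no entry in the file simply gets the defaults, so adding a new robot does not break anything until you have measured its values.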

Ordering of bringup: If in ROS2, I'd say lifecycle nodes. If in ROS1, and you're not too concerned about adding a few seconds to bringup, bash scripts.
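The same "start one thing, wait until it is really up, then start the next" pattern can be sketched in Python instead of bash. This is an illustration only: `wait_until` and `bring_up` are hypothetical helpers, and the start and readiness callables are left for you to fill in.

```python
import time


def wait_until(is_ready, timeout_s=30.0, poll_s=0.5):
    """Poll is_ready() until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if is_ready():
            return True
        time.sleep(poll_s)
    # One last check after the deadline has passed.
    return bool(is_ready())


def bring_up(steps, timeout_s=30.0, poll_s=0.5):
    """Run (name, start, is_ready) steps strictly in order.

    Each step is started, then polled until ready before the next one starts.
    Raises RuntimeError naming the first step that never becomes ready;
    otherwise returns the names of the steps in the order they came up.
    """
    started = []
    for name, start, is_ready in steps:
        start()
        if not wait_until(is_ready, timeout_s, poll_s):
            raise RuntimeError(f"{name} did not become ready within {timeout_s}s")
        started.append(name)
    return started
```

For ROS1, a step's `start` might launch `roscore` with `subprocess.Popen`, and its `is_ready` might check that `rostopic list` exits successfully. The point is that a failure names the step that did not come up, rather than leaving you with a silent half-started system.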

Draining batteries: Not sure what all I can say there - if batteries are getting old, replace them.

It sounds like you want some type of containerized environment. If you'd like to hide that from your students in case they're not well versed in it (which, yeah, I don't think any students would be), you can wrap the Docker pulls and getting into the session for them, and after that it's essentially just a terminal. Admittedly there's some learning curve, but if your students know ROS or are able to learn it in the course, Docker (or Singularity) isn't that big of a step.
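A wrapper like that could be as small as the sketch below. The image name, tag, and workspace path are placeholders I made up; the Docker flags used (`--rm`, `-it`, `--network host`, `-v`) are standard `docker run` options. Treat it as a starting point, not a finished tool.

```python
import subprocess

# Hypothetical image name and tag; pin an exact tag rather than "latest"
# so every student and robot runs the same environment.
IMAGE = "example-lab/turtlebot-course:2019-11"


def docker_run_command(image=IMAGE, workspace="/home/student/catkin_ws"):
    """Build the `docker run` argument list for an interactive course shell."""
    return [
        "docker", "run",
        "--rm",                            # remove the container when the shell exits
        "-it",                             # interactive terminal
        "--network", "host",               # share the host network with the container
        "-v", f"{workspace}:/workspace",   # mount the student's workspace
        image,
        "bash",
    ]


def enter_course_shell():
    """Pull the pinned image, then drop the student into a shell inside it."""
    subprocess.run(["docker", "pull", IMAGE], check=True)
    subprocess.run(docker_run_command())
```

Students would then run a single command that calls `enter_course_shell()` and never need to type a Docker command themselves.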

Comments

a colleague I'm sure will come by and mention Singularity

hah ;)

gvdhoorn ( 2019-11-27 01:38:20 -0500 )

Thanks for the great ideas. A few bits of feedback:

Docker: a good idea, and better, I think, than trying to maintain shell scripts. I think you were referring to using it on the students' computers, but there's no reason not to also use it on the robot itself, yes?
Singularity: I tried it once about 6 months ago and found it really difficult. I never got it working well enough to appreciate its value. At the time it seemed very 'rough'.

The students are all 'fairly' proficient with the shell, Linux, ROS and so on. It is true that it is often "ready, fire, aim" with them: too quick to try something and, if it works, not looking back and not worrying about why it worked.

See also my comments to my frequent correspondent @gvdhoorn

pitosalas ( 2019-11-27 20:58:36 -0500 )
