Robotics StackExchange | Archived questions

Bloom takes ~15 minutes to pull down rosdistro

I've noticed this when releasing packages for noetic. It looks like it is downloading about 140 megabytes, and GitHub is letting me download at 50 to 200 KB/sec. I'm on a gigabit connection and have a fast connection to other sites.

Open a pull request from 'jonbinney/rosdistro:bloom-laser_proc-0' into 'ros/rosdistro:master'?
Continue [Y/n]? 
==> git checkout -b bloom-laser_proc-0
Switched to a new branch 'bloom-laser_proc-0'
==> Pulling latest rosdistro branch
remote: Enumerating objects: 9, done.
remote: Counting objects: 100% (9/9), done.
remote: Compressing objects: 100% (7/7), done.
Receiving objects:   7% (10814/140559), 3.65 MiB | 69.00 KiB/s   

Anyone know what's going on? Is the rosdistro really just that big? Any way to speed this up?

Asked by jbinney on 2020-04-24 14:23:13 UTC

Comments

I haven't seen 15 minutes before, but 1-3 minutes is typical for me, which is still a little irritating. I think it's because of the crazy number of diffs it's applying from 41,000+ commits. Perhaps we should squash the first 30,000 into 1 to help.

Asked by stevemacenski on 2020-04-24 16:03:37 UTC
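For illustration only, a rough sketch of what such a squash could look like with plain git; the SHA_OF_COMMIT_30000 placeholder is hypothetical, and this rewrites published history, so it would require a force push and break existing clones:

# create a single root commit whose tree matches the 30,000th commit
git checkout --orphan squashed-base SHA_OF_COMMIT_30000
git commit -m "Squash first 30,000 rosdistro commits"
# replay everything after that commit on top of the squashed base
git rebase --onto squashed-base SHA_OF_COMMIT_30000 master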

Interestingly, I can clone the rosdistro repo in 30 seconds. Not sure why "Pulling latest rosdistro branch" takes so long.

Asked by jbinney on 2020-04-24 18:22:37 UTC

Seems to be a GitHub issue. I just reproduced it by cloning using an oauth token; it ran very slowly. Then I tried again, and it ran quickly. It looks like GitHub has had some "degradation" events on its status page.

Asked by jbinney on 2020-04-24 18:33:35 UTC

Huh, interesting. Perhaps my internet is just crap then, or 30 seconds feels like a small eternity these days.

Asked by stevemacenski on 2020-04-24 21:48:33 UTC

In my experience, most of the "why does X take so long with tool Y" questions, where Y uses some part of GitHub, come down to GitHub either intentionally throttling or having some sort of transient problem.

It's like that with wstool, rosdep, Bloom and some of their friends.

Asked by gvdhoorn on 2020-04-25 07:11:50 UTC

There was an attempt to use shallow clones (https://github.com/ros-infrastructure/bloom/pull/538), but it didn't end up working for the contributor. @tfoote and I recently had a casual discussion about the sustainability of the GitHub-as-a-database approach used by the official rosdistro; the conclusion was that it can't last forever without some accommodations, but there's no current plan to change it. One thing that might be worth doing is adding special-case behavior so that, when the target rosdistro index is hosted on GitHub, both the content change and the pull request are made via the GitHub API, using either the repository contents API (https://developer.github.com/v3/repos/contents/) or the Git Data API (https://developer.github.com/v3/git/), falling back to a local clone strategy only when that fails.

Asked by nuclearsandwich on 2020-04-25 09:55:17 UTC
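A minimal sketch of what that API-based flow could look like, using curl against the v3 REST endpoints linked above; the repository names, branch name, file path, and the GITHUB_TOKEN / OLD_FILE_SHA variables are illustrative assumptions, not what bloom actually does:

# push the edited distribution file to a branch on the user's fork via the
# repository contents API (the branch must already exist, e.g. created via the
# Git Data refs API; OLD_FILE_SHA is the blob sha of the current file, as
# returned by a GET on the same URL)
CONTENT=$(base64 -w0 noetic/distribution.yaml)
curl -X PUT \
  -H "Authorization: token $GITHUB_TOKEN" \
  -H "Accept: application/vnd.github.v3+json" \
  -d "{\"message\": \"laser_proc: release into noetic\", \"branch\": \"bloom-laser_proc-0\", \"sha\": \"$OLD_FILE_SHA\", \"content\": \"$CONTENT\"}" \
  https://api.github.com/repos/jonbinney/rosdistro/contents/noetic/distribution.yaml

# then open the pull request against ros/rosdistro, again with no local clone
curl -X POST \
  -H "Authorization: token $GITHUB_TOKEN" \
  -H "Accept: application/vnd.github.v3+json" \
  -d '{"title": "laser_proc release", "head": "jonbinney:bloom-laser_proc-0", "base": "master"}' \
  https://api.github.com/repos/ros/rosdistro/pulls

A clone-based fallback would still be needed for rosdistro indexes not hosted on GitHub, as the comment above notes.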

Just to make sure someone hears this: whatever we end up doing, it would be really nice to make sure we keep history intact. So the squash suggested by @stevemacenski would not be what I would like to see happen.

Future software archeology (like we did with rosinstall_generator_time_machine) is made really difficult with such operations.

+1 to see whether GH's APIs could be used for this.

Asked by gvdhoorn on 2020-04-25 10:00:25 UTC

One alternative to shallow clones might be to have bloom use the "--reference" option when cloning rosdistro. Then there could be one long-lived clone in "~/.config/bloom/rosdistro" and only new commits would be pulled down each time the user made a release.

Asked by jbinney on 2020-04-25 12:46:43 UTC
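A sketch of how that could work, assuming the long-lived clone lives at ~/.config/bloom/rosdistro as suggested above (the working-clone path is illustrative):

# one-time setup: keep a local mirror of rosdistro
git clone --mirror https://github.com/ros/rosdistro.git ~/.config/bloom/rosdistro
# before each release, only new commits need to come over the network
git -C ~/.config/bloom/rosdistro fetch
# the working clone then borrows objects from the local mirror instead of
# re-downloading ~140 MB of history; --dissociate copies the borrowed objects
# so the working clone stays valid even if the mirror is later pruned
git clone --reference ~/.config/bloom/rosdistro --dissociate \
    https://github.com/ros/rosdistro.git /tmp/bloom-rosdistro-work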

Just to make sure someone hears this: it would be really nice to make sure we keep history intact

I want to avoid getting too far into it as it's somewhat of a derailment. But I do hear you. As I said, we have no plan to make a change. I believe we have plenty of time before the repo collapses under its own weight. Whatever options we explore will likely be posted for public comment. I can't foresee a scenario where release history is not publicly auditable.

Asked by nuclearsandwich on 2020-04-27 08:55:24 UTC

Answers