Robotics StackExchange | Archived questions

Custom Buildfarm - Nodes not automatically added

Hello, dear ROS-community.

tl;dr: When running the buildfarm_deployment scripts, some machines should normally be added to Jenkins automatically, labeled e.g. "agent_on_master" or similar. This does not happen with my setup, and I suspect a configuration error.

Over the last couple of weeks, I have tried to set up a customized buildfarm for my company, following the official buildfarm_deployment documentation at https://github.com/ros-infrastructure/buildfarm_deployment and the configuration at https://github.com/ros-infrastructure/buildfarm_deployment_config

After investing a lot of time and reading a lot of forum posts, I decided to bring my problem to you and find out whether there is any chance of getting it done at all.

Goal: We want to set up a Jenkins server that does CI for our packages, including fully automated polling for new commits to certain packages. We figured this should be doable with the ROS buildfarm, its customizable features and the configurable jobs of the ros_buildfarm_deployment.

Setup: As described in the documentation, we created three virtual machines, one for each role (master, repository, agent). These should specifically not be LXC containers (interesting fact: Docker does not seem to work properly from inside LXC). The machines have the specified resources in terms of CPU, memory and disk space. They run Ubuntu 16.04.5 LTS (xenial).

Step by step: I changed the files in the buildfarm configuration as specified, put the buildfarm_deployment_config folder on all three machines and ran the setup, including installation and reconfiguration. No Puppet errors were written to the terminal when the ./reconfigure.bash command finished.

Error and guesswork: I made the configuration without professional knowledge of Puppet, Jenkins etc., so my guess is that there is some stupid mistake in those configs which leads to my error. The agent and repo machines do not show up in the Jenkins UI, neither on the left side in the build executor status nor in the node overview. This causes problems when continuing the setup with the official ros_buildfarm tooling from https://github.com/ros-infrastructure/ros_buildfarm (I would have liked to show you an image of the Jenkins overview, but that requires at least 5 karma, for spam protection reasons I guess...)

Why this leads me to post here is also quite easy to understand: the ros-buildfarm-deployment creates lots of different jobs later on, and these usually only work when the matching machines are present. Creating the right nodes manually unfortunately does not work well (it led to an authentication error when trying to activate the node). I would like the whole buildfarm_deployment process to finish correctly before I move on to figuring out the different jobs (devel job, pre-release job, etc.).

At this point, I would really love to just attach the config files to this post, but as a new user, I can't do that. Instead, I cropped the comments and edited out the specific addresses and keys, while adding some comments of my own on things I would like to know more about (%%% ......%%%). Sorry for the incoming ugly wall of text. The key in common.yaml is the public key matching the private one in master.yaml, so a mismatch between them should not be the problem. But I don't know where this key needs to be installed: which user on which of those machines should have the keys in their .ssh/ directory? Also, does it matter where the reconfiguration is run from, and as which user?
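The statement that the public key in common.yaml belongs to the private key in master.yaml can be double-checked rather than assumed. The following is only a minimal sketch: it assumes the private key has been saved to a local file, that `ssh-keygen` is installed (its `-y -f <file>` options print the public key of a private key file), and that the `key:` value in common.yaml holds just the base64 body of the public key. The helper names are hypothetical and not part of buildfarm_deployment.

```python
import subprocess


def public_key_body(public_key_line: str) -> str:
    """Return the base64 body of an OpenSSH public key line.

    Such a line has the form '<type> <base64-body> [comment]'.
    """
    fields = public_key_line.split()
    if len(fields) < 2:
        raise ValueError("expected '<type> <base64-body> [comment]'")
    return fields[1]


def derive_public_key(private_key_path: str) -> str:
    """Ask ssh-keygen to print the public key for a private key file."""
    result = subprocess.run(
        ["ssh-keygen", "-y", "-f", private_key_path],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def keys_match(private_key_path: str, yaml_key_value: str) -> bool:
    """Compare the derived public key body with the 'key:' value from the YAML."""
    derived = public_key_body(derive_public_key(private_key_path))
    return derived == yaml_key_value.strip()
```

Calling `keys_match("/path/to/private_key", "<value of key: in common.yaml>")` (both arguments are placeholders) returns `True` only if the two belong together. Note that `ssh-keygen` may refuse to read a private key file whose permissions are too open.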

common.yaml

jenkins::slave::ui_user: 'admin'

jenkins::slave::ui_pass: '#jbcrypt.......5q' %%% I used the jbcrypt script to hash my password for access to jenkins and put the hash in here, is that the correct way? %%%

jenkins::swarm_version: &jenkins_swarm_version '3.14'

jenkins::slave::version: *jenkins_swarm_version

jenkins::slave::masterurl: 'http://master:8080'

master::ip: ip_of_master

repo::ip: ip_of_repo

timezone: 'Europe/Berlin'

ssh_keys:
    'jenkins-agent@agent':
        type: ssh-rsa
        key: AAA...7nNQ==
        user: jenkins-agent
        require: User[jenkins-agent]

ssh_host_keys: %%% I'm not sure about this one, do I need to specify addresses here for repo, agent and master? %%%
    repo: ssh-ed25519 A...Gm
    agent: ssh-ed25519 A...YI
    master: ssh-ed25519 A...Hi

autoreconfigure: false

master.yaml

user::admin::name: admin

user::admin::password_hash: '#jbcrypt:......5q'  %%% Just like above, jbcrypt script to hash my password for access to jenkins and put the hash in here. So this is the same user and password as in common.yaml %%%

credentials::jenkins-slave::username: jenkins-agent

credentials::jenkins-slave::id: 1e7d4696-7fd4-4bc6-8c87-ebc7b6ce16e5 %%% I don't know about this one, it does not seem like it needs to be changed? %%%

credentials::jenkins-slave::passphrase: 4lRsx/NwfEndwUlcWOOnYg== %%% It says that this is put in for an "empty" passphrase. Should be fine? %%%

jenkins::private_ssh_key: 
    -----BEGIN RSA PRIVATE KEY-----
    MI...mYQ=
    -----END RSA PRIVATE KEY-----

repo.yaml

%%% The following PGP keypair was created with GPG2, so there is no "Version" Tag in the block. %%%

jenkins-agent::gpg_key_id: 'ECD25CDA426D0B66'

jenkins-agent::gpg_private_key: |

BEGIN .......=3OFT......BLOCK-----

jenkins-agent::gpg_public_key: |

BEGIN ........dzve...BLOCK-----

%%% What does the "signing_key" specifically mean in the blocks below? Is it just the key to sign the repo-content with? Then it should be the ID of above, right? %%%

jenkins-agent::reprepro_updater_config: |
    [ubuntu_building]
    architectures: amd64 arm64 armhf i386 source
    distros: xenial
    repository_path: /var/repos/ubuntu/building
    signing_key: ECD25CDA426D0B66
    upstream_config: /home/jenkins-agent/reprepro_config

    [ubuntu_testing]
    %%% Same as above except for the name and /testing in repository path %%%

    [ubuntu_main]
    %%% Same as above except for the name and /main in repository path %%%

%%% As we will probably only build our own packages and their respective dependencies, I'm not sure about the whole upstream import thing. That also means that this whole section is probably done wrong. %%%

jenkins-agent::reprepro_config:
    '/home/jenkins-agent/reprepro_config/ros_bootstrap.yaml':
        ensure: 'present'
        content: |
            name: ros_bootstrap
            method: http://repos.ros.org/repos/ros_bootstrap
            suites: [xenial]
            component: main
            architectures: [i386, amd64, arm64, armel, armhf, source]
            verify_release: blindtrust

No changes were made to hiera.yaml or agent.yaml, as there did not seem to be a need for that. I also found a strange error line near the end of /var/log/puppet.log, written as the deployment process with ./reconfigure.bash finished, which I sadly can't make any sense of:

2019-05-10 12:28:06 +0200 Puppet (err): java -jar /usr/share/jenkins/jenkins-cli.jar -s http://127.0.0.1:8080 -auth admin:#jbcrypt:$2a...5q groovy = < /tmp/configure_git_user.groovy returned 255 instead of one of [0]

2019-05-10 12:28:06 +0200 /Stage[main]/Profile::Jenkins::Master/Rosjenkins::Groovy[/tmp/configure_git_user.groovy]/Exec[/tmp/configure_git_user.groovy]/returns (err): change from notrun to 0 failed: java -jar /usr/share/jenkins/jenkins-cli.jar -s http://127.0.0.1:8080 -auth admin:#jbcrypt:$2...5q groovy = < /tmp/configure_git_user.groovy returned 255 instead of one of [0]

(I also cropped out the jbcrypt password here)


I hope there is some obvious mistake to be corrected about the whole subject. Thank you for reading and your patience.

Asked by StSt_Robotics on 2019-05-10 08:24:58 UTC

Comments

Your post is quite long. I understand you feel you need to provide sufficient information, and I thank you for that, but if possible, please summarise your main questions at the top of your question.

That will provide a "reading guide" if you will for people reading the rest of your question, or, if they'd already know the answer, allow them to skip the rest of your question.

Asked by gvdhoorn on 2019-05-10 08:42:43 UTC

I added a tl;dr, which I hope makes the actual question clearer. Thanks for your suggestion.

Asked by StSt_Robotics on 2019-05-13 04:25:30 UTC

When a dependency of a puppet resource fails, that resource is not even attempted. If your agent_on_master is missing, we should figure out which resources failed to configure properly.

> There were no puppet-errors written to the terminal when the ./reconfigure.bash command was finished.

Puppet isn't always clear about when it errors. There is a PR to improve the user experience of finding errors in the configuration process https://github.com/ros-infrastructure/buildfarm_deployment_config/pull/48

The configure_git_user.groovy script runs to set a git user configuration for Jenkins's git operations. If there are no other (err) lines in your puppet.log, then that one must have stopped the agent_on_master from configuring properly. If there are other (err) lines, can you please share them as well?

Asked by nuclearsandwich on 2019-05-26 10:04:33 UTC
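The check requested in the last comment, looking for further (err) lines in puppet.log, could be done along these lines. This is only a minimal sketch: it assumes the log lives at /var/log/puppet.log as stated in the question and that failed resources are marked with the literal text `(err)`, as in the two lines quoted there. The function names are hypothetical.

```python
def find_err_lines(lines):
    """Return (line_number, text) pairs for lines that carry the (err) marker."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if "(err)" in line
    ]


def scan_log(path="/var/log/puppet.log"):
    """Read the Puppet log at the given path and collect its (err) lines."""
    with open(path, errors="replace") as handle:
        return find_err_lines(handle)


def print_report(path="/var/log/puppet.log"):
    """Print every (err) line together with its line number in the log."""
    for number, text in scan_log(path):
        print(f"{number}: {text}")
```

Running `print_report()` on the master lists every failed resource in one place, which makes it easier to see whether the groovy script failure is the only error or just the last of several.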

Answers