Part 1: Deciding which CI format to use

Over the past couple of years I’ve had to learn 3 different CI config formats:

  • Travis CI
  • CircleCI
  • GitLab CI

Now I “have to” learn GitHub Actions as well? Why isn’t there an industry standard so that I can carry my knowledge with me when moving from one service to the next? It feels like I’m on a treadmill, wasting time. So far, I haven’t seen any big advantage of one format over another that would justify the differences. In this part, I search the web for any universal, open-source, standardized format that might exist. Wait, is that what Jenkins is for? I never quite understood what Jenkins was…

TASK: Find a universal CI config format if it exists
TASK_ID: afdb8ee2feb8be6db7925837828dc2c0
CREATED: 2022-09-18 19:11

Will use GitHub Actions

TASK_ID: afdb8ee2feb8be6db7925837828dc2c0

In the end, it seems Jenkins isn’t useful for me because I want to store the configuration in git rather than use some fiddly web UI. As far as I can tell, no universal system exists, and for now it is easiest simply to use GitHub Actions and learn yet another config format.

Part 2: Run manager tests in CI

I set up the tests to run in GitHub Actions using a NixOS Docker image. It turns out that there are multiple ways of running tests in containers on GitHub Actions. One way is to set up a job that runs in a container. Unfortunately, I was not able to use the git checkout step with this method. Another way is to create a step in a job that has the uses: attribute set to the name of a Docker image. This worked fine. However, not all of the tests passed. Three tests broke because I was running as root in the container, and one test broke for some other, unknown reason.

TASK: Set up CI to test manager
TASK_ID: e59db446d76a2ab972abb1bfab616376
CREATED: 2022-09-01 18:49

TASK: Set up CI to run tests conditionally based on changes
TASK_ID: 8c2b82fd591898b1807aeee26c793d7e
CREATED: 2022-09-01 18:52

Turns out this is very easy with GitHub Actions

TASK_ID: e59db446d76a2ab972abb1bfab616376

In the next part, I will create a new image for running the tests, which will have a proper user configured so that the file permissions tests will work.
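To illustrate why running as root can break this kind of test, here is a minimal sketch. It assumes a Unix system, and `read_is_denied` is a hypothetical helper, not code from the manager project. A file with no permission bits set normally refuses to open, but root typically bypasses the mode bits, so a test asserting "permission denied" passes for a normal user and fails in a root container.

```rust
use std::fs::{self, File};
use std::io;
use std::os::unix::fs::PermissionsExt;
use std::path::Path;

/// Hypothetical helper (not from the manager code base): writes a file,
/// strips every permission bit, and reports whether opening it for
/// reading is refused with `PermissionDenied`.
fn read_is_denied(path: &Path) -> io::Result<bool> {
    fs::write(path, b"secret")?;
    fs::set_permissions(path, fs::Permissions::from_mode(0o000))?;
    let outcome = File::open(path);
    // Clean up before inspecting the result so no file is left behind.
    fs::remove_file(path)?;
    match outcome {
        // Typical when running as root: the mode bits are bypassed.
        Ok(_) => Ok(false),
        // Typical for a normal user: the open is refused.
        Err(e) if e.kind() == io::ErrorKind::PermissionDenied => Ok(true),
        Err(e) => Err(e),
    }
}
```

A test that asserts `read_is_denied(...)` returns `true` would therefore depend on which user the CI container runs as.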

Part 3: Setting up a custom Docker image for running CI tests in

TASK: Set up docker image with normal user in CI pipeline
TASK_ID: 8886c40d54bf08d3ef40ae5d7207ebf6
CREATED: 2022-09-20 19:06

Turns out it's not needed

TASK_ID: e59db446d76a2ab972abb1bfab616376 8886c40d54bf08d3ef40ae5d7207ebf6

So I set up a custom Docker image:

FROM nixos/nix
RUN nix-shell -p busybox --command "adduser test -D -h /github/home"
USER test

But it didn’t work. NixOS doesn’t like it when you don’t do things the nix(tm) way. I got an error when I ran nix-shell in the image:

nix-shell --command "cd manager ; cargo test"
error: could not set permissions on '/nix/var/nix/profiles/per-user' to 755: Operation not permitted

I didn’t know what to do. Should I stop using a Dockerfile, do things the nix(tm) way, and use buildLayeredImage in some sort of CI-specific flake? I found a nice tutorial for doing something like that, but I couldn’t figure out how to do the very thing I needed: set up a normal user in the container using buildLayeredImage. Indeed, I couldn’t find any real documentation at all…

So I asked on the NixOS Matrix channel and got some other suggestions. One was to use the install-nix-action for GitHub Actions. The other was to use sourcehut’s NixOS VMs. Since I didn’t really want to switch to sourcehut at the moment, I decided to go with the first option. I was also pointed to this tutorial, which will supposedly help me do just that.

In that guide I learned that it is even possible to cache the results of building the nix environment, which sounds amazing. You need a cachix account to do that though, and I don’t want to set one up just yet.

TASK: Set up cachix account and configure caching
TASK_ID: f1bf9f508f8193dcf6baa516c4f11033
CREATED: 2022-10-16 19:49
ESTIMATED_TIME: U1
MILESTONES: fast-ci

Part 4: cargo fmt

In the first part of “part 4”, I set up cargo fmt and went to push it to git, but for some reason the network wasn’t working in my VM.

TASK: Set up CI to do `cargo fmt`
TASK_ID: c4cea87b7e9a0db374d6679570555e08
CREATED: 2022-09-01 18:51
MILESTONES: mvp ci

TASK_ID: c4cea87b7e9a0db374d6679570555e08

I ended up having to restart my host machine to get it to function.

Part 5: cargo fmt try #2

TASK: Set up precommit hook to do `cargo fmt` everywhere
TASK_ID: f7b43334d359dd3d2aa47c3c28fbece4
CREATED: 2022-09-01 18:51

TASK_ID: c4cea87b7e9a0db374d6679570555e08 f7b43334d359dd3d2aa47c3c28fbece4

This time things went smoothly, though apparently you can’t pass an array of commands to run in a GitHub Action :/.

Part 6: Figuring out the flaky test

Back in part 3 I found a flaky test, but didn’t know why it was failing. By the end of the screencast, however, the test was passing, so I thought that there had been some config problem that had been solved. But it turns out this was not the case. So now, I need to find out why the test is flaky.

TASK: Figure out why the test ageing_cellar::organize_sockets_dir::tests::test_socket_dir_old_socket is flaky
TASK_ID: a70acc872494bb716e620fa735fd8eed
CREATED: 2022-10-23 16:01
ESTIMATED_TIME: W4
DONE
MILESTONES: ci mvp

An assert is failing in the test, so I’m getting:

ageing_cellar::organize_sockets_dir::tests::test_socket_dir_old_socket stdout ----
thread 'ageing_cellar::organize_sockets_dir::tests::test_socket_dir_old_socket' panicked at 'assertion failed: `(left == right)`
  left: `1`,
 right: `0`', src/ageing_cellar/
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

TASK_ID: a70acc872494bb716e620fa735fd8eed

In the end, the flaky test started passing again, so I just added a bit of debug output to the assert command, and otherwise didn’t actually figure out the problem.
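As a rough illustration of what “debug output in the assert” can look like, here is a minimal sketch. The helper `remaining_entries` and the test itself are hypothetical stand-ins, not the real ageing_cellar code. The point is only that `assert_eq!` accepts a format string after the two values, so a failure prints what was actually found instead of just `left: 1, right: 0`.

```rust
use std::fs;
use std::path::Path;

/// Hypothetical helper: returns the sorted file names found in `dir`.
fn remaining_entries(dir: &Path) -> Vec<String> {
    let mut names: Vec<String> = fs::read_dir(dir)
        .expect("directory should be readable")
        .map(|entry| {
            entry
                .expect("entry should be readable")
                .file_name()
                .to_string_lossy()
                .into_owned()
        })
        .collect();
    names.sort();
    names
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn empty_dir_has_no_entries() {
        let dir = std::env::temp_dir()
            .join(format!("debug_assert_demo_{}", std::process::id()));
        std::fs::create_dir_all(&dir).unwrap();
        let left_over = remaining_entries(&dir);
        // The extra arguments are only formatted when the assertion fails.
        assert_eq!(
            left_over.len(),
            0,
            "expected {:?} to be empty, but found: {:?}",
            dir,
            left_over
        );
        std::fs::remove_dir_all(&dir).unwrap();
    }
}
```

If the test turns flaky again, the failure message at least names the unexpected entries.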

Part 7: Code coverage

cargo tarpaulin works just fine. I could set up coveralls or codecov, but I’ve never really found their (remote) output necessary. It is perfectly easy to examine coverage directly in emacs or VSCode locally, so I don’t need to waste my time with yet another proprietary SaaS product that needs to be managed… For now I’ve set the coverage cliff to 78%, but I’d like it to be 100%.

TASK: Set up CI to show code coverage
TASK_ID: 47c7ff403b446e8b42a87401d35fd450
CREATED: 2022-09-01 18:52

TASK: Set up CI to test kcf code
TASK_ID: 216868ec2a5f6adf295dd6688737c56c
CREATED: 2022-09-01 18:53

TASK: Set up CI to ensure kcf code is black
TASK_ID: 0f0a4c7c0df8a6683b9f292c3cc0c5f5
CREATED: 2022-09-01 18:53

TASK_ID: 47c7ff403b446e8b42a87401d35fd450 216868ec2a5f6adf295dd6688737c56c 0f0a4c7c0df8a6683b9f292c3cc0c5f5

So configuring tarpaulin seemed simple enough once I dealt with that one flaky test (which fails every time with tarpaulin). But once I put it in the GitHub Action, I got this weird problem: another test was timing out.

ageing_cellar::clean_socket_dir_and_kill_orphan_services_interactively::tests::test_term_than_ignore ... Oct 23 18:58:58.864 ERROR cargo_tarpaulin: Failed to get test coverage! Error: Failed to run tests: Error: Timed out waiting for test response
Error: "Failed to get test coverage! Error: Failed to run tests: Error: Timed out waiting for test response"
Error: Process completed with exit code 1.

No idea why, given that the test works both locally (even with tarpaulin), and in the github action runner (without tarpaulin) :O.

Well, after hitting my head against the wall for a while, it seems this is a known bug. It seems that, for now, I need to use #[ignore] to skip those tests when running in tarpaulin, and then cargo test -- --ignored to run them in a separate CI step…
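A minimal sketch of that workaround, with hypothetical function and test names rather than the real ageing_cellar tests: a test marked `#[ignore]` is skipped by a plain `cargo test`, and as far as I can tell also by a default tarpaulin run, but can still be run explicitly in its own step.

```rust
/// Hypothetical stand-in for the code under test.
fn double(x: u32) -> u32 {
    x * 2
}

#[cfg(test)]
mod tests {
    use super::*;

    // Runs in every test invocation, including under coverage.
    #[test]
    fn runs_everywhere() {
        assert_eq!(double(2), 4);
    }

    // Skipped unless ignored tests are requested explicitly.
    // The reason string is shown in the test runner's output.
    #[test]
    #[ignore = "times out under cargo-tarpaulin"]
    fn times_out_under_tarpaulin() {
        assert_eq!(double(21), 42);
    }
}
```

The separate CI step would then run `cargo test -- --ignored`. Note the `--` separator: `--ignored` is an option of the test binary, not of cargo itself.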

Since I have to wait a while for the CI to re-run each time I bang my head against the wall with a flaky test, I managed to set up the CI flow for the kcf tools in the meantime. That was mostly problem free :) (except for a couple more tarpaulin non-compatible tests…).