Have `nproc` accurately report the number of CPUs available to the container
Simen Bekkhus
When spinning up multiple processes for parallel computing, it's useful to restrict the number of processes to the number of CPU cores available on the system in order to actually take advantage of those cores. However, as far as I can tell, it's impossible to determine that number programmatically instead of hard-coding the value.
For instance, running `node -p "require('os').cpus().length"` in a Node.js container returns 32. This is the same as on Travis CI. But if we run `nproc` on Travis CI, we get the correct number (2) back, whilst on CircleCI we still get 32.

So this is a feature request to make sure `nproc` returns the number of cores available to the container, and not the machine the container is running on.

---
An even better solution for my problem would be if the number of cores available to a build was injected as an environment variable, that way I can avoid doing a system call from my code. I can open up a separate feature request for that if you want?
CCI-I-578
Ben Limmer
This one-liner from https://discuss.circleci.com/t/environment-variable-set-to-the-number-of-available-cpus/32670/4 seems to work:
echo $(($(cat /sys/fs/cgroup/cpu/cpu.shares) / 1024))
Having this "just work", or having the environment variable, would be much preferred IMO.
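For what it's worth, that cpu.shares path only exists on cgroup v1. On images that mount cgroup v2, the quota instead lives in /sys/fs/cgroup/cpu.max as "quota period" (or "max" when unlimited). Here's a rough sketch that handles both layouts; the file paths are the standard kernel ones, but whether a given CircleCI image uses v1 or v2 is an assumption you'd need to check:

```shell
#!/bin/sh
# Sketch: derive a CPU count from either cgroup layout.
# On cgroup v1 the quota is -1 when unlimited; on v2 it is the word "max".

cpus_from_cpu_max() {
  # $1 is the contents of cgroup v2's cpu.max, e.g. "200000 100000" or "max 100000"
  quota=${1%% *}
  period=${1##* }
  if [ "$quota" = "max" ]; then
    nproc  # no cgroup limit; fall back to the host-wide count
  else
    echo $((quota / period))
  fi
}

if [ -f /sys/fs/cgroup/cpu.max ]; then
  # cgroup v2
  cpus_from_cpu_max "$(cat /sys/fs/cgroup/cpu.max)"
elif [ -f /sys/fs/cgroup/cpu/cpu.cfs_quota_us ]; then
  # cgroup v1
  quota=$(cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us)
  period=$(cat /sys/fs/cgroup/cpu/cpu.cfs_period_us)
  if [ "$quota" -lt 0 ]; then nproc; else echo $((quota / period)); fi
else
  nproc
fi
```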
Sebastian Lerner
Hi folks, I'm a product manager at CircleCI. While we do not have an out-of-the-box solution for this yet, I wanted to share a workaround for our Docker resource classes that we've piloted internally that seems to work well for this use case.
For context, the Docker resource classes use cgroups under the hood to manage the resources granted to the container (as Kyle points out in his post above). It is possible to use a cgroup-aware mechanism to calculate the number of CPUs. Below is an example of how to do it in Node:
const fs = require('fs/promises');
const os = require('os');

// Derive the CPU count from the cgroup v1 CFS quota and period.
async function cgroup_cpu_count() {
  const quota_s = await fs.readFile('/sys/fs/cgroup/cpu/cpu.cfs_quota_us', 'utf8');
  const period_s = await fs.readFile('/sys/fs/cgroup/cpu/cpu.cfs_period_us', 'utf8');
  const quota = parseInt(quota_s, 10);
  const period = parseInt(period_s, 10);
  return quota / period;
}

// Prefer the cgroup-aware count; fall back to the (host-wide)
// os.cpus() count when the cgroup files aren't available.
async function cpu_count() {
  try {
    await fs.stat('/proc/self/cgroup');
    return cgroup_cpu_count();
  } catch {
    return os.cpus().length;
  }
}

cpu_count().then(console.log);
Peter Darton
Sebastian Lerner: Thanks for that; I've given it a try, but I've found that it seems to give an answer that's twice the CPU count I'd expect.
E.g. a "large" Docker container gives an answer of 8 whereas, officially, a large container only has 4 CPUs; a "medium" one answers 4 instead of 2; etc.
Sebastian Lerner
Peter Darton: Hey Peter, this is, at the moment, expected behavior with our platform. Due to the way we pack workloads onto the underlying compute that runs Docker jobs, surplus CPUs very occasionally exist. Instead of letting them go unused, we grant them to workloads on a random basis.
This is not something that is guaranteed to continue, nor is there a deterministic way of knowing whether you will receive those surplus CPUs ahead of time.
Peter Darton
Sebastian Lerner: Thanks for the explanation. I will code accordingly...
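Given that answer, one defensive option (my own sketch, not an official CircleCI mechanism) is to clamp the cgroup-derived count to the CPU count documented for your resource_class, passed in via a variable you set yourself in config.yml. RESOURCE_CLASS_CPUS below is a made-up name for that variable:

```shell
#!/bin/sh
# Sketch: never assume more CPUs than the resource_class officially grants,
# even if the cgroup files report surplus cores.
# RESOURCE_CLASS_CPUS is a hypothetical env var you would set in config.yml,
# e.g. RESOURCE_CLASS_CPUS=4 for a "large" Docker job.

clamp_cpus() {
  detected=$1                 # what the cgroup files report
  documented=${2:-$detected}  # official size; defaults to detected if unset
  if [ "$detected" -gt "$documented" ]; then
    echo "$documented"
  else
    echo "$detected"
  fi
}

# Example: cgroups report 8 CPUs, but a "large" job officially has 4.
clamp_cpus 8 "${RESOURCE_CLASS_CPUS:-4}"
```

This way a surprise surplus core never changes how many workers you spawn, so builds stay deterministic.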
Peter Darton
For the benefit of anyone else finding this, this is what I ended up doing to "make it work".
(Making it "work everywhere" without copy/pasting code everywhere is another problem ... maybe a PR on an orb somewhere)
In my top-level package.json's "scripts" section, I have:

"scripts": {
  "build": "lerna run build --",
  "test": "lerna run test --",
  "ci:build": ".circleci/ci_lerna.sh run ci:build --",
  "ci:test": ".circleci/ci_lerna.sh run ci:test --"
},
where "build" and "test" are targets that devs can invoke locally, whereas CircleCI's config.yml calls the "ci:build" and "ci:test" versions (via the circleci/node orb's "run" job, which runs npm run ...).
I've then got the following script saved as .circleci/ci_lerna.sh:

#!/bin/bash
set -eu
set -o pipefail

function weAreOnDocker() {
  grep -q docker </proc/1/cgroup >/dev/null 2>&1 && [ -f /sys/fs/cgroup/cpu/cpu.cfs_quota_us ]
}

function calcConcurrencyForDocker() {
  local quota
  local period
  quota=$(cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us)
  period=$(cat /sys/fs/cgroup/cpu/cpu.cfs_period_us)
  echo $((quota / period))
}

if [ -n "${LERNA_CONCURRENCY:-}" ]; then
  echo "$0: env var LERNA_CONCURRENCY is set to ${LERNA_CONCURRENCY}"
elif weAreOnDocker; then
  LERNA_CONCURRENCY="$(calcConcurrencyForDocker)"
  echo "$0: we are on docker with ${LERNA_CONCURRENCY} CPUs"
else
  echo "$0: we are using defaults, $(lerna --help | grep ' --concurrency ' | sed -e 's,^.*\([[]default:.*\),\1,g')"
fi

if [ -n "${LERNA_CONCURRENCY:-}" ]; then
  echo "$0: lerna --concurrency ${LERNA_CONCURRENCY}" "$@"
  exec lerna --concurrency "${LERNA_CONCURRENCY}" "$@"
else
  echo "$0: lerna" "$@"
  exec lerna "$@"
fi
i.e. Lerna gets told how much concurrency to use when run on a CircleCI Docker executor, with the option of setting an env var to override the autodetection if desired.
Peter Darton
So, I've recently started using CircleCI. I'm trying to do Node builds using lerna, jest, etc., and this whole tooling ecosystem has sizing "magic" that makes it make "sensible" decisions about how much stuff to run in parallel...
...but this all falls over in a heap on a CircleCI Docker container that tells the tooling it's got more CPUs & RAM than the resource_class allows it to use.
This results in build failures.
I've been googling and, "surprise surprise", this issue is not unique to CircleCI; it's a "doing builds on Docker" thing. But since it's not unique, there are some existing solutions.
e.g. https://github.com/lxc/lxcfs#using-with-docker looks really promising.
Simen Bekkhus
The environment variable might be covered by https://circleci.com/ideas/?idea=CCI-I-264