From prime step to OCI layer
Rockcraft is a tool that creates OCI images using the same concepts and mechanisms that create snaps and charms: the lifecycle language from Craft Parts. There is a significant difference between the way the Craft lifecycle works and the OCI specification, and one of Rockcraft’s jobs is to bridge the gap between these two worlds. This page describes how this is accomplished.
Note
It is not necessary to know these details to use the tool effectively, but they can clarify some concepts and help explain why the contents of a given rock are the way they are.
Consider the following snippet of a rockcraft.yaml that creates a rock containing a bare-bones Python 3.10 interpreter:
# (...)
base: ubuntu@22.04
parts:
  python-part:
    plugin: nil
    stage-packages:
      - python3-minimal
This rock has Ubuntu 22.04 as its base and includes python3-minimal.
Conceptually, this means that at build time Craft Parts will pull in the python3-minimal Ubuntu package and whatever dependencies it needs to work.
Indeed, if we run rockcraft prime --shell-after, we can see the final contents, ready to be packed, in the prime directory. This is the directory available at build time through the ${CRAFT_PRIME} environment variable:
$ rockcraft prime --shell-after
$ cd ../prime
$ ls
bin etc lib lib64 sbin usr var
$ ls usr/bin/
debconf debconf-copydb debconf-show dpkg-divert dpkg-realpath dpkg-trigger py3clean python3
debconf-apt-progress debconf-escape dpkg dpkg-maintscript-helper dpkg-split perl py3compile python3.10
debconf-communicate debconf-set-selections dpkg-deb dpkg-query dpkg-statoverride perl5.34.0 py3versions update-alternatives
As we can see, the prime directory has the contents of the python3-minimal package but also many of its dependencies, direct and otherwise. Once the lifecycle is finished, Rockcraft packs the contents of the prime directory as a new OCI layer, exactly as if the prime directory were the filesystem root /.
Note
The following sections only apply to rocks with Ubuntu bases. Bare rocks need neither prime pruning nor usrmerge handling.
Pruning the prime directory
One consequence of including a stage-package’s dependencies is that the prime directory ends up with many files that the base Ubuntu layer already has. This can be seen, for example, by inspecting the rock with a tool like Dive.
For this rock, dive tells us that about 60 MB worth of files are duplicated between the base Ubuntu 22.04 layer and the “primed” layer. For example, the file /usr/lib/x86_64-linux-gnu/libcrypto.so.3 exists both in the base layer (as part of the base Ubuntu system) and in the primed layer (because it belongs to a package that is an indirect dependency of python3-minimal).
Starting from version 1.1.0, Rockcraft “prunes” those files in the prime directory that also exist, with the same contents, ownership and permissions, in the base layer. The end result is semantically the same, because the layers are “stacked” together when creating containers from the rock. This “pruning” can be seen in the logs generated by Rockcraft:
(...)
Pruning: /root/prime/usr/lib/x86_64-linux-gnu/perl-base/unicore/lib/Sc/Gran.pl as it exists on the base
Pruning: /root/prime/usr/lib/x86_64-linux-gnu/perl-base/unicore/lib/Bc/EN.pl as it exists on the base
Pruning: /root/prime/usr/lib/x86_64-linux-gnu/perl-base/unicore/lib/PatSyn/Y.pl as it exists on the base
Pruning: /root/prime/usr/lib/x86_64-linux-gnu/perl-base/unicore/lib/Dt/Init.pl as it exists on the base
Pruning: /root/prime/usr/share/perl5/Debconf/Element/Noninteractive/Multiselect.pm as it exists on the base
(...)
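The pruning rule described above can be approximated in a few lines of Python. This is only a minimal sketch under stated assumptions, not Rockcraft's actual implementation: the function name prune_prime and the idea of having the base layer extracted into a local directory are hypothetical, chosen for illustration. It removes a file from the prime directory only when the base holds a regular file at the same relative path with the same contents, ownership and permissions.

```python
import filecmp
import os
from pathlib import Path


def prune_prime(prime: Path, base: Path) -> list[Path]:
    """Delete files in `prime` that exist identically in `base`.

    Hypothetical helper: `base` is assumed to be a directory holding the
    extracted base layer. Returns the pruned paths, relative to `prime`.
    """
    pruned = []
    for root, _dirs, files in os.walk(prime):
        for name in files:
            prime_file = Path(root) / name
            base_file = base / prime_file.relative_to(prime)
            # Only compare regular files; symlinks are left untouched here.
            if prime_file.is_symlink() or base_file.is_symlink():
                continue
            if not base_file.is_file():
                continue
            p, b = prime_file.stat(), base_file.stat()
            # Same permissions (mode) and same ownership (uid, gid)?
            same_meta = (p.st_mode, p.st_uid, p.st_gid) == (
                b.st_mode,
                b.st_uid,
                b.st_gid,
            )
            # shallow=False forces a byte-by-byte content comparison.
            if same_meta and filecmp.cmp(prime_file, base_file, shallow=False):
                prime_file.unlink()
                pruned.append(prime_file.relative_to(prime))
    return sorted(pruned)
```

A file that differs from the base in any of the three properties stays in the prime directory, so the stacked result is unchanged either way.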
usrmerge and the lifecycle layer
After pruning, the contents of the prime directory are packed as a new OCI layer. In concrete terms, the files and directories are added to a tar archive, and each file (or directory) is stored together with the “destination” path that it should have when the archive is extracted.
In most cases, the file’s original path (relative to the root of the archive) and its destination path once extracted are the same, so the file that exists in the prime directory as a/b/c/file.txt should be extracted as a/b/c/file.txt.
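To make the idea of a destination path concrete, here is a small sketch using Python's standard tarfile module. It is an illustration only, not how Rockcraft builds its layers, and the helper name pack_file is hypothetical. The point is that the name stored in the archive (arcname) is chosen independently of where the file lives on disk.

```python
import io
import tarfile
from pathlib import Path


def pack_file(source: Path, arcname: str) -> list[str]:
    """Pack one file into an in-memory tar archive under `arcname`.

    Hypothetical helper: returns the member names stored in the archive,
    which are the paths the file would be extracted to.
    """
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        # `arcname` is the destination path recorded in the archive.
        archive.add(source, arcname=arcname)
    buffer.seek(0)
    with tarfile.open(fileobj=buffer, mode="r") as archive:
        return archive.getnames()
```

Calling pack_file with a file on disk and an arcname of a/b/c/file.txt records a/b/c/file.txt as the member name, whatever the file's location on disk.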
However, there are cases where this “destination” path should be changed. For example, consider again the contents of the previous rock’s prime directory:
$ ls -l
total 5
drwxr-xr-x 2 root root 3 Dec 7 20:30 bin
drwxr-xr-x 9 root root 10 Dec 7 20:30 etc
drwxr-xr-x 4 root root 4 Dec 7 20:30 lib
drwxr-xr-x 2 root root 2 Dec 7 20:30 lib64
drwxr-xr-x 2 root root 2 Dec 7 20:30 sbin
drwxr-xr-x 7 root root 7 Dec 7 20:30 usr
drwxr-xr-x 4 root root 4 Dec 7 20:30 var
$ ls bin/
pebble
So bin/ is a regular directory and contains the pebble binary, which serves as the rock’s entrypoint. However, consider the base directory structure of an Ubuntu system:
$ ls -l /
total 84
lrwxrwxrwx 1 root root 7 ago 27 2022 bin -> usr/bin
drwxr-xr-x 5 root root 4096 nov 27 13:59 boot
drwxrwxr-x 2 root root 4096 ago 27 2022 cdrom
drwxr-xr-x 20 root root 5900 dez 7 19:57 dev
drwxr-xr-x 148 root root 12288 dez 7 15:15 etc
drwxr-xr-x 3 root root 4096 ago 27 2022 home
lrwxrwxrwx 1 root root 7 ago 27 2022 lib -> usr/lib
lrwxrwxrwx 1 root root 9 ago 27 2022 lib32 -> usr/lib32
lrwxrwxrwx 1 root root 9 ago 27 2022 lib64 -> usr/lib64
lrwxrwxrwx 1 root root 10 ago 27 2022 libx32 -> usr/libx32
bin is actually a symbolic link to usr/bin. This is the usrmerge, and it’s been present in Ubuntu for many years now. Note that many other entries are also symlinks, like lib (to usr/lib) and lib64 (to usr/lib64).
These two filesystems interact in a surprising way when stacked as OCI layers. If bin/pebble is added to the layer’s archive as bin/pebble, plus an entry for the bin/ directory (which is a regular directory in the prime contents), then once the two layers are stacked together in a container the bin/ directory from the “prime layer” will overwrite the bin -> usr/bin symlink from the “base layer”. This breaks everything that assumes the base binaries from usr/bin/ are always accessible through bin/.
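The conflict can be detected mechanically. The following sketch lists the top-level entries that are regular directories in the prime directory but symbolic links in the base. It assumes the base layer is available as an extracted directory, and the function name shadowed_symlinks is hypothetical; it is not part of Rockcraft.

```python
from pathlib import Path


def shadowed_symlinks(prime: Path, base: Path) -> list[str]:
    """List top-level names that would replace a base symlink.

    Hypothetical helper: an entry qualifies when it is a real directory
    in `prime` and a symbolic link at the same path in `base`.
    """
    return sorted(
        entry.name
        for entry in prime.iterdir()
        if entry.is_dir()
        and not entry.is_symlink()
        and (base / entry.name).is_symlink()
    )
```

For the rock above, with a usrmerged Ubuntu base, the candidates would be entries such as bin, lib, lib64 and sbin wherever the base has them as symlinks.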
This issue is made much worse if, instead of breaking bin/, we break the lib*/ symlinks. Consider:
$ ldd /bin/bash
linux-vdso.so.1 (0x00007ffdf2af4000)
libtinfo.so.6 => /lib/x86_64-linux-gnu/libtinfo.so.6 (0x00007f6053cbd000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f6053a00000)
/lib64/ld-linux-x86-64.so.2 (0x00007f6053e6b000)
The bash binary links to multiple dynamic libraries, but has a hardcoded path to the /lib64/ld-linux-x86-64.so.2 dynamic loader. This loader is the program that does the actual finding of dynamic dependencies at runtime, and in an Ubuntu system its actual location is /usr/lib64/ld-linux-x86-64.so.2.
So if the /lib64 -> usr/lib64 symlink is broken because the prime directory contains lib64 as a regular directory, then the vast majority of the binaries in the final rock’s base system will simply fail to run because their loader is no longer available at /lib64/ld-linux-x86-64.so.2.
To fix this, Rockcraft takes the base system into account when creating the archive for the prime layer. For instance, when considering bin/pebble, Rockcraft will:
- Skip adding bin/ as a regular directory, to avoid breaking the base system, and
- Add bin/pebble as usr/bin/pebble in the layer archive.
This can be seen in the logs:
(...)
Creating new layer
(...)
Skipping /root/prime/bin because it exists as a symlink on the lower layer
(...)
Adding to layer: /root/prime/bin/pebble as 'usr/bin/pebble'
(...)
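The two rules above (skip directories that are symlinks on the base, and rewrite the destination of files beneath them) can be sketched as a path-planning step. This is a simplified illustration under stated assumptions, not Rockcraft's real code: the names resolve_in_base and plan_layer are hypothetical, the base layer is assumed to be extracted into a local directory, and a symlink whose target itself passes through further symlinks is not followed.

```python
import os
from pathlib import Path


def resolve_in_base(rel: Path, base: Path) -> Path:
    """Rewrite `rel` so that each component follows symlinks found in `base`."""
    resolved = Path()
    for part in rel.parts:
        candidate = resolved / part
        link = base / candidate
        if link.is_symlink():
            target = os.readlink(link)
            if os.path.isabs(target):
                # Treat an absolute target as relative to the layer root.
                candidate = Path(target.lstrip("/"))
            else:
                candidate = Path(os.path.normpath(resolved / target))
        resolved = candidate
    return resolved


def plan_layer(prime: Path, base: Path) -> dict[str, str]:
    """Return {archive name: prime-relative path} for every entry to pack."""
    plan = {}
    for root, dirs, files in os.walk(prime):
        rel_root = Path(root).relative_to(prime)
        arc_root = resolve_in_base(rel_root, base)
        for name in dirs:
            # Skip directories that exist as symlinks on the lower layer.
            if (base / arc_root / name).is_symlink():
                continue
            plan[str(arc_root / name)] = str(rel_root / name)
        for name in files:
            # Files keep their content but may get a new destination path.
            plan[str(arc_root / name)] = str(rel_root / name)
    return plan
```

With a base where bin is a symlink to usr/bin, a prime directory containing bin/pebble yields no entry for bin and an entry mapping usr/bin/pebble to bin/pebble, mirroring the log lines above.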
Finally, as mentioned at the beginning, none of this applies to rocks with bare bases, as there is no base system to contain duplicates that need to be pruned or symbolic links that need to be taken into account.