Background
As I mentioned in my [post about containers][container-post], I want to write a container tool in rust to learn more about rust and containers. I started doing that with my friend Kevin, and our (humble) progress so far is on Github in a project we called bucket (like a container, that’s sometimes rusty… or something).
We’re following a long a talk given by Eric Chiang at CoreOS Fest called Containers From Scratch (slides). In the talk, Eric walks through the linux command line utilities that you can string together yourself to make a “container”. We got as far as creating a “root filesystem”, chroot
ing into it, and then making a separate PID namespace by calling unshare
. It was during this unshare
step that I ran into an error message that led me to learn more about mount
that I want to document here.
What does mount
do?
mount
is a command line utility which wraps the mount
system call. (Another fun fact I learned: the #’s after a name in a man page indicates what section that man page is in. So mount(2)
is the system call because it is in the “System calls” section of the manual and mount(8)
is the command line utility because it is in the “System Administation Commands and Daemons” section. More info in this SO Answer).
If you’re like me, you have had to call mount some times when you’ve plugged in a USB drive and it hasn’t worked correctly. Usually, the OS handles the mounting for you. For instance, when I plug in a USB stick to my computer, it pops up a Files window with the contents.
First, it’s worth knowing how to get information about current mounts. The place for that is mtab
. mtab
is a read-only file that lives at /etc/mtab
(and in my case is a symlink to /proc/self/mounts
) which lists the current active mounts. When I plugged in my USB drive, I saw the following line at the bottom of mtab
:
/dev/sdc2 /media/paul/P16G hfsplus ro,nosuid,nodev,relatime,umask=22,uid=0,gid=0,nls=utf8 0
If we break that down, we have a device on the far left called /dev/sdc2
. (Quick side note: to break down sdc2
, first ignore the sd
, then you have c
which indicates it’s the third drive that the OS has seen. sda
is my computer’s hard drive, I’m not sure what sdb
is. Then the 2
means what partition on that drive, so this is the 2nd partition. (more info in this Superuser answer))
After that, we have the mount point. This is directory where the files on the device are accessible. In this case, I can go to /media/paul/P16G
to access the files on the USB stick.
Next is hfsplus
indicating the filesystem type. Then there are several options and then 2 0’s indicating the dump/pass options. 0’s mean we don’t back up this mount (dump) and we don’t run fsck
on it to detect errors (pass). The fstab
wikipedia page has more info on this file’s format.
If I want to unmount this, I can run umount /media/paul/P16G
. I can then remount it with sudo mount -t hfsplus /dev/sdc2 /media/paul/P16G
. (Interestingly, I had to create the P16G directory since when I unmounted it, the directory went away. I’m not sure why that is.)
Another interesting place related to mount
is fstab
. It’s the same format as mtab
, but instead of reflecting the current mounts, it represents what mounts should be created at boot time. If you change that file, you will have new mounts when you reboot.
That gives us enough information to move on to unshare
.
What does unshare
do?
The particular unshare
command we were looking at was:
sudo unshare -p -f --mount-proc=$PWD/rootfs/proc chroot rootfs /bin/bash
This creates a new PID namespace, mounts a proc
filesystem under the rootfs/proc
directory and then executes chroot
and finally bash
.
The tricky part I was running in to was that I was getting a “Invalid argument” error message when I tried to run this command.
First, I’ll look at mounting a proc filesystem. It turns out I can do this anywhere and I don’t need a device like I did for mounting a USB drive. Instead, whatever I pass in as the device becomes a kind of dummy label. For instance, I can say mount -t proc dummy ~/fs1
and I’ll get a proc
filesystem in the fs1
directory. If I do ls ~/fs1
, it will look identical to doing ls /proc
. Additionally, I get the following line in mtab
:
dummy /home/paul/fs1 proc rw,relatime 0 0
Now, I want to look at what the unshare
command does when it has the --mount-proc
command because it starts to demonstrate why I ran in to my error. Here are some lines from the unshare source code:
if (procmnt &&
(mount("none", procmnt, NULL, MS_PRIVATE|MS_REC, NULL) != 0 ||
mount("proc", procmnt, "proc", MS_NOSUID|MS_NOEXEC|MS_NODEV, NULL) != 0))
err(EXIT_FAILURE, _("mount %s failed"), procmnt);
I can see there are 2 calls to mount(2)
(the system call). The first isn’t creating a mount, but it’s taking the mount point and making it private recursively. This is the same as saying mount --make-private $procmnt
. Interestingly, if I unmount fs1
, and then try to make it private with mount --make-private fs1
, it fails, saying that fs1
is not a mountpoint. This is not the exact error I was getting from unshare
, but I assume it has the same source. If I run mount --bind fs1 fs1
, and then run the make private command it works (and similarly with the unshare
command). The problem was that the unshare
command expects the target for mounting the new proc filesystem to already be a mount point.
What does --make-private
do?
It’s nice to know why the command was failing and how to fix it, but what might be more important to understand is why unshare
would be trying to make the proc
filesystem private to begin with.
The mount(8)
manpage explains the different options for the sharing status of a file system so I won’t repeat them here. However, it’s useful to know how to tell what the sharing bits are for a particular filesystem. To get that, I can look at /proc/self/mountinfo
and I can see for my procfs a line like this with the word shared in it:
170 24 0:4 / /home/paul/fs1 rw,relatime shared:146 - proc dummy rw
If I call mount --make-private fs1
, the shared:146
portion of the line goes away.
How does a shared mount differ from a private one? A shared mount means that subsequent calls to umount
and mount
will propagate to other --bind
mounted file systems. In a private mount, those calls will not propagate. To illustrate this, here is an example:
$ mkdir fs1 fs2 subd1
$ sudo mount --bind fs1 fs1
$ sudo mount --bind fs1 fs2
$ sudo mount --make-private fs1
$ mkdir fs1/sub_mount
$ ls fs2
sub_mount
# ^ Files are propagated between the mounts
$ sudo mount --bind subd1 fs1/sub_mount
$ touch subd1/hello
$ ls fs1/sub_mount
hello
$ ls fs2/sub_mount
$ # nothing
It turns out that unshare
mounts all its new mounts as private by default (see unshare(1)). Since the purpose of unshare is to isolate it from other things, this makes sense conceptually, but practically, I am having a hard time thinking of when a proc file system being shared would be a thing that mattered, but then again I do not know if there is more mounting that happens during the creation of a proc
fs.
Conclusion
This post was a bit of a random walk around mount
and unshare
, but I learned something during it, so I wanted to record it in case I find myself wondering about it later.
[container-post]:{% post_url 2017-04-19-containers %}