minijail

sandboxing and containment tool used in Chrome OS and Android

minijail0(1): sandbox a process

Synopsis

minijail0 [OPTION]… <PROGRAM> [args]…

Description

Runs PROGRAM inside a sandbox.

By default, minijail will fork+exec the specified program so that it can set up the right security settings in the new child process. The initial minijail process stays resident and waits for the program to exit, so that whatever ran minijail (e.g. a standalone script) blocks until the program finishes. Specifying -i makes that initial process exit immediately, freeing its resources.
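The fork+exec-and-wait pattern described above can be sketched roughly as follows. This is an illustration in Python, not minijail's actual implementation; the function name run_and_wait is hypothetical, and the point where restrictions would be applied is only marked by a comment.

```python
import os
import sys


def run_and_wait(argv):
    """Fork, exec argv in the child, and block until the child exits."""
    pid = os.fork()
    if pid == 0:
        # Child: a sandboxing tool would apply its restrictions here,
        # just before replacing the process image.
        try:
            os.execvp(argv[0], argv)
        except OSError:
            os._exit(127)  # exec failed; leave without running cleanup code
    # Parent: stay resident until the child is done, so the caller blocks too.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return 128 + os.WTERMSIG(status)  # shell-style code for death by signal
```

A caller such as run_and_wait([sys.executable, "-c", "pass"]) blocks until the child finishes, which mirrors the default behavior; exiting right after the fork instead of waiting would correspond to what -i is described as doing.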

The -i option is recommended for daemons and init services when you want to background the long-running program.

Most programs don’t expect to run as init (pid 1), which is why minijail will do it for you by default. A program running as init needs to reap any processes it forks to avoid leaving zombies behind. Signal handling also needs care, since the kernel will mask all signals that don’t have handlers registered (all default handlers are ignored, and this cannot be changed).
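The reaping duty mentioned above can be sketched as a simple wait loop. This is a minimal Python illustration under the assumption that the process only needs to collect exit statuses; reap_all_children is a hypothetical name, and a real init would also have to register signal handlers explicitly.

```python
import os


def reap_all_children():
    """Wait for every child of this process; return {pid: raw wait status}."""
    reaped = {}
    while True:
        try:
            pid, status = os.wait()  # blocks until some child exits
        except ChildProcessError:
            return reaped  # no children left, so no zombies remain
        reaped[pid] = status
```

A program that forks without ever running a loop like this leaves each exited child behind as a zombie, which is the failure mode the paragraph warns about.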

This means a minijail process (acting as init) will remain resident by default. While using -I is recommended when possible, strict review is required to make sure the program continues to work as expected.

-i and -I may be safely used together. The -i option controls the first minijail process outside of the pid namespace while the -I option controls the minijail process inside of the pid namespace.

The flags field is optional and may be a mix of MS_XXX or hex constants separated by | characters. See mount(2) for details. MS_NODEV|MS_NOSUID|MS_NOEXEC is the default value (a writable mount with nodev/nosuid/noexec bits set), and it is strongly recommended that all mounts have these three bits set whenever possible. If you need to disable all three, then specify something like MS_SILENT.
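As a rough illustration of how such a flags field could be interpreted, here is a small Python sketch. The numeric values are the ones defined in <linux/mount.h>, but the table is only a subset, the function name parse_mount_flags is hypothetical, and requiring a 0x prefix on hex constants is an assumption of this sketch rather than documented behavior.

```python
# Values from <linux/mount.h>; only a subset is listed here.
MS_FLAGS = {
    "MS_RDONLY": 0x1,
    "MS_NOSUID": 0x2,
    "MS_NODEV": 0x4,
    "MS_NOEXEC": 0x8,
    "MS_SILENT": 0x8000,
}

# Default when the flags field is omitted: nodev, nosuid and noexec set.
DEFAULT_FLAGS = MS_FLAGS["MS_NODEV"] | MS_FLAGS["MS_NOSUID"] | MS_FLAGS["MS_NOEXEC"]


def parse_mount_flags(field=None):
    """Turn 'MS_A|MS_B|0x10' into an integer bitmask."""
    if not field:
        return DEFAULT_FLAGS
    value = 0
    for token in field.split("|"):
        token = token.strip()
        if token in MS_FLAGS:
            value |= MS_FLAGS[token]
        elif token.lower().startswith("0x"):  # assumption: hex needs a 0x prefix
            value |= int(token, 16)
        else:
            raise ValueError(f"unknown mount flag: {token!r}")
    return value
```

Under this sketch, parse_mount_flags("MS_SILENT") yields a mask with none of the nodev/nosuid/noexec bits, which shows why naming an unrelated flag is a way to turn all three off.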

The data field is optional and is a comma-delimited string (see mount(2) for details). It is passed directly to the kernel, so all fields here are filesystem-specific. For tmpfs, if no data is specified, the default is mode=0755,size=10M. If you want other settings, you must specify them explicitly.

If the mount is not a pseudo filesystem (e.g. proc or sysfs), the src path must be an absolute path (e.g. /dev/sda1 and not sda1).

If the destination does not exist, it will be created as a directory (including missing parent directories).
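The three rules above (the tmpfs data default, the absolute src requirement, and creation of a missing destination) can be sketched together. This Python fragment is an illustration only: prepare_mount is a hypothetical helper, it does not call mount(2), and the set of pseudo filesystems (including treating tmpfs as one) is an assumption made for the example.

```python
import os

# Assumption: an illustrative subset; the text only names proc and sysfs.
PSEUDO_FILESYSTEMS = {"proc", "sysfs", "tmpfs"}
TMPFS_DEFAULT_DATA = "mode=0755,size=10M"


def prepare_mount(src, dest, fstype, data=None):
    """Validate the arguments, create dest if needed, return the data string."""
    if fstype == "tmpfs" and not data:
        data = TMPFS_DEFAULT_DATA
    if fstype not in PSEUDO_FILESYSTEMS and not os.path.isabs(src):
        raise ValueError(f"src must be an absolute path: {src!r}")
    # Create the destination, including any missing parent directories.
    os.makedirs(dest, exist_ok=True)
    return data
```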

You may specify a mount propagation mode, in which case it will be used instead of the default MS_PRIVATE. See the mount(2) man page and the kernel docs Documentation/filesystems/sharedsubtree.txt for more technical details, but a brief guide:

· slave: Changes in the parent mount namespace will propagate in, but changes in this mount namespace will not propagate back out. This is usually what people want to use.

· private: No changes in either mount namespace will propagate. This is the default behavior if you don’t specify -K.

· shared: Changes in the parent and this mount namespace will freely propagate back and forth. This is not recommended.

· unbindable: Mark all mounts as unbindable.
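The four mode names correspond to the propagation flags documented in mount(2). A minimal Python sketch of that mapping follows; the numeric values are those from <linux/mount.h>, while the function name propagation_flag and the lookup-table approach are illustrative assumptions.

```python
# Propagation flag values from <linux/mount.h>.
MS_UNBINDABLE = 1 << 17
MS_PRIVATE = 1 << 18
MS_SLAVE = 1 << 19
MS_SHARED = 1 << 20

PROPAGATION_MODES = {
    "slave": MS_SLAVE,
    "private": MS_PRIVATE,
    "shared": MS_SHARED,
    "unbindable": MS_UNBINDABLE,
}


def propagation_flag(mode=None):
    """Map a mode name to its MS_* flag; no mode means MS_PRIVATE."""
    if mode is None:
        return MS_PRIVATE
    try:
        return PROPAGATION_MODES[mode]
    except KeyError:
        raise ValueError(f"unknown propagation mode: {mode!r}") from None
```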

If the program exits, all of its children will be killed immediately by the kernel. If you need to daemonize or background things, use the -i option.

See pid_namespaces(7) for more info.

rlim_type may be specified using symbolic constants like RLIMIT_AS.

rlim_cur and rlim_max are specified either with a number (decimal or hex starting with 0x), or with the string unlimited (which will translate to RLIM_INFINITY).
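The value syntax described above can be sketched with Python's standard resource module, which exposes the same RLIMIT_* and RLIM_INFINITY constants. The function names are hypothetical, and this shows only the parsing, not how minijail itself applies the limits.

```python
import resource


def parse_rlimit_type(name):
    """Resolve a symbolic constant such as 'RLIMIT_AS'."""
    if not name.startswith("RLIMIT_") or not hasattr(resource, name):
        raise ValueError(f"unknown rlimit type: {name!r}")
    return getattr(resource, name)


def parse_rlimit_value(text):
    """Parse a decimal number, a 0x-prefixed hex number, or 'unlimited'."""
    if text == "unlimited":
        return resource.RLIM_INFINITY
    if text.lower().startswith("0x"):
        return int(text, 16)
    return int(text, 10)
```

The parsed triple could then be handed to resource.setrlimit(rlim_type, (rlim_cur, rlim_max)), which is the Python counterpart of setrlimit(2).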

auto will use stderr if connected to a tty (e.g. run directly by a user), otherwise it will use syslog.
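The auto selection amounts to a tty check on stderr. A minimal Python sketch, assuming the decision depends only on whether the stream is a terminal (resolve_auto_logging is a hypothetical name):

```python
import sys


def resolve_auto_logging(stream=None):
    """Return 'stderr' if the stream is a tty, otherwise 'syslog'."""
    stream = sys.stderr if stream is None else stream
    return "stderr" if stream.isatty() else "syslog"
```

Run from an interactive shell this would pick stderr; started by an init system with stderr redirected, it would fall back to syslog.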

Sandboxing Profiles

The following sandboxing profiles are supported:

Implementation

This program is broken up into two parts: minijail0 (the frontend) and a helper library called libminijailpreload. Some jailings can only be achieved from the process to which they will actually apply:

· capability use (without using ambient capabilities): non-ambient capabilities are not inherited across execve(2) unless the file being executed has POSIX file capabilities. Ambient capabilities (the --ambient flag) fix capability inheritance across execve(2) to avoid the need for file capabilities.

· seccomp: a meaningful seccomp filter policy should disallow execve(2), to prevent a compromised process from executing a different binary. However, this would prevent the seccomp policy from being applied before execve(2).

To this end, libminijailpreload is forcibly loaded into all dynamically-linked target programs by default; we pass the specific restrictions in an environment variable which the preloaded library looks for. The forcibly-loaded library then applies the restrictions to the newly-loaded program.
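The environment-variable handoff described above can be illustrated with a small Python sketch. Everything named here is hypothetical: the variable EXAMPLE_JAIL_CONFIG, the JSON encoding, and the restriction keys are inventions for the example and say nothing about the format libminijailpreload really uses. The child stands in for the preloaded library by reading the variable at startup and removing it from its environment.

```python
import json
import os
import subprocess
import sys

ENV_NAME = "EXAMPLE_JAIL_CONFIG"  # hypothetical variable name

# Stand-in for the target program with the preloaded library inside it.
CHILD_SOURCE = """
import json, os
config = json.loads(os.environ.pop("EXAMPLE_JAIL_CONFIG"))
# A preloaded library would apply the restrictions here, before main() runs.
print(json.dumps(config, sort_keys=True))
"""


def launch_with_restrictions(restrictions):
    """Pass restrictions to a child via the environment; return what it saw."""
    env = dict(os.environ)
    env[ENV_NAME] = json.dumps(restrictions)
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)
```

The design point is that the restrictions travel across execve(2) as plain data, so they can be applied from inside the new program image rather than before it is loaded.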

This behavior can be disabled by the use of the -T static flag. There are other cases in which the use of this flag might be useful:

· When the program is linked against a different version of libc.so than libminijailpreload.so.

· When execve(2) has side-effects that interact badly with the jailing process. If the system uses SELinux, execve(2) can cause an automatic domain transition, which would then require that the target domain allow the operations needed to jail the program.

Author

The Chromium OS Authors <chromiumos-dev@chromium.org>

Copyright © 2011 The Chromium OS Authors. License: BSD-like.

See Also

libminijail.h, minijail0(5), seccomp(2)