Sandboxing: Apparmor

Sandboxing: Apparmor

This is the fifth installment on a series of various sandboxing techniques that I’ve used in my own code to restrict an applications capabilities. You can find a shorter overview of these techniques here. This article will be discussing sandboxing with Apparmor.

Mandatory Access Control:

Mandatory Access Control (MAC), like Discretionary Access Control (DAC), is meant to define permissions for a program. Users and Groups are DAC. But what if you want to confine a program with full root? As discussed, root with full capabilities is quite dangerous – and in the case of SyslogParser quite a few of those capabilities are necessary.

Apparmor is a form of Mandatory Access Control implemented through the Linux Security Module hooks in the Linux kernel. MAC is “administrator” defined policy, and can confine even root applications.

Apparmor is a *bit* out of scope for this series, as it doesn’t actually involve any code, but it’s still relevant.

The Code:

While Apparmor itself doesn’t have code in SyslogParse, here’s the profile for the program.

# Last Modified: Wed Aug 13 18:57:15 2014


/usr/bin/syslogparse mr,
/var/log/* mr,

/etc/ mr,

/sys/devices/system/cpu/online r,

/lib/@{multiarch}/* mr,
/lib/@{multiarch}/libc-*.so mr,
/lib/@{multiarch}/libm-*.so mr,
/lib/@{multiarch}/libpthread*.so mr,

/usr/lib/@{multiarch}/* mr,
/usr/lib/@{multiarch}/libcap-ng*.so* mr,

/usr/lib/@{multiarch}/libstdc*.so* mr,


Apparmor is incredibly straight forward. There is a path, and then there is one or more letters. These letters stand for certain things.

r = read
m = map
w = write

All of this is pretty straight forward. SyslogParse gets the number of CPU cores from /sys/devices/system/cpu/online , so it needs “r” access.

It needs to read some libraries in order to function.

And that’s it. Sort of… apparmor on my system is, unfortunately, quite broken. The tools for enforcing/ complaining crash on me (I have a lot of weird profiles that I experiment with), which is actually why I started building SyslogParse. So this profile is a bit incomplete. It still needs some capabilities defined for chroot, setuid/setgid, and possibly more file access.


When enabled an Apparmor profile will begin enforcing policy as soon as the process begins. That means that, even if running as root, an attacker is always confined to those files defined in the profile. Apparmor is quite powerful, and combined with the other sandboxing techniques used it’s a very nice reinforcement – writing to the chroot, for example, is denied throughout the process by both DAC and MAC now.

Apparmor is, unfortunately, only available on certain distributions.

Next Up: Final Conclusion

Sandboxing: Chroot Sandbox

Sandboxing: Limited Users

This is the fourth installment on a series of various sandboxing techniques that I’ve used in my own code to restrict an applications capabilities. You can find a shorter overview of these techniques here. This article will be discussing sandboxing a program using Limited Users.

Users and Groups:

Linux Discretionary Access Control works by separating and grouping applications into ‘users’ and ‘groups’. A process in user A is, in terms of DAC, isolated from a process in user B.

There’s also user 0, the root user, which is a privileged user account.

Only a program with root, or with CAP_SETUID / CAP_SETGID can manipulate its own UID/GID. In the case of SyslogParse, we have root, and we definitely want to lose it when we can.

So, after getting the file handles we need, here’s the code for dropping to a limited user account (if you’ve read the previous articles this happens right after the chroot).

if (setgid(65534) != 0)
err(0, "setgid failed.");
if (setuid(65534) != 0)
err(0, "setuid failed.");

Very simple. So here’s a simple explanation.

if (setgid(65534) != 0)
err(0, "setuid failed.");

setgid(65534) sets the GID to 65534. This is the “nobody” group on my system. Nobody is an unprivileged user often used by programs wanting to drop privileges. If 65534 doesn’t exist, all the better – dropping to a GID that doesn’t exist is great.

if (setuid(65534) != 0)
err(0, "setgid failed.");

setuid(65534) is changing the user to 65534, which, as above, is the nobody user. Same as before, if the user doesn’t exist, that’s dandy.


Dropping privileges is a hugely beneficial thing to do. By separating the code into a “privileged stuff done all at once, then never again” you can drop privileges before doing anything dangerous, and there goes an attackers ability to escalate.

Dropping root privileges is incredibly important. The attack surface and amount of post-exploitation work an attacker can do shrinks drastically.

In the case of SyslogParse, as any attacks would be for local escalation (it does no networking), an attacker would probably lose privileges by exploiting it if going from any normal compromised process. At this stage they are in a chroot with no read or write access, running in an unprivileged user/ group with no capabilities, they have access to 22 system calls and some very nice to have calls, such as read() are denied, and their only chance for getting a few capabilities is by exploiting a few lines of code that involve opening a file.

I was going to have the next section be on rlimit, but it’s really not important and also not viable unless you’ve built the application from the bottom up to never write to a file, which will typically involve a broker’d architecture.

Next Up: Apparmor

Sandboxing: Linux Capabilities

This is the second installment on a series of various sandboxing techniques that I’ve used in my own code to restrict an applications capabilities. You can find a shorter overview of these techniques here. This article will be discussing Linux Capabilities.

Intro To Linux Capabilities:

On Linux you’re likely familiar with the root user. Root is the ‘admin’ account of the system, it has privileges that other processes don’t. But, what you may not have known, is that those privileges given to root are actually enumerable and defined. For example, root has the capability CAP_SYS_CHROOT, which is what allows it to call chroot().

Let’s say a program needs root, but only because it calls chroot at some point. Instead of giving all of the privileges except CAP_SYS_CHROOT.

So, if your program has to run as root (as mine does), you can actually drop some of your root privileges while maintaining others. How effective is this? Jump down to the conclusion below to see – hint: it can go between great and awful.

The Code: (includes should have a # in front, but WP is mean)

include include


Let’s break this down ~line by ~line.

include include

This includes the headers for linux capabilities (so that you can refer to them as their type) and cap-ng, the library we’ll be using to actually drop privileges.


This line will clear all privileges. If you were to apply this, you’d have essentially dropped all root privileges.


This line is where we state which capabilities we’d like. After the capng_clear() we have none, but the program does need a few.

The first two parameters are effectively saying to add these rules.

The third, fourth, and fifth parameters are the defined capabilities to allow.

The last parameter is a -1, which lets capng know that your list of capabilities is terminated.


And now, with this call, the rules are applied. Only these capabilities are given to the program… “only”.

This was really easy to do. Three lines of code and a large number of capabilities are gone. But, what’s left?

CAP_SETUID/ CAP_SETGID : Quite dangerous, as it means you can interact with processes of other UID/GID’s by simply making your UID/GID the same as theirs.

CAP_SYS_CHROOT : Not as scary, you can chroot, and if you retain the ability to chroot you can then break out of that.

CAP_DAC_READ_SEARCH : You can read all files that root can read. Password files, sensitive files, whatever. All yours to read.

So in a lot of ways you’re just dropping from root…. to root. You’re still quite powerful and dangerous if you drop these capabilities, it’s not a very large barrier. An attacker who gains the above privileges still gains quite a lot. But, in the case of SyslogParse, all capabilities are dropped eventually.

The nice thing about capabilities is you can do it as soon as the program starts. After you’ve gotten your file descriptors and all that, you can go ahead and start real sandboxing, then do the actual dangerous stuff. In this case, I had to give a lot of scary permissions. But for someone else, maybe all they need is to bind port 80, and in that case you just give CAP_NET_BIND_SERVICE, drop everything else, and that’s pretty nice.

It honestly feels like a “Well, it’s better than giving it full root” in this case, which is bittersweet. It still pretty much feels like full root. But uh, hey, it’s better than giving it full root.

Sandboxing: Seccomp Filters

This is the first installment on a series of various sandboxing techniques that I’ve used in my own code to restrict an applications capabilities. You can find a shorter overview of these techniques here. This article will be discussing seccomp filters.

What is Seccomp? An Introduction:

System calls are your way of asking the kernel to do something for you. You send a message saying “Hey, open a file for me” and it’ll probably do it for you, barring permission errors or some other issue. But, if you can talk to the kernel, you can exploit the kernel. Many vulnerabilities are found in kernel system calls, leading to full root privileges – bypassing sandboxing techniques like SELinux, Apparmor, namespaces, chroots, you name it. So, how do we deal with this without patching the kernel, as a developer? Seccomp filters.

Seccomp is a way for a program to register a set of rules with the kernel. These rules deal with the system calls a program can make, and which parameters it can send with them.

When you create your rules you get a nice overview of your kernel attack surface. Those calls are the ways your attacker can attack the kernel. On top of that ,you’ve just reduced kernel attack surface – if an attacker requires system call A and you’ve only allowed system calls B through D, they can’t attack with system call A.

Another nice benefit is the ability to restrict capabilities. If your program never writes a file, don’t give it access to the write() system call. Now you’ve reduced the kernel attack surface, but you’ve also stopped the program from writing files.

The Code:

Seccomp code is fairly simple to use, though I haven’t found any really good documentation. Here is the seccomp code used in my program, SyslogParse, to restrict its system calls.

scmp_filter_ctx ctx;
ctx = seccomp_init(SCMP_ACT_KILL);

seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(rt_sigreturn), 0);
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(exit), 0);
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(exit_group), 0);
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(tgkill), 0);

seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(access), 0);

seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(write), 2,

seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(fstat), 0);
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(open), 0);
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(close), 0);

seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(brk), 0);

seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(mprotect), 1,

seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(mmap), 2,

seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(munmap), 2,

seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(madvise), 0);
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(futex), 0);
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(execve), 0);
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(clone), 0);

seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(getrlimit), 0);
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(rt_sigaction), 0);
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(rt_sigprocmask), 0);
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(set_robust_list), 0);
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(set_tid_address), 0);

if(seccomp_load(ctx) != 0) //activate filter
err(0, “seccomp_load failed”);

I’ll go through this bit by bit.

scmp_filter_ctx ctx;
ctx = seccomp_init(SCMP_ACT_KILL);

This should be fairly simple to understand if you’ve written basically any code. This instantiates the seccomp filter, “ctx”, and then initializes it to kill on rule violations. Simple.

seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(futex), 0);

This line is a rule for the “futex” system call. The first parameter, “ctx”, is our instantiated filter. “SCMP_ACT_ALLOW”, the second parameter, is saying to allow when a condition is met. The third is a macro for the futex() system call, as that’s the call we want to allow through the filter. The last parameter, “0”, is how many rules we want to add to this that deal with parameters.

Simple. So this rule will allow any futex system call regardless of parameters.

I chose futex in this example to demonstrate that seccomp can not protect you from every attack. Despite the heavy amount of sandboxing I’ve done in this program, this filter will do nothing to stop attacks that use the futex system call. Recently, one vulnerability was found that could do just that – a call to futex would lead to control over the kernel. Seccomp just isn’t all powerful, but it’s a big improvement.

Note: I found all of these syscalls by repeatedly running strace on SyslogParse with different parameters. Strace will list all of the system calls as well as their arguments and makes creating rules very easy.

if(seccomp_load(ctx) != 0) //activate filter
err(0, "seccomp_load failed");

seccomp_load(ctx) will load up the filter and from this point on it is enforced. In this case I’ve wrapped it to ensure that it either loads properly or the program won’t run.

And that’s it. That’s all the code it takes. If the program makes a call to any other system call it crashes with “Bad System Call”.

Seccomp is quite easy to use and is the first thing I’d make use of if you are considering sandboxing. All sandboxing relies on a strong keernel, but as a developer you can only change your own program, and seccomp is a good way to reduce kernel attack surface and make all other sandboxes more effective.

Linux has something like 200 system calls (can’t find a good source, anyone know a more definitive number?), and SyslogParse has dropped that down to about 22. That’s a nice drop in privileges and attack surface.

Next Up: Linux Capabilities