Sandboxing: Seccomp Filters

This is the first installment on a series of various sandboxing techniques that I’ve used in my own code to restrict an applications capabilities. You can find a shorter overview of these techniques here. This article will be discussing seccomp filters.

What is Seccomp? An Introduction:

System calls are your way of asking the kernel to do something for you. You send a message saying “Hey, open a file for me” and it’ll probably do it for you, barring permission errors or some other issue. But, if you can talk to the kernel, you can exploit the kernel. Many vulnerabilities are found in kernel system calls, leading to full root privileges – bypassing sandboxing techniques like SELinux, Apparmor, namespaces, chroots, you name it. So, how do we deal with this without patching the kernel, as a developer? Seccomp filters.

Seccomp is a way for a program to register a set of rules with the kernel. These rules deal with the system calls a program can make, and which parameters it can send with them.

When you create your rules you get a nice overview of your kernel attack surface. Those calls are the ways your attacker can attack the kernel. On top of that ,you’ve just reduced kernel attack surface – if an attacker requires system call A and you’ve only allowed system calls B through D, they can’t attack with system call A.

Another nice benefit is the ability to restrict capabilities. If your program never writes a file, don’t give it access to the write() system call. Now you’ve reduced the kernel attack surface, but you’ve also stopped the program from writing files.

The Code:

Seccomp code is fairly simple to use, though I haven’t found any really good documentation. Here is the seccomp code used in my program, SyslogParse, to restrict its system calls.


scmp_filter_ctx ctx;
ctx = seccomp_init(SCMP_ACT_KILL);

seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(rt_sigreturn), 0);
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(exit), 0);
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(exit_group), 0);
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(tgkill), 0);

seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(access), 0);

seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(write), 2,
SCMP_A0(SCMP_CMP_GE, 1),
SCMP_A0(SCMP_CMP_LE, 2)
);

seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(fstat), 0);
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(open), 0);
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(close), 0);

seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(brk), 0);

seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(mprotect), 1,
SCMP_A2(SCMP_CMP_NE, PROT_EXEC)
);

seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(mmap), 2,
SCMP_A0(SCMP_CMP_EQ, NULL),
SCMP_A5(SCMP_CMP_EQ, 0)
);

seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(munmap), 2,
SCMP_A0(SCMP_CMP_NE, NULL),
SCMP_A1(SCMP_CMP_GE, 0)
);

seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(madvise), 0);
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(futex), 0);
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(execve), 0);
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(clone), 0);

seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(getrlimit), 0);
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(rt_sigaction), 0);
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(rt_sigprocmask), 0);
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(set_robust_list), 0);
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(set_tid_address), 0);

if(seccomp_load(ctx) != 0) //activate filter
err(0, “seccomp_load failed”);

I’ll go through this bit by bit.


scmp_filter_ctx ctx;
ctx = seccomp_init(SCMP_ACT_KILL);

This should be fairly simple to understand if you’ve written basically any code. This instantiates the seccomp filter, “ctx”, and then initializes it to kill on rule violations. Simple.


seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(futex), 0);

This line is a rule for the “futex” system call. The first parameter, “ctx”, is our instantiated filter. “SCMP_ACT_ALLOW”, the second parameter, is saying to allow when a condition is met. The third is a macro for the futex() system call, as that’s the call we want to allow through the filter. The last parameter, “0”, is how many rules we want to add to this that deal with parameters.

Simple. So this rule will allow any futex system call regardless of parameters.

I chose futex in this example to demonstrate that seccomp can not protect you from every attack. Despite the heavy amount of sandboxing I’ve done in this program, this filter will do nothing to stop attacks that use the futex system call. Recently, one vulnerability was found that could do just that – a call to futex would lead to control over the kernel. Seccomp just isn’t all powerful, but it’s a big improvement.

Note: I found all of these syscalls by repeatedly running strace on SyslogParse with different parameters. Strace will list all of the system calls as well as their arguments and makes creating rules very easy.


if(seccomp_load(ctx) != 0) //activate filter
err(0, "seccomp_load failed");

seccomp_load(ctx) will load up the filter and from this point on it is enforced. In this case I’ve wrapped it to ensure that it either loads properly or the program won’t run.

And that’s it. That’s all the code it takes. If the program makes a call to any other system call it crashes with “Bad System Call”.

Seccomp is quite easy to use and is the first thing I’d make use of if you are considering sandboxing. All sandboxing relies on a strong keernel, but as a developer you can only change your own program, and seccomp is a good way to reduce kernel attack surface and make all other sandboxes more effective.

Linux has something like 200 system calls (can’t find a good source, anyone know a more definitive number?), and SyslogParse has dropped that down to about 22. That’s a nice drop in privileges and attack surface.

Next Up: Linux Capabilities

Leave a Reply

Your email address will not be published. Required fields are marked *