The seccomp filters implemented in the 3.5 and Ubuntu kernels are really cool, and I’m bored, so I want to write about them (hooray for having a blog). I’m going to explain what seccomp filters actually do at as low a level as I feel comfortable with. I’ll leave some stuff out and gloss over a few other things because either 1) I personally don’t know it well enough or 2) it would take forever to explain. I want to make this as accessible as possible for readers who aren’t necessarily familiar with all of this terminology.
Seccomp filters are a whitelist, usually written into the program itself, of which system calls that program can make. The program installs the filter while it’s running, and from then on if it makes a system call that hasn’t been whitelisted the kernel kills it (or, depending on how the filter is written, just makes that call fail).
What Is A System Call?
A system call is basically how a program speaks to the kernel. Programs are basically (or literally, I guess) instructions; they want to get something done. Oftentimes they have to (for performance or ease-of-use reasons) outsource that action to the kernel. They do this through a system call, something like write(). The parentheses hold the parameters. As a simplified example (this isn’t quite how it looks in the real world; a real write() takes a file descriptor, a buffer, and a length rather than just a string) you might have write(“hello world”). Your program passes that to the kernel, which sees “the syscall is ‘write’ and the argument is ‘hello world’”, then it does what it needs to do and you end up writing “hello world” somewhere.
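To make that a bit more concrete, here’s a small sketch in Python (my choice of language for illustration, nothing seccomp-specific about it). Python’s os.write() is a thin wrapper around the real write() syscall, so you can see the actual parameters: a file descriptor and the bytes to write. The pipe is only there so the example has somewhere harmless to write to.

```python
import os

# Create a pipe so we have one file descriptor to write to
# and another to read the data back from.
read_fd, write_fd = os.pipe()

# os.write() wraps the write(2) system call: it hands the kernel a file
# descriptor and some bytes, and returns how many bytes the kernel accepted.
message = b"hello world"
written = os.write(write_fd, message)
os.close(write_fd)

# Reading the bytes back out of the pipe is another system call, read(2).
received = os.read(read_fd, 1024)
os.close(read_fd)

print(written, received)  # prints: 11 b'hello world'
```

Every one of those os calls is the program asking the kernel to do something on its behalf, which is exactly the traffic a seccomp filter gets to inspect.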
What’s The Issue?
There are a few issues with this. The first is that the previously mentioned kernel runs at the highest privilege level that software can reach. This means exploits in the kernel are also going to run at the highest level, and at that point they can do practically anything, including interacting directly with your hardware. On top of that, it’s only possible to exploit code that you can interact with, either directly or indirectly. A system call is a way for programs at any level to interact with the kernel; therefore it’s a way for any program to escalate to kernel level via an exploit.
The other issue is that there are a lot of system calls, and new ones get created over time as new kernel features appear. This means new kernel attack surface and it also means new capabilities for programs. What if I don’t want my program to be able to write? Well, it has access to write(), so I would have to find some other way to stop that, like an LSM (Linux Security Module), but there are a lot of other syscalls that aren’t so easily stopped. By whitelisting the syscalls we implement absolute least privilege, meaning that programs can only use the syscalls they really need.
The short answer is that abusing syscalls allows for new and unforeseen behaviors as well as the potential for privilege escalation. Filtering syscalls directly limits kernel attack surface and what programs can do.
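Here’s a rough sketch of what installing a filter looks like, written in Python with ctypes so it stays in one language with the rest of my examples (real programs would normally do this in C, often through libseccomp). To keep it short it is a blacklist rather than a whitelist: it makes one syscall, getppid(), fail with EPERM and allows everything else, because the Python interpreter itself needs far too many syscalls to whitelist comfortably. A real whitelist flips that around: allow a short list, deny by default. The constants are copied from the kernel headers, the syscall numbers are only filled in for x86_64 and aarch64, and the function names (build_filter, install_filter, errno_from_denied_syscall) are my own, not part of any library.

```python
import ctypes
import errno
import os
import platform
import sys

# Constants copied from <linux/prctl.h>, <linux/seccomp.h> and <linux/filter.h>.
PR_SET_NO_NEW_PRIVS = 38
PR_SET_SECCOMP = 22
SECCOMP_MODE_FILTER = 2
SECCOMP_RET_KILL_PROCESS = 0x80000000
SECCOMP_RET_ERRNO = 0x00050000
SECCOMP_RET_ALLOW = 0x7FFF0000
BPF_LD_W_ABS = 0x20  # load a 32-bit word from struct seccomp_data
BPF_JEQ_K = 0x15     # jump if the loaded value equals a constant
BPF_RET_K = 0x06     # return a constant verdict

# (audit architecture id, getppid syscall number) per machine type.
TARGETS = {"x86_64": (0xC000003E, 110), "aarch64": (0xC00000B7, 173)}


class SockFilter(ctypes.Structure):  # mirrors struct sock_filter
    _fields_ = [("code", ctypes.c_uint16), ("jt", ctypes.c_uint8),
                ("jf", ctypes.c_uint8), ("k", ctypes.c_uint32)]


class SockFprog(ctypes.Structure):  # mirrors struct sock_fprog
    _fields_ = [("len", ctypes.c_ushort), ("filter", ctypes.POINTER(SockFilter))]


def build_filter(arch, denied_nr):
    """Return the BPF program as a list of (code, jt, jf, k) tuples."""
    return [
        (BPF_LD_W_ABS, 0, 0, 4),           # load seccomp_data.arch
        (BPF_JEQ_K, 1, 0, arch),           # expected arch: skip the kill below
        (BPF_RET_K, 0, 0, SECCOMP_RET_KILL_PROCESS),
        (BPF_LD_W_ABS, 0, 0, 0),           # load seccomp_data.nr (syscall number)
        (BPF_JEQ_K, 0, 1, denied_nr),      # not the denied syscall: skip to allow
        (BPF_RET_K, 0, 0, SECCOMP_RET_ERRNO | errno.EPERM),
        (BPF_RET_K, 0, 0, SECCOMP_RET_ALLOW),
    ]


def install_filter(libc, instructions):
    """Install the filter on the calling process. This cannot be undone."""
    array = (SockFilter * len(instructions))(*[SockFilter(*i) for i in instructions])
    prog = SockFprog(len(instructions), ctypes.cast(array, ctypes.POINTER(SockFilter)))
    libc.prctl.argtypes = [ctypes.c_int] + [ctypes.c_ulong] * 4
    # Unprivileged processes must set no_new_privs before installing a filter.
    if libc.prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0:
        raise OSError(ctypes.get_errno(), "PR_SET_NO_NEW_PRIVS failed")
    if libc.prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, ctypes.addressof(prog), 0, 0) != 0:
        raise OSError(ctypes.get_errno(), "PR_SET_SECCOMP failed")


def errno_from_denied_syscall():
    """Fork a child, filter getppid() there, and return the errno the child saw.

    Returns None if the filter could not be installed (not Linux, unknown
    machine type, or seccomp filters unavailable).
    """
    target = TARGETS.get(platform.machine())
    if not sys.platform.startswith("linux") or target is None:
        return None
    arch, getppid_nr = target
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:  # child: sandbox ourselves, then try the denied syscall
        try:
            libc = ctypes.CDLL(None, use_errno=True)
            install_filter(libc, build_filter(arch, getppid_nr))
            libc.syscall.restype = ctypes.c_long
            libc.syscall.argtypes = [ctypes.c_long]
            result = libc.syscall(getppid_nr)
            seen = ctypes.get_errno() if result == -1 else 0
            os.write(write_fd, str(seen).encode())
        finally:
            os._exit(0)
    os.close(write_fd)
    data = os.read(read_fd, 16)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return int(data) if data else None


if __name__ == "__main__":
    print(errno_from_denied_syscall())
```

The filter is installed in a forked child because it can’t be removed once it’s in place, so the parent stays unfiltered. On a Linux machine where the filter installs, the child should see errno.EPERM from getppid(), a call that otherwise never fails; anywhere else the function just returns None.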
Where Filters Really Help
To understand where these filters really help I think I should explain the concept of least privilege. Least privilege means setting a program up so that it only has access to what it needs and nothing more. This means if there are files A-Z on a system and the program only ever uses A, B, and C, then it won’t have access to D-Z. It may also not need Inter Process Communication with various other programs, so its IPC can be restricted too. Maybe it shouldn’t be able to execute specific files; again, limit it. The idea is to make it so that it can do only what it needs to function and nothing else.
This is one of the more important concepts in computer security. What this means is that if the aforementioned program gets exploited and my critical file is at E, the hacker can’t get to E; they’re stuck with some useless config files at A-C. And maybe there’s a way to exploit program F but, again, they can’t access F, so the visible attack surface is reduced.
The simplest way out of a good sandbox (one not full of holes or, in our case, letters) is usually privilege escalation, and a kernel exploit is great for that. So if the above program is exploited and the attacker can then send the kernel write(exploit code), breaking out has become a lot simpler.
This is where seccomp filters are best used: reinforcing least privilege. They directly reduce visible kernel attack surface, thereby reinforcing any strong sandbox.
Right now Chrome, OpenSSH, and a few other programs have implemented these filters. It’s not too difficult to implement them and I’d really like to see them in more applications, especially long-running services. In an ideal world everything would have seccomp filters, as least privilege should be applied universally, but I’d settle for having a few services like cupsd running with one. The biggest issue is that third-party libraries can cause compatibility problems, since they may make system calls the filter’s author didn’t anticipate.
What I Left Out
I didn’t go into libraries and APIs; I just kinda folded those ideas into the system calls themselves. If you’re interested in programming you already know what an API is and you probably know what a library is.
If I got anything wrong let me know. I’m a crap programmer and I extrapolate a lot. If you notice a gaping hole in what I’m saying, point it out (be gentle) and I’ll be happy to learn something and will correct it asap.