Wednesday, August 3, 2011

RESTful API Design

I finally managed to find some time to write down my thoughts on how to design a powerful yet beautiful RESTful API. You can read the essay on Read the Docs:

http://restful-api-design.readthedocs.org/en/latest/index.html

Thursday, December 2, 2010

Network security monitoring with KVM

This blog post describes how built-in Linux functionality can be used to implement a network security monitoring solution for a KVM-based hypervisor. It gives an "inside the box" view for people who like to know more about the internals of Linux and KVM.

Disclaimer: I am part of the product marketing team for virtualization at Red Hat, but the opinions expressed herein are my own. The functionality described here may or may not become part of high-level management systems like Red Hat Enterprise Virtualization at some point.

With Linux/KVM, the most common network setup is one where a number of virtual machines are attached to a software bridge. At least one physical network adapter is added to the bridge as well, which takes care of the uplink to the rest of the network. The bridge acts as a (virtual) switch: traffic is only sent to the switch port that connects to the destination MAC address.


Recently, I came across the following need: suppose you want to use a network security system (such as an intrusion detection system, or IDS) to detect, or even thwart, security attacks that are in progress in the virtual environment. In the physical world, there's a well-established way to do this: you use security appliance "black boxes" and connect them to mirror ports on your switch infrastructure. A mirror port is a specially configured switch port that gets a copy of all frames that pass through the switch. Inside the security appliance, the network card is put in promiscuous mode, which allows the security software to read and analyse all packets that arrive on it. Port mirroring goes under different names: Cisco calls it a Switched Port Analyzer (SPAN) port, and other vendors use yet other names.

What we need for a network security system in the virtual world is exactly the same: a port mirror on each of our virtual switches. This port mirror should copy all frames on the virtual switch to a (in this case virtual) security appliance that runs on the same host. So the question is: is it possible to create mirror ports for Linux bridges?

We tried various approaches, including ones based on "iptables" and "ebtables". There's also a very ugly approach that sets the bridge's "ageing time" to 0, effectively turning it into a hub. The approach that I liked best in the end was to use the Linux traffic shaping capabilities to create a true port mirror. Traffic shaping on Linux is configured using the "tc" command, which is part of the iproute2 package. A lot of good documentation on traffic shaping can be found in the Linux Advanced Routing and Traffic Control Howto by Bert Hubert. It does a good job of explaining traffic shaping, but unfortunately it doesn't cover port mirroring. That's why I decided to write this blog entry and explain how it works.
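To make the "ageing time" trick concrete, here is a toy model of the learning behaviour described earlier: the bridge remembers on which port it last saw each source MAC address and sends a frame only to that port, flooding it to all ports when the destination is unknown or its entry has expired. This is a minimal sketch in C with hypothetical types and names (`struct bridge`, `bridge_learn`, `bridge_lookup`); it is not the kernel's bridge code and only illustrates why an ageing time of 0 turns the switch into a hub.

```c
#include <assert.h>
#include <string.h>

#define FDB_SIZE 8
#define FLOOD (-1)              /* "send to all ports" */

/* Hypothetical toy forwarding database, not the kernel's br_fdb code. */
struct fdb_entry {
    unsigned char mac[6];
    int port;
    long last_seen;             /* seconds */
    int used;
};

struct bridge {
    struct fdb_entry fdb[FDB_SIZE];
    long ageing_time;           /* seconds an entry stays valid */
};

/* Learn: remember the port on which a source MAC address was seen. */
void bridge_learn(struct bridge *br, const unsigned char mac[6],
                  int port, long now)
{
    int free_slot = -1;

    assert(br != NULL && mac != NULL);
    for (int i = 0; i < FDB_SIZE; i++) {
        struct fdb_entry *e = &br->fdb[i];

        if (e->used && memcmp(e->mac, mac, 6) == 0) {
            e->port = port;     /* known address: refresh it */
            e->last_seen = now;
            return;
        }
        if (!e->used && free_slot < 0)
            free_slot = i;
    }
    if (free_slot < 0)
        return;                 /* table full: this toy just doesn't learn */
    memcpy(br->fdb[free_slot].mac, mac, 6);
    br->fdb[free_slot].port = port;
    br->fdb[free_slot].last_seen = now;
    br->fdb[free_slot].used = 1;
}

/* Forward: the port for a destination MAC, or FLOOD if it is unknown or
 * its entry is older than the ageing time. With ageing_time == 0 no
 * entry is ever fresh enough, so every frame is flooded, like a hub. */
int bridge_lookup(const struct bridge *br, const unsigned char mac[6], long now)
{
    assert(br != NULL && mac != NULL);
    for (int i = 0; i < FDB_SIZE; i++) {
        const struct fdb_entry *e = &br->fdb[i];

        if (e->used && memcmp(e->mac, mac, 6) == 0 &&
            now - e->last_seen < br->ageing_time)
            return e->port;
    }
    return FLOOD;
}
```

With an ageing time of, say, 300 seconds, a frame for a recently seen MAC address goes to a single port; with an ageing time of 0, every lookup returns `FLOOD`, so every port (including one with a security appliance on it) sees every frame.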

First, a very quick overview of how traffic shaping works on Linux. This is a very high-level description that leaves out some important details; for a more thorough introduction, see Bert Hubert's howto mentioned above. In essence, traffic shaping is the process of deciding if, when, and which packets are sent out on a network interface. The key object in traffic shaping is the queueing discipline, or qdisc. A qdisc has two main operations: enqueuing packets and dequeuing them. A packet can always be enqueued, but it is up to the qdisc to decide if, when and how to dequeue it. For example, a qdisc may make packets available immediately, but it may also re-arrange packets (to prioritize certain traffic), or delay or drop packets (e.g. to enforce bandwidth limits). There are two types of qdiscs: simple "classless" ones and more powerful "classful" ones. The difference is that classful qdiscs can use matching logic to classify packets into different categories, in order to apply different policies to them. The picture below illustrates the two types of queueing disciplines:


Queueing disciplines are attached to network devices. Any network device will work, be it a physical network device (eth0, etc.), a bridge, or a bridge interface that connects to a virtual machine. Normally, queueing disciplines are attached to the output, or egress, direction of a device. This is how traffic shaping normally works: we do not control what others send to us (as with email), but we can control how we send packets to others. Input, or ingress, qdiscs do exist, though, and we will use them below.
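The enqueue/dequeue contract and the classless/classful distinction can be modelled in a few lines of C. This is a simplified sketch under stated assumptions: the packet type, the `interactive` flag the classifier looks at, and the two-band `prio2` queue are all hypothetical, and none of it is the kernel's qdisc interface. The FIFO releases packets in arrival order; the classful queue asks a classifier which band a packet belongs to and always drains band 0 before band 1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy types: not the kernel's struct Qdisc or sk_buff. */
struct pkt {
    int id;
    int interactive;            /* a property a classifier can match on */
};

#define QLEN 16

/* Classless qdisc: a single FIFO, packets leave in arrival order. */
struct fifo {
    struct pkt q[QLEN];
    size_t head, count;
};

/* Offer a packet to the queue; this toy drops it when the queue is full. */
int fifo_enqueue(struct fifo *f, struct pkt p)
{
    if (f->count == QLEN)
        return -1;
    f->q[(f->head + f->count) % QLEN] = p;
    f->count++;
    return 0;
}

/* Hand out the oldest packet, or return -1 if there is nothing to send. */
int fifo_dequeue(struct fifo *f, struct pkt *out)
{
    if (f->count == 0)
        return -1;
    *out = f->q[f->head];
    f->head = (f->head + 1) % QLEN;
    f->count--;
    return 0;
}

/* Classful qdisc: a classifier (the "filter") picks one of two bands,
 * and dequeue always drains band 0 before looking at band 1. */
struct prio2 {
    struct fifo band[2];
    int (*classify)(const struct pkt *);    /* must return 0 or 1 */
};

int prio2_enqueue(struct prio2 *p, struct pkt pk)
{
    int b = p->classify(&pk);

    assert(b == 0 || b == 1);
    return fifo_enqueue(&p->band[b], pk);
}

int prio2_dequeue(struct prio2 *p, struct pkt *out)
{
    if (fifo_dequeue(&p->band[0], out) == 0)
        return 0;
    return fifo_dequeue(&p->band[1], out);
}

/* Example classifier: interactive traffic goes to the high-priority band. */
int classify_interactive(const struct pkt *pk)
{
    return pk->interactive ? 0 : 1;
}
```

If a bulk packet is enqueued before an interactive one, the FIFO returns them in that order, while the two-band queue returns the interactive packet first. Only the classful queue ever consults a filter, which is why step 3 below matters.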

Let's get down to it now. Below are the commands that can be used to set up the port mirror. They use the tc filter action "mirred", which was written specifically for the task of mirroring packets (thanks to Herbert Xu for the pointer).

  1. # tc qdisc add dev vnet1 ingress
  2. # tc filter add dev vnet1 parent ffff: \
          protocol ip u32 match u8 0 0 \
          action mirred egress mirror dev vnet0
  3. # tc qdisc replace dev vnet1 parent root prio
  4. # tc filter add dev vnet1 parent 8002: \
          protocol ip u32 match u8 0 0 \
          action mirred egress mirror dev vnet0

Easy, isn't it? :) Let's go through these in detail. In the code above we have assumed that the security appliance is attached to the bridge interface "vnet0", and the VM to be monitored is attached to bridge interface "vnet1".

  1. # tc qdisc add dev vnet1 ingress

Here we create a new qdisc of type "ingress". As mentioned above, qdiscs normally don't operate on ingress, so this is really a special qdisc that you can consider an "alternate root" for inbound packets.


  2. # tc filter add dev vnet1 parent ffff: \
          protocol ip u32 match u8 0 0 \
          action mirred egress mirror dev vnet0

This is where we copy packets that are generated by the VM. This line says: add a new filter, and attach it to node "ffff:". The ID "ffff:" is the fixed ID of the ingress qdisc. Normally node IDs are allocated dynamically, but not for ingress (I assume because a device can have only one ingress qdisc). The filter only matches IP packets ("protocol ip"). The part "u32 match u8 0 0" specifies a matching expression: we use the "u32" classifier with the selector "u8 0 0", whose syntax is "u8 VALUE MASK". This means: match any packet where the first byte, when ANDed with the mask 0, equals the value 0. In other words, all packets are selected. When the filter matches, the action "mirred" is executed with the arguments "egress mirror dev vnet0". This tells mirred to send a copy of the packet out of the device "vnet0".
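The selector test can be written out as a tiny function. This sketch assumes the "u8 VALUE MASK at OFFSET" reading of the selector and, as a simplification, counts OFFSET from the start of the buffer it is given; `u8_selector_matches` is a hypothetical name, not part of tc or the kernel. With a mask of 0 the left-hand side is always 0, so the test succeeds for every packet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustration of a u32 "match u8 VALUE MASK at OFFSET" selector:
 * take one byte, AND it with MASK, compare with VALUE.
 * Not the kernel's cls_u32 implementation. */
int u8_selector_matches(const uint8_t *pkt, size_t len,
                        uint8_t value, uint8_t mask, size_t offset)
{
    assert(pkt != NULL);
    if (offset >= len)
        return 0;               /* byte not present: no match in this sketch */
    return (pkt[offset] & mask) == value;
}
```

By contrast, a selector such as "u8 0x40 0xf0" would keep only packets whose first byte has a high nibble of 4, i.e. IPv4 headers.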

  3. # tc qdisc replace dev vnet1 parent root prio

Here we replace the qdisc that is directly attached to the root node with a new qdisc of type "prio". You may select another classful qdisc if you prefer; the point of replacing it is to make sure that a classful qdisc is attached. By default, the classless qdisc "pfifo_fast" is used, and being classless, it doesn't evaluate filters.


  4. # tc filter add dev vnet1 parent 8002: \
          protocol ip u32 match u8 0 0 \
          action mirred egress mirror dev vnet0

This line copies packets that are destined for the virtual machine. The filter is attached to the egress side of the bridge interface, which is where qdiscs normally operate. The filter is added to the qdisc with node ID "8002:", which may be different on your system: after step 3, check the ID that has been allocated with "tc qdisc show dev vnet1". The protocol, match and action parameters are identical to step 2.
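Since the handle is allocated dynamically, a script would have to read it back from "tc qdisc show dev vnet1" before running step 4. The helper below is a hypothetical sketch of that lookup: it assumes each output line starts with "qdisc", followed by the qdisc kind and its handle (for example "qdisc prio 8002: root ..."), and extracts the handle for a given kind. Alternatively, tc lets you choose the handle yourself (for example "handle 1:" in step 3), which would make this lookup unnecessary.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: if one line of "tc qdisc show" output describes a
 * qdisc of the wanted kind, e.g. "qdisc prio 8002: root ...", copy its
 * handle ("8002:") into out. Returns 1 on success, 0 otherwise. */
int qdisc_handle_from_line(const char *line, const char *kind,
                           char *out, size_t outlen)
{
    char k[32], h[32];

    assert(line != NULL && kind != NULL && out != NULL);
    if (sscanf(line, "qdisc %31s %31s", k, h) != 2)
        return 0;               /* not a qdisc line */
    if (strcmp(k, kind) != 0)
        return 0;               /* some other qdisc, e.g. ingress */
    if (strlen(h) >= outlen)
        return 0;               /* caller's buffer is too small */
    snprintf(out, outlen, "%s", h);
    return 1;
}
```

A caller would run the command (for example through POSIX popen()) and pass each output line to the helper with the kind "prio"; the ingress qdisc from step 1 is skipped because its kind does not match.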

That's it! To monitor the traffic of all virtual machines, these steps have to be executed for every bridge interface. Inside the virtual machine that is connected to vnet0, you can use a tool like "wireshark" to confirm that you are indeed getting a copy of all network traffic.
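Repeating the four steps for every bridge interface is easy to script. The sketch below only formats the commands from this post as text for one monitored port, so they can be reviewed before being run as root; the function name and the interface names in the example are hypothetical. The prio handle is a parameter because, as noted above, it can differ per host, and the appliance's own port is skipped, since mirroring the appliance's port into itself is not useful.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes the four tc commands from this post for one
 * monitored bridge port into buf. prio_handle is what "tc qdisc show"
 * reported after step 3 (e.g. "8002:"). Returns the length snprintf
 * reports, or 0 (with an empty buf) for the appliance's own port. */
int format_mirror_cmds(char *buf, size_t len, const char *dev,
                       const char *mirror_to, const char *prio_handle)
{
    assert(buf != NULL && dev != NULL && mirror_to != NULL &&
           prio_handle != NULL);
    if (strcmp(dev, mirror_to) == 0) {
        if (len > 0)
            buf[0] = '\0';
        return 0;
    }
    return snprintf(buf, len,
        "tc qdisc add dev %s ingress\n"
        "tc filter add dev %s parent ffff: protocol ip u32 match u8 0 0 "
        "action mirred egress mirror dev %s\n"
        "tc qdisc replace dev %s parent root prio\n"
        "tc filter add dev %s parent %s protocol ip u32 match u8 0 0 "
        "action mirred egress mirror dev %s\n",
        dev, dev, mirror_to, dev, dev, prio_handle, mirror_to);
}
```

Calling it for vnet1, vnet2 and so on with "vnet0" as the mirror target yields one block of commands per virtual machine; a return value not smaller than the buffer size means the output was truncated.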

Improvements / open questions
  1. It would be nice if a filter could match any protocol, rather than just one at a time. Of course, it is very unlikely that your router would route anything other than IP to your host, so this limitation does not matter much for threats from the outside. It does, however, allow virtual machines on the same host or on the same LAN to communicate with each other undetected if they use a non-IP protocol.
  2. It would be nice if there were a simpler way to specify a match that is always true, rather than the not very obvious "u32 match u8 0 0".
  3. The security appliance could very easily be put in-line with the same approach, by giving the mirred action the "redirect" command instead of "mirror". This would not copy the packet but instead forward the original packet. It would then be the responsibility of the security appliance to forward the packet to its original destination (or not).
Conclusion

The goal of this article was twofold. First, it shows how to use the Linux traffic shaping functionality to implement a port mirror for a virtual switch based on a Linux bridge. Second, it illustrates my personal belief that KVM is a very advanced hypervisor architecture. Being based on Linux, it allows you to put all the goodies that have been developed for Linux over the past decade or so to good use, without re-inventing the wheel.

Saturday, August 23, 2008

Keeping your email archive while changing webmail providers

Many of you will have experienced this at one time. You want to change from local email to webmail, or you want to switch from one webmail provider to another, but you also want to keep your email archive.

If the new provider supports IMAP (as Gmail does), this is easy. If you are currently using local email with a "thick client", you can add your new webmail provider as an IMAP account. This allows you to drag and drop emails from your old account to your new one. This method is very reliable and has high fidelity: folder structures and email flags (such as "read", "followed-up" etc.) can be preserved. If you are currently using webmail, chances are that a migration will be easy as well. Your current provider probably supports the POP protocol. In this case you can install one of the many good free email clients on your system (I would recommend Thunderbird), download your current email via POP, and then use the same method as for a migration from a local email setup.

Things get more complicated if your new webmail provider doesn't support IMAP. This happened to me a few weeks ago, when my wife wanted to migrate her email from Thunderbird to Yahoo Mail. Yahoo Mail only supports POP, and with this protocol it is not possible to upload mail.

But even without IMAP, there is hope... In theory, it should be possible to remail all messages from your archive to your new account. Resending an email verbatim is also called "bouncing", and it differs from "forwarding" in that the original message and headers are preserved.

One concern I had with remailing was the fidelity of the approach. After some experimentation it turned out that all headers, including the sent date and sender, are transferred correctly. Per-message flags like "read" and "forwarded" don't survive a bounce, and neither does a folder or tag structure. These issues are inherent to the approach and I found no way to work around them. In my case I found these drawbacks acceptable.

Unfortunately, not all email clients support bouncing. The clients that support it are mostly "old-school" email clients such as mutt, elm and Pine. The feature seems to have been lost in more recent email clients such as Thunderbird and Outlook (although a plugin for Thunderbird is available here). In any case, the user interface would most likely require you to remail your archive to your new account one message at a time, which would not be very efficient.

So I decided to write a small Python program to do the trick. The program is called bounce-mbox.py and is available here. It takes a Thunderbird email archive as input (which is just a single file in the "mbox" format), and remails each message in it verbatim to a recipient.
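For readers unfamiliar with the mbox format: messages are simply concatenated in one file, and each one begins with a separator line starting with "From " (body lines that start with "From " are normally escaped as ">From "). The function below is a minimal C sketch of how a tool like bounce-mbox.py might find message boundaries; it is not the actual program (which is written in Python), and `mbox_count_messages` is a hypothetical name. Real-world mbox variants differ in their escaping rules, so treat it as an illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts messages in an mbox held in memory, assuming that every line
 * starting with "From " begins a new message. */
size_t mbox_count_messages(const char *mbox)
{
    size_t count = 0;
    const char *line = mbox;

    assert(mbox != NULL);
    while (*line != '\0') {
        if (strncmp(line, "From ", 5) == 0)
            count++;            /* separator line: a new message starts */
        line = strchr(line, '\n');
        if (line == NULL)
            break;              /* last line had no newline */
        line++;                 /* move to the start of the next line */
    }
    return count;
}
```

An escaped ">From the archive" line in a message body does not start with "From ", so it is not mistaken for a separator.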

Invoking the program without arguments displays a small help text:

[gjansen@columbus ~]$ ./bounce-mbox.py
Usage: bounce-mbox.py [options] mbox-file recipient smtp-server

Options:
  -h, --help            show this help message and exit
  -p PORT, --port=PORT  use alternate SMTP port PORT
  -r LIST, --remove=LIST
                        remove additional headers in LIST
  -d DELAY, --delay=DELAY
                        sleep SECS seconds between each message
  -j JOURNAL, --journal=JOURNAL
                        keep track of sent messages in JOURNAL

The program requires three mandatory arguments: an mbox file, a recipient, and an SMTP server.

The mbox file is the file that contains all the email messages to bounce. By default it should be an mbox file as produced by Thunderbird. This is Thunderbird-specific because email clients use a few internal email headers (starting with "X-") to keep track of certain properties, and these headers should preferably be removed before a message is bounced. By default, bounce-mbox.py contains a list of headers to remove that is suitable for Thunderbird. To modify this list, use the "--remove" option to specify a comma-separated list of headers to remove. Header names are matched case-insensitively.
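The case-insensitive matching against a comma-separated header list could look like the C sketch below. It is an illustration, not code from bounce-mbox.py: `header_in_list` is a hypothetical name, and the Thunderbird header names used in the example (X-Mozilla-Status, X-Mozilla-Status2, X-Mozilla-Keys) are assumptions about what such a list might contain. Only the header name before the colon is compared, so the whole name has to match.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <strings.h>

/* Returns 1 if the header line ("Name: value") names one of the headers
 * in the comma-separated list, ignoring case; 0 otherwise. */
int header_in_list(const char *header_line, const char *list)
{
    const char *colon;
    size_t name_len;

    assert(header_line != NULL && list != NULL);
    colon = strchr(header_line, ':');
    if (colon == NULL)
        return 0;               /* not a "Name: value" line */
    name_len = (size_t) (colon - header_line);

    while (*list != '\0') {
        const char *end = strchr(list, ',');
        size_t item_len = end ? (size_t) (end - list) : strlen(list);

        if (item_len == name_len &&
            strncasecmp(list, header_line, name_len) == 0)
            return 1;
        if (end == NULL)
            break;              /* that was the last list item */
        list = end + 1;         /* skip past the comma */
    }
    return 0;
}
```

With the list "X-Mozilla-Status,X-Mozilla-Status2,X-Mozilla-Keys", a line "x-mozilla-status: 0001" would be removed, while "X-Mozilla-Status3: 1" and "Subject: hello" would be kept.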

The recipient argument should be the email address of your new webmail account. All messages in the mbox file will be remailed to this address.

The SMTP server argument is the SMTP server to send the bounced emails to. Not all SMTP servers accept all messages (in order to protect against spam), so if you don't run your own SMTP server you have two choices here: your ISP's SMTP server or the SMTP server that is responsible for your new email account. Some ISPs block the SMTP port to all servers apart from their own. If your ISP does this, the only choice is to use their SMTP server.

Because we will potentially be sending a large number of emails, care must be taken that you are not mistaken for a spammer and that you don't overload the SMTP server you are using. To this end, you can specify a time to wait between messages with the "--delay" option. By default, bounce-mbox.py waits 0.5 seconds between messages.
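Pacing the messages amounts to sleeping between sends. In this minimal C sketch, `send_one` is a hypothetical callback standing in for the SMTP transfer of one message, and the delay is converted to a `struct timespec` for POSIX `nanosleep()`. The pause is inserted between messages only, and the loop stops at the first failed send so that a later run can pick up where this one left off.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Convert a --delay value in (possibly fractional) seconds to a timespec. */
struct timespec delay_to_timespec(double secs)
{
    struct timespec ts;

    assert(secs >= 0.0);
    ts.tv_sec = (time_t) secs;
    ts.tv_nsec = (long) ((secs - (double) ts.tv_sec) * 1e9);
    return ts;
}

/* Hypothetical send loop: send_one(i) stands in for the SMTP transfer of
 * message i and returns 0 on success. Sleeps between messages (not after
 * the last one) and stops at the first failure. Returns how many were sent. */
int send_all(int n, double delay, int (*send_one)(int i))
{
    struct timespec ts = delay_to_timespec(delay);
    int sent = 0;

    assert(n == 0 || send_one != NULL);
    for (int i = 0; i < n; i++) {
        if (send_one(i) != 0)
            break;
        sent++;
        if (i + 1 < n)
            nanosleep(&ts, NULL);   /* pause before the next message */
    }
    return sent;
}
```

With the default of 0.5 seconds, `delay_to_timespec(0.5)` yields 0 seconds and 500000000 nanoseconds.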

To make the operation more reliable, it is possible to specify a journal with the "--journal" parameter. If journaling is enabled, the unique message-ids of all emails that have been sent are added to the journal. If, for some reason, something goes wrong mid-way, then the journal can be used as input to a subsequent run of the program to ensure these messages are not sent again.
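A journal can be as simple as a text file with one Message-ID per line. The two C functions below are a hypothetical sketch of that idea (they are not taken from bounce-mbox.py): check the journal before sending a message, and append its Message-ID once the send has succeeded. The sketch assumes each Message-ID fits in a 1 KB line buffer and that the journal is opened for both reading and writing, e.g. with `fopen(path, "a+")`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if msgid already appears as a full line in the journal. */
int journal_contains(FILE *journal, const char *msgid)
{
    char line[1024];

    assert(journal != NULL && msgid != NULL);
    rewind(journal);
    while (fgets(line, sizeof line, journal) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';   /* strip the line ending */
        if (strcmp(line, msgid) == 0)
            return 1;
    }
    return 0;
}

/* Records msgid after a successful send and flushes the stdio buffer so
 * the entry reaches the file promptly. Returns 0 on success, -1 on error. */
int journal_append(FILE *journal, const char *msgid)
{
    assert(journal != NULL && msgid != NULL);
    if (fseek(journal, 0, SEEK_END) != 0)
        return -1;
    if (fprintf(journal, "%s\n", msgid) < 0)
        return -1;
    return fflush(journal) == 0 ? 0 : -1;
}
```

Writing the entry only after the send succeeds means a crash can at worst cause one message to be sent twice, never cause one to be skipped.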

So, in summary, the procedure to send your email to your new webmail account would be:
  1. Use Thunderbird to store all your messages in a single local folder.
  2. Locate the mbox file corresponding to that local folder somewhere in your Thunderbird profile directory.
  3. Use bounce-mbox.py to remail all messages in the mbox file.
Happy email changing!

Saturday, June 14, 2008

Troubleshooting SUID programs

Today I was trying to resolve an issue with PulseAudio on my Fedora 9 system. For some reason, pulseaudio would not acquire real-time privileges. I had already grepped through the logs, which pointed me in a direction but did not give me a solution. The next step I wanted to try was to run pulseaudio under a debugger.

But... pulseaudio is SUID root. It requires this so that it can start its I/O threads with real-time priority. Unfortunately, this also means that you cannot debug the process as an unprivileged user, nor strace it (both gdb and strace use the ptrace() system call).

My first attempt to work around this was to run the debugger as root, as below.

# gdb pulseaudio

However, this has the problem that pulseaudio itself will then also run as root. SUID applications are intended to be run by an unprivileged user (otherwise they would not need to be SUID), so you can expect erratic behaviour. My problem was finding out why PulseAudio didn't start up with real-time privileges; if I just ran the program as root, chances were that this would mask the problem.

So my next thought was to use sudo to run pulseaudio under my normal user id, through the debugger (edited for brevity).

# gdb sudo
(gdb) set args -u gjansen -H pulseaudio

However I wanted to set a breakpoint in pulseaudio's main() function which is not possible this way. Issuing the "b main" command in gdb would set a breakpoint in sudo's main() function, not in that of pulseaudio!

After thinking about the problem a bit more, I concluded that you really need to run the program you're troubleshooting under your own UID, and attach the debugger as root from another session.

However, again a problem arises: the problem that I was troubleshooting was in the initialization code in pulseaudio. I would never be able to start up pulseaudio, switch to another console, find the PID, and attach the debugger in time. I really needed a way to start pulseaudio and pause it immediately. You can try this by pressing CTRL-Z immediately after you start it. However, you need to be extremely quick and the approach still seemed clunky.

So I decided to solve the problem properly. I created a small shared library that you can LD_PRELOAD into an SUID executable and that stops the program before it reaches its main() function. How does it work? It is actually quite simple. ELF files have a .init section that contains code which is executed by the dynamic linker before it transfers control to the program. This section is used, for example, to execute the constructors of global objects in C++. With some gcc magic (the constructor attribute), you can add your own function to this section. This is what my library does: it adds a small function that sends a SIGSTOP to its own process. As a convenience, it also prints the process ID.

The code is given below:

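#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <sys/types.h>

/* Runs from the library's initialization code, i.e. before main(). */
static void __attribute__((constructor))
stop_init_func(void)
{
    pid_t pid = getpid();

    printf("Sending SIGSTOP to myself (pid = %d)\n", (int) pid);
    fflush(stdout);     /* show the PID even if stdout is not a terminal */
    kill(pid, SIGSTOP); /* stay stopped until a debugger or SIGCONT resumes us */
}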

To compile this code into a shared library, save it in a file called "stop.c" and issue the following commands:

$ gcc -fPIC -c -o stop.o stop.c
$ gcc -shared -o stop.so stop.o

The library needs to be installed in /usr/lib (or /usr/lib64 on bi-arch systems), and needs to be SUID root. The latter is required to allow it to be preloaded into SUID programs.

# cp stop.so /usr/lib
# chmod u+s /usr/lib/stop.so

Now let's show how to use this library with a simple SUID "hello world" program.

$ ls -l hello
-rwsr-xr-x 1 root root 8225 2008-06-14 14:17 hello
$ LD_PRELOAD=stop.so ./hello
Sending SIGSTOP to myself (pid = 32269)

[1]+ Stopped LD_PRELOAD=stop.so ./hello
$

As you can see the program printed its process ID and then stopped itself. Now you can attach gdb as root in another session as displayed below (edited for brevity).

# gdb hello 32269
Attaching to program: /home/gjansen/Scratch/link/hello, process 32269
Redelivering pending Stopped (signal).
Reading symbols from /usr/lib64/stop.so...done.
Loaded symbols for /usr/lib64/stop.so
Reading symbols from /lib64/libc.so.6...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
0x0000003573232507 in kill () from /lib64/libc.so.6
(gdb)

Voila! The program has not yet executed its main() function, so if you want you can set a breakpoint there or anywhere else. Then enter "c" to continue the program.

UPDATE 2008/08/23: The source code for stop.c as well as a Makefile can be downloaded from here (thanks to the guys at freeHg).