Saturday, August 23, 2008

Keeping your email archive while changing webmail providers

Many of you will have experienced this at one time or another. You want to change from local email to webmail, or you want to switch from one webmail provider to another, but you also want to keep your email archive.

If the new provider supports IMAP (such as Gmail) then this is easy. If you are currently using local email with a "thick client", then you can add your new webmail provider as an IMAP account. This will allow you to drag-and-drop emails from your old account to your new account. This method is very reliable and has high fidelity. Folder structures and email flags (such as "read", "followed-up" etc.) can be preserved. If you are currently using webmail, chances are that a migration will be easy as well. Your current provider probably supports the POP protocol. In this case you can install one of the many good free email clients on your system (I would recommend Thunderbird), download your current email via POP, and then use the same method as for the migration from a local email setup.

Things get more complicated if your new webmail provider doesn't support IMAP. This happened to me a few weeks ago when my wife wanted to migrate her email from Thunderbird to Yahoo Mail. Yahoo Mail only supports POP, and with this protocol it is not possible to upload mail.

But even without IMAP, there is hope ... In theory, it should be possible to remail all mail messages from your archive to your new account. Resending an email verbatim is also called "bouncing" and is different from "forwarding" in that the original message and headers are preserved.
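To make the difference with forwarding concrete, here is a minimal sketch of bouncing one message in Python. It is not taken from bounce-mbox.py; the function name `bounce` and its parameters are my own, and `smtp` is assumed to be an already connected `smtplib.SMTP` object. The new address appears only in the SMTP envelope, while the message itself, including its To: header, is sent as it was.

```python
def bounce(smtp, envelope_sender, new_address, raw_message):
    """Resend raw_message (bytes) as-is; only the SMTP envelope changes.

    smtp is assumed to be a connected smtplib.SMTP instance (or any object
    with a compatible sendmail() method).
    """
    # smtplib does not modify byte strings, so normalise line endings to
    # CRLF ourselves. Headers and body are otherwise left untouched.
    wire = raw_message.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")
    # The recipient is named only in the envelope (RCPT TO); the To: header
    # inside the message still shows the original addressee.
    return smtp.sendmail(envelope_sender, [new_address], wire)
```

With a real server this would be called as something like `bounce(smtplib.SMTP("smtp.example.net"), "me@example.org", "new-account@example.com", raw)`, where the host and addresses are placeholders.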

One concern I had with remailing was the fidelity of the approach. After some experimentation it turned out that all headers, including the sent date and sender, are transferred OK. Per-message flags like "read" and "forwarded" don't survive a bounce, and neither does a folder or tag structure. These issues are inherent to the approach and I found no way to work around them. In my case I found these drawbacks acceptable.

Unfortunately, not all email clients support bouncing. The clients that support it are mostly "old-school" email clients such as mutt, elm and Pine. The feature seems to have been lost in more recent email clients such as Thunderbird and Outlook (although a plugin for Thunderbird is available here). But in any case, the user interface would most likely require you to remail your email archive one message at a time to your new account. This would not be very efficient.

So, I decided to write a small Python program to do the trick. The program is called bounce-mbox.py and is available here. It takes a Thunderbird email archive as input (which is just a single file in the "mbox" format), and remails each message from it verbatim to a recipient.

Invoking the program without arguments displays a small help text:

[gjansen@columbus ~]$ ./bounce-mbox.py
Usage: bounce-mbox.py [options] mbox-file recipient smtp-server

Options:
  -h, --help            show this help message and exit
  -p PORT, --port=PORT  use alternate SMTP port PORT
  -r LIST, --remove=LIST
                        remove additional headers in LIST
  -d DELAY, --delay=DELAY
                        sleep SECS seconds between each message
  -j JOURNAL, --journal=JOURNAL
                        keep track of sent messages in JOURNAL

The program requires three mandatory arguments: an mbox file, a recipient, and an SMTP server.

The mbox file is the file that contains all the email messages you want to bounce. By default it should be an mbox file as produced by Thunderbird. This is Thunderbird-specific because email clients use a few internal email headers (starting with "X-") to keep track of certain properties, and these headers should preferably be removed before bouncing the message. By default, bounce-mbox.py contains a list of headers to remove that is suitable for Thunderbird. To modify this list, use the "--remove" argument to specify a comma-separated list of headers to remove. Header names are case-insensitive.
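As an illustration of the header removal, here is a small sketch using Python's standard email package. It is not the code from bounce-mbox.py: the function name `strip_headers` is made up, and the default list below is an assumption based on the headers Thunderbird is known to write into its mbox files, so the list built into the program may differ. Also note that re-serialising a parsed message is not guaranteed to be byte-for-byte identical to the input.

```python
from email import message_from_bytes

# Assumed defaults: internal headers Thunderbird adds to its mbox files.
# The list built into bounce-mbox.py may differ.
DEFAULT_REMOVE = ["X-Mozilla-Status", "X-Mozilla-Status2", "X-Mozilla-Keys"]


def strip_headers(raw_message, extra=""):
    """Return raw_message (bytes) without the client-internal headers.

    extra is a comma-separated list of additional header names, in the
    spirit of the --remove option.
    """
    names = DEFAULT_REMOVE + [n.strip() for n in extra.split(",") if n.strip()]
    msg = message_from_bytes(raw_message)
    for name in names:
        # Deleting by name is case-insensitive and removes every occurrence;
        # a header that is absent is silently ignored.
        del msg[name]
    return msg.as_bytes()
```

For example, `strip_headers(raw, extra="X-Foo,X-Bar")` would drop the assumed Thunderbird headers plus the two (hypothetical) extra ones.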

The recipient argument should be the email address of your new web mail account. All messages in the mbox file will be remailed to this address.

The SMTP server argument is the SMTP server to bounce the emails to. Not all SMTP servers accept all messages (in order to protect against spam), so if you don't run your own SMTP server you have two choices here: your ISP's SMTP server and the SMTP server that is responsible for your new email account. Sometimes ISPs block the SMTP port to all servers apart from their own. If your ISP does this then the only choice is to use their SMTP server.

Because we will potentially be sending a large number of emails, care must be taken that you are not mistaken for a spammer and that you don't bring down the SMTP server you are using. To this end, it is possible to specify a time to wait between sending messages with the "--delay" parameter. By default, bounce-mbox.py introduces a 0.5 second delay.

To make the operation more reliable, it is possible to specify a journal with the "--journal" parameter. If journaling is enabled, the unique message-ids of all emails that have been sent are added to the journal. If, for some reason, something goes wrong mid-way, then the journal can be used as input to a subsequent run of the program to ensure these messages are not sent again.
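The journal idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the class name `Journal` and the one-message-id-per-line file format are invented here and need not match what bounce-mbox.py actually writes.

```python
import os


class Journal:
    """Remember which message-ids were already sent, one id per line."""

    def __init__(self, path):
        self.path = path
        self.seen = set()
        # Load the ids recorded by an earlier (possibly interrupted) run.
        if os.path.exists(path):
            with open(path, encoding="utf-8") as fh:
                self.seen = {line.strip() for line in fh if line.strip()}

    def __contains__(self, message_id):
        return message_id in self.seen

    def record(self, message_id):
        # Append right away so the id is on disk even if a later
        # message makes the run fail.
        with open(self.path, "a", encoding="utf-8") as fh:
            fh.write(message_id + "\n")
        self.seen.add(message_id)
```

A sending loop would skip any message whose id is already `in journal`, and call `journal.record(...)` only after the message was handed to the SMTP server without an error.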

So, in summary, the procedure to send your email to your new webmail account would be:
  1. Use Thunderbird to store all your messages in a single local folder.
  2. Locate the mbox file corresponding to the single local folder somewhere in your Thunderbird profile directory.
  3. Use bounce-mbox.py to remail all messages in the mbox file.
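For the curious, the core of step 3 can be approximated with Python's standard mailbox module. This is a rough sketch rather than the real bounce-mbox.py: `remail_mbox` is a made-up name, and `send` stands for whatever function performs the actual SMTP delivery of one raw message.

```python
import mailbox
import time


def remail_mbox(mbox_path, send, delay=0.5):
    """Call send(raw_bytes) for every message in the mbox file at mbox_path.

    send is a caller-supplied (hypothetical) function that delivers one
    message. Returns the number of messages handed to it.
    """
    box = mailbox.mbox(mbox_path, create=False)
    count = 0
    try:
        for key in box.iterkeys():
            # get_bytes() returns the message without its mbox "From " line.
            send(box.get_bytes(key))
            count += 1
            # Pause between messages to go easy on the SMTP server.
            time.sleep(delay)
    finally:
        box.close()
    return count
```

The default of 0.5 seconds mirrors the default delay mentioned above; the path passed in would be the mbox file located in step 2.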
Happy email changing!

Saturday, June 14, 2008

Troubleshooting SUID programs

Today I was trying to resolve an issue with PulseAudio on my Fedora 9 system. For some reason, pulseaudio would not acquire real-time privileges. I had already grepped through the logs, which pointed me in the direction of a solution but did not give me one. The next step I wanted to try was to run pulseaudio under a debugger.

But pulseaudio is SUID root. It requires this so that it can start up its I/O threads with real-time priority. Unfortunately, this also means that you cannot debug (or strace, as both use the ptrace() system call) the process as an unprivileged user.

My first attempt around this was to run the debugger as root, as below.

# gdb pulseaudio

However, this has the problem that you will also run pulseaudio as root. SUID applications are intended to be run as an unprivileged user (otherwise they would not be SUID), so you can expect erratic behaviour. My problem was finding out why PulseAudio didn't start up with real-time privileges. If I just ran the program as root, chances are that this would mask the problem.

So my next thought was to use sudo to run pulseaudio under my normal user id, through the debugger (edited for brevity).

# gdb sudo
(gdb) set args -u gjansen -H pulseaudio

However, I wanted to set a breakpoint in pulseaudio's main() function, which is not possible this way. Issuing the "b main" command in gdb would set a breakpoint in sudo's main() function, not in that of pulseaudio!

After thinking about the problem a bit more, I concluded that you really need to run the program you're troubleshooting under your own UID, and attach the debugger as root from another session.

But this raises another problem: the issue I was troubleshooting was in the initialization code of pulseaudio. I would never be able to start up pulseaudio, switch to another console, find the PID, and attach the debugger in time. I really needed a way to start pulseaudio and pause it immediately. You can try this by pressing CTRL-Z immediately after you start it. However, you need to be extremely quick and the approach still seemed clunky.

So I decided to solve the problem properly. I created a small shared library that you can LD_PRELOAD into an SUID executable and which stops the program before it reaches its main() function. How does it work? It is actually quite simple. ELF files contain initialization code (the .init section and its related constructor tables) that is executed before control is transferred to the program's main() function. This mechanism is used e.g. for executing the constructors of global objects in C++. With some gcc magic (the "constructor" function attribute), you can add your own function to it. This is what my library does: it adds a small function to the initialization code that sends a SIGSTOP to its own process. It also outputs the process ID, as a convenience.

The code is given below:

#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <sys/types.h>

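/* Runs before main(): gcc registers constructor functions to be called
   during program/library initialization. */
static void __attribute__((constructor))
stop_init_func(void)
{
    pid_t pid = getpid();

    printf("Sending SIGSTOP to myself (pid = %d)\n", (int) pid);
    fflush(stdout);  /* make sure the PID is visible before we stop */
    kill(pid, SIGSTOP);
}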

To compile this code into a shared library, save the code above in a file called "stop.c" and issue the following commands:

$ gcc -fPIC -c -o stop.o stop.c
$ gcc -shared -o stop.so stop.o

The library needs to be installed in /usr/lib (or /usr/lib64 on bi-arch systems), and needs to be SUID root. The latter is required to allow it to be preloaded into SUID programs.

# cp stop.so /usr/lib
# chmod u+s /usr/lib/stop.so

Now let's show how to use this library with a simple SUID "hello world" program.

$ ls -l hello
-rwsr-xr-x 1 root root 8225 2008-06-14 14:17 hello
$ LD_PRELOAD=stop.so ./hello
Sending SIGSTOP to myself (pid = 32269)

[1]+ Stopped LD_PRELOAD=stop.so ./hello
$

As you can see, the program printed its process ID and then stopped itself. Now you can attach gdb as root in another session, as shown below (edited for brevity).

# gdb hello 32269
Attaching to program: /home/gjansen/Scratch/link/hello, process 32269
Redelivering pending Stopped (signal).
Reading symbols from /usr/lib64/stop.so...done.
Loaded symbols for /usr/lib64/stop.so
Reading symbols from /lib64/libc.so.6...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
0x0000003573232507 in kill () from /lib64/libc.so.6
(gdb)

Voila! The program has not yet executed its main() function, so if you want you can set a breakpoint there or anywhere else. Then enter "c" to continue the program.

UPDATE 2008/08/23: The source code for stop.c as well as a Makefile can be downloaded from here (thanks to the guys at freeHg).