Saturday, August 23, 2008

Keeping your email archive while changing webmail providers

Many of you will have experienced this at one time or another. You want to change from local email to webmail, or you want to switch from one webmail provider to another, but you also want to keep your email archive.

If the new provider supports IMAP (such as Gmail) then this is easy. If you are currently using local email with a "thick client", then you can add your new webmail provider as an IMAP account. This will allow you to drag-and-drop emails from your old account to your new account. This method is very reliable and has high fidelity. Folder structures and email flags (such as "read", "followed-up" etc.) can be preserved. If you are currently using webmail, chances are that a migration will be easy as well. Your current provider probably supports the POP protocol. In this case you can install one of the many good free email clients on your system (I would recommend Thunderbird), download your current email via POP, and then use the same method as for the migration from a local email setup.

Things get more complicated if your new webmail provider doesn't support IMAP. This happened to me a few weeks ago when my wife wanted to migrate her email from Thunderbird to Yahoo Mail. Yahoo Mail only supports POP, and with this protocol it is not possible to upload mail.

But even without IMAP, there is hope ... In theory, it should be possible to remail all mail messages from your archive to your new account. Resending an email verbatim is also called "bouncing" and is different from "forwarding" in that the original message and headers are preserved.

A concern I had with remailing was the fidelity of the approach. After some experimentation it turned out that all headers, including the sent date and sender, are transferred OK. Per-message flags like "read" and "forwarded" don't survive a bounce, and neither does a folder or tag structure. These issues are inherent to the approach and I found no way to work around them. In my case I found these drawbacks acceptable.

Unfortunately, not all email clients support bouncing. The clients that support it are mostly "old-school" email clients such as mutt, elm and Pine. The feature seems to have been lost in more recent email clients such as Thunderbird and Outlook (although a plugin for Thunderbird is available here). But in any case, the user interface would most likely require you to remail your email archive to your new account one message at a time. This would not be very efficient.

So, I decided to write a small Python program to do the trick. The program is called bounce-mbox.py and is available here. It takes a Thunderbird email archive as input (which is just a single file in the "mbox" format), and remails each message from it verbatim to a recipient.
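To give an idea of what such a program boils down to, here is a minimal sketch of the core loop. It is not the actual bounce-mbox.py source; the function names (bounce_mbox, bounce_via_smtp) are made up for illustration, and using the original "From" header as the envelope sender is an assumption. It only uses the standard mailbox and smtplib modules.

```python
import mailbox
import smtplib
from email.utils import parseaddr


def bounce_mbox(mbox_path, recipient, send):
    """Resend every message in an mbox file, unmodified, to `recipient`.

    `send` is any callable taking (from_addr, to_addrs, raw_message);
    smtplib.SMTP.sendmail has this shape. Returns the number of messages sent.
    """
    box = mailbox.mbox(mbox_path, create=False)
    count = 0
    try:
        for key in box.keys():
            # Raw message bytes without the mbox "From " separator line.
            raw = box.get_bytes(key)
            # Assumption: reuse the original sender as envelope sender,
            # falling back to the recipient if no address can be parsed.
            header_from = str(box[key].get("From", ""))
            sender = parseaddr(header_from)[1] or recipient
            send(sender, [recipient], raw)
            count += 1
    finally:
        box.close()
    return count


def bounce_via_smtp(mbox_path, recipient, host, port=25):
    """Hypothetical convenience wrapper: bounce through a real SMTP server."""
    with smtplib.SMTP(host, port) as server:
        return bounce_mbox(mbox_path, recipient, server.sendmail)
```

Passing the send function in as an argument keeps the loop easy to try out without a mail server: any function that records its arguments will do.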

Invoking the program without arguments displays a small help text:

[gjansen@columbus ~]$ ./bounce-mbox.py
Usage: bounce-mbox.py [options] mbox-file recipient smtp-server

Options:
  -h, --help            show this help message and exit
  -p PORT, --port=PORT  use alternate SMTP port PORT
  -r LIST, --remove=LIST
                        remove additional headers in LIST
  -d DELAY, --delay=DELAY
                        sleep SECS seconds between each message
  -j JOURNAL, --journal=JOURNAL
                        keep track of sent messages in JOURNAL

The program requires three mandatory arguments: an mbox file, a recipient, and an SMTP server.

The mbox file is the file that contains all the email messages you want to bounce. By default the program expects an mbox file as produced by Thunderbird. This is Thunderbird-specific because email clients use a few internal email headers (starting with "X-") to keep track of certain properties, and these headers should preferably be removed before bouncing the message. By default, bounce-mbox.py contains a list of headers to remove that is suitable for Thunderbird. To add to this list, use the "--remove" argument to specify a comma-separated list of additional headers to remove. Header names are case-insensitive.
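Header removal could look roughly like the sketch below. The default list shown here is only an example (X-Mozilla-Status and X-Mozilla-Status2 are headers Thunderbird writes into its mbox files); the actual defaults in bounce-mbox.py may differ, and strip_headers is a hypothetical name.

```python
from email import message_from_bytes

# Example defaults only; the real list in bounce-mbox.py may be different.
DEFAULT_REMOVE = ["X-Mozilla-Status", "X-Mozilla-Status2"]


def strip_headers(raw, extra=""):
    """Return the raw message with client-internal headers removed.

    `extra` is a comma-separated list of additional header names,
    in the same style as the "--remove" option.
    """
    names = DEFAULT_REMOVE + [h.strip() for h in extra.split(",") if h.strip()]
    msg = message_from_bytes(raw)
    for name in names:
        # Deleting a header matches the name case-insensitively, removes
        # every occurrence, and does nothing if the header is absent.
        del msg[name]
    return msg.as_bytes()
```

For example, strip_headers(raw, "x-mozilla-keys,X-Spam-Status") would drop those two headers in addition to the defaults, regardless of how they are capitalised in the message.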

The recipient argument should be the email address of your new webmail account. All messages in the mbox file will be remailed to this address.

The SMTP server argument is the SMTP server to bounce the emails to. Not all SMTP servers accept all messages (in order to protect against spam), so if you don't have your own SMTP server you have two choices here: your ISP's SMTP server and the SMTP server that is responsible for your new email account. Sometimes ISPs block the SMTP port to all servers apart from their own. If your ISP does this, then your only choice is to use their SMTP server.

Because we will potentially be sending a large number of emails, care must be taken that you are not mistaken for a spammer and that you don't bring down the SMTP server you are using. To this end, it is possible to specify a time to wait between sending messages with the "--delay" parameter. By default, bounce-mbox.py introduces a 0.5 second delay.
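The throttling itself is simple. A sketch of how it might work (send_throttled and its parameters are hypothetical names; only the 0.5 second default comes from the program):

```python
import time


def send_throttled(messages, send, delay=0.5, sleep=time.sleep):
    """Call send(message) for each message, pausing `delay` seconds in between.

    No pause is made before the first message. Returns the number sent.
    """
    sent = 0
    for index, message in enumerate(messages):
        if index > 0:
            sleep(delay)
        send(message)
        sent += 1
    return sent
```

With the default delay, an archive of 10,000 messages spends roughly 83 minutes just waiting, so it is worth starting the run when you don't need the machine for a while.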

To make the operation more reliable, it is possible to specify a journal with the "--journal" parameter. If journaling is enabled, the unique message-ids of all emails that have been sent are added to the journal. If, for some reason, something goes wrong mid-way, then the journal can be used as input to a subsequent run of the program to ensure these messages are not sent again.
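A journal along these lines can be sketched as follows. The file format (one Message-ID per line) and the function names are assumptions made for the example, not necessarily what bounce-mbox.py does.

```python
import os
from email import message_from_bytes


def load_journal(path):
    """Return the set of Message-IDs recorded in the journal (empty if none)."""
    if not os.path.exists(path):
        return set()
    with open(path, "r", encoding="utf-8") as journal:
        return {line.strip() for line in journal if line.strip()}


def send_with_journal(raw_messages, send, journal_path):
    """Send each raw message unless its Message-ID is already in the journal.

    Each ID is appended to the journal right after a successful send, so an
    interrupted run can be resumed without resending. Returns the number sent.
    """
    done = load_journal(journal_path)
    sent = 0
    with open(journal_path, "a", encoding="utf-8") as journal:
        for raw in raw_messages:
            msg_id = str(message_from_bytes(raw).get("Message-ID", "")).strip()
            if msg_id and msg_id in done:
                continue  # already sent in an earlier run
            send(raw)
            if msg_id:
                journal.write(msg_id + "\n")
                journal.flush()
                done.add(msg_id)
            sent += 1
    return sent
```

Note that in this sketch messages without a Message-ID header cannot be tracked, so they would be sent again on every run.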

So, in summary, the procedure to send your email to your new webmail account would be:
  1. Use Thunderbird to store all your messages in a single local folder.
  2. Locate the mbox file corresponding to that local folder in your Thunderbird profile directory.
  3. Use bounce-mbox.py to remail all messages in the mbox file.
Happy email changing!
