12 Years of Gmail, Part 2: Bootstrapping

Posted on 08 November 2016 in Technology • Tagged with 12 years of gmail, mailbox, graphing, plotly, python, sqlite, takeout inspector

This post is part of my series, 12 Years of Gmail, taking a look at the data Google has accumulated on me over the past 12 years of using various Google services and documenting the learning experience developing an open source Python project (Takeout Inspector) to analyze that data.

Jumping back in to Python has been just as fun as my first experiences with it. After brushing off some of the dust, I have managed to put together a (very) small package that does a couple of basic things with a Google Takeout Mail (mbox) file:

  1. Parses and standardizes the format of email addresses;
  2. Imports key messages data in to an sqlite database;
  3. Produces simple graphs of top recipients and senders.

Parsing Email Addresses

The mailbox Python module makes it very simple to get an mbox file in to Python and play around using the mailbox.Mailbox and email.Message classes. Here is an example using my mbox file:

0
1
2
3
4
5
6
7
8
9
import mailbox
email = mailbox.mbox('/path/to/email.mbox')

# The number of emails in the mbox file.
print len(email.keys())
114407

# The "Delivered-To" header of the first email.
print email …

Continue reading