Friday, October 15, 2010

US Unemployment Rate since 1948

This is a quick response to a discussion on the Visual Analytics LinkedIn group regarding a Wall Street Journal (WSJ) Graph of 'Historical US Unemployment'.

The WSJ chose a heatmap where a simple line chart would ordinarily suffice, and the discussion revolved around whether the heatmap gave the reader any more information than a line chart would.

Consequently, I put together a quick line plot in Python + matplotlib using the following data sources:
The graph below differs from the WSJ one in the following ways:
  • The WSJ chose a colour band for each percentage point; I have used three bands (0-4 good, 4-8 ok, 8+ a bit high), with colours loosely based on the WSJ example. I have no idea whether economists would consider these thresholds reasonable.
  • Periods of recession are represented by vertical bands, as opposed to white dots in the WSJ example.
  • For readability, the dates are labeled every 4th year (as per Bureau). In a dynamic chart for the web, the interval would vary with the zoom level.
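The banding and labelling choices in the list above can be sketched as two small helpers. This is only an illustration: the function names, the treatment of the exact boundaries (4.0 counts as "ok", 8.0 as "a bit high") and the max_labels limit are my assumptions, not part of the plotting code further down.

```python
# Thresholds from the list above: 0-4 good, 4-8 ok, 8+ a bit high.
# These are informal cut-offs, not an economic standard.
BANDS = [(4.0, 'good'), (8.0, 'ok'), (float('inf'), 'a bit high')]


def band_for_rate(rate):
    """Return the band label for an unemployment rate given in percent.

    Assumption: each band includes its lower bound and excludes its upper one.
    """
    for upper, label in BANDS:
        if rate < upper:
            return label


def year_tick_step(first_year, last_year, max_labels=16):
    """Pick the smallest 'nice' year step that keeps the label count
    at or below max_labels (hypothetical helper for a zoomable chart)."""
    span = last_year - first_year
    for step in (1, 2, 4, 5, 10, 20, 50):
        if span // step + 1 <= max_labels:
            return step
    return 100
```

With the full 1948 to 2010 range this picks a step of 4 years, which matches the static chart; zooming in to a single decade would drop to yearly labels.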

Graph of US Unemployment rates 1948 - 2009


Graph of Australian and US Unemployment Rates 1948 - 2010
Note: Australian data only available from late 70s, source: Australian Bureau of Statistics - Labour force data




Code:
import datetime
import pylab
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

k = []      # first-of-month dates, one per monthly observation
mdata = []  # monthly unemployment rates (%), aligned with k

def chomp(string):
    return string.rstrip('\r\n')

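# Read in text mode so each line is a string
with open('SeriesReport20101014161301.csv', 'r') as f:
    rawdata = f.readlines()

# Each data row is: year, then up to twelve monthly rates (Jan..Dec)
for i in range(1, len(rawdata)):  # skip header
    yrates = chomp(rawdata[i]).split(',')
    if i == 1:
        startyear = int(yrates[0])
    if i == len(rawdata) - 1:
        endyear = int(yrates[0]) + 1

    for q in range(1, 13):
        # Blank or missing months become NaN so mdata stays aligned with the dates
        value = yrates[q].strip() if q < len(yrates) else ''
        mdata.append(float(value) if value else float('nan'))

# One date per month for every year covered by the file
for x in range(startyear, endyear):
    for g in range(1, 13):
        k.append(datetime.date(x, g, 1))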

fig = plt.figure()
ax = fig.add_subplot(111)
# plot() handles datetime.date values directly; plot_date() is deprecated in recent matplotlib
ax.plot(k, mdata, '-', label='unemployment rate', color='#571d1c', linewidth=1)
ax.set_xlim(datetime.date(1947, 12, 1),datetime.date(2010, 1, 1))
ax.xaxis.set_major_locator(mdates.YearLocator(4))
ax.xaxis.set_minor_locator(mdates.MonthLocator(1))

# acceptable? rate
p = plt.axhspan(0, 4, facecolor='green', alpha=0.05) 
#getting worse
p = plt.axhspan(4, 8, facecolor='#aaaaff', alpha=0.05) 
# bad?
p = plt.axhspan(8, 12, facecolor='red', alpha=0.05) 
#ax.xaxis.grid(color='white', linestyle='-', linewidth=0.5)

ax.set_ylim(0,12)
ax.yaxis.set_major_locator(pylab.MultipleLocator(1))
ax.yaxis.set_minor_locator(pylab.MultipleLocator(0.1))
#ax.yaxis.grid(color='white', linestyle='-', linewidth=0.5)

#recession http://en.wikipedia.org/wiki/List_of_recessions_in_the_United_States
plt.axvspan(datetime.date(1948, 11, 1), datetime.date(1949, 10, 1), facecolor='orange', alpha=0.1)
plt.axvspan(datetime.date(1953, 7, 1), datetime.date(1954, 5, 1), facecolor='orange', alpha=0.1)
plt.axvspan(datetime.date(1957, 8, 1), datetime.date(1958, 4, 1), facecolor='orange', alpha=0.1)
plt.axvspan(datetime.date(1960, 4, 1), datetime.date(1961, 2, 1), facecolor='orange', alpha=0.1)
plt.axvspan(datetime.date(1969, 12, 1), datetime.date(1970, 11, 1), facecolor='orange', alpha=0.1)
plt.axvspan(datetime.date(1973, 11, 1), datetime.date(1975, 3, 1), facecolor='orange', alpha=0.1)
plt.axvspan(datetime.date(1980, 1, 1), datetime.date(1980, 7, 1), facecolor='orange', alpha=0.1)
plt.axvspan(datetime.date(1981, 7, 1), datetime.date(1982, 11, 1), facecolor='orange', alpha=0.1)
plt.axvspan(datetime.date(1990, 7, 1), datetime.date(1991, 3, 1), facecolor='orange', alpha=0.1)
# NBER dates: March 2001 - November 2001 and December 2007 - June 2009
plt.axvspan(datetime.date(2001, 3, 1), datetime.date(2001, 11, 1), facecolor='orange', alpha=0.1)
plt.axvspan(datetime.date(2007, 12, 1), datetime.date(2009, 6, 1), facecolor='orange', alpha=0.1)


labels = ax.get_xticklabels()
for label in labels:
    label.set_rotation(30)
plt.legend(loc='best')
plt.xlabel('Date' )
plt.ylabel('% Rate' )
plt.title('U.S. Unemployment rate for the period 1948 - 2009')
plt.show()
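
The reading loop in the listing assumes a table with one row per year and twelve monthly columns. As a rough sketch of the same idea using the csv module, the helper below turns such rows into parallel lists of dates and rates and can be tried without the original file. The function name and the sample rows are made up for illustration (the numbers are placeholders, not real unemployment figures).

```python
import csv
import datetime
import io
import math


def parse_wide_monthly(lines):
    """Turn 'Year,Jan,...,Dec' rows into parallel lists of dates and floats.

    Hypothetical helper: blank or missing months are stored as NaN.
    """
    dates, rates = [], []
    reader = csv.reader(lines)
    next(reader)  # skip the header row
    for row in reader:
        if not row or not row[0].strip():
            continue  # ignore empty lines
        year = int(row[0])
        for month in range(1, 13):
            cell = row[month].strip() if month < len(row) else ''
            dates.append(datetime.date(year, month, 1))
            rates.append(float(cell) if cell else float('nan'))
    return dates, rates


# Placeholder rows only; the second year is deliberately incomplete.
sample = io.StringIO(
    "Year,Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec\n"
    "2001,1.0,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2.0,2.1\n"
    "2002,2.2,2.3,2.4\n"
)
dates, rates = parse_wide_monthly(sample)
```

matplotlib leaves a gap in the line wherever a value is NaN, so a partial final year simply stops at the last published month.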