This notebook shows how to create maps in a notebook, without having to write a ton of JavaScript or Python code. It uses folium, which leverages Leaflet.js, a popular JavaScript library to create interactive maps. folium supports base maps using tilesets from MapBox, OpenStreetMap, and others, out of the box. folium also makes it very easy to plot data on maps using GeoJSON and TopoJSON overlays.
You will need to install the following in order to run this notebook.
!pip install folium==0.1.3
!pip install xlrd==0.9.3
!pip install seaborn==0.5.1
!pip install matplotlib==1.4.3
!pip install pandas==0.15.2
%matplotlib inline
import matplotlib.pyplot as plt
import pandas
import seaborn
!wget -O 13staxcd.txt http://www2.census.gov/govs/statetax/13staxcd.txt
df = pandas.read_csv('13staxcd.txt', index_col='ST').dropna(axis=1)
# Because, yeah, values are in 1000s of dollars
df = df * 1000
df.head()
We need a second file that provides descriptions for the tax item codes (the TXX numbers).
!wget -O TaxItemCodesandDescriptions.xls http://www2.census.gov/govs/statetax/TaxItemCodesandDescriptions.xls
tax_codes_df = pandas.read_excel('TaxItemCodesandDescriptions.xls', 'Sheet1', index_col='Item Code')
tax_codes_df.head()
The sum of taxes collected by state and local governments for all categories in fiscal year 2013 is over $846 billion, with a 'b'.
print('${:,}'.format(df.sum().sum()))
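The `'{:,}'` format specification used above inserts thousands separators. Here is a minimal sketch with a made-up amount (the value is illustrative only and is not taken from the census file):

```python
# Hypothetical amount in dollars, chosen only to illustrate the formatting.
amount = 1234567890

# The ',' in the format spec groups digits by thousands.
formatted = '${:,}'.format(amount)
print(formatted)

# Scaling first makes very large totals easier to read.
in_billions = '${:,.1f} billion'.format(amount / 1e9)
print(in_billions)
```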
According to the data source:
The Annual Survey of State Government Tax Collections (STC) provides a summary of taxes collected by state for 5 broad tax categories and up to 25 tax subcategories. These tables and data files present the details on tax collections by type of tax imposed and collected by state governments.
The only thing missing from the data thus far are the "5 broad tax categories", and which of the 25 subcategories make up each one. We had to look this up, and download another Excel file. There's also this report, which provides some details about tax categorization, but also seems to contradict the Excel spreadsheet. Oh, the humanity.
!wget -O agg_tax_categories.xls http://www2.census.gov/govs/estimate/methodology_for_summary_tabulations.xls
tmp = pandas.read_excel('agg_tax_categories.xls')
tmp[8:21].dropna(how='all').dropna(how='all', axis=1).head()
After some investigation, we can write a short function to retrieve the major tax category by tax item code.
def category(tax_item):
    '''Return tax category for the tax item code.'''
    if tax_item == 'T01':
        return 'Property Taxes'
    elif tax_item in ['T40', 'T41']:
        return 'Income Taxes'
    elif tax_item in ['T09', 'T10', 'T11', 'T12', 'T13', 'T14', 'T15', 'T16', 'T19']:
        return 'Sales and Gross Receipts Taxes'
    elif tax_item in ['T20', 'T21', 'T22', 'T23', 'T24', 'T25', 'T26', 'T27', 'T28', 'T29']:
        return 'License Taxes'
    return 'Other Taxes'
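The same lookup could also be expressed as a dictionary, which keeps the code-to-category table in one place. This is only a sketch of an alternative: the names `CATEGORY_CODES`, `CODE_TO_CATEGORY`, and `category_from_table` are hypothetical, and the code groupings are copied from the function above.

```python
# Hypothetical table; groupings copied from the category() function above.
CATEGORY_CODES = {
    'Property Taxes': ['T01'],
    'Income Taxes': ['T40', 'T41'],
    'Sales and Gross Receipts Taxes': ['T09', 'T10', 'T11', 'T12', 'T13',
                                       'T14', 'T15', 'T16', 'T19'],
    'License Taxes': ['T20', 'T21', 'T22', 'T23', 'T24',
                      'T25', 'T26', 'T27', 'T28', 'T29'],
}

# Invert the table so that each tax item code points at its category.
CODE_TO_CATEGORY = {code: name
                    for name, codes in CATEGORY_CODES.items()
                    for code in codes}

def category_from_table(tax_item):
    '''Return the broad category, falling back to "Other Taxes".'''
    return CODE_TO_CATEGORY.get(tax_item, 'Other Taxes')

print(category_from_table('T40'))
print(category_from_table('T99'))
```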
Sum all taxes collected by broad category.
# assign broad category to each tax item code
tmp = df.copy()
tmp['Category'] = tmp.index.map(category)
# aggregate taxes collected by each state by broad category
by_category = tmp.groupby('Category').sum()
# sum across all states
totals_by_category = by_category.sum(axis=1)
print(totals_by_category.map('${:,}'.format))
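To make the two aggregation steps concrete, here is a small sketch on made-up numbers. The state codes `AA` and `BB`, the amounts, and the `labels` dictionary (a stand-in for the `category` function) are all assumptions for illustration.

```python
import pandas

# Made-up amounts for two states; rows are tax item codes, as in df.
toy = pandas.DataFrame(
    {'AA': [10, 20, 30], 'BB': [1, 2, 3]},
    index=['T01', 'T40', 'T41'],
)

# Hypothetical stand-in for the category() function defined earlier.
labels = {'T01': 'Property Taxes', 'T40': 'Income Taxes', 'T41': 'Income Taxes'}
toy['Category'] = toy.index.map(labels.get)

# One row per category, one column per state.
toy_by_category = toy.groupby('Category').sum()

# Summing along axis=1 collapses the state columns into one total per category.
toy_totals = toy_by_category.sum(axis=1)
print(toy_totals)
```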
Plot the total taxes collected by broad category.
totals_by_category.plot(kind='pie', labels=totals_by_category.index,
figsize=(10,10), autopct='%.1f%%')
Here is a violin plot (a combination of boxplot and kernel density plot) that shows the distribution of taxes collected for each category.
data = by_category.T
fig, ax = plt.subplots(figsize=(14,10))
seaborn.violinplot(data, color="Set3", bw=.2, cut=.6,
lw=.5, inner="box", inner_kws={"ms": 6}, ax=ax)
print(data[['Income Taxes', 'Sales and Gross Receipts Taxes']].describe())
Sum the taxes across all categories and view the states that collect the most taxes.
taxes_by_state = df.sum().sort(inplace=False, ascending=False)
taxes_by_state[:10].map('${:,}'.format)
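Note that `Series.sort` was deprecated and later removed from pandas. If you run this notebook with a release newer than the pinned 0.15.2, `sort_values` is the equivalent call. A sketch with made-up numbers (the state codes and totals are placeholders):

```python
import pandas

# Made-up totals keyed by placeholder state codes.
toy_taxes = pandas.Series({'AA': 5, 'BB': 30, 'CC': 12})

# sort_values returns a new, sorted Series and leaves the original untouched.
top = toy_taxes.sort_values(ascending=False)
print(top[:2])
```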
It may not surprise anyone that California and New York top the list; however, it may surprise some that California collected almost twice as much tax revenue as New York. Here is a bar chart to help visualize the magnitude of taxes collected by state.
fig, ax = plt.subplots(figsize=(12,8))
data = taxes_by_state.reset_index()
data.columns = ['State', 'Taxes']
# plot values in $ billions
seaborn.barplot(data.index, data.Taxes / 1000000000,
ci=None, hline=.1, ax=ax)
ax.set_xticklabels(data.State)
ax.set_ylabel('$ Billions')
ax.set_xlabel('State')
ax.set_title('Taxes Collected by US State and Local Governments, FY 2013')
plt.tight_layout()
We want to overlay our tax data over a map of the United States. To do this, we'll use the following:
Combine the data for the 25 tax subcategories with the cumulative amounts for the 5 broad categories. This will allow us to map both sets.
# the aggregate data by broad category
tmp = by_category.T
# make up our own tax item codes for broad categories
codes = ['I','L','O','P','S']
# create complete list of category names
category_names = tax_codes_df.Description.append(
pandas.Series(tmp.columns, index=codes)
)
# merge broad category data with data for 25 subcategories
tmp.columns = codes
data = df.T.merge(tmp, left_index=True, right_index=True)
data.head()
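The merge above joins two state-indexed tables column-wise. Here is a minimal sketch of the same pattern on made-up values; the state codes and amounts are placeholders. It also shows `pandas.concat`, which replaces `Series.append` on pandas 2.0 and later, where `append` no longer exists (the pinned 0.15.2 works as written).

```python
import pandas

# Made-up values: rows are states, columns are tax item codes (like df.T).
by_item = pandas.DataFrame({'T01': [1, 2], 'T40': [3, 4]}, index=['AA', 'BB'])

# Made-up values: rows are states, columns are broad-category codes.
by_broad = pandas.DataFrame({'I': [3, 4], 'P': [1, 2]}, index=['AA', 'BB'])

# Joining on the index lines up the rows for each state.
combined = by_item.merge(by_broad, left_index=True, right_index=True)
print(combined)

# pandas.concat stacks two Series of names, as Series.append does above.
names = pandas.concat([
    pandas.Series({'T01': 'Property Taxes'}),
    pandas.Series(['Income Taxes'], index=['I']),
])
print(names)
```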
!wget -O us-states-10m.json https://raw.githubusercontent.com/knowledgeanyhow/notebooks/master/tax-maps/data/us-states-10m.json
us_topo_map = 'us-states-10m.json'
import os
assert os.path.isfile(us_topo_map)
statinfo = os.stat(us_topo_map)
assert statinfo.st_size > 0
Our tax data is indexed by state. We need a way to bind our data to the state geometries in our map. The geometries in our TopoJSON file are keyed by FIPS codes (Federal Information Processing Standard). So we need to obtain the FIPS codes for US states (from the US Census Bureau), and add them to our data.
!wget -O us_state_FIPS.txt http://www2.census.gov/geo/docs/reference/state.txt
fips = pandas.read_csv('us_state_FIPS.txt', delimiter='|', index_col='STUSAB')
fips.head()
Add FIPS column to our data.
data['FIPS'] = data.index.map(lambda x: fips.loc[x]['STATE'])
data['FIPS'].head()
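The lookup can be tried without downloading anything. The sketch below uses a three-row sample in the same pipe-delimited layout; the `STATE_NAME` column header and the tax values are assumptions, while `STATE` and `STUSAB` are the columns used above. Note that pandas parses the zero-padded codes as integers.

```python
import io
import pandas

# Abbreviated sample in the pipe-delimited layout of the census file.
# The STATE_NAME header is an assumption made for this illustration.
sample = io.StringIO(
    "STATE|STUSAB|STATE_NAME\n"
    "01|AL|Alabama\n"
    "06|CA|California\n"
    "36|NY|New York\n"
)
toy_fips = pandas.read_csv(sample, delimiter='|', index_col='STUSAB')

# Made-up tax values indexed by state abbreviation.
toy_data = pandas.DataFrame({'T40': [7, 9]}, index=['CA', 'NY'])

# Look up the numeric FIPS code for each state in the index.
toy_data['FIPS'] = toy_data.index.map(lambda x: toy_fips.loc[x]['STATE'])
print(toy_data)
```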
Folium utilizes IPython's rich display to render maps as HTML. Here are two functions that use different mechanisms to render a map in a notebook. Either will work in modern browsers.
import folium
from IPython.display import HTML
def inline_map(map):
    """
    Embeds the HTML source of the map directly into the IPython notebook.
    This method will not work if the map depends on any files (json data). Also this uses
    the HTML5 srcdoc attribute, which may not be supported in all browsers.
    """
    map._build_map()
    # escape double quotes so the document fits inside the srcdoc="..." attribute
    return HTML('<iframe srcdoc="{srcdoc}" style="width: 100%; height: 510px; border: none"></iframe>'.format(srcdoc=map.HTML.replace('"', '&quot;')))
def embed_map(map, path="map.html"):
    """
    Embeds a linked iframe to the map into the IPython notebook.
    Note: this method will not capture the source of the map into the notebook.
    This method should work for all maps (as long as they use relative urls).
    """
    map.create_map(path=path)
    return HTML('<iframe src="files/{path}" style="width: 100%; height: 510px; border: none"></iframe>'.format(path=path))
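The `srcdoc` attribute in `inline_map` holds an entire HTML document inside a double-quoted attribute, so double quotes in that document have to be escaped first. Here is a small illustration using the standard library's `html.escape` (Python 3). It is a swapped-in alternative to the `str.replace` call, not what the notebook uses, and it also escapes `&`, `<`, and `>`. The helper name `iframe_for` is hypothetical.

```python
import html

def iframe_for(document):
    '''Wrap an HTML document in an iframe via the srcdoc attribute.'''
    # quote=True also escapes double and single quotes.
    escaped = html.escape(document, quote=True)
    return '<iframe srcdoc="{}"></iframe>'.format(escaped)

snippet = iframe_for('<p class="note">Hi</p>')
print(snippet)
```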
Now we create a function that accepts a tax code, creates a basemap of the United States, and adds a TopoJSON overlay with the appropriate state tax data bound to it.
def create_tax_map(tax_code, path='tax_map.html'):
    '''
    Create a base map with tax data bound to a TopoJSON overlay.
    '''
    # lookup tax category name
    tax_name = category_names.loc[tax_code] + ' ($ Millions)'
    # lookup tax data
    d = data[['FIPS',tax_code]].copy()
    # convert dollars to $ millions (float divisor avoids integer division)
    d[tax_code] = d[tax_code] / 1e6
    # compute a color scale based on data values
    max_value = d[tax_code].max()
    color_scale = [max_value*q for q in [0, 0.1, 0.25, 0.5, 0.75, 0.95]]
    # create base map
    map = folium.Map(location=[40, -99], zoom_start=4, width=800)
    # add TopoJSON overlay and bind data
    map.geo_json(geo_path=us_topo_map, data_out='tax_map.json',
                 data=d, columns=d.columns,
                 key_on='feature.id',
                 threshold_scale=color_scale,
                 fill_color='PuBuGn', line_opacity=0.3,
                 legend_name=tax_name,
                 topojson='objects.states')
    map.create_map(path=path)
    return map
inline_map(create_tax_map('T40'))
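The color scale in `create_tax_map` is a list of breakpoints, each a fixed fraction of the largest value being mapped. The sketch below isolates that calculation; the helper name `threshold_scale` and the maximum of 200.0 are hypothetical.

```python
def threshold_scale(max_value, fractions=(0, 0.1, 0.25, 0.5, 0.75, 0.95)):
    '''Return color-scale breakpoints as fractions of the largest value.'''
    return [max_value * q for q in fractions]

# Hypothetical maximum, e.g. the largest state's collections in $ millions.
scale = threshold_scale(200.0)
print(scale)
```

Because the breakpoints depend on the maximum, each tax code gets its own scale, so colors are comparable between states on one map but not between maps.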
Use a widget to choose the tax category and render the map interactively.
from IPython.html import widgets
from IPython.display import display
from IPython.html.widgets import interact
tax_categories = category_names.to_dict()
tax_categories = dict(zip(tax_categories.values(), tax_categories.keys()))
dropdown = widgets.Dropdown(options=tax_categories, value='T40', description='Tax:')
def show_map(tax_code):
    display(inline_map(create_tax_map(tax_code)))
widgets.interact(show_map, tax_code=dropdown)
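The dropdown expects a mapping from display label to value, which is why the code above swaps the keys and values of `category_names`. Here is a minimal sketch of that inversion; the two entries in `names_by_code` are placeholders, not the full census table.

```python
# Hypothetical subset of code -> description pairs.
names_by_code = {'T01': 'Property Taxes', 'T40': 'Individual Income Taxes'}

# Swap keys and values so the descriptions become the dropdown labels.
codes_by_name = {name: code for code, name in names_by_code.items()}
print(codes_by_name['Individual Income Taxes'])

# Caveat: if two codes shared a description, only one would survive.
```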