----- March 24, 2014 -----

IRS Releases Tax Return Summary Data for States, Counties and ZIP Codes

The Internal Revenue Service has released a new set of summary data that should be of interest to anyone studying patterns of income and wealth in the United States. The new files aggregate, at various geographic levels, most of the line items reported on a federal income tax return. This means, for example, that you can find out how many people in a given county or ZIP code submitted a return for tax year 2011 and what the total Adjusted Gross Income was on those returns. That income is then further broken down by its components, such as Salaries & Wages, Capital Gains, Interest Income, taxable Social Security benefits, etc.

The announcement of the new products recently appeared as an entry on the IRS Statistics of Income What's New page:

2011 Individual Income Tax ZIP Code and County Data—United States’ ZIP code and county data for Tax Year 2011 are now available on Tax Stats. The data present selected income and tax return items by State, ZIP code, county, and size of adjusted gross income. These data are based on individual income tax returns filed with the IRS. (February, 2014)

The data are reported in separate files, one summarizing all the returns for the specified geographic areas, and the other providing summaries based on a set of Adjusted Gross Income intervals (for example, a summary for all returns where the AGI was $200,000 or more, and another for returns with AGI less than $25,000).

The Missouri Census Data Center has downloaded and converted the files provided by the IRS so that they can be accessed via Uexplore/Dexter in the new irstaxes data directory, which is linked from the Economic Indicators section of the MCDC Data Archive home page.

Our conversions involve some significant value-added modifications. We have assigned more mnemonic variable names than those suggested in the Technical Documentation provided by the IRS (we use the name AGI instead of A00100, for example, for the variable containing the Adjusted Gross Income). More importantly, we have more than doubled the number of data items by calculating and adding various means and percentages. So you get not only the aggregate AGI values (in thousands of dollars) but also the Average AGI in simple dollars; and you get not only a variable NFarms with the count of "Farm Returns", but also a PctFarm variable telling you what percentage of all returns were from farms. The resulting enhanced data sets (identified by the word "plus" in their names) have been added to our archive collection alongside the original data sets, which keep the variable names suggested by the IRS. For details see the Readme file in the irstaxes data directory.
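To make the arithmetic behind these added items concrete, here is a minimal sketch in Python. It is not the code used to build the "plus" data sets. AGI, NFarms and PctFarm are names mentioned above, while Returns and AvgAGI are hypothetical names chosen for this illustration, and the numbers are invented.

```python
def add_derived_items(row):
    """Return a copy of one area's summary with an average and a percentage added.

    Assumptions for this sketch: "Returns" (count of returns) and "AvgAGI"
    are hypothetical variable names; AGI is an aggregate in thousands of dollars.
    """
    out = dict(row)
    n = row["Returns"]
    # Convert the aggregate from thousands of dollars to dollars, then average.
    out["AvgAGI"] = round(row["AGI"] * 1000 / n) if n else None
    # Share of all returns that were farm returns, as a percentage.
    out["PctFarm"] = round(100 * row["NFarms"] / n, 1) if n else None
    return out


# Invented values for a single area.
sample = {"Returns": 2000, "AGI": 150000, "NFarms": 50}
enhanced = add_derived_items(sample)
print(enhanced["AvgAGI"], enhanced["PctFarm"])
```

Areas with no returns get missing values rather than a division error, which is one reasonable convention among several.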

To give you an idea of what these data sets contain, we pulled a sample from our "plus" data set containing summaries at the state and county level for the entire U.S. with no AGI category summaries (uscntysplusnoagi11), selecting state level summaries for five Midwestern states (using Dexter, of course). We then had the data set transposed so that the variables became the rows and the geographic areas (states) became the columns. Here is a partial listing (there are over 170 variables altogether) of that extract:

You can see the entire report by running the Dexter sample query transpose_states in the irstaxes/Queries directory.
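For readers who want to reproduce the transposed layout outside of Dexter, the idea can be sketched in a few lines of Python. This is only an illustration of the transpose step, not the Dexter query itself. The Returns column name and all values are invented for the example.

```python
def transpose(records, id_key):
    """Turn one-record-per-area data into one-row-per-variable data.

    records: list of dicts, one per geographic area, all with the same keys.
    id_key:  the key holding the area name, used for the column headings.
    """
    areas = [rec[id_key] for rec in records]
    variables = [key for key in records[0] if key != id_key]
    table = [["Variable"] + areas]  # header row: one column per area
    for var in variables:
        table.append([var] + [rec[var] for rec in records])
    return table


# Invented values; "Returns" is a hypothetical variable name.
states = [
    {"State": "Missouri", "Returns": 100, "AGI": 5000},
    {"State": "Kansas", "Returns": 60, "AGI": 3300},
]
transposed = transpose(states, "State")
for line in transposed:
    print(line)
```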

Ranking Geographic Areas Using IRS Database Indicators

These new data lend themselves well to generating reports and files that rank geographic areas based on any of the numerous indicators now available to us. We have coded another sample query, this time using the Rankster application (a post processor that shares the Uexplore front-end data browser with Dexter) to generate a report that shows the top 40 ZIP codes in every state as ranked by average Adjusted Gross Income on their 2011 tax returns. Here is the Missouri portion of that report (slightly cropped to fit better here):

It is not surprising to see that the Ladue ZIP code in St. Louis County is easily the highest ranked in the state. But the appearance of Westphalia and Lohman would seem to indicate that the averages for some ZIP codes may be skewed by a small number of very large values. Users can access the entire report at topAGIzipsBystate.Rankster_report.html in the irstaxes/Queries directory. They can also access the query itself in that same directory, file name topAGIzipsByState.html, to see the actual Rankster query form that was used to generate the report. Using that as a template, you could easily produce comparable reports based on any of the dozens of other indicators, and rank them to look at the lowest values instead of the highest (or both).
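The ranking logic itself is simple enough to sketch in Python for anyone working with an extract outside of Rankster. The sketch below is not how Rankster is implemented. The column names (State, Zip, Returns, AvgAGI), the ZIP codes and the values are all invented, and the min_returns filter is a hypothetical addition, shown only as one way to keep very small ZIP codes from dominating a ranking of averages.

```python
from collections import defaultdict


def top_by_state(rows, n=40, min_returns=0):
    """Return the n highest-AvgAGI rows within each state.

    min_returns is a hypothetical threshold (not a Rankster option):
    rows with fewer returns are left out before ranking.
    """
    by_state = defaultdict(list)
    for row in rows:
        if row["Returns"] >= min_returns:
            by_state[row["State"]].append(row)
    return {
        state: sorted(group, key=lambda r: r["AvgAGI"], reverse=True)[:n]
        for state, group in by_state.items()
    }


# Invented rows with placeholder ZIP codes.
rows = [
    {"State": "MO", "Zip": "00001", "Returns": 5000, "AvgAGI": 90000},
    {"State": "MO", "Zip": "00002", "Returns": 40, "AvgAGI": 250000},
    {"State": "MO", "Zip": "00003", "Returns": 3000, "AvgAGI": 60000},
    {"State": "KS", "Zip": "00004", "Returns": 2000, "AvgAGI": 70000},
]
top = top_by_state(rows, n=2)
filtered = top_by_state(rows, n=2, min_returns=100)
print([r["Zip"] for r in top["MO"]])
print([r["Zip"] for r in filtered["MO"]])
```

Passing reverse=False to sorted would give the lowest-ranked areas instead.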

Thematic Maps

We have created a set of thematic maps that make use of these data to display some spatial patterns.

  1. Map 1 displays the state of Missouri by ZIP code and shows the 40 ZIP codes with the highest average Adjusted Gross Income.

  2. Map 2 displays the state of Missouri by ZIP code and shows the 40 ZIP codes with the lowest average Adjusted Gross Income.

  3. Map 3 displays the entire U.S. by county showing average Adjusted Gross Income.

  4. Map 4 displays the entire U.S. by county showing Percentage of Farm Returns.

The data for each of these maps were created from these new data sets using Dexter and Rankster, with setups available in the irstaxes/Queries folder. The specific query modules used were mozipranks.html (uses Rankster) and uscomapsdata.html (uses Dexter).

User Feedback

We would like to acknowledge the contribution of Jane Traynham of the Maryland State Data Center, who served as our test user and feedback provider during the development of these data sets. We asked if she'd be willing to share her thoughts regarding this new data resource and she graciously provided us with the following testimonial:

The Missouri Census Data Center’s recent addition of the IRS 2011 tax return data to their Dexter application provides users with an easy way to customize their State, county or ZIP Code data to extract just the data items they are interested in. The files contain field names that are easier to understand and additional valuable fields such as Average AGI and percents have already been calculated.

Additional summarized data by state for selected fields and rankings of all ZIP Codes by highest Average AGI are also available in a nicely formatted html report style. By changing selections such as filters or variables in Dexter the user may limit the number of records selected, change the geography, or even select the file format for the output. A great tool!