This page serves as a guide for researchers interested in U.S. postal geography codes. It provides general background information about ZIP codes and describes and links to various reference materials, data files, web sites, and other resources of interest to users who want to work with ZIPs. This version focuses on the Census Bureau's 2000 vintage ZIP Code Tabulation Areas (ZCTAs), close cousins of ZIP codes. We place special emphasis on tools for linking ZIP/ZCTA codes to other geographies (such as counties, cities, and metro areas) and to demographic information from the latest decennial census.
ZIP codes are a very messy kind of geography. They were created by the U.S. Postal Service as a tool to help deliver the mail more efficiently. ("ZIP" is an acronym for "Zone Improvement Plan", where "Zone" refers to the 2-digit postal zones that the post office used before implementing nationwide ZIP codes in the early 1960s. Because it is an acronym, we always write it in uppercase.) ZIP codes have been adopted by marketing people and by all kinds of other researchers as a standard geographic area, like a city or a county. We see maps of ZIP codes in telephone books and from commercial vendors that make us think of them as spatially defined areas with precise boundaries, similar to counties. But from the perspective of the agency that defines them, the U.S. Postal Service, ZIP codes are not and never have been such spatial entities. They are simply categories for grouping mailing addresses. As such, ZIP codes do in most cases resemble spatial areas, since they are composed of spatially clustered street ranges. But not always. In rural areas, ZIP codes can be collections of lines (rural delivery routes) that in reality do not look much like a closed spatial area. In areas where there is no mail delivery (deserts, mountains, lakes, much of Nevada and Utah), ZIP codes are not really defined. You may see maps that show ZIP code boundaries that include such areas, but these are not official post-office definitions. An area will not be assigned a ZIP code until there is a reason for it, i.e. until mail needs to be delivered there. So the actual definition of a ZIP code "boundary" is quite fuzzy at best, and a purely extrapolated guess (at what it would be if someone were to start receiving mail there) at worst. If you have an application that requires extreme geographic precision, especially in sparsely populated areas, then you need to avoid using ZIP codes.
An important thing to keep in mind about ZIP codes is that they change over time. In some cases these changes can be quite dramatic, but more commonly they are small and subtle. When a ZIP code changes its definition, it does not change its name the way a census tract does. The ZIP code that was called '63301' in St. Charles County, Mo. in 1985 was subsequently broken into first two and then three ZIP codes. These new codes were not called 63301.01, 63301.02 and 63301.03; they were called 63301, 63303 and 63304. So what is referred to as 63301 today represents about a third of the area that it referred to in 1985. The new code 63303 did not exist in 1985, and it has already changed its definition so that it now represents about half of the area it included when it was initially created. (It was created by splitting 63301 into 63301 and 63303; a few years later the initial 63303 ZIP was subdivided into 63303 and 63304.) What this means, of course, is that ZIP codes are really terrible units for doing any kind of time-series analysis unless you have some way of keeping track of all the changes over time. Otherwise, you may wind up concluding that there has been a dramatic downward trend in the population of 63301 since 1980, when in fact just the opposite is true. At least when you attempt a time-series study of 63304, it becomes apparent that this geographic entity did not exist before 1990.
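One way to keep track of such changes is to map each current ZIP code back to the base-year ZIP it was carved from and compare totals on that common footprint. The following is only a sketch: the function name is hypothetical, the population counts are made up, and it assumes the successor codes together cover exactly the 1985 territory of 63301.

```python
# Successor ZIP -> the 1985 ZIP whose territory it was carved from (per the text).
BASE_1985 = {"63301": "63301", "63303": "63301", "63304": "63301"}


def to_base_footprint(pop_by_zip, base_map):
    """Sum populations of current ZIPs into their base-year footprint."""
    totals = {}
    for zip_code, pop in pop_by_zip.items():
        base = base_map.get(zip_code, zip_code)  # unchanged ZIPs map to themselves
        totals[base] = totals.get(base, 0) + pop
    return totals


# Made-up population counts, for illustration only.
pop_1985 = {"63301": 90_000}
pop_now = {"63301": 50_000, "63303": 45_000, "63304": 40_000}

naive_change = pop_now["63301"] - pop_1985["63301"]
adjusted_change = to_base_footprint(pop_now, BASE_1985)["63301"] - pop_1985["63301"]
print(naive_change)     # -40000
print(adjusted_change)  # 45000
```

With these invented figures, comparing the code 63301 to itself suggests a decline, while comparing the same territory shows growth.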
What the world really needs to deal with ZIP code geography properly is a large geographic equivalency file relating ZIP codes to other relevant geographies with a time dimension. Instead, what we have is a pair of such equivalency files that relate ZIP codes to geographic entities as used for tabulating the 1990 and 2000 decennial censuses. The 1990 file uses ZIP codes as they were defined around July of 1991. (Because it takes a long time to do the research, the currency of the ZIP codes used may have varied somewhat from area to area.) For 2000 the Census Bureau decided to adopt new entities called ZIP Code Tabulation Areas (ZCTAs), which were really very similar to what they called ZIP codes for tabulating earlier censuses. The equivalency files we are referring to here are the MABLE databases and corresponding Geocorr web applications, which we discuss in more detail below. For now, what we want to emphasize is that when we talk about ZIP codes (or ZCTAs) we really need to keep a time reference in mind. Just as when you work with census tracts you need to know whether you mean 1980 or 1990 tracts, or when you talk about the countries of Europe you need to know which year you mean, time is an important dimension.
You might think that if we do not specify a time, we should always be assumed to mean the most current definitions of ZIP codes, and that any reference materials should be periodically updated to reflect these definitions. Easier said than done, of course. This would be a huge task. But even if you could maintain all your lists with the latest definitions, in some cases there are reasons why it may be preferable not to. This has to do with the fact that the Census Bureau tabulated the results of the last two decennial censuses to produce summary files describing these entities as they were defined at a certain point in time. The data tables on these files describe the characteristics of the residential ZIP codes as they were (more or less) defined at the time the census tabulations were prepared (circa July 1991 for the 1990 census; Jan. 1, 2000 for the 2000 census). "More or less"? Yes. For the purposes of creating these tabulations in 1990, the Bureau had private vendors provide them with files that related each of the 1990 census blocks (the smallest geographic unit identified for each 1990 census return) to the then-current ZIP code definitions. The files created by these vendors were used to create a data product called the "ZIP Block Equivalency Files" or "STF3B Headers Files". They define which geographic areas were used to approximate the ZIP code areas being summarized by the STF3B data tables. There is a built-in "fuzz factor" in this equivalency list: while the Census Bureau has created census blocks so that they do not cross any other census-defined geographic unit, blocks can and (frequently) do cross ZIP codes. Typically, ZIP code "boundaries" fall along back lot lines; they almost never split down the middle of a street. If they did, you would need two postal carriers, one from each of the two ZIP codes, to travel the same street and each deliver to just one side.
Census blocks, however, almost always split down the middle of streets. As a result, blocks near the boundaries of ZIP codes typically straddle more than one ZIP code. If you picture the classic rectangular city census block, it is an area bounded by portions of 4 city streets. Each of those street faces has its own street name, address range and ZIP code, and it is quite common for the ZIP codes for the 4 streets to not all be the same. When this happens, the Census Bureau (or its vendor agents in 1990) assigns the entire block to a single ZIP/ZCTA as used for tabulating the census. So a census block might have 12 households with 40 persons living in ZIP A and 6 households with 15 persons living in ZIP B; for the sake of doing the census tabulation, all 18 households and 55 persons will be tabulated as part of ZIP A. We sometimes describe this phenomenon as "rounding off the ZIP data to blocks". These rounding errors are unbiased and may cancel each other out to some extent, but they are still an important source of potential error. In the block-to-ZIP equivalency files prepared by the vendors in 1990 and assigned by the Census Bureau in 2000, each census block was assigned to one and only one ZIP or ZCTA. The results of these assignments were used in creating the Master Area Block Level Equivalency ("MABLE") files used in the Geocorr web applications.
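The rounding effect can be illustrated with a small sketch. It assumes, purely for illustration, that a block goes to the ZIP with the most households in it; the text above says only that the whole block is assigned to a single ZIP/ZCTA. The function name and data layout are hypothetical.

```python
from collections import defaultdict


def assign_blocks_to_zip(block_parts):
    """Assign each block to the ZIP with the most households in it.

    block_parts: iterable of (block_id, zip_code, households, persons),
    one tuple per ZIP that the block touches.
    Returns (block -> assigned ZIP, ZIP -> tabulated persons).
    """
    by_block = defaultdict(list)
    for block_id, zip_code, households, persons in block_parts:
        by_block[block_id].append((zip_code, households, persons))

    assigned = {}
    tabulated = defaultdict(int)
    for block_id, parts in by_block.items():
        # Assumed rule: the ZIP with the most households wins the whole block.
        winner = max(parts, key=lambda p: p[1])[0]
        assigned[block_id] = winner
        # Everyone in the block is tabulated under the winning ZIP.
        tabulated[winner] += sum(p[2] for p in parts)
    return assigned, dict(tabulated)


# The example from the text: 12 households / 40 persons in ZIP A and
# 6 households / 15 persons in ZIP B, all within one block.
parts = [("block-1", "A", 12, 40), ("block-1", "B", 6, 15)]
assigned, tabulated = assign_blocks_to_zip(parts)
print(assigned)   # {'block-1': 'A'}
print(tabulated)  # {'A': 55}
```

ZIP B receives none of the block's 15 residents, which is the "rounding" described above.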
Most of the various files that we reference from this page will be dealing with ZIP/ZCTA codes as they were defined for the purposes of tabulating the 2000 census.
Another important and exasperating characteristic of ZIP codes is that they do not conform to any other geographic schemes. Most geographic units are part of some hierarchical system, and frequently they will recognize other boundaries such as counties or states. But ZIP codes follow no rules whatsoever with respect to other geographies. ZIP codes can and do cross state lines (rarely, but just enough to cause some problems and confusion), county lines (about 10% of ZIPs are in more than one county), political jurisdictions (cities, congressional districts), metro areas, etc.
This aspect of ZIPs (specifically as defined for the 1990 census) and several other useful bits of information about them are discussed in the geographic glossary file provided with Geocorr.
ZCTAs (ZIP Code Tabulation Areas) are what the U.S. Census Bureau now uses as an alternative to ZIP codes: geographic entities, based on actual ZIP codes, for which it publishes data. A 5-digit ZCTA (there are 3-digit ZCTAs as well) is typically nearly identical to a 5-digit U.S.P.S. ZIP code, but there are important distinctions. The Census Bureau has created a web site where they explain some of the differences:
- In most instances the ZCTA code equals the ZIP Code for an area.
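- In creating ZCTAs, the Census Bureau took the ZIP Code used by the majority of addresses in each census unit at the time of compilation. As a result, some addresses will end up with a ZCTA code different from their ZIP Code.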
- Some ZIP Codes represent very few addresses (sometimes only one) and therefore will not appear in the ZCTA universe.
- The term ZCTA was created to differentiate between this entity and true USPS ZIP Codes.
- ZCTA is a trademark of the U.S. Census Bureau; ZIP Code is a registered trademark of the U.S. Postal Service.
- The Census Bureau does not have U.S. Postal Service ZIP Code boundary files, nor do we have information or possible sources of such files.
Note that ZCTAs are new for 2000; there are, strictly speaking, no historical ZCTAs for doing any time-series analysis. The Bureau published results of the 2000 Census aggregated to these geographic units on Summary Files 1 and 3. Unlike the ZIP codes used for tabulating earlier censuses, these ZCTA areas are spatially complete and you can easily do mapping with them. You can download ZCTA boundary files from the Census Bureau's cartographic boundary files page.
The Bureau has created special XX ZCTAs (ZCTAs with a valid 3-digit ZIP but with "XX" as the last 2 characters of the code, such as "631XX"), which represent large unpopulated areas where it made no sense to assign a census block to an actual ZIP code. Similarly, HH ZCTAs such as 633HH (the H stands for Hydrography, we assume) represent large bodies of water within or bordering a 3-digit ZIP area. There are typically no persons or households in an XX or HH ZCTA. Applications that use ZCTA codes for population-based (as opposed to spatially based) purposes can generally ignore these special ZCTAs.
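For population-based work, dropping the special codes is a one-line filter on the last two characters. A minimal sketch (the helper name and the sample list are hypothetical):

```python
def is_special_zcta(zcta5):
    """True for the unpopulated 'XX' and water 'HH' ZCTAs."""
    return zcta5[-2:].upper() in ("XX", "HH")


zctas = ["63101", "631XX", "633HH", "65201"]
populated = [z for z in zctas if not is_special_zcta(z)]
print(populated)  # ['63101', '65201']
```

Note that ZCTA codes should be kept as strings; converting them to integers would fail on these special codes and would also drop leading zeros.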
You can think of the zcta_master data set as a very large table with over 33,000 rows and 60 columns. Each row corresponds to one State/ZCTA combination. The columns describe the ZCTAs in several ways. To see a complete detailed listing and explanation of these columns (or "variables") see the metadata page for this data set. Most of the columns provide information about what geographic entities intersect with the ZIP (ZCTA). Here is a display of one row (observation) from the zcta_master data set, showing the entry for ZIP/ZCTA 65201 (basically, downtown Columbia, Mo).
This output is a PDF file generated using the Dexter web extraction program that involved only three simple parameter entries on our part. We specified that we wanted our output in PDF format; we specified a filter of the form "zcta5 equal to (=) 65201"; and we specified that we wanted ZCTA5 to be used as the ID variable to appear as the first item in each row of the display.
What can you say about ZIP 65201 based upon these data? You can say that it is primarily in Boone County, MO (FIPS code 29019) and secondarily in Callaway County (FIPS code 29027). The pctcnty variable tells you that 99.8% of the people who lived in this ZCTA at the time of the 2000 census also lived in the primary county (Boone). We can also see that the ZIP is mostly (78.4%) in the city (place, using the census terminology) of Columbia, which has FIPS code 15670 (variable PlaceFP). The variables PlaceFP2 and pctplace2 tell us that the remainder of the ZIP (i.e. the part that is not within the city of Columbia) is in an unincorporated area.
Another series of codes identifies various metropolitan and urbanized areas associated with the ZIP. We see that 65201 has 89% of its population living in the Columbia, MO urbanized area and 99.8% in the Columbia MSA (both the old 2000 MSACMSA version and the current core-based version, CBSA).
We have a number of spatial data items associated with the ZCTA: land area in square miles, the "internal point" coordinates for the ZCTA as published with the 2000 census data, and a custom coordinate pair obtained by taking the population-weighted averages of the internal point coordinates of all the census blocks that were within the ZIP (stored as popcentrLat, popcentrLon). And finally we have a series of key demographic and economic status indicators taken from the 2000 census tables: total population and housing units, median household income, mean poverty ratio and average housing value. Each of the last 3 has the actual value as well as a corresponding index value that tells you how it compares to the value for the entire country. We see that 65201 had a median household income of $26,955 and that this was only 64.2% of the value for the U.S. as a whole. This reflects the fact that this area, which includes at least portions of 3 university campuses, is heavily inhabited by college students, who typically have lower incomes and smaller households. Note that the other two economic measures also indicate an area that is below average in terms of economic well-being, but those measures have indices in the low 80s, which would indicate the area is not really a slum.
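The population-weighted coordinate pair described above can be sketched as follows. This is an illustration of the general idea rather than the code used to build zcta_master; the function name and the block values are made up.

```python
def population_centroid(blocks):
    """Population-weighted mean of block internal points.

    blocks: list of (population, latitude, longitude) tuples.
    Returns (lat, lon), or None when the blocks hold no population.
    """
    total = sum(pop for pop, _, _ in blocks)
    if total == 0:
        return None
    lat = sum(pop * lat_i for pop, lat_i, _ in blocks) / total
    lon = sum(pop * lon_i for pop, _, lon_i in blocks) / total
    return lat, lon


# Made-up block data: (population, internal-point latitude, longitude).
blocks = [(100, 38.95, -92.33), (300, 38.91, -92.29), (0, 39.00, -92.40)]
centroid = population_centroid(blocks)
print(tuple(round(c, 4) for c in centroid))
```

The unpopulated third block has no influence on the result, which is why this point can differ noticeably from the published internal point in ZCTAs with large empty areas.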
Go to the Uexplore home page. Choose Geography/GIS as the major data category, and then choose georef as the filetype. This takes you to Uexplore referencing our geographic reference data sets collection. Click on the Datasets.html file to get a directory page with metadata and links to the subcollection. The zcta_master data set is the second one listed (this could easily change in the future). Click on its name to select it and to invoke Dexter. Follow the detailed metadata link at the top of the Dexter query form. From that page there is another link to a separate page where we go into more detail about each variable.
If you just want to grab all rows and all variables in CSV format, go to Section III. Choose Variables and click the box that says you want to keep all the variables. Then click one of the Extract Data buttons. The result is a CSV file that is quite large (15 MB). We have stored a permanent copy of this CSV file in the georef directory and you can access it directly here.
One of the most frequently asked questions regarding ZIP codes (ZCTAs) is: "If I know the ZIP code of my [customer/survey respondent], how can I tell what state and county they live in?" For these kinds of questions, we build things called equivalency files or correlation lists that show the relationship between two sets of geographic codes. Using the geographic header data provided on the block-level records on the 2000 Census Summary File 1 series, we were able to build such an equivalency file that relates all combinations of state and 5-digit ZCTA to counties in the U.S. (50 states + DC). We stored the results in a tabular file in our data archive. This file can be accessed via the web using our Uexplore/Dexter interface, or you can browse the complete collection of our correlation list data files here. From this page you can access the ZCTA-to-county correlation list data set (or you can take the shortcut to the Dexter application to access this data set by simply clicking on the data set name/link we just provided).
From here, just select the output format(s) of interest from Section I. Choose HTML in addition to the default CSV format. In Section II. Choose rows (observations), you can tell the application which rows you are interested in. The rows in this data set correspond to ZCTAs crossed with counties. Some typical filters that you might want to apply are:
There is no need to enter any filters, if you want the entire data set. There is also a box where you can enter a number that will limit the number of observations/rows on output. To just see what the data looks like you might want to enter 100 in this box to see the first 100 rows of the table (any filters will be applied first).
Now scroll down the page to section III. Choose columns (variables). We suggest you click on the box indicating that you want to keep ALL the columns. Now click on the Extract Data button to run the query. It should take a few seconds for your results to be displayed in the form of an output menu page that lets you view your multiple-output-file results. Click on the HTML Report link to see your output in HTML format, or on the Delimited File link to see your delimited file.
Assuming you entered the Delaware filter (state = 10) in section II, your first output row should contain the following variables and values:
Clearly, what you have here is a tool for assigning county codes to your customer file using ZIP/ZCTAs as the link. An analysis we did using 1990 census data indicated that if you assigned the primary county code to a record based on the ZIP code you would be right over 98% of the time. (See below for a discussion of an alternative resource that can help you find counties for ZIP codes that are not also ZCTAs.)
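As a rough sketch of that assignment, the code below keeps, for each ZCTA, the county holding the largest share of its population and then attaches that county to customer records. The column names (zcta5, county, share) and the customer layout are assumptions for illustration, not the actual variable names in the correlation list; the 65201 shares mirror the Boone/Callaway split described earlier.

```python
import csv
import io

# Hypothetical extract of a ZCTA-to-county correlation list.
# Column names are illustrative, not the data set's real variable names.
corr_csv = """zcta5,county,share
65201,29019,0.998
65201,29027,0.002
"""


def primary_county(rows):
    """Map each ZCTA to the county that holds the largest share of it."""
    best = {}
    for row in rows:
        zcta, county, share = row["zcta5"], row["county"], float(row["share"])
        if zcta not in best or share > best[zcta][1]:
            best[zcta] = (county, share)
    return {zcta: county for zcta, (county, _) in best.items()}


lookup = primary_county(csv.DictReader(io.StringIO(corr_csv)))

customers = [{"id": 1, "zip": "65201"}, {"id": 2, "zip": "99999"}]
for cust in customers:
    # None marks a ZIP with no matching ZCTA in the list.
    cust["county"] = lookup.get(cust["zip"])
print(customers)
```

Records left with no county are the ones that need a different source, such as a file of actual ZIP codes.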
The MCDC has created a directory on their public census data server which has a complete set of geographic header data as distributed with the Summary File 1 data from the 2000 census. These header records have information about the geographic entities summarized on the SF1 data files. There are a great many such entities, and they range from state and county level records all the way down to 2000 census blocks. There are over 9 million of the latter entities nationwide. There are two reasons that people interested in ZIP/ZCTA geography may be interested in this collection.
You can access this collection of geographic reference files here. There are two files per state, with the block level records stored separate from all the other geographic levels. You will probably find the xxgeos files more useful, since they contain summaries for ZCTAs. The block-level files are useful for relating ZCTA geography to other levels, but for this kind of analysis you will probably want to use the Geocorr 2000 application (see below), which makes use of a database that was built largely from these header files.
If you followed the link above to explore the xxgeos data directory, you can now click on the file degeos.sas7bdat. To extract data for the ZCTA-related data you have to specify that you just want rows where the geographic summary level code (SumLev) is either 871 (ZCTA within state) or 881 (ZCTA within county). This means that in section II. Choose rows you should select SumLev as the value of variable/column, In List as the value of operator and then type in the value list "871:881" in the text entry box in the value column. In III. Choose columns you should select the ID variables SumLev, State, County and ZCTA5 and all the numerics. If you chose HTML as one of your output formats, you will generate a report that looks like this:
|.....rows omitted here ....|
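If you download the whole file instead of filtering in Dexter, the same selection is easy to apply afterwards. A minimal sketch, assuming the extract has been read into a list of dictionaries keyed by the variable names mentioned above; the sample rows are made up.

```python
# The same value list typed into the Dexter form, split on the colon.
WANTED_SUMLEVS = set("871:881".split(":"))

# Made-up header rows; only SumLev matters for the filter.
rows = [
    {"SumLev": "040", "State": "10", "County": "", "ZCTA5": ""},
    {"SumLev": "050", "State": "10", "County": "001", "ZCTA5": ""},
    {"SumLev": "871", "State": "10", "County": "", "ZCTA5": "19701"},
    {"SumLev": "881", "State": "10", "County": "003", "ZCTA5": "19701"},
]

zcta_rows = [r for r in rows if r["SumLev"] in WANTED_SUMLEVS]
print([r["SumLev"] for r in zcta_rows])  # ['871', '881']
```

Summary level codes are compared as strings here so that codes with leading zeros, such as 040, survive intact.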
This is an excellent resource if you are looking for basic geographic information about ZCTAs (2000 vintage for now) and ZIP codes. The emphasis is on names associated with the codes (both "preferred" and alternate) and location (city, state, county). Latitude and longitude coordinates are also provided. Access the data at http://federalgovernmentzipcodes.us/, where you can download it in CSV, Excel, or MySQL format. The data were revised using USPS updates through 11-15-10.
The files have the following key features:
Are we repeating ourselves? Didn't we already deal with this above in the section on ZCTAs? No, in that section we talked about relating ZCTAs to counties, not ZIP codes. As already noted, there are important differences.
ZCTAs get old (they are frozen at the time of the latest census) and do not include proxies for non-residential ZIP codes such as PO-box-only ZIPs and "unique" ZIPs assigned to large companies or other organizations. These limitations can easily leave you unable to link 10% or so of your ZIP code file when using the tools described above for ZCTAs. We now have an alternative source that can help us get a more complete list. That source is the ZIP codes master file described in the previous section. It contains data for real ZIP codes, not ZCTAs, and it contains a field identifying the county in which the ZIP is (all or mostly) located. Unfortunately, it contains only the name and not the FIPS code for the county. So it's not the perfect solution. But it should be good enough for a lot of applications.
To use the master ZIP codes file on a one-at-a-time manual basis you can simply generate one of the directories we pointed to above and do a manual lookup of each ZIP. To automate the process you will need to generate a file containing the ZIP code and the county. The tool for doing that is once again Dexter. Access the zipcodes dataset in the georef data directory. The query as defined requests output in the form of a CSV file (no report file, no database file); no filtering (you get the entire country); and relevant variables selected (you can ask for more, or less by simply modifying the select lists in section III of the form). You can, if you want and you know how, modify the query any way you want. But all you have to do is click on one of the Extract Data buttons. Then when your output menu page is displayed, you'll need to click on the link(s) to your output file(s). The only hard part will be figuring out how you will use the resulting lookup table file to do the actual encoding.
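The encoding step itself can be quite short. The sketch below reads a ZIP-to-county lookup from CSV and attaches the county name to each record, normalizing ZIPs first (dropping any +4 suffix and restoring leading zeros lost when ZIPs were stored as numbers). The column names zipcode and county and the record layout are assumptions; adjust them to match the file you actually extract.

```python
import csv
import io

# Stand-in for the CSV extracted from the master ZIP codes file.
# Column names are assumed, not taken from the actual data set.
lookup_csv = io.StringIO("zipcode,county\n65201,Boone\n63301,St. Charles\n")


def normalize_zip(value):
    """Return a 5-character ZIP string: drop any +4 suffix, restore leading zeros."""
    return str(value).strip().split("-")[0].zfill(5)


def load_lookup(fileobj):
    """Build a ZIP -> county-name dictionary from the lookup CSV."""
    return {normalize_zip(row["zipcode"]): row["county"]
            for row in csv.DictReader(fileobj)}


def encode(records, lookup):
    """Add a 'county' key to each record; return the ZIPs that found no match."""
    unmatched = []
    for rec in records:
        zip5 = normalize_zip(rec["zip"])
        rec["county"] = lookup.get(zip5)
        if rec["county"] is None:
            unmatched.append(zip5)
    return unmatched


lookup = load_lookup(lookup_csv)
records = [{"zip": "65201-1234"}, {"zip": 63301}, {"zip": "00000"}]
unmatched = encode(records, lookup)
print(records)
print(unmatched)  # ['00000']
```

Reviewing the unmatched list is worthwhile: it shows how much of your file the lookup table fails to cover.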
The Geocorr 2014 web application is an updated version of the original application. Both applications do essentially the same thing, but the newer version uses census 2000 geography and later, while the earlier version used 1990 geography. Geocorr allows you to dynamically generate files and reports that show how various geographic layers are related to one another. For example, you can choose one or more states as your geographic universe of interest and then ask the program to show you how ZCTAs within those states relate to just about any other geographic layer you can think of.
To see how easy it can be, visit the Geocorr application. Choose your state from the first select list. Then select ZIP/ZCTA from the Source geographies select list (on the left), and Congressional District 111th from the Target geographies select list (on the right). In the Output Options section, enter a title for the report such as "ZIP to Congressional District Equivalencies Using Geocorr". Then click the Run request button.
For processing many states, you might have to break it down and do about 10 or 20 states at a time.
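If you script your bookkeeping for such multi-state runs, splitting the state list into fixed-size groups is straightforward. This sketch only produces the groups; it does not call Geocorr, and the state labels are placeholders rather than real codes.

```python
def batches(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Placeholder labels standing in for the 50 states plus DC.
states = [f"state-{n:02d}" for n in range(1, 52)]
for batch in batches(states, 20):
    print(len(batch), batch[0], batch[-1])
```

Each group can then be submitted as a separate request and the resulting files concatenated.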
We happened to stumble upon a curious resource on a Census Bureau web page dealing with definitions of metropolitan and micropolitan statistical areas. It was under a section titled "Geographic relationship files", with a downloadable file titled "2007 ZIP code to 2006 CBSA". The result looks something like this:
Perhaps not the best example, since this ZIP code is entirely within a single CBSA. But it does cross county boundaries, being on the border of St. Louis City (a county equivalent independent of St. Louis County). The thing that makes this data resource special is the use of sub-ZIP code geography at the two and four-digit ZIP-suffix levels. It varies by ZIP code. All of this information is stored in the MCDC public archive, where it can be accessed via the Dexter extraction utility. It can be accessed as data set zip07_cbsa06 in the corrlst data directory. Be sure to take advantage of the link to detailed metadata at the top of the Dexter query form.
The Census Bureau created detailed demographic summaries for ZCTAs (both complete and within county) as part of their 2000 Summary File 1 and Summary File 3 data products. These are very large collections of detailed tables that you might have occasion to use if you have a specific item of interest that requires you to go deeper than most users will want to go. There are, for example, over 16,000 cells of tabular data for every ZCTA on Summary File 3. You're probably going to want this boiled down to something more readily accessible.
Fortunately, you will probably never have to get involved directly with the summary files. Both the Census Bureau and the MCDC have created demographic profile products which take these thousands of data table cells and boil them down to a few hundred key data items, which are then presented in easy-to-read reports. You can view these data one ZCTA at a time in your browser, or you can access data files that have the boiled-down data available for all ZCTAs in formats that can be readily loaded into a spreadsheet or database access package (e.g. Excel or Access). Doing this is going to require that you become familiar with either the data.census.gov access tool or the Uexplore/Dexter application (to access the MCDC's data sets). Neither tool is terribly difficult to use, but you do have to invest a little time before you can access that first set of data.
There are actually two sets of demographic profiles at the ZIP/ZCTA level based on the 2000 Census from the Missouri Census Data Center. But the collection based on Summary File 1 (which means complete-count, short-form data) is very limited in demographic detail and will rarely be what you want. When you want the "good stuff" (data on income, poverty, educational attainment, etc.) you want Summary File 3 (which means sample data based on the long form), and this translates into what the MCDC calls the dp3_2k (Demographic Profile 3, 2K Census) product. You can access these products using various entry points. If all you want to do is go straight to the main menu for the ZCTA profiles, then you want the Census 2000 Profiles menu page (choose "Zip Codes (ZCTA)" as area type, then filter by state).
The profile is divided into 29 sections (topics). The header line for each of these sections is a hyperlink to the metadata for that topic. There are many hyperlinks on the report page. Among the things you will learn if you read the usage notes page carefully: