All About Public Use Microdata Areas (PUMAs)
2010 edition, revised 2014
Public use microdata areas (PUMAs) are familiar to researchers who use the Census Bureau's Public Use Microdata Sample (PUMS) files. They are the only sub-state geographic identifiers on the PUMS records. But now that these areas are being used to publish summary tables based on American Community Survey data, they have become much more widely used. Because they are required to have a minimum population of 100,000, all PUMAs exceed the 65,000 population threshold, thus ensuring that single-year ACS data will be published for them each year. Our purpose here is to describe these entities and refer you to resources that will help you to understand and use them.
PUMAs are redefined every ten years in conjunction with the decennial census. This document describes the PUMAs as they were defined for use with the 2010 census, replacing the old 2000 PUMAs for current data reporting. There have been some very notable changes. The differences are not just in the specific geographic boundaries, but in the guidelines used to create them as well. Two of the more important changes are:
- Incorporated city boundaries may no longer be used to define PUMAs; only contiguous counties and/or census tracts may be used.
- The local agencies that define the PUMAs were asked to assign mnemonic names to the PUMAs. Having such locally assigned descriptive names makes these entities much more valuable as summary units in reports for users who have not memorized the PUMA codes.
We first saw the new 2010 PUMAs in the spring and summer of 2012. (We sometimes refer to them as "2012 PUMAs", especially in connection with the American Community Survey, where data for these entities were first reported for vintage year 2012.) They were intended for use in the 2010 census Public Use Microdata Sample (PUMS) files, but that product was cancelled and the files were never created. They are now used in the ACS PUMS files, starting with vintage 2012, and in ACS summary data products, also starting with vintage 2012 (released in calendar year 2013).
Starting with vintage 2012, ACS PUMS products for multi-year periods carry two PUMA fields, with a pseudo-value of 00009 indicating "not available". In the 2010-2012 data files, a record based on a survey conducted in 2010 or 2011 has puma00 defined and puma10 set to 00009; a record from a 2012 survey has puma00 set to 00009 and a useful value in puma10. Because no single PUMA field is populated on every record, using these files for PUMA-level data analysis is problematic.
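As a minimal sketch of how the two-field scheme might be handled, the Python below picks whichever PUMA field is populated on a record. The field names puma00 and puma10 and the 00009 pseudo-value come from the text above; the dictionary layout, the function name usable_puma and the sample records are assumptions made purely for illustration and are not real PUMS data.

```python
NOT_AVAILABLE = "00009"  # pseudo-value meaning "not available", per the text


def usable_puma(record):
    """Return (vintage, code) for whichever PUMA field is populated.

    `record` is assumed to be a dict with string fields 'puma00' and
    'puma10'; this layout is hypothetical, not the official file format.
    """
    puma00 = record["puma00"].zfill(5)
    puma10 = record["puma10"].zfill(5)
    if puma10 != NOT_AVAILABLE:
        return ("2010", puma10)
    if puma00 != NOT_AVAILABLE:
        return ("2000", puma00)
    return (None, None)


# Illustrative records only.
records = [
    {"year": 2011, "puma00": "00100", "puma10": "00009"},
    {"year": 2012, "puma00": "00009", "puma10": "00200"},
]

for rec in records:
    print(rec["year"], usable_puma(rec))
```

Tagging each record with its vintage keeps 2000-based and 2010-based codes from being mixed by accident, since the same 5-digit code can denote different areas in the two definitions.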
The Census Bureau has created a very comprehensive set of documents on its Public Use Microdata Areas (PUMAs) page. This collection of pages summarizes the criteria that were used in defining these entities and contains links to pages with more detail. A reference information section provides general links to various related documents and sites.
Some Basic Facts About PUMAs
- PUMAs are redefined every ten years for use in the decennial census. There is a cooperative program between the Census Bureau and the states that allows local input to suggest boundaries for them. (They are similar to census tracts in this regard.) Like census tracts, many PUMAs retain their definitions across decades.
- Prior to 2010 there were actually two kinds of PUMAs:
  - 1% or "Super" PUMAs were defined for each state and had to have a population of at least 400,000. These are NOT what we are describing here, although they are related.
  - 5% PUMAs (or, simply PUMAs) were smaller than and nested within the Super-PUMAs of a state. They were required (by the Census Bureau) to have a population of at least 100,000. The term PUMA, as used in this document, refers to these "5% PUMAs".
- Super PUMAs are being done away with starting with the 2010 definitions.
- PUMAs may not cross state boundaries. Where population constraints permit, they should not cross metropolitan area boundaries. Each state is completely covered by PUMAs (i.e., there is no territory that is not assigned to a PUMA).
- PUMAs are assigned 5-digit codes that are unique within state. Typically, larger (measured in land area) and more rural PUMAs have codes that end with "00". PUMAs that represent portions of a large county will have the same first 3 digits with the last 2 digits being assigned as "01", "02", etc.
- Prior to 2010, PUMAs were not assigned formal names. Starting with 2010, the local agencies that participated in the PUMA boundary suggestion program were asked to provide suggested names for these areas. So PUMAs now have names. In Missouri (only) we created an informal table that assigned a short descriptive name to each of the 2000 Missouri PUMAs. These labels have been quite useful for reports that display data at the PUMA level. (See table.)
- Prior to 2010, PUMAs could be defined in terms of counties, census tracts, and/or places. Starting with 2010, they can only comprise contiguous counties or census tracts. Large urban counties are typically subdivided into multiple PUMAs with boundaries based on census tracts. In less populated rural areas, PUMAs are typically made up of smaller (population-wise) contiguous counties.
- The 3-digit geographic summary level code for a PUMA is 795. Why would you care or ever need to know this? If you were using Dexter to access one of our mixed-geography data sets such as the one at acs2012.usmcdcprofiles and only wanted to keep those observations/rows that summarized PUMAs you would need to know this code in order to specify the filter "SumLev Equals 795".
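Two of the facts in the list above, the 795 summary level and the within-state uniqueness of PUMA codes, can be sketched in Python. This is only an illustration: the SumLev name and the value 795 come from the text, while the other column names (State, Puma, AreaName), the helper names and the sample rows are hypothetical and do not reproduce the layout of any MCDC data set.

```python
import csv
import io

PUMA_SUMLEV = "795"  # geographic summary level code for PUMAs, per the text

# Illustrative rows; column names other than SumLev are hypothetical.
sample = io.StringIO(
    "SumLev,State,Puma,AreaName\n"
    "040,29,,Example state row\n"
    "050,29,,Example county row\n"
    "795,29,00100,Example PUMA row\n"
    "795,08,00100,Another example PUMA row\n"
)


def puma_rows(rows):
    """Keep only rows whose summary level is 795 (PUMA)."""
    return [r for r in rows if r["SumLev"] == PUMA_SUMLEV]


def puma_key(row):
    """PUMA codes are unique only within a state, so pair them with the state code."""
    return row["State"].zfill(2) + row["Puma"].zfill(5)


pumas = puma_rows(csv.DictReader(sample))
keys = [puma_key(r) for r in pumas]
print(keys)  # ['2900100', '0800100']
```

The two sample PUMA rows share the code 00100 but belong to different states, which is why the state code is part of the key.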
PUMA Maps and Geographic Equivalencies
The Census Bureau publishes PUMA reference material here. Included on this page is a link to a set of individual PUMA maps organized by state. These can be useful when you want to zero in on a single PUMA or set of PUMAs. But normally we prefer a map product that can show us an entire state's worth of PUMAs or a metro region overview. For that type of map, see our reference below to the Proximity One PUMA maps products.
PUMA codes, both the old (2000 vintage) and the new (2010 vintage, aka "2012" PUMAs), are included in the MCDC's MABLE database, which means they can be used within the Geocorr 2014 web application. See the more detailed discussion below.
Using Geocorr, we generated a 2000-PUMA to 2012-PUMA equivalency file in both CSV (comma-separated) and SAS data set formats. The CSV file is puma2k_puma2010.csv in our corrlst ("correlation lists") data directory. We used the 2010 pop as the weight variable, so this file tries to measure the overlaps as of April 1, 2010. The variable afact indicates the portion of the 2000 PUMA's 2010 pop living in the 2010 PUMA; the variable afact2 goes the other way, showing what portion of the 2010 PUMA's pop also resides (resided) in the 2000 PUMA. The SAS data set version is accessible using our Dexter utility (access corrlst.puma2k_puma2010, which lets you easily create state-based subsets). To help you see what these data look like, we generated a nicely formatted listing of the Missouri subset of this data set.
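One common use of such an equivalency file is to re-express a count reported for 2000 PUMAs in terms of 2010 PUMAs by weighting with afact. The Python below is a rough sketch of that idea, not a description of how the MCDC processes its data. Only the meaning of afact is taken from the text; the tuple layout, the function name reallocate and every number are made up for illustration, and a real run would also need the state code because PUMA codes repeat across states.

```python
from collections import defaultdict

# (2000 PUMA, 2010 PUMA, afact): illustrative values, not from the real file.
crosswalk = [
    ("00100", "00101", 0.75),
    ("00100", "00102", 0.25),
    ("00200", "00102", 1.00),
]

# A count reported for each 2000 PUMA (made-up numbers).
count_by_puma2k = {"00100": 1000.0, "00200": 400.0}


def reallocate(crosswalk, counts):
    """Apportion each source count to target areas in proportion to afact."""
    out = defaultdict(float)
    for source, target, afact in crosswalk:
        out[target] += counts.get(source, 0.0) * afact
    return dict(out)


estimates = reallocate(crosswalk, count_by_puma2k)
print(estimates)  # {'00101': 750.0, '00102': 650.0}
```

This kind of apportioning assumes the item being counted is spread within each 2000 PUMA the same way the 2010 population is, so the results are estimates rather than exact retabulations.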
Using Maps to See Which PUMAs Are Where
By far the best resource we have found for letting you actually see where the PUMAs are located within a state is at this ProximityOne puma2010 web page. In addition to some excellent general background info and access to the latest 1-year (2012 currently) ACS data profiles at the 2010 PUMA level, you can find a box on the right labeled "PUMA Map Views by State". Scroll down and find your state and click on it. You'll get an excellent overview map of the state, clearly displaying the PUMA boundaries on a base map that shows county boundaries and major towns. Urban areas get their own separate inset maps. It's a .com site but access to these maps is free.
PUMA Master Dataset and Web Application
We have created a special puma_master dataset in our public data archive. Each observation (record/row) in the dataset describes a single 2010 PUMA. It provides information regarding the PUMA's location by indicating intersections with other, more familiar geographies: for example, which county or counties, place(s), metro area, Congressional District(s), etc. it intersects. It is as if somebody made a bunch of Geocorr runs (see below for a description of the Geocorr application) and merged them all together in this single resource. The dataset also contains a set of key indicator variables from a recent set of ACS summary data.
Using Geocorr to Relate PUMAs to Other Geographic Codes
The Geocorr apps create reports and/or comma-delimited files showing how different geographic layers correspond to one another. A good example relevant to the current topic involves using the application to generate a report showing how PUMAs relate to counties in the state of Colorado. To do this, invoke the application and fill out the form as follows:
- Choose Colorado as the state to process.
- From the Select one or more source geographies: list, choose PUMA.
- From the Select one or more target geographies: list, choose County.
- Skip down to the Output Options section and check the box labeled Generate 2nd allocation factor (AFACT2): portion of target geocodes in source geocodes. This means that our report will not only show us what portion of the PUMA population resided in the county in 2010, but also what portion of the county population resided within the PUMA.
- Ignore the rest of the options. Click any Run request button to invoke the Geocorr program.
In your browser a page will be generated summarizing the results and providing hyperlinks to the two output files. If you click on the listing (report format) link you should see a report, the first few lines of which should look like this:
These are just the lines of the report dealing with the first five values of the source geocode, which is called puma12 in Geocorr. Each line of the report represents the intersection of the PUMA area with a target geocode — a county. The first line of the report tells us that the intersection of PUMA 00100 (Northeast Colorado...) with Bent County had 6,499 persons living in it, according to the 2010 census. The first "alloc factor" column has a value of 0.059, which is telling us what portion of the PUMA's total population is represented by this intersection. So just under 6% of this PUMA's population is in Bent County. The last column, county to puma12 alloc factor, is the allocation factor going the other way. The value of 1.000 tells us that the entire county of Bent is (was) contained in this PUMA. In fact, as you scan the lines of the report, you can see there are a dozen counties listed that have a value of 1.000 in this column, indicating that they fall entirely within the PUMA. Two other counties (Elbert and Weld) are just partially contained in the PUMA.
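The two allocation factors described above are simple ratios: the intersection's population divided by the source area's total, and the same population divided by the target area's total. The Python sketch below illustrates that arithmetic under stated assumptions. It is not Geocorr's actual code, the function name allocation_factors is hypothetical, and the area labels and populations are invented rather than taken from the Colorado report.

```python
from collections import defaultdict

# (source PUMA, target county, population of the intersection): made-up numbers.
intersections = [
    ("P1", "County A", 6000),
    ("P1", "County B", 4000),
    ("P2", "County B", 12000),
]


def allocation_factors(intersections):
    """Return rows of (source, target, pop, afact, afact2).

    afact  = share of the source area's population in the intersection
    afact2 = share of the target area's population in the intersection
    """
    src_total = defaultdict(int)
    tgt_total = defaultdict(int)
    for src, tgt, pop in intersections:
        src_total[src] += pop
        tgt_total[tgt] += pop
    return [
        (src, tgt, pop,
         round(pop / src_total[src], 3),
         round(pop / tgt_total[tgt], 3))
        for src, tgt, pop in intersections
    ]


for row in allocation_factors(intersections):
    print(row)
```

In this toy data "County A" gets an afact2 of 1.0 because all of its population lies in P1, which is the same pattern as the counties that fall entirely within one PUMA in the report.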
This example shows us how to relate the PUMA areas to counties. We can just as easily (by changing our selection in the target geocodes select list) get comparable reports for other geographic levels such as places (cities), congressional districts, CBSAs (metropolitan and micropolitan statistical areas), urbanized areas/urban clusters, etc.
Summary Data at the PUMA Level
The Census Bureau did not publish any summary data for these units based on the 2010 census. You will not be able to find a PUMA summary level on SF1 or using the American FactFinder web site. However, it would be possible to aggregate data at the census tract level to create such summaries. You could use Geocorr to generate the required tract-to-PUMA equivalency file.
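Because 2010 PUMAs are built from whole counties or census tracts, a tract-to-PUMA equivalency file lets tract counts be summed directly into PUMA totals. The Python below is a minimal sketch of that aggregation. The identifiers, the function name aggregate_to_puma and the populations are hypothetical placeholders, not real census values, and a real equivalency file would carry full state, county and tract codes.

```python
from collections import defaultdict

# Tract -> PUMA equivalency (hypothetical identifiers).
tract_to_puma = {"T1": "00101", "T2": "00101", "T3": "00102"}

# A tract-level count, e.g. total population (made-up numbers).
tract_pop = {"T1": 60000, "T2": 55000, "T3": 120000}


def aggregate_to_puma(tract_values, tract_to_puma):
    """Sum tract-level counts into the PUMA that contains each tract."""
    totals = defaultdict(int)
    for tract, value in tract_values.items():
        totals[tract_to_puma[tract]] += value
    return dict(totals)


puma_pop = aggregate_to_puma(tract_pop, tract_to_puma)
print(puma_pop)  # {'00101': 115000, '00102': 120000}
```

Simple summation is appropriate for counts; medians, rates and other derived measures would need to be recomputed from their component counts rather than added.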
Data from the American Community Survey are available at the PUMA level. Data published for vintage years 2011 and earlier use the old 2000 PUMAs as the units. Starting with vintage 2012, they use the new 2010/2012 PUMAs. Such data can be accessed via American FactFinder or from the MCDC web site. Both our ACS Profiles and ACS Profile extract app (accessible via our Quick Links box) can be used to access PUMA-level data.
PUMA data can also be extracted using Uexplore/Dexter from our various acs2006 through acs2012 data directories. The usmcdcprofiles and usmcdcprofiles3yr data sets contain summaries at the PUMA level; just filter using the SumLev = 795 filter spec.
Detailed tables in our basetbls and btabs5yr subdirectories of the more recent acs[yyyy] subdirectories also contain data at the PUMA level.