Metadata for dataset /pub/data/georef/zcta_master

Rev. 8/19/2013, jgb

This metadata pertains to the zcta_master dataset within the /pub/data/georef data directory of the MCDC data archive.

This dataset was created as a concatenation of the 51 state datasets stored in the zctamstr subdirectory of this (/pub/data/georef) data directory. It completely replaces the previous version, created in early September 2012. Unless indicated otherwise, the geographic codes and area names in this dataset reflect the latest geographic definitions. Four primary sources of data were used to generate the dataset:

  1. The geographic header data from the block-level summaries of Summary File 1, 2010 census.
  2. Data taken from the Census Bureau's TIGER/Line "faces" datasets, with post-2010 geographies for 2010 census blocks.
  3. A file provided by John Snow, Inc. (downloaded and processed by us circa 2010) showing the relationships between true ZIP codes and Census Bureau ZCTA codes. This is the source of the altzips variable, which shows the USPS ZIP codes associated with each ZCTA.
  4. Data from the latest (or at least a recent) 5-year period estimates for ZCTAs from the Census Bureau's American Community Survey. (As of August 2014 we were using the 2008-2012 vintage.) The ACSYears variable on the dataset indicates the vintage.

Basic Geographic Codes

FIPS state code. This is the primary state code. Of the 32,653 ZCTAs defined for the 2010 census, 101 (about 0.3%, i.e. 3/10ths of 1%) are in more than one state. About half of these (55) have more than 5% of their population in the non-primary state. The psf ("primary state flag") variable is set to 1 to indicate that you are looking at the record for the ZCTA's primary state; that flag is set to 1 for ALL observations on this dataset. Note that all the geocodes in an observation (counties, places, CDs, etc.) are within this primary state. See also stab, the state postal abbreviation variable.

ZIP Code Tabulation Area (5-digit). We sometimes like to think of this as the "same thing" as a ZIP code, but it really is not. It is close enough for many (most?) applications, though.

A name assigned to the ZIP code associated with the ZCTA. This name comes from the John Snow directory file, where it is the "city name" suggested by the USPS: the name that can be used on the last line of an address to get mail delivered to the ZIP. However, we think that name is not always the best. For example, ZIP code 63119 comes with a city name of "Saint Louis", which is close, but this ZIP code is actually just west of the city of St. Louis and lies almost entirely within the suburban city of Webster Groves. Since we know which place (city) the ZIP (ZCTA) intersects with, we are able to "improve" on the original name by assigning one based on the city (if any) with which it (all or mostly) intersects. So the value of this variable for the 63119 observation is "Webster Groves, MO". If the ZCTA lies all or mostly within an unincorporated area (with no CDP assigned), the original post-office city name is retained.
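The naming rule above can be sketched in a few lines of Python. This is a minimal illustration, not MCDC's actual code: the parameter names (po_name, placefp, placename, stab) are hypothetical, "all or mostly" is simplified to "the place with the largest intersection", and we assume the state abbreviation is appended in both cases.

```python
# Hypothetical sketch of the ZCTA naming rule. Parameter names are
# illustrative, not the dataset's own variable names.
UNINCORPORATED = "99999"  # placefp value: unincorporated area, no CDP assigned


def zcta_area_name(po_name: str, placefp: str, placename: str, stab: str) -> str:
    """Choose a display name for a ZCTA.

    If the ZCTA's largest place intersection is an actual place (incorporated
    city or CDP), name the ZCTA after that place; otherwise fall back to the
    USPS post-office city name. Assumes the state abbreviation is appended
    in both cases.
    """
    if placefp and placefp != UNINCORPORATED and placename:
        return f"{placename}, {stab}"
    return f"{po_name}, {stab}"


# "12345" is a made-up place code standing in for Webster Groves.
print(zcta_area_name("Saint Louis", "12345", "Webster Groves", "MO"))  # Webster Groves, MO
print(zcta_area_name("Saint Louis", "99999", "", "MO"))                # Saint Louis, MO
```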

State postal abbreviation. For example, "MO". Values are in upper case. Corresponds to the State variable.

Geographic Summary Level. It will always be 860 to remind users that we are dealing with complete ZCTA summaries (as opposed to, for example, 871 summaries, which are state portions of ZCTAs).

The county in which the ZCTA is all or mostly located. Over 90% of ZCTAs fall entirely within a single county.

The "secondary" county for the ZCTA, i.e. the county (within the primary state) with which it has the second-largest intersection. Over 90% of the time this value will be blank. This and all other secondary geocodes in the observation are based on examining only the geography of the primary state. For example, ZCTA 99128 is 86.6% in Washington state and 13.7% in Idaho, but the observation shows that the primary county is Whitman, WA, with PctCounty equal to 100, and the secondary county (County2) is blank.

The place (city) with which the ZCTA has the largest intersection. This can be an incorporated municipality or a CDP (Census Designated Place). It can also have a value of 99999 to indicate an unincorporated area with no CDP assigned. If a ZCTA is 70% unincorporated and 30% within a city, the value of placefp will be 99999 and the city code will appear as the value of placefp2. ("FP" in the name is short for "FIPS".) These codes are unique within state.
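Picking the primary and secondary codes amounts to ranking a ZCTA's pieces by 2010 population. The sketch below is one plausible way to do that, assuming the input is a list of (placefp, pop2010) pairs, for example one per census block; it is an illustration, not the procedure MCDC actually ran.

```python
from typing import Dict, Iterable, List, Tuple


def primary_and_secondary(pieces: Iterable[Tuple[str, int]]) -> Tuple[str, str]:
    """Return the codes with the largest and second-largest intersection.

    `pieces` holds (code, pop2010) pairs, e.g. one per census block in the
    ZCTA. Populations are summed per code first, then ranked. Python's sort is
    stable (also with reverse=True), so ties keep the order in which codes
    were first seen. The secondary code is "" when only one code is present.
    """
    totals: Dict[str, int] = {}
    for code, pop in pieces:
        totals[code] = totals.get(code, 0) + pop
    ranked: List[Tuple[str, int]] = sorted(
        totals.items(), key=lambda item: item[1], reverse=True
    )
    primary = ranked[0][0] if ranked else ""
    secondary = ranked[1][0] if len(ranked) > 1 else ""
    return primary, secondary


# The 70% unincorporated / 30% city example; "12345" is a made-up city code.
print(primary_and_secondary([("99999", 700), ("12345", 300)]))  # ('99999', '12345')
```

The same ranking applies to the other primary/secondary pairs (county, county subdivision, CD), with the candidate pieces restricted to the primary state.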

The place/city with which the ZCTA has the second greatest intersection. See placefp, above. A value of 99999 indicates unincorporated area.

The county subdivision (township, town, minor civil division, census county division, etc.; what these entities are called varies by state) with which the ZCTA has the largest intersection. This is a FIPS code, not a name. The code is unique within state.

The county subdivision with which the ZCTA has the second largest intersection. Frequently blank.

The Congressional District for the 113th Congress, as elected in 2012 and effective January 1, 2013. This has been updated from the previous version, which used the 111th Congress district codes in effect at the time of the 2010 census.

The secondary CD code for those ZCTAs that cross CD boundaries. Usually blank. Also a 113th Congress code.

The Public Use Microdata Area as defined for use in the 2000 census PUMS datasets. This code was also used to publish American Community Survey data through the 2011 vintage. Even though new PUMA codes now exist for "2010", the Bureau had not yet (as of August 2013; this was to change with the 2012 vintage data) started using them in its ACS products, nor had it released any 2010 PUMS files using the new codes. [As of August 2013 it appears that the 2010 PUMS product has been canceled due to funding shortages.] So these "old" PUMA codes were still the most useful as of September 2012.

Public Use Microdata Areas for 2010. We use "12" instead of "10" because these new codes were not defined until 2012. See the note for puma2k, above. These codes will start appearing in the ACS products (including the ACS PUMS files) in September 2013.

New England City and Town Area (NECTA). Only defined for the six New England states; blank everywhere else. In New England a value of 99999 is used to indicate an area that is not within any NECTA.

New England City and Town Area division. Like NECTA, it is only defined for the six New England states.

Combined New England City and Town Area. Like NECTA, it is only defined for the six New England states.

Core-Based Statistical Area. The new metro areas (since 2002 or so, replacing the old MSA/CMSA/PMSA system). There are two kinds of CBSAs: Metropolitan Statistical Areas and Micropolitan Statistical Areas. The former are based upon a core area of at least 50,000 population (latest census or official estimate), while the latter (Micropolitan areas) have a core area of at least 10,000 (and less than 50,000). The codes originally used here were those in effect at the time of the 2010 census. CBSA definitions can be, and are, updated throughout the decade; the most recent changes were published in February 2013, and the codes in this file reflect those 2014 vintage definitions. See for the latest updates and explanations. Note that CBSAs are county-based: a county will never cross a CBSA boundary (but a ZCTA can, since ZCTAs can cross county boundaries).
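The population thresholds above can be expressed as a small classifier that returns the same values as the CBSA type variable ("Metro", "Micro", or blank). This is only a sketch of the size rule: actual CBSA delineation also involves whole counties and commuting ties, which the code ignores, and the core_pop input is a hypothetical parameter.

```python
from typing import Optional


def cbsa_type(core_pop: Optional[int]) -> str:
    """Classify an area by the population of its core, per the thresholds above.

    Returns "Metro" for a core of 50,000 or more, "Micro" for 10,000 to
    49,999, and "" (blank: not a CBSA) for smaller or unknown cores.
    """
    if core_pop is None or core_pop < 10_000:
        return ""
    if core_pop >= 50_000:
        return "Metro"
    return "Micro"


for pop in (75_000, 20_000, 5_000):
    print(pop, repr(cbsa_type(pop)))
```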

Metropolitan Divisions are subsets of CBSAs. Most CBSAs do not have subdivisions. See CBSA for vintage info.

Combined Statistical Area. Sometimes adjacent CBSAs such as Baltimore and Washington, DC are grouped into these larger units.

The CBSA type variable is blank to indicate not within a CBSA or will have a value of "Metro" or "Micro" to indicate that the CBSA is either a Metropolitan or Micropolitan statistical area.

The formal definition of urban (vs. rural; every area is one or the other) is updated each decade using the detailed population data gathered in the census. Because there is a lag in publishing the new definitions, the originally published summary files have no data in the tables meant to report the urban/rural breakdown. We therefore had to access special TIGER system files that allowed us to classify each 2010 census block as either urban or rural for 2010. We then aggregated the block-level populations by urban/rural classification to derive this key measure.
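The block-level aggregation described above reduces to a weighted share. A minimal Python sketch, assuming each block arrives as a (pop2010, ur_flag) pair with a hypothetical "U"/"R" coding and that results are rounded to one decimal like the other percentages on this file:

```python
from typing import Iterable, Optional, Tuple


def pct_urban(blocks: Iterable[Tuple[int, str]]) -> Optional[float]:
    """Percent of a ZCTA's 2010 population living in urban blocks.

    `blocks` holds (pop2010, ur_flag) pairs, one per census block, where
    ur_flag is "U" (urban) or "R" (rural); this coding is an assumption.
    Returns None for a ZCTA with no population, where the share is undefined.
    """
    urban = total = 0
    for pop, ur_flag in blocks:
        total += pop
        if ur_flag == "U":
            urban += pop
    if total == 0:
        return None
    return round(100.0 * urban / total, 1)


print(pct_urban([(300, "U"), (100, "R")]))  # 75.0
```

Returning None rather than 0 for unpopulated ZCTAs keeps "no people" distinct from "entirely rural".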

The Urban Area code as defined for 2010. This 5-digit code can identify either an Urbanized Area or an Urban Cluster. These are similar kinds of areas, split into two types based on the population of the core cluster (for 2010, Urbanized Areas have 50,000 or more people and Urban Clusters have at least 2,500 but fewer than 50,000). This code identifies the urban area with which the ZCTA has the largest intersection. There is no secondary UA variable, but the pctUA variable indicates what portion of the ZCTA is within the UA. The code is blank for an area that is entirely rural. If any portion of the ZCTA is within a UA, that UA is reported as the value of UA (and UAName), and pctUA reflects the portion of the ZCTA's population living within it.

Contains a blank to indicate a non-urban area. Otherwise the value is "UA" or "UC", the type of the area identified by the UA variable.

FIPS county code. Has the same value as County, but we do not associate a format code with it, so in a Dexter extract it will appear as a 5-character code rather than as the name of the county.

Same idea here for the secondary county code: fipco2 appears in output files as the code associated with county2, while county2 itself appears as a county name.

This is the FIPS county code (3-digit) that was in effect in 2000.
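Assuming the 5-character county code is the standard 2-digit state FIPS code followed by the 3-digit county code, it can be rebuilt from its parts as sketched below. Zero-padding matters because codes read as numbers lose their leading zeros. The function is a hypothetical helper, not part of Dexter.

```python
from typing import Union


def fips_county(state: Union[str, int], county: Union[str, int]) -> str:
    """Combine a 2-digit state FIPS code and a 3-digit county FIPS code.

    Integer inputs lose their leading zeros, so both parts are zero-padded
    before being concatenated into the 5-character code.
    """
    st = str(state).zfill(2)
    co = str(county).zfill(3)
    if len(st) != 2 or len(co) != 3 or not (st + co).isdigit():
        raise ValueError(f"not a valid state/county FIPS pair: {state!r}, {county!r}")
    return st + co


print(fips_county(29, 189))   # Missouri (29), St. Louis County (189) -> 29189
print(fips_county("6", "1"))  # -> 06001
```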

Census Division

Census Region


In this section the variables are all percentages that measure the degree of intersection (using the 2010 census total population counts as the weight variable) of the State-ZCTA with various geographic codes. The variable name is always of the form pct<geographic-variable>, where <geographic-variable> is the name of the geographic variable. So, for example, the variable pctcnty tells us what percentage of the State-ZCTA's 2010 population (census count) also lived within the county identified in variable cnty. Values are true percentages, not decimal fractions: "95.0", not ".950". To estimate the 2010 population living in an intersection, multiply the percentage by Pop10ZCTAstate and divide by 100. For example: popcnty2=pctcnty2*Pop10ZCTAstate/100.
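The formula above can be applied to every pct variable in a record at once. A minimal sketch, assuming an observation is available as a Python dict keyed by variable name and that the total is stored under Pop10ZCTAstate (made a parameter, since other files may name it differently):

```python
from typing import Dict, Mapping, Optional


def intersection_pops(
    record: Mapping[str, Optional[float]], total_var: str = "Pop10ZCTAstate"
) -> Dict[str, int]:
    """Estimate the 2010 population in each intersection from pct<geo> fields.

    Applies pop<geo> = pct<geo> * total / 100 to every field whose name starts
    with "pct" (case-insensitive) and rounds to whole persons. Blank (None)
    percentages are skipped.
    """
    total = record[total_var]
    pops: Dict[str, int] = {}
    for name, pct in record.items():
        if name.lower().startswith("pct") and pct is not None:
            pops["pop" + name[3:]] = round(pct * total / 100)
    return pops


row = {"Pop10ZCTAstate": 2000, "pctcnty": 95.0, "pctcnty2": 5.0}
print(intersection_pops(row))  # {'popcnty': 1900, 'popcnty2': 100}
```

Because the stored percentages are themselves rounded, results for large ZCTAs are approximations, off by up to 0.05% of the total population.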

Name Variables

Each of the variables in this section contains the name of an area whose code appears in the Geocodes section, above. The variable name is simply a concatenation of the geographic code variable name (with the "fp" suffix dropped, if applicable) and "name". So, for example, placename is the variable containing the name of the area whose code is stored in the placefp variable.
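The naming convention above is mechanical enough to express as a function. In this sketch only placefp to placename is confirmed by the text; the other outputs merely illustrate the stated rule and may not match the dataset's actual variable names.

```python
def name_variable(geo_var: str) -> str:
    """Derive the name-variable name from a geocode variable name.

    Drops a trailing "fp" (case-insensitive) and appends "name",
    following the convention described above.
    """
    base = geo_var[:-2] if geo_var.lower().endswith("fp") else geo_var
    return base + "name"


for var in ("placefp", "cbsa", "necta"):
    print(var, "->", name_variable(var))
```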

Name of the primary place (city) associated with the ZCTA.

Name of the CBSA (metropolitan or micropolitan area) associated with the ZCTA.

Name of the metropolitan division.

Name of the combined statistical area.

Name of the New England City and Town area.

Name of the Urbanized Area/Urban Cluster.

Name of the "2012" PUMA. Names were assigned to PUMAs for the first time following the 2010 census. (Hence there is no name variable for the puma2k variable.)

ZCTA Data from the American Community Survey

When working with ZIP-based files (typically address files of customers, patients, survey respondents, etc.; we'll use the generic term "constituents" for entities associated with the ZIP codes), it can be very helpful to characterize the constituents by basic demographic or economic indicators. Are we dealing with persons living in areas that are very poor or very wealthy, where there are very many or very few Hispanics, with a very high or very low median age, with a large group quarters population, etc.? This kind of information is available from the Census Bureau's American Community Survey. In this section we report variables taken from a recent release of ACS key indicators.

There are a small number of ZCTAs for which we found no matching data in the ACS data file.

This is a constant that identifies the time period of the ACS data. The value was "2006-2010" in the initial release; as of August 2014 it had a value of "2008-2012".

An identifier string intended to be used as a key to link data to ESRI shape files (i.e. the mapping files used in ArcInfo/ArcGIS software). Same value as the ZCTA5 variable.

Internal point latitude coordinate. From the ACS data.

Internal point longitude coordinate. Will be negative to indicate west longitude. From the ACS data.

Land Area Sq. Miles (from the ACS data)

Total Area Sq. Miles (from the ACS data)

2010 Census Total Pop count

Total Population 2000 census. Estimated.

Total Pop ACS Period Estimate. For example, when acsyears="2008-2012" this is the average of the population estimates for the five years 2008, 2009, ..., 2012. All the other ACS-based indicators are likewise period estimates.

Median age in years

% Under 18 years of Age

% 65 years and over

% White alone

% Black or African American

% Asian

% Hispanic or Latino of any race

Total households

Median household income

Family Households

Median family income

Persons for whom poverty status is determined

% Persons Below Poverty

% Living in Group Quarters

% In College or graduate school

% Bachelor's degree or higher

% Foreign born

Total housing units

Occupied housing units

% Renter-occupied units

Median Home Value

Median Rent

Miscellaneous Variables

This is the ZCTA's 2010 total population across all states. Only of interest in those rare cases where the ZCTA crosses state boundaries.

Almost always a value of 1 to indicate that the ZCTA intersects with only 1 state. A value of 2 says it intersects with two states.

Primary State Flag: a value of 1 says this is the primary state associated with the ZCTA, which in 99+% of cases is the only state. A value of 0 indicates a "sliver" case where a small portion of the ZCTA crosses a state line. (On this dataset the flag is always 1.)

Pop2010 as % of ZipTotPop. Usually has a value of 100.

Special ZIP codes, such as those used for large corporations, universities, military bases, P.O. Box-only ZIP codes, etc. can be linked to a residential ZIP code based on their physical location. We used the ZIP-to-ZCTA equivalency file from John Snow, Inc. to create this special field with a list of "special" ZIP codes that may be physically located within the ZIP code corresponding to ZCTA5.

This is the count of alternate ZIP codes as stored in the altzips variable.
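If you need the individual codes from altzips, for example to match special ZIP codes in a constituent file back to a ZCTA, a small parser will do. The delimiter used inside altzips is not documented here, so this sketch assumes the codes are separated by spaces and/or commas; the length of the returned list should then agree with the count variable described above.

```python
import re
from typing import List, Optional

# Exactly five digits, bounded by non-word characters (space, comma, ends).
ZIP5 = re.compile(r"\b\d{5}\b")


def parse_altzips(altzips: Optional[str]) -> List[str]:
    """Split an altzips value into individual 5-digit ZIP codes.

    Assumes the codes are separated by spaces and/or commas. A blank or
    missing value yields an empty list.
    """
    if not altzips:
        return []
    return ZIP5.findall(altzips)


zips = parse_altzips("12345 12346,12347")  # made-up codes
print(zips, len(zips))  # ['12345', '12346', '12347'] 3
```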

Missouri Census Data Center

MCDC file: /pub/data/georef/zcta_master.Metadata.html
Metahtml run date: 10SEP12