Metadata for dataset /pub/data/georef/zcta_master

Rev. 8/19/2013, jgb

This metadata pertains to the zcta_master dataset within the /pub/data/georef data directory of the MCDC data archive.

This dataset was created as a concatenation of the 51 state datasets stored in the zctamstr subdirectory of this (/pub/data/georef) data directory. It completely replaces the previous version done in early September, 2012. The geographic codes and area names identified in this dataset reflect (unless indicated otherwise) the latest geographic codes. There are four primary sources of data used to generate the dataset:

  1. The geographic headers data on the block level summaries from Summary File 1, 2010 census.
  2. Data taken from the Census Bureau's TIGER/Line "faces" datasets, with post-2010 geographies for 2010 census blocks.
  3. A file provided by John Snow, Inc. (downloaded and processed by us circa 2010) showing relationships of true ZIP codes and Census Bureau ZCTA codes. This is the source of the altzips variable showing which USPS ZIP codes are associated with the ZCTA.
  4. Data from the latest 5-year period estimates for ZCTAs from the Census Bureau's American Community Survey data. For the initial version in 2012 we used a dataset at the ZCTA level that we (Missouri Census Data Center) generated by allocating/aggregating census tract level data to ZCTAs. But we have now (Aug. 2013) replaced those data with ACS-based data from the first release of official ZCTA level data for the period 2007-2011. Note that this update includes the internal point latitude and longitude fields as well as the landsqmi and areasqmi variables.

Note that sources 1 and 2 are also the basis of the mable12 datasets, which can be found in this data archive in the /pub/data/mable12 directory, and which are used as the source for the MCDC's MABLE/Geocorr12 web application.

Basic Geographic Codes

FIPS state code. This code combined with the ZIP code (ZCTA5) forms the key for the dataset. In a few rare cases a ZCTA crosses state lines. In each of those cases there is a primary state where more than 95% of the ZCTA's population lives, and then a "sliver" of the ZCTA that crosses into another state. The psf ("primary state flag") variable is set to 1 to indicate that you are looking at the record for the ZCTA's primary state. A value of 0 for psf means you are looking at the "sliver" portion in the bordering state. All the geocodes in the observation (counties, places, CDs, etc.) are within this state. See also stab, the state postal abbreviation variable.
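As a minimal sketch of how the psf flag can be used, the following Python keeps only the primary-state record for each ZCTA so that each ZCTA appears once. The sample records are invented for illustration, and the lowercase field names state, zcta5, and psf are assumed to match the dataset's variable names.

```python
# Hypothetical State-ZCTA records; the codes and values are made up for illustration.
records = [
    {"state": "29", "zcta5": "63119", "psf": 1},
    {"state": "29", "zcta5": "64999", "psf": 1},  # primary-state portion
    {"state": "20", "zcta5": "64999", "psf": 0},  # "sliver" in the bordering state
]


def primary_records(rows):
    """Return only the rows flagged as the primary state for their ZCTA."""
    return [row for row in rows if row["psf"] == 1]


primary = primary_records(records)
zctas = [row["zcta5"] for row in primary]
```

Dropping the psf=0 rows is appropriate when one record per ZCTA is wanted; keep them when state-level totals matter.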

ZIP Code Tabulation Area (5-digit). We sometimes like to think of this as the "same thing" as a ZIP code but it really is not. Close enough for many (most?) applications though. These are proxies for residential ZIP codes as defined on Jan. 1, 2010 and "rounded" off to census blocks.

A name assigned to the ZIP code associated with the ZCTA. This name comes from the Snow directory file where it is the "City name" suggested by the USPS. This is the name that can be used as the last line of an address to get mail delivered to the ZIP. However, we think that name is not always the best. For example: ZIP code 63119 comes with a "city name" of "Saint Louis", which is close. This ZIP code is actually just west of the city of St. Louis and is almost entirely within the suburb city of Webster Groves. Since we know what place (city) the ZIP (ZCTA) intersects with, we are able to "improve" on the original name by assigning one based on the city (if any) with which it intersects. So you will find that the value of this variable for the 63119 observation is "Webster Groves, MO". If the ZCTA lies all or mostly within unincorporated area (with no CDP assigned) then the original post-office city name is retained.

State postal abbreviation. For example, "MO". Values are in upper case. Corresponds to the State variable.

Geographic Summary Level. It will always be 871 to remind users that we are dealing with State-ZCTA summaries, not complete ZCTAs.

The county in which the ZCTA is all or mostly contained. Over 90% of ZCTAs fall entirely within a single county.

The "secondary" county for the ZCTA, i.e. the county with which it has the 2nd largest intersection. Over 90% of the time this value will be blank.

The place (city) with which the ZCTA has the largest intersection. This can be an incorporated municipality or a CDP (Census Designated Place). It can also have a value of 99999 to indicate an unincorporated area with no CDP assigned. If a ZCTA is 70% unincorporated and 30% within a city the value of placefp will be 99999 and the city code will appear as the value of placefp2. ( "FP" in the name is short for "FIPS". ) These codes are unique within state.
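The primary/secondary assignment described above can be sketched as a simple ranking by intersection share. This is only an illustration, not the MCDC's actual processing: the function name rank_places and the place code "12345" are hypothetical.

```python
UNINCORPORATED = "99999"  # code used for unincorporated area with no CDP assigned


def rank_places(shares):
    """Return (primary, secondary) place codes ordered by intersection share.

    `shares` maps a place code to the share of the ZCTA that falls in that place.
    The secondary code is an empty string when only one place is involved.
    """
    ordered = sorted(shares, key=shares.get, reverse=True)
    primary = ordered[0]
    secondary = ordered[1] if len(ordered) > 1 else ""
    return primary, secondary


# The 70% unincorporated / 30% city case; "12345" is a made-up place code.
placefp, placefp2 = rank_places({UNINCORPORATED: 70.0, "12345": 30.0})
```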

The place/city with which the ZCTA has the second greatest intersection. See placefp, above. A value of 99999 indicates unincorporated area.

The county subdivision (township, town, minor civil division, census county division, etc. - what these entities are called varies by state) with which the ZCTA has the largest intersection. This is a FIPS code, not a name. The code is unique within state.

The county subdivision with which the ZCTA has the second largest intersection. Frequently blank.

The Congressional District for the 113th Congress, as elected in 2012 and effective on 1-1-2013. This has been updated from the previous version when we used the 111th district codes which were in effect at the time of the 2010 census.

The secondary CD code for those ZCTAs that cross CD boundaries. Usually blank. Also a 113th congress code.

The Public Use Microdata Area as defined for use in the 2000 census PUMS datasets. This code was also used to publish American Community Survey data through the 2011 vintage data. Even though we now have new PUMA codes for "2010", the Bureau has not yet (as of Aug. 2013, but soon to change with the 2012 vintage data) started using them in its ACS products, nor has it yet released any 2010 PUMS files using the new codes. [As of Aug. 2013 it appears that the 2010 PUMS product has been canceled due to funding shortages.] So these "old" PUMA codes are [were] still the most useful, as of September, 2012.

Public Use Microdata Areas for 2010. We use "12" instead of "10" because these new codes were not defined until 2012. See note re puma2k, above. These codes will start appearing in the ACS products (including the ACS PUMS files) in September, 2013.

New England City and Town area (NECTA). Only defined for the six New England states, blank everywhere else. In New England a value of 99999 is used to indicate an area that is not within any NECTA.

New England City and Town area division. Like NECTA, it is only defined for the six New England states.

Combined New England City and Town area. Like NECTA, it is only defined for the six New England states.

Core-Based Statistical Area. The new metro areas (since 2002 or so, replacing the old MSA/CMSA/PMSA system). There are 2 kinds of CBSAs: Metropolitan Statistical Areas and Micropolitan Statistical Areas. The former are based upon a metro core area of at least 50,000 population (latest census or official estimate), while the latter (Micropolitan areas) have a core area of at least 10,000 (and less than 50,000). The codes appearing here were those in effect at the time of the 2010 census. They can be and are updated throughout the decade but no changes have been published since December, 2009 (thru September, 2012). See for latest updates and explanations. Note that CBSAs are county-based - a county will never cross a CBSA boundary. (But a ZCTA can.) As of August, 2013 the codes we are using come from the latest mable12 database and that has not been updated to reflect the changes that were made in February of 2013. So these CBSA and related codes (metrodiv and csa) are still 2012 vintage.

Metropolitan Divisions are subsets of CBSAs. Most CBSAs do not have subdivisions. See CBSA for vintage info.

Combined Statistical Area. Sometimes adjacent CBSAs such as Baltimore and Washington, DC are grouped into these larger units.

The CBSA type variable is blank to indicate not within a CBSA or will have a value of "Metro" or "Micro" to indicate that the CBSA is either a Metropolitan or Micropolitan statistical area.

The formal definition of urban (vs. rural; all areas are one or the other) is updated each decade using the detailed population data gathered in the census. There is a lag in getting the new definitions published, so the originally published summary files have no data in the tables that report the urban/rural breakdown. We had to access the special TIGER system files that allowed us to classify each 2010 census block as either urban or rural for 2010. We then aggregated the block-level populations by U/R classification to derive this key measure.
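The block-level aggregation can be sketched as follows, assuming the measure is the percentage of population living in urban blocks. The function name percent_urban, the "U"/"R" coding, and the sample blocks are hypothetical and used only for this illustration.

```python
def percent_urban(blocks):
    """Aggregate block populations by urban/rural class and return % urban.

    `blocks` is an iterable of (population, ur_class) pairs, where ur_class
    is "U" or "R" (an assumed coding for this sketch).
    Returns None when the total population is zero.
    """
    totals = {"U": 0, "R": 0}
    for population, ur_class in blocks:
        totals[ur_class] += population
    total = totals["U"] + totals["R"]
    if total == 0:
        return None
    return 100.0 * totals["U"] / total


# Made-up blocks: 200 of the 250 persons live in urban blocks.
sample_blocks = [(120, "U"), (80, "U"), (50, "R")]
pct_urban = percent_urban(sample_blocks)
```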

The Urban Area code as defined for 2010. This 5-digit code can identify an Urbanized Area or an Urban Cluster. These are similar areas, but are split into two types based on the size of the population cluster defined as the core of each UA. This code identifies the urban area with which the ZCTA has the largest intersection. There is no secondary UA variable.

Contains a blank to indicate non-urban area. Values of UA and UC indicate portion within an Urbanized Area or an Urban Cluster.

FIPS county code. Has the same value as County but we do not associate a format code with it so if you do a Dexter extract it will appear in the output as a 5-character code rather than the name of the county.

fipco2 Same idea here for the secondary county code. fipco2 will appear in output files as the code associated with county2, with the latter appearing as a county name.

This is the FIPS county code (3-digit) that was in effect in 2000.

Census Division

Census Region


In this section the variables are all percentages that measure the degree of intersection of the State-ZCTA with various geographic codes. The variable name is always of the form pct<geographic-variable>, where geographic-variable is the name of the geographic variable. So, for example, the variable pctcnty tells us what percentage of the ZCTA's 2010 population (census count) also lived within the county identified in variable cnty. Values are true percentages, not decimal fractions: "95.0", not ".950".
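Because these values are true percentages, they need to be divided by 100 before being applied to a population count. A minimal sketch, in which the function name persons_in_area and the sample figures are hypothetical:

```python
def persons_in_area(pop2010, pct):
    """Convert a pct<geographic-variable> value into an approximate person count.

    `pct` is a true percentage (e.g. 95.0), so it is divided by 100.
    """
    return pop2010 * pct / 100.0


# Hypothetical State-ZCTA record: 20,000 residents, 95.0% of them in the primary county.
in_primary_county = persons_in_area(20000, 95.0)
```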

Name Variables

Each of the variables in this section contains the name of an area whose code appears in the Geocodes section, above. The variable name here is just a concatenation of the geographic code variable (with the "fp" suffix dropped, if applicable) and "name". So, for example, we have placename as the variable containing the name of the area whose code is stored in the placefp variable.
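The naming rule can be expressed as a small helper. This is a sketch of the convention as stated, not a guarantee that every code variable has a matching name variable; the function name name_variable is hypothetical.

```python
def name_variable(code_variable):
    """Derive the name variable for a geographic code variable.

    Drops a trailing "fp" (if present) and appends "name".
    """
    if code_variable.endswith("fp"):
        base = code_variable[:-2]
    else:
        base = code_variable
    return base + "name"


place_name_var = name_variable("placefp")
```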

Name of the primary place (city) associated with the ZCTA.

Name of the CBSA (metropolitan or micropolitan area) associated with the ZCTA.

Name of the metropolitan division.

Name of the combined statistical area.

Name of the New England City and Town area.

Name of the Urbanized Area/ Urban Cluster.

Name of the "2012" PUMA. Names were assigned to PUMAs for the first time following the 2010 census. (Hence there is no name variable for the puma2k variable.)

ZCTA Data from the American Community Survey

When working with ZIP-based files (typically, address files of customers, patients, survey respondents, etc. -- we'll use the generic term "constituents" for entities associated with the ZIP codes) it can be very helpful to characterize the constituents as to basic demographic or economic indicators. Are we dealing with persons living in areas that are very poor or very wealthy, where there are very many or very few Hispanics, with a very high or very low median age, with a large group quarters population, etc.? This kind of information is available from the Census Bureau's American Community Survey. In this section we report variables taken from a recent release of ACS key indicators. For the initial release of the dataset (September, 2012) we used data estimated by the MCDC by allocating census tract level data to ZCTAs.

There are a small number of ZCTAs for which we found no matching data on the ACS data file.

This is a constant that identifies the time period for the ACS data. The value is "2006-2010" in the initial release.

An identifier string intended to be used as a key to link data to ESRI shape files (i.e. the mapping files used in ArcInfo/ArcGIS software).

Internal point latitude coordinate. (Estimated, for now).

Internal point longitude coordinate. Will be negative to indicate west longitude. (Estimated, for now).

Land Area Sq. Miles (estimated, for now).

Total Area Sq. Miles (estimated, for now).

2010 Census Total Pop count

Total Population 2000 census. Estimated.

Total Pop ACS Period Estimate. For example, when acsyears="2006-2010" this is the average of the population estimate for the five years 2006, 2007,..., 2010. All of the other ACS-based indicators in this section are likewise period estimates.

Median age in years

% Under 18 years of Age

% 65 years and over

% White alone

% Black or African American

% Asian

% Hispanic or Latino of any race

Total households

Median household income

Family Households

Median family income

Persons for whom poverty status is determined

% Persons Below Poverty

% Living in Group Quarters

% In College or graduate school

% Bachelor degree or higher

% Foreign born

Total housing units

Occupied housing units

% Renter-occupied units

Median Home Value

Median Rent

Miscellaneous Variables

This is the ZCTA's 2010 total population across all states. Only of interest in those rare cases where the ZCTA crosses state boundaries.

Almost always a value of 1 to indicate that the ZCTA intersects with only 1 state. A value of 2 says it intersects with two states.

Primary State Flag: a value of 1 says this is the primary state associated with the ZCTA. This means the only state in 99+% of the cases. A value of 0 indicates a "sliver" case where the ZCTA has a small portion that crosses a state line.

Pop2010 as % of ZipTotPop. Usually has a value of 100.
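The relationship among the cross-state variables above can be sketched as follows. The records are invented, and the output field names ziptotpop, nstates, and pct_of_zip are illustrative stand-ins rather than confirmed variable names.

```python
def add_state_share(rows):
    """Add cross-state totals and shares to State-ZCTA rows.

    Each input row needs "zcta5" and "pop2010". The added field names
    are hypothetical and chosen only for this sketch.
    """
    totals, counts = {}, {}
    for row in rows:
        zcta = row["zcta5"]
        totals[zcta] = totals.get(zcta, 0) + row["pop2010"]
        counts[zcta] = counts.get(zcta, 0) + 1
    out = []
    for row in rows:
        total = totals[row["zcta5"]]
        # Share of the ZCTA's total population that lives in this state.
        pct = 100.0 * row["pop2010"] / total if total else None
        out.append({**row, "ziptotpop": total,
                    "nstates": counts[row["zcta5"]], "pct_of_zip": pct})
    return out


# Made-up ZCTA split across two states: 9,800 persons in one, 200 in the other.
shares = add_state_share([
    {"state": "29", "zcta5": "64999", "pop2010": 9800},
    {"state": "20", "zcta5": "64999", "pop2010": 200},
])
```

For the usual single-state ZCTA the same computation gives a count of 1 state and a share of 100.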

Special ZIP codes, such as those used for large corporations, universities, military bases, P.O. Box-only ZIP codes, etc. can be linked to a residential ZIP code based on their physical location. We used the ZIP-to-ZCTA equivalency file from John Snow, Inc. to create this special field with a list of "special" ZIP codes that may be physically located within the ZIP code corresponding to ZCTA5.

This is the count of alternate ZIP codes as stored in the altzips variable.
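A minimal sketch of how this count relates to the altzips value, assuming altzips holds ZIP codes separated by whitespace (the delimiter is an assumption; this metadata does not state it). The function name count_altzips and the sample ZIP codes are hypothetical.

```python
def count_altzips(altzips):
    """Count the ZIP codes in an altzips string, assuming whitespace-separated values."""
    return len(altzips.split())


# Made-up altzips value listing two alternate ZIP codes.
n_alt = count_altzips("64998 64997")
```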

Missouri Census Data Center

MCDC file: /pub/data/georef/zcta_master.Metadata.html
Metahtml run date: 10SEP12