Notes on the NCHS Bridged Race Population Estimates

Rev. 08/05/13

General Information

These estimates were commissioned by the National Center for Health Statistics (NCHS) and generated for it by the U.S. Bureau of the Census. They are alternative versions of the Bureau's "casrh" series (county, age, sex, race and hispanic origin). They are annual population estimates, mostly post-censal with some inter-censal, at the state and county levels, broken down by the 4 demographic categories mentioned (age, sex, race and hispanic origin). They have been generated for the years 1990 through (at the time of this writing) 2011, with new values released after a lag of about one year. (For example, the estimates for July 1, 2013 should be available around July 2014.) These estimates differ from the standard casrh estimates in two critical ways:

  1. The race categories are very different. The "bridged race" categories used on these files are:
    • White
    • Black or African American
    • American Indian, Eskimo, Aleut
    • Asian & Pacific Islander.
    Asians cannot be separated from Native Hawaiians or other Pacific Islanders, and there is no "Other" race category; persons in that category have all been allocated to one of the 4 categories above (in a process called "bridging"). Note also that there are no multi-race categories: using bridging techniques, all persons who indicated they were of multiple races were re-assigned to a single race group. Detailed methodology is available from the NCHS web site.
  2. While the commonly-available numbers in the casrh series use 5-year age cohort categories, these estimates are for single years of age, except for the 85-and-over category.

These numbers are derived from the same basic source as the other official Census Bureau population estimates, and where comparable demographic categories are used, the numbers should match. For example, if you sum the estimates for hispanic persons in a given county across the single-year Age categories 00 through 04, you should get the same number that the casrh series shows for the 0-4 cohort, hispanic, for that county. Detailed methodology descriptions are available at the NCHS File Documentation web site.
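The consistency check described above can be sketched in a few lines of Python. This is only an illustration: the field names (county, age, hispanic, pop), the county code and the population values are made up, and the code '2' for hispanic is an assumption (these notes only state that '1' means non-hispanic).

```python
# Hypothetical single-year-of-age rows for one county and one estimate year.
# Field names and population values are invented for illustration only.
detail_rows = [
    {"county": "29019", "age": "00", "hispanic": "2", "pop": 120},
    {"county": "29019", "age": "01", "hispanic": "2", "pop": 115},
    {"county": "29019", "age": "02", "hispanic": "2", "pop": 110},
    {"county": "29019", "age": "03", "hispanic": "2", "pop": 105},
    {"county": "29019", "age": "04", "hispanic": "2", "pop": 100},
    {"county": "29019", "age": "05", "hispanic": "2", "pop": 95},
    {"county": "29019", "age": "02", "hispanic": "1", "pop": 900},
]


def cohort_total(rows, county, hispanic_code, low_age, high_age):
    """Sum population over single years of age low_age..high_age (inclusive)."""
    return sum(
        r["pop"]
        for r in rows
        if r["county"] == county
        and r["hispanic"] == hispanic_code
        and low_age <= int(r["age"]) <= high_age
    )


# Total for hispanic persons aged 0-4; '2' = hispanic is an assumed code.
hispanic_0_4 = cohort_total(detail_rows, "29019", "2", 0, 4)
print(hispanic_0_4)
```

The resulting total is the figure you would compare against the 0-4 hispanic cohort for the same county and year in the casrh data.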

We now have these estimates for the full decades of the 1990s (1990-1999) and 2000s (2000-2009), and we add post-2010 data as they become available. For the decade 2000-2009 we offer both the original post-censal estimates and updated inter-censal estimates (see the Census Bureau web page regarding these updates if you are not familiar with the concept).

In addition to all these between-census July 1 estimates, NCHS also provides comparable data based on the 2000 and 2010 decennial censuses. We store these decennial data in a single national data set per census year. Each looks just like one of the estimates datasets, except that it has only a single numeric population count rather than a time series of estimates. Summary (_sumry) versions of these datasets have also been created. These sets are named usbridged2kcen and usbridged2k_sumry for the 2k (2000) census and usbridged2010cen and usbridged2010_sumry for the 2010 census.

The Datasets

We currently have seven data sets per state (substitute the state postal code for XX in the set names):

  1. XXnchsbridged19xx: detail data for 1990-1999.

  2. XXnchsbridged19xx_sumry: summary data for 1990-1999.

  3. XXnchsbridged20xx: detail data for 2000-2009. Original, post-censal estimates. (To get detailed inter-censal estimates for this decade use the usnchsbridged200x_intrcnsl data set and filter for the state).

  4. XXnchsbridged20xx_sumry: summary data for 2000-2009. Original, post-censal estimates.

  5. XXnchsbridged20xxi_sumry: summary data for 2000-2009. Inter-censal estimates.

  6. XXnchsbridged201x: detail data for 2010-20yy. Latest estimates, replaced with new version each year.

  7. XXnchsbridged201x_sumry: summary data for 2010-20yy. Summary version of previous set.
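As a small convenience, the naming pattern in the list above can be expanded for a given state. The sketch below is hypothetical: the function name is our own, and it assumes the postal code appears in lowercase, as in the California names canchsbridged201x and canchsbridged201x_sumry used in these notes.

```python
def state_dataset_names(postal_code):
    """Return the seven per-state data set names for a state postal code."""
    xx = postal_code.lower()  # assumed lowercase, as in "canchsbridged201x"
    suffixes = [
        "19xx",         # detail, 1990-1999
        "19xx_sumry",   # summary, 1990-1999
        "20xx",         # detail, 2000-2009, post-censal
        "20xx_sumry",   # summary, 2000-2009, post-censal
        "20xxi_sumry",  # summary, 2000-2009, inter-censal
        "201x",         # detail, 2010 onward
        "201x_sumry",   # summary, 2010 onward
    ]
    return [f"{xx}nchsbridged{suffix}" for suffix in suffixes]


california = state_dataset_names("CA")
for name in california:
    print(name)
```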

Annual Processing

Each year we download a compressed file from the NCHS web site containing a very large text file with the estimates for every county in the U.S. over the entire post-2010-census time period (starting with July 1, 2010) for which estimates are available. We run SAS conversion setups to (re)create a pair of datasets per state. (We write over the old data sets, creating new generations of data; estimates published in a previous year may change in a later estimate year, due to challenges or other revisions.) The two data sets are as follows:

  1. Detail data set: A direct transcription of the raw input file. Each observation here represents a set of July 1 estimates starting with 2010 and going through the latest-available year (currently 2011) for a specific county, single year of Age, race, sex and hispanic origin.
  2. Summary data set: A direct derivative of the first, containing summaries and a restructuring of the raw data. It has the same name as the first dataset but with _sumry appended; so the two datasets for California are canchsbridged201x and canchsbridged201x_sumry. The summary set differs from the detail set in these ways:
    • The category variables Sex, Race and Hispanic are gone; these 3 dimensions are now represented by the count variables Total, Male, Female, White, WhiteNH (white and non-hispanic), ..., Hispanic and NonHispanic. The variable Hispanic has gone from being a single-character category code (where '1' meant non-hispanic) to being a numeric variable giving the count of hispanic persons. Note that in creating these summaries we lose cross-category detail: you cannot use the _sumry dataset to get, for example, race crossed with sex; the only crossed categories are the 4 race-by-non-hispanic categories (such as WhiteNH). The critical exception to this rule is Age, which remains a categorical row-identifier variable. Thus, you can get age crossed with any of the other demographic items from this dataset.
    • Where the detail dataset has the time dimension going across the row, with a different variable for each year, the _sumry dataset has been transposed so that there is a category variable Year and each row represents data for the specified year.
    • The Age_cohort1 and Age_cohort2 category variables have been added to the rows. Age_cohort1 takes on the 5 distinct values 00, 18, 25, 45 and 65, corresponding to the 5 broad age categories 0-17, 18-24, 25-44, 45-64 and 65+. Age_cohort2 takes on the 2-character lower limit of the 18 five-year cohorts used on the casrh datasets. So, for example, when the value of Age (the single year of age) is between 25 and 29, the value of Age_cohort2 is 25. These additional category variables make it possible to get data aggregated to these levels by using the aggregation feature of the Advanced Options section in Dexter.
    • For each combination of geography and year there is a summary row/observation where all the Age variables (i.e., Age, Age_cohort1 and Age_cohort2) are blank. These rows are summaries across all ages. So if you want to find the total number of hispanic persons by county for your state in 2009, code a filter (in Dexter) specifying that you want Year Equals 2009 AND Age Equals _ (enter the single underscore to indicate a blank value). Then be sure to keep the variable Hispanic, which will contain the estimated count of hispanic persons, regardless of age, for the geographic area indicated by SumLev and County for the year indicated by Year.
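The restructuring described above (category codes turned into count variables, years transposed into rows, age cohorts added, and an all-ages row with blank Age) can be sketched in Python. This is not the SAS setup we actually run, only an illustration under stated assumptions: the input field names (county, age, sex, race, hispanic, pop2010, pop2011) and the sample values are made up, the codes '1' = male, '2' = female and race '1' = white are assumed, and only a subset of the count variables is built. The one code taken from these notes is hispanic '1' = non-hispanic.

```python
from collections import defaultdict

COUNT_VARS = ["Total", "Male", "Female", "White", "WhiteNH", "Hispanic", "NonHispanic"]


def age_cohort1(age):
    """Lower limit of the broad cohort: 0-17, 18-24, 25-44, 45-64, 65+."""
    for low in (65, 45, 25, 18):
        if age >= low:
            return f"{low:02d}"
    return "00"


def age_cohort2(age):
    """Lower limit of the 5-year cohort; ages 85 and over share cohort 85."""
    return f"{min(age // 5 * 5, 85):02d}"


def summarize(detail_rows, years):
    """Turn wide detail rows into one summary row per county, year and age."""
    sums = defaultdict(lambda: dict.fromkeys(COUNT_VARS, 0))
    for row in detail_rows:
        non_hisp = row["hispanic"] == "1"  # '1' = non-hispanic, per the notes
        white = row["race"] == "1"         # assumed race code
        flags = {
            "Total": True,
            "Male": row["sex"] == "1",     # assumed sex code
            "Female": row["sex"] == "2",   # assumed sex code
            "White": white,
            "WhiteNH": white and non_hisp,
            "Hispanic": not non_hisp,
            "NonHispanic": non_hisp,
        }
        for year in years:
            pop = row[f"pop{year}"]  # hypothetical one-variable-per-year layout
            # Add to the single-age row and to the all-ages row (blank Age).
            for age_key in (row["age"], ""):
                cell = sums[(row["county"], year, age_key)]
                for var, applies in flags.items():
                    if applies:
                        cell[var] += pop
    out = []
    for (county, year, age), counts in sorted(sums.items()):
        blank = age == ""
        out.append({
            "County": county, "Year": year, "Age": age,
            "Age_cohort1": "" if blank else age_cohort1(int(age)),
            "Age_cohort2": "" if blank else age_cohort2(int(age)),
            **counts,
        })
    return out


# Made-up detail rows for one county.
detail = [
    {"county": "06001", "age": "26", "sex": "1", "race": "1", "hispanic": "1",
     "pop2010": 100, "pop2011": 110},
    {"county": "06001", "age": "26", "sex": "2", "race": "1", "hispanic": "2",
     "pop2010": 40, "pop2011": 45},
    {"county": "06001", "age": "85", "sex": "2", "race": "2", "hispanic": "1",
     "pop2010": 10, "pop2011": 12},
]
summary = summarize(detail, [2010, 2011])

# Equivalent of the Dexter filter "Year Equals 2011 AND Age Equals _".
all_ages_2011 = [r for r in summary if r["Year"] == 2011 and r["Age"] == ""]
print(all_ages_2011[0]["Hispanic"])
```

Because sex, race and hispanic origin are collapsed into separate count columns, the output cannot recover race crossed with sex, which mirrors the loss of cross-category detail noted above; Age survives as a row identifier.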

Code Values

These files use category codes that you need to know to interpret the data. These include both custom demographic category codes and standard FIPS geographic codes. Here are the variables, the codes used and their meanings.

Access the Data Via Uexplore/Dexter

Access the data in the /pub/data/popests/nchsbri data directory. It will be much easier to find things if you navigate via the Datasets.html file in this directory.