These estimates were commissioned by the National Center for Health Statistics (NCHS) and generated for them by the U.S. Bureau of the Census. They are alternative versions of the estimates done by the Bureau in the "casrh" series (county, age, sex, race and hispanic origin). They are annual population estimates, mostly post-censal with some inter-censal, at the state and county levels for the 4 demographic categories mentioned (age, sex, race and hispanic origin). The numbers have been generated for the years 1990 through (at the time of this writing) 2011, with new values being generated with an approximate 1-year lag. (For example, the estimates for July 1, 2013 should be available circa July 2014.) These estimates differ from the standard casrh estimates in two critical ways:
- The race categories are very different. The "bridged race" categories used on these files are:
- White
- Black or African American
- American Indian, Eskimo, Aleut
- Asian & Pacific Islander
Asians are not separated from Native Hawaiians or other Pacific Islanders, and there is no "Other" race category: these persons have all been allocated to one of the 4 categories above (in a process called "bridging"). Note that there are no multi-race categories either. Using bridging techniques, all persons who indicated they were of multiple races were re-assigned to a single race group. Detailed methodology is available from the NCHS web site.
- While the commonly-available numbers in the casrh series use 5-year age cohort categories, these estimates are for single years of age, except for the 85-and-over category.
These numbers are derived from the same basic source as the other official Census Bureau population estimates, and where comparable demographic categories are used, the numbers should match. For example, if you sum all the estimates for hispanic persons for a given county across the Age categories 00 through 04, you should get the same number that appears in the casrh series for the 0-4 cohort, hispanic, for that county. Detailed methodology descriptions are available at the NCHS File Documentation web site.
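The consistency check described above can be sketched in a few lines of Python. This is only an illustration: the counts are made-up placeholders rather than real NCHS figures, and the helper name `cohort_total` is hypothetical.

```python
# Toy single-year-of-age estimates for hispanic persons in one county.
# The counts are made-up placeholders, NOT real NCHS figures.
single_year = {"00": 120, "01": 115, "02": 118, "03": 110, "04": 112, "05": 109}


def cohort_total(by_age, low, high):
    """Sum the counts for ages low..high (inclusive), keyed by 2-character Age codes."""
    return sum(by_age[f"{age:02d}"] for age in range(low, high + 1))


# This sum is the figure to compare against the 0-4 cohort in the casrh series.
total_0_4 = cohort_total(single_year, 0, 4)
print(total_0_4)
```

Note that the Age codes are looked up as 2-character strings ("00", "01", ...), matching the way they are stored on these files.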
We now have these estimates for all of the decades of the 1990's and 2000's (2000-2009), and we add post-2010 data as it becomes available. For the decade 2000-2009 we offer both the original post-censal estimates, as well as updated inter-censal estimates (see the Census Bureau web page regarding these updates if you are not familiar with the concept.)
In addition to all these between-census, July 1 estimates, NCHS also provides comparable data based on the 2000 and 2010 decennial censuses. We store these decennial data in a single national data set per census. Each looks just like one of the estimates datasets, except that it has only a single numeric population count rather than a time series of estimates. Summary (_sumry) versions of these datasets have also been created. These sets are named usbridged2kcen and usbridged2k_sumry for the 2k (2000) census and usbridged2010cen and usbridged2010_sumry for the 2010 census.
We currently have seven data sets per state (substitute the state postal code for XX in the set names):
XXnchsbridged19xx: detail data for 1990-1999.
XXnchsbridged19xx_sumry: summary data for 1990-1999.
XXnchsbridged20xx: detail data for 2000-2009. Original, post-censal estimates. (To get detailed inter-censal estimates for this decade use the usnchsbridged200x_intrcnsl data set and filter for the state).
XXnchsbridged20xx_sumry: summary data for 2000-2009. Original, post-censal estimates.
XXnchsbridged20xxi_sumry: summary data for 2000-2009. Inter-censal estimates.
XXnchsbridged201x: detail data for 2010-20yy. Latest estimates, replaced with new version each year.
XXnchsbridged201x_sumry: summary data for 2010-20yy. Summary version of previous set.
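As a small sketch of the naming pattern, the seven per-state names can be generated from a postal code. The function name `bridged_dataset_names` is hypothetical. The lowercase prefix follows the California example given further down (canchsbridged201x). The literal "xx" and "x" in the names are part of the name as listed here, not placeholders to fill in.

```python
def bridged_dataset_names(state):
    """Return the seven per-state data set names for a postal code such as 'MO'."""
    prefix = state.lower() + "nchsbridged"
    return [
        prefix + "19xx",         # detail, 1990-1999
        prefix + "19xx_sumry",   # summary, 1990-1999
        prefix + "20xx",         # detail, 2000-2009, post-censal
        prefix + "20xx_sumry",   # summary, 2000-2009, post-censal
        prefix + "20xxi_sumry",  # summary, 2000-2009, inter-censal
        prefix + "201x",         # detail, 2010 onward
        prefix + "201x_sumry",   # summary, 2010 onward
    ]


for name in bridged_dataset_names("MO"):
    print(name)
```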
Each year we now download a compressed file from the NCHS web site containing a huge txt file with the estimates for every county in the U.S. over the entire post-2010-census time period (starting with 7-1-10) for which the estimates are available. We run SAS conversion setups to (re)create a pair of datasets per state. (We write over the old data sets, creating new generations of data; estimates published in a previous year may change in a future estimate year, due to challenges or other revisions.) The two data sets are as follows:
- Detail data set: A direct transcription of the raw input file. Each observation here represents a set of July 1 estimates starting with 2010 and going through the latest-available year (currently 2011) for a specific county, single year of Age, race, sex and hispanic origin. View the sample listing of the first 200 observations of the Missouri nchsbridged dataset. Note that it begins with data for the first county in the state (29001=Adair: the variable County stores a 5-digit FIPS code, but we specified a format code of $county. in the Formats text box in Sec. V.c of the Dexter query form) and has 86 rows/observations with code values of 1 for sex, race and Hispanic. So these 86 rows are estimates for white male non-Hispanics by single years of age (00 through 84, plus 85 and over). In obs 87 we see the value of Sex is now 2 (female), and we now get 86 rows of estimates for white female non-Hispanics. The dataset has only a few variables: just the 2 geographic (state & county) and 4 demographic ID variables (age, sex, race, hispanic) and the PopJLxx time-series estimates. But it has a great many rows. It is summarized data, but the detail is such that it almost resembles microdata. The dataset is rewritten each year: a new estimate (year) is added to the time series, and the time-series variables that were already there get "refreshed".
- Summary data set: The second dataset is a direct derivative of the first and contains summaries and restructuring of the raw data. It has the same name as the first dataset but with _sumry appended; so the two datasets for California are canchsbridged201x and canchsbridged201x_sumry. We have placed a sample listing of part of one of our _sumry datasets in the nchsbri directory. The _sumry dataset has fewer rows and more variables than the original dataset. Important distinctions include:
- It contains summaries at the state level, aggregated up from county level data. A SumLev variable has been added to the _sumry dataset, and it takes on values of
040 to indicate a state level summary and
050 to indicate a county level summary.
- The category variables Sex, Race and Hispanic are gone; these 3 dimensions are now represented by variables Total, Male, Female, White, WhiteNH (white and non-hispanic), ...., Hispanic and NonHispanic. The variable Hispanic has gone from being a single-character category code ('1' just meant non-hispanic) to being a numeric variable giving us the count of hispanic persons. Note that in creating these summaries we lose cross-category detail: you cannot use the _sumry dataset to get age or race crossed with sex; the only crossing categories here are the 4 race by non-hispanic categories. The critical exception to this rule is Age, which remains a categorical row-identifier variable. Thus, you can get age crossed with any of the other demographic items from this dataset.
- Where the original dataset has the time dimension going across the row with a different variable for each year, in the _sumry dataset things have been transposed so that there is a category variable Year and each row represents data for the specified year.
- The Age_cohort1 and Age_cohort2 category variables have been added to the rows. Age_cohort1 takes on the 5 distinct values 00, 18, 25, 45 and 65 corresponding to the 5 broad age categories 0-17, 18-24, 25-44, 45-64 and 65+. The values of Age_cohort2 take on the 2-character lower limit of the (18) 5-year cohorts as used on the casrh datasets. So, for example, when the value of Age (indicating the single year of age value) is between 25 and 29 the value of Age_cohort2 is 25. These additional category variables are provided to make it possible to get data aggregated to these levels by using the aggregation feature of the Advanced Options section in Dexter.
- For each combination of geography and year there is a summary row/observation where all the Age variables (i.e., Age, Age_cohort1 and Age_cohort2) are blank. These rows are summaries across all ages. So if you want to find out the total number of hispanic persons by county for your state in 2009, code a filter (in Dexter) specifying that you want
- Year Equals 2009
- Age Equals _
(enter the single underscore to indicate a blank value). Then be sure to keep the variable Hispanic, which will contain the estimated count of hispanic persons, regardless of age, for the geographic area indicated by SumLev and County for the year indicated by Year.
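To make the relationship between the two datasets concrete, here is a minimal Python sketch of how detail rows could be collapsed into _sumry-style rows. It is not the SAS setup actually used. The year variable names (PopJL10, PopJL11) are assumed from the PopJLxx pattern, the category codes are assumed to be stored as characters, and the input counts are made up. The sketch covers only county-level (050) rows and leaves out the race and race-by-non-hispanic variables.

```python
from collections import defaultdict

# Assumed mapping from time-series variable to year (names follow the PopJLxx pattern).
YEAR_VARS = {"PopJL10": 2010, "PopJL11": 2011}


def age_cohort1(age):
    """Lower limit of the broad age group: 0-17, 18-24, 25-44, 45-64, 65+."""
    years = int(age)
    for low in (65, 45, 25, 18):
        if years >= low:
            return f"{low:02d}"
    return "00"


def age_cohort2(age):
    """Lower limit of the 5-year cohort; 85 and over is its own cohort."""
    return f"{min(int(age) // 5 * 5, 85):02d}"


def summarize(detail_rows):
    """Collapse wide detail rows into one row per county, year and age."""
    totals = defaultdict(lambda: defaultdict(int))
    for row in detail_rows:
        for var, year in YEAR_VARS.items():  # transpose: one Year value per output row
            pop = row[var]
            # Accumulate into the single-age row and into the all-ages row (blank Age).
            for age in (row["Age"], ""):
                cell = totals[(row["County"], year, age)]
                cell["Total"] += pop
                cell["Male" if row["Sex"] == "1" else "Female"] += pop
                cell["Hispanic" if row["Hispanic"] == "2" else "NonHispanic"] += pop
    return [
        {
            "SumLev": "050", "County": county, "Year": year, "Age": age,
            "Age_cohort1": age_cohort1(age) if age else "",
            "Age_cohort2": age_cohort2(age) if age else "",
            "Total": cell["Total"], "Male": cell["Male"], "Female": cell["Female"],
            "Hispanic": cell["Hispanic"], "NonHispanic": cell["NonHispanic"],
        }
        for (county, year, age), cell in sorted(totals.items())
    ]


# Made-up detail rows for illustration only.
detail = [
    {"County": "29001", "Age": "25", "Sex": "1", "Race": "1", "Hispanic": "1", "PopJL10": 10, "PopJL11": 11},
    {"County": "29001", "Age": "25", "Sex": "2", "Race": "1", "Hispanic": "2", "PopJL10": 4, "PopJL11": 5},
    {"County": "29001", "Age": "70", "Sex": "2", "Race": "2", "Hispanic": "1", "PopJL10": 3, "PopJL11": 3},
]
summary = summarize(detail)

# Equivalent of the Dexter filter "Year Equals 2011" and "Age Equals _" (blank Age).
all_ages_2011 = [r for r in summary if r["Year"] == 2011 and r["Age"] == ""]
print(all_ages_2011[0]["Hispanic"])
```

The last two statements mirror the filter described above: selecting the rows with a blank Age and keeping the Hispanic variable gives the count of hispanic persons regardless of age.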
These files use category codes that you need to know to interpret the data. These include both custom demographic category codes and standard FIPS geographic codes. Here are the variables, the codes used and their meanings.
- Age: 00=Less than a year old; 01=1 year old; ...; 84=84 years old; 85=85 and over.
Note that these are stored as 2-byte character strings, not as numerics. Age can have a blank value on _sumry datasets to indicate a summary across all ages.
- Age_cohort1 and Age_cohort2 (_sumry datasets only): See explanation above.
- Sex: 1=Male; 2=Female.
- Race: 1=White; 2=Black or African American; 3=American Indian, Eskimo or Aleut; 4=Asian or Pacific Islander.
- Hispanic: 1=Non Hispanic; 2=Hispanic.
Note: On the _sumry datasets the variable Hispanic is a numeric count of hispanic persons rather than a category variable.
- Sumlev (_sumry datasets only): 040=State; 050=County.
- County: these are 5-character FIPS county codes. You can view these codes for any state by going to the Cure for the Common Codes home page and clicking on the state. Note that there is no format associated with this code, so when you do an extract you will get the code instead of the name. If you prefer to get the county name, you can associate the (MCDC custom) format code $county. with this variable in the Format text box in Section V.c of the Dexter query form.
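If you work with an extract outside of Dexter, the category codes listed above can be turned into labels with simple lookup tables. The following Python sketch assumes the codes arrive as character strings in a detail-dataset row. The function names `age_label` and `decode` are hypothetical.

```python
# Lookup tables transcribed from the code lists above.
SEX = {"1": "Male", "2": "Female"}
RACE = {
    "1": "White",
    "2": "Black or African American",
    "3": "American Indian, Eskimo or Aleut",
    "4": "Asian or Pacific Islander",
}
HISPANIC = {"1": "Non Hispanic", "2": "Hispanic"}  # detail datasets only
SUMLEV = {"040": "State", "050": "County"}         # _sumry datasets only


def age_label(code):
    """Translate a 2-character Age code; blank means a summary across all ages."""
    if code.strip() == "":
        return "All ages"
    if code == "00":
        return "Less than a year old"
    if code == "85":
        return "85 and over"
    years = int(code)
    return f"{years} year old" if years == 1 else f"{years} years old"


def decode(row):
    """Return readable labels for the demographic ID variables of a detail row."""
    return {
        "Age": age_label(row["Age"]),
        "Sex": SEX[row["Sex"]],
        "Race": RACE[row["Race"]],
        "Hispanic": HISPANIC[row["Hispanic"]],
    }


print(decode({"Age": "85", "Sex": "2", "Race": "4", "Hispanic": "1"}))
```

On the _sumry datasets Hispanic is a count, not a code, so the HISPANIC table applies only to the detail datasets.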
Access the Data Via Uexplore/Dexter
Access the data in the /pub/data/popests/nchsbri data directory. It will be much easier to find things if you navigate via the Datasets.html file in this directory.
This file last modified Monday August 05, 2013, 11:44:48