Summary File 3 - 2000 Census: MCDC Standard Extract
MCDC Filetype: sf32000x
If you are interested in median family income by the age and/or race of the householder, this is not the place to look. Here, all you will find is the simple median family income. You will also see race and age data, but not cross-classified. We believe that a relatively small, carefully chosen subset of the available data can be used to answer a large percentage of users' questions. So we have spent considerable time considering what variables we wanted to include in these extracts. It is like creating a Greatest Hits collection; no one is going to agree with all your choices. We had direct input from a number of people regarding what to include here. If we had included every item that at least one person thought should be included we would easily have had over a thousand variables. We wanted to keep our count under 250. At last count (we reserve the right to tweak these files by adding new variables from time to time -- we'll never drop one, however) we had 217 independent variables and another 188 derived percentages.
Just because a variable is not included on this dataset does not mean it is not an important piece of information. Census data are used by so many people for so many different kinds of applications that there is no way you can create an extract that serves everyone's needs for all requests. That is not the intention. We still have the more detailed table files to go to, and we have created tools for providing links directly from this extracted data to the more detailed "parent" tables. (Look for these links on the corresponding profile reports, described below.)
The primary intended use of this collection is to generate profile reports based upon the data. These profiles should provide a basic overview of the area. What kind of people live there? What are their ages, their racial breakdown, their income levels and poverty status? What is their propensity to own versus rent, how old is the housing, how long have people lived there, and how many have PhDs versus how many never made it through high school? Access to these reports is available using the MCDC's menu-driven dp3_2k web application. A sample report for Washington has been saved to the sf32000x filetype directory. The reports are normally generated on the fly using a cgi-bin program that reads the sf32000x database.
Another series of profile reports combines data from these 2000 demographic profile datasets with comparable data from the 1990 census. The MCDC's menu-driven dp3_2kt web application is one of the most popular on the MCDC web site.
Some of the variables on this dataset (see the Variables.pdf data dictionary file to see how the variables are assigned to "Tables"; these correspond to where they are used in the Profile reports, of course) are also available in our corresponding SF1 standard extract datasets. There is a variable there named Age20_24, just as there is one on this dataset with that name; both are attempts at reporting the number of people in their early 20s residing in an area. The value of this variable on this dataset (sf32000x) for Washington, MO is 721, while the corresponding value on the sf12000x dataset is 765. The difference is what statisticians call "sampling error". Here is the first part of Table 1 for Washington, from our dp3_2k standard profile report:
The "Total Persons (Sample Est)" field is the population of the city based on the sample estimate. The actual enumerated population was 13,243 and 1,598 of these people (unweighted sample count) got the long form questionnaire, which was 12.1 percent of the total. All the SF3 "sample" data are based on the responses of these 1,598 people. The estimate is off by 151 people, or about 1.1%. The sampling error tends to be higher for smaller geographic areas. It is not a serious problem for Washington, but it could be for a city of fewer than 1,000 people. The Bureau actually oversamples in places of under 2,500, but it can still be a problem.
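The percentages quoted above can be checked with a few lines of arithmetic. The sketch below uses only the figures given in the text for Washington, MO; the variable names are illustrative and do not correspond to fields in the sf32000x datasets.

```python
# Figures quoted in the text for Washington, MO.
enumerated_pop = 13_243   # complete-count (enumerated) population
sample_persons = 1_598    # unweighted count of long-form respondents
estimate_error = 151      # gap between the sample estimate and the enumerated count

# Share of residents who received the long form.
sampling_rate = sample_persons / enumerated_pop

# Size of the gap relative to the enumerated population.
relative_error = estimate_error / enumerated_pop

print(f"sampling rate:  {sampling_rate:.1%}")   # about 12.1%
print(f"relative error: {relative_error:.1%}")  # about 1.1%
```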
Each person who fills out the long form is assigned a "sampling weight" value. The average sampling weight for persons in Washington was about 8.25 (the 12.1% sampling rate is about 1 in 8.25 people). That SF3 count of persons aged 20-24 was derived by counting each person filling out the long form who checked their age as being in that interval not as one person but rather as 8.25 persons (on average -- the sampling weights will actually vary from person to person; it's a very complex sampling scheme). The bottom line here is that the figures on SF1 are somewhat more accurate than those on SF3 (complete count data is "better" than sample-based data).
The problem with using the SF1 data is one of consistency when trying to analyze an area. It can be pretty confusing to users when two tables describing the total population, or even some subset of it, do not add up to the same totals. For example, when we report the Marital Status data in Table 6 (sample data not available on SF1) we use as our universe persons aged 15 and over. This is the sample estimate of such persons, consistent with the five marital status counts that follow it. These differences tend to disappear for higher level summaries (states, larger cities and counties) but can be very significant for smaller geographic areas.
The simple fact of the matter is that you really have to take sample census data for areas of fewer than a few thousand people with a grain of salt. The sampling errors for such areas can be significant. Sample data for areas with fewer than 100 people are basically worthless, as far as knowing the characteristics of those areas. They do have considerable value as building blocks to aggregate to larger entities such as school districts or 10-mile radii of proposed nuclear reactor sites. The sampling error you get when you aggregate 100 areas of 100 persons each is equivalent to the error for a place with 10,000 people - i.e. not bad in most cases.
But watch out for tables with small universes.
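To make the weighting and aggregation points above concrete, here is a rough sketch. It is not the Census Bureau's actual estimation procedure: the respondents and their weights are made up, the 10% share is hypothetical, and the error formula assumes simple random sampling without replacement, which the real (much more complex) sample design is not.

```python
import math
from dataclasses import dataclass


@dataclass
class Respondent:
    age: int
    weight: float  # number of residents this long-form respondent represents


def weighted_count(respondents, min_age, max_age):
    """Sum the sampling weights of respondents whose age is in the interval."""
    return sum(r.weight for r in respondents if min_age <= r.age <= max_age)


def relative_standard_error(population, share, sampling_rate):
    """Approximate relative standard error of an estimated count.

    Assumes simple random sampling without replacement (an assumption,
    not the census design), so treat the result as a rough guide only.
    """
    sample_size = population * sampling_rate
    return math.sqrt((1 - sampling_rate) * (1 - share) / (share * sample_size))


# Made-up respondents; three of the five fall in the 20-24 interval.
sample = [
    Respondent(22, 8.0),
    Respondent(24, 9.0),
    Respondent(35, 8.5),
    Respondent(20, 7.5),
    Respondent(19, 8.25),
]
age20_24 = weighted_count(sample, 20, 24)  # 8.0 + 9.0 + 7.5 = 24.5

RATE = 0.121   # sampling rate quoted in the text for Washington
SHARE = 0.10   # hypothetical: 10% of residents have the characteristic

small_area = relative_standard_error(100, SHARE, RATE)
large_place = relative_standard_error(10_000, SHARE, RATE)

# Aggregate 100 independent areas of 100 persons each: variances add.
n_areas = 100
area_estimate = 100 * SHARE
area_variance = (small_area * area_estimate) ** 2
aggregated = math.sqrt(n_areas * area_variance) / (n_areas * area_estimate)

print(f"weighted Age20_24 estimate: {age20_24}")
print(f"relative error, 100 persons:        {small_area:.1%}")
print(f"relative error, 10,000 persons:     {large_place:.1%}")
print(f"relative error, 100 areas combined: {aggregated:.1%}")
```

Under these assumptions the relative error for a single 100-person area is ten times that of the 10,000-person place, while the combined figure for 100 such areas matches the large place.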
Users might want to view the Census Bureau's note regarding this matter as it relates to 2000 census data.
You probably reached this page from the uexplore web application page. But in case you did not, the URL for accessing the data (and some related metadata, including this page) via uexplore is http://mcdc.missouri.edu/cgi-bin/uexplore?/pub/data/sf32000x. A lot of the information provided on this Readme page assumes that you are familiar with the uexplore application and that you are interested in using it to extract data from the MCDC's sf32000x data collection. If you are new to uexplore and Dexter, you may want to at least look at the uexplore overview page before continuing here. (Note: Dexter is the actual extraction program; uexplore is the navigation program that lets you select datasets from which to extract.)
By far the easiest way to access the various datasets within the directory is to click on the Datasets.html page within the sf32000x directory. Here you will find a more logical ordering of the datasets along with much more detailed descriptions and metadata references.
To access datasets with complete SF3 table files you need to access the sf32000 filetype, which means using the same URL as for this sf32000x filetype and just dropping the final x. You may want to access the MCDC's SF3 home page (which is also the Readme file for the sf32000 directory). That file will provide more general background about the Summary File 3 2000 data, with access to complete Technical Documentation. We understand that for many -- perhaps the great majority -- of users, all that information will be a lot more than they may have the time or interest to digest. This standard extract has been created mostly for those people.
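The "drop the final x" rule can be written out as a small helper. This is only a sketch: the function name is hypothetical, the starting URL is the one given on this page, and whether either URL still resolves has not been checked here.

```python
# uexplore URL for the standard extract, as given on this page.
EXTRACT_URL = "http://mcdc.missouri.edu/cgi-bin/uexplore?/pub/data/sf32000x"


def full_tables_url(extract_url: str) -> str:
    """Return the URL of the complete-table filetype (sf32000).

    Hypothetical helper: it simply drops the trailing "x" from the
    sf32000x filetype name, as described in the text.
    """
    if not extract_url.endswith("sf32000x"):
        raise ValueError("expected a URL ending in the sf32000x filetype")
    return extract_url[:-1]


print(full_tables_url(EXTRACT_URL))
# http://mcdc.missouri.edu/cgi-bin/uexplore?/pub/data/sf32000
```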
Questions about this page or about Summary File 3 in general should be addressed to Glenn Rice at OSEDA.