Summary File 1 — Standard Extract, 2000 Census


This is the standard extract of roughly 225 variables drawn from the full sf12000 filetype, which has more than 8,000 variables. Where sf12000 is made up of large, complex tables, sf12000x is a collection of key indicator variables. The variables on sf12000 have names like p18i5, while the names in this collection are mnemonic, like age5_9 and pct_vacant.

Summary File 1 is the first-published and most often used complete count summary product from the 2000 decennial census.

As of now (Sept. 2001) this file is also the most detailed of any data file based on the 2000 census that has been released. This will change later this year when SF2 (Summary File 2) is released, although the additional detail of that set of tables will be mostly of interest to those wanting to do in-depth studies of racial or ancestral groups. For most data needs, SF1 will continue to be the most practical source of information.
If you are not familiar with the general concepts behind Summary File 1, we suggest you look at our Readme file for our sf12000 filetype or at the technical documentation pointed to from that page.

The good thing about the sf12000 files is that they contain some very detailed tabulations for a lot of different geographic summary levels, all the way down to the census block. The bad thing about this is that the complete files are very complex and can be difficult to use. The information is there, but it often needs to be extracted and simplified for common uses. The purpose of these standard extract versions is to boil down the information to something that we hope most data users will find adequate for a great many applications.

Alternative Data Sources

For users needing more detail than these extracts contain, the alternative source is the data in the full sf1 directory, sf12000. For a similar file of key indicators, there are the datasets in the sf1prof directory, but these are limited to governmental units (i.e., no summaries for anything other than states, counties and cities). On the other hand, we have that data for every state, county and city over a certain size in the whole country in a single data set. It should be obvious that we modeled our standard extract on these Bureau-defined standard demographic profile sets.

Data Files: What Goes Where

If you have looked at the contents of the full sf12000 data directory, you will be familiar with the way that data was organized and broken down into multiple datasets for a geographic area. Different kinds of tables were stored in different datasets because some tables did not have any data below the level of census tract. The sf12000x sets are much simpler. In general, there are only a few data sets for each state, and you do not have to look in more than one data set to get all the extract variables available for a given geographic area.

As with most of the files in our data collection, the first 2 characters of the file/dataset name are the postal abbreviation of a state (or "us" for a national file). The rest of the name indicates what geographic entities are summarized within the dataset. We have a basic set of three such datasets per state. For Missouri we have the following data files:

Related Reports

There are two standard data product reports that use these data. The dp1_2k (Demographic Profile 1, 2000 Census) data product is accessible from a variety of locations on the web, with the main page here.
A closely related report is dp1_2kt, where the "t" stands for "trend". To access a trend report for an area, follow the menu pages to the dp1_2k report and then use the link at the bottom of that report page, which will take you to the trend report (not available for all geographic summary levels or areas).

Displaying Block Level Reports

There are some who think it would be better if block level data could not be viewed online. We understand the reasons for this, but we also recognize that there are sometimes legitimate needs to take a quick look at data at the block level. You can do this with our dp1_2k application module, but only if you are willing and able to modify the URLs yourself -- they are not on any menus. The URL to display the report for block 1002 in Boone county, census tract 0001.00 is . If you follow the menus and go to the report for the census tract here the URL would be the same, except that the last part - "&bl=1002" would not be there. So the easy way to do it is to generate a report for a tract or block group, then edit the URL to get data for a block within the area.
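
The URL edit described above can be sketched in a few lines of Python. This is only an illustration. The base URL and its query parameters (example.org, st, co, tr) are hypothetical placeholders, not the real dp1_2k address; only the "&bl=" suffix comes from the text above. The sketch also assumes block codes are four digits, like the 1002 in the example.

```python
def block_report_url(tract_url: str, block: str) -> str:
    """Append a block selector to a tract or block group report URL."""
    # Assumption: block codes are four digits, like the 1002 in the example.
    if not (len(block) == 4 and block.isdigit()):
        raise ValueError(f"expected a four-digit block code, got {block!r}")
    return f"{tract_url}&bl={block}"


# Hypothetical tract report URL; substitute the one your browser shows
# after you follow the menus to a tract or block group report.
tract_url = "http://example.org/dp1_2k?st=29&co=019&tr=0001.00"
print(block_report_url(tract_url, "1002"))
```

Passing the block as a string preserves any leading zeros in the code.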

The 1990 Population Variables

While these data sets primarily contain data collected in the 2000 Census, we have made one rather important exception to that rule, at least for Missouri data sets. We have attempted (and, in nearly all cases, succeeded) to link to each area its population count as reported in the 1990 Census on STF1. The 1990 population count is stored in TotPop90, and from it we calculated the change and percent change variables, Change and PctChange, starting from the difference between the 2000 count and the 1990 figure. The 1990 population figure is unadjusted -- it is as reported in the original 1990 census summary files/reports. Also, in the case of political jurisdictions, we use the population of the area as it was defined in 1990. Thus, if the city of O'Fallon annexed areas that contained 15,000 people in 1990 but were not then part of the city, the figure we report does not include those 15,000 people. (Other data sets, such as the intercensal population estimates files from the Bureau, handle this differently: they adjust the older numbers so that they pertain to the current geographic boundaries.)
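
As a rough illustration of the derived measures, here is a minimal Python sketch (the archive's own code is SAS). It assumes that PctChange is the change expressed as a percentage of the 1990 count, and that a missing or zero 1990 count yields a missing value. Neither detail is stated above, and the function name and sample counts are made up.

```python
from typing import Optional, Tuple


def change_measures(
    totpop: int, totpop90: Optional[int]
) -> Tuple[Optional[int], Optional[float]]:
    """Return (Change, PctChange) for one geographic area."""
    if totpop90 is None:
        # No 1990 count could be linked, so both measures are missing.
        return None, None
    change = totpop - totpop90
    if totpop90 == 0:
        # Assumption: percent change is undefined on a zero base.
        return change, None
    return change, 100.0 * change / totpop90


# Made-up counts for a hypothetical area.
print(change_measures(46000, 40000))
```
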
For smaller geographic units -- especially census tract, block group and block -- we have gone to some trouble to make sure that the area being summarized has the same spatial definition: the current, 2000 definition. In many cases, tracts, block groups and blocks are totally redefined across the decade, so it is difficult to get comparable data and to say what the change was in a given neighborhood. But we have used a geographic equivalency file to help us create data sets where 1990 census data has been retabulated into 2000 geographic units. These data sets are stored in the stf901 and stf901x2 filetype directories. Data sets within these directories whose names end with "00" are those that have been allocated to 2000 geography. Thus, for example, the data set motrs00.sas7bdat in the /pub/data/stf901x2 subdirectory contains data extracted from the original STF1 file from 1990 for Missouri, but those data have been retabulated so that they contain summaries for the 2000 census tract geographic units. It is from these data sets that we took the 1990 population figures that are now stored as part of these sf12000x extracts.
This has only been done for Missouri (we have no geographic equivalency files for other states, and building such a file is not trivial). For Illinois and Kansas, we did match against the original 1990 STF1 files, and where a tract or block group code matched, we assumed it was basically the same area and inserted the 1990 population count. However, this is not as reliable as we had at first thought. We did it this way for Missouri earlier, before we had finished reaggregating our 1990 data to 2000 geography. We were somewhat dismayed to discover that there were tracts in Boone county which had not changed their codes, but which had dramatically changed the land area they represented. This is against Census Bureau guidelines given to local tract boundary committees, but obviously they were only guidelines.
The bottom line is that we cannot say just how reliable these 1990 population counts may be for other states. Where a new tract or block group was created, there will be a missing value for the 1990 population. This will always be the case at the census block level, since all census block codes are new for 2000.
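
The code-matching approach used for Illinois and Kansas amounts to a simple lookup, sketched here in Python. The function name, geographic codes and counts are all hypothetical. The sketch has the same weakness as the method it illustrates: a matching code does not guarantee that the area covers the same land in both censuses.

```python
from typing import Dict, Iterable, Optional


def attach_1990_pop(
    codes_2000: Iterable[str], pop90_by_code: Dict[str, int]
) -> Dict[str, Optional[int]]:
    """Look up a 1990 count for each 2000 code; None marks a missing value."""
    return {code: pop90_by_code.get(code) for code in codes_2000}


# Made-up tract codes and 1990 counts for illustration only.
pop90 = {"tract-0001.00": 3200, "tract-0002.00": 4100}
codes = ["tract-0001.00", "tract-0002.00", "tract-0003.01"]  # last one is new in 2000

for code, totpop90 in attach_1990_pop(codes, pop90).items():
    print(code, totpop90)
```
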

Source Code

Want to see how we did it? You have to read SAS (it's much easier to read than to write), but you can view the source code used to define these variables right here. We always keep the code in the Tools directory within the filetype directory. In this case the file to look at is Here is a sample of what you will see there:
*--households by type--;
 femalehouseholder=p18i14; *-female headed family non-mc hhs;
 fem_childunder18=p18i15; *--female headed hh no husband w kids; 
From this you should be able to determine that the way we calculated the number of families with own children < 18 was by summing the 8th, 12th and 15th elements of table p18. On the other hand, most of the other items here were already in the original sf1 tables and we simply copied their values and gave them more mnemonic names. In order for this code to make sense, of course, you need to know what in the world "p18i8" is. That is documented reasonably well in a file called in the sf12000/Tools directory.
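
To make the pattern concrete, here is a small Python sketch of the two kinds of definition described above: a straight copy under a mnemonic name, and a sum of table cells. The cell values are invented, and fam_children_under18 is a hypothetical name for the summed variable (the real name is in the Tools file). The cell names and the two copied variable names come from the text and sample above.

```python
from typing import Dict


def derive_extract(cells: Dict[str, int]) -> Dict[str, int]:
    """Build a few extract variables from cells of table p18."""
    return {
        # Straight copies under more mnemonic names.
        "femalehouseholder": cells["p18i14"],
        "fem_childunder18": cells["p18i15"],
        # Hypothetical name: families with own children under 18,
        # the sum of the 8th, 12th and 15th elements of p18.
        "fam_children_under18": cells["p18i8"] + cells["p18i12"] + cells["p18i15"],
    }


# Invented cell values for a single geographic area.
sample_cells = {"p18i8": 120, "p18i12": 15, "p18i14": 60, "p18i15": 40}
print(derive_extract(sample_cells))
```
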


This is one of the more important and frequently accessed data directories in the archive. It contains an extract of about 225 variables, derived from the more than 8,000 table cells on the full SF1 summary record (only a small fraction of those cells were used in the extraction, of course). It is our belief that a majority of data users will be able to answer whatever questions they need answered from the SF1 data by using these data, without having to deal with the considerably more complex full SF1 collection. We feel that the addition of a 1990 population count with derived change measures represents an important enhancement to the utility of the data.