Overview
This is the standard extract of roughly 225 variables drawn from the full sf12000 filetype, which has more than 8,000 variables. Where sf12000 is made up of large, complex tables, sf12000x is made up of a collection of key indicator variables. The variables on sf12000 have names like p18i5, while the names in this collection are mnemonic, like age5_9 and pct_vacant.
Summary File 1 is the first-published and most often used complete count summary product from the 2000 decennial census.
As of now (Sept. 2001) this file is also the most detailed data file based on the 2000 census that has been released. This will change later this year when SF2 (Summary File 2) is released, although the additional detail of that set of tables will mostly interest those wanting to do in-depth studies of racial or ancestral groups. For most data needs, SF1 will continue to be the most practical source of information.
If you are not familiar with the general concepts behind Summary File 1, we suggest you look at our Readme file for our sf12000 filetype or at the technical documentation pointed to from that page.
The good thing about the sf12000 files is that they contain some very detailed tabulations for a lot of different geographic summary levels, all the way down to the census block. The bad thing about this is that the complete files are very complex and can be difficult to use. The information is there, but it often needs to be extracted and simplified for common uses. The purpose of these standard extract versions is to boil down the information to something that we hope most data users will find adequate for a great many applications.
Alternative Data Sources
For users needing more detail than what is contained in these extracts, the alternative source is the data in the full sf1 directory, sf12000. For a similar file of key indicators, there are the datasets in the sf1prof directory, but these are limited to governmental units (i.e., no summaries for anything other than states, counties and cities). On the other hand, we have that data for every state, county and city over a certain size in the whole country in a single data set. It should be obvious that we modeled our standard extract on these Bureau-defined standard demographic profile sets.
Data Files: What Goes Where
If you have looked at the contents of the full sf12000 data directory, you will be familiar with the way that data was organized and broken down into multiple datasets for a geographic area. Different kinds of tables were stored in different datasets because some tables did not have any data below the level of census tract. The sf12000x sets are much simpler. In general, there are only a few data sets for each state, and you do not have to look in more than one data set to get all the extract variables available for a given geographic area.
As with most of the files in our data collection, the first 2 characters of the file/dataset name are the postal abbreviation of a state (or "us" for a national file). The rest of the name indicates what geographic entities are summarized within the dataset. We have a basic set of three such datasets per state. For Missouri we have the following data files:
- moi.sas7bdat: the "i" stands for "inventory". In the census terminology, an inventory summary is one for a complete geographic area, such as a census tract or a block group. The inventory file for Missouri has summaries for the state (040), counties (050), county subdivisions (townships - 060), complete places (cities - 160), places within counties (155), census tracts (140), block groups (150), congressional districts (106th congress - 500), MSAs (Metropolitan Statistical Areas, as defined at the time of the 2000 census - 390), and 5-digit ZCTAs -- both complete and within county (871 and 881). The 3-digit codes in the previous sentence are the summary level codes that you can use to extract data for just those geographic types. For example, to select census tract data you should specify on the Filter Specifications and Sort Criteria page of the uexplore/xtract application:
    SumLev Equal to(=) 140
or, to select county, tract and block group summaries for Greene County (county FIPS code 29077) your filter would be:
    SumLev In List 050:140:150 And County Equal to(=) 29077
- moh.sas7bdat: the "h" stands for "hierarchal". While there is generally more interest in data by complete census tracts than there is for the portion of a tract within a specific community (township and/or place), for certain users and applications the finer hierarchal geography may be needed or preferred. So we keep these, but on a separate dataset because there are a great number of them. The summary levels we keep on the moh dataset include 070, 080, 091 and 158. To see what these are about see the Summary Level Sequence Chart from the SF1 Technical Documentation.
Typically, you would use one of these hierarchal levels if you wanted data to be restricted to the portion within a given place. So, for example, to extract data at the block group level (split by township) for the city of Ashland, MO, you would extract from this moh data set and your filter would be:
    SumLev Equal to(=) 091 And PlaceFP Equal to(=) 02242
You can look up the FIPS place codes for Missouri in various places, including the pages used in our Summary File 1 Profiles at the geographic codes lookup for Missouri. Since Ashland has fewer than 10,000 people you need to scroll down to the end of the MO places and click on "Other MO Cities (less than 10K)".
(Change "places" to "counties" in this URL to view the FIPS county codes. These pages have codes for the entire U.S.)
- moblks.sas7bdat: There are over 279,300 geographic areas summarized on the full SF1 file for the state of Missouri. Of these, 261,992 -- 93.8% -- are census block summaries. So it makes sense to put these on a separate file. (It probably would have made sense to do this on the full sf12000 datasets as well, but that would have created more complexity in exchange for some efficiency in access, and we opted for making it easier for us rather than the computer.)
When extracting from the moblks data set a good thing to keep in mind is that 35% of all blocks in Missouri have no population. Thus, for many applications, you may want to ignore these empty areas. The xtract application makes it easy to do this filtering by simply specifying that you want the variable TotPop to have a value greater than 0. A query to select all census blocks in Adair County that have some population would look like:
    TotPop Greater Than (>) 0 And County Equal to(=) 29001
You need to be careful when extracting from this data set, since it is quite easy to create extracts that are simply too large to process. Filtering is critical, as is being specific about what variables you want to keep.
- us.sas7bdat: This is what the Bureau calls the SF1 "Advance National File", as released in December, 2001. ("Advance" as distinguished from the "Final" version, which will have Urbanized Areas and urban/rural pop and housing counts which this Advance file does not.) Here is where you go when looking for data for the U.S. (totals - SumLev=010), or for any state (040), county (050), city (160), township (060), metropolitan area (380, 381, 385, 386), or ZIP code (860) in the entire United States.
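The filter specifications shown for the datasets above amount to simple row selections. As a rough illustration only (this is Python, not the xtract application or its SAS code; the sample rows and the `select_rows` helper are hypothetical, and only the field names SumLev, County and TotPop come from the text), the Greene County and populated-block filters could be mimicked like this:

```python
# Made-up sample rows. Only the field names (SumLev, County, TotPop)
# come from the filter examples in the text; the values are illustrative.
ROWS = [
    {"SumLev": "050", "County": "29077", "TotPop": 500},
    {"SumLev": "140", "County": "29077", "TotPop": 120},
    {"SumLev": "150", "County": "29077", "TotPop": 0},
    {"SumLev": "140", "County": "29001", "TotPop": 80},
    {"SumLev": "160", "County": "29077", "TotPop": 300},
]


def select_rows(rows, sumlevs=None, county=None, populated_only=False):
    """Hypothetical helper: keep the rows that satisfy every condition given."""
    selected = []
    for row in rows:
        if sumlevs is not None and row["SumLev"] not in sumlevs:
            continue
        if county is not None and row["County"] != county:
            continue
        if populated_only and not row["TotPop"] > 0:
            continue
        selected.append(row)
    return selected


# SumLev In List 050:140:150 And County Equal to(=) 29077
greene = select_rows(ROWS, sumlevs={"050", "140", "150"}, county="29077")

# TotPop Greater Than (>) 0 And County Equal to(=) 29001
adair_populated = select_rows(ROWS, county="29001", populated_only=True)
```

The codes are kept as strings in this sketch so that leading zeros (as in 050 or 02242) are not lost.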
Related Reports
There are two standard data product reports that use these data. The dp1_2k (Demographic Profile 1, 2000 Census) data product is accessible from a variety of locations on the web, with the main page at http://mcdc2.missouri.edu/websas/dp1_2kmenus/.
There is also a series of geographic comparison reports that display fewer of these data items, but for a collection of geographic units within a geographic universe (state or county). See http://mcdc2.missouri.edu/webrepts/dp1_2k/indexcr.html to see what these look like. Currently (Dec. 2001), we have these just for Missouri.
A closely related report is dp1_2kt, where the "t" stands for "trend". To access a trend report for an area you can follow the menu pages to the dp1_2k report and then the link at the bottom of that report page which will take you to the trend report (not available for all geographic summary levels or areas). There is also a menu set similar to what we have for the dp1_2k report series. See, for example, the report for St. Charles county.
Displaying Block Level Reports
There are some who think it would be better if block level data could not be viewed online. We understand the reasons for this, but we also recognize that there are sometimes legitimate needs to take a quick look at data at the block level. You can do this with our dp1_2k application module, but only if you are willing and able to modify the URLs yourself -- they are not on any menus. The URL to display the report for block 1002 in Boone County, census tract 0001.00 is http://mcdc2.missouri.edu/cgi-bin/broker?_PROGRAM=websas.dp1_2kt.sas&_SERVICE=sasapp&st=mo&co=019&tr=0001.00&bl=1002 . If you follow the menus and go to the report for the census tract, the URL would be the same, except that the last part, "&bl=1002", would not be there. So the easy way to do it is to generate a report for a tract or block group, then edit the URL to get data for a block within the area.
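The URL edit described above can also be scripted. The following is a minimal Python sketch, not part of the dp1_2k application; the `add_block` helper name is hypothetical, and the only fact taken from the text is that a block report URL is the tract report URL with a `bl` parameter added at the end:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_block(tract_url, block):
    """Hypothetical helper: return tract_url with bl=<block> as the last parameter."""
    parts = urlsplit(tract_url)
    # Keep the existing parameters in order, dropping any earlier bl value.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "bl"]
    query.append(("bl", block))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Tract report URL from the text, i.e. the block URL without "&bl=1002".
TRACT_URL = ("http://mcdc2.missouri.edu/cgi-bin/broker"
             "?_PROGRAM=websas.dp1_2kt.sas&_SERVICE=sasapp"
             "&st=mo&co=019&tr=0001.00")

block_url = add_block(TRACT_URL, "1002")
```

Editing the URL by hand in the browser's address bar achieves the same thing; the sketch is only useful if you need to look at several blocks in a row.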
The 1990 Population Variables
While these data sets primarily contain data collected in the 2000 Census, we have made one rather important exception to that rule, at least for Missouri data sets. We have attempted (and, in nearly all cases, succeeded) to link a population count for each area as reported in the 1990 Census on STF1. The 1990 population count is stored in TotPop90, and we then calculated the change and percent change variables, Change and PctChange, by subtracting the 1990 figure from the 2000 count (and, for PctChange, expressing that difference as a percentage of the 1990 figure). The 1990 population figure is unadjusted -- it is as reported in the original 1990 census summary files/reports. Also, in the case of political jurisdictions, we use the population of the area as it was defined in 1990. Thus, if the city of O'Fallon annexed areas that contained 15,000 people in 1990 but were not then part of the city, the figure we report does not include those 15,000 people. (Other data sets, such as the intercensal population estimates files from the Bureau, do not handle this the same way; they adjust the older numbers so that they pertain to the current geographic boundaries.)
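For readers who want the arithmetic spelled out, here is a small Python sketch of the two derived measures. It is not the code used to build the files (that is SAS); the `change_measures` helper is hypothetical, and both the use of the 1990 count as the percentage base and the treatment of missing or zero 1990 counts are assumptions made for this illustration:

```python
def change_measures(totpop, totpop90):
    """Return (Change, PctChange) for one area.

    Assumptions of this sketch: PctChange uses the 1990 count as its base,
    and both measures are None (missing) when no 1990 count is available.
    """
    if totpop90 is None:
        return None, None
    change = totpop - totpop90
    if totpop90 == 0:
        # A percentage of zero is undefined, so leave it missing.
        return change, None
    return change, 100.0 * change / totpop90


# Illustrative numbers only, not taken from any census file.
change, pct_change = change_measures(1100, 1000)
```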
For smaller geographic units -- especially census tract, block group and block -- we have gone to some trouble to make sure that the area being summarized has the same spatial definition -- the current, 2000 definition. In many cases, tracts, block groups and blocks are totally redefined across the decade, so it is difficult to get comparable data and to say what the change was in a given neighborhood. But we have used a geographic equivalency file to help us create data sets where 1990 census data has been retabulated into 2000 geographic units. These data sets are stored in the stf901 and stf901x2 filetype directories. Data sets within these directories whose names end with "00" are those that have been allocated to 2000 geography. Thus, for example, the data set motrs00.sas7bdat in the /pub/data/stf901x2 subdirectory contains data extracted from the original STF1 file from 1990 for Missouri, but those data have been retabulated so that they contain summaries for the 2000 census tract geographic units. It is from these data sets that we took the 1990 population figures that are now stored as part of these sf12000x extracts.
This has only been done for Missouri (we have no geographic equivalency files for other states, and building such a file is not trivial). For Illinois and Kansas, we did match against the original 1990 STF1 files, and where a tract or block group code matched, we assumed it was basically the same area and inserted the 1990 population count. However, this is not as reliable as we had at first thought. We did it this way for Missouri earlier, before we had completed the reaggregation of our 1990 data to 2000 geography. We were somewhat dismayed to discover that there were tracts in Boone County which had not changed their codes, but which had dramatically changed the land area they represented. This is against Census Bureau guidelines given to local tract boundary committees, but obviously they were only guidelines.
Bottom line is that we cannot say just how reliable these 1990 population counts may be for other states. Where a new tract or block group was created, there will be a missing value for the 1990 population. This will always be the case at the census block level, since all census block codes are new for 2000.
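The code-matching approach used for Illinois and Kansas can be pictured as a simple lookup. The Python sketch below is an illustration only, not the SAS code that built the files; the `attach_1990_pop` helper, the `GeoCode` field name and the sample values are all hypothetical. It shows why a new tract or block group ends up with a missing 1990 count, and why a matching code says nothing about whether the boundaries stayed the same:

```python
def attach_1990_pop(areas_2000, pop_1990_by_code):
    """Hypothetical helper: copy each 2000 area and add TotPop90 by code match.

    Areas whose code did not exist in 1990 get None (a missing value).
    """
    result = []
    for area in areas_2000:
        merged = dict(area)
        merged["TotPop90"] = pop_1990_by_code.get(area["GeoCode"])
        result.append(merged)
    return result


# Made-up codes and counts for illustration.
areas_2000 = [
    {"GeoCode": "0001.00", "TotPop": 1100},
    {"GeoCode": "0002.01", "TotPop": 700},   # code new for 2000
]
pop_1990_by_code = {"0001.00": 1000, "0002.00": 1500}

matched = attach_1990_pop(areas_2000, pop_1990_by_code)
```

Here the first area picks up a 1990 count purely because its code is unchanged, which is exactly the assumption that proved unreliable for some Boone County tracts.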
Source Code
Want to see how we did it? You have to read SAS (it's much easier to read than to write), but you can view the source code used to define these variables right here. We always keep the code in the Tools directory within the filetype directory. In this case the file to look at is sf12000x.sas. Here is a sample of what you will see there:
    *--households by type--;
    tothhs=p15i1;
    families=p18i6;
    fam_childunder18=p18i8+p18i12+p18i15;
    married_couple=p18i7;
    marrcouple_childunder18=p18i8;
    femalehouseholder=p18i14; *-female headed family non-mc hhs;
    fem_childunder18=p18i15; *--female headed hh no husband w kids;
From this you should be able to determine that the way we calculated the number of families with own children under 18 was by summing the 8th, 12th and 15th elements of table p18. On the other hand, most of the other items here were already in the original sf1 tables and we simply copied their values and gave them more mnemonic names. In order for this code to make sense, of course, you need to know what in the world "p18i8" is. That is documented reasonably well in a file called phlabels.sas in the sf12000/Tools directory.
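For readers more comfortable with Python than SAS, the same sample can be restated as follows. This is only a transliteration of the SAS statements quoted in this section, offered as a sketch; the `household_extract` function name and the input values are hypothetical, while the cell names and formulas are those of the sample:

```python
def household_extract(cells):
    """Map SF1 table cells (keyed by names like 'p18i8') to mnemonic names."""
    return {
        "tothhs": cells["p15i1"],
        "families": cells["p18i6"],
        # Derived: families with own children under 18, summed over three cells.
        "fam_childunder18": cells["p18i8"] + cells["p18i12"] + cells["p18i15"],
        "married_couple": cells["p18i7"],
        "marrcouple_childunder18": cells["p18i8"],
        "femalehouseholder": cells["p18i14"],
        "fem_childunder18": cells["p18i15"],
    }


# Made-up cell values for illustration only.
sample_cells = {
    "p15i1": 100, "p18i6": 70, "p18i7": 50, "p18i8": 20,
    "p18i12": 3, "p18i14": 12, "p18i15": 7,
}
extract = household_extract(sample_cells)
```

Only fam_childunder18 involves any arithmetic; the other six items are straight copies under friendlier names.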
Summary
This is one of the more important and frequently accessed data directories in the archive. It contains an extract of about 225 variables, derived from over 8000 table cells on the full SF1 summary record (only a small fraction of those 8000 cells were used in the extraction, of course). It is our belief that a majority of data users will be able to answer whatever questions they need answered from the SF1 data by using these data, without having to deal with the considerably more complex full SF1 collection. We feel that the addition of a 1990 population count with derived change measures represents an important enhancement to the utility of the data.
Missouri Census Data Center