The Geocorr 1990 application allows you to create reports and/or comma-separated value (CSV) files showing various kinds of information about U.S. geography, especially census geography (i.e., things like census tracts and blocks — defined by the U.S. Census Bureau for the purpose of tabulating the decennial census).
Probably the most common application of Geocorr is the creation of a correlation list report showing how one kind of geography relates to another. For example, you can create a report showing how ZIP codes correspond to counties and cities (places) for the state of Missouri (or just for specified counties in Missouri). This is the typical use, but there are several others which are really just special cases. For example, you can specify that the target geography (normally, the set of geographic areas to which you want to compare the first, or source, geography) is the "Entire Universe"; in this case, the application simply lists a series of geographic codes with their related names and 1990 population counts for a specified area. It's a very easy way to list all the ZIP codes in Kansas City (for example).
We'll present a set of simple examples to give you a flavor of the kinds of things that Geocorr can do for you. You should also spend some time with the extensive help page available for the application. If you are not familiar with one or more of the geographic units shown in the geocode selection lists, follow the links to the geographic glossary page.
As you go through these examples, we suggest you enter the specifications as described and run the application, so you can see for yourself what the results will look like. You might even want to get daring and create your own examples. After reviewing the results of a run, use the back button to return to the form page, change one of the parameters, and run the request again to see what difference it makes.
Although Geocorr may appear overly complex for a casual user, the most typical applications (as illustrated, we hope, by these examples) require you to specify only a small number of its options. You can begin with simple correspondences and geocode reference listings and then, once you have mastered these basic applications, proceed to the more advanced options involving things such as circular areas, bounding boxes and the much-misunderstood "Concentric Ring Pseudo-Geocodes". You can get 80% of the functionality of the application using about 20% of the options. Casual and first-time users should begin by focusing primarily on the Input Options and Output Options sections, which contain the only required specifications.
The "classic" application of this tool is to look at how one geographic layer (such as census tracts) relates to another layer (such as five-digit postal ZIP codes) in some geographic universe (such as Missouri). Make the following parameter selections (options flagged with a "(D)" represent default selections, meaning these will be chosen if you do not modify the selection that is preset by the application). Hit one of the several Reset Defaults buttons prior to entering the specifications for a new example in order to restore all the defaults. Note that the normal action is for the application to remember what you have entered previously, so if you run the application and then, upon seeing the results, decide you just want to change one or two little options, you can easily do so.
|Source geocodes||County (1990), Census Tract/BNA (1990)||If you select census tract without selecting county, Geocorr will select it for you, since a tract without a county makes no sense.|
|Target geocodes||Five-digit ZIP/ZCTA||You should have lots of questions about this choice. What's the source? How current, complete? See the geographic glossary entry to get more information.|
|Weighting Var.||Population||Shows up in output report and file.|
|Have weighted centroids...||(check it)||See help file for details. Puts an x-y coordinate pair on each output line/record.|
|Generate a CSV...||(check it)||Program will know to generate a comma separated value file with results.|
|Generate a listing||(check it)||Generates a report-format output file. This is what the human looks at. Check "Include both geographic codes and names" option here. Then you'll get names for the counties and the ZIP codes (but not for the tracts — census tracts are not named).|
|Geographic filter options|
|County codes||019||This is the three-digit FIPS county code for Boone co. Note that the "county codes" label is a link to code pages showing all FIPS county codes. Could also enter "29019", but since we have selected just one state we can get by with just the three-digit county code.|
Click on the Run Request button to initiate processing. A Perl script will examine what you entered on the HTML form to verify that there are no invalid or potentially dangerous characters being passed. It will then invoke SAS and pass it the form information and tell it to run the special geocorr SAS program. Geocorr should take anywhere from a few seconds to several minutes to execute, depending on system load and on your request. This example should take about 20 seconds if there is a normal load on the system. The hourglass will appear while it is running. When it finishes you should be presented with a menu screen labeled "Results of Query". This menu page will list the five output files produced by the request with a brief description of what is contained in each. You can almost always ignore the first of these files (the SAS program log). The summary log should always be checked for any warning or other messages regarding the query. It will provide information about when and where the application ran, what parameters were specified, the number of records written to the output files, how long it took, etc.
The important outputs are the geocorr.lst and geocorr.csv files. Click on each to view them in your browser. Use the browser to save them to a local file and/or to print them. Notice the dramatic difference in their formats, but the nearly identical nature of their contents. Because we asked for "codes and names" on the listing file and not on the CSV file, there will be some content difference in this case. But the basic data content is the same.
Look very carefully at the first two data lines (after the title and column header lines) of the listing file. These two lines have information about the first value of the source geocodes we requested: the first county-tract. They show each of the values for the target geocode(s) that intersect with this area. In the example, we see that tract 0001.00 is partly in ZIP 65201 and partly in 65203. The degree of the intersection is measured by the weighting variable, 1990 total population. This small tract has only 430 people in it, and of these, about 408 lived in 65201 in 1990 and the other 22 lived in 65203. The AFACT (allocation factor) column shows the decimal portion of the source area contained in the target area; ".949" in the first line means that 94.9% of the tract is contained in the ZIP code for that line. This is based on 1990 population. If we had chosen Housing Units or land area for our weighting variable, we'd see a different value for this factor.
Notice that many of the tracts appear on only one line — they correspond entirely to a single ZIP code. And notice that the values of the AFACT column always sum to 1.0 for all the lines corresponding to one tract.
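The AFACT arithmetic described above can be sketched in a few lines of Python. This is only an illustration, not Geocorr's actual SAS code: the function name `allocation_factors` is hypothetical, and the block records are made up, chosen so that tract 0001.00 has 408 people in ZIP 65201 and 22 in 65203 as in the report.

```python
from collections import defaultdict

def allocation_factors(blocks):
    """Return {(source, target): (weight, afact)} from block-level records.

    blocks: iterable of (source_code, target_code, weight), one per block.
    """
    pair_weight = defaultdict(float)    # weight in each source/target intersection
    source_weight = defaultdict(float)  # total weight of each source area
    for source, target, weight in blocks:
        pair_weight[(source, target)] += weight
        source_weight[source] += weight
    return {
        (source, target): (w, w / source_weight[source])
        for (source, target), w in pair_weight.items()
        if source_weight[source] > 0    # skip source areas with no weight at all
    }

# Made-up blocks: (tract, ZIP, 1990 population)
blocks = [
    ("0001.00", "65201", 250),
    ("0001.00", "65201", 158),
    ("0001.00", "65203", 22),
    ("0002.00", "65201", 900),
]

factors = allocation_factors(blocks)
for (tract, zip_code), (pop, afact) in sorted(factors.items()):
    print(tract, zip_code, int(pop), f"{afact:.3f}")
```

With these records, the two lines for tract 0001.00 carry factors of about .949 and .051, which sum to 1.0, while tract 0002.00 appears on a single line with a factor of 1.000.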
The columns labeled INTPTLNG and INTPTLAT are poorly named. They appear as a result of our checking the option to have weighted centroids calculated and kept on the output file(s). Where do these come from? The Geocorr program works with a database that has observations at the 1990 census block level. Each observation has internal-point coordinates indicating where the spatial centroid of the census block is located. When the program generates a line of the output files, it is really just combining information from all the blocks that are in the intersecting areas. The first line of the report comes from looking at all blocks that are in tract 0001.00 and in ZIP 65201 and summing the 1990 populations of those blocks. At the same time, the program takes the latitude-longitude coordinates of each block centroid and weights them by multiplying by the population of that block. After processing all blocks for a tract-ZIP pairing, and prior to output, the program divides the weighted coordinate sums by the population total for the area, creating the weighted centroid. This location is biased towards where the people actually live within the area, rather than reflecting just the geometry of the census blocks (if land area is chosen for the weighting variable, then the resulting weighted centroids are more of a spatial center).
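The weighted-centroid calculation just described amounts to a weighted average of block coordinates. The following Python sketch shows the idea under stated assumptions: the function name `weighted_centroid` and the block coordinates are invented for illustration, and the real program does this work in SAS.

```python
def weighted_centroid(blocks):
    """Weighted average of block centroids for one output line.

    blocks: iterable of (latitude, longitude, weight) tuples.
    Returns (lat, lon), or None when the total weight is zero.
    """
    lat_sum = lon_sum = total = 0.0
    for lat, lon, weight in blocks:
        lat_sum += lat * weight   # weight each coordinate by the block's population
        lon_sum += lon * weight
        total += weight
    if total == 0:
        return None               # no weight, so no meaningful centroid
    return lat_sum / total, lon_sum / total

# Made-up blocks in one tract-ZIP intersection: (lat, lon, 1990 population)
blocks = [
    (38.95, -92.33, 300),
    (38.96, -92.32, 100),
    (38.99, -92.30, 0),   # unpopulated block: has no effect on the result
]

centroid = weighted_centroid(blocks)
print(centroid)
```

Because the first block holds three quarters of the population, the result lies three quarters of the way toward it; the unpopulated block contributes nothing, which is why the centroid leans toward where people live.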
The comma delimited (CSV) file can be browsed and then saved to your local disk with your browser's save as command. (You might even want to configure your browser to invoke a helper application to customize processing.)
After you save the file, you should be able to open it for processing by most spreadsheet and database programs in Windows. Notice that the first line of the file contains the names of the fields; when you import these data into Excel or Lotus you'll see that these names appear as the first row of the spreadsheet. To get a more detailed description of what these variables are you can browse the varlst.lst file, the last entry on your Query Results page. This is usually a very short file, and in most cases we ignore it because we already know what the fields are, but you may find it very helpful in trying to interpret what you have.
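The saved file can also be read by a short script rather than a spreadsheet. The sketch below uses Python's standard `csv` module on a small in-memory sample; the column names (`county`, `tract`, `zcta5`, `pop`, `afact`) and the rows are assumptions made for illustration, so check the first line of your own file for the actual field names.

```python
import csv
import io

# Made-up stand-in for a saved geocorr.csv; real column names may differ.
sample = """county,tract,zcta5,pop,afact
29019,0001.00,65201,408,0.949
29019,0001.00,65203,22,0.051
29019,0002.00,65201,900,1.000
"""

def read_correlation(text_file):
    """Parse the rows, keeping geographic codes as strings."""
    rows = []
    for row in csv.DictReader(text_file):   # first line supplies the field names
        row["pop"] = float(row["pop"])      # only the numeric fields are converted
        row["afact"] = float(row["afact"])
        rows.append(row)
    return rows

rows = read_correlation(io.StringIO(sample))
for row in rows:
    print(row["county"], row["tract"], row["zcta5"], row["pop"], row["afact"])
```

Keeping the codes as text is deliberate: a spreadsheet or a numeric conversion may turn a tract such as 0001.00 into the number 1 and drop leading zeros from FIPS codes. For a file saved from the browser, `open("geocorr.csv", newline="")` could be passed in place of the `io.StringIO` object.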
In this example, repeat all the options chosen in the first example, except for the following:
|State||Missouri, Illinois||Will need to hold down Ctrl key to select two items from the select list.|
|Target geocodes||County||Output will show places (cities) related to counties.|
|Weighting Var.||Housing Units||Instead of default (population).|
|Generate second allocation factor...||(check it)||Turn on. This will cause the program to do double work in terms of "allocation factors". Now we get the portion of the source codes in the targets, *and* the portion of the targets in the source areas.|
|Generate a CSV...||(uncheck it)||It'll run a little faster without it, so if you don't need it....|
|Geographic filter options|
|County codes||(blank)||Will not be filtering at county level.|
|Metro areas||7040||Selects St. Louis MSA.|
Hit one of the Run Request buttons to submit the new request. Follow the usual procedure to view your output elements by clicking on the filenames on the query results page. The key output is the listing file. What you should see if you entered the options as specified is a rather long report that lists all of the cities (places) in the St. Louis MSA (including the Illinois side). It is not really much of a report in terms of showing any geographic correlation. Mostly, it simply tells you what county each place is located in. There are a few cases of a place being in more than one county, in which case it shows you what portion of the place is in each of the counties. Note that the value of AFACT2 represents the portion of the county that is in the place (so we see that about 14% of the population of Madison county, Ill is in the city of Alton).
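The relationship between the two factors can be sketched as follows. This is a minimal Python illustration, not the program's own logic: the function name `two_way_factors`, the place and county labels, the "(rest)" label for blocks outside any place, and all housing unit counts are invented.

```python
from collections import defaultdict

def two_way_factors(blocks):
    """Return {(source, target): (afact, afact2)}.

    afact  = share of the source area's weight that falls in the target.
    afact2 = share of the target area's weight that falls in the source.
    """
    pair = defaultdict(float)
    source_total = defaultdict(float)
    target_total = defaultdict(float)
    for source, target, weight in blocks:
        pair[(source, target)] += weight
        source_total[source] += weight
        target_total[target] += weight
    return {
        (s, t): (w / source_total[s], w / target_total[t])
        for (s, t), w in pair.items()
        if w > 0   # an intersection with no weight has no meaningful factors
    }

# Made-up records: (place, county, housing units)
blocks = [
    ("Alpha", "X", 140),
    ("Beta", "X", 60),
    ("Beta", "Y", 40),
    ("(rest)", "X", 800),   # county X housing units outside Alpha and Beta
    ("(rest)", "Y", 160),
]

factors = two_way_factors(blocks)
for key in sorted(factors):
    afact, afact2 = factors[key]
    print(key, f"{afact:.3f}", f"{afact2:.3f}")
```

In this toy data, place Alpha lies wholly in county X, so its AFACT is 1.000 while its AFACT2 is .140, meaning it holds 14% of that county's housing units. Place Beta is split between two counties, so its AFACT values are .600 and .400.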
Remember that everything is frozen in the 1990 time-frame. The data you see for O'Fallon, Mo. is based on the boundary of that city as defined for the 1990 census; it is not the current definition of that place. Likewise, of course, the housing unit (weight variable) counts are from the 1990 census.
If you are familiar with the St. Louis metro area, you might expect to find the cities of Troy and Warrenton, Missouri in this report. These cities are in Lincoln and Warren counties, which were added to the official metro area (MSA) definition in 1992. But the metro codes stored in the MABLE database are as of the 1990 census, so these two counties will not be selected. You could fix this by going back and entering the FIPS codes for the two "missing" counties in the box provided for filtering by county.
While the primary purpose of Geocorr is to look at the relationships between different geographic layers, it can also be quite useful as a tool for simply looking at a single geographic layer. In this example we use the "Entire Universe" option for the target geocodes, essentially telling the program that we have no target codes. In this case our output will show simply the geographic codes and related names for the source geography, along with the value of the weighting variable (1990 Total Population in this example). If necessary, hit a Reset Defaults button before starting these specs.
|State||[your state]||Remember, you must select at least one.|
|Source geocodes||County, Metro Area||You are requesting a listing of the counties (or county equivalents) and the corresponding MSA/CMSA areas. If you are unfamiliar with the MSA/CMSA concept go to the geographic glossary file and read the explanations there.|
|Target geocodes||Entire Universe||Basically, this says you don't want any target layer(s); you just want to know about the source geographic areas in their entirety.|
|Weighting Var.||Population||Shows up in output report and file.|
|Ignore zero...||(check it)||You should almost always choose this.|
|Generate a CSV...||(uncheck it)|
|Geographic filter options|
|Generate a listing||(check it)||Generates a report-format output file. You must check either the CSV file option or this one — otherwise you have no output.|
|Include both geographic codes and names||(select this)||Important for this kind of request. You want to see both the codes and the names associated with those codes.|
Leave all other parameters and options unspecified. Click on the Run Request button to initiate processing. Wait patiently for the Query Results page to come back to you.
The important output is the geocorr.lst report file. Click on it to see the report. It should be sorted by (state and) county. The column labeled COUNTY contains the five-digit FIPS code and the field labeled COUNTYNM has the name of the county (including the state abbreviation). These columns are followed by the MSACMSA and MSANAME fields with comparable data (code and name) for the metropolitan area. For counties or portions of counties (only in New England) falling outside any metropolitan area you'll see the code "9999" with "Non-metro" for the name. The POP column contains the 1990 complete count population for the county/metro area. For all but a few counties in New England, this figure will represent the population of the entire county. The AFACT (allocation factor) column is a constant "1.000", as it always will be when "Entire Universe" is specified for the target geocode.
As an optional exercise for the more serious Geocorr user you might try rerunning this request but with the following changes:
In this case what you will see is a report very much like the one you just generated, in that it will show counties within metro areas on each line of the report. But the AFACT values will now almost all be less than 1.0. Remember that AFACT is defined as the "portion of the area defined by the source geocodes contained within the area defined by the target areas". In this instance, it becomes the decimal fraction of the county population that is also included in the metro area. (In the original example, AFACT represented the portion of the county-metro combination that was contained within the Entire Universe.)
The usual precautions about printing a document using a web browser apply here. You may need to tweak some of your options — such as your fixed font size — to get this report to print without truncation. In some rare cases you may need to bring the report down to a local file and use a word processor to format and print it just the way you like it. But unless you have specified a large number of geocodes, this will rarely be necessary. In most cases, simply using the print button on your browser should display the report quite well.
In this example, we finally look at some of the options pertaining to the use of x-y coordinates for filtering the data. Specifically, what we'll want to do is determine the total 1990 population living within a 30-mile radius of the city of Washington, Mo. Actually, we'll break that population down by county. Hit the Reset Defaults button and let's start over with the options for this sample. Any option not mentioned, just leave it with the default setting.
|State||Missouri||If the n-mile circle went outside the state we would not pick up those populations.|
|Source geocodes||County||When you choose county, state is implied.|
|Target geocodes||Entire Universe||We're not really doing a correlation list in this example. We just want the sum of our weight variable (the 1990 total population) for the circle we'll specify.|
|Include both geographic codes and names||(select this)||Select this for listing file, so we'll know what the counties are.|
|Point and distance options|
|Coordinates of point||Latitude 38.545881, Longitude 91.019346||A leading "-" on the longitude is optional. West longitude is assumed.|
|Label for point||Washington, Mo||Not required but useful.|
|Radius of circle or largest ring||30||This means 30 miles.|
Hit the Run Request button to run the job. You have told Geocorr to find 1990 census blocks whose centroids are within 30 miles of a specified point, which we hope is near the center of the city of Washington, MO. We have specified that we want to look at the relationship of counties to the Entire Universe for this geographic area. If you read the fine print, you'll be told to expect some extra items in your report when you specify a point and radius. The intptlng and intptlat variables contain the weighted average of the block centroid coordinates for all census blocks that were aggregated to create the output summary line. These are of value only as a general indicator of the "center" of this geographic intersection. The distance variable is the distance (in miles or kilometers, depending on the option you selected on the form — miles, in our example) between the specified point and (intptlat,intptlng). It thus represents sort of an "average" distance.
The POP item on the output file represents the sum of the block populations for all the census blocks used to create the geographic summary area. In this case the output line for Franklin county has a POP figure that is the total of 1990 population for all blocks that are both in Franklin county and within 30 miles of our point. To get the overall total population for the 30-mile circle, we shall need to add all POP figures from our output report. Or, we could go back and rerun the application and choose state instead of county as our source geocode; then we would get only a single output line — the 30-mile circle intersected with the state.
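The selection and summing described in this example can be sketched in Python. The source does not say which distance formula Geocorr uses, so the haversine great-circle formula and the Earth radius constant below are assumptions, as are the function names and the block records (county labels, coordinates and populations are invented).

```python
import math
from collections import defaultdict

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an assumption of this sketch

def miles_between(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in miles; inputs in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def pop_within(blocks, lat, lon, radius_miles):
    """Sum block populations by county for blocks whose centroid is in the circle."""
    totals = defaultdict(int)
    for county, block_lat, block_lon, pop in blocks:
        if miles_between(lat, lon, block_lat, block_lon) <= radius_miles:
            totals[county] += pop
    return dict(totals)

# The form assumes west longitude, so 91.019346 is written as -91.019346 here.
center_lat, center_lon = 38.545881, -91.019346

# Made-up blocks: (county, centroid latitude, centroid longitude, population)
blocks = [
    ("Franklin", 38.55, -91.00, 120),
    ("Franklin", 38.45, -90.90, 80),
    ("Warren", 38.80, -91.15, 60),
    ("Boone", 38.95, -92.33, 500),   # roughly 75 miles away, outside the circle
]

by_county = pop_within(blocks, center_lat, center_lon, 30)
print(by_county)
print(sum(by_county.values()))   # overall total for the 30-mile circle
```

Each entry in `by_county` corresponds to one output line (a county intersected with the circle), and adding the entries gives the total for the circle, just as adding the POP figures on the report does.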
A very common request is to determine demographic profiles of circular areas about a given location (typically, the location is an existing or proposed site for a business, school, service center, etc). In this example, we see how Geocorr can be used to extract the required block group geography that will tell you what geographic areas you will have to aggregate to get your demographic profile. Geocorr does not (currently) link to any detailed demographic data and does not produce any profiles, but it does provide you with a key component of such an application by selecting the geographic units for the circular or ring ("donut") areas.
In this example, we'll determine the latitude, longitude coordinates of the UM St. Louis (UMSL) campus in St. Louis county, MO. We'll generate a CSV file containing all the block group codes within a series of concentric "rings" about that site. Hit the Reset Defaults button and let's start over with the options for this sample. As usual, any option not mentioned should be left with its default setting.
|Source geocodes||Block group||County and tract will also be selected for you.|
|Target geocodes||Concentric Ring Geocode||Geocorr will assign the ring code dynamically based on x-y coordinates and series of ring values specified below.|
|Ignore blocks...||(check it)||Almost always saves time to ignore blocks with no weight.|
|Generate listing...||(uncheck it)||Only want CSV file, do not want report.|
|Sort by target...||(check it)||So the output is sorted by the ring geocode numbers first, then by block group.|
|Use tabs on CSV||(check it)||Output file will have tabs between fields instead of commas.|
|Point and distance options|
|Coordinates of point||Latitude 38.70763, Longitude -90.31118||A leading "-" on the longitude is optional. West longitude is assumed.|
|Label for point||UMSL||University of Missouri St. Louis.|
|Custom list of ring radii||#1: 1; #2: 3; #3: 5||Fill in the ring radii in ascending order. Note that we do not enter the radius value or the number of equidistant rings.|
Hit the Run Request button. Wait... Browse the summary.log and geocorr.csv files to see the results. Note how the fields "line up" when browsing the CSV file with tabs as delimiters. Also browse the very small varlist.lst file, which contains basically just labels for the variables on the CSV file. Print it if you need it for documentation. Notice how the point label "UMSL" appears in the label for the distance variable.
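Assigning a ring from a distance and an ascending list of radii can be sketched as below. This is a hypothetical Python illustration: the function name `ring_number`, the use of 1-based ring numbers as the code, and the treatment of a distance exactly on a boundary as belonging to the inner ring are all assumptions, since the source does not describe how Geocorr encodes the ring geocode.

```python
import bisect

def ring_number(distance, radii):
    """Return the 1-based number of the innermost ring containing distance.

    radii must be in ascending order (for example 1, 3, 5 miles).
    Returns None when the distance is beyond the largest radius.
    """
    index = bisect.bisect_left(radii, distance)  # first radius >= distance
    return index + 1 if index < len(radii) else None

radii = [1, 3, 5]   # the custom ring radii entered on the form
for miles in (0.4, 1.0, 2.5, 4.9, 5.1):
    print(miles, ring_number(miles, radii))
```

Under these assumptions a block 2.5 miles from the point falls in ring 2 (the "donut" between 1 and 3 miles), and a block 5.1 miles away is not selected at all.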
If you were going to do this a lot or you were in a big hurry you could make this example run somewhat faster by telling the program which county or counties your circles fall within. In this case, we could have gone down near the bottom of the form to the Geographic Filtering Options section. There, in the text box for the County codes we could have entered "189 510" to specify that we wanted to restrict our query to the 2 counties with these FIPS codes. These are the codes for St. Louis county and St. Louis city. Do not do this, of course, unless you are sure that your circle will not go beyond the counties entered (or unless you don't care and want to limit the search to these counties anyway).
A typical use of such an output file would be to save it from your browser to a file and then to bring it into another program where you would use it to select all the block groups it contains from a data extract file (from STF3, for example). Then you could sum the numbers for those block groups (multiplied by the values of the AFACT variable to "allocate" the data when a BG is in more than one ring). We hope some day to enhance this application to allow this kind of post-processing to be integrated into a system which uses the Geocorr application.
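The post-processing described above can be sketched as a small Python function. Everything here is hypothetical: the function name `aggregate_by_ring`, the block group identifiers, the ring numbers and the extract values are invented, and a real application would read the correlation rows from the saved file and the values from an extract such as STF3.

```python
from collections import defaultdict

def aggregate_by_ring(correlation, extract):
    """Allocate block group values to rings using AFACT.

    correlation: iterable of (block_group, ring, afact) rows.
    extract: {block_group: value} taken from a separate data extract.
    """
    totals = defaultdict(float)
    for block_group, ring, afact in correlation:
        if block_group in extract:                    # select only matching block groups
            totals[ring] += extract[block_group] * afact
    return dict(totals)

# Made-up rows: BG2 straddles rings 1 and 2
correlation = [
    ("BG1", 1, 1.0),
    ("BG2", 1, 0.25),
    ("BG2", 2, 0.75),
    ("BG3", 3, 1.0),
]
extract = {"BG1": 400, "BG2": 200, "BG3": 1000}   # e.g. a count per block group

ring_totals = aggregate_by_ring(correlation, extract)
print(ring_totals)
```

Here a quarter of BG2's value is allocated to ring 1 and the rest to ring 2. This treats the allocation as proportional to the weighting variable, which is an approximation for any item other than the weighting variable itself.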