Missouri Census Data Center

Dexter Xsample: Detailed Query Description

Query ID: acsbasetabs1


We want to generate a report for the nation by state, looking at poverty rates for whites vs. blacks by gender and age (under 18, 18-64, 65 and over). Access the basic information in a detailed American Community Survey (vintage 2006) table and use a postprocessing tool (either Excel or SAS) to create the final report product.

Filetype accessed: acs2006 -- This collection of datasets includes the detailed tables in the summary subdirectory as well as the much smaller dataset(s) with profiles data. This example works with the detailed "base" tables in the summary subdirectory.

Data Set accessed: ustabs17_20 . See detailed metadata for this data set, noting values of key variables SumLev and Geocomp.

Dexter features used: Nothing very fancy. Instead, we focus on pre- and postprocessing. The preprocessing section looks at how to use metadata tools within American FactFinder to locate the table containing the kind of information we seek. The postprocessing section deals with how to process the output to get the final data product we want. It is very common when working with these complex data sets that some kind of custom aggregation, filtering, and/or percentage calculations are required to get the final results.

Can be readily modified to:

  • Work with data from later years. Just substitute acs2008 for acs2006 (for example) and change all references to the summary subdirectory to basetbls. (We renamed the summary subdirectory for 2006 to basetbls, but we have created an alias for it of summary so the query shown here still works - as of July, 2010 at least).
  • Look at other race categories and/or hispanic groups.
  • Deal with more (or less) detail by age.
  • Ignore gender.
  • Look at fewer or different geographic areas.

Degree of Difficulty: 4 - moderately difficult. The dexter part is not that hard, but you have to know something about the data and metadata to find the table of interest, and then you have to have some other skills in order to do the postprocessing. The SAS postprocessing is pretty challenging.

Saved Query File: View It Run It

Before You Can Begin With Uexplore/Dexter you have to know how to find the data you are looking for. That may be the most challenging aspect of utilizing these data - although once you find out about the various tools that are available to assist you, it is not nearly as hard as it may seem. What you are dealing with here is American Community Survey detailed (also referred to as "base") tables. There are over 1100 tables in the collection, and they can be fairly complex. Many of them have several "dimensions", such as a table summarizing the population by age, race and sex (3 dimensions). Tables also have universes associated with them that must be considered before deciding if they are providing information about the persons or housing units or ancestry reportings in which we are interested. You also need to be sure you know precisely what data you need (this is pretty much always the case, of course, but it can be particularly important when dealing with this kind of demographic detail). In this particular example, the problem statement is fairly simple. You do need to consider what definition of "white" and "black" you'll use, since there are at least two ways to look at these categories under the new rules that allow users to identify themselves as being more than one race. For the sake of this example (and because that is how the data are available) we shall use the more exclusive "race alone" definitions (as opposed to the inclusive "race alone or in combination" versions).

We'll use the American FactFinder web site to search for relevant tables. If you are not familiar with this site it would be worth a few hours of your time to explore its many features. The URL is http://factfinder.census.gov/home/saff/main.html?_lang=en. To access detailed data from the ACS choose the "Data Sets" option on the menu down the left side of page, and then American Community Survey from the resulting drop-down. You'll then want to accept the default choice of the most recent set of data (2006). While you can do a lot of things with this tool, all we are going to talk about here is how FactFinder can be used to assist you in finding tables related to specific subjects or that have been tagged with certain keywords.

Having chosen a specific data set in AFF you will see a light gray (background) area on the right side of the screen containing a "Select from the following" menu. One of the menu options is Detailed Tables; that is what you should choose (click on).

Screen image of AFF with ACS 2006 selected
The next thing you should see is the Select Geography screen, the first of two basic steps in building a query in AFF. It is not a very relevant step when all you want to do is look for tables, but it is nevertheless required (because in some instances, what tables are available are dependent upon what geography is selected; but that is not the case here). Just accept the default geographic selection of the United States; do this by clicking on the Add button, causing it to be displayed in the "Current geography selecions:" box. Now click on the Next button to complete your geography selection and proceed to the Select Tables screen. This is where we do our work.

There are 3 tabs across the top of the screen allowing you to "Choose a table selection method". The default is "Show all tables", which is both the default and the least useful of the 3 (unless you already know what table you need, which is rare). Clicking on the "By subject" tab lets you choose a subject from a drop-down list (which does not appear immediately - you have to click on the little down arrow at the right of the select box first). From this drop-down you get to select a subject - try choosing "Poverty (Families and Individuals)" and then clicking the Search button. This results in the display of another drop-down select list of tables relating to the chosen subject (poverty); this list appears in the select window just below the Search window. There will be over 100 tables to choose from; they report poverty in a lot of different ways and in conjunction with a lot of other characteristics such as age, race, tenure (rent vs. own), disability status, education, etc. So this is still a pretty big and complicated list to have to deal with.

Fortunately, there is a 3rd tab - "by keyword". Choosing this tab causes the screen to display a blank text box with the instruction "Enter a keyword and click 'Search" (referring to the Search button just to the right of the text box). There is also a link to "Search Tips" just to the right of the Search button, and you really need to click on that link to find out exactly how this works. It's fairly simple most of the time. Because most of the time what you are looking for is a table that has data crossed within two or more categories. Try typing age race sex poverty in the box and clicking Search. In response FactFinder will search its metadata, looking for tables that contain these keywords in the table title or one of the row headers. In this instance it should find only two tables that have been tagged (in the Bureau's metadata file which we have found to be very good, though not perfect) with the 4 requested keywords:

  • B17001F. Poverty Status in the Past 12 Months by Age (Some Other Race Alone)
  • B17001G. Poverty Status in the Past 12 Months by Age (Two or More Races)
This result is both good news and bad news. It is good in that it has indeed found just what it promised: two tables associated with all 4 of the keywords; much easier to search just two tables instead of more than 1000 or 100. But the bad news is that neither of these tables is really what we need. They are both summaries for racial categories that are not the ones we want. If we could find comparable tables for the white and black populations we'd be in business. This is where it helps to be familiar with the Bureau's general scheme for organizing and naming their detailed tables. Knowing that, you would recognize that these two tables with the "F" and "G" suffix codes in their names are part of a standard sequence of similar tables with alpha suffix codes "A" through "H", these codes indicating a consistent racial or ethnic group as the table universe. An "A" suffix consistently indicates a "White Alone" universe and "B" indicating "Black Alone" (with "alone" meaning persons who chose just that race and no other). It turns out that the 2 tables we really want for this particular query are B17001A and B17001B. The reason they did not show up in our search is because the word "race" does not explicitly appear in the title (which is apparently the most common and reliable way for tables to be tagged wiht a word in the metadata file, but it is not absolutely necessary). But once you understand the rules of the game you can get around this limit by substituting "alone" for "race" in your list of search words.

Note that as you search for the appropriate table(s) you can view more detailed information regarding exactly what a table contains by highlighting it within the Selected tables box and then clicking the What's this? button. A new window will open ("pop up") and will display a detailed description of the table that looks like this:

Screen image of AFF with ACS 2006 selected

(Not shown is the vertical scrollbar - this is not the entire table; the pattern repeats itself for persons above the poverty level).

You could at this point go ahead and select the two tables in FactFinder and then go back (by clicking on the "Geography" bread crumbs link across the top of the screen) and choosing your geography. (It is strongly recommended that you use these bread crumb links to navigate between the various screens of the FactFinder query process and that you avoid using your browser's Back button to return to previous screens.) You would be able to first display and then save-as the results to a csv or an Excel spreadsheet file. You would then do your "postprocessing" in Excel. But we are going to go ahead and show how to extract the data via Dexter.

Navigating the MCDC Archive Via Uexplore to Access the Data

  • Start at the archive home page at /applications/uexplore.shtml.

  • Select the American Community Survey category from the Major Category Index box.
    Major Category selection

  • Select the acs2006 filetype directory to explore.
    Choose acs2006 filetype

  • Select the summary subdirectory to explore. (This is where we keep the detailed tables).
    Select summary subdirectory

  • Select the ustabs17_20.sas7bdat dataset because it contains all base tables starting in the b17, b18, b19 and b20 series -- tables related to "Poverty, INcome, Earnings, Disability" as described.
    Choose dataset

  • This takes you to the Dexter application with the appropriate dataset selected.

Annotated Dexter Query Form (As filled out to define this query)

Section I: We select both CSV (the default) and SAS dataset as output formats. We can then do our postprocessing in either Excel or SAS, depending upon software preference. Note the link to Detailed metadata if we need it. (To take us to page that define the values of key variables such as the summary level and geographic compondent codes.)

Screen image of dexter query form, part 1

Section II: The filter specifies that we only want summary at the nation and state levels. The summaries at these levels include special summaries for geographic components, which we are not interested in. So our filter has 2 conditions:
  1. SumLev Equal to 010:040 specifies that the geographic level has to be either the nation (010 code) or the state (040 code).
  2. Geocomp Equal to 00 specifies that the data is for the entire geographic area and is NOT a geographic component.

Screen image of dexter query form, part 2

Section III: We select the desired data elements (variables/ columns). For identifiers all we need is the Name of the area (Areaname and Name are the same value unless we are dealing with a geographic component summary, and we have filtered those out for this query). Selecting the items from the Numerics list is the more challenging task. This is where we need to recall what we discovered during our "pre-Uexplore/Dexter" step visiting American FactFinder to search for relevant tables. We know from that step that we need tables B17001A and B17001B. Scrolling through the list of numeric variables (with over 7500 table cell items) note that they are in archive standard table-naming convention and are sorted by the table ID. So we can scroll down looking for the entries for table B17001A. Once we find the first cell (variable) in that table we click on it to select it. Then we keep scrolling down through the 59 items of that table and come to item B17001Bi1 (the 1st item in table B17001B) and then keep scrolling to the end of that table. We hold down the shift key and click on the last item in this table - B17001Bi59, which selects all the variables in the list starting with B17001Ai1 and ending with B17001Bi59. The screen snapshot only shows part of the selected items.
Screen image of dexter query form, part 3, initial view

(Feel free to ignore this discussion of an alternate way of choosing the variables).
Note that there is another way to make these selections, a way which is probably more difficult for most, but might seem easier to a few of us (mostly old timers and/or programmers) who would rather type in a spec that select off a menu. It not only requires careful typing on the form, it even requires reading the technical documentation to learn about how to enter such shortcut variable lists.
Screen image of dexter query form, part 3, alternate approach
Here we take advantage of the rarely used option of typing in the list of variables to keep rather than selecting them off the two pull-down menus. We enter the ID variable name first and follow it with b17001Ai: b17001Bi: . This is shorthand notation (SAS(c) variable list syntax) meaning all variables that start with b17001Ai or b17001Bi . (Case does not matter; B17001aI would work as well.) This is the shorthand way of specifying that you want an entire table. If this makes no sense to you then don't worry about it; the very large majority of users will never want to use this so-called "shortcut".

Section IV and Section V: These sections are not used for this query. We use the Extract Data button at the bottom of Section III to submit the query.


The query takes about 10-15 seconds to run, since it has to access a fairly large data set (about 290M). The next thing you should see on your screen is the standard Dexter output menu with links to the Summary log file and to the two requested output files: a csv file and a SAS dataset file. What we want to do now is discuss how you might complete this example by postprocessing the results using your favorite software tool. For the large majority of you, that will probably by Excel and for an important minority it might be SAS. If neither is your favorite tool then you may well want to stop reading at this point.

Postprocessing with SAS

Our assumption is that if you are reading this section that you are interested in processing these kinds of data using the SAS software package. We assume you have access to SAS and that you are familiar with (or are willing to become familiar with) the tabulate procedure in SAS. This is one of the most powerful tools within SAS for handling the kind of data that can be extracted from detailed Census files. We shall be employing a strategy that we commonly use, whereby we take the summary data (pre-defined tables) and use a data step to convert the aggregate data into a form that looks more like microdata. In this new dataset, each observation represents a cell of data with identifiers corresponding to the catgegories of interest with a single numeric variable containing the count of persons in that cell. Having done this we then pass this "micro-fied" data to the tabulate procedure and let it do the work of turning it back into a table with just the categories and statistics of interest.

The process begins, of course, with capturing the Dexter-generated SAS dataset on our local system so we can access it with our local SAS software. To this we simply right click on the

    Database (sas) File
link on the Dexter Data Extraction output page. Then we choose the "Save target as ..." option (this is using the IE browser, the exact wording of the option will vary with browser) and then specify the location on our local drive where we want this file to be downloaded and saved. For the sake of this example we shall say that we specify that the file is to be saved in c:\downloads\xtract.sas7bdat . The data is now ready to be locally processed via SAS.

To access the data from SAS you need to specify the directory where the data are located as a SAS data library. We code the following statement to do this:
libname sasout 'c:\downloads';
We can now refer to the extracted data using the SAS dataset reference naming conventions as sasout.xtract (where the "sasout" part tells SAS where to find the dataset file, and "xtract" tells it the name of the file to look for with the standard SAS-dataset-suffix, i.e. "sas7bdat"). There are a number of windows in the SAS interactive environment (aka "Display Manager") that can be used to view the data. The viewtable and fsbrowse windows are excellent for viewing the data. The var window is good for viewing metadata: variable names, types, lengths and labels. We entered the command
var sasout.xtract
and the result was a new window with the metadata partially displayed. We say "partially" because the window is actually not big enough to show all the data (and amazingly cannot even be resized from its original limited dimensions.) What we really want to be able to see is the names of the variables along with their labels. The name B170001Ai6 is not very mnemonic; but when viewed with its label ("6 to 11 years") within the window it becomes apparent that it is the count of white males below the poverty level who are age 6 to 11. (You have to know the table universe to know what the race category is, but the other items can be pretty well determined by looking at the labels that precede it in the table; you will also note that there is indentation (leading tab characters) in the variable labels which should help you to understand how items are nested within categories. The label for B170001Ai6 actually begins with 3 tabs, telling you that it is at level 3 and hence has two category qualifiers; these turn out to be "Below poverty" and "Male". In order to see the variable names and labels on the screen at the same time you have to modify the display so that the columns for Type, Length, Format and Informat basically go away. You do this by dragging the rigth borders of these columns in the header row to the left until they disappear. It's a pain but it only takes a few seconds to do this. At this point you have a nice metadata display that looks like this:
Screen image of SAS var window
You can get comparable information for all the tables available in the 2006 ACS detailed collection by uexplore'ing the acs2006 data directory and selecting the file BasevarLabels.sas . (This document has the added feature of table titles and universes which do not get captured as part of the variable labels on the SAS datasets.)

Now that we know what variables we have and what they mean, we can think about how we can write some code that will turn all these cell counts into a form that can be readily re-tabulated into the exact categories of interest. We basically just need to get rid of a lot of age detail. Also, what we want are just poverty rates -- defined as the number of persons below the poverty level as a percent of all persons for whom poverty status was determined. (You get the latter -- the denominator used in the poverty rate calculation -- by adding the persons above the poverty level to those below it.) Here is (some of) the code for the SAS data step that takes our extracted data and creates the tabulate input dataset:

Screen image of SAS source code
(See the complete program code and program output).

This program is a classic example of using SAS to "reshape" a data set. What we have are 2 very large tables with cells containing counts of persons. Most of these cells have associated with them an age (single year), sex and poverty status (above or below the poverty level). Each of the two tables (b17001a and b17001b) has a race associated with it. And, of course, each observation has a particular geography associated with it (either it's the US totals or the totals for a specific state). We use a subsetting if statement to delete the summary observation at the nation level. We said that "most of" the cells have the various attributes associated with them; some of the cells represent subtotals and thus are lacking one of more of the characteristics. We ignore these subtotal cells (SAS variables) and focus only upon those that indicate an age, a gender and a poverty status. What we now do is create 4 new variables (age, race, sex and poor) that are stored as character strings (except for poor) and represent categories, and a single new numeric variable, Count, that contains the count of persons associated with these categories. The variable poor is a special category variable that is assigned a value of 1 to indicate a cell of persons who are classified as being poor, and a value of 0 if they are not poor. The reason we use a numeric variable here instead of a $1 character variable has to do with a coding technique that we use in our Proc Tabulate step. Note that the varible Age gets assigned values of "1" through "3" representing the three cohorts we are interested in. "1" represents the 0 to 17 cohort, "2" the 18 to 64 cohort and "3" the 65+ group. We assign race='w' to indicate we shall be outputting counts of persons from table b170001a, which contains counts of white persons only. We then assign sex='m' to indicate we are going to output a count of white males. Finally we assign poor=1 to indicate that we are dealing with persons who are indeed poor. The result of these 4 assignments says that we are talking about white males aged 0-17 who are poor. How many such persons are there? If your understand the coding of the summary tables on the acs base tables dataset then you should understand that the variable b17001ai4 is the number of white males aged 0-4 who are poor, and b17001ai5 is the number of white males aged 5 who are poor, etc. (see the Sasout.Xtract Properties window image above; the person who was writing this code had that window or its equivalent open and was using it as they coded this statements).
Count=sum(of b17001ai4-b17001ai9);
Here we collapse across the more detailed age groupings to get the broader cohort in which we are interested. We sum all the data cells for age cohorts between 0 and 17; the "sum(of )" syntax says to sum the 6 data cells starting with item 4 and ending with item 9 in the table "b17001a". The output statement immediately follows this assignment statement and it writes an observation with just a very few variables: State, Race, Sex, Age, Poor and Count (as specified by the Keep statement at the end of the data step). Once we have captured this collapsed cell of data we continue through the b17001a table by going to the next age cohort (18-64). We use a value of '2' for Age to indicate this new collapsed cohort and we then assign Count as the sum of the next 5 cells (item 10 to item 14) in the b17001a table. As each collapsed age grouping is specified and a Count value gets calculated an Output statement captures that cell. Once we have finished with the poor males section of the table we proceed to the poor females and output a similar set of 3 cells for the poor white females. This completes processing the poor persons section of the table, at which point we set Poor=0 and proceed to process the second half of the table, outputting cells of non-poor white persons of specified genders and age cohorts. And when we have finished outputting all these cells for the white population we then turn our attention to the black population and essentially repeat our code except that instead of referencing cells in table b17001a we reference table b17001b.

The data step in this program basically takes the date from the two tables apart and creates a dataset that looks almost like microdata except that it has a Count variable that says this is how many we have in these 4 catgories (5 if you count State). The Proc Tabulate step lets us take this "micro-filed" data and put it back together again so we can see the bigger picture but with less detail regarding age and with poverty rates calculated which we did not have before. Here is the code we use:

               *---And now, tabulate the results----; 
               title 'Final Ouput for Xsample acsbasetabs1'; 
               proc tabulate data=tab_input;  
                class state race age sex;
               	var poor;
              	freq count;  *<-----Important to use freq instead of weight-----; 
	              format race $race. age $age. sex $sex. state $32.;
	              table (state all='U.S. Totals')*race*(age all='All ages'),  
                      (sex all)*poor=' '*mean='Poverty Rate'*f=percent7.1; 

While it is beyond the scope of this document to explain the wonders of proc tabulate we want to focus on a coding technique that we have used here and find extremely handy. It involves the way we process the poor "category" variable. Usually, the variables that define the classifications to be used in tabulating the data within the tabulate tables appear in the class statement. Variables that appear in the var statement in tabulate are those for which statistics such as sums, means, mins and maxes can be specified. You might expect Count to be such a variable in this case. But instead we use Count in the freq statement. This tells tabulate to treat the input dataset like microdata only each input observation should be treated as if it had occurred the number of times indicated by the value of the variable Count. What we ask tabulate to do for us in the table statement is to calculate a mean value of the variable poor for various categories and to display that mean with a format code of percent7.1 and to label it as 'Poverty Rate'. What is this all about?

Think about a simple case where you were looking at 50 cases within a category grouping and of those 50 cases Poor had a value of 1 in 15 of them and a value of 0 in the other 35. What percentage of those 50 cases were poor? (We are ignoring the use of the Freq variable here but that doesn't change the logic.) The answer is pretty clearly 30% (15 out of 50) and if you take the mean of the variable Poor you get a value of .15 (mean=the sum of the values divided by the number of them so you sum the 1's and divide by the total number of cases; this is equivalent to getting the decimal portion of the time that the variable is true (has a value of 1). Converting this to a percentage is just a matter of shifting the decimal point and maybe displaying it with a "%" symbol. This is exactly what happens when we use the percent format code as in *f=percent7.1 .

Postprocessing With Excel

In this section we assume that you generated output in csv (comma delimited) format and imported it into Excel. In this section we describe how you can use Excel to create a report which shows the poverty rates for the 3 specfified age groups.

First we must define our ranges for each category. Tables 1 and 2 below shows the ranges we will use for each ethnic/age group and poverty status.
For example, to calculate the total number of white males aged 0 to 17 who are below poverty we would use Table 1 which states we need to add cells E through J together for each row. (in excel this is done by typing =sum(E3:J3) in a cell if we are on row 3).

Table 1: White Males
Poverty Status0 to 1718 to 6465 plus
Below PovertyE - J K - 0 P - Q
Above PovertyAH - AM AN - AR AS - AT
Population for which Poverty Status
has been determined
AH - AM & E - J AN - AR & K - O AS - AT & P - Q
Poverty Rate=SUM(E:J)/(SUM(AH:AM)+SUM(E:J)) =SUM(K:O)/(SUM(AN:AR)+SUM(K:O)) =SUM(P:Q)/(SUM(AS:AT)+SUM(P:Q))

Table 2: White Females
Poverty Status0 to 1718 to 6465 plus
Below Poverty S - X Y - AC AD - AE
Above Poverty AV - BA BB - BF BG - BH
Population for which Poverty Status
has been determined
AV - BA & S - X BB - BF & Y - AC BG - BH & AD - AE
Keep in mind that to calculate a Poverty rate, we must divide one number into another. The numerator will be a 'below poverty' range we defined in Table 1 above (for example, cell range E-J for white males aged 0 to 17). The denominator will be the sum of the 'below poverty' and 'above poverty' ranges, which is defined in the third row in Table 1 above (example, cell range AM-AM & E-J for white males aged 0 to 17). For this exercise we will be focusing on the Poverty Rate Row (the last row) in Tables 1 and 2 which defines in pseudo excel code how the formulas are defined.

Now in the spreadsheet, move to a clear cell to the right. I picked column DR. In row 1 and row 2 add the headers I created in Figure 1 starting in column DR.

Figure 1

Next, we will define the formulas on row 3 (United States) in the excel spreadsheet. (make sure you are in row 3, this is important!).

Now type =SUM(E3:J3)/(SUM(AH3:AM3)+SUM(E3:J3)) in that cell and press the enter key as shown in Figure 2.

Figure 2

If all went well, the value of 0.1332342 should now up as shown in Figure 3. If that value does not show up, or you get an error, check the syntax to see that you have an equals sign at the beginning, and that you have parenthesis in the right places. The cell that contains 0.1332342 contains the poverty rate for all white males aged 0 to 17 in the United States.

Figure 3

Now let me explain what this formula does. An element such as E3, refers to the Eth column and the 3rd row. The equal sign states that we want to write a formula. SUM(E3:J3) says to add the values in cells E3, F3, G3, H3, I3, and J3 together. The / states we want to divide. (SUM(AH3:AM3)+SUM(E3:J3)) says we want to sum up all values in the cells AH3, AI3, AJ3, AK3, AL3, and AM3 together, and to sum those with the values in cells E3, F3, G3, H3, I3, and J3. Because we have parenthesis around SUM(AH3:AM3) + SUM(E3:J3), this addition takes priority and is calculated before the division occurs.

Move over to the right one cell (still in row 3), and type =SUM(K3:03)/(SUM(AN3:AR3)+SUM(K3:03)) which is the next formula in the poverty rate row in Table 1 above and refers to white males aged 18 to 64. If you typed it right, you should have the value of 0.0846298 in the cell as shown in figure 4. If not, please use the advice above to check for typos.

Figure 4

Use the formulas in the Poverty Rate row in Table 1 and 2 to calculate the poverty rates for the rest of the categories for the United States. When finished your spreadsheet should look similar to Figure 5.

Figure 5

Once you finish typing the formulas in the cells for row 3, we can easily calculate the values for the rest of the rows. Right click on the cell with 0.1332342 (the cell that contains the poverty rate for white males aged 0 to 17.) and select copy as shown in figure 6.

Figure 6

Now select the cell right below it in row 4 and do a group select by pressing and holding the shift key, and press the down arrow until all cells directly under the cell containing 0.1332 are selected as shown in figure 7 until you reach row 53 as shown in figure 8.

Figure 7

Figure 8

Now do a right click and select paste from the menu as shown in figure 9.

Figure 9

All the values in the selected cells should be displayed as shown in figure 10. What occurred here, is that the formula was copied and was pasted to the other cells. Excel was smart enough to know to look at the cells in the adjacent columns to do the calculations itself.

Figure 10

Now repeat this process (of copying and pasting) for the other 5 cells in which we have established formulas in row 3. When you are finished your results should look similar to what is shown in figure 11.

Figure 11

To format the columns by adding a percent sign, etc, we need to select all cells we just created and right click and select format as shown in figure 12.

Figure 12

A window appears, select Percentage as shown in figure 13, and press the okay button and the results should be similar to what you see in figure 14.

Figure 13

Figure 14

We now need to hide the data columns B through DQ. To do this we need to select those columns as shown in Figure 15.

Figure 15

Now right click the mouse and choose hide from the popup menu. If you do this correctly you should get something similar to what we see in figure 16.

Figure 16

I leave it as an exercise for you to calculate the poverty rates for black males and females. The steps are the same, the only thing that changes are the columns you will use in your formulas. Use the results from the SAS demo to see if your answers are correct.


  1. Modify the extraction so that it does not include race, i.e. select essentially the same data for the total populations rather than the white and blacks subpopulations. If you a SAS person, modify the postprocessing module to handle this new kind of data as well.

  2. Repeat the previous exercise but for the hispanic population.

  3. Use AFF to find a table in the 2006 ACS that will tell you how many children (under 18) live in a household where the head of household is their grandparent. Then extract this table (using dexter) for all cities (places) in the state of Missouri for 2006.