Dexter Xsample: Detailed Query Description
Query ID Code: popcom1
SummaryWe access a recent dataset in our population estimates collection, generating a report showing the latest population estimates for all counties in the Kansas City Metropolitan Statistical Area (2006 definition). We show the change in population since the 2000 census and the components of that change: natural increase (or decrease) and net migration (total and domestic).
Filetype accessed: popests -- Post-2000 estimates of the population from the U.S. Census Bureau.
Data Set accessed: uscom06 - we choose this data set because we want not only the latest estimate for each county, but also the components of change. At the time we coded this example (May, 2007) this was the only data set released by the Bureau that had the 2006 estimates for counties. See detailed metadata for this data set.
Dexter features used:
Can be readily modified to:
Degree of Difficulty: 4 - moderately difficult. The extraction portion (selecting observations and variables) is reasonably straightforward but several advanced features are used to enhance the readability of the report.
Annotated Dexter Query Form (As filled out to define this query)
Sections I & II: In Section I we specify the "none" option for Delimited File and "HTML" as the value for Listing/Report.
In Section II we code the condition that says we only want data for the Kansas City Metro Area. As noted above, we need to understand that current metro area definitions require looking at the CBSA code rather than the older MSACMSA codes (which are frozen in time as of 2000). How did we know what the code value was for Kansas City? We went to the Detailed Metadata page (using the link at the top of the Dexter query form) and on that page in the Key variables section we followed the link to cbsa. This page shows us all the distinct values that the CBSA code variable takes on within this data set and provides the "value labels" for those code. (For example, it tells us that 10220 is the code for the Ada, OK Micropolitan Statistical Area. We can scroll down or use our browser's "Find" command to locate the entry for Kansas City.)
Section III: Choosing the variables is straightforward. Notice that even though we used the CBSA code variable to do our filter, we use the variable cbsaname for the report. In the Numerics select list we choose the variables that let us see how the population has changed since 2000. We choose 3 variables that have to do with the components of that change:
Section IV: Nothing here that is technically difficult, just using these text boxes to specify a pair of titles and a footnote for the report. Depending on how you plan to use or distribute this report (is it just for your own use or is it for your boss or is it going to be made available via the web?) it can be pretty important to document your results. We identify the subject matter in the main title; the geographic universe and detail level as well as the time period are identified in the subtitle. The footnote identifies the source of the data, at least to you and me.
Section V: In the b. Advanced Report Formatting Options subsection we begin by checking the box that says to use variable labels (instead of variable names) as column headers in the report. (In the variable select lists in Section III the variable names are displayed and are then followed by a "-" and then the variable label (if any -- if there is no dash then the variable has not been assigned a label, in which case the variable name will also serve as the label for the sake of column headers in a report.)
We enter the variable name cbsaname to be used as a "By variable" in the report. You can use the Dexter online help to get a better explanation of what this is about. The short answer is that this variable will be displayed in a by line rather than as a column of the report. This leaves more horizontal space to display the data columns. We do not have to worry about the "sort required" spec because in this case we know there is only a single value for this variable due to the way we have filtered the data.
We specify that the variable county will be the ID variable for the report. Again, the Dexter online help will explain this option for you. The short explanation is that this specification will cause this variable to appear as the leftmost column of the report and to be highlighted (using a backgound color - blue in this case using the default style definition).
In the c. SAS Variable Attribute Statements subsection we specify a label for the cbsaname variable. When a by line is generated the by variable(s) are identified based on their labels. If we had not made this entry the byline would have displayed as
Here is what the report looks like.