Jgb, rev. 11/24/2015
We see that we are accessing data in the popests/nchsbri subdirectory, data set is scnchsbridged201x. Best way to get here is to navigate to the popests directory, then to the nchsbri directory and then use the Datasets.html directory file. Note the link to the detailed metadata for this data set. That page includes a very short “codebook” explaining the codes used for age, race, sex and Hispanic. If we hit the Quick Look button it will show the first 100 observations of this data sete (and because we have already made our variable choices in Sec. III of the form it will only display those variables). To have a better feel for what the input data set looks like I chose all the ID variables (in Sec.III) and then hit Quick Look to get something like this:
The only columns that change here are Age and the 2 estimate items. Obs 1 is the count of persons aged 0 (i.e. < 1) in Abbeville County who are white (race=1), male (sex=1), and non Hispanic (Hispanic=1). What we’ll do here is add up all these population estimate cells based on just the race and sex categories, and filtering out all observations which do not satisfy our Age <= 16 filter.
We choose to have output only in HTML format. Note that we have checked the option that says to skip the intermediate Output Menu page and just pipe the single output report directly back to the browser.
Our filter tells Dexter to only process rows/observations where the value of Age is <= 16. If you wanted to do multiple cohorts you would have to do multiple runs with multiple age-based filters.
State is a constant on this data set but not a bad idea to carry it along as an ID variable. Too bad there is no variable to document the age cohort. The user actually just wanted the latest estimate (2014 at the time) but we chose the 2010 estimate as well just to show we could aggregate more than one variable.
Nothing too complicated here. Just titles and a footnote to document the report and point users to this Note module. Note that we do not have to specify any Sort option.
This is what came for – the Advanced stuff. The Aggby text box is where we enter a list of variables to “aggregate by”. It tells Dexter to sort the selected rows by the values of these 3 variables and then go through and create one output row/observation for all distinct combinations of the 3 variables. For example, we checked and there were over 1500 observations on the filtered data set (i.e. not counting age>16) that had state=45, Race=1 and Sex=1: white males living in South Carolina. The output file has a single row/observation with value for two population estimates (2010 and 2014) around 333,000 - each of them a sum of those 1500+ values.
The Agg Level item has been left with default value of 1. When doing aggregations you can request summaries at multiple levels. See the online help for details on this. Similarly, we have not altered the default value for the Variables to drop option. If you want to see how many observations were used in creating each output summary observation then you should omit _nag_ from this parameter spec.
(Note the nearest Extract Data button is just above the Sec.V header.)
Does not apply because we asked that the resulting html output be piped directly back to the browser. So what you should see next in your browser after hitting the Extract Data button is this:
To change the geographic universe you need to look at the comparable data for the other 4 states as well as SC. You can do this in at least two ways. The hard way would involve running the query five times and just editing the name of the input data set at the top of the form each time. You could also edit the title each time so you are not saying South Carolina when the data are for Georgia. You could do it that way, but you should not. A new feature added to Dexter late in 2015 lets you access multiple data sets in one run as long as the names of those data sets follow a rigorous pattern. That is the case here. The input data set, scnchsbridged201x, is part of a national collection, with one data set per state. The comparable data for North Carolina is in ncnchsbridged201x . You could access the entire collection by using the new “wild card” feature. You replace the state abbreviation portion of the data set name (first text box at top of page) like so:
The “sc” part has been selected so I can now just type over it, inserting the two underscores needed to trigger wild card processing.
Dexter will now concatenate all data sets in the current directory with names matching the specified pattern. So it will be accessing alnchsbridged201x, aknchsbridged201x, … (through wycnchsbridged201x).
But we don’t really want data for all those states – just the five in the region. We take care of this by coding a filter in Sec. II of the form like so:
We have coded a second condition for selection. In addition to having an Age value <= 16 we now add that the variable State (state FIPS code) must be in the specified list of 5 values. (If you did not know the state values you could have used multiple sources to get them, such as http://mcdc.missouri.edu/sasfmats/Sstate.sas ). So Dexter is going to actually consider (“read”) 51 data sets and only keep data from 5 of them. This turns out to be faster/more efficient than you might think because of indexing and the way the SAS where statement works.
The other thing we said we would do for this variation is use formats to provide better labeling of our data. We want to specify format codes for 3 of our output variables: state, race and sex. We do this in the Advanced Options section of the form, part C as follows:
This Format text box entry lists each variable that we want to have “formatted” (i.e. to have a SAS format applied so that it displays meaningful value labels instead of the codes stored on the data set). The format names are the things that start with $ and end with periods. State is a variable name, and $state. is a format code. The special format codes $race. and $sex. only work for this particular data directory. $state works for any data set where the State variable contains a 2-character FIPS code (which it almost always will).
We also tweak the title to identify the report as coming from this variation. And, finally, we specify a couple of Advanced items to specify a By variable for the report and a pair of ID variables. This will result in State being displayed on a “by line” instead of in a column, and cause Race and Sex to appear as the first columns of the report with special color formatting. This is what we specify:
Here is a partial image of the resulting report. Notice the data for the other selected states. Notice the “State=Georgia” byline. Notice that Race and Sex appear in blue and are the left most (“Id”) columns of the report (replacing “OBS”) and that State displays as Georgia instead of 13, Race displays as White instead of 1, and Sex displays as Male instead of 1. These are the result of using formats.
Start with this variation and change the query in the following ways:
Address these to John Blodgett at OSEDA.