Dexter Online Help

Version 2.0, July 2010

For a simpler overview of this material (with pictures) you may want to check out our Dexter Quick Start Guide.

The Stuff at the Top of the Form

This refers to the portion of the page above the first horizontal bar, i.e., everything prior to Section I. There are two links at the very top, to the Dexter Quick Start Guide and to this Online help document.

Data Set ID Line

The line at the top of the page identifying the selected Data Set is divided into three parts:

1. The data directory (path) is displayed and doubles as a hyperlink in case you want to return to a Uexplore page referencing the current data collection (directory).

2. A text box containing the name of the data set (without the extension). You can edit this value and hit enter. Doing so will not change anything you can see on the page. What it will do is change links on the page (for example, to detailed metadata, just below) and, most importantly, the input dataset that is going to be passed to Dexter.

It is important to be aware that there is little or no checking to make sure that what you type into the box is valid. For this to work you must enter the name of a dataset that really exists in the specified data directory (spelling definitely matters, case does not; the path/directory cannot be changed from this page), and the new dataset should contain essentially the same variables as the current dataset. The current page contains various select lists (in Sections I and III, for example) that were generated based on the variables (tables) present in the original dataset; these select lists do not get changed to reflect the variables in the new dataset you have entered. The program does not even check to see whether the dataset whose name you just typed in exists. You will only find this out when you Submit the query and get the message "Dataset specified does not exist" (from Dexter). So type very carefully.

This feature is typically useful when you have a somewhat complicated query that you want to run for a series of similar datasets. You could run it for dataset motracts, then go back, change the dataset, and run the identical query for iltracts and again for kstracts, etc. — the same query to access tract-level data for Missouri, Illinois and Kansas.

A wild card dataset specification capability was added to Dexter in November 2015. You can now enter a value containing two consecutive underscores (__) to specify that you want all datasets whose names match when any two characters take the place of the underscores. So if a directory contains a collection of datasets for all the states in the country, you can access all of them (in alphabetical order by dataset name) by replacing the state-abbreviation portion of the name with the two underscores. For example, data directory sf12010x contains the datasets alselectedinv, akselectedinv, ..., wyselectedinv. You could initially select any of these 52 datasets (we have a PR dataset in this collection) to invoke Dexter, then replace the first two characters of the dataset name in the box with __, and your query would access all 52 datasets. If you only wanted to process, say, five of the 52, you could use the wildcard feature and then code a state-based filter in Section II of the form to filter out all but the states actually desired.

This will only work if all of these datasets have the same variables as the dataset you specified when you invoked Dexter. None of the datasets accessed can be SAS views (which should not be a problem when accessing a collection with one dataset per state or a time series with one per time period). You cannot specify the Go straight to Rankster option when using wildcards, and you should avoid using the Quicklook feature or following links to metadata after entering the wildcard value. Other restrictions that we have not tested for yet will probably pop up; let us know if something does not work.

3. <# rows> rows/observations, <# cols> columns/variables. This is fairly straightforward, useful information about the dataset selected. (If you overtype the dataset name in the preceding text box, it does not cause this information to be updated.) If the dataset is a view (i.e., a "virtual" dataset that is stored as a program reference rather than a normal dataset), then the rows/observations count will be missing — the observation count is stored as part of a regular dataset but not as part of a view.

Detailed Metadata Link

There may or may not be a line just below the Data Set: line that invites you to See detailed metadata for this dataset. MCDC staff have been creating metadata modules for many of our most frequently used and important datasets, but we still have many datasets without such a module. If the link is there and you are new to this dataset (or at least this type of dataset), then you really owe it to yourself to follow the link and see what you can learn about the data. Note that one of the most important aspects of these metadata modules is to provide you with key values pages that tell you the meaning of important identifier codes used on the dataset. Such knowledge is critical for coding data filters (see Section II).

Quick Look

The Quick Look button can be used to generate a quick and easy report showing selected variables for the first few rows of the dataset. You would typically hit this button when first starting your query, to get a handle on what the dataset looks like. By default the report will contain all the Identifier variables (the variables displayed in the left-side select list in Section III) for the first 100 rows (unless, of course, the dataset has fewer than that). These defaults can be overridden.

I. Choose Output Format(s)

This section of the form lets you choose up to three output formats. Each of the three lines here contains a set of radio buttons that lets you choose whether you want any output of the specified type and, if so, which specific format.

The Delimited File buttons let you choose a comma or tab-delimited output file. By default you will get a comma-delimited (CSV) file. Note that CSV is the filetype extension associated with this format; it stands for comma-separated values, and most browsers will be configured to automatically invoke Excel as a plug-in to handle a file of this type. If you choose tab-delimited instead, then the fields are delimited by tab characters rather than commas and the file generated by Dexter has a .txt extension (instead of the default .csv); this will usually result in different behavior by your browser when you click on the output file. In most cases it will just display it in your browser and you can then use File-Save to capture it.

These files will begin with two rows of identifiers before the actual data lines begin. The first row contains the names of the variables and the second row contains their labels. (In some files, and for some variables, there will be no labels assigned in which case the variable name is repeated as the default label.) Choosing "none" in this row indicates that no delimited file is to be generated. (Just be sure that if you make this choice, that you select some option other than "none" from at least one of the remaining two rows; otherwise, you will have selected no output at all.)
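
For example, the first two rows of a comma-delimited extract might look like this (the variable names and labels here are hypothetical illustrations, not from any particular dataset):

sumlev,state,county,areaname,totpop
Summary level,FIPS state code,FIPS county code,Area name,Total population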

If you are expecting to use this format to load your data into a spreadsheet (and we know that is what most people are doing these days) you need to keep in mind that spreadsheet packages have limits on the number of rows and columns they can handle. Many (older) versions of Excel cannot handle more than 255 columns nor more than 65,000 rows.

The Listing/Report choices allow you to specify an output file in a format designed for reading rather than for storage and further manipulation. The rows/observations from the dataset become the rows of the report, and the variables become the columns. If you choose the plain text or PDF format, the program will limit the number of columns displayed across the page and will start a new page (with titles and column headings) after every so many lines. An HTML format report only displays the headings at the very top and has an unlimited page width, so all the variables for an observation are displayed on one line. You really should not request a report format if your extract contains many variables, especially if those variables may be long character strings.

Plain text format is the best choice for faster response, since it requires the fewest resources to generate; PDF takes by far the most resources to generate. But PDF is the best choice if you want a report that you want to print and/or share with others.

Most of the options in Section IV and those in Section V-b are relevant to the details of this output file; things such as titles, footnotes, column headers, and even complete styles (font and color choices). Tip: Consider the first checkbox in Section V-b; using labels instead of variable names as column headers can greatly enhance the readability of a report. More ambitious users can combine this with use of the Label option in Section V-c to assign their own variable labels.

The Database File option is probably the least often used. The venerable DBF format is still widely used and is especially popular for use within GIS and desktop mapping systems. These files can also be read into Excel and other spreadsheets, although if that is what you want you are better off going with the default — a CSV file. The latter requires less space and also is able to store a row of variable labels, which DBF format does not support.

The SAS dataset option is for those who want to access the data using the SAS software package. We write this dataset in the latest (v9) format for the Windows platform. Many of our datasets have variables that are linked to custom SAS format codes. For example, a variable containing a five-character FIPS county code may be linked to a $county format code, which causes it to display as the county name associated with the code. In the other output formats you just get the results of applying the format, but with the SAS dataset you get the five-character variable with the associated format code. This can cause problems when accessing the dataset from a SAS environment where $county is not recognized. You can avoid the problem by specifying the SAS system option nofmterr.

If you would like to recreate that custom format code for your local SAS environment, you can access our public formats library and download the source code. Look for the file with the name corresponding to the format name. Character-type formats (which are what we use most of the time) will be named starting with an upper-case "S" instead of a "$"; so the code for the $county format is in the file Scounty.sas. Once you access the code in your browser, you'll need to either save it to a local file or copy and paste it directly into the SAS program that accesses the dataset. You may need to add a Proc Format statement, since some of these modules do not include that statement. If you plan to use this format for working with many different datasets, you may want to look into storing the format code in a permanent SAS catalog, but that is beyond the scope of this help file.
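
Here is a minimal sketch of how you might put this together in your own SAS session (the libname path and dataset name are hypothetical, and your copy of the module may already include the Proc Format statement):

options nofmterr;                 * the quick fix: do not fail on unrecognized formats;
libname mylib "C:\mydata";        * folder where you saved the Dexter SAS dataset;

proc format;                      * add this statement if the downloaded module omits it;
  %include "Scounty.sas";         * the source code saved from our public formats library;
run;

proc print data=mylib.extract;    * "extract" stands in for your dataset name;
run;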

The check box at the end of this section is rarely used. Normally when you invoke Dexter it writes a page to your browser that summarizes the query and provides links to the various output files. But if you are generating only a single output element, and that element is either a plain text or HTML report file, then checking this box will cause Dexter to send your report file directly to the browser with no intermediate page.

II. Choose rows (observations) to keep by specifying a filter

This is the section where you get to tell Dexter which rows you want to keep for your output file(s). If you skip this section you get all the rows. If you just fill in the box at the top limiting the number of observations/rows then you just get that many output rows (report lines) from the top of the dataset (which is a good way to run a quick test and see what your output is going to look like).

As a general rule (there are important exceptions), the rows of our datasets correspond to geographic entities such as states, counties, cities, etc. Typically what you are doing in this section is specifying the geographic areas for which you want data. You do this by specifying one or more conditions that have to be met before a row is chosen. This process is known as filtering. The form allows for specifying up to five conditions and for specifying logical operators (and, or, and not) for connecting them; it even allows you to indicate parentheses to insert into the filter to control the order in which the conditions are evaluated and combined. But it is rare that a filter uses more than two lines, and even rarer that you ever need to specify any logical operators or parentheses. Each condition consists of three parts: the Variable/Column, the Operator, and the Value. The first two of these you select from drop-down menus, but the Value column is where you need to enter something.

A typical filter would involve accessing a dataset that has census summary data for every county in the country, from which you would like just the data for California. You do this by selecting the variable "State" from the Variable/Column select list, then selecting "Equal to(=)" for Operator, and then entering the code "06" in the Value text box. This tells Dexter to choose only rows where the State code has a value of 06.
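
Behind the scenes, that condition becomes a SAS where clause along these lines (a sketch only; as explained under Entering Case-Sensitive Values below, comparisons on character variables are upcased):

where upcase(state) = upcase("06");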

How would you know to do this? You need to know three things:

  1. That the variable State contains the FIPS code for the state being summarized on each observation.
  2. That 06 is the FIPS code for California.
  3. How this condition stuff works.

Item 1 is fairly easy. Whenever you see a variable named State in this archive, you can be almost certain that it contains a two-character FIPS state code. There may be a label associated with the variable explicitly saying something to this effect (the variable labels are displayed following the variable names separated by a dash on the drop-down Variable select list) but usually this can just be assumed.

The hardest part is knowing what the code is for California (or whatever state you are interested in). These are readily available in lots of places on the web, including our geographic codes lookup web application. Another likely source, which will show you just the values and their meaning for the current dataset, can be accessed by looking at the top of the Dexter input form page to see if there is a link labeled detailed metadata for this dataset. Following that link takes you to a page that provides a good deal of information about the current dataset, including a section labeled Key variables. This section consists of a row of hyperlinks referring to variables within the dataset that we think will be the most likely to be needed to create filters for the dataset. State is almost always a key variable, at least for datasets that have data for more than one state. Clicking on the State link under Key Variables will display a page showing what values are associated with this variable for the current dataset, and even displays a count of how many times each value occurs.

Value of Operator

The values that drop down when you click the down arrow on the select lists under Operator (the middle column of this section) are mostly straightforward relational operators. Equal to, less than, greater than — these are pretty standard. But a few may not be so obvious:

Entering Case-Sensitive Values

Normally, whenever the variable selected in Section II is of character type, whatever value is entered in the Value text box will be automatically converted to upper case by Dexter and compared to upper-cased values on the dataset. Once in a long while, you may want to override this default behavior and tell Dexter that you want the program to consider case. To do this you must enter a tilde character (~) as the first character in the Value box, followed by the actual value. Thus, if I select "SumType" as the variable, "Equal to" as the operator, and enter "~r" as the value, Dexter will only select rows where the value of the variable SumType is a lowercase "r"; it will not select a row if the value of SumType is an uppercase "R" (the generated condition would then be simply sumtype = "r"). If you omit the tilde from the value, then the generated condition is upcase(sumtype) = upcase("r") and you would select rows that had either a lowercase or an uppercase R.

Specifying Blank Values

You may have occasion to want to select using a condition requiring that a character variable has a blank value. Dexter has a special way of recognizing blank values: Entering a single underscore (_) will be interpreted as a blank value when the operator is "Equal To", "Not Equal To", or "In List". When a list of values is being specified (i.e., when the operator is "In List"), then use the single underscore within the list. For example, typing "01:02:_:04" in the value text box would result in the third list value being a blank. (Note that the length of a blank character field is not important; do NOT try entering multiple underscores.) Also, note that this only pertains to completely blank values, not to blank characters within text fields. You would enter "San Antonio", NOT "San_Antonio".
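
Conceptually, that in-list condition translates into SAS code along these lines (a sketch only, assuming the variable in question is named State):

where state in ("01", "02", " ", "04");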

Logical Operators and Parentheses

Note: This section can easily be skipped by novice and casual users; learning how to use these features will yield a relatively small gain.

Every once in a long while you have a situation where the filtering condition requires more than the typical simple specifications. Something more than just give me the data for this state or this county or this geographic summary level. Tools are provided to permit creating such complex conditions. These are:

  1. The ability to specify logical operators connecting your variable-operator-value conditions. By default the logical operator is "And".
  2. Checkboxes down the left and right sides allow specifying opening and closing parentheses. These allow you to create groupings of your conditions so that the logical operators that are inside parentheses get applied first.

The only good way to explain this is by example: Suppose you are accessing a dataset that has data summarizing states and counties for the U.S. The variable SumLev has the standard geographic summary level codes, while the variable TotPop contains the total population for the geographic area. You want to select data for all states that have a population of over one million, and for counties with a population over 50,000. On the first line, click and type to create the condition SumLev Equal To(=) 040. On the second line, create the condition TotPop Greater Than 1000000. On the third line, create the condition SumLev Equal To(=) 050, and on the fourth line the condition TotPop Greater Than 50000.

If you were to simply enter these four conditions and leave the relevant logical operators set at the default "And" values without any parentheses specified, what you would be creating is a logically impossible condition (and hence a failed query where no rows are selected). This is because the value of the SumLev variable cannot be equal to both 040 and 050 (as specified on the first and third lines). Clearly, what you have here is not a simple set of four necessary conditions, but rather a pair of sufficient conditions, each consisting of two necessary conditions. What we would like to do is this:

What we have here are two compound conditions, and we want to select the observation if either of them is met. So we need to use an "Or" logical connector. We do this by checking the small circle next to the word "Or" between the second and third rows to choose it as the logical operator. To ensure that the program first evaluates the compound expression based on the first two lines, we check the left parenthesis box at the left of line 1 and the right parenthesis box at the right of line 2. Similarly, we use the left and right parenthesis boxes on lines 3 and 4, respectively, to ensure that these two conditions are logically combined (with the "And" operator) first, before the Or operator is applied. The "Or" operator is then applied to the result of And'ing the conditions on lines 1 and 2 (the first condition for the "Or" operator) and the result of And'ing the conditions on lines 3 and 4. If either of these compound conditions is true, the row is selected. Translated into SAS code, our condition looks like this:

(SumLev=040 And TotPop > 1000000) Or (SumLev=050 And TotPop > 50000);

Limiting the Number of Rows Selected

Finally, you can enter a number into the text box at the end of Section II to specify a maximum number of rows to be selected. Dexter counts the rows selected, and when the count reaches this number it acts as if it had encountered the end of the input dataset. Notice that this limits the number of observations selected, not the number of observations accessed; it only counts those rows that make it through your filter. A good use of this option is to do quick test runs: enter a smallish number in this box so you select only that many result rows, then examine the results to see if they appear to be what you had in mind.

Accessing Variables Metadata

The form provides a pair of links to the varmeta utility application, which displays a listing of the variables (either just the ID variables or all variables on the dataset — there are separate links for the two cases). These reports display some of the key attributes of the variables, including their name, label, format, type (character string vs. numeric), and length. In general, ID variables are character strings (there are a few exceptions, such as numeric variables named logrecno, key, or obsno), and all character string variables are IDs. The Format column may be one of the most useful. When you see a variable with a custom format code associated with it (such as $state, $county or $cbsa), this means that the value you will see in your extract (other than a SAS dataset) will be the result of converting the stored code to the value label for that code. For example, if the variable County is type C, length 5, and has format code $county, and the value stored on the dataset is "01001", then the value you will see in your output file(s) will be "Autauga AL"; the $county format looks up the value and returns its label. (You can modify the format associated with a variable in Section V-c.)

III. Choose columns (variables)

In Section II you specify what rows you want to keep; here in Section III you specify the columns (variables) you want to keep. It is a much easier process than coding a filter, usually involving nothing more than clicking (selecting) values from select lists. It can get tedious when working with datasets that have a lot of variables, but it is conceptually simple. The simplest thing to do is check the box at the top that says you want all the variables. It may be simple, but it is not usually what you really want. Most users most of the time will want to be selective. Think about what you'll be doing with the results — do you really need or want hundreds (or in some cases thousands) of variables? Remember the 255-column limit if you intend to take your extract into Excel.

Note that Section III is the only one where you are required to enter something. All the other parts of the form have built-in defaults that apply if you ignore them. But there is no default for your variable selections — you must do something here, even if it is just checking the box saying you want all the variables.

We provide two column select lists, identifiers on the left and numerics on the right. Typically, you will need to choose just a handful of identifiers — be sure to take enough that you will know what your data pertains to. It is the Numerics list that can sometimes be a challenge, depending on how many there are on the dataset.

In a few special cases involving mostly complete-table decennial census files, the Numerics list (which allows you to select variables) is replaced with a Tables list. When this is the case, you are only allowed to select entire tables — at least using the select lists. This is a trade-off. We determined after considerable experience trying to work with some of these oversized datasets that going with variables organized by table was the way to go. Examples of filetypes that support table selections are sf32000 and sf12000. An example of a filetype that should support such selections but does not is sf42000. These are all 2000 census Summary Files.

Other filetypes, such as our standard extracts based on 2000 SF3 tables, sf32000x, have enough variables (between 100 and 2000 is the magic interval; for fewer than 100 it is not worth the bother, and for over 2000 it takes too long to respond to searches) to justify some special extra tools for making it easier to find and choose what you want. These tools (select list filtering) are described in the next section.

Filter by Regular Expression Processing

When you see the Filter by regular expression box below the Numerics select list, you have the option of typing something in this text box — a regular expression — that will be used to modify what appears on the Numerics select list. For example, if I am processing a typical sf32000x dataset with its over 400 variables, I might want to create an extract of variables related to income or to poverty. I can enter the value "poverty | income" and click on the Filter button. The Numerics select list will be regenerated and will now include only variables where the specified keyword "poverty" or "income" is present in either the variable name or its label. The vertical bar (|) in the expression is an "or" operator indicating that you want variables that contain either word. You can now make your choices off the much shorter list. If you do not like the results of your filter, simply click on the Clear button to restore the list to its original all-variables-shown state. Click the Case-sensitive button in the unlikely case that you want your regular expression matching to take case into account, so that "Poverty" does not match "poverty".
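
A few more sample patterns, in case they help (standard regular-expression syntax; the names and labels matched here are hypothetical):

^p87              keeps variables whose name or label begins with "p87"
median.*income    keeps variables with labels such as "Median household income"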

There is a Third Way

The last thing you see in Section III, just before the Extract Data button, is a text box where you are invited to enter your own variable list. Most of you will not ever do this. It is easy to mess up when you type in your names; that is why we have pull-down select lists. But there are cases where this method can constitute a shortcut, especially for users who are familiar with the syntax of SAS variables lists. Entering a value in this box will cause Dexter to ignore any selections you may have made — the list you type is your entire select list; it is not an addition to previously-selected items. (But if you checked the "ALL columns" box, that takes precedence, and any list you enter here will then be ignored.)

A reason for using this "third way" is that it lets you control the order in which your variables will appear on all your outputs. Normally the order is based on the order of the variables in the source dataset and cannot be modified; to override it, code an "@" symbol as the first character entered here. So if I were to enter "@name address age city state zip", I could be certain that on my output report and/or delimited file the variables would appear in the order I typed them, with name first and zip last, regardless of their order on the source dataset. This can come in handy for custom reports and even for spreadsheets where you want control over the left-to-right order of your columns. You cannot use double-dash intervals (see below) when using the leading @ feature.

Perhaps the most common use of this text box is to allow selecting specific variables from a dataset where you get a Tables select list instead of a Numerics select list. Say I were accessing the SF3 complete-tables dataset for all P and H tables for ZIP codes (ZCTAs) in the United States (file uszipsph.sas7bdat in data directory /pub/data/sf32000), and I had studied the detailed metadata, which pointed me to a varlabs file that was the best tool for seeing which variables contained what information. Furthermore, let's say I was interested in calculating a poverty rate for children aged 0-17 for every ZIP, and the data cells I needed were in table p87. I could just select the entire table (which has data on poverty by age, with a total of 17 data cells, of which only 8 are relevant to the number I wanted). Instead, I would type in the text box something like this:

geocode state p87i3 p87i4 p87i5 p87i6 p87i11 p87i12 p87i13 p87i14

This is fine, since we only had to enter the names of eight data cells. But similar requests in the real world could require many more data cells. This is where knowing how to use special list notation comes in handy. There are basically three variable list shortcuts that can be used (a combined example follows the list):

  1. Single-dash intervals can be used to specify a list of variables that have a common base name with numeric suffixes. For example the list "p87i3 p87i4 p87i5 p87i6" (used in our last example) can be written as "p87i3-p87i6".
  2. Double-dash intervals can be used to specify a list of variables based on their relative position in the (SAS) dataset. We are very careful when creating datasets for the MCDC archive to ensure that variables are placed in a logical order. Identifiers usually come first, followed by the numerics. When numerics are part of tables they are stored in ascending table-number order, and the variables are in the expected order within the table: if table p87 has 17 cells they will be named "p87i1" through "p87i17" and stored in that order. What this allows you to do is pick long "data intervals", often comprising consecutive tables. If I wanted to select all the table cells from table p87 through p93 (all tables that deal with poverty — this is SF3, 2000 census — filetype sf32000) I could simply include "p87i1--p93i19" in my variable list, and all of the variables in the seven tables would be specified (kept on output).
  3. Colon modifier lists are used to specify all variables with names sharing a common root name. This notation is very handy when you want to keep all the variables from a set of tables on one of the MCDC's detailed summary file datasets. If I type "p12i:" in my list I am saying I want all variables with names starting with the characters "p12i". Our table-variable naming convention guarantees that this is exactly the set of variables comprising table p12. The variable names are p12i1, p12i2, ..., p12i31. The two lists "p12i:" and "p12i1-p12i31" yield the same result, but the former is easier to code, because you don't have to look up the number of cells in the table as you would to code a single-dash interval.
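
Putting these shortcuts to work: the eight-variable list typed in the earlier ZIP-code example could have been entered more compactly as

geocode state p87i3-p87i6 p87i11-p87i14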

IV. Other Options, Non-essential

This short section consists of a set of text boxes where you can specify some useful options. The first three pertain only to the Listing/Report output type; the sort option applies to all outputs. Special characters, including quotes (apostrophes or double quote symbols), percent symbols, and ampersands, will be removed from any values entered for titles or footnotes in this section.

V. Advanced Options

Note: this section is not intended for the casual user.

Part a: Aggregating the Data

You may find occasions where the data stored in one of the archive datasets is more detailed than what you really need. It may have too many rows; you would prefer to collapse across rows to get numeric data summarized by aggregating (summing) the variables in rows that share a set of common identifier, or aggby, variables. For example, let's say you are accessing one of the recent population estimates datasets where each row/observation has data for a county, and the dataset contains a variable named CBSA that has the code for the current (core based) statistical area (Metropolitan or Micropolitan Statistical Area) associated with the county. Let's further suppose that what you really want is not the county populations but the CBSA figures. So you want to aggregate the data by CBSA, or perhaps by CBSA within state, so that you could get the Kansas portion of the KC MSA and the Missouri portion separately. To do that, you would specify CBSA (or State and CBSA) as the aggby variables in this section.
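
For SAS users, the effect is conceptually equivalent to a summarization step like the following (a sketch only, not the exact code Dexter generates; the dataset and variable names are hypothetical):

proc summary data=county_estimates nway;
  class state cbsa;                 * the aggby variables;
  var totpop;                       * the numeric variable(s) to be summed;
  output out=cbsa_totals (drop=_type_ _freq_) sum=;
run;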

Part b: Advanced Report Formatting Options

These options only make a difference if, on the second line of Section I, you chose something other than "none" from the Listing/Report format options. They control various cosmetic and formatting features that pertain only to data in a report format.

Part c: SAS Variable Attributes

The general rule is that you don't have to know SAS to use Dexter. But this is one section where, while not required, knowing the language could come in handy. Here we allow you to type in the body of three different SAS statements that are used to modify attributes (names, labels, and formats) of any of the variables you have selected. The syntax of these statements is simple enough that even if you have never coded a SAS statement in your life, and hope to never have to, you should not have too much difficulty using these statements.
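
For example, the bodies you type might look like this (the variable names are hypothetical; each is just the part of the corresponding SAS statement that follows the rename/label/format keyword):

Rename:  fipco=CountyFIPS
Label:   totpop="Total population, 2010"
Format:  totpop comma12.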

Part d: Transpose Output

Use this section to specify that you want to restructure the output data so that what would normally be rows become columns and what would normally be columns become rows. In the classic (most common) case of a geographic summary dataset you would normally have geographic areas as the rows, with characteristics of those geographic areas going across as columns. For example, if you extract two sf3 summary tables for all the counties in your state you would get one row per county with all the table variables (cells) appearing as columns. But maybe you would prefer to have the variables (table cells) represented as rows in your spreadsheet with attribute values for each county appearing in its own column. You can accomplish this with the options in this section. For this to work you have to start by checking the box to indicate that the output is to be transposed.

The by variables option is rarely used. It requires that the data being extracted have the same set of rows repeated for each value of these by variables. If, say, the dataset contained time series data that repeated a cycle for several values of a variable Year, you could specify Year as a by variable for the transposition. Year would then remain a column instead of becoming a row, and you would get a new row for each transposed variable within each value of Year.

The Name variables on output ... line lets you specify what the names of the transposed variables will be. You do this by specifying an alpha prefix and then specifying a variable in the dataset whose value will be appended to that alpha prefix to form the name. For example, if you were transposing a dataset with county-level summaries, you might specify a value of "c" as the prefix and the variable "fipco" for the second option. Then if an observation on the dataset had a value of "01001" for the variable fipco then all the values from that row would be transposed into a column and that column would be named "c01001". Note that the resulting name must be unique — there can be only one observation on the extracted dataset with a given value of fipco.

The Assign labels to the output ... line lets you specify the name of a variable in the dataset whose value will become the label of the variables after they are transposed. A frequently used variable for this function is Areaname, i.e. it is a character variable the value of which describes (identifies) the (old) row. The variable label appears as the second row (line) of a CSV output file, with the variable name appearing in the first row. In a report output the column headers are by default the variable names, but you can override this by checking the option in section V-b (Advanced Report Formatting) saying to use variable labels as column headers in reports.
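
For readers who know SAS, the whole of Part d behaves roughly like a Proc Transpose step along these lines (a sketch; the dataset names are hypothetical):

proc transpose data=extract out=flipped prefix=c;
  by year;               * only if you specified by variables;
  id fipco;              * value appended to the prefix to name each new column;
  idlabel areaname;      * value used to label each new column;
run;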

Transposing is a tricky feature. As a practical matter it requires that you have a set of numeric data items with similar characteristics and just one or two ID variables which are used to name and/or label the resulting columns.

Part e: Coordinates of Specified Point (distance calculations and filtering)

This section can be used to provide parameters used to select observations based upon distances from a specified point (calculated by Dexter). This feature only works provided that:

  1. The data set's observations represent geographic areas for which centroid/internal point latitude-longitude coordinates are available. They could also be point observations containing the latitude-longitude coordinates of a specific location, such as you would find on a geocoded address file.
  2. By "available", we mean that:
    • they are contained as variables named intptlat and intptlon in the data set, which must be selected in Section III of the form; or
    • such coordinates can be "looked up" by means of a geoid key variable or sumlev/geocode variable combination in the current data set (in which case these geographic key variables must be selected in Section III). Common geographic units such as state, county, place, ZIP, and PUMA should work, with more to come.
  3. In Section II. Choose Rows, you must have a condition that references the pseudo-variable _distance. Typically, you will specify a condition such as "_distance Less Than 100", which would indicate that you only want to choose geographic areas where the distance is under 100 miles. What distance? That is what the following parameters are used to specify.

The two text boxes labeled Latitude and Longitude can be used to enter the coordinates of a location that will be used to calculate the _distance values. These values should be entered as decimal fractions, with as many digits after the decimal point as you think are needed. A typical value for the latitude would be "38.9898", and for longitude "-92.313400". The leading minus sign on the longitude coordinate is optional and will be assumed if not entered. (The program only works for locations in the western hemisphere.) The trailing zeroes on the longitude value here make no difference and could be left off.

You can also enter the coordinates in degree-minutes-seconds notation rather than as decimal fractions. To do so you must use dd.mm.ss format, where dd is the number of whole degrees, mm is the number of minutes and ss the number of seconds. Both mm and ss should include leading 0s if necessary. For example "38.40.04" would represent 38 degrees, 40 minutes and 4 seconds north latitude. A value of "41.40.64" would be invalid; the value of mm and ss must be between 00 and 59. The program will recognize that you are using this format by checking for and seeing two decimal points (periods) in the entered value.
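
For reference, the conversion to decimal degrees is dd + mm/60 + ss/3600; so "38.40.04" works out to 38 + 40/60 + 4/3600, or about 38.6678 decimal degrees.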

Alternatively, you can enter a five-digit ZCTA/ZIP code in the Latitude box. The program will recognize a five-character string without a decimal point as a ZIP value. It will then do a lookup of the internal point coordinates for that ZIP code in a reference data set, and will use those values to do _distance calculations.

There are two checkbox options that follow. The first can be checked in order to specify that you want to keep the calculated _distance variable, as well as the intptlat and intptlon coordinate variables, which may have been looked up. The second checkbox lets you decide how to handle cases where the coordinates of the current observation cannot be determined: do you want to keep those observations or filter them out? The default is to filter them out, but you can easily override that by checking this box.

Additional Options at the Bottom of the Form

Rankster Option

This is another relatively new feature, added to the application early in 2010. The Rankster dynamic web application can be used to create reports and data extracts where the focus is on ranking the data based on specified variables. In many ways Rankster parallels Dexter in the way it accesses and processes datasets in the archive. There are two basic ways you can invoke the Rankster application, and both require using this line. One way is to skip Dexter entirely and take the specified dataset (in its entirety, i.e., all rows and all columns) straight to Rankster; you can do this by clicking the link at the end of this line. The other way is to specify (via the Rankster option: select list) one of the two options other than the default "Not interested" value. The two "yes" options allow you to use Dexter as a preprocessor for Rankster. You can specify that you want to view the Dexter results and then have an option to pass the results of your query to Rankster; or you can take the more I-know-what-I'm-doing approach, skip viewing the Dexter results, and go straight to the Rankster query form with the results of the Dexter query specified as the input source. Having Dexter serve as a front-end module for Rankster means we can take advantage of its nice filtering and variable-selection features without having to reimplement them within the Rankster application; it also lets Rankster share the uexplore navigation tool for choosing its data source.

Debugging Options

These options (following the final Extract Data and Reset Defaults buttons) are intended for use by the persons responsible for developing and debugging the software. Please ignore this section. This does not include the very bottom section starting with Questions and comments .... We strongly encourage you to take advantage of that section.