Dexter Quick Start Guide

A primer for the MCDC's Data Extracter application

There is now a video-based companion module for this tutorial. (We have replaced the original with a revised version that uses mp4 format in place of avi modules).

This brief tutorial is intended to assure the casual and/or first-time user of Dexter that the application does not have to be as complicated as the input form makes it look. To be sure, some extracts can get pretty complicated, depending on the data set and what you need to do with the data. But in many cases the program can create useful extracts with almost no input specifications from the user. The general rule of thumb is that Dexter makes it very easy to extract a lot of data; it only gets more challenging when you use it to get exactly what you want.

Note that the section headers on the Dexter query form are also hyperlinks to detailed online help.

Note: Dexter and the query form that is used to communicate your queries to it have been around for over a dozen years now and continue to evolve. The screen shots used in this document were taken in July, 2010. There is a good chance that by the time you read this some of what you see here will not be exactly as they look with the latest version. Suffice to say, we'll try to edit this module for modules that are significantly different. We encourage users to help us keep current and accurate by letting us know when you find things in this document that are just not true or that are not consistent with what you find when using Dexter.

Dexter Form Has 6 Parts - But Only One Requires Input from You

The Dexter query form is divided into 6 parts - the header area and the five query sections. We shall describe each in turn.

  1. The header area begins with links to this Quick Start Guide and to the online help module.

    The line starting with Data Set: identifies the data set being accessed. The path (directory) being accessed is displayed separately and is a link, which if clicked will take you (back, in most cases) to viewing the directory's contents via Uexplore. The data set name is displayed in a text box. Dexter allows the user to edit the name of the data set being accessed, but just because you can does not mean you should. If the set you specify does not exist (typos are easy to make and common) or if the new set specified does not have the exact same variables as the one being replaced then the Dexter application will fail (not immediately, but when you try to do the extraction).

    Note re the Rankster option: Due to lack of use we have moved the Rankster option line to the bottom of the form. We considered deleting it altogether but since we use it quite a bit ourselves we decided to keep it on the form, albeit in a location where users will not be distracted by it. So if you are looking for rankster, you'll now find it at the bottom of the (Dexter query) form.

    For many (but not all) data sets there will be a line after the Data Set line providing the user with a link to "detailed metadata" for the current data set. It is almost always a good idea to follow this link the first time you access a data set.

    The Quick Look button is useful when you want to get an idea of what the data set looks like by previewing a subset. Clicking this button displays the first 100 observations/rows of the data set (unless the data set contains fewer than 100 rows, of course) and, by default, just the identifer variables (the ones displayed in the Identifiers select list on the left side of Section III). We say "by default" because if you want you can select the columns of interest in Section III first, and then hit the Quick Look button to see just your selected variables for the first 100 rows.

    A new feature being added September, 2016 is the (Display Comments text box). Click on the line and a text box will open on the form. You can enter a description of your query here and it will be passed to Dexter and will be displayed on the Summary Log page. And if you save your query page it will also be captured there. We intend to use it ourselves to help document the modules we create for our save Queries modules.

  2. Section I is where you specify what output format(s) you want.

    If you want a comma-delimited file--which for most web users these days is tantamount to Excel because their browser has been configured to invoke Excel when they click on a link to a csv file--then you need do nothing in this section (csv stands for comma separated values). If you want your output in a report format (not usually recommended if you have lots of variables), then make a choice other than "none" from the 2nd set or radio buttons; plain text is the fastest with the least formatting and is good for quick-and-dirty queries; pdf requires the most time to generate but perhaps looks the best. HTML is somewhere in the middle in terms of resources required to generate; it looks good in your browser but may lose something when printed. If you want something in a database format the 3rd row of radio buttons lets you specify output in either dbf or SAS dataset (for Windows) format. Note that the 3 rows of radio buttons are independent of each other; you can select up to 3 separate output formats, one per row. (The only thing you should not do is choose "None" from each row, since that would result in no output. )

    The "Check here" button is an optional convenience that can be safely ignored, especially by new users.

  3. Section II not only takes up the most space on the page, but it is the one that requires the most attention for most queries.

    You can always choose to just not enter anything in this part of the form, which will result in your getting data for every row/observation in the data set. This may often be just what you want. In many cases, even if it is not exactly what you want, you can go ahead and take all of it and then delete what you don't need once you have it in your local file. You can also make use of the text box labeled ...you can limit the # of observations/rows on each output by entering a number here. By entering a small number (50, for example) you can quickly run a test of the extract and examine the results before going back and doing the final complete extract. Most data sets have variables that are keys to identifying the meaning of the data contained in a row; such identifiers are often geographic codes such as State, County, Place, etc. If you know the codes used for these variables it can be pretty simple to create a filter. For example, if the data set you are accessing indicates it has data for every county in the U.S. and you are only interested in data for the state of California then you can specify this by making choices/entries in the first of the 5 rows of entry boxes in Section II. (This is by far the hardest part of what we'll be dealing with here -- but remember, many extracts can be done without having to do any such "filtering"). The form uses drop-down menus in the first (leftmost) two columns of Section II, while the 3rd column is a text entry box where you need to type in a value.

    Note: there are also checkboxes down the left and right sides of the page in Sec. II that are used to insert parentheses into the logical expression. These are rarely used (only for complex logical expresssions involving 3 or more conditions) and should almost always be ignored, especially by new users.

    Assuming that the data set you are accessing has a variable (field) named State and that State contains a 2-digit FIPS (now called ANSI) state code (which it always will in our data collection) then you would select State from the pull-down in row 1, column 1; then select   Equal To (=)   as the value for Operator from column 2. The 3rd column is the hardest because you have to type in something with no menu from which to choose. Enter the value   06   in the text box. To do this you have to know your FIPS/ANSI state codes. You also have to understand the importance of typing the leading zero because this field is not stored as a numeric value, but as a character string. As a character string "6" is not equal to "06", so the leading 0 is required to get the proper filter. Typing a value in the 3rd column that does not match any of the values for the specified variable would result in filtering (i.e. eliminating from the resulting output) all the data and after you hit an Extract Data button would result in an error message telling you this and asking you to "Please check your filter ... and try again". At which point you simply click on your browser's Back button and try again.

    The section snapshot shown here reflects changes to the first row of the section to specify a filter condition saying we want only rows representing data for California. It also shows an entry of 100 in the box that will limit output to the first 100 cases that pass the filter.
    There are lots of examples of how to code filters in the Dexter online tutorial, and a discussion of how to use the metadata to access web pages showing you the values and meaning of those values for many key variables on the data sets. Two commonly used sources for viewing the values for coded variables are:

  4. Section III is the only section where you are required to select something. Meaning it is the

    one section where you must click or type something or Dexter will consider it an error. We could have avoided this by making the default be to keep all variables when the user made no selections. But we decided not to do that because we have so many data sets that have a hundreds and even thousands of variables and we thought it would be better to require the user to at least check a box if keeping all the variables is really what they want. While Section III may be required, and can sometimes be somewhat tedious to navigate if there are a lot of variables, it is really just a matter of making choices off a pair of select menus. We separate the ID variables from the numerics because we find that we almost always need to select a few good identifiers, and then we didn't want to mix them in with all the numerics. In many of our data sets there can be hundreds, if not thousands, of numerics. These side-by-side drop-down select menus contain variable names as well as descriptive labels in most cases. But you may still have to consult a data dictionary to make sure you know exactly what some of these variables really represent. Be sure to hold down the ctrl key when making multiple selections from these select menus.

    For some data sets (those having more than 100 numeric variables) you will see extra options below the select list labeled Filter by regular expression. We suggest first time casual users just ignore this feature until later. What it allows you to do is to enter something in the text box and then click on the Filter button to have the list regenerated so that only entries that match the text box entry appear on the menu. This can be very handy when the list has hundreds of entries. But for the large majority of data sets and queries you will never have to use this feature.

  5. Section IV tells you in its header title that what is specified here is non-essential. It is also very easy and useful.

    These are text box entries that can be used to provide title and footer labeling for report output, and for specifying a sort order for the output. This section can usually be skipped.

  6. Section V is beyond the scope of this document and can be ignored by first-time users. You should learn to do basic extracts before trying to use these advanced features. When you are ready to try these out you can follow the link (the section header) to the online documentation to see what they are all about. Not everything in this section is really all that complicated.

Running the Application and Retrieving the Results

To invoke the Dexter program to have it execute your query you need to click one of the buttons found at the end of sections III through V. You can ignore the extra row of options that follow the Extract Data button at the bottom of the page; these options are intended for internal use by MCDC staff primarily.

The results of the query will be displayed back to your browser. Wait for it to display a message saying Processing completed at <time-and-date>.. Note that at the the top of the screen - on the 2nd line - you should see a job identifier code (22JUL10_00007 as shown).

If you encounter an error while using Dexter that you do not understand, you need to report the error to the Questions and comments regarding Dexter e-mail link at the bottom of the Dexter query form and you need to include this job ID as part of the message you send. Hopefully, such problems will be rare. If the query works you will see a bolded, underlined link to a Summary Log page and then to the requested output file(s): delimited, report and/or database. The Summary Log page is your hard-copy record of what you specified and provides a concise record of the query. If you need to document your source we strongly recommend that you save this file somewhere.
The links to the output file(s) will be to temporary files created in a quasi-temporary directory created when you invoke Dexter. This directory is automatically deleted after 48 hours but can be accessed until then. (Occasionally, we may experience overload on the system causing us to run short of temporary work space, in which case your results (i.e. the output directory) may be deleted in less than 48 hours). We strongly recommend that you download your output files as soon as possible. You can do so by simply clicking on the link; what happens at that point will depend on how your browser had been configured to handle files with the specific file extension. These extensions are:

You also have the option of doing an explicit download by right-clicking on the file link and then selecting the Save Link As option. You will then be prompted to specify the target location on your system.

Note that you can share your results with colleagues or clients by right-clicking on the file links and choosing the Copy link location option to capture (copy to the Windows clipboard) the URL of the file. You can then send e-mail to the person you would like to share your results with, pasting this URL into the body of the e-mail message. This technique is used frequently by MCDC personnel who run queries for users and then simply e-mail a link to the results to the user. (Just remember the 48-hour time limit. This is not a good idea if you are running something late on a Friday afternoon, since the file may be gone by the time the user gets the message first thing Monday morning.)

If your browser is not set up to handle this particular type of file (for example, SAS data sets are rarely directly accessible within a browser) you can right-click on the file link, and choose the "Save link as..." option (or whatever equivalent your browser provides). This allows you to save the file to your desktop where you can then access it with your local software. In the case of a SAS data set the typical process involves doing a right-click and saving the .sas7bdat file to your local system. Then you can open it within a SAS desktop session by coding a libname statement referencing the directory where you saved the file and then by opening the data set within that directory. So, for example, if you save the file as C:\temp\xtract.sas7bdat on your local disk then from your SAS session you would submit the statement
libname temp "c:\temp";
If you have the FSP product on your system you could now open an FSBROWSE of SASVIEW window to view the data. Or you could run a data step that starts with something such as:
proc summary data=temp.xtract (where=[condition]); ....
Note that the file you save to on your desktop should be named xtract.sas7bdat (because the SAS data set name is actually stored within the file). If you want a different name you can rename it using SAS (use the DIR window if you have FSP, otherwise use Proc Datasets).

In many cases you may need to submit the SAS statement options nofmterr; to avoid having problems caused by references to custom SAS format codes that may appear in the data set. Do this prior to attempting to access the data set.

If you view your results and decide they are not exactly what you intended (or you just change your mind, or you like it so much you want to go back and get even more data) you should be able to use your browser's Back key to return to the Dexter query form. Unfortunately, it cannot be guaranteed that when you return that the browser will "remember" all your specifications that you just entered on the form. Our experience has been that with the Firefox browser this is never a problem, our specifications are always restored. With IE it is not always so. The Firefox browser also has a useful feature that is not available wiht IE (version 7): you can save your query to a file on your desktop by doing a File - Save as from the Dexter query screen after it has been filled out. Firefox generates code that preserves all your textbox entries and select-list/option choices. Subsequently you can open this saved htm file (in any browser, not just Firefox) and you will be back where you left off. You can even send your saved query file to someone else and tell them here is how you do it - just open this file in a browser. They can then customize the query and resubmit it. For example, if you did a complex extract of data tables for all the counties in California but your colleague in New York wanted NY counties, they could easily modify the filter row where you specified the condition State Equals 06, specifying instead: State Equals 36 and they would get comparable results for New York counties.

Getting Help With Your Query

You can always ask for help with your query. The ideal situation is to have a query system where users can go and access a wide variety of data strictly on their own without any assistance from the people who built the system. While we continue to work toward that ideal, the current reality is that in very many cases users -- especially new ones not familiar with the software and the data archive -- will simply not be able to easily find and/or extract the data they need. These users should keep in mind that the Missouri Census Data Center is housed within the Missouri State Library (within the Secretrary of State's office) and is funded by the state of Missouri specifically to provide assistance to the public in accessing the public data found in this archive. Not asking for help with your data query would be like going to the public library and being afraid to ask the librarian to help you find a book. (Helping you find the book, of course, is not the same as reading the book to you, or telling you what it "means".) As a participant in the Census Bureau's State Data Center program, the MCDC has an affiliate network of agencies throughout the state of Missouri who are available to assist you with accessing these data (see the About MCDC page for details). You can also use the e-mail link at the bottom of the Dexter query page (and this page as well) to ask for assistance. We cannot guarantee, of course, that we have the data you are looking for, or that Dexter will be the appropriate tool for accessing the data and getting it into the format you require; but we are always available to assist you as best we can with the data and tools we have available. In many cases, MCDC personnel can run the query and point you to the results much faster than they can explain to you how to do it. That is often the best solution if you think you are in that category of first- and last-time user.

To Learn More

The MCDC has a number of online tutorials (besides this one) related to Dexter and Uexplore. To see what's available visit the Uexplore/Dexter tutorials web page. (The Powerpoint modules referenced there should be helfpful, but they were really developed to be used in a hands-on workshop setting.) You can also see a collection of detailed, annotated examples of Dexter queries at the X Samples page. If you would like to see a specific Xsample let us know -- we do take requests. And for those of you who just want the facts, you really can't beat the basic online help.