----- November 17, 2014 -----

New Document Deals with the MCDC Data Archive

"Ten Things to Know About the OSEDA/MCDC Public Data Archive" provides an array of useful information for anyone interested in exploring our public archive of census and other data. We have numerous documents (tutorials, sample queries, video training modules, etc.) that deal with how to use the Uexplore/Dexter software tools, as seen in this excerpt from the data archive home page:


But until now we had only a single module (a 7-year-old PowerPoint presentation) that dealt with the underlying data collection itself. Now there is a more up-to-date and easier-to-access module that covers many of the same topics. A brief summary of the ten "things" follows.

  1. It's a big collection of data. We cite some statistics to summarize just how big the collection is (366 gigabytes, FWIW).

  2. The data are organized into dozens of categories that we refer to as "filetypes". We offer a brief explanation of how and why.

  3. Most of the datasets in the archive (over 90%) contain geographic area summaries. This seems obvious, but a lot of people think we have individual census returns that we can summarize any way we choose.

  4. Become comfortable with the SumLev and State variables. Sometimes we have separate datasets for each state and for each kind of geography within the states. But lots of other times we have all the summary levels and/or all the states in a single dataset. All you need to know to get (for example) just the state- and county-level data for Illinois is how to code simple filters using these two variables.

  5. Datasets.html files are your friends. They can make navigating and understanding the archive a lot easier.

  6. It's not just Missouri data. It used to be, a rather long time ago, but it has become a lot cheaper and easier to cover the whole country.

  7. Knowledge of SAS(r) is not necessary. However... You can use these tools without any knowledge of SAS, but there are a few places in the Advanced Options section where knowing some SAS can come in handy.

  8. The most popular datasets in the archive are mostly those based on the 2010 decennial census and the American Community Survey. Half of the hits on the archive in a recent two-year period were against just four filetypes: two 2010 census collections and the two most recent ACS collections.

  9. Detailed Tables vs. Standard Extracts. This is where we explain the relationship between filetypes such as sf12010 and sf12010x.

  10. ACS Summary Tables Do Not Have A Separate Filetype. We have hidden them in special subdirectories of the acs filetype directories. They are worth finding.
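To make the SumLev/State filtering idea in item 4 concrete, here is a minimal Python sketch of the same logic you would express in a Dexter filter. It assumes the dataset holds all summary levels and all states together, that SumLev and State are stored as character codes ("040" for states, "050" for counties, "17" as the FIPS code for Illinois), and that rows have already been read into dictionaries. The rows and the `filter_rows` helper are hypothetical and for illustration only; this is not Dexter's own syntax.

```python
# Hypothetical rows, as they might look after reading a dataset that mixes
# summary levels and states. Variable names SumLev and State follow the post;
# the rows themselves are invented for illustration.
ROWS = [
    {"SumLev": "040", "State": "17", "AreaName": "Illinois"},
    {"SumLev": "050", "State": "17", "AreaName": "Cook County, IL"},
    {"SumLev": "050", "State": "29", "AreaName": "Boone County, MO"},
    {"SumLev": "160", "State": "17", "AreaName": "Chicago city, IL"},
    {"SumLev": "040", "State": "29", "AreaName": "Missouri"},
]

STATE_LEVEL = "040"   # census summary level code for states
COUNTY_LEVEL = "050"  # census summary level code for counties
ILLINOIS = "17"       # FIPS state code for Illinois


def filter_rows(rows, sumlevs, state):
    """Hypothetical helper: keep rows whose SumLev is in sumlevs and whose State equals state."""
    return [r for r in rows if r["SumLev"] in sumlevs and r["State"] == state]


# Keep only the Illinois state and county summaries.
selected = filter_rows(ROWS, {STATE_LEVEL, COUNTY_LEVEL}, ILLINOIS)
for row in selected:
    print(row["SumLev"], row["AreaName"])
```

Note that the place-level row for Chicago (SumLev "160") and the Missouri rows are dropped; both conditions must hold for a row to be kept.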

Five-year ACS Estimates To Be Released December 4

The much-anticipated 2009-2013 period estimates are coming soon.