In order to answer most of these questions you are going to have to know how to use the Uexplore/Dexter web tools for accessing the MCDC data archive.
If you are not familiar with these tools we suggest you start at the Data Archive via Uexplore/Dexter section of the MCDC Quick Tour page. This will lead you to other helpful pages such as the Uexplore home page, the Uexplore Application Description page and the Dexter Quick Start Guide.
Another possible place to look for answers (not necessarily involving the data archive or Dexter) is on the MCDC's Population Estimates page, accessible from the navy blue navigation box on the upper left of most MCDC web pages.
For most of the questions you will be using Uexplore/Dexter to access data in the
/pub/data/popestsdata directory. If you have read the appropriate background material you should know that the way you access these data is to go to the Uexplore home page and navigate to the Population Estimates major category and then from there to the popests filetype directory. You should also be aware that once you arrive at that directory via Uexplore the best tool to use for actually looking for relevant datasets is the Datasets.html page. The URL for this starting point page is http://mcdc.missouri.edu/pub/data/popests/Datasets.html.While this page is not intended as an introductory learning tool, if you are one of those people who thinks they can skip the documentation and learn by just studying examples, then an alternate way to use this page is to read the questions and go straight to the answers in the back of the book (bottom of the page). We don't really recommend this approach, but we understand it. We certainly don't expect most users to be able to get all of the right answers, and so we expect that an important value of the page is to learn more about the datasets and the web software tools for accessing them by following along in the Answers section.
There is some information provided here (mostly within the Answers section) that could be construed by many as more information than they really need to know. These are usually technical asides explaining some relatively minor detail. We have used the convention of displaying such non-essential material using teal-colored italics. Feel free to skip over such material. Or, better yet, read it once and if they make no sense, don't worry about it.
For example, a convenient way to keep in synch between the Questions and Answers sections on this page (i.e be able to jump back and forth between a specific question and the corresponding answer) is to open a separate browser window displaying this same page. Scroll to the Questions section in one of the browser windows and to the Answers section on the other. Neither window needs to be very large so you can resize them so you can view all or most of a question and the corresponding answer at the same time. Some of the answers may be too long for viewing all at once, but then you simply have to do a little scrolling in that window. These two browser windows should be in addition to a third browser instance (window), where you will be either attempting to answer the questions or following along with the solutions as provided in the Answers section.
Questions (Go to Answers)
- The Census Bureau releases a set of population estimates for states and counties each year (in the post-2000-census era) showing the components of change (basically in terms of births, deaths and net migration). Use uexplore/dexter to access the dataset with components of change data for the state of Missouri over the period from 2000 to 2005. What was the change in population for the state over that time period, and how much of that change was due to Natural Increase and how much to Net Migration?
- The current Kansas City metropolitan statistical area ("cbsa") includes 15 counties, 9 from Missouri and 6 from Kansas. Keeping this in mind, generate a report in HTML format showing each of the counties in that metro area and indicating what the population change and percent change were over the period from 2000 to 2005. What county had the largest increase in population and how much was it? What was it as a percent of the base estimate?
- The Census Bureau releases a set of population estimates for places (cities) and MCD's (Minor Civil Divisions, known as towns in many areas such as New England) each year. We have created a dataset named ussc04 (for United States SubCounty 2004) that has all of these estimates for all places in the country for each year through 2004. This dataset also has summaries for things like states and counties and even for the portions of places within a county or an MCD (you'll need to look at the values of the variable SumLev). But we are only interested in complete places (cities) with a population estimate (as of 7-1-2004) of at least half a million. Generate a plain text report with each line representing one of these larger cities and showing the 2004 population estimates along with the change and percent change from the estimates base. Sort the report in descending order by the current estimate. What is the 10th largest place in the U.S. and what was its estimated population for 2004?
- Using the same ussc04 dataset as in the previous question, create a csv and an html file that have the latest (7-1-2004) population estimates and the change in population since 2000 for all the towns (MCDs) in Massachusetts. Be sure to keep the county as well as the MCD since you might have 2 towns with the same name in different counties. What was the change in population between 2000 and 2004 for Brookline town in Norfolk county?
- Some of the most complex and yet potentially useful estimates done by the Census Bureau are the county level estimates with demographic detail: age, sex, race and hispanic origin. These are large and fairly complicated datasets for a first-time user. The MCDC keeps a complete collection of these estimates, with a separate dataset for each state. To have a chance at accessing one of these datasets you have to spend some time reading the documentation provided by following the detailed metadata link for the dataset; such links are available either from the Datasets.html page link or the link at the top of the Dexter query form. Having read that metadata page carefully you should be able to attempt to answer this question. How many African Americans (alone, i.e. not counting multirace combinations) over the age of 65 are estimated to be living in Douglas County, NE as of July 1, 2004?
- In the metadata text for the casrh datasets (such as you accessed in order to answer the previous question) there is a reference to a us1dimsums_casrh04 dataset within the popests collection. This is a single dataset (instead of a set of 51 state level datasets) that contains summaries for both states and counties but only for single demographic categories. So if you are looking for combinations such as age by sex, or race by hispanic you will not find them on the us1dimsums_casrh04 dataset. But if all you are looking for are estimates of the total hispanic population then this is a good place to find it. Access popests.us1dimsums_cashr04 to create a report showing all counties in the U.S. that had at least 200,000 persons of hispanic origin as estimated on July 1, 2004. How many such counties are there?
- The MCDC has aggregated recent county level estimates to metro/micropolitan areas (also known under the umbrella term CBSA's). These aggregates are accessible via Dexter or in a special report that we stored in the Reports subdirectory of the /pub/data/popests data directory. You can find the answer to this question either by accessing the dataset or the report. (There is a link to the estimates report library on the MCDC Population Estimates page.)
What is the July 1, 2005 population estimate for the Illinois portion of the Chicago Metropolitan Statistical Area?
- A very interesting collection of alternative population estimates that use a special set of 4 race categories (white, black, Asian & Pacific Islander and American Indian, Eskimo & Aleut) were commissioned by the National Center for Health Statistics and created by the Census Bureau. We have created a subcollection of these in the nchsbri subdirectory of the popests main directory. From uexplore, access the popests filetype and then just click on the nchsbri folder link. There is a Readme.shtml file in the subdirectory that tries to explain what these data are about. And then there are lots of datasets. We have estimates for the intercensal years of the 1990's and for the post-2000 years. And we have two variations on each of these:
- the original very detailed sets with age, race, sex and hispanic cross-classified (similar to the casrh datasets but without the mutltipe summary types.)
- summarized versions of these which we think most users will find easier to use.
Using the summarized (_sumry) version of these estimates for the post-2000 years for the state of California determine the estimated number of Asian and Pacific Islanders living in Alameda County, CA as of July 1, 2004 and the number estimated to have been living there on July 1, 2000. How do these numbers compare with the total population estimated for the same dates?
- What were the 10 fastest growing counties in the United States between July 1, 2004 and July 1, 2005? (Measured in terms of total persons added rather than percentage growth). Produce a report in html format with a custom title and footnote. The report should have only the 10 data lines and it should be sorted so that the fastest growing county appears first and the 10th-fastest appears 10th. Display the change in population as well as the percent change and the latest (7/05) estimate.
Extra credit: Use the Advanced Options section to customize the appearance of your report as follows:
- have the column headers in the report be variable labels instead of names
- have the variable indicating the county's rank appear as the leftmost ID variable (replacing the generic OBS pseudo-variable).
- Use an alternate style for the output -- use Astronomy.
- Check the option (in Section I) that says you want to have the output piped directly back to you without having to see the intermediate page and having to click on the HTML file link.
- We saved the toughest one for last. But if you've come this far you can do it.
Return to the nchsbri subdirectory and access the dataset with detailed data for the state of Delaware for post-2000 years. Use the aggregation feature from Section V to create a report showing, for each county in the state, persons aged 65 and over broken down by race and sex for the years 2000 and 2004. (We want these numbers not by race and sex as separate categories, but crossed - i.e. we want white females, white males, black females, black males, etc.) Use the formats feature of Section V to associate the format code $county with the variable County so that the name prints in the report instead of the code. How many white females aged 65 and over were estimated to be in Sussex county, DE on 7-1-04?
Navigate to the popests.mocom05 dataset (i.e. the mocom05 dataset within the popests data directory). popests.uscom05 would also work, but then you would have to slog through more data and would have to (or at least want to) add a filter to select data only for Missouri. This is the dataset to use because
All of this information is readily available from the Datasets.html page in the popests data directory -- the recommended starting page for these questions.
- it has Missouri as the geographic universe
- includes state as one of the (geographic summary) Units and
- contains "..estimates with components of change for 2000 thru 2005".
The Dexter query parms are rather simple. We used a filter of
SumLev Equals 040but it would have been OK to not code a filter since it would have been easy to scroll through to the state summary if you had generated data for all 116 entities on the dataset. More important to code a filter if you accessed the uscom05 dataset.The variables needed to answer the question are popchang, natrlinc & netmig for the total change, natural increase and net migration, respectively. The variable labels on the numerics select list are sufficient for this. So, the final answer is:
Change=203,627 Natural Increase=115,403 Net Migration=88,224
Extra Credit questions:
- What are the comparable numbers for California?
- What state had the highest net migration over the 5-year period?
- How much of California's net migration over the 2004-2005 year was domestic and how much was international?
- Navigate to the popests.uscom05 dataset. You need to use a dataset that includes data from two states, so you need the United States universe. You need county level summaries and you need data for 2000 to 2005. You don't need the components of change, but so what? As long as you have what you do need, which is just total change in population over the period. The key to this question is knowing how to select just those counties that are in the KC metro area. Access the metadata page for uscom05 (use the "detailed metadata" link at the top of the dexter query form) and then follow the link to the cbsa values from the Key variables section of the metadata page. That's where you find out that
28140is the code for the Kansas City, MO-KS metro area. We use the newer/current "core-based" metro area definition with the 5-digit FIPS codes, as opposed to the old 4-digit codes that were used before we had CBSA's and that are stored on the dataset as variable MSACMSA. There are 15 counties included in the new CBSA, only 11 in the old Kansas City MSA.Code your filter as
cbsa Equals 28140.Identifier Variables to keep are County and maybe (why maybe?) Areaname. For numerics you need to keep at least PopChang and PctChang from the Numerics select list in Section III. We suggest you keep Popest as well. Save a few seconds of having to search for the largest change by coding a value of
-popchangin the Sort text box at the bottom of Section IV. This causes output to be sorted by descending value of population change over the period. (In descending order because of the leading minus sign). If you output to csv/Excel you could do this sort in the spreadsheet as well.The answer is Johnson county, KS had the largest increase (by far) with 55,086 as the change which was 12.2% of its base 2000 population.
As it turns out you do not need to keep the Areaname variable for your report. That is because the variable County displays as the name of the county, rather than as a FIPS code. The variable county is stored on the dataset as a 5-character FIPS code but it has a SAS format code associated with it ($county) which serves as a value-label lookup table that causes it to display as the name.
Extra Credit questions/exercises:
- What were the fastest and slowest growing counties in the KC CBSA during the 1-year period from 7-1-04 to 7-1-05?
- What were the ten states with the largest domestic net migration increases during the latest available year and how did these figures compare with their international net migration figures?
- Explain why coding a filter of
County Contains Johnsonon this uscom05 dataset would result in no data being selected.- You are given that the dataset to access is popests.ussc04. Access the detailed metadata for this dataset and look at values for the key variable SumLev. From this you should be able to glean that what you want to select are level 162 summaries. Code your filter as
SumLev Equals 162.
AND
popest Greater Than or Equal To (>=) 500,000
Note that we show a comma in the value entered in the second part of the filter - "500,000". We do not recommend using such formatting characters, but because many users want to enter them we have taught Dexter to ignore them. Commas and dollar signs are removed from all numeric values entered in Section II.
Code a value of-popestin the Sort text box at the end of Section IV to get the desired sort order.The answer to the question is San Jose, California with a population estimate of 904522.
- If you are still in the browser where you answered the previous question, start working on this question by using the Back button to return to the Datasets.html page and re-select the ussc04 dataset. This will display the Dexter query form. As usual, you will want to access the Detailed metadata page using the link near the top. You need to get values for only two codes in order to answer this question. You want the state code for Massachussets and the SumLev code for towns. (You are being asked to provide numbers for a specific town within a specific county - why do you not need to know the codes for these specfic areas?). You can determine that 061 is the SumLev code for towns, and 25 is the state code required. Before building the filter remember to check the HTML radio button for Listing/Report in Section I so that Dexter generates a report as well as the default csv file. Your filter definition should look as follows:
SumLev Equals 061
AND
State Equals 25
In Section 3 choose geocode, county and Areaname from the ID Variables select list; choose variables and Popest and PopChang (the first two entries) from the Numerics list.
Note that in these variable select lists the first word of the entry is the actual variable name; if the variable has a descriptive label associated with it, then you'll see a hyphen and the label text following the variable name. In the case of the first entry in the Numerics list here "popest" is the variable name and "Current Estimate (7-2004)" is the label.
You could live without geocode variable in your output perhaps but it may come in handy, especially in the csv file as a key if you ever need to link these data with any other for the same geographic areas. No need to choose State or SumLev since we know they are constants for this report. A variable does not have to be chosen in Part III in order to be used as part of a filter definition in Part II.Run the extract and then access the output HTML file by clicking on it to view the report. Use your browser's find function (usually edit - find from the menu bar or type ctrl-f) to search for "Brookline". This should lead you to answer for this question: -919. This town had a net loss of 919 people since the 2000 census. Are you certain that this is the right town, that we are really in Norfolk county here? We don't see the county name in the report, just the FIPS code for it (25021). You could repeat your find command to verify that the name Brookline does not appear again in the report. Or you could go some place (such as our Cure for the Common Codes web site) to confirm you had the right county.
There is another way you could enhance your report and verify that you had the right town in the right county. You can have Dexter display the name of the county in the output instead of the code. This requires using the dreaded Section V - Advanced Options. Under item C there is a box that lets you enter variable names and format codes to be associated with those variables. By typingin this text box you will be associating a format code named $county with the variable named county. This format code takes a 5-character FIPS county code as its argument and returns the name of the county followed by the state postal abbreviation. Note the period to the right of $county here -- that is an essential part of the required syntax here; the period is always required at the end of a format name. (Well, almost always.) If you make this entry on the form and rerun the request you will see that the County column of the report and spreadsheet now contain county names instead of codes. If you really want to make your report look professional and cool, you can take advantage of another Section V option. Enter the value "County" (without the quotes, of course) in the "by variables for report" box under section b. This indicates that a sort will be required, but in this case the data are already sorted by the county values. Try making this entry and see how the report is changed; the value of county is now shown on a separate "by line" instead of as a column of the report.county $county.- This is where we really separate the men from the boys. To begin with -- even before you tackle the task of really reading a detailed metadata sheet -- you need to figure out which metadata file you'll be reading. Which means you have to first determine which of the nearly 100 files shown on the Datasets.html directory page has the data you need. To figure that out, you have to remain calm and look around. Carefully peruse the page, focusing on the Label column where a short description of each dataset is given. If you do this you should soon notice that about half the Label values are essentially the same, with the only difference being the first word of the label, that being a state postal code (ca, ny, mo, etc). For example, you should see around line 23 an entry for dataset alcasrh04, with the label "al county estimates by age, sex, race, hisp:2000-2004". The "al" at the beginning is short for Alabama (see the entry in the Geographic Universe column to confirm this). This is the kind of dataset we are looking for. Now just go down and find one that indicates it has data for ne - Nebraska. They are in alphabetical order within this group of datasets, so it shouldn't be too hard to find - necashr04. Click on the name and when the Dexter query form appears, follow the link to the detailed metadata page for the dataset. Spend a minimum of 10 minutes reading this page. Focus on the Usage Notes section - that is where we have attempted to explain how these datasets are organized.
Most people are unable to use these datasets because they don't understand the SumType variable. It is complicated, but not as bad as it first appears. The complexity is the price we pay for a great deal of flexibility. By filtering using values of SumType you have access to a wide array of different population summary types. (In many cases, these summaries are not explicitly present in the data as released by the Census Bureau - they had to be calculated by the MCDC during our conversion process.) Read the first and seventh usage notes carefully and follow the link to see the values of the SumType variable. Your filter will have 4 parts: summary type, age, race and geographic area. You need to use the Key value links to determine the codes required for this filter. You want to choose a SumType value that indicates a summary by race (alone) and age (because that is what you need; sex and hispanic are not relevant for this query). What value(s) of age do we want? What value of race? And what value of County? Look these up and write them down. Then code your filter as follows:
The next-to-last usage note deals exactly with the case you have, where you want to code a filter for age combined with race alone, the latter being indicated by a lower-case "r" in the SumType value. The leading tilde (~) entered as the value in the first row of the filter indicates to make a case sensitive selection. (Actually, omitting the tilde would not cause the filter to fail in this case, since the race filter would eliminate the cases where it was not a race alone.) Note the required leading zero for the code value. For the age variable we want the value 14 thru 18 so we use the Greater Than of Equal To comparison operator. We can expect to see one output row/line for each of these 5 categories (age cohorts). And, of course, the last line selects Douglas county using its FIPS county code.SumType Equals ~ar.
AND
Race Equals 02
AND
Age Greater Than or Equal To (>=) 14
AND
County Equals 31055It does not really matter much what variables you select in Section III, as long as you include popest_2004. You might want to choose County, Age, Sumtype and Race, just to be sure that your filters worked. You might even want to include variables sex and hispanic just to verify that they are blank on each of the selected observations, as expected because of the SumType filter. You should get 5 lines of output. You'll need to sum up the 5 popest_2004 values that are displayed to get the final answer - there are an estimated 4167 African Americans over the age of 65 in Douglas County, NE as of July 1, 2004.
Save yourself the trouble of having to add those 5 numbers together by letting Dexter do it for you. Go to Section V and fill in the first box there -- aggby variables. Enter
County race in that box. This will cause Dexter to sort the extracted data by county and race (not really necessary in this example since it is already sorted -- there is only a single value for each of these variables) and then aggregate (sum) the numeric variables, outputting one row/observation per combination of the aggby variables. In this case, there is only one such combination. But can you see how easy it would be to just drop the county filter and then just aggregate by county? The result would be a report/file of African Americans over 65 for every county in the state.- Hopefully you are familiar with the basic drill by now. Find the dataset on the Datasets.html page and click on the name. From the dexter query form click on the link to the Detailed metadata page for the us1dumsums_casrh04 dataset. You'll want to limit your query to selections (rows) that are county level summaries, that are summaries for the hispanic population, and that have at least 200,000 hispanic population, as estimated in 2004. This means you'll need to know the SumLev code indicating a county summary and you'll need to know how to recognize a summary of hispanic persons. Look at the values of the SumLev and Hispanic key variables. Then code your filter as:
But what about filtering on SumType, you might ask? Shouldn't we specify that the value of sumtype has to be "h", to indicate a summary based on the values of hispanic (non-hispanic vs. hispanic)? You could, but it turns out not to be necessary. That is because you are already filtering using theSumLev Equals 050.
AND
hispanic Equals 2hispanic EQUALS 2filter. If this condition is true then sumtype must be equal to h. Because the way it works the value of "irrelevant" category variables will always be blank on this dataset. There are 4 such variables - age, sex, race and hispanic - and of these 4 only one will have a non-blank value on any observation. This is what we mean when we say these are "1-dimensional" summaries.There is a mistake in the Usage Notes section. If anyone can find it and report it (using the feedback button at the very bottom of the page) then we'll fix it.
Pick any variables you like from Section III. County and popests_2004 would certainly seem appropriate. To get the answer to the question just view the output and note that your report has 38 rows corresponding to the 38 U.S. counties that had at least 200,000 hispanics in 2004.
- An interesting choice/conundrum. Search for the report containing the answer or use your newly-acquired facility with using Dexter to grab the required number? The answer is the same either way, but we'll outline the process for accessing it both ways.
Let's start with accessing the report first. The hint told you there is a link to our estimates reports from the MCDC Population Estimates page, and we provided a link to that near the top of this page. So go to that page and look for the link to the Reports directory (its in the 2nd paragraph, and bolded, at least for now). Clicking on this link causes the index.shtml page for the Reports directory to display, with descriptions of several reports which serve as links to those reports as well as links that will take you to a Dexter query form for the report datasets. Find the entry with the descriptive hyperlink text: Latest estimates for U.S. totals and metro/micropolitan areas (CBSAs) with components of change since 2000 (pdf). Its a pdf file so you'll need to have the Adobe reader (or better) on your machine to access it. Click on the link to display the report. Use the Adobe find command to quickly locate the string "Chicago". Within seconds you should be able to determine that the correct answer is 8,585,411. Read the value in the Areaname column carefully, since there are 4 consecutive lines that have a summary related to Chicago. This is because it is a large metropolitan area that covers parts of 3 states. The report shows data for the portions within each of those 3 states along with the metro area tri-state total. You want the line that indicates "IL part" - the Illinois part.
The hardest part of finding the answer via Dexter is determining the dataset. Of course, if you followed the path to the report library described in the previous paragraph you could use the Reports index page to tell you that the relevant dataset is uscomcbsas05. Nothing that you could not have determined by a careful study of the Datasets.html page (look for United States in the Geographic Universe column and then for something other than the usual state, county or place in the Units column; you'll see that the value for the uscomcbsas05 dataset is
US cbsa cbsa/state, indicating the dataset has a national summary (US) as well as summaries for complete cbsa's and for cbsa's within state.) The latter is what we want. Click on the uscombscbsas05 name in the first column of the Datasets.html page to display a Dexter query form and go to the Detailed metadata page from there, as usual.Follow the link to display the values of the key variable cbsa, and from that determine that the code for the Chicago MSA is
16980. No need to filter using SumLev or State. Just go ahead and display any line that is relevant to Chicago and you can pick the one you need from that. So your filter becomes simply:. From Section III go ahead and select all the identifiers and then just the popest variable from the Numerics list (noting how much other detail is available, if needed on another day).cbsa Equals 16980Just to make it interesting, choose HTML format in Section I and then in Section V-b do 3 things to enhance (or at least modify) the format of the report:
- Check the box that says "Use variable labels as column headers in reports"
- In the "ID variables for report" text box type in a value of Areaname
- In the "Style to use for html or pdf ouptut:" pull-down select "Brick" as the alternate style ( a combination of font and color selections, etc.)
You should see the same population estimate numbers you saw on the pre-defined report. (The answer to the question, as mentioned above, is still 8,585,411). But you might find some of the other ID variables to be of some interest. Or not.
Since you have now used some of the tools in the Advanced Options section, you are now an Advanced User. As a reward, on your next lunch hour stay inside and follow the link to the Advanced Options section of the Dexter online help (by clicking on the Section V header/hyperlink) and reading it through. Some of it is really not all that advanced. And some of it you just may want to skip.
- You need to find your way back to the popests directory via Uexplore. (Note there is a link to this at the top of the Datasets.html page for that directory). From there click on the nchsbri subdirectory. Then click on the Readme.shtml file and at least skim it. That is your best bet at trying to get a quick handle on what these datasets are about. You don't really need to understand all the finer points of bridged race processing in order to do the extract and get the right answer, of course.
After getting what you can from the Readme page, return to the directory and click on the Datasets.html page, as usual. Look for the dataset that has post-2000 estimates for the state of California and is a summarized version. That would be canchsbridged20xx_sumry. Select it to get to the Dexter query form for that dataset and from there, of course, follow the link to the detailed metadata page. Read the Description and Usage Notes carefully. At this point you should come to realize that what we have here is a fairly simple, but very long dataset. "Long" in the sense that there are a lot of rows, because we have one for each county for each year for each single year of age starting with 0 (representing infants under 1 year of age) through the final 85 and over category. There is also a total-across-all-ages row. That makes 87 rows and these are repeated for each year (2000 thru 2004), for a total of 435 rows per geographic area (state or county). The beauty of Dexter is that we can easily ignore all the rows that we don't care about. We don't care about age here, just the total counts of Asian & Pacific Islanders. Our filter then needs to specify the geography we want (Alameda county), the years (2000 and 2004) and the fact that we want the summaries where age is not relevant. Usage note 4 on the metadata page tells you exactly how to handle this last criteria. So here is the filter:
In Section 3 be sure to keep County and Year as your identifiers and Totpop and AsianPI as your numerics. Extract to find that the answer is of 1,450,398 total persons estimated to be living in Alameda county in 2000, 330,415 were Asian or Pacific Islander. In July of 2004 the numbers were 1,455,235 total persons, of whom 371,430 were Asians or Pacific Islanders.County Equals 06001.
AND
year InList 2000:2004
AND
age Equals _Extra Credit question/exercise: Repeat this query except look only at the population under the age of 18. (There is a Usage Note that is very relevant to this.)
What (you may ask) is going on with the key values page for the variable Age? It shows values from 00 up to 85, each with an N of 295. That is 86 different values, each occurring 295 times which comes to 25,370 total. But the total rows in the dataset as shown on the metadata page is 25,665. That leaves 295 rows unaccounted for. What's going on? The answer is that the program that generates the key values frequency page has a quirk; it ignores blank (or, more precisely, "missing") values. Those 295 missing cases are the ones where the value of age is blank. Which were the exact ones we were interested in for this exercise.
- Some things we probably don't need to even mention any more. You start with the Datasets.html page for the /pub/data/popests data directory displayed in your browser. You need to find a dataset that has the United States as its universe and that includes County as a geographic unit. And it has to contains estimates for 2004 and 2005. That dataset is, I think we can all agree, uscom05. So choose it and go straight to the detailed metadata page for it.
This problem is different from most of the earlier ones in that the key here is not knowing how to access the values of key variables in order to create a filter (Section II emphasis), but is rather about understanding what the variables/columns are about. A couple of the Usage Notes entries on the metadata page help you out by attempting to explain the variable naming conventions used. Most normal human beings will not have a clue as to what the notes mean unless they have actually looked at a complete list of the variables on the dataset. Such a list can be easily accessed from the metadata page by following the link to the Qmeta report for the dataset. This link is on the line that starts with
Access the dataset, containing links to Dexter and to the qmeta application. (There are also links to qmeta from the Dexter query form, in Section III.) The variables that should catch your attention are the "1yr" variables, i.e. the ones that have 1yr as the last characters of their name. These variables, popch1yr and pctch1yr, contain just what you are looking for: the total change and percent change in population over the most recent 1-year period. The other thing that you should observe about these datasets are the ranking variables. These are mentioned in one of the Usage Notes and, of course, you can see them in the Qmeta reports, complete with labels that describe what they are ranks for. It turns out there is one these, rankch1y, that is perfect for what we want to do here.In Section I be sure to specify that you want to generate an HTML format report. In Section II the filter is really pretty simple, once you become aware of the rankch1y variable:
rankch1y Less than or equal to (<=) 10
AND
SumLev Equals 050The second part of the filter is used to select only the county level summaries, omitting the state summaries. The dataset label is a bit misleading since it does not mention state; but the Units field indicates state level data and, of course, the very presence of a SumLev variable on the keyvars list tells you that there are probably mutltiple geographic summary levels.
The variables you need to select in Section III are County, popest, popch1y pctch1y and rankch1y .
This is the first time you've had to use Section IV. While the options here may be "non-essential" they may be required if you are particular about the titling and footnoting of an output report. We really don't care what you use for your title or footnote, just be sure to enter something in both of these text boxes (you can even go crazy and enter a subtitle if you want) and note how that text magically appears at the top and bottom of your html report page. We also suggest entering rankch1y in the Sort text box so that the output is displayed in rank order. The answer is not a number, it's a report, but it should show Maricopa county, AZ as the fastest growing county with a net increase of 136,941 persons in 2004-2005, while Lee county, FL came in 10th with a net increase of 29,835 persons.
For the Extra Credit options you need to do the following:
- In Section I be sure to check the None radio box for the delimited file line and then check the Check here if requesting only a report in plain text or html format that you want piped directly to your browser. box at the bottom of the section.
- In Section V-b check the box Use variable labels as column headers in reports box.
- Enter rankch1y as the value for the Id variables for report box.
- Click on the Style to use.. pulldown menu and select astronomy .
Note how much more interesting this report looks (stylewise) than all the others you have generated so far.
- The dataset to use is denchsbridged20xx. You do not want the _sumry version because you need to access data that have race crossed with sex, rather than summarized separately. On these datasets every row contains the same basic kind of summary. They are all county-level in terms of the geography and there are always the 4 demographic dimensions or categories - age, sex, race and hispanic. These category variables are never blank or irrelevant, they always have a non-missing significant value. The race categories here are the so-called "bridged" race values that are the reason these files exist. No race alone vs. race alone or in combination to deal with as we did when we accessed the casrh estimates in an earlier question. (That is the kind of complexity and inconsistency with previously-released data that NCHS wanted to avoid when they commissioned the Bureau to create these alternative estimates.) What we want to do is boil this set of 4,128 data cells down into an output dataset that has a summary for each county (there are only 3 counties in Delaware, that's why we chose it), each race (4) within that county, and then each sex (2) within that race within that county. So we should have 3 x 4 x 2 = 24 rows of output. What about the age and hispanic dimensions? Age becomes not a category within the report like race, but instead will be used to limit the universe of the report -- all the numbers being reported are understood to be only of persons who are at least 65 years of age. This will be accomplished using a filter to limit the singel-year age categories and then by doing an aggregation that ignores age (and hispanic) in order to create the desired summaries.
Note that there is no Year category variable on this dataset as we have seen on many of the others. The time dimension in this dataset is handled across the rows; each of the numeric variables represents an estimate of persons for a different point in time (year).
Note also that all the ID variables on this dataset are short codes. The meaning of these code values are given in Usage Note 3 on the detailed metadata page.
Pick a report format from Section I. HTML probably makes the most sense.
Code the age filter in Section II:
age Greater than or equal to (>=) 65In Section III select variables county, race, sex (Identifiers) and PopJL00 and PopJl04 as Numerics.
Section V is the crucial part for this example. The aggregation specifications subsection (a), is the most critical. For aggby variables we need to specify the 3 categories we want left after aggregation; enter county race sex as the value for this list. In subsection b enter County in the By variables for report text box. For ID varibles you might want to enter Race. In subsection c, in the Format text box enter the value County $county. (do not forget to type that period at the end of the format). This causes the variable county to print as the name of the county rather than the FIPS code. (Follow the Online doc link, the section header, to get more detailed background info regarding all these options.)
The answer is there were an estimated 16,692 white females living in Sussex county, DE on 7-1-04.
The Agg Level option has a default value of 1. This says to generate only a single level of summarization. The aggby list is county race sex so the value of 1 says generate output summaries when the 1st variable from the right changes value (it is also considered to change if any of the variables to its left change). If you specify a value of 2 for this (Agg Level) option then the program will also generate a summary row when the value of the second variable from the right changes. Try it with this example. You should see output for white males in Dent county on the first line, while females on the second line and then total whites on the third line (a county-race summary as opposed to the previous two county-race-sex summaries). Specifying a value of 3 for agg level would result in an additional line being generated after all the county-race-sex and county-race summaries that shows the totals for the entire county. Entering a value of 4 for agg level would be a mistake.
Please use the (Questions/Comments) line at the bottom of the page to send us feedback regarding this page. Thank you!
This file last modified Friday June 02, 2006, 13:06:08
|