Block-based Inclusion Algorithm

The Missouri Census Data Center has multiple applications that require us to determine if a geographic entity (such as a census tract, block group or block) falls "within" a circular area. Our series of CAPS applications are the most notable programs needing to make such determinations.

Our usual way of handling this is to use the internal point coordinates (centroid) of the entity: we calculate the distance between this point and the center of the circular area, and if this distance is less than or equal to the radius, we include the entity. This is an all-or-nothing approach: the entity is either entirely included or entirely excluded.
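As a minimal sketch of this centroid test, the following Python computes a great-circle (haversine) distance and compares it with the radius. The haversine formula, the Earth-radius constant, and the function names are assumptions made for illustration; the text above does not say which distance formula the applications actually use.

```python
import math

# Mean Earth radius in miles (an assumed constant for this sketch).
EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def centroid_in_circle(entity_lat, entity_lon, center_lat, center_lon, radius_miles):
    """All-or-nothing test: include the entity if its internal point
    lies within radius_miles of the circle center."""
    distance = haversine_miles(entity_lat, entity_lon, center_lat, center_lon)
    return distance <= radius_miles
```

Calling `centroid_in_circle` once per tract, block group, or block yields the include/exclude decision described above.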

For example, if we are analyzing a 5-mile radius around a point in downtown St. Louis, then we look at all tracts in the area and identify those to be included in our calculations. A tract that straddles the boundary of our 5-mile circle might contain a population of 5,000 people, some of whom live within the circle and some outside. We make a guess that they should all be included or excluded based on the location of the centroid assigned by the Census Bureau. We count on the assumption that multiple tracts will straddle the boundary and that those included will roughly balance those excluded. It is not a perfect algorithm, but it works fairly well, especially when using block-level entities (as with our CAPS 2010 and Geocorr applications).

However, in the CAPS ACS application, we cannot use block-level entities, because there are no ACS data at the block level. There are data at the block group level, but fewer than at the tract level. So, we use census tracts as the entities to be aggregated when the smallest circle needed has a radius larger than three miles (or when the user specifies on the form to use tracts for smaller circles). This can lead to serious problems in trying to approximate, say, a 4-mile circle using census tracts. It is especially problematic in rural areas, where the tracts can be very large. To address this problem, we developed the block-based inclusion algorithm (BBIA) as our approach to selecting and processing geographic entities for approximating circular areas.

The BBIA Algorithm

The concept is simple enough. We have a "ground zero" location (latitude-longitude coordinates) and we have a data set with, say, census-tract-level data that include internal point coordinates for each tract. Traditionally, we would look at tracts in the area and determine which ones to select for aggregation (to the n-mile circular area) based on the distance between the tract's internal point and the ground zero point (circle center). If we had data at the block level, we could do this much better, since a typical tract comprises 10-30 blocks, which are much smaller spatially.

Although there are no ACS data at the block level, we do have 2010 decennial census data at the block level, including internal point coordinates and the 2010 population count for each block. We also have land and total area in square miles for each block. We use these block-level data to implement the BBIA as follows:

  1. For each tract (or block group), we determine whether the area might fall in the circle using a bounding-box algorithm. This is not part of the BBIA per se, but it can significantly reduce the number of areas to be processed.
  2. We look at each census block within the area and determine whether the block's internal point is within our circle. If it is, we accumulate that block's 2010 population count, as well as its land and total areas in square miles. We also know the 2010 population count for the entire census tract or block group. After processing all the blocks in the area, we have an accumulated population count for the blocks that are "inside" the circle. We divide this "inside population" figure by the area's total population to define an apportioning factor for the area, which gives a fairly good approximation of the portion of the area's population that is within the circle. Note that if the area is entirely within the circle, then all of its block points will be, too; the sum of the block populations then equals the area's total, and the apportioning factor is 1.0. This happens often, especially with larger circles.
  3. The apportioning factor is stored for use in the aggregation step. We take the ACS data at the tract (or BG) level and aggregate them using the apportioning factors. This is a familiar algorithm that we have been using for decades. We aggregate the two spatial area variables separately, without any apportioning, because apportioning spatial area by population share would be counterproductive: a larger population does not typically go with a larger spatial area. Many spatially large blocks have little (or even zero) population, and many spatially small blocks contain large populations (think Manhattan, NY).
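
Steps 2 and 3 above can be sketched in Python as follows. This is an illustration under stated assumptions, not the production implementation: the record layout (`tract`, `lat`, `lon`, `pop2010`, `land_sqmi`, `total_sqmi`), the function names, the haversine distance, and the choice of a 0.0 factor for zero-population areas are all hypothetical. The bounding-box prefilter of step 1 is omitted, and the aggregation assumes count-type ACS variables that can be scaled by a factor and summed.

```python
import math
from collections import defaultdict

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def apportioning_factors(blocks, center_lat, center_lon, radius_miles):
    """Step 2: per tract, sum the population and areas of blocks whose
    internal point is inside the circle, then divide the inside
    population by the tract's total population."""
    inside_pop = defaultdict(int)
    total_pop = defaultdict(int)
    land = defaultdict(float)
    total_area = defaultdict(float)
    for b in blocks:
        tract = b["tract"]
        total_pop[tract] += b["pop2010"]
        dist = haversine_miles(b["lat"], b["lon"], center_lat, center_lon)
        if dist <= radius_miles:
            inside_pop[tract] += b["pop2010"]
            land[tract] += b["land_sqmi"]
            total_area[tract] += b["total_sqmi"]
    result = {}
    for tract, pop in total_pop.items():
        # Assumption: a tract with no population gets a factor of 0.0.
        factor = inside_pop[tract] / pop if pop > 0 else 0.0
        result[tract] = {
            "factor": factor,
            "land_sqmi": land[tract],
            "total_sqmi": total_area[tract],
        }
    return result


def aggregate(acs_by_tract, factors):
    """Step 3: weight each tract's ACS counts by its apportioning factor.
    The area variables are summed as accumulated, with no apportioning."""
    totals = defaultdict(float)
    for tract, info in factors.items():
        for var, value in acs_by_tract.get(tract, {}).items():
            totals[var] += value * info["factor"]
        totals["land_sqmi"] += info["land_sqmi"]
        totals["total_sqmi"] += info["total_sqmi"]
    return dict(totals)
```

A tract whose blocks all fall inside the circle comes out with a factor of 1.0, matching the note in step 2, while a straddling tract contributes only the share of its ACS counts that corresponds to its inside population.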

Some interesting things to note regarding the BBIA as used with the CAPS ACS web app: