In business, it is critical that you understand your customer.
Accurate site selection, store planning, and efficient marketing all depend on having up-to-date data. If that data is about the American people and their behavior, there is a good chance that it is rooted in the U.S. Census. Although the Census Bureau collects and releases other data more frequently, the decennial Census happens only once every 10 years and is the only time that everyone is counted. The most recent Census was scheduled for 2020. If you’re thinking it, you’re correct: it couldn’t have come at a worse time!
The Pandemic Disrupted the 2020 Census
Although the Census collects much of its data through mail and digital questionnaires, visiting households in person is critical to ensure an accurate count. In 2020, the COVID-19 pandemic caused countless problems for this data collection effort, ultimately delaying the release of most of the Census 2020 data products.
But COVID-19 wasn’t the only problem. When releasing data, the Census Bureau uses various methods to protect individuals’ privacy. Our data partner Synergos Technologies – publisher of STI: PopStats and other data sources that are partly built on Census data – discusses this process in a recent blog. Here’s what they had to say:
The Census Bureau had a major early issue with conducting the 2020 Census, the Covid-19 pandemic. The effect of this was a delay of only a few months. Industry demographers expected state-by-state releases between August and December 2021. But in fall 2021, the Census Bureau announced that this data would not be released until it could be revised for differential privacy noise injection. What’s that? This is the Bureau’s new mechanism for keeping personally identifiable information confidential. This further delayed an already late release.
What does this mean for the data?
As often happens with new processes, things didn’t go quite as planned. It’s a complicated issue, but essentially this attempt to make the data more confidential resulted in less accurate data. As another data partner of ours – Applied Geographic Solutions (AGS) – covered in excellent detail in a recent blog, this effort results in “statistically impossible data,” including:
- Entire blocks of unsupervised children in households (no adults)
- Ghost communes, where there are occupied dwellings with no people
- Baseball team size families, complete with a stocked bullpen
In short, this differential privacy effort is goofing up the Census data, rendering the data that data providers like Synergos and AGS rely upon much less reliable.
Does this mean the data your business is using is inaccurate?
The demographics and population numbers SiteSeer users count on every day are not suddenly less accurate due to the Census Bureau’s missteps and delays. Companies like STI and AGS use many other sources besides the Census and complex proprietary processes to ensure that the data they provide is accurate and reliable.
SiteSeer continues to get regular releases of this data and complete updates as it has in previous years. For our data providers, the decennial Census is an opportunity to measure their numbers against the Census’s ground truth and make adjustments as needed. Since it has been more than 10 years since the last Census, data is likely to be at its least accurate point.
This doesn’t mean that the data is inaccurate. It just means that the data will be more accurate and consistent once the 2020 Census data is incorporated.
When will the data in SiteSeer reflect the new Census data?
To rebuild its STI: PopStats data based on new 2020 geographies, Synergos needs the Demographic and Housing Characteristics File that includes household, age, sex, race and other data. There’s no precise date when the Bureau will have this available, but some estimates say summer 2022, which would mean Synergos would update its data in October 2022 if at all possible. For the end user, that likely means they will first see this data in January 2023. Of course, if the data is available sooner, then we will make it available in SiteSeer sooner.
Are you hearing that other companies have 2020 Census data already? (Spoiler alert: not exactly!)
Some companies are claiming that they have Census 2020 data available to their users today. Although it is true that the Census 2020 boundary files are now available, the granular Blockgroup and Block-level Census data is not.
In reference to companies’ claims that they have Census 2020 data and have curated that data, AGS states, “in none of these cases have we seen any mention about the data quality and usability issues. Most users remain blissfully unaware of these issues, and it is incumbent upon responsible data suppliers to ensure that users understand the limitations of any database they provide.”
How is SiteSeer adjusting to these data quality issues?
At SiteSeer, we understand that data is the lifeblood of our system. We work closely with our many data partners to understand the issues and communicate them to our users. We also trust that our data partners are aware of any issues affecting their data and are tirelessly working to make the data as accurate as possible. We do not create or alter the data, but instead we implement processes to make the data easier to use and more powerful for our users doing retail analytics.
One of the changes SiteSeer has made for STI: PopStats users in the January 2022 update of the data is in where population is located. Since the lowest level at which most data are provided is the Blockgroup, and Blockgroups can be quite large at the edges of cities and in rural areas, we use block allocation to decide where the population is within the Blockgroup. But when your blocks are also 10+ years old, you have a problem.
This is best illustrated with an example:
- Let’s say that the Blockgroup shown below had 40 people in 2010 and 1,000 people in 2022.
- If your trade area includes the entire Blockgroup, then your reports and models will report the correct 1,000 population.
- But let’s say your trade area only includes the areas shown inside the green polygon. Because we are using 2010 blocks to allocate population and there are no 2010 blocks (blue Xs) outside of the green area, all 1,000 persons will be placed in, or “allocated” to, the area inside our trade area, which we know is incorrect.
- Arguably, this is the biggest issue that will be fixed with the release of the new Census.
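For readers who like to see the arithmetic, the example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not SiteSeer's actual allocation code: the `Block` class, the function names, and the split of the 40 people from 2010 into two blocks of 25 and 15 are all hypothetical, and we assume population is allocated in proportion to each block's 2010 count.

```python
from dataclasses import dataclass


@dataclass
class Block:
    block_id: str
    base_pop: int        # population counted for this block in the base (2010) Census
    in_trade_area: bool  # True if the block falls inside the trade area polygon


def allocate_population(blockgroup_pop, blocks):
    """Split a Blockgroup estimate across blocks in proportion to base_pop.

    Assumption for illustration: simple proportional allocation.
    """
    if not blocks:
        return {}
    total = sum(b.base_pop for b in blocks)
    if total == 0:
        # No base population anywhere: fall back to equal shares.
        return {b.block_id: blockgroup_pop / len(blocks) for b in blocks}
    return {b.block_id: blockgroup_pop * b.base_pop / total for b in blocks}


def trade_area_population(blockgroup_pop, blocks):
    """Sum the allocated population of the blocks inside the trade area."""
    allocated = allocate_population(blockgroup_pop, blocks)
    return sum(allocated[b.block_id] for b in blocks if b.in_trade_area)


# Hypothetical 2010 blocks: both sit inside the trade area (the green polygon),
# and together they hold the 40 people counted in 2010.
blocks_2010 = [Block("A", 25, True), Block("B", 15, True)]

# The current Blockgroup estimate is 1,000 people.
misallocated = trade_area_population(1000, blocks_2010)
print(misallocated)  # 1000.0: every person lands inside the trade area
```

Because the only blocks the allocation knows about are the 2010 blocks, the growth that happened elsewhere in the Blockgroup has nowhere else to go.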
SiteSeer Has Fixed This Misallocation Issue
After learning that the Census data was further delayed, we decided it was critical that we attempt to fix this misallocation issue.
After much research and analysis, we determined that the location of population within about 3-4% of the Blockgroups in the U.S. had changed so dramatically that any improvement to the allocation of population in these Blockgroups was better than doing nothing.
Since 2020 block boundaries have been released – just not their population – we knew approximately where the population should be within the Blockgroup. In the example above, we know that the Blockgroup has 1,000 persons, and we know that the population is located in the areas indicated by the red circles, the 2020 blocks. We also have data such as streets and other data sources to make further estimates of where population is today. From there we were able to make reasonable improvements to our block allocation process to better allocate population.
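To make the idea concrete, here is a minimal sketch of weighting the allocation by 2020 block locations. It is not our production process: the block IDs, the weights (imagine something like residential street length inside each 2020 block), and the function name are all hypothetical stand-ins for the kinds of signals described above.

```python
def allocate_by_weight(blockgroup_pop, weights):
    """Split a Blockgroup estimate across blocks in proportion to a weight.

    `weights` maps a block ID to a non-negative weight. The weights here are
    a hypothetical proxy (e.g. street length), since 2020 block population
    is not yet available.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one block needs a positive weight")
    return {block_id: blockgroup_pop * w / total for block_id, w in weights.items()}


# Hypothetical 2020 blocks: A and B are inside the trade area,
# C and D are new blocks outside of it where the growth happened.
weights_2020 = {"A": 1.0, "B": 1.0, "C": 3.0, "D": 5.0}
inside_trade_area = {"A", "B"}

allocated = allocate_by_weight(1000, weights_2020)
improved = sum(pop for block_id, pop in allocated.items() if block_id in inside_trade_area)
print(improved)  # 200.0 under these made-up weights
```

The Blockgroup total is unchanged at 1,000; only the placement of people within the Blockgroup moves. The specific numbers depend entirely on the assumed weights, which is why the result is still an estimate.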
Does this get us to the level of accuracy that the release of Census 2020 Block-level population will allow us to achieve?
No. Everything between the 2010 and 2020 Censuses is an estimate. This process simply improves our estimate. If the highest population accuracy is required, the only way to achieve that is by keeping Blockgroups whole. If that is important to you, then using SiteSeer’s build-from-Blockgroup trade area tool is an option. For most users, splitting Census Blockgroups in high-growth areas will be necessary, and these improvements should provide an increased level of accuracy until the new Census data is released.
More questions? Sign up for a SiteSeer demo.
Hopefully we have provided some clarity to a complex issue and helped you feel more confident in your understanding of the data. It is best to understand the flaws in your data so you can use experience and judgment to overcome them.
As we have discussed, data is always an estimate. Whether that estimate comes from the Census applying privacy adjustments, your data provider blending various data sources to account for changes and growth, or your software provider improving processes for how population is allocated, the data you use represents the best estimate for a snapshot in time. Thus, it is best not to think of data as right or wrong but rather in degrees of accuracy. The release of the Census will not make the data “right,” but it will hopefully make it better.