A quick look at what differential privacy does for census data
A key detail about the American Community Survey (ACS), the source of our data, is that it reports "estimates." This means that the figures you get from Censtats aren't exact counts, but estimates calculated from survey responses collected over five years. As the ACS itself puts it here:
“
Estimates also give an accurate range without risking the disclosure of personal information, especially in areas with low population. More to the point of this blog post, there is another way the Census Bureau builds privacy protection into its statistics: differential privacy.
Differential privacy, in general, means injecting “noise” into data so that it stays as accurate as possible while further protecting the identities of the people it represents. In the context of the U.S. Census, small groups, like rural census blocks or underrepresented races and ethnicities, may have their data noticeably distorted by this noise, because the same amount of noise is a much larger share of a small count than of a large one. While many categories of data use differential privacy, it is not applied to state-level total population, the total number of housing units, or the number and types of group quarters (i.e. group living arrangements such as dormitories or nursing homes).
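To make the idea of noise concrete, here is a minimal Python sketch of the textbook Laplace mechanism applied to a simple count. It is an illustration only, not the Census Bureau's actual algorithm: the function names, the epsilon value of 0.5, and the example counts of 12 and 12,000 are all assumptions made up for this example.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centered on 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_count(true_count, epsilon, rng):
    # Adding or removing one person changes a count by at most 1,
    # so the textbook noise scale for a count is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)

def mean_relative_error(true_count, epsilon, trials=2000, seed=0):
    # Average size of the noise, expressed as a share of the true count.
    rng = random.Random(seed)
    total_abs_error = 0.0
    for _ in range(trials):
        total_abs_error += abs(noisy_count(true_count, epsilon, rng) - true_count)
    return total_abs_error / trials / true_count

# Hypothetical counts: a small rural block versus a large tract.
small_block = mean_relative_error(true_count=12, epsilon=0.5)
large_tract = mean_relative_error(true_count=12000, epsilon=0.5)

print(f"small block: about {small_block:.1%} average error")
print(f"large tract: about {large_tract:.3%} average error")
```

The noise is the same size in both cases, but it is a far bigger share of the small count, which is why low-population areas are the ones where oddities tend to show up.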
This has raised a number of concerns about data accuracy, such as its effect on longitudinal studies and on apportionment based on population counts for races and ethnicities, but it is all in service of overall privacy protection. In practice, this means that certain oddities will inevitably arise. As summarized in this NPR article on the matter:
“
So it’s a balancing act, really: present the data as accurately as possible and risk undermining the privacy that people are promised when they cooperate with the Census, or preserve that privacy by making the data slightly less accurate. That's where the difference between the Decennial Census and the American Community Survey comes into play: the former is about counting the exact population for congressional apportionment, while the latter is about painting a picture of changing trends and demographics over a period of time, be it 1 year or 5 years. This is why Censtats chooses to feature the ACS 5-year estimates: what they lack in recency they make up for in accuracy, and differential privacy is less likely to have a deciding influence on them than on the 1-year estimates or the Decennial Census.
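As a rough illustration of why pooling five years of responses improves accuracy, the Python sketch below compares the margin of error for a proportion at two sample sizes. It assumes a simple random sample, which is a simplification of how the ACS actually estimates variance, and the sample sizes and the 30% share are hypothetical. The 1.645 multiplier corresponds to the 90% confidence level that the ACS uses for its published margins of error.

```python
import math

def proportion_standard_error(p, n):
    # Standard error of a sample proportion under simple random sampling.
    return math.sqrt(p * (1.0 - p) / n)

def margin_of_error_90(p, n):
    # 1.645 is the z-value for a 90% confidence level.
    return 1.645 * proportion_standard_error(p, n)

# Hypothetical numbers: 400 responses in one year, five times that over five years.
one_year_n = 400
five_year_n = 5 * one_year_n
p = 0.3  # hypothetical share of households with some characteristic

moe_1 = margin_of_error_90(p, one_year_n)
moe_5 = margin_of_error_90(p, five_year_n)

print(f"1-year margin of error: +/- {moe_1:.1%}")
print(f"5-year margin of error: +/- {moe_5:.1%}")
```

Under these assumptions, five times the sample shrinks the margin of error by a factor of about 2.2 (the square root of 5), at the cost of describing a five-year window rather than a single year.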
For more info, you can check out the Census Bureau's information on differential privacy here.
Header image sourced with permission from cero ploy.