Over the past 9 months we’ve added substantial capability to TableBuilder. The end game is to allow the Office for National Statistics (ONS) to publish more 2021 Census data tables for England and Wales, much sooner, while ensuring that the data is kept confidential.
The Office for National Statistics is a trusted organisation: it commits to ensuring that the personal information people provide will be kept safe and secure. It also has to comply with confidentiality requirements set out in the Statistics and Registration Service Act 2007, the Data Protection Act 1998 and the forthcoming General Data Protection Regulation.
Publishing more tables for diverse areas
Access to finer-grained data across England and Wales means that decisions based on census data can be better informed. Disclosure control rules are complex and have historically been applied manually. As a result, they were mostly applied uniformly across geographical areas, because there are too many output areas to consider individually: about 180,000 in England and Wales. This created a problem: tables that could safely be produced for diverse areas (like Barnet in London) were not published, because a similar table would be too disclosive in, for example, Northumberland.
Using TableBuilder, the ONS can create a set of rules which are applied uniformly across the country. However, the effect of the rules is not uniform, because they depend on data that varies from area to area. The rules are evaluated independently for every area in a user’s query (up to 180,000 areas), and data is only published for those areas where all the rules pass.
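The per-area evaluation described above can be sketched as follows. This is a minimal illustration, not TableBuilder’s implementation: the two rules (`min_total_rule`, `max_small_cells_rule`), their thresholds and the area names are all hypothetical, since the ONS rules themselves are not described here. What it shows is the shape of the logic: the same rules everywhere, evaluated against each area’s own data, with an area published only if every rule passes.

```python
from typing import Callable, Dict, List, Tuple

# One area's table: maps a combination of category labels to a count.
Table = Dict[Tuple[str, ...], int]
# A rule is a predicate over one area's table; True means the rule passes.
Rule = Callable[[Table], bool]


def min_total_rule(threshold: int) -> Rule:
    """Hypothetical rule: the table's total count must reach `threshold`."""
    def rule(table: Table) -> bool:
        return sum(table.values()) >= threshold
    return rule


def max_small_cells_rule(small: int, max_fraction: float) -> Rule:
    """Hypothetical rule: at most `max_fraction` of non-zero cells may be below `small`."""
    def rule(table: Table) -> bool:
        nonzero = [c for c in table.values() if c > 0]
        if not nonzero:
            return True
        small_cells = sum(1 for c in nonzero if c < small)
        return small_cells / len(nonzero) <= max_fraction
    return rule


def publishable_areas(tables: Dict[str, Table], rules: List[Rule]) -> List[str]:
    """Evaluate every rule independently per area; keep only areas where all rules pass."""
    return [area for area, table in tables.items()
            if all(rule(table) for rule in rules)]


# Hypothetical data for two areas.
tables = {
    "area_A": {("0-15", "M"): 40, ("0-15", "F"): 35, ("16+", "M"): 120, ("16+", "F"): 130},
    "area_B": {("0-15", "M"): 2, ("0-15", "F"): 1, ("16+", "M"): 30, ("16+", "F"): 3},
}
rules = [min_total_rule(50), max_small_cells_rule(small=5, max_fraction=0.25)]
print(publishable_areas(tables, rules))  # ['area_A']
```

The rules are identical for both areas, yet only `area_A` is published, because the outcome depends on each area’s own counts.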
With TableBuilder we can now:
- Process more complex input data
- Set disclosure rules to determine which tables can be published
- Automatically preserve ‘structural zeros’
- Demonstrate the user interface
- Generate tables orders of magnitude faster than a conventional database
Process more complex input data
We’ve transformed TableBuilder so that it can handle the volume of data generated by the Census (over 56 million people living in more than 23 million households, according to the last Census). We’ve loaded a full geographic hierarchy for England and Wales (about 180,000 output areas), added mappings (a.k.a. grouping or banding) and loaded ONS-supplied perturbation parameters.
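To make “mappings” concrete, here is a minimal sketch of banding, assuming a mapping is simply a lookup from a detailed category (single year of age) to a coarser one. The bands in `AGE_BANDS` and the helper names are hypothetical and are not the ONS classifications.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical age bands: (label, lowest age, highest age), inclusive.
AGE_BANDS: List[Tuple[str, int, int]] = [
    ("0-15", 0, 15),
    ("16-64", 16, 64),
    ("65+", 65, 200),
]


def build_mapping(bands: List[Tuple[str, int, int]]) -> Dict[int, str]:
    """Expand band definitions into a lookup from single year of age to band label."""
    mapping: Dict[int, str] = {}
    for label, lo, hi in bands:
        for age in range(lo, hi + 1):
            mapping[age] = label
    return mapping


def band_counts(ages: Iterable[int], mapping: Dict[int, str]) -> Counter:
    """Count records per band by applying the mapping to each record's age."""
    return Counter(mapping[age] for age in ages)


mapping = build_mapping(AGE_BANDS)
print(dict(band_counts([3, 15, 16, 40, 64, 65, 90], mapping)))
# {'0-15': 2, '16-64': 3, '65+': 2}
```

Keeping the mapping as data rather than code means new groupings can be loaded without changing the query engine.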
Set disclosure rules
The ONS has developed a set of disclosure rules which determine whether a table can be published. These rules are configurable with numeric threshold parameters that determine their sensitivity.
We created a disclosure rules editor to allow the ONS team to experiment with adjusting the rule parameters. Using this tool, they can immediately see how a change affects the number of tables that could be published across the country.
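The editor’s feedback loop can be sketched as re-evaluating a rule for several candidate thresholds and counting how many areas would pass. The single minimum-count rule, the area totals and the function names are hypothetical; the real rules and their parameters are set by the ONS.

```python
from typing import Dict, Iterable


def count_publishable(area_totals: Dict[str, int], threshold: int) -> int:
    """Number of areas whose total passes a hypothetical minimum-count rule."""
    return sum(1 for total in area_totals.values() if total >= threshold)


def sweep(area_totals: Dict[str, int], thresholds: Iterable[int]) -> Dict[int, int]:
    """Re-evaluate the rule for each candidate threshold, as an editor might on every change."""
    return {t: count_publishable(area_totals, t) for t in thresholds}


# Hypothetical per-area totals for one table.
area_totals = {"area_A": 325, "area_B": 36, "area_C": 110, "area_D": 80}
print(sweep(area_totals, [25, 50, 100, 200]))  # {25: 4, 50: 3, 100: 2, 200: 1}
```

Raising the threshold makes the rule more protective but reduces the number of areas for which the table can be published. This is the trade-off the editor makes visible.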
How it works
Preserve structural zeros
Some combinations of categories cannot occur: for example, people under 16 in full-time employment. Zeros in the output table for these combinations are deemed “structural”.
To avoid exposing the workings of the perturbation algorithm, we need to make sure that structural zeros in the output table are not perturbed: a count of people under 16 in full-time employment that appeared as 1 would reveal that noise had been added. TableBuilder does this by first running the user’s query at a higher-level geography to determine which zeros must not be changed.
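The following is a minimal sketch of this idea. It assumes a cell key style of perturbation, in which each record carries a random key and a cell’s key is the sum of its records’ keys modulo 256. It also assumes that a combination which is zero at the higher-level geography is treated as structural. The real ONS perturbation parameters are not reproduced. `toy_perturbation` is a deliberately crude, hypothetical stand-in, and all the data is invented.

```python
import random
from typing import Dict, List, Set, Tuple

Cell = Tuple[str, str]  # (age band, employment status)


def record_keys(n: int, seed: int = 0) -> List[int]:
    """Give each record a random key in 0..255 (seeded for repeatability)."""
    rng = random.Random(seed)
    return [rng.randrange(256) for _ in range(n)]


def cell_key(keys: List[int]) -> int:
    """A cell's key is the sum of its records' keys modulo 256."""
    return sum(keys) % 256


def toy_perturbation(count: int, ckey: int) -> int:
    """Hypothetical stand-in for the ONS-supplied perturbation parameters.
    Counts of 10 or less move up by 1 when the cell key is even and down by 1
    (never below zero) when it is odd; larger counts are unchanged."""
    if count > 10:
        return 0
    return 1 if ckey % 2 == 0 else -min(1, count)


def structural_zeros(higher_level: Dict[Cell, int]) -> Set[Cell]:
    """Cells that are zero at the higher-level geography are treated as structural."""
    return {cell for cell, n in higher_level.items() if n == 0}


def perturb_area(area: Dict[Cell, List[int]], higher_level: Dict[Cell, int]) -> Dict[Cell, int]:
    """Perturb one area's table; `area` maps each cell to its records' keys."""
    protected = structural_zeros(higher_level)
    out = {}
    for cell in higher_level:
        keys = area.get(cell, [])
        if cell in protected:
            out[cell] = 0  # structural zero: never perturbed
        else:
            out[cell] = len(keys) + toy_perturbation(len(keys), cell_key(keys))
    return out


# Hypothetical higher-level counts: nobody under 16 works full time.
higher = {("0-15", "Full-time"): 0, ("0-15", "None"): 500,
          ("16+", "Full-time"): 900, ("16+", "None"): 400}
keys = record_keys(30, seed=1)
area = {("0-15", "None"): keys[:12], ("16+", "Full-time"): keys[12:25],
        ("16+", "None"): keys[25:]}
print(perturb_area(area, higher))
```

Without the structural-zero check, the toy perturbation would turn the empty under-16 full-time cell into a 1. The higher-level pass keeps it at zero.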
Demonstrate the user interface
We’ve built a test interface to allow the ONS to show its users how the system works and to facilitate user research.
Why performance matters
TableBuilder has been engineered from the ground up for performance: it answers queries with sub-second response times. In that time it scans 56,000,000 rows, computes results at two different geographic levels, applies the cell key perturbation and evaluates multiple business rules on up to 180,000 areas.
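One way to compute results at two geographic levels without scanning the rows twice is sketched below. The code counts at output-area level in a single pass and then rolls those counts up to the higher level. This is an assumption for illustration, not a description of TableBuilder’s internals. The integer-coded columns and the `oa_to_la` lookup are hypothetical.

```python
from array import array
from collections import Counter
from typing import Dict, Sequence, Tuple


def two_level_counts(oa_codes: Sequence[int], category_codes: Sequence[int],
                     oa_to_la: Dict[int, int]) -> Tuple[Counter, Counter]:
    """Scan the rows once, counting (output area, category) pairs,
    then derive higher-level counts from that aggregate rather than rescanning rows."""
    oa_counts = Counter(zip(oa_codes, category_codes))
    la_counts: Counter = Counter()
    for (oa, cat), n in oa_counts.items():
        la_counts[(oa_to_la[oa], cat)] += n
    return oa_counts, la_counts


# Hypothetical integer-coded columns: one entry per person.
oa_codes = array("i", [0, 0, 1, 2, 2, 2])
category_codes = array("i", [0, 1, 0, 0, 0, 1])
oa_to_la = {0: 0, 1: 0, 2: 1}  # hypothetical lookup from output area to local authority
oa_counts, la_counts = two_level_counts(oa_codes, category_codes, oa_to_la)
print(dict(la_counts))  # {(0, 0): 2, (0, 1): 1, (1, 0): 2, (1, 1): 1}
```

The roll-up step touches only the aggregated cells, whose number is bounded by areas times categories, not by population size. This is why the second geographic level adds little to the cost of the row scan.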
What happens next?
The dissemination team at the ONS will show the system to some of their users and conduct more user research and testing.
Suzie Dunsmith (Head of 2021 Census Outputs at ONS):
“Users have always wanted the data as soon as possible after the Census. We expect this desire to be strongly reflected again in our forthcoming public consultation on outputs, launching on 28 February this year. This work paves the way for us to be able to meet this demand.”
This blog tells you how we got here! Expose the value — Protect the data.