Transforming the England & Wales Census with Cantabular
This autumn the UK Office for National Statistics (ONS) turned on Cantabular to power the production of outputs from the 2021 England and Wales census. This marks an important milestone in a five-year journey that started with a simple question from the ONS: “Could we build a system to generate tables of census data on demand with privacy protections applied in real time?”
Not only did our team deliver on this technical challenge, but the result proved to be the foundation on which we could deliver the additional value that makes Cantabular what it is today: a suite of software modules that delivers high-performance data publication with real-time privacy protection and integrated data and metadata, all surfaced through a powerful GraphQL API and an easy-to-use user interface.
From an idea to a thing
Turning an idea into a real thing is not easy. A project can lose momentum, blow its budget before delivering value, or lack the skills or leadership it needs. So many things can go wrong.
So how did we avoid these numerous pitfalls? When the ONS came to us, they didn’t ask our team to build everything that Cantabular can do today; rather, they had the wisdom to get us to focus on a small but difficult challenge: ‘automate privacy protections at speed’.
Proving in just three months that we could deliver on this core technical challenge had several positive consequences. We developed confidence in our technical approach. The ONS developed trust in our capacity to deliver working software. Together we learned that we could use an agile delivery process. Confidence, trust and agile working would mark our next five years of collaborative engagement.
Eating an elephant
Desmond Tutu once said “there is only one way to eat an elephant: a bite at a time.”
This approach has guided our work with the ONS. We ruthlessly focussed on building only essential components, seeking to deliver the simplest thing that could possibly work. This helped us ship software quickly and allowed the ONS to give us valuable insight for future development.
Living in an Agile world
Throughout our work with the ONS we have followed an Agile software development approach. This included frequent releases, deploying working software, listening to feedback and iterating quickly.
Show and tells
We worked really hard to communicate our progress using show and tells where a wide ONS audience could see and understand what was being worked on, providing them with a valuable forum to give feedback. One statistician, having seen a demo of Cantabular, observed that it was like “moving from black and white to colour TV”.
We also worked with the ONS to set up joint show and tells with teams from different parts of the programme, all showing how their work fitted together into a larger end-to-end solution. Sharing the same space kept us all aware of joint progress and of the challenges we were facing.
Retrospectives
While delivering software using agile was second nature to our team, it was through retrospectives that we confirmed we were going in the right direction. Focusing on what went well, what didn’t, and how to improve helped us all get better at delivery and built a collaborative team approach.
Being friendly
While working with the ONS remotely during the COVID-19 pandemic, we found that one way to encourage people to engage was to share a short weather report before starting our meetings. This kept things light and encouraged people to talk in what were sometimes challenging technical discussions. We had regular reports from Beirut, Belfast, Basingstoke, Newport and Titchfield, which showed the geographic range of the contributors and their different working environments, and gave us something that we could all share in.
What does the future look like for Cantabular?
As we plan further developments for Cantabular, we are firmly focused on some specific improvements we want to make:
Handling origin-destination or flow data: as part of our ongoing engagement with the ONS we are extending the capabilities of Cantabular to handle origin-destination data, such as migration or travel to work. This typically involves creating very large and sparse output tables which cannot be held entirely in memory, meaning a new algorithmic approach is necessary.
Pivoting tables: Cantabular presents tables of data with one observation per row, which is the best format for data analysis and for accessibility. However, it is not the traditional grid of numbers that experienced census users are used to seeing. We’re adding a drag-and-drop capability to our user interface to allow users to pivot their data however they want, see the effect immediately and download pivoted data files.
Machine-readable metadata: the UK government recommends that publishers support open standards for supplementing tabular data outputs with structured metadata. We’re adding CSVW (CSV on the Web) as an optional output format to our UI so that outputs can be made available with machine-readable metadata.
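To make the pivoting item above concrete, here is a minimal sketch of turning a one-observation-per-row table into the traditional grid layout. It is an illustration only, not how Cantabular implements pivoting: the `pivot` and `to_csv_lines` helpers, the variable names and the counts are all made up for the example.

```python
# Illustrative sketch only: helper names, variables and counts are
# hypothetical and are not taken from Cantabular or from census data.

# Long format: one observation per row.
rows = [
    {"region": "North", "age": "0-15", "count": 10},
    {"region": "North", "age": "16-64", "count": 20},
    {"region": "South", "age": "0-15", "count": 30},
    {"region": "South", "age": "16-64", "count": 40},
]


def pivot(rows, row_var, col_var, value_var):
    """Group long-format rows into a grid keyed by row label, then column label."""
    col_labels = []  # column labels in order of first appearance
    grid = {}        # row label -> {column label -> value}
    for row in rows:
        col = row[col_var]
        if col not in col_labels:
            col_labels.append(col)
        grid.setdefault(row[row_var], {})[col] = row[value_var]
    return col_labels, grid


def to_csv_lines(row_var, col_labels, grid):
    """Render the grid as CSV text lines, leaving missing cells empty."""
    lines = [",".join([row_var] + col_labels)]
    for label, cells in grid.items():
        values = [str(cells.get(col, "")) for col in col_labels]
        lines.append(",".join([label] + values))
    return lines


col_labels, grid = pivot(rows, "region", "age", "count")
for line in to_csv_lines("region", col_labels, grid):
    print(line)
```

Swapping the `row_var` and `col_var` arguments gives the transposed grid, which is the kind of rearrangement a drag-and-drop pivot lets a user make interactively.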