Blog

Webinar recording: Exploring census data with generative AI and Cantabular

On 18th January, we hosted a webinar to consider in practical terms how large language models (LLMs) could be used to surface official statistics. We showed working examples that came out of experiments our team carried out during an internal hack day.

Helping users find data faster with taxonomies

We’ve added taxonomies to Cantabular, enabling search filtering functionality for ready-made tables and helping users find the data they need faster.

Creating an Irish translation of the 1911 Ireland census report

When the 1911 Irish census report was released in 1913, it was published exclusively in English. Today, we are pleased to release our interpretation of how this report might have appeared if it had originally been presented in the Irish language.

Automating publication of the 2021 Northern Ireland Census

The Northern Ireland Statistics and Research Agency (NISRA) has launched a new product, a “Flexible Table Builder”, allowing users to create their own outputs from the 2021 Northern Ireland Census, powered by our software Cantabular.

Webinar: Publishing census data twice as fast with the ONS

On 27th April, we hosted a webinar to present our recent work with the ONS, helping them to automate publication, and aspects of privacy protection, for the 2021 England and Wales census.

Publishing census data twice as fast with the ONS

Earlier this week, the ONS released its “Create a custom dataset” product for Census 2021, powered by our software Cantabular, helping the ONS release billions of statistics far more quickly than ever before.

Internationalisation and right-to-left language support in Cantabular

A couple of years ago, we added the capability to localise data and user interfaces in Cantabular for different languages.

While our core data service handles data only in a single language, our metadata service supplements this by allowing labels and other reference metadata to be loaded in multiple languages so that all metadata for populations, variables and categories can be translated.

Creating a data pipeline for machine-readable metadata

Over the last few years we have worked with the ONS to help automate the production of outputs for the 2021 census. As part of that work, we have created a data pipeline so that when cross-tabulations are built on demand, metadata that relate to them can also be put together on demand.

Real-time tabulation and perturbation of census results

The Office for National Statistics (ONS) is using Cantabular to generate outputs for the England and Wales Census 2021. What are the key reasons for Cantabular’s existence and how have they been achieved? In this blog post we outline some of the approaches we’ve taken to achieve performance goals in Cantabular and how these have enabled new approaches for Census 2021.

Transforming the England & Wales Census with Cantabular

This autumn the UK Office for National Statistics (ONS) has turned on Cantabular to power the production of outputs from the 2021 England and Wales census. This marks an important milestone in a five-year journey that started with a simple question from the ONS: “Could we build a system to generate tables of census data on demand with privacy protections applied in real time?”

JSM 2022: Presenting our work on the 2021 England and Wales census

We are presenting a poster case study of our work with the Office for National Statistics on the 2021 England and Wales census at the Joint Statistical Meetings (JSM) 2022, organised by the American Statistical Association.

Flexible publication of metadata with GraphQL

Last year we added a new capability to Cantabular: the dissemination of structured metadata alongside the real-time creation of safe cross-tabulations from large datasets, which the rest of Cantabular focuses on.

Demonstrating Cell Key Perturbation in Python

Over the past few months, the Sensible Code team has put together an example implementation of the cell key method, a perturbation method that adds noise to frequency tables in order to protect against differencing attacks. It is one of the disclosure control algorithms in our product Cantabular. This example has been written using Python in a Jupyter notebook.
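As a rough illustration of the idea, here is a minimal sketch of cell key perturbation in Python. It is not the notebook's code or Cantabular's implementation: the key range of 256, the function names and the toy perturbation table are all assumptions made for this example. Each record gets a random record key, the keys of the records in a cell are summed modulo the key range to give a cell key, and the cell key selects the noise to add. Because the same set of records always yields the same cell key, the same cell receives the same noise whichever table it appears in, which is what makes differencing harder.

```python
import random
from collections import defaultdict

KEY_RANGE = 256  # assumed size of the record-key space (hypothetical)


def assign_record_keys(n_records, seed=0):
    """Give every record a fixed pseudo-random key in [0, KEY_RANGE)."""
    rng = random.Random(seed)
    return [rng.randrange(KEY_RANGE) for _ in range(n_records)]


def toy_ptable(count, cell_key):
    """Hypothetical perturbation table: maps (count, cell key) to noise.

    Real perturbation tables are designed by statisticians and usually
    vary with the cell count; this toy version only looks at the key.
    """
    if count == 0:
        return 0  # empty cells are left alone in this sketch
    if cell_key < 16:
        return -1
    if cell_key >= KEY_RANGE - 16:
        return 1
    return 0


def perturbed_table(categories, record_keys):
    """Build a one-way frequency table and perturb each cell."""
    counts = defaultdict(int)
    key_sums = defaultdict(int)
    for category, record_key in zip(categories, record_keys):
        counts[category] += 1
        key_sums[category] += record_key

    result = {}
    for category, count in counts.items():
        # The cell key depends only on which records fall in the cell.
        cell_key = key_sums[category] % KEY_RANGE
        result[category] = max(0, count + toy_ptable(count, cell_key))
    return result
```

For example, `perturbed_table(["a", "a", "b"], assign_record_keys(3))` returns a dictionary of perturbed counts keyed by category. Running it again with the same records and keys gives the same answer.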

Republishing the historic 1911 Irish census as an interactive dataset

Today we are releasing a new public website that makes the returns from the 1911 Irish census available as a preliminary statistical release to be queried by anyone.

All kinds of cross-tabulations and analysis of this data that were previously impossible are now easily accessible as open data through our user interface and API.

Guest post: Cantabular: a new use-case for OpenStreetMap

A guest post by Ciáran Staunton. OpenStreetMap is not merely a map of roads, buildings, land uses, coastal areas and mountain tops. OpenStreetMap contains many hidden treasures, which take the form of wonderful and obscure boundaries. These have been added by Ireland’s OpenStreetMap contributors, in the hope that such things will provide some context to places, but also in the hope that one day they could be directly re-used by those that may wish to download them.

Realising the historical value of the Irish Censuses of Population

Since 1841 the periodic Census of Population has been the cornerstone of official statistics in Ireland, particularly in the context of providing comprehensive information on demographic and social conditions. While some efforts were made to undertake censuses prior to 1841 they are generally considered to have been of poor quality for various reasons.

Unlocking the value in historic records with Cantabular

At the end of April we at Sensible Code are going to run an event to demonstrate our privacy-preserving software Cantabular using the 1911 Ireland census (the last all-Ireland census) to show how modern technologies have the potential to unlock the value hidden in historic records, for the benefit of academics, researchers, statistical organisations and wider society.

Automating disclosure checks with our Disclosure Rules Language

Last week we released a new version of Cantabular with a big new feature: a disclosure rules language.

The disclosure rules language, or DRL, is a tool to help statisticians automate decisions about table publication which they might previously have made using manual analysis techniques.
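To give a flavour of the kind of decision that can be automated, here is a small Python sketch of a table-level disclosure check. It is not written in DRL and does not reflect its syntax: the function name, the two rules and the threshold values are hypothetical, chosen only to show a manual "is this table too sparse to publish?" judgement expressed as code.

```python
def passes_disclosure_checks(cell_counts, max_ones_fraction=0.1, min_mean_count=5.0):
    """Return True if a table looks safe to publish under two toy rules.

    Rule 1 (hypothetical): at most `max_ones_fraction` of cells may
    contain exactly one person.
    Rule 2 (hypothetical): the mean cell count must be at least
    `min_mean_count`.
    """
    if not cell_counts:
        return False  # nothing to assess, so refuse by default

    n_cells = len(cell_counts)
    ones = sum(1 for count in cell_counts if count == 1)
    mean_count = sum(cell_counts) / n_cells

    return ones / n_cells <= max_ones_fraction and mean_count >= min_mean_count
```

A dense table such as `[10, 12, 8, 20]` passes both rules, while a sparse one such as `[1, 1, 0, 2]` is rejected.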

ONS hackday Census data unplugged

We’d planned a hack day at the Office for National Statistics for our work on Census 2021 towards the end of May. But the arrival of Covid-19 meant we had to pivot to making the event virtual.

Go Cantabular!

We decided to use the Go programming language (also known as Golang) to build Cantabular and have found it a great fit. It is compact, readable, secure and fast: a perfect combination for a strategic product handling large amounts of sensitive data.

Cantabular product launch

Our team here at Sensible Code have been busy for a few years working on an innovative privacy-preserving technology called Cantabular. Cantabular uses highly performant implementations of disclosure control algorithms to protect data in real time as a user or researcher makes a query.

Press release: ONS UK selects Cantabular for Census 2021

The UK-based Office for National Statistics has selected Belfast company Sensible Code and its privacy-preserving technology Cantabular for disseminating confidential Census 2021 data.

Modernising statistics — keeping data safe

Statistics professionals within public sector agencies take great care in how they process and protect personal data and this is reflected in the trust and respect they enjoy from their customers and the public at large. GDPR has thrown a further spotlight on governance around data confidentiality.