Using the Cantabular API to Query Structured Data and Custom Metadata

In a previous blog post we wrote about flexible publication of metadata using our Metadata GraphQL service. In this post, we discuss our extended Application Programming Interface (API), which brings together metadata from the metadata service and data from Cantabular in a single, flexible endpoint to drive user interfaces (UIs) and visualizations.

Brief introduction to GraphQL

GraphQL is a query language for APIs that gives clients more control over the data they request. Instead of relying on fixed endpoints that return a predefined structure, GraphQL lets clients specify exactly which fields they want, in a single request. This approach avoids the common problems of under-fetching and over-fetching, and simplifies how frontends interact with backend data sources.

Unlike traditional REST APIs, which often require multiple round-trips to combine related information, GraphQL queries can follow relationships between objects in a flexible and predictable way. Clients get a clear, typed view of the data that is available, and the server enforces structure and validation through a schema.

Cantabular extended API

The Cantabular API uses GraphQL because it offers a flexible and precise way for clients to request exactly the data they need. That flexibility is essential for our use case, where it's not just the data that matters but also the metadata that describes it.

In our API, metadata is a first-class part of the query model. With the metadata service, the metadata schema is fully customisable by the data owner, supports multiple languages throughout, and can be queried alongside the data it describes.

Since Cantabular serves structured datasets along with rich metadata, GraphQL was a natural fit. It lets clients query the data and structural metadata from the Cantabular server, together with the reference metadata (such as labels, definitions and classifications) from the metadata service, all in a single request.

Building a simple client app using the Cantabular extended API

Let's walk through the GraphQL queries that could drive a simple frontend UI for data discovery and table viewing, using historic data from the 1911 Census of Ireland.

We won't build a complete app here; this is a quick walk-through showing how the API can provide the data that drives each selection.

Our imaginary app will present a list of datasets to pick from. When a dataset is chosen, the app will let the user select and filter variables (and view their metadata), and then view a table of data.
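Behind each query in this walk-through, the client makes an HTTP request. GraphQL servers conventionally accept the query as a JSON body (`{"query": ...}`) in a POST request, and the Python below sketches that transport step. The endpoint URL `https://example.com/graphql` and the helper names `build_request` and `run_query` are hypothetical placeholders, not taken from the Cantabular documentation; check the documentation for the real endpoint and whether authentication headers are needed.

```python
import json
import urllib.request

# Hypothetical placeholder: substitute the extended API URL for your deployment.
ENDPOINT = "https://example.com/graphql"


def build_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Wrap a GraphQL query in a JSON POST request, without sending it."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(query: str, endpoint: str = ENDPOINT) -> dict:
    """Send the query and return the decoded JSON response (needs a live server)."""
    with urllib.request.urlopen(build_request(query, endpoint)) as resp:
        return json.load(resp)


req = build_request('{ datasets(lang: "en") { name label } }')
print(req.get_method(), req.full_url)  # POST https://example.com/graphql
```

Keeping request construction separate from sending makes it easy to inspect or test the request without a server.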

Listing available datasets

We start by getting a list of available datasets. We’re going to make requests for content in the English language, but as the dataset has Irish translations, we could just as easily specify lang: "ga" instead of lang: "en" in our requests.

{
  datasets(lang: "en") {
    name
    label
  }
}

The API responds with exactly the data we requested: a list of datasets with their names and labels. In this case there is only one dataset available:

{
  "data": {
    "datasets": [
      {
        "label": "1911 Irish census records",
        "name": "Ireland-1911-preview"
      }
    ]
  }
}

We can use the list of datasets to populate a dataset picker in the UI. The dataset label will be displayed to the user, while the name can be used as the dataset key in subsequent API calls to fetch more detailed information.
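A minimal sketch of that step in Python: building the dataset-list query for a chosen language code, and mapping the decoded response shown above into picker options. `PickerOption`, `datasets_query` and `picker_options` are illustrative names, not part of any Cantabular client library; the check for a top-level `errors` list follows the general GraphQL response format.

```python
import json
from typing import NamedTuple


class PickerOption(NamedTuple):
    """One dataset picker entry: the label is displayed, the name is the API key."""
    name: str
    label: str


def datasets_query(lang: str) -> str:
    """Dataset-list query for a language code such as "en" or "ga".

    json.dumps quotes and escapes the code as a valid GraphQL string literal.
    """
    return "{ datasets(lang: %s) { name label } }" % json.dumps(lang)


def picker_options(response: dict) -> list[PickerOption]:
    """Turn the decoded JSON response to the datasets query into picker options."""
    if response.get("errors"):
        # GraphQL reports failures in a top-level "errors" list.
        raise RuntimeError(response["errors"])
    return [PickerOption(d["name"], d["label"]) for d in response["data"]["datasets"]]


print(datasets_query("ga"))  # { datasets(lang: "ga") { name label } }

# The English response shown above, as decoded JSON.
response = {"data": {"datasets": [
    {"label": "1911 Irish census records", "name": "Ireland-1911-preview"}
]}}
for option in picker_options(response):
    print(f"{option.label} -> {option.name}")
```

Switching the UI language is then just a matter of passing `"ga"` instead of `"en"`.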

If we want a little more metadata about the dataset (for example, when the user clicks an “info” link), we can fetch a fuller description as follows:

{
  dataset(name: "Ireland-1911-preview", lang: "en") {
    description
  }
}

The description is returned from the API ready for us to populate a modal dialog box or a tooltip shown on hover:

{
  "data": {
    "dataset": {
      "description": "This is an interactive dataset covering people in households and communal establishments in the 1911 Census in Ireland and based on digitised census returns published on the National Archives of Ireland website. The dataset is an advance preview and should only be used for demonstration rather than statistical purposes."
    }
  }
}

Discovering available variables

Once the user has picked a dataset, we want to show them a list of variables to choose from:

{
  dataset(name: "Ireland-1911-preview", lang: "en") {
    variables(first: 3) {
      edges {
        node {
          name
          label
        }
      }
    }
  }
}

Lists of variables and categories returned by the API support pagination. In this example we fetch just the first three variables; additional GraphQL properties let a client resume from the correct position in the list and determine whether there are more results to come.

{
  "data": {
    "dataset": {
      "variables": {
        "edges": [
          {
            "node": {
              "label": "Age (5 year bands)",
              "name": "age_22cats"
            }
          },
          {
            "node": {
              "label": "Age",
              "name": "age_78cats"
            }
          },
          {
            "node": {
              "label": "Birthplace",
              "name": "birthplace"
            }
          }
        ]
      }
    }
  }
}
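Fetching a longer variable list page by page can be sketched as a loop. This sketch assumes the lists follow the common Relay-style connection pattern, with a `pageInfo { hasNextPage endCursor }` field and an `after:` argument; those field and argument names are assumptions, so confirm them against the schema before relying on them. The network call is passed in as a function, and `fake_fetch` with its `NODES` list is invented stand-in data so the loop can run without a server.

```python
import json
import re
from typing import Callable, Iterator

# Assumed Relay-style fields (pageInfo, endCursor, after): verify against the schema.
PAGE_QUERY = """{
  dataset(name: %s, lang: %s) {
    variables(first: %d%s) {
      edges { node { name label } }
      pageInfo { hasNextPage endCursor }
    }
  }
}"""


def iter_variables(fetch: Callable[[str], dict], dataset: str,
                   lang: str = "en", page_size: int = 3) -> Iterator[dict]:
    """Yield every variable node, requesting one page at a time.

    `fetch` takes a query string and returns the decoded JSON response.
    """
    cursor = None
    while True:
        after = "" if cursor is None else ", after: %s" % json.dumps(cursor)
        query = PAGE_QUERY % (json.dumps(dataset), json.dumps(lang), page_size, after)
        page = fetch(query)["data"]["dataset"]["variables"]
        for edge in page["edges"]:
            yield edge["node"]
        if not page["pageInfo"]["hasNextPage"]:
            break
        cursor = page["pageInfo"]["endCursor"]


# Invented stand-in data and server, using the list index as the cursor.
NODES = [{"name": f"var{i}", "label": f"Variable {i}"} for i in range(7)]


def fake_fetch(query: str) -> dict:
    match = re.search(r'after: "(\d+)"', query)
    start = int(match.group(1)) if match else 0
    size = int(re.search(r"first: (\d+)", query).group(1))
    chunk = NODES[start:start + size]
    end = start + len(chunk)
    return {"data": {"dataset": {"variables": {
        "edges": [{"node": n} for n in chunk],
        "pageInfo": {"hasNextPage": end < len(NODES), "endCursor": str(end)},
    }}}}


print([n["name"] for n in iter_variables(fake_fetch, "Ireland-1911-preview")])
```

Because `iter_variables` is a generator, a UI can also stop early, for example after filling one screen of results.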

As with datasets, variables also have descriptions. Variables also carry an additional piece of custom metadata: a URL linking to information about the terminology. We can access custom metadata properties through the meta field:

{
  dataset(name: "Ireland-1911-preview", lang: "en") {
    variables(names: ["birthplace"]) {
      edges {
        node {
          meta {
            url
          }
        }
      }
    }
  }
}

The response gives us the URL for the terminology about birthplace:

{
  "data": {
    "dataset": {
      "variables": {
        "edges": [
          {
            "node": {
              "meta": {
                "url": "https://cantabular.com/ireland-census/1911-methodology/#birthplace"
              }
            }
          }
        ]
      }
    }
  }
}

In this request we focused on a single variable, but these fields can be added wherever appropriate, so we could equally return the custom metadata URL as part of the earlier paginated variable list.
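A sketch of that combination: one variable-list query that also asks for the custom `meta { url }` field, and a helper that flattens the `edges`/`node` nesting into simple records for the UI. Because `url` belongs to this dataset's custom metadata schema, the helper tolerates it being absent. The names `VARIABLES_WITH_URL_QUERY` and `variable_records` are illustrative.

```python
VARIABLES_WITH_URL_QUERY = """{
  dataset(name: "Ireland-1911-preview", lang: "en") {
    variables(first: 3) {
      edges { node { name label meta { url } } }
    }
  }
}"""


def variable_records(response: dict) -> list[dict]:
    """Flatten edges/node nesting into {name, label, url} records.

    Missing fields (including a missing meta or url) become None.
    """
    records = []
    for edge in response["data"]["dataset"]["variables"]["edges"]:
        node = edge["node"]
        meta = node.get("meta") or {}
        records.append({
            "name": node.get("name"),
            "label": node.get("label"),
            "url": meta.get("url"),
        })
    return records


# The single-variable response shown above requested only meta.url,
# so name and label come back as None here.
response = {"data": {"dataset": {"variables": {"edges": [
    {"node": {"meta": {"url": "https://cantabular.com/ireland-census/1911-methodology/#birthplace"}}}
]}}}}
print(variable_records(response))
```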

Creating a data query to get figures

Let’s imagine the user wants data for a simple breakdown by marital status:

{
  dataset(name: "Ireland-1911-preview", lang: "en") {
    table(variables: ["marital_status"]) {
      dimensions {
        variable {
          label
        }
      }
      values
    }
  }
}

The API responds with the data values:

{
  "data": {
    "dataset": {
      "table": {
        "dimensions": [
          {
            "variable": {
              "label": "Marital status"
            }
          }
        ],
        "values": [
          246354,
          2655371,
          1180608,
          205983,
          93196,
          1449
        ]
      }
    }
  }
}

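The values are in row-major order (in a table with several variables, the last variable varies fastest). To make sense of them, that is, to know which category each value relates to, we need the categories for each dimension, which we can request in the same query: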

{
  dataset(name: "Ireland-1911-preview", lang: "en") {
    table(variables: ["marital_status"]) {
      dimensions {
        categories {
          code
          label
        }
        variable {
          name
        }
      }
      values
    }
  }
}

The response now includes enough information to interpret the data values:

{
  "data": {
    "dataset": {
      "table": {
        "dimensions": [
          {
            "categories": [
              {
                "code": "-1",
                "label": "Not specified"
              },
              {
                "code": "0",
                "label": "Single"
              },
              {
                "code": "1",
                "label": "Married"
              },
              {
                "code": "2",
                "label": "Widow"
              },
              {
                "code": "3",
                "label": "Widower"
              },
              {
                "code": "4",
                "label": "Other/not classified"
              }
            ],
            "variable": {
              "name": "marital_status"
            }
          }
        ],
        "values": [
          246354,
          2655371,
          1180608,
          205983,
          93196,
          1449
        ]
      }
    }
  }
}
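A sketch of how a client might pair each value with its categories. With one variable the pairing is direct, but the same helper also covers tables with several variables: in row-major order the last dimension varies fastest, which is exactly the order in which `itertools.product` enumerates combinations. `decode_table` is an illustrative name; it expects the decoded `table` object from a response like the one above.

```python
import math
from itertools import product


def decode_table(table: dict) -> list[tuple[tuple[str, ...], int]]:
    """Pair each value with its category labels (one label per dimension).

    Values are row-major, so the last dimension varies fastest.
    itertools.product enumerates combinations in the same order,
    so zipping the two sequences lines them up.
    """
    label_lists = [[c["label"] for c in dim["categories"]] for dim in table["dimensions"]]
    values = table["values"]
    expected = math.prod(len(labels) for labels in label_lists)
    if expected != len(values):
        raise ValueError(f"expected {expected} values, got {len(values)}")
    return list(zip(product(*label_lists), values))


# The marital status table from the response above (codes omitted for brevity).
marital = {
    "dimensions": [{"categories": [{"label": label} for label in [
        "Not specified", "Single", "Married", "Widow", "Widower", "Other/not classified",
    ]]}],
    "values": [246354, 2655371, 1180608, 205983, 93196, 1449],
}
for labels, value in decode_table(marital):
    print(" / ".join(labels), value)
```

The length check guards against pairing values with the wrong categories if a response is truncated or the dimensions were misread.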

Suppose the user decides to include only the categories Single, Married, Widow and Widower. To fetch data for specific categories, we use the filters argument and specify the category codes. Here we already know the codes from the response above, but lists of categories and their codes are also available when listing a dataset's variables. We can run the query again, this time with the category filters:

{
  dataset(name: "Ireland-1911-preview", lang: "en") {
    table(
      variables: ["marital_status"],
      filters: [{
        variable: "marital_status",
        codes: ["0", "1", "2", "3"]
      }]) {
      dimensions {
        categories {
          code
          label
        }
        variable {
          name
        }
      }
      values
    }
  }
}

The response now includes only the requested categories and their data values:

{
  "data": {
    "dataset": {
      "table": {
        "dimensions": [
          {
            "categories": [
              {
                "code": "0",
                "label": "Single"
              },
              {
                "code": "1",
                "label": "Married"
              },
              {
                "code": "2",
                "label": "Widow"
              },
              {
                "code": "3",
                "label": "Widower"
              }
            ],
            "variable": {
              "name": "marital_status"
            }
          }
        ],
        "values": [
          2655371,
          1180608,
          205983,
          93196
        ]
      }
    }
  }
}
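A sketch of the client-side step between the user's selection and the filtered query above: mapping ticked labels to category codes, then rendering the `filters` argument. The helper names are hypothetical, and embedding values as literals via `json.dumps` is a simplification; GraphQL variables would be the more robust choice in a production client.

```python
import json


def codes_for_labels(categories: list[dict], wanted: list[str]) -> list[str]:
    """Map the labels a user ticked to the category codes the filter needs."""
    by_label = {c["label"]: c["code"] for c in categories}
    missing = [label for label in wanted if label not in by_label]
    if missing:
        raise KeyError(f"unknown categories: {missing}")
    return [by_label[label] for label in wanted]


def table_query(dataset: str, variables: list[str],
                filters: dict[str, list[str]], lang: str = "en") -> str:
    """Render a filtered table query; json.dumps yields GraphQL string and list literals."""
    filter_items = ", ".join(
        "{variable: %s, codes: %s}" % (json.dumps(var), json.dumps(codes))
        for var, codes in filters.items()
    )
    return (
        "{ dataset(name: %s, lang: %s) { table(variables: %s, filters: [%s]) "
        "{ dimensions { categories { code label } variable { name } } values } } }"
        % (json.dumps(dataset), json.dumps(lang), json.dumps(variables), filter_items)
    )


# Categories as returned in the unfiltered response earlier.
categories = [
    {"code": "-1", "label": "Not specified"}, {"code": "0", "label": "Single"},
    {"code": "1", "label": "Married"}, {"code": "2", "label": "Widow"},
    {"code": "3", "label": "Widower"}, {"code": "4", "label": "Other/not classified"},
]
codes = codes_for_labels(categories, ["Single", "Married", "Widow", "Widower"])
print(table_query("Ireland-1911-preview", ["marital_status"], {"marital_status": codes}))
```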

Summary

Rich UIs and visualizations can use the Cantabular extended API as a comprehensive source of both data and metadata.

The API brings efficiency and flexibility to development projects, offering discovery of datasets, their structural and reference metadata, and custom metadata properties, all with multilingual support.

You can read more about the extended API on our documentation page.