Using ObservableHQ notebooks for gathering and transforming data in digital research

We’ve recently been experimenting with ObservableHQ notebooks for gathering and transforming data in the context of digital research. This post walks through a few examples of notebooks from recent Public Data Lab projects.

In one project we wanted to use the CrowdTangle “Links” API to fetch data about how certain web pages were shared across different platforms. After gaining access to the relevant endpoints, we could choose between different ways of calling the API and retrieving data, such as using a tool like Postman (a general-purpose interface for calling endpoints) or writing custom scripts (for example in Python or JavaScript).

Code notebooks are a third option that lies somewhere in between. Designed for programmers, notebooks allow for iterative manipulation and experimentation with code whilst keeping track of the creative process through comments on the thinking behind each step.

Notebooks allow us both to write and run custom scripts and to create simple interfaces for those who may not code. Thus we can use them to help researchers, students and external collaborators collect data, making it easier to call APIs, set parameters, or perform manipulations.

ObservableHQ is one solution for writing programming notebooks. It runs in the browser and is oriented towards data and visualisations (“We believe thinking with data is an essential skill for the future”), so we thought it could be a good starting point for what we wanted to do.

Screen capture of a notebook

The first notebook that we produced allows researchers to call CrowdTangle APIs in a simplified way: it exposes the call parameters and, alongside each one, provides explanations and warnings about how to set it. For instance, it turns the selection of platforms into checkboxes and the interval between calls into a slider (with a warning about rate limits). Entering dates and other parameters is simplified in a similar way.
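To make the idea concrete, here is a minimal Python sketch of what such an interface does behind the scenes: it turns form-style inputs into a request URL and paces the calls. It is not the notebook's actual code (which runs as JavaScript in the browser). The endpoint URL, the parameter names (`token`, `link`, `platforms`, `startDate`, `endDate`, `count`) and the 30-second threshold are assumptions made for illustration and should be checked against the API documentation.

```python
import time
from urllib.parse import urlencode

# Assumed endpoint and parameter names; verify against the API documentation.
LINKS_ENDPOINT = "https://api.crowdtangle.com/links"
# Hypothetical threshold, used only to decide when to show a warning.
MIN_INTERVAL_SECONDS = 30


def build_links_url(token, link, platforms, start_date=None, end_date=None, count=100):
    """Turn form-style inputs into a query URL for the Links endpoint."""
    if not platforms:
        raise ValueError("select at least one platform")
    params = {
        "token": token,
        "link": link,
        "platforms": ",".join(platforms),  # ticked checkboxes -> comma-separated list
        "count": count,
    }
    # Optional dates are only sent when the user has filled them in.
    if start_date:
        params["startDate"] = start_date
    if end_date:
        params["endDate"] = end_date
    return f"{LINKS_ENDPOINT}?{urlencode(params)}"


def interval_warning(seconds):
    """Mimic the slider warning: flag intervals that are likely too short."""
    if seconds < MIN_INTERVAL_SECONDS:
        return f"An interval of {seconds}s may exceed the rate limit"
    return None


def fetch_all(urls, fetch, interval_seconds, sleep=time.sleep):
    """Call `fetch` on each URL in turn, pausing between consecutive calls."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(interval_seconds)
        results.append(fetch(url))
    return results
```

Passing `fetch` and `sleep` in as arguments keeps the pacing logic separate from the network code, so it can be tried out without an API token.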

Examples of input fields

Data can be browsed in tabular form or downloaded as a CSV or JSON.
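As an illustration of the export step, the following Python sketch serialises a list of records to JSON and to CSV. The record fields in the usage example are made up, and the notebook itself may handle exports differently.

```python
import csv
import io
import json


def to_json(records):
    """Serialise records as pretty-printed JSON, keeping non-ASCII text readable."""
    return json.dumps(records, ensure_ascii=False, indent=2)


def to_csv(records):
    """Serialise records as CSV, leaving cells empty where a field is missing."""
    # Use the union of all keys, in first-seen order, as the header row.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical records; real API responses will have different fields.
    rows = [
        {"platform": "facebook", "shares": 3},
        {"platform": "reddit", "url": "https://example.com"},
    ]
    print(to_csv(rows))
    print(to_json(rows))
```

Collecting the header from every record, not only the first, avoids silently dropping fields that appear in just some of the results.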

Examples of data

Notebooks can be used for many different tasks. For instance, we produced a notebook that extracts hashtags from a list of posts and formats the data so that it can be used in Table2Net, one that extracts URLs and domain names from texts, and another dedicated to expanding shortened URLs.
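The first two of these tasks can be sketched in a few lines of Python. The regular expressions below are deliberately simple and will miss edge cases. The `id` and `text` field names and the `|` separator are assumptions for illustration; the separator in particular should match whatever is configured in Table2Net for multi-valued cells.

```python
import re
from urllib.parse import urlparse

# A "#" not preceded by a word character, followed by word characters.
HASHTAG_RE = re.compile(r"(?<!\w)#(\w+)")
# A simplified URL pattern: http(s) followed by anything up to whitespace or a bracket/quote.
URL_RE = re.compile(r"https?://[^\s<>\"')\]]+")


def extract_hashtags(text):
    """Return the hashtags in `text`, lowercased and without the leading '#'."""
    return [tag.lower() for tag in HASHTAG_RE.findall(text)]


def extract_urls(text):
    """Return the URLs in `text`, with trailing sentence punctuation stripped."""
    return [url.rstrip(".,;:!?") for url in URL_RE.findall(text)]


def domain_of(url):
    """Return the host name of `url`, without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def hashtag_rows(posts, separator="|"):
    """Build one row per post, with all of its hashtags joined into a single cell."""
    rows = []
    for post in posts:
        tags = extract_hashtags(post["text"])
        rows.append({"id": post["id"], "hashtags": separator.join(tags)})
    return rows
```

Each row keeps the post identifier next to its hashtags, so the resulting table can be read as a post-to-hashtag relation when building a network.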

The last example was a case where it was necessary to implement a back-end service. Notebooks in ObservableHQ run as front-end browser JavaScript, so certain operations are tricky or impossible (this is one of their main limitations).
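A back-end helper for expanding shortened URLs might look something like the Python sketch below. It is not the service we implemented, only an illustration of the general technique: request each URL without following redirects automatically, read the `Location` header, and repeat until there are no more redirects. A likely reason this is hard in a notebook is that browsers restrict cross-origin requests, though other constraints may apply. The function names and the `max_hops` limit are hypothetical.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_STATUSES = (301, 302, 303, 307, 308)


def http_lookup(url, timeout=10):
    """Return the redirect target of `url`, or None if it does not redirect."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        connection = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        connection = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        # HEAD asks for headers only; some servers may require GET instead.
        connection.request("HEAD", path)
        response = connection.getresponse()
        if response.status in REDIRECT_STATUSES:
            return response.getheader("Location")
        return None
    finally:
        connection.close()


def expand_url(url, lookup=http_lookup, max_hops=10):
    """Follow redirects from `url` and return the last URL reached."""
    seen = {url}
    for _ in range(max_hops):
        target = lookup(url)
        if not target:
            return url
        # Location headers may be relative, so resolve against the current URL.
        url = urljoin(url, target)
        if url in seen:
            break  # redirect loop: stop and return where we are
        seen.add(url)
    return url
```

The `lookup` argument makes it possible to swap the network call for a plain dictionary when trying the logic out, for example `expand_url("https://sho.rt/a", lookup={"https://sho.rt/a": "https://example.org/"}.get)`.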

However, there are also many advantages. Notebooks are very flexible and easy to transform and adjust: we can start gathering and exploring data and, after a couple of iterations, decide how best to structure it. We can add and remove parts of the interface almost instantly, and we can embed functions (“cells”) from other notebooks, such as an emoji loading bar. The ability to reuse or modify an entire notebook, or just a part of it, makes it easy to build on the work of other researchers and to quickly bootstrap new tools as we need them.

Notebooks are particularly useful as part of exploratory research approaches where you are iteratively refining and adjusting research questions and seeing what is possible as you adjust various settings (e.g. the structure of the data, the parameters of the APIs).

An unusual loading bar, which can be imported into your notebook.

So far these notebooks have been used in the context of investigations with journalists on the Infodemic project, as well as in ongoing research and collaborations around DeSmog’s Climate Disinformation Database (including a prize-winning undergraduate thesis on this topic).

As per the working principles of the Public Data Lab, all of these notebooks are open-source (MIT license) and you are most welcome to use, transform and adjust them for your own work. If you use them for a project or piece of research, or if you’re also using code notebooks, we’d love to hear from you!

Here’s a full list of the notebooks mentioned in this post: