Using ObservableHQ notebooks for gathering and transforming data in digital research

We’ve recently been experimenting with the use of ObservableHQ notebooks for gathering and transforming data in the context of digital research. This post walks through a few examples of notebooks from recent Public Data Lab projects.

In one project we wanted to use the CrowdTangle “Links” API to fetch data about how certain web pages were shared online and across different platforms. After gaining access to the relevant endpoints, we had several ways to call the API and retrieve data, such as using a tool like Postman (a general-purpose interface for calling endpoints) or writing custom scripts (for example in Python or JavaScript).
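To give a sense of what the custom-script route involves, here is a minimal sketch in Python that only builds the request URL for a single call, without sending it. The base URL and the parameter names (token, link, platforms, startDate, count) reflect our recollection of the CrowdTangle documentation and should be treated as assumptions to check; the function name build_links_url is purely illustrative.

```python
from urllib.parse import urlencode

# Assumed base URL and parameter names: verify them against the API documentation.
LINKS_ENDPOINT = "https://api.crowdtangle.com/links"


def build_links_url(token, link, platforms, start_date, count=100):
    """Return the request URL for one call to the (assumed) Links endpoint."""
    params = {
        "token": token,
        "link": link,
        "platforms": ",".join(platforms),  # several platforms as one comma-separated value
        "startDate": start_date,
        "count": count,
    }
    # urlencode percent-encodes the values, so the shared link is safe to embed.
    return f"{LINKS_ENDPOINT}?{urlencode(params)}"


url = build_links_url(
    "YOUR_TOKEN",
    "https://example.org/article",
    ["facebook", "instagram"],
    "2021-01-01",
)
print(url)
```

Even a small script like this requires knowing which parameters exist and how to format them, which is the kind of detail a notebook interface can take care of.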

Code notebooks are a third option that lies somewhere in between. Designed for programmers, notebooks allow for iterative manipulation and experimentation with code whilst keeping track of creative processes by commenting on the thinking behind each step.

Notebooks allow us both to write and run custom scripts and to create simple interfaces for those who may not code. We can thus use them to help researchers, students and external collaborators collect data, making it easier to call APIs, set parameters or perform manipulations.

ObservableHQ is one solution for writing programming notebooks. It runs in the browser and is oriented towards data and visualisations (“We believe thinking with data is an essential skill for the future”). Hence, we thought it could be a good starting point for what we wanted to do.

Screen capture of a notebook

The first notebook that we produced allows researchers to call CrowdTangle APIs in a simplified way: it exposes the call parameters and provides contextual explanations and warnings about how to set them. For instance, it turns the selection of platforms into checkboxes and the interval between calls into a slider (with a warning about rate limits). Entering dates and other parameters is made easier in a similar way.
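The logic behind such inputs can be sketched roughly as follows, here in Python rather than in the notebook's own JavaScript. This is not the notebook's code: the platform list, the 30-second threshold and the function names are hypothetical values chosen for illustration.

```python
import time

# Hypothetical values for illustration; the real options and limits may differ.
ALLOWED_PLATFORMS = {"facebook", "instagram", "reddit"}
MIN_INTERVAL_SECONDS = 30


def check_settings(platforms, interval_seconds):
    """Return a list of human-readable warnings for the chosen settings."""
    warnings = []
    if not platforms:
        warnings.append("Select at least one platform.")
    unknown = sorted(set(platforms) - ALLOWED_PLATFORMS)
    if unknown:
        warnings.append(f"Unknown platforms: {', '.join(unknown)}")
    if interval_seconds < MIN_INTERVAL_SECONDS:
        warnings.append(
            f"An interval under {MIN_INTERVAL_SECONDS}s may exceed the rate limit."
        )
    return warnings


def fetch_all(links, fetch_one, interval_seconds, sleep=time.sleep):
    """Call fetch_one for each link, pausing between consecutive calls."""
    results = []
    for index, link in enumerate(links):
        if index > 0:
            sleep(interval_seconds)  # pace the calls to stay under the rate limit
        results.append(fetch_one(link))
    return results


print(check_settings(["facebook", "myspace"], 5))
```

Passing the fetching function and the sleep function as arguments keeps the pacing logic separate from the actual API call, so it can be tried out without network access.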

Examples of input fields

Data can be browsed in tabular form or downloaded as a CSV or JSON.
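As an illustration of the export step, here is a minimal Python sketch that serialises a list of records as CSV and as JSON. The field names in the sample records are made up for the example and do not reflect the actual API response.

```python
import csv
import io
import json


def to_csv(records):
    """Serialise a list of flat dicts as CSV text, using the union of keys as columns."""
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    buffer = io.StringIO()
    # restval fills in columns that a given record does not have.
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_json(records):
    """Serialise the same records as indented JSON, keeping non-ASCII characters."""
    return json.dumps(records, ensure_ascii=False, indent=2)


# Made-up sample records for illustration.
records = [
    {"platform": "facebook", "shares": 12},
    {"platform": "reddit", "shares": 3, "title": "A, B"},
]
print(to_csv(records))
print(to_json(records))
```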

Examples of data

Notebooks can be used for many different tasks. For instance, we produced a notebook that extracts hashtags from a list of posts and formats the data so that it can be used in Table2Net, one for extracting URLs and domain names from texts, and another dedicated to expanding shortened URLs.
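The first two of these tasks can be approximated in a few lines. The Python sketch below is not the notebooks' code: the regular expressions are deliberately simple, and joining hashtags with a separator in a single cell is our assumption about a convenient multi-value column format for Table2Net.

```python
import re
from urllib.parse import urlparse

# A "#" preceded by a word character or "/" is skipped, so URL fragments are ignored.
HASHTAG_RE = re.compile(r"(?<![\w/])#(\w+)")
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def extract_hashtags(text):
    """Return lower-cased hashtags in order of first appearance, without duplicates."""
    seen = []
    for tag in HASHTAG_RE.findall(text):
        tag = tag.lower()
        if tag not in seen:
            seen.append(tag)
    return seen


def extract_urls(text):
    """Return the URLs found in the text, with trailing punctuation trimmed."""
    return [url.rstrip(".,;:!?)") for url in URL_RE.findall(text)]


def domain_of(url):
    """Return the host name of a URL without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def hashtags_cell(text, separator="|"):
    """Join the hashtags of one post into a single cell (assumed multi-value format)."""
    return separator.join(extract_hashtags(text))


post = "Read this: https://www.example.org/a?b=1. #Climate #climate #COP26"
print(hashtags_cell(post))
print([domain_of(url) for url in extract_urls(post)])
```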

The last example was a case where it was necessary to implement a back-end service. Notebooks in ObservableHQ run as front-end JavaScript in the browser, so certain operations are tricky or impossible (this is one of their main limitations).
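To illustrate what such a back-end service might do, here is a hedged Python sketch that follows redirects hop by hop until it reaches the final address. It is not the service we implemented: the function names are hypothetical, and head_location is just one possible way to read the Location header with the standard library.

```python
import http.client
from urllib.parse import urljoin, urlsplit


def head_location(url, timeout=10):
    """Send a HEAD request and return the redirect target, or None if there is none."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        connection = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        connection = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        connection.request("HEAD", path)
        response = connection.getresponse()
        if response.status in (301, 302, 303, 307, 308):
            return response.getheader("Location")
        return None
    finally:
        connection.close()


def expand_url(url, get_location=head_location, max_hops=10):
    """Follow redirects one hop at a time and return the last URL reached."""
    seen = {url}
    for _ in range(max_hops):
        location = get_location(url)
        if not location:
            return url
        url = urljoin(url, location)  # Location headers may be relative
        if url in seen:
            return url  # redirect loop: stop here
        seen.add(url)
    return url


# Offline demonstration with a made-up redirect table instead of real requests.
fake_redirects = {"https://sho.rt/abc": "https://example.org/article"}
print(expand_url("https://sho.rt/abc", get_location=fake_redirects.get))
```

Run on a server, a function like this can read the redirect responses directly, which a script running in a browser page generally cannot do for other sites.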

However, there are also many advantages. Notebooks are very flexible and easy to transform and adjust: we can start gathering and exploring data and, after a couple of iterations, decide how best to structure it. We can add and remove parts of the interface almost instantly, and we can embed functions (“cells”) from other notebooks, such as an emoji loading bar. The ability to reuse or modify an entire notebook, or just a part of it, is very useful for building on the work done by other researchers and quickly bootstrapping new tools as we need them.

Notebooks are particularly useful as part of exploratory research approaches where you are iteratively refining and adjusting research questions and seeing what is possible as you adjust various settings (e.g. the structure of the data, the parameters of the APIs).

An unusual loading bar, which can be imported into your notebook.

So far these projects have been used in the context of investigations with journalists on the Infodemic project, as well as in ongoing research and collaborations around DeSmog’s Climate Disinformation Database (including a prize-winning undergraduate thesis on this topic).

As per the working principles of the Public Data Lab, all of these notebooks are open-source (MIT license) and you are most welcome to use, transform and adjust them for your own work. If you use them for a project or piece of research, or if you’re also using code notebooks, we’d love to hear from you!

Here’s a full list of the notebooks mentioned in this post:

Investigating infodemic – researchers, students and journalists work together to explore the online circulation of COVID-19 misinformation and conspiracies

Over the past year researchers and students at institutions associated with the Public Data Lab have contributed to a series of collaborative digital investigations into the online circulation of COVID-19 misinformation and conspiracies.

Researchers and students contributed to a series of “engaged research led teaching” projects developed with journalists, media organisations and non-governmental organisations around the world.

These were undertaken in association with the Arts and Humanities Research Council funded project Infodemic: Combatting COVID-19 Conspiracy Theories, which explores how digital methods grounded in social and cultural research may facilitate understanding of what the WHO has described as an “infodemic” of misleading, fabricated, conspiratorial and other problematic material related to the COVID-19 pandemic.

These projects led to and contributed to a number of stories, investigations and publications including:


New edition of Data Journalism Handbook now open access with Amsterdam University Press

This blog post is cross-posted. Further details can be found in this thread.

Today The Data Journalism Handbook: Towards a Critical Data Practice (which I co-edited with Jonathan Gray) is published by Amsterdam University Press. It is published as part of a new book series on Digital Studies which is also being launched today. You can find the book here, including an open access version:

The book provides a wide-ranging collection of perspectives on how data journalism is done around the world. It is published a decade after the first edition (available in 14 languages) began life as a collaborative draft at the Mozilla Festival 2011 in London.

Book sprint at MozFest 2011 for first edition of Data Journalism Handbook.

The new edition, with 54 chapters from 74 leading researchers and practitioners of data journalism, gives a “behind the scenes” look at the social lives of datasets, data infrastructures, and data stories in newsrooms, media organizations, startups, civil society organizations and beyond.

The book includes chapters by leading researchers around the world and from practitioners at organisations including Al Jazeera, BBC, BuzzFeed News, Der Spiegel, The Engine Room, Global Witness, Google News Lab, Guardian, the International Consortium of Investigative Journalists (ICIJ), La Nacion, NOS, OjoPúblico, Rappler, United Nations Development Programme and the Washington Post.

An online preview of various chapters from the book was launched in collaboration with the European Journalism Centre and the Google News Initiative and can be found here.

The book draws on over a decade of professional and academic experience engaging with the field of data journalism, including through my role as Data Journalism Programme Lead at the European Journalism Centre; my research on data journalism with the Digital Methods Initiative; my PhD research on “news devices” at the universities of Groningen and Ghent; and my research, teaching and collaborations around data journalism at the Department of Digital Humanities at King’s College London.

Further background about the book can be found in our introduction. Following is the full table of contents and some quotes about the book. We’ll be organising various activities around the book in coming months, which you can follow with the #ddjbook hashtag on Twitter.

If you adopt the book for a class we’d love to hear from you so we can keep track of how it is being used (and also update this list of data journalism courses and programmes around the world) and to inform future activities in this area. Hope you enjoy it!
