Introducing Memespector-GUI: A Graphical User Interface Client for Computer Vision APIs

In this post Jason Chao, PhD candidate at the University of Siegen, introduces Memespector-GUI, a tool for doing research with and about data from computer vision APIs.

In recent years, tech companies have started to offer computer vision capabilities through Application Programming Interfaces (APIs). Big names in the cloud industry have integrated computer vision services into their artificial intelligence (AI) products. These computer vision APIs are designed for software developers to integrate into their own products and services. Indeed, your images may have been processed by these APIs unbeknownst to you: the operations and outputs of computer vision APIs are not usually presented directly to end-users.

The open-source Memespector-GUI tool aims to support investigations both with and about computer vision APIs by enabling users to repurpose, incorporate, audit and/or critically examine their outputs in the context of social and cultural research.

What kinds of outputs do these computer vision APIs produce? The specifications and affordances of these APIs vary from platform to platform. As an example, here is a quick walkthrough of some of the features of Google Vision API…

Label, object and text detection

Google Vision API labels images, identifies objects and recognises text. “Labels” are descriptions that may apply to the whole image. “Objects” are discrete things found in the image. “Text” is any printed or handwritten text recognised in the image.

Image sample 1

Labels:
• Sky
• Building
• Crowd

Objects:
• Building
• Person
• Footwear

Text:
• DON’T SHOOT OUR KIDS
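
Memespector-GUI performs these calls through a graphical interface, but the same outputs can also be scripted. A minimal Python sketch using Google’s official google-cloud-vision client library might look like this (assuming your Google Cloud credentials are already configured; the image file name is illustrative):

```python
from google.cloud import vision

# Assumes the GOOGLE_APPLICATION_CREDENTIALS environment variable
# points to a Google Cloud service account key file.
client = vision.ImageAnnotatorClient()

with open("protest.jpg", "rb") as f:  # illustrative file name
    image = vision.Image(content=f.read())

# "Labels": descriptions that may apply to the whole image
for label in client.label_detection(image=image).label_annotations:
    print("Label:", label.description, round(label.score, 2))

# "Objects": things localised within the image
for obj in client.object_localization(image=image).localized_object_annotations:
    print("Object:", obj.name, round(obj.score, 2))

# "Text": printed or handwritten text recognised in the image
texts = client.text_detection(image=image).text_annotations
if texts:
    print("Text:", texts[0].description)  # the first annotation holds the full text
```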

Face detection

Google Vision API also detects faces and recognises their expressions. It currently detects four emotions: joy, sorrow, anger and surprise, alongside attributes such as exposure, blur and headwear. The likelihood of each is presented on a five-point scale (very unlikely, unlikely, possible, likely, very likely).

Image sample 2

Face
• Joy: Very likely
• Sorrow: Very unlikely
• Anger: Very unlikely
• Surprise: Very unlikely
• Under-exposed: Very unlikely
• Blurred: Very unlikely
• Headwear: Very unlikely
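
A similar sketch for face detection (same set-up assumptions as above; vision.Likelihood turns the numeric likelihood into its label, e.g. VERY_UNLIKELY):

```python
from google.cloud import vision

client = vision.ImageAnnotatorClient()
with open("face.jpg", "rb") as f:  # illustrative file name
    image = vision.Image(content=f.read())

for face in client.face_detection(image=image).face_annotations:
    print("Joy:", vision.Likelihood(face.joy_likelihood).name)
    print("Sorrow:", vision.Likelihood(face.sorrow_likelihood).name)
    print("Anger:", vision.Likelihood(face.anger_likelihood).name)
    print("Surprise:", vision.Likelihood(face.surprise_likelihood).name)
    print("Headwear:", vision.Likelihood(face.headwear_likelihood).name)
```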

Web detection

Google Vision API also attempts to provide contextual information about an image. “Web entities” are the names of individuals and events associated with an image. “Matching images” are the URLs of images that are visually similar to, fully matching or partially matching the analysed image. In particular, the domain names of these URLs may be repurposed to study how an image circulates on the web.

Image sample 3

Web entities:
• Secretary of State for Health and Social Care of the United Kingdom
• Kiss
• Closed-circuit television
• Matt Hancock
• Gina Coladangelo
• Girlfriend


Full matching image URL:
• thetimes.co.uk/…


Pages with full matching image:
• dailymail.co.uk/…
• theaustralian.com.au/…
• mirror.co.uk/…
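
As a rough sketch, web detection and the extraction of domain names from matching pages could be scripted along these lines (same assumptions as the earlier sketches):

```python
from urllib.parse import urlparse
from google.cloud import vision

client = vision.ImageAnnotatorClient()
with open("news_photo.jpg", "rb") as f:  # illustrative file name
    image = vision.Image(content=f.read())

web = client.web_detection(image=image).web_detection

for entity in web.web_entities:
    print("Entity:", entity.description)

# The domains of pages carrying a matching image hint at
# where the image circulates on the web.
for page in web.pages_with_matching_images:
    print("Domain:", urlparse(page.url).netloc)
```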

Safety detection

Google Vision API is also used for content moderation work. There are five safety flags for an image: Adult, Spoof, Medical, Violence and Racy. The likelihood of each flag is presented on the same five-point scale (very unlikely, unlikely, possible, likely, very likely).

Image sample 4

Safety:
• Adult: Very likely
• Spoof: Unlikely
• Medical: Unlikely
• Violence: Unlikely
• Racy: Very likely
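
And a corresponding sketch for the safety flags (same assumptions as above):

```python
from google.cloud import vision

client = vision.ImageAnnotatorClient()
with open("image.jpg", "rb") as f:  # illustrative file name
    image = vision.Image(content=f.read())

safe = client.safe_search_detection(image=image).safe_search_annotation

# Each flag is reported on the same five-point likelihood scale.
for flag in ("adult", "spoof", "medical", "violence", "racy"):
    print(flag.capitalize() + ":", vision.Likelihood(getattr(safe, flag)).name)
```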

Misrecognition, mislabelling, misclassification

As we know from recent research and investigations into AI and algorithms, these machine-learning-based processes of labelling, detection and classification can often go wrong, sometimes with troubling or discriminatory consequences. In the following example, the facial expression of a man in sorrow is misrecognised as “joy”.

Image sample 5: misclassification

How to repurpose computer vision APIs?

Computer vision APIs do not have official user interfaces, since they are not intended to interface with humans directly. Google Vision API has an official drag-and-drop demo to showcase its detection capabilities, but the demo’s features are very limited.

Memespector-GUI is a digital methods tool that helps researchers use and repurpose data from computer vision APIs – whether to facilitate analysis of image collections, to understand the circulation and social lives of images online or to compare and critique the operations of computer vision platforms and services. This tool currently enables users to gather image data from Google Vision API, Microsoft Azure Cognitive Services, Clarifai and other services.

Are they free?

Memespector-GUI is free and open-source. The use of commercial APIs, however, is not necessarily free of charge. That does not always mean you have to pay: when you open an account with Google Cloud or Microsoft Azure, you will usually receive free credits, which may be enough to process thousands of images.

Resources

The following resources will guide you through opening accounts and getting free credits (if applicable) with commercial vision APIs.

Once you have opened accounts with the commercial APIs, the following step-by-step guide walks you through using Memespector-GUI to enrich your image datasets.

The project was inspired by previous memespector projects from Bernhard Rieder and André Mintz, and developed with ideas, input and testing from Janna Joceli Omena.

Using ObservableHQ notebooks for gathering and transforming data in digital research

We’ve recently been experimenting with ObservableHQ notebooks for gathering and transforming data in the context of digital research. This post walks through a few examples of notebooks from recent Public Data Lab projects.

In one project we wanted to use the CrowdTangle “Links” API to fetch data about how certain web pages were shared online and across different platforms. After gaining access to the relevant endpoints, we could adopt different means of calling the API and retrieving data: using something like Postman (a general-purpose interface for calling endpoints), or writing custom scripts (for example in Python or JavaScript).

Code notebooks are a third option that lies somewhere in between. Designed for programmers, notebooks allow for iterative manipulation of and experimentation with code, whilst keeping track of the creative process by commenting on the thinking behind each step.

Notebooks allow us both to write and run custom scripts and to create simple interfaces for those who may not code. Thus we can use them to help researchers, students and external collaborators collect data, making it easier to call APIs, set parameters or perform manipulations.

ObservableHQ is one solution for writing programming notebooks: it runs in the browser and is oriented towards data and visualisation (“We believe thinking with data is an essential skill for the future”). Hence, we thought it could be a good starting point for what we wanted to do.

Screen capture of a notebook

The first notebook that we produced allows researchers to call the CrowdTangle API in a simplified way: it exposes call parameters and provides contextual explanations and warnings about how to set them. For instance, it turns the selection of platforms into checkboxes and the interval between calls into a slider (with a warning about rate limits). It also facilitates the insertion of dates and other parameters.

Examples of input fields

Data can be browsed in tabular form or downloaded as a CSV or JSON.

Examples of data
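
The notebooks themselves are written in JavaScript, but as a rough illustration, the request that the notebook wraps could be reproduced in Python along these lines (the token is a placeholder, the link is invented, and the parameters follow CrowdTangle’s documented /links endpoint):

```python
import requests

params = {
    "token": "YOUR_CROWDTANGLE_TOKEN",  # placeholder: taken from your CrowdTangle dashboard
    "link": "https://example.com/some-article",  # the page whose shares we want to trace
    "platforms": "facebook,instagram",  # exposed as checkboxes in the notebook
    "startDate": "2021-01-01",
    "endDate": "2021-06-30",
    "count": 100,
}

response = requests.get("https://api.crowdtangle.com/links", params=params)
response.raise_for_status()

for post in response.json().get("result", {}).get("posts", []):
    print(post.get("platform"), post.get("postUrl"))
```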

Notebooks can be used for many diverse tasks. For instance, we produced a notebook that extracts hashtags from a list of posts and formats the data for use in Table2Net, another that extracts URLs and domain names from texts, and a third dedicated to expanding shortened URLs.
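
To give a flavour of these transformations, here is a rough Python equivalent of the hashtag, URL and domain extraction (the sample post is invented; the notebooks do this in JavaScript):

```python
import re
from urllib.parse import urlparse

post = "Watch this https://example.com/page?x=1 #climate #disinfo"

hashtags = re.findall(r"#\w+", post)              # ['#climate', '#disinfo']
urls = re.findall(r"https?://\S+", post)          # ['https://example.com/page?x=1']
domains = [urlparse(url).netloc for url in urls]  # ['example.com']

print(hashtags, urls, domains)
```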

The last of these was a case where it was necessary to implement a back-end service: ObservableHQ notebooks run as front-end JavaScript in the browser, so certain operations are tricky or impossible. This is one of their main limitations.
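
Expanding a shortened URL means following its HTTP redirects, which front-end JavaScript generally cannot do across arbitrary domains; a small server-side sketch of the kind of logic involved (in Python, with an illustrative URL):

```python
import requests

def expand(short_url: str) -> str:
    """Follow HTTP redirects and return the final URL."""
    # HEAD keeps the request light; some servers only answer GET.
    response = requests.head(short_url, allow_redirects=True, timeout=10)
    return response.url

print(expand("https://bit.ly/example"))  # illustrative shortened URL
```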

However, there are also many advantages. Notebooks are very flexible and easy to transform and adjust: we can start gathering and exploring data and, after a couple of iterations, decide how best to structure it. We can add and remove parts of the interface almost instantly, and we can embed functions (“cells”) from other notebooks, such as an emoji loading bar. The possibility of reusing or modifying an entire notebook, or just a part of it, is very useful for building on the work of other researchers and quickly bootstrapping new tools as we need them.

Notebooks are particularly useful as part of exploratory research approaches where you are iteratively refining and adjusting research questions and seeing what is possible as you adjust various settings (e.g. the structure of the data, the parameters of the APIs).

An unusual loading bar, which can be imported into your notebook.

So far these projects have been used in the context of investigations with journalists on the Infodemic project, as well as in ongoing research and collaborations around DeSmog’s Climate Disinformation Database (including a prize-winning undergraduate thesis on this topic).

As per the working principles of the Public Data Lab, all of these notebooks are open-source (MIT license) and you are most welcome to use, transform and adjust them in your own work. If you use them for a project or piece of research, or if you’re also using code notebooks for digital research, we’d love to hear from you!

Here’s a full list of the notebooks mentioned in this post:

“Data critique and platform dependencies: How to study social media data?”, Digital Methods Winter School and Data Sprint 2022

Applications are now open for the Digital Methods Winter School and Data Sprint 2022, which is on the theme of “Data critique and platform dependencies: How to study social media data?”.

This will take place from 10th to 14th January 2022 at the University of Amsterdam.

More details and registration links are available here and an excerpt on this year’s theme and the format is copied below.

The Digital Methods Initiative (DMI), Amsterdam, is holding its annual Winter School on ‘Social media data critique’. The format is that of a (social media and web) data sprint, with tutorials as well as hands-on work for telling stories with data. There is also a programme of keynote speakers. It is intended for advanced Master’s students, PhD candidates and motivated scholars who would like to work on (and complete) a digital methods project in an intensive workshop setting. For a preview of what the event is like, you can view short video clips from previous editions of the School.

Data critique and platform dependencies: How to study social media data?

Source criticism is the scholarly activity traditionally concerned with provenance and reliability. When considering the state of social media data provision, such criticism would be aimed at what platforms allow researchers to do (such as accessing an API) and not to do (such as scraping). It would also consider whether the data returned from querying is ‘good’, meaning complete or representative. How do social media platforms fare when considered against these principles? How might one audit or otherwise scrutinise social media platforms’ data supply?

Recently Facebook has come under renewed criticism for its data supply through the publication of its ‘transparency’ report, Widely Viewed Content. It is a list of web URLs and Facebook posts that receive the greatest ‘reach’ on the platform when appearing in users’ News Feeds. Its publication comes on the heels of Facebook’s well-catalogued ‘fake news problem’, first reported in 2016, as well as a well-publicised Twitter feed that lists the most engaged-with posts on Facebook (using CrowdTangle data). In both instances those contributions, together with additional scholarly work, have shown that dubious information and extreme right-wing content are disproportionately interacted with. Facebook’s transparency report, which has been called ‘transparency theater’, purports to demonstrate that this is not the case. How to check the data? For now, “all anybody has is the company’s word for it.”

For Facebook, as well as a variety of other platforms, there are no public archives. Facebook’s data sharing model is one of an industry-academic ‘partnership’. The Social Science One project, launched when Facebook ended access to its Pages API, offers big data — “57 million URLs, more than 1.7 trillion rows, and nearly 40 trillion cell values, describing URLs shared more than 100 times publicly on Facebook (between 1/1/2017 and 2/28/2021).” Obtaining the data (if one can handle it) requires writing a research proposal and, if accepted, compliance with Facebook’s ‘onboarding’, a non-negotiable research data agreement. Ultimately, the data is accessed (not downloaded) in a Facebook research environment, “the Facebook Open Research Tool (FORT) … behind a VPN that does not have access to the Internet”. There are also “regular meetings Facebook holds with researchers”. A data access ethnography project, not so unlike the one written about trying to work with Twitter’s archive at the Library of Congress, may be a worthwhile undertaking.

Other projects would evaluate ‘repurposing’ marketing data, as Robert Putnam’s ‘Bowling Alone’ project did and as is the more general digital methods approach. Comparing multiple marketing data outputs may be of interest, as may crossing those with CrowdTangle’s outputs. Facepager, one of the last pieces of software (after Netvizz and Netlytic) to still have access to Facebook’s graph API, reports that “access permissions are under heavy reconstruction”. Its usage requires further scrutiny. There is also a difference between the user view and the developer view (and between ethnographic and computational approaches), which is also worth exploring. ‘Interface methods’ may be useful here. These and other considerations for developing social media data criticism are topics of interest for this year’s Winter School theme.

At the Winter School there are the usual social media tool tutorials (and the occasional tool requiem), but also continued attention to thinking through and proposing how to work with social media data. There are also empirical and conceptual projects that participants work on. Projects from past Summer and Winter Schools include: Detecting Conspiratorial Hermeneutics via Words & Images, Mapping the Dutchophone Fringe on Telegram, Greenwashing, in_authenticity & protest, Searching constructive/authentic posts in media comment sections: NU.nl/The Guardian, Mapping deepfakes with digital methods and visual analytics, “Go back to plebbit”: Mapping the platform antagonism between 4chan and Reddit, Profiling Bolsobots Networks, Infodemic everywhere, Post-Trump Information Ecology, Streams of Conspirational Folklore, and FilterTube: Investigating echo chambers, filter bubbles and polarization on YouTube.

Organisers: Lucia Bainotti, Richard Rogers and Guillen Torres, Media Studies, University of Amsterdam. Application information at https://www.digitalmethods.net.