
A look back at the Digital Methods Summer School 2013

Back home in Copenhagen, I am slowly but surely recovering from two intense weeks of summer school in Amsterdam. I took part in the annual Digital Methods Summer School, hosted by the Digital Methods Initiative at the University of Amsterdam. This year's theme - "You are not the API I used to know: On the challenges of studying social media data" - was particularly relevant to my PhD project, so I left Copenhagen full of excitement.

The DMSS proved to be unique in many ways. First of all, 'learning by doing' was taken seriously. We spent less time attending lectures than working on our own self-organised digital methods projects. (At the same time, the lectures were excellent, not least the ones by Bernard Rieder and Noortje Marres - see their slides here).

Fig. 1: Excitement while waiting for the Eurovision visualization to spatialize.

Second, the project work bridged, with considerable success, huge gaps in the participants' previous experience with digital methods. Students and researchers with close to no knowledge of the DMI tools suite worked and presented side by side with, among many others, the director of the DMI, Richard Rogers, who took out a full two weeks to participate in and facilitate the summer school. "That's the way we do it here", he told me. It is nevertheless an impressive commitment to a format that has grown from a humble beginning into an event with 70+ participants coming from 23 different countries.

So what did we do? Most participants took part in two individual projects (or rather sprints of two to three days), one per week. During the first week, my group formed around an interest in experimenting with analyzing overlaps in the users of different Facebook pages. We used Rieder's Facebook app Netvizz for data extraction and the open source network visualization software Gephi. Much of our effort went into overcoming technical difficulties, but we also (re)learnt the importance of asking good questions. Our first attempts at coming up with a not-too-serious hypothesis ended up with the idea that people who like cats tend to be more libertarian and progressive, while people who prefer dogs tend to be more conservative and traditionalist (yes, the idea came about over after-work beers). With Facebook data, we thought, we could finally answer this pressing question.

We ended up visualizing the 'catosphere' and the 'dogosphere' in two different countries, Denmark and Italy, together with two famous politicians of different leanings for each country. In short, what we did was to extract a sample of the list of fans of the pages for dogs and cats, and of the two politicians' pages. We then visualized the four groups together in order to see how they would position themselves in relation to each other. Each time an individual user was a fan of two pages, say dogs and Berlusconi, it would create a tie between the two groups, dragging them closer to each other in the visualization. Unfortunately, the result proved difficult to interpret, and we had to leave the question of the political significance of pet preferences unanswered.
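To make the idea of overlap-as-tie concrete, here is a minimal sketch in Python. It is not what Netvizz or Gephi do internally, and it does not use the Netvizz export format; the page names, user IDs and the `page_overlaps` helper are made up for illustration. It simply counts how many sampled fans each pair of pages has in common, which is the quantity that pulls two groups together in the layout.

```python
from itertools import combinations

# Hypothetical sample: page name -> set of IDs of users who like that page.
fans = {
    "cats": {"u1", "u2", "u3", "u4"},
    "dogs": {"u3", "u5", "u6"},
    "politician_a": {"u1", "u2", "u7"},
    "politician_b": {"u5", "u6", "u8"},
}


def page_overlaps(fans):
    """Return {(page_a, page_b): number of shared fans} for pairs with any overlap."""
    overlaps = {}
    for a, b in combinations(sorted(fans), 2):
        shared = fans[a] & fans[b]  # users who are fans of both pages
        if shared:
            overlaps[(a, b)] = len(shared)
    return overlaps


overlaps = page_overlaps(fans)
for (a, b), weight in sorted(overlaps.items(), key=lambda kv: -kv[1]):
    print(f"{a} -- {b}: {weight} shared fans")
```

In this toy sample the cat page shares more fans with one politician and the dog page with the other. Our real data gave no such tidy picture.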

Fig. 2: Difficult to interpret Gephi visualization of the Danish catosphere, dogosphere and two politicians' pages

We still had half a day left of the first week, however, so we came up with another hypothesis that we thought would be easier to test: Does the alleged Eurovision bloc voting also appear on Facebook? Self-evidently another pressing issue. Do people vote strategically with their mobile phones on the night of the Eurovision final, and then go on Facebook and reveal their 'real' preferences? Our strategy for answering this question was to extract sample data from all the Facebook pages for the artists competing in the 2013 Eurovision final and then rely on the overlap of users who like more than one artist in order to visualize whether the Facebook 'like' data reproduce the well-known voting blocs, such as 'The Viking Empire', 'The Balkan Bloc', and 'The Warsaw Pact'. The effort resulted in an immense visualization of 200,000 nodes representing artist pages from 25 different countries. We did not have time for a thorough analysis, but we found no signs of a Nordic cluster, so to some extent we managed to answer our question: People might vote in regional blocs, but when they report their preferences on Facebook, geographical proximity does not seem to be a useful predictor of clusters.

Fig. 3: Another hard-to-interpret visualization, this time of 25 Eurovision fan pages. The Nordic countries are colored purple.

In the second week most of us had grown a bit weary of these not-too-serious themes and joined a much larger group led by three DMI-affiliated researchers, focusing on 'Detecting the Socials' of social media. The main idea was to explore different ways of getting at something 'social' through Twitter, using the new 'TCAT' tool developed by Bernard Rieder and Erik Borra at DMI. We split up into three subgroups focusing on users, bots, and hashtags respectively. I joined the group working on hashtags, led by Noortje Marres. A lot happened here, and hopefully I will be able to link to our final presentation soon. Let me just reveal a bit about what I worked on the most.

The question my sub-sub-group found ourselves asking - and trying to answer - was: What is the 'enduring social' on Twitter? Is there a continuous conversation in the sense of a stable activity around a group of associated hashtags? In order to 'get to' this particular version of 'the social', we quickly found a lot of cleaning had to be done. First of all, an amazing amount of 'botty' and 'spammy' activity was present in the dataset of tweets related to the word 'privacy' that we had chosen to work with. Second, some events resulted in large spikes of more 'newsy' activity, such as the revelation of the NSA leaks in the middle of the time frame we were analyzing. These became our two main criteria for getting to the 'enduring social':
  1. High user diversity around a hashtag: Activity not driven by bots or single accounts.
  2. Continuity: Activity not 'blippy' or 'bursty' due to being driven by news stories.
In many ways these results led us back to well-known insights, fundamental to controversy mapping with digital methods: There is a 'vital middle' that is neither the top sites/tweets that everyone sees nor the underworld of sites/tweets that no one links to or retweets. In order to make a point with digital data, a lot of cleaning up is necessary. 
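The two criteria above can be turned into a rough filter. The sketch below is only one possible reading of them, not the procedure we actually followed and not a TCAT feature: the records, the thresholds and the `enduring_hashtags` function are all assumptions made for illustration. Diversity is approximated as distinct users per tweet, and continuity as being present in every interval without any single interval dominating.

```python
from collections import defaultdict

# Hypothetical records: (user, hashtag, index of the time interval, 0..3).
tweets = [
    ("alice", "security", 0), ("bob", "security", 1),
    ("carol", "security", 2), ("dave", "security", 3),
    ("bot1", "deals", 0), ("bot1", "deals", 1),
    ("bot1", "deals", 2), ("bot1", "deals", 3),
    ("erin", "prism", 2), ("frank", "prism", 2),
    ("grace", "prism", 2), ("heidi", "prism", 2),
]


def enduring_hashtags(tweets, n_intervals, min_diversity=0.5, max_peak_share=0.5):
    """Keep hashtags used by many different users and spread across all intervals.

    Both thresholds are arbitrary illustration values, not empirically derived.
    """
    users = defaultdict(set)
    per_interval = defaultdict(lambda: [0] * n_intervals)
    for user, tag, interval in tweets:
        users[tag].add(user)
        per_interval[tag][interval] += 1

    kept = []
    for tag, counts in per_interval.items():
        total = sum(counts)
        diversity = len(users[tag]) / total   # criterion 1: not one account or bot
        peak_share = max(counts) / total      # criterion 2: not one 'bursty' spike
        present_throughout = all(c > 0 for c in counts)
        if diversity >= min_diversity and peak_share <= max_peak_share and present_throughout:
            kept.append(tag)
    return sorted(kept)


enduring = enduring_hashtags(tweets, n_intervals=4)
print(enduring)
```

In the toy data, the single-account hashtag fails the diversity test and the hashtag that only appears in one interval fails the continuity test, which leaves the 'vital middle'.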

In the end we arrived at a series of attempts at pointing to the enduring social in privacy-related conversations on Twitter by showing how some hashtags continued to be used substantially and by a variety of users over four time intervals. In the third time interval, the NSA leak happened, so we were able to suggest with our visualizations to what extent the 'enduring conversations' were inflected by the 'newsy' ones.

The graphs below would of course not have been possible without the 'Associational Profiler' function of the Twitter TCAT tool (to be released soon) and the immense talent of our data designer. The graphs show, over time, which other hashtags a particular hashtag has been used together with. The enduring hashtags are highlighted, and the 'newsy' ones are colored black (#prism, #nsa, etc).

Fig. 4: Associational profile for #google inside the privacy-discussion on Twitter.

Fig. 5: Associational profile for #security inside the privacy-discussion on Twitter.

Fig. 6: Associational profile for #tech inside the privacy-discussion on Twitter.
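For readers who want a feel for what lies behind such a profile, the following Python sketch counts, per time interval, which other hashtags appear in the same tweets as a chosen hashtag. It is a simplified stand-in and not the TCAT 'Associational Profiler' itself; the sample tweets and the `associational_profile` function are invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical records: (index of the time interval, set of hashtags in one tweet).
tweets = [
    (0, {"google", "privacy", "security"}),
    (0, {"google", "privacy"}),
    (1, {"google", "nsa", "prism"}),
    (1, {"security", "privacy"}),
]


def associational_profile(tweets, focus):
    """Return {interval: {co-occurring hashtag: count}} for the focus hashtag."""
    profile = defaultdict(Counter)
    for interval, tags in tweets:
        if focus in tags:
            # Count every other hashtag that shares a tweet with the focus hashtag.
            profile[interval].update(tags - {focus})
    return {interval: dict(counts) for interval, counts in sorted(profile.items())}


profile = associational_profile(tweets, "google")
for interval, counts in profile.items():
    print(interval, counts)
```

In this toy example the company of #google shifts from #privacy and #security in the first interval to the 'newsy' #nsa and #prism in the second, which is the kind of inflection the figures try to show.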


The three examples might also be a bit hard to interpret, and indeed there may be several problems with this take on Twitter and 'the social'. One obvious theoretical point is that focusing on something 'enduring' is a quite conservative, Durkheimian definition of sociality. One obvious technical point is that focusing on hashtags means accepting a quite platform-centric version of sociality, and indeed one that gives primacy to attracting large numbers of users/retweets.

Fortunately several other entry points to the 'social' of social media were explored during the week, and indeed the most important lesson to take away was that the delineation of data sets and objects of study is a crucial step that decides what kinds of social we come to study and articulate.