
A look back at the Digital Methods Summer School 2013

Back home in Copenhagen, I am slowly but surely recovering from two intense weeks of summer school in Amsterdam. I took part in the annual Digital Methods Summer School (DMSS), hosted by the Digital Methods Initiative (DMI) at the University of Amsterdam. This year's theme - "You are not the API I used to know: On the challenges of studying social media data" - was particularly relevant to my PhD project, so I left Copenhagen full of excitement.

The DMSS proved to be unique in many ways. First of all, 'learning by doing' was taken seriously. We spent less time attending lectures than working on our own self-organised digital methods projects. (At the same time, the lectures were excellent, not least the ones by Bernard Rieder and Noortje Marres - see their slides here).

Fig. 1: Excitement while waiting for the Eurovision visualization to spatialize.

Second, the project work bridged, with considerable success, huge gaps in the participants' previous experience with digital methods. Students and researchers with close to no knowledge of the DMI tool suite worked and presented side by side with, among many others, the director of the DMI, Richard Rogers, who set aside a full two weeks to participate in and facilitate the summer school. "That's the way we do it here", he told me. It is nevertheless an impressive commitment to a format that has grown from a humble beginning into an event with 70+ participants coming from 23 different countries.

So what did we do? Most participants took part in two individual projects (or rather sprints of two to three days), one per week. During the first week, my group formed around an interest in experimenting with analyzing overlaps in the users of different Facebook pages. We used Rieder's Facebook app Netvizz for data extraction and the open source network visualization software Gephi. Much of our time went into overcoming technical difficulties, but we also (re)learnt the importance of asking good questions. Our first attempts at coming up with a not-too-serious hypothesis ended up with the idea that people who like cats tend to be more libertarian and progressive, while people who prefer dogs tend to be more conservative and traditionalist (yes, the idea came about over after-work beers). With Facebook data, we thought, we could finally answer this pressing question.

We ended up visualizing the 'catosphere' and the 'dogosphere' in two different countries, Denmark and Italy, together with two famous politicians of different political leanings for each country. In short, for each country we extracted a sample of the list of fans of the pages for dogs and cats, and of the two politicians' pages. We then visualized the four groups together in order to see how they would position themselves in relation to each other. Each time an individual user was a fan of two pages, say dogs and Berlusconi, it would create a tie between the two groups, dragging them closer to each other in the visualization. Unfortunately, the result proved difficult to interpret, and we had to leave the question of the political significance of pet preferences unanswered.
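The overlap logic described above can be sketched in a few lines of Python. This is only an illustration of the idea, not what Netvizz or Gephi do internally: it collapses the fan lists into page-to-page ties weighted by the number of shared fans, whereas our actual visualization kept the individual users as nodes. The function name, the page labels, and the user IDs are all made up for the example.

```python
from itertools import combinations


def page_overlap_edges(fans_by_page):
    """Return {(page_a, page_b): shared_fan_count} for pages sharing at least one fan.

    fans_by_page maps a page label to a collection of user IDs
    (a hypothetical structure, not the Netvizz export format).
    """
    edges = {}
    for page_a, page_b in combinations(sorted(fans_by_page), 2):
        shared = len(set(fans_by_page[page_a]) & set(fans_by_page[page_b]))
        if shared:
            # More shared fans means a stronger tie, which a force-directed
            # layout would translate into the two pages sitting closer together.
            edges[(page_a, page_b)] = shared
    return edges


# Toy sample with invented user IDs.
fans_by_page = {
    "cats": {"u1", "u2", "u3", "u4"},
    "dogs": {"u3", "u5", "u6"},
    "politician_a": {"u1", "u2", "u7"},
    "politician_b": {"u5", "u6", "u8"},
}

edges = page_overlap_edges(fans_by_page)
for (page_a, page_b), weight in edges.items():
    print(page_a, page_b, weight)
```

In this toy sample the cat page ends up tied more strongly to one politician and the dog page to the other, which is the kind of pattern we were hoping (in vain) to see in the real data.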

Fig. 2: Difficult-to-interpret Gephi visualization of the Danish catosphere, dogosphere and two politicians' pages.

We still had half a day left of the first week, however, so we came up with another hypothesis that we thought easier to test: Does the alleged Eurovision bloc voting also appear on Facebook? Self-evidently another pressing issue. Do people vote strategically with their mobile phones on the night of the Eurovision final, and then go on Facebook and reveal their 'real' preferences? Our strategy for answering this question was to extract sample data from all the Facebook pages for the artists competing in the 2013 Eurovision final and then rely on the overlap of users who like more than one artist in order to see whether the Facebook 'like' data reproduce the well-known voting blocs, such as 'The Viking Empire', 'The Balkan Bloc', and 'The Warsaw Pact'. The effort resulted in an immense visualization of 200,000 nodes representing artist pages from 25 different countries. We did not have time for a thorough analysis, but we found no signs of a Nordic cluster, so to some extent we managed to answer our question: People might vote in regional blocs, but when they report their preferences on Facebook, geographical proximity does not seem to be a useful predictor of clusters.

Fig. 3: Another hard-to-interpret visualization, this time of 25 Eurovision fan pages. The Nordic countries are colored purple.

In the second week most of us had grown a bit weary of these not-too-serious themes and joined a much larger group led by three DMI-affiliated researchers, focusing on 'Detecting the Socials' of social media. The main idea was to explore different ways of getting at something 'social' through Twitter, using the new 'TCAT' tool developed by Bernard Rieder and Erik Borra at DMI. We split into three different subgroups focusing on users, bots, and hashtags respectively. I joined the group working on hashtags, led by Noortje Marres. A lot happened here, and hopefully I will be able to link to our final presentation soon. Let me just reveal a bit about what I worked on the most.

The question my sub-sub-group found ourselves asking - and trying to answer - was: What is the 'enduring social' on Twitter? Is there a continuous conversation in the sense of a stable activity around a group of associated hashtags? In order to 'get to' this particular version of 'the social', we quickly found a lot of cleaning had to be done. First of all, an amazing amount of 'botty' and 'spammy' activity was present in the dataset of tweets related to the word 'privacy' that we had chosen to work with. Second, some events resulted in large spikes of more 'newsy' activity, such as the revelation of the NSA leaks in the middle of the time frame we were analyzing. These two problems gave us our two main criteria for getting to the 'enduring social':
  1. High user diversity around a hashtag: Activity not driven by bots or single accounts.
  2. Continuity: Activity not 'blippy' or 'bursty' due to being driven by news stories.
In many ways these results led us back to well-known insights, fundamental to controversy mapping with digital methods: There is a 'vital middle' that is neither the top sites/tweets that everyone sees nor the underworld of sites/tweets that no one links to or retweets. In order to make a point with digital data, a lot of cleaning up is necessary. 
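To make the two criteria a bit more concrete, here is a minimal Python sketch of how they could be turned into a filter. It is not what we actually ran during the summer school, and it is not part of TCAT: the function name, the input format, and both thresholds (`min_users` and `max_top_user_share`) are assumptions chosen for illustration. A hashtag counts as 'enduring' here only if, in every time interval, it is used by several different accounts and no single account dominates its use.

```python
from collections import defaultdict


def enduring_hashtags(tweets, n_intervals, min_users=2, max_top_user_share=0.5):
    """Return hashtags that pass both criteria in every interval.

    tweets: iterable of (interval_index, user, hashtag) tuples
            (a hypothetical, already-cleaned input format).
    """
    # counts[hashtag][interval][user] = number of tweets
    counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for interval, user, tag in tweets:
        counts[tag][interval][user] += 1

    enduring = []
    for tag, by_interval in counts.items():
        passes = True
        for i in range(n_intervals):
            users = by_interval.get(i, {})
            # Continuity: the hashtag must be used by enough accounts in
            # every interval, not only during a news-driven burst.
            if len(users) < min_users:
                passes = False
                break
            # User diversity: no single (possibly bot) account may produce
            # more than the allowed share of the tweets.
            if max(users.values()) / sum(users.values()) > max_top_user_share:
                passes = False
                break
        if passes:
            enduring.append(tag)
    return sorted(enduring)


# Toy data with invented users; two time intervals.
tweets = [
    (0, "a", "#security"), (0, "b", "#security"),
    (1, "a", "#security"), (1, "c", "#security"),
    (0, "bot", "#spam"), (0, "bot", "#spam"), (0, "x", "#spam"),
    (1, "bot", "#spam"), (1, "bot", "#spam"), (1, "y", "#spam"),
    (1, "a", "#prism"), (1, "b", "#prism"), (1, "c", "#prism"),
]

result = enduring_hashtags(tweets, n_intervals=2)
print(result)
```

In the toy data, the bot-driven hashtag fails the diversity check and the hashtag that only appears in the second interval fails the continuity check. Where exactly to set the thresholds is of course a judgment call that shapes which 'social' one ends up with.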

In the end we arrived at a series of attempts at pointing to the enduring social in privacy-related conversations on Twitter by showing how some hashtags continued to be used substantially and by a variety of users over four time intervals. In the third time interval, the NSA leak happened, so we were able to suggest with our visualizations to what extent the 'enduring conversations' were inflected by the 'newsy' ones.

The graphs below would of course not have been possible without the 'Associational Profiler' function of the Twitter TCAT tool (to be released soon) and the immense talent of our data designer. The graphs show, over time, which other hashtags a particular hashtag has been used together with. The enduring hashtags are highlighted, and the 'newsy' ones are colored black (#prism, #nsa, etc).

Fig. 4: Associational profile for #google inside the privacy-discussion on Twitter.

Fig. 5: Associational profile for #security inside the privacy-discussion on Twitter.

Fig. 6: Associational profile for #tech inside the privacy-discussion on Twitter.
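For readers who want a feel for what an associational profile contains, the basic counting can be sketched in Python as follows. This is my own simplified illustration and not the implementation of TCAT's Associational Profiler: the function name and the input format (an interval index plus a list of hashtags per tweet) are assumptions, and the toy tweets are invented.

```python
from collections import Counter, defaultdict


def associational_profile(tweets, focal):
    """Count, per time interval, the hashtags used together with `focal`.

    tweets: iterable of (interval_index, list_of_hashtags) pairs
            (a hypothetical input format).
    Returns {interval_index: Counter({co_hashtag: number_of_tweets})}.
    """
    focal = focal.lower()
    profile = defaultdict(Counter)
    for interval, tags in tweets:
        # Lowercase and deduplicate so '#Google' and '#google' count as one.
        tags = {tag.lower() for tag in tags}
        if focal in tags:
            profile[interval].update(tags - {focal})
    return dict(profile)


# Invented toy tweets across two time intervals.
tweets = [
    (0, ["#google", "#privacy"]),
    (0, ["#Google", "#privacy", "#security"]),
    (1, ["#google", "#nsa", "#prism"]),
    (1, ["#tech", "#privacy"]),
]

profile = associational_profile(tweets, "#google")
for interval in sorted(profile):
    print(interval, profile[interval].most_common())
```

Plotting these per-interval counts side by side gives a rough equivalent of the figures above: one can see which neighbouring hashtags persist across intervals and which ones only show up when a news story breaks.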


The three examples might also be a bit hard to interpret, and indeed there may be several problems with this take on Twitter and 'the social'. One obvious theoretical point is that focusing on something 'enduring' is a quite conservative, Durkheimian definition of sociality. One obvious technical point is that focusing on hashtags means accepting a quite platform-centric version of sociality, and indeed one that gives primacy to attracting large numbers of users/retweets.

Fortunately several other entry points to the 'social' of social media were explored during the week, and indeed the most important lesson to take away was that the delineation of data sets and objects of study is a crucial step that decides what kinds of social we come to study and articulate.

