On hashtags

Up front:

A lot of this work was sparked by a discussion I had with Ed Summers earlier this week, and, you know, he’s also the one that built the tool I’ve been using to do this work, so. I deserve minimal credit for anything written here.

And, I’ll give additional credit to a conversation we had in class last week — about having questions to ask of data, versus having the skills to use tools to manipulate and look at data. It’s been fun for me, this semester, to tinker with things like twarc (and Programming Historian), but I’ve also realized that I’m not necessarily doing this work with a research question — or a thesis, or grant-funded project in mind. And I wasn’t quite sure, then, where I fit. But in class, we talked about how there is value — in its own right — in knowing how to use the tools. And so I’ve been tinkering.


Wisconsin is currently — and finally — grappling with the issue of racism on campus among its students and faculty, and, in response, student activists brought the conversation online, at #therealuw. (Disclaimer: it looks like this hashtag was used for the first time on March 17, but, due to Twitter’s API, I’ve only been able to work with tweets from the past seven days. Start as soon as you can.)

Alongside #therealuw are other similar hashtags: #reclaimOSU, concerning the creation of a “just, transparent and democratic food system,” the halting of “the Comprehensive Energy Management Plan which would further privatize Ohio State,” and demands that Ohio State administration divest from “companies that are complicit in Israeli apartheid.”

Another is #dismantledukeplantation, started by “a group of nine students” who are “currently occupying the main administrative building at Duke university to demand accountability and justice for Black & Brown workers.”

There are more, of course. And they’ll keep coming. But these are the three I decided to take a closer look at.


Setting up twarc takes a bit of work: you need to have Python installed, for one, and you need to register an application at Twitter (which will give you the four credentials you’ll need to run a search / crawl). Also, I encountered quite a few bugs while working with twarc the first time, but these problems were on my end, and had to do with upgrading to El Capitan.

Anyway, this is all to say: I thought it would be interesting to take a look at #therealuw, #dismantledukeplantation, and #reclaimosu as a group. So, first, I executed the crawls. (Another disclaimer: both #reclaimosu and #dismantledukeplantation are very active right now, while #therealuw was much more active earlier in the week, and that skewed the numbers a bit. Also, while crawling #reclaimosu, I hit Twitter’s rate limit, and stopped the crawl. So I wouldn’t call what I’ve collected a complete, or scholarly acceptable, set of data. It’s pretty haphazard, but it’s okay for a blog post.)

From #therealuw, I collected 1,269 tweets; from #dismantledukeplantation, 5,241; and from #reclaimosu, 5,290. One of the first things I did was create a word cloud for each.

Students, as the word clouds showed, was the word that appeared the most across all three. (I ran ‘grep -c students’ against each JSON file, and found that the word appears 371 times in #therealuw, 1612 times in #reclaimosu, and 1435 times in #dismantledukeplantation.) This isn’t exactly surprising, because these are student-led protests and demonstrations. But the same approach can be used for less obvious words. Silenced, for instance, is not used in #therealuw or #dismantledukeplantation, but it appears 40 times in #reclaimosu. Community appears 71 times in #therealuw. Isolate appears 35 times in #dismantledukeplantation. Heard appears 85 times in #reclaimosu. And on. If you know what you’re looking for (a word, a hashtag-wide sentiment), this tool comes in pretty handy. (And I bet people a lot smarter than me can use this to consider trends and changes in the movements just by looking at a word or two.)
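One caveat about the counts above: grep -c reports the number of matching lines, not the number of occurrences, and it matches anywhere on the line, including usernames and profile descriptions. Here is a minimal Python sketch of a stricter count, assuming the crawl file holds one tweet per line as JSON and that each tweet keeps its body in a "text" field. The function name and the sample tweets are made up for illustration.

```python
import json
import re


def count_word(lines, word):
    """Return (tweets containing the word, total occurrences), ignoring case."""
    # \b keeps "heard" from matching inside "overheard".
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    tweets_with_word = 0
    occurrences = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        # Only look at the tweet body, not the rest of the JSON.
        hits = pattern.findall(tweet.get("text", ""))
        if hits:
            tweets_with_word += 1
            occurrences += len(hits)
    return tweets_with_word, occurrences


# Hypothetical sample data standing in for a crawl file.
sample = [
    json.dumps({"text": "Students stand with students"}),
    json.dumps({"text": "We hear you"}),
]
print(count_word(sample, "students"))
```

To run it against a real crawl, pass an open file instead of the sample list, for example count_word(open("therealuw.json"), "community"), where the filename is whatever you saved your crawl as.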


I attempted, too, to get an idea of how many original tweets were being generated within each hashtag. I think the original crawl I ran counted tweets en masse, so one tweet that was retweeted fifty times counted as fifty-one tweets. Once I removed retweets, the number of unique tweets in each hashtag shrank considerably. #reclaimosu had 657 original tweets, #dismantledukeplantation had 897, and #therealuw had 325.
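A rough sketch of how retweets could be filtered out, not necessarily the way I did it: in Twitter’s tweet JSON, a retweet carries a "retweeted_status" field holding the original tweet, so anything without that field can be treated as original. The sketch assumes one tweet per line; the function name and sample tweets are invented for illustration.

```python
import json


def count_originals(lines):
    """Return (original tweets, all tweets) for line-oriented tweet JSON."""
    total = 0
    originals = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        total += 1
        # Retweets embed the tweet they repeat under "retweeted_status".
        if "retweeted_status" not in tweet:
            originals += 1
    return originals, total


# Hypothetical sample: one original tweet and two retweets of it.
sample = [
    json.dumps({"text": "We will be heard"}),
    json.dumps({"text": "RT @someone: We will be heard",
                "retweeted_status": {"text": "We will be heard"}}),
    json.dumps({"text": "RT @someone: We will be heard",
                "retweeted_status": {"text": "We will be heard"}}),
]
print(count_originals(sample))
```

Note that this only catches retweets made with the retweet button. Someone who manually types "RT" in front of a copied tweet would still be counted as original.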

You might (?) be able to use this in an attempt to think about the number of ideas and thoughts being contributed to the hashtag. A collection of fifty tweets that, when stripped of all retweets, ends up being one tweet probably has less to say than a collection of six thousand tweets that, when stripped of all retweets, still contains four thousand. Right? It’s probably a faulty method, but I think it could at least help narrow the set of data you’re looking at, if you’re interested in something like the most active tweeters, or the first time an oft-retweeted sentiment appeared.


The last thing I did was look at links embedded within the tweets, which is something Ed Summers did for #therealuw about a week ago. By using a utility script installed alongside twarc, you can find out which link appeared most often in a certain hashtag. (I think what I’ve done here is slightly incorrect. The first step in this process is unshortening urls, which requires unshrtn, and I could not, for the life of me, figure out how to get that to work. Again: this is a user error.)

Interestingly, running this command turned up links that provide a whole lot of context about the origin and intention behind each hashtag. For instance, the most-tweeted link (90 times) within the #therealuw hashtag was a YouTube video created by Patrick Sims, Wisconsin’s Chief Diversity Officer and Vice Provost. (An aside: beware the link, friends. When I viewed it on March 31, when it was posted, the comments section was not open. But between then and now, it seems that the comments section was opened up, and what fills that space now are racist, vile, sickening comments. Hateful, inexcusable sentiments that are just gutting. And I don’t know what to say here, other than I can’t imagine the strength it takes to stand up to hatred like this, and that I support my fellow classmates in their brave, and seemingly unending, activism. We are better for their bravery and, in so many instances, undeserving of their efforts that push on in the face of this oppression.)

In #dismantledukeplantation, a Guardian article was cited most often, with 125 appearances. And for #reclaimosu, it was a link, tweeted out 117 times, from the Afrikan Black Coalition, that listed the movement’s demands.
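For anyone who wants to try the link count without the utility script, here is a minimal sketch of the same idea. It is not the twarc script itself, and it skips the unshortening step entirely. Instead it leans on the "expanded_url" that Twitter stores for each link under "entities" and "urls" in the tweet JSON, falling back to the short "url" when that is missing. The function name and example links are made up.

```python
import json
from collections import Counter


def top_links(lines, n=5):
    """Return the n most common links as (link, count) pairs."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        for url in tweet.get("entities", {}).get("urls", []):
            # Prefer the expanded form; fall back to the t.co short link.
            link = url.get("expanded_url") or url.get("url")
            if link:
                counts[link] += 1
    return counts.most_common(n)


# Hypothetical sample data with placeholder links.
demands = {"url": "https://t.co/a", "expanded_url": "https://example.org/demands"}
video = {"url": "https://t.co/b", "expanded_url": "https://example.org/video"}
sample = [
    json.dumps({"entities": {"urls": [demands]}}),
    json.dumps({"entities": {"urls": [demands]}}),
    json.dumps({"entities": {"urls": [video]}}),
    json.dumps({"text": "no links here"}),
]
print(top_links(sample))
```

Because nothing is unshortened here, the same article shared through two different link shorteners would be counted as two separate links.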


There are, I’m sure, hundreds of questions you can answer using this data. One that comes to mind, if you act fast enough, is who created the hashtag and which user appears most often within it.
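As a sketch of how the user question might be approached, assuming one tweet per line, a "screen_name" under "user", and Twitter’s usual "created_at" format: the code below finds the most active account and the earliest tweet in whatever was collected. The account names are invented, and the earliest tweet in a crawl is only the hashtag’s first use if the crawl reaches back far enough.

```python
import json
from collections import Counter
from datetime import datetime

# Twitter timestamps look like "Thu Mar 31 14:05:00 +0000 2016".
TWITTER_TIME = "%a %b %d %H:%M:%S %z %Y"


def summarize_users(lines):
    """Return ((most active user, count), (earliest time, its author))."""
    counts = Counter()
    earliest = None  # (datetime, screen_name) of the oldest tweet seen
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        name = tweet["user"]["screen_name"]
        counts[name] += 1
        when = datetime.strptime(tweet["created_at"], TWITTER_TIME)
        if earliest is None or when < earliest[0]:
            earliest = (when, name)
    most_active = counts.most_common(1)[0] if counts else None
    return most_active, earliest


# Hypothetical sample data with invented account names.
sample = [
    json.dumps({"user": {"screen_name": "alice"},
                "created_at": "Wed Apr 06 20:19:24 +0000 2016"}),
    json.dumps({"user": {"screen_name": "bob"},
                "created_at": "Thu Mar 31 14:05:00 +0000 2016"}),
    json.dumps({"user": {"screen_name": "alice"},
                "created_at": "Tue Apr 05 09:30:00 +0000 2016"}),
]
most_active, earliest = summarize_users(sample)
print(most_active, earliest[1])
```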

Most often, I look at this work from the perspective of an archivist. For instance: if we can identify a link that appears over and over within a hashtag, how do we make sure that the link lives on and remains accessible both tomorrow and years from now? And, you know, if we can identify trends as they appear on Twitter, can we expect that similar content is being created outside of the Internet? And does a common theme within a hashtag lead us to look for these instances? Are there physical documents that reflect the conversation happening online? Videos? Photographs?

If so, how do we track them down?

As someone entering the field within the next few months, I recognize the need to be an archivist who engages with the source, the movement, the creator. When we wait for content, and people, to come to us, we barely get half the story. When we wait, we get content from those who already exist within the narrative and the archive: the privileged, the white, the rich, the well off. But by venturing out of our stale spaces, we meet new voices, and new people, and new histories, and we become part of the history as it is told by those who are living it, not by those who rewrote it, and not by those who act as gatekeepers.

The role of the archivist is changing, I think. And hope. I don’t envision a future where archivists work like they did fifty — or ten — years ago. People create too much information, too quickly, for us to continue to operate like that, right? And so I hope that the profession turns outward: that our role becomes less custodial in nature and, instead, more educational. That archivists no longer protect what they think should live on, but instead teach communities to assure that the things they create outlive them — if they want them to. That our archives are no longer confined to a handful of institutions, but instead exist widely: at libraries, at community centers, within families. And I don’t think this threatens the field, or our institutions, but instead strengthens them.


To wrap, I’d be remiss if I didn’t bring your attention to Documenting the Now. The ideas I’ve written about here are ones that the Documenting the Now team has been thinking about, and working toward, for a long time. And I am so excited to watch the work they do, and the conversation they’ve started, as it develops.

