On establishing a web archiving program

Earlier this week, I drove to Davenport, Iowa to attend — and speak at — the Upper Midwest Digital Collections Conference. Trevor Owens of the Institute of Museum and Library Services delivered a wonderful keynote, and others followed his remarks by discussing digital projects — audio digitization! community archiving! — happening throughout the upper Midwest.

I attended to discuss the web archiving work I’ve been doing on behalf of the Madison Public Library (and, as the conversation evolved, I also spoke a bit about the work I’ve been doing at the University of Wisconsin-Madison Archives to document #therealuw). The talk was called “You’ve Decided to Establish a Web Archiving Program. Now What?” and I hoped it would demystify that odd space between deciding to start a web archiving program and actually crawling sites.

The talk was broken into chunks: an introduction; a brief overview of available programs; and seven questions to ask of your organization / your employees as you start your program. And because my slides — as always — are quite bare, I thought I’d expand here, in writing.


I used the introduction to urge professionals to give up on perfection. (My actual words were: the perfect web archive does not exist.) The web, as it stands, moves too fast and contains too much content to ever build the perfect web collection.

I should say: I’m not urging you to half-ass (sorry) the development of your collection(s), but rather to think critically and be creative as you build. When an institution assembles a physical, analog collection, there’s no expectation that — at its beginning — the collection is going to contain everything related to its subject, right? And yet, we think about web archives differently — I think — because it seems like just about everything is right there, ready for the taking.

Strategic, adaptive collecting is one hundred times better than mindless, let’s-grab-it-all collecting — and that’s true for any format.


As far as platforms go, I suggested two: Archive-It and webrecorder.io. There are other options, certainly, but these are two that I have used before and trust.


The rest of the presentation focused on these seven questions I’ve developed, which, quite honestly, are things I asked of my own organization (and myself) after we were knee-deep in creating our web collection. I like the collection we’ve established, and I think it’s going to serve Madison’s community well, but I do wish I had tried to answer these questions before we started because it would have streamlined the appraisal process and, right off the bat, made our collection a bit more focused.

But, live and learn, right?


1. Who is the archive’s target audience?

This question comes easy to a lot of organizations, and that’s great. Establishing the Library’s collection threw me for a loop, though, because the Library serves such a diverse user group. There are traditional library patrons, who use the Library’s books and media and computers; there are the researchers, who use the Library’s reference services and research material and local history room; there are artists, who use the Library’s makerspace and exhibit spaces; there are community members who use the Library’s physical meeting rooms; and on, and on. This is not to say that this doesn’t happen at historical societies and universities, but I do think this happens on a larger scale at public libraries.

In order to focus the Library’s collection, we really had to think about who might use it and why. Researchers seemed to be the obvious answer: much of the archive is a sum of sites that reflect digital versions of material previously found in print and local reference collections. But what about more traditional library patrons: would they use it? And employees and residents of the Library’s makerspace: could they incorporate it into their lessons and workshops?

Our collection caters heavily to researchers, and that’s fine, and we’ll see if it works. We could have, just as easily, directed the focus elsewhere. But thinking about this question allowed us to narrow our focus — if a site we wanted to collect had little to no (perceived) research value, was it worth collecting? (And I know, I know: who knows what we’ll end up using for research. But, like I said earlier: perfection doesn’t exist and, honestly, we’re guessing.)


2. How will people access the archive?

When I brought this question up, it threw my supervisors for a loop, but: some web collections — like at the Library of Congress — contain material that is restricted, and can be accessed only on site. Some don’t. And that changes things a bit, I think, because it adds another level of thought: if, say, the Library wanted to include restricted sites, how would we make that work? Who on staff would monitor use of the archive? Who would patrons call to set up time with the archive? Do we have the equipment (a spare computer, let’s say) to create this kind of setup?

You see what I mean. It’s not a huge question, but it’s something to consider.


3. How will your organization promote the archive?

Don’t do a ton of work establishing a web collection and then let it go unused. Collections of this nature are not (yet) intuitive. Think of how often you tell someone you’re an archivist and they say, “What’s that?” and you jump right into your elevator pitch. Now think about how you’ll answer when you talk about web archiving and someone says, “Wait, what?”

People don’t get it. And that’s okay. But there’s a huge literacy gap between “We’re creating a web archive” and “But stuff on the Internet lasts forever!” So, promote the archive. Don’t bury the link to it on your institution’s webpage. Make fliers. Hell, offer workshops and lectures on web archives. (I’ve done the latter twice as part of a larger digital preservation workshop. In both sessions, I’ve shown users how to download their data from Facebook / Twitter / Gmail and done a walkthrough of the Internet Archive’s Wayback Machine. Remember: just because everyone in your circle of coworkers and friends understands the need to archive the web doesn’t mean your users do.)

This is about getting the word out. Do it. Get people to use the collection you’ve created.


4. Has this work already been done?

This is one of my favorite questions. Madison Public Library is within walking distance of the Wisconsin Historical Society, the University of Wisconsin-Madison Archives, the Wisconsin Veterans Museum, the Madison Museum of Contemporary Art, the Chazen Museum of Art, the Wisconsin Historical Society Museum, and on. And I know for certain that two of these places, the University of Wisconsin and the Wisconsin Historical Society, actively practice web archiving.

If the Wisconsin Historical Society is already crawling the blog that belongs to the Governor of Wisconsin (and their collection is publicly accessible and discoverable), why should the Library do it, too? And if the University is already crawling everything associated with wisc.edu, there’s no need for the Library to grab that domain, too.

This is both an ethical and a practical thing for me. Practical because it’s going to save you money and time and needless maintenance. These collections are available online. And if a user of the Library’s web archive walks in and says, “Hey, do you have the wisc.edu archive?” it’s remarkably easy to say, “No, actually, but here’s the webpage for that collection.”

Ethically: the Society of American Archivists has a code of ethics. In that code of ethics, there is this statement: “[Archivists] collaborate with external partners for the benefit of users and public needs.” And this one: “Archivists cooperate and collaborate with other archivists, and respect them and their institutions’ missions and collecting policies.”

We’re all so much better if we collaborate.


5. Can your organization afford the archive?

Listen: web archives cost money, whether you sign a contract with an external provider, or do it on your own. Sit down and create a realistic budget for your web archive and make sure that you can fund it not just this year, but next year, and the year after, and even the year after. A web archive deserves to be sustained. And that requires money.


6. Can your organization sustain the archive?

Listen again: web archives require labor. Labor done by a human. Labor done by a human who works as an employee at your institution. Labor done by a human who works as an employee at your institution on a frequent basis.

You cannot set and forget a web archive. It’s tempting, I know, because who has room for one more thing on their plate, or in their job description? But web archives require maintenance, and someone to monitor crawls, and someone to continually evaluate the sites that you are — and are not — collecting, and someone to promote the collection, and someone to teach patrons to use the collection. Just as physical collections require similar maintenance.

It doesn’t have to be a forty-hour-a-week commitment, but it should be, you know, more than a zero-hour-a-week commitment. Find a balance, and adapt. But do it. A good web archive is a cared-for web archive.


7. How will you ensure that the web archive remains inclusive?

This is the most important question. And it probably deserves a post of its own, but:

I was the sole curator — what a terrible word — of the Library’s collection. I am also white, well-off financially, college-educated, and on. My perspective — as hard as I try otherwise — is reflected in what I choose to collect and leave out. And for a collection that needs to serve the entire Madison community, this is not okay. So, recognize this. And then do something about it.

What I did? Went outside of the Library.

I did speak with Library employees. And the Library’s board. And the Friends of the Library. And then I kept going. I spoke with teachers and faculty at the University, and I spoke with classmates of mine, and students I had never met. I spoke with community leaders, program managers, business professionals, and workers. I asked other archivists, at other public libraries, for their opinions. And I’m still asking people these questions: what represents Madison to you? Where do you work? What do you do for fun in Madison? What are you involved in within the Madison community? Where do you get your news? What neighborhood do you live in? Where do you post your creative work, and your words?  What sites did you consult before deciding where to live? And on. These aren’t difficult questions, but they — I hope — get at the heart of what makes the community.

I also consulted a lot of other resources: online neighborhood lists, city-sponsored community event calendars, newspapers and alt-weekly newspapers, fliers taped up around the city, radio ads, protest signs, advertisements on television, etc. If I heard or read something in Madison that led me to a website, I looked at the website.

It’s not about having the right answers, but about seeking them. And so often — when you assemble any type of collection, when you embark upon any project — these answers come from outside of yourself. And that’s good, and that’s important.

Do not shirk this step. And, even when you think you’re finished with it, keep going. This one is constant and never stops.

Let your collection evolve and grow, and do it by asking for help.






