DH09 Wednesday, session 1: managing information

in DigiLib Blog, Preservation
June 24th, 2009

First up, Melissa Terras of University College London. “Digital Curiosities: Resource Creation Via Amateur Digitization”

Melissa has spent a lot of time studying images, and in most cases was studying images in/from institutions. But what about collections (of all sorts, not just images) created by people who aren’t affiliated with institutions? They’re actually quite interesting, and Melissa studied them using the following methods:

-Digitization literature survey (almost nothing written about personal digital collections!)
-Review of 100 stand-alone, self-identified “museums” on the internet. (Many people use flickr as a platform for presenting their personal museums of barbed wire and owls and whatnot.)
-Reviewed a bunch of groups and pools on flickr (which at this point has 3.6 billion pictures)
-Will present an overview of memory institutions interacting with user created content
-Interviewed 10 people who create online museum content in their spare time
-Surveyed contributors to flickr pools
-Surveyed academics: are they paying attention to this? [sort of, but they don’t necessarily admit to it formally: see below]

The internet is full of people with way too much stuff on their hands. Individual “museums” are produced by people ranging from nutters to people who know all about metadata and produce great collections that are also well documented.

We have a habit of discounting amateurs, but many museums were *started* by amateurs and grew up from cabinets of curiosities. Foundational work in astronomical research, research on British flora, ornithology, languages, weather observations, field archaeology, field sciences, genealogy/family history etc. has been done by amateurs. That word doesn’t mean you’re rubbish at what you do, but rather that you don’t get paid for what you do. [N.B.: genealogy is not so much about objects as about information, so Melissa is not focusing on that today.]

Self-defined museums, themes emerge: they want to make stuff public; recognition through networked community, shared values. Topics emerge: ephemera (advertising, packaging, nostalgia), comics, technology, personal and “embarrassing” collections, genealogy. Scope emerges: collections are self-delineated, time frame of collections is influenced by the age of the author (people are interested disproportionately in stuff that has been produced within their lifetimes). Finally, the usual platforms for such things are static HTML, blogs and Flickr. Particularly on Flickr, there’s a pretty good amount of basic metadata, which is actually augmented by other users.

Creators are self-motivated, enthusiastic, dedicated. They are diverse in interests and practices. They aren’t so much interested in standards, but do create intuitive metadata. Some are not amused by the patronizing term “amateur” and see themselves as enthusiasts.

Users are aware of site visit stats. They’re good at finding stats and publishing them. Researchers contact them because they’re visible. They interact with their user communities via email, comments and blog updates; they’re promoting stuff actively. They are aware that people are interested in their stuff because researchers contact them to ask for permission to use images for illustrations, research purposes, etc.

Memory institutions (like the Smithsonian) are starting to become aware of, and participate in, the same venues. The Smithsonian is on Flickr Commons. The Tate has Flickr groups. The Victoria and Albert Museum has a Flickr group as well. The Oxford U Great War Archive has its own digital archive, but also goes out to Flickr, reaching out to people and trying to get them to contribute their own artifacts to the archive.

Flickr is the Grand Central Station of information.

Technology advances change amateur research radically.

Melissa is interested in studying the psychology of collecting — what’s an archive vs. a collection? She plans to engage in further dialogue with creators and users.

We need to start preserving these things — flickr phto


Museums and memory institutions are not the only hosts of worthwhile digital objects. Ephemera and pop culture are better served by the online pro-am community, which is better at using networking tools. Interactive sites have better stats. (Better evidence of use?)

Memory institutions can learn from this. We need to start interacting with our user community. It’s not ok to scan and dump. We need to do outreach using flickr, facebook etc. Heck, it’s free to use such resources, and they already have usable platforms! So we don’t have to create them.

Academics use these resources in very creative ways! For example, a researcher studying a particular sculpture will look for the 500 photos of it on Flickr that show it from all kinds of possible angles. Interestingly, researchers don’t acknowledge (or admit to) these activities in their papers.

Peter Williams [presenting], Ian Rowlands, Jeremy John. “Digital Lives: how people create, manipulate and store their personal digital archives.”

(Bunch of 3.5″ floppies on his opening slide, all sepia and graduated in colors. Ware art.)

We need to respond to the transition from paper-based personal collections to increasingly digital memories [of famous people, too: Harold Pinter has a huge email collection]. We need to better understand how people actually manage their own digital collections.

Enter Personal Information Management (PIM), which draws on several disciplines: computer science, information science, human-computer interaction.

Stuff like “what kinds of file names do you use? how do you back up?” is boring. So they conducted in-depth ‘narrative’ interviews, and people enjoyed talking about their digital lives. They also did an online survey, about which later.

There were 25 interviewees, both established (an architect, authors, a playwright, a web designer, a molecular biologist, a geophysicist, etc.) and emerging (a digital artist, a theatre director, etc.) [vz: uh. Looks like “established” == “academic”, and “emerging” == *pat on the head* “amateur”. They couldn’t possibly mean that, right? Dooce, for example, is not at all emerging.]

Document manipulation: you can create information actively (write a Word document) or passively (the metadata in a Word document); both are very useful to people studying your work.

Once you’ve created the information, are you going to keep it active and modify it? (For example, is it a book?) If yes, there’s one life cycle that ends either in discarding (into trash) or storage. If there is no modifying, you can again trash it or store it. At every stage there are decisions to be made.
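The life cycle described above can be sketched as a tiny decision function. This is only an illustration of the branching as noted here, not anything shown in the talk; the function name, the parameter names and the stage labels are all made up for the example.

```python
def document_lifecycle(kept_active: bool, worth_keeping: bool) -> list[str]:
    """Return the stages a document passes through (hypothetical labels).

    kept_active:   the creator keeps working on the document after creating it
    worth_keeping: the creator decides to store it rather than discard it
    """
    stages = ["created"]
    if kept_active:
        # Active documents go through a period of modification first.
        stages.append("modified")
    # Either branch ends in the same decision: store it or trash it.
    stages.append("stored" if worth_keeping else "trashed")
    return stages


if __name__ == "__main__":
    print(document_lifecycle(kept_active=True, worth_keeping=True))
    print(document_lifecycle(kept_active=False, worth_keeping=False))
```

Both branches end in the same store-or-trash decision; the only difference is whether a period of modification comes first.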

Individuals’ personal policies reflected practical considerations (will I need this sometime in the future?); whether it’s a record of professional/academic value; whether they wanted to create and augment and maintain portfolios of their lives’ work; keeping things as contextual and/or emotional reminders.

People have romantic notions of media! Floppy disks with correspondence with an ex-girlfriend on them. (Hello, romancing the floppy).

Email is a huge part of most people’s digital collections. A typical example of a single user: university accounts; gmail/yahoo (Flickr) account; mails via facebook; LinkedIn mails… a large number of ways to communicate and be visible. So what do people do with those accounts? They use them as file storage systems and as appointment diaries; to send memos and forward messages to themselves; and to keep records of their work and contacts. They also use different accounts for different purposes (shopping, social, work etc.). People also maintain “defunct” accounts by “dummy mailing” (periodically sending messages to yourself just to keep accounts that would otherwise expire).

Some problems with email: multiple uses of multiple accounts are confusing; email doesn’t tend to get backed up; email often exists only server-side; mails are infrequently culled; subject headings don’t keep up with wandering topics.

They conducted two online surveys, one of academic researchers and one of the “general public” (slight overlap between the two). Respondents were asked how versed they are in IT; whether they’ve had catastrophic data loss; what strategies they have for organizing and preserving digital collections; whether they have made arrangements in case of death or disability; and why they archive in the first place.

Early-middle-aged users: pretty gender-balanced, with generally good IT skills and relatively low experience of major data loss, they had a relatively high data security dimension to their archives. They back up monthly and are good organizers.

Mature users, predominantly male, excellent IT skills, high incidence of Mac users, back up religiously (often daily), are exceptional organizers.

Younger users: more women than men, back up rarely if ever, are poorly organized, low experience of major data loss, report difficulty locating files, pay little attention to file naming, rarely use desktop search, little use of email for PIM.

Peter draws a connection between how well organized you are with your life on the computer, and incidence of Alzheimer’s, but I didn’t catch what this connection is.

One free-response question got this answer: “I don’t consider myself to have a ‘digital life’ any more than I consider myself to have a ‘washing machine life’… The computer is just a tool that I use but don’t think about very much.”

[vz: Interesting talk, but their sample is not very good. The difference between “established” and “emerging” interviewees is sketchy, and they don’t seem to have paid much attention to people who are professional internet presences. I’m thinking of Heather and John Armstrong, for example. Would be interesting to see a similar survey conducted, involving more than 25 poorly selected people.]

Neil Audenaert, Texas A&M, “BiblioMS: Project-Scale Bibliography Management.”

This is an experiment looking at project-scale bibliographies. Neil’s objective was to find out editorial needs, define commonalities among bibliographies, and design and implement a bibliography management system.

Does the world really need another bibliography manager? No, there already is one. But we can look at existing tools and group them into personal (BibTeX, EndNote, Zotero) and crowd-sourced (CiteULike, Connotea). [vz: though Zotero is quickly becoming crowd-sourced!]

Neil’s project scale: he’s interested in not so much the personal side of organizing and finding biblio records, but what we in DH do about organizing bibliographies for our projects. Our bibliographic projects are notably public; bibliographies are intended to be useful not to the larger project’s creators but to the general public. Our projects are also scholarly in nature (Cervantes Project, Shakespeare Bibliography), with bibliographies intended to map a field or domain. Bibliographies’ focus is discursive (bibliographies are often annotated); they’re edited in a collaborative fashion; and they’re integrated into a larger project.

Examples of bibliographies Neil has studied: comprehensive (Cervantes Int’l Bibliography Online); documentary (Nautical Archaeology Digital Library); and Special Collections (Digital Donne). They support DH projects and make scholarly contributions (and thereby involve editorial selection and organization). Attribution and authority are very important: who is creating this bibliography and making all the decisions?

Common issues in the bibliographies Neil has studied: genres, organization, editing and access. Collections have special needs (unique aspects), and need to provide methods for effective communication (filtering by user). Needs change. Sometimes different collections are related, and that’s important to synchronize. Need tools for search and browsing, for discovery of related entries, for the creation of taxonomies and controlled vocabularies. Bibliographies also need to accommodate multiple perspectives.

Editorial teams tend to have multiple levels of authority and utilize revision and versioning. Management issues arise out of all of this.

Bibliographies need to be accessible. These are components of broader DH projects, and need to not be stuck in a corner but accessible from throughout the project site. Bibliography tools need to be integrated into the editing interface for the “main” part of the project.

BiblioMS is a tool Neil and colleagues created. Its primary priorities are user-defined genres, collaborative management and integrated access throughout a project. Secondarily, searching, relationships and multi-faceted organization. [Secondary not because they aren’t important but because of time constraints.]

[vz: Neil talks about the architecture and gives a demo, but there doesn’t seem to be a public site for it, nor a demo online. If you know of web resources on BiblioMS, please comment.]
