DH09 Thursday, session 1: libraries!
The first paper was mine; naturally, I’m not going to blog it. But I’ll post a link to a PDF version of my talk here, and will Tweet it too. Stay tuned.
William Kretzschmar and William Potter, presented by WK, “Library Collaboration with Large Digital Humanities Projects.”
“If you stand still, you die” is by now a proverb in American business. The phrase has also become popular in computer role-playing games, where it’s literally true. Turns out it’s relevant to DH projects too: they might be completed, but they need continued support. Sustainability is a huge issue, discussed at DH conferences and in DHQ, and this paper continues that theme. It proposes that collaboration with the library is the only realistic option for long-term sustainability in university settings.
The stand-still-and-die problem, for us, is twofold: 1. The digital environment keeps changing, and shows no sign of stopping. 2. We need continuous access to new financial and human resources just to keep up with the changes in media and operating environments.
Here’s an example of a big institutional project: the Linguistic Atlas Project. Started in 1929; it has been at the University of Georgia with Bill since the 1980s. Each stage of gathering and digitizing/converting data has required funding. The Atlas is lucky to have a small endowment, which pays for student developers to keep up the computers involved. Another asset for the Atlas project is Bill’s status as a faculty member. The project’s financial needs grow with time, and he’s spent a lot of time fundraising.
Five years ago, another option emerged: Georgia created the Research Computing Service. High-performance processing, with some storage and web services. It was intended as an institutional resource, but the funding structure changed and it became a fee-based service for research with annual external funding. So the Atlas needed to find external funding, or else rely on the indulgence of colleagues in better-funded fields. The university has declined to fill the financial gaps between the Atlas’ needs and its funds, so the Atlas re-acquired a server and continued on its own.
In comes Bill Potter, the University Librarian, and says the library is expanding multimedia collections and needs a TON of storage space, so perhaps the Atlas’ 20 TB of audio files (and even more image files) could be hosted by the library. But who’ll pay for it, and who’ll do the work?
They’ve created an archive. Storage: LTO-4 computer tape instead of spinning disk. Refresh cycle by Library staff allows for updated tech. Atlas resources create original tapes, and distribute copies as requested. Atlas grant funding can provide equipment in the Library and in its research office.
Atlas will generate & store research & operational metadata, including OLAC (Open Language Archives Community) records as a public resource. The library will create a database for the Atlas staff to populate. The primary burden for creation and maintenance of metadata remains with the Atlas project, not with the Library.
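The talk didn’t show what an OLAC record looks like. For readers unfamiliar with the format: an OLAC record is essentially a Dublin Core record with language-archive refinements. A rough sketch of generating one (all field values are invented for illustration, not real Atlas data):

```python
# Sketch: building a minimal OLAC-style metadata record. OLAC records are
# Dublin Core records with language-archive refinements; every field value
# here is invented for illustration, not real Atlas data.
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
OLAC = "http://www.language-archives.org/OLAC/1.1/"
ET.register_namespace("dc", DC)
ET.register_namespace("olac", OLAC)

record = ET.Element(f"{{{OLAC}}}olac")
ET.SubElement(record, f"{{{DC}}}title").text = "Interview audio, speaker GA-042"
ET.SubElement(record, f"{{{DC}}}format").text = "audio/x-wav"
subject = ET.SubElement(record, f"{{{DC}}}subject")
subject.set(f"{{{OLAC}}}code", "eng")  # ISO language code refinement

xml = ET.tostring(record, encoding="unicode")
print(xml)
```

A harvester can then expose such records over OAI-PMH, which is how OLAC metadata usually becomes a public resource.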
As for the Atlas’ web presence on the Library’s servers: the website is complex and heavily used. One problem with any interactive site is the scripting required for interaction, which is different from regular site maintenance. So they’ve agreed on two things: there is no expectation of programming maintenance by Library IT staff, and the new site’s maintenance will be distributed. Basic web access is provided by the Library; highly scripted interactive functions (like GIS for mapping) stay with the project, which will extend the life of existing tools and look for new ones. Content is distinguished from tools: the Library archives content, but the project maintains the website and the tools.
Security: the Atlas will use a virtual machine to separate the Atlas site from the other Library web services. Its operating environment will be integrated with what the Library is already using. That means no Flash server and no ColdFusion, but yes to specialized software and scripts.
Themes in collaboration have emerged:
- Integration is central: alignment of the digital work of the Library and the Atlas.
- Content and tools are different: information should last forever, but tools can only be temporary.
- Resources: the collaborators have to respect the different resources of the Library and the Atlas.
These considerations lead the authors toward an institutional repository.
Hamed Alhoori et al., presented by Richard Furuta, “Supporting the Creation of Academic Bibliographies by Communities through Social Collaboration.”
[Before he starts on his paper, some remarks on the previous papers in the session:] There are a couple of different levels to collaboration. The authors are based in a CS department, and have been working with people in other fields for ~15 years now. Their belief: an effective collaboration involves three partners: someone who is interested in computing, someone who researches a particular area, and the library. N.B.: CS is no richer than anyone else; CS people are willing to share resources, and also appreciate when others share resources with them.
When you say computing, you don’t necessarily mean CS. CS departments have many people who are hard to get along with, just like humanities departments. There’s a view of computing as monolithic, but it needs to be broken down: you need people willing to find a common language with collaborators.
The authors’ work says that when we start talking about social collab/mechanisms, we don’t fully understand what’s going on. We can build a site by throwing in interesting elements found on other sites, but understanding why those elements are effective is a different thing.
Authors are working on a testbed (emphasized as a testbed, not yet a product but a thought piece) for scholarly bibliographies. The two students involved are both CS PhD students, interested in the tech and educational aspects of Web 2.0.
Motivation for this came out of thinking about the central nature of scholarly bibliographies to broad dissemination of academic results. Problem: too many sources to track (2.5m articles published yearly in 25k peer-reviewed journals). Also, papers not available digitally are becoming invisible.
Traditional models of bibliography editing centralize evaluation and ensure consistent quality. But there’s a high price to pay: delays of months or years. The authors’ key questions: can users who benefit from bibliographies also contribute to them? And if so, can quality be retained?
Crowdsourcing naturally comes to mind.
Research premises: yes, social collaboration can support and reduce the costs of creating a scholarly bibliography while ensuring its accuracy. It can also be used to create new features (tools?) for the bibliography.
Existing bibliographies (Cervantes Project, World Shakespeare Bibliog., Galileo Project, Whitman Archive): tend to be multilingual; may or may not be annotatable; tend to not allow saved searches; and don’t, as a rule, allow for social collaboration.
Authors put together an experimental interface using the existing Cervantes bibliography, adding personalization features: can create personal pages; blog features; identification of related items; export and import; make connections between items.
Added multilanguage capability (on-the-fly translation) for interface elements (automatic) and for content (automatic/manual) using a Google API. Acceptable, but not great. There’s also a moderation process: an editor can decide what to do with an individual entry: approve, disapprove, delete, or (to some degree) modify. Not as simple as it might appear: once you approve something, what happens when someone modifies it? Sticky issues.
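That “sticky issue” is essentially a state-machine question. One possible answer (my sketch, not the authors’ design) is that any edit to an approved entry sends it back to pending review:

```python
# Sketch of a bibliography entry's moderation lifecycle. The state names and
# the re-review policy are my invention, not the system described in the talk.
class Entry:
    def __init__(self, citation):
        self.citation = citation
        self.state = "pending"   # pending -> approved / rejected

    def approve(self):
        self.state = "approved"

    def reject(self):
        self.state = "rejected"

    def modify(self, new_citation):
        self.citation = new_citation
        # Any change to an already-approved entry re-opens moderation.
        if self.state == "approved":
            self.state = "pending"

e = Entry("Cervantes, M. Don Quixote. 1605.")
e.approve()
e.modify("Cervantes, M. Don Quijote de la Mancha. 1605.")
print(e.state)  # back to "pending" after a post-approval edit
```

Whether re-review is worth the moderation load (versus, say, trusting edits by high-reputation users) is exactly the kind of trade-off the authors flag as unresolved.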
Then they looked at other systems that allow social citations: CiteULike, Connotea, BibSonomy, and social bookmarking generally. Several problems: citations tend to be redundant; there’s spam; there are phantom author names and phantom citations. None of this is a good sign for scholarly research, and it would affect the significance (impact factor) of a journal or other publication.
Thinking about having a combination of an amateur bibliography created by non-professionals, and one that meets the requirements of the professionals.
Social citation sites tend to have different types of groups: private (nobody knows about it); closed (members need special approval, which tends to create bottlenecks); and open (which tends to require a lot of editing/moderating).
Overall sketch of the authors’ approach: there’s a graduated way of getting into the community. Users can read and post anything, but their citations won’t be included until they’re approved; Collaborators are allowed to approve; and, finally, Moderators can edit. In deciding who is who, they’re looking not so much for pre-existing fame as for accuracy and relevance of contributions, and for people who remain active in the process. They hope to evaluate these qualities automatically.
Some drawbacks of relying on manually selected moderators: it’s time-consuming; people lose interest or become inactive; and, in interdisciplinary bibliographies, it can be hard to decide whether a citation is spam.
This is reputation-based social moderation. Approval of a contribution can happen by a moderator or by some determined number of collaborators.
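The “moderator or some determined number of collaborators” rule is easy to picture in code. A toy version (the role names and quorum size are my invention, not the authors’):

```python
# Sketch: an entry is accepted once any moderator endorses it, or once a
# quorum of collaborators has. Role names and the quorum size are invented.
QUORUM = 3

def is_accepted(approvals):
    """approvals: list of (user, role) pairs that have endorsed the entry."""
    if any(role == "moderator" for _, role in approvals):
        return True
    collaborators = sum(1 for _, role in approvals if role == "collaborator")
    return collaborators >= QUORUM

print(is_accepted([("ana", "collaborator"), ("bob", "moderator")]))     # True
print(is_accepted([("ana", "collaborator"), ("cho", "collaborator")]))  # False
```

The quorum is the knob that trades moderator workload against the risk of collaborators waving through bad entries.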
How to move from one level to another: how many citations have they put in, and how often do they do this? how much have they tagged, rated, reviewed, translated or filtered?
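Those criteria amount to an activity profile per user. A toy promotion check along those lines (the thresholds are invented; the talk only named the signals, not the formula):

```python
# Toy promotion check based on the activity signals the talk mentions:
# citations contributed, recency of activity, and volume of curation work
# (tagging, rating, reviewing, translating, filtering). All thresholds
# here are invented for illustration.
from datetime import date, timedelta

def promote_to_collaborator(citations_added, last_active, curation_actions,
                            today=None):
    today = today or date.today()
    active_recently = (today - last_active) <= timedelta(days=30)
    return citations_added >= 10 and curation_actions >= 20 and active_recently

print(promote_to_collaborator(12, date.today(), 25))                        # True
print(promote_to_collaborator(12, date.today() - timedelta(days=90), 25))   # False
```

The recency test matters because, as noted above, one goal is to weed out members who go inactive, without a human having to notice.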
Initial evaluation of this project: the Cervantes Project built bibliographies both with the system and manually. The developed system behaved just about as well as the manually developed bibliographies. So yes, you can start to build social systems that behave about as well as manual ones. The catch is, you need to evaluate these systems, and that takes a lot of study.
The project has potential; more extensive evaluation is needed. There’s a need for new assessment metrics: do you compensate a user for finding bad entries? How would demoting people work, if there’s already promoting, and what effect would that have on the community? They also need to identify hidden spam, to get the statistics needed to automate filtering and to adapt existing work.