Tagged: institutional policies
Some more notes from SPARC Digital Repositories Meeting 2008. This post covers the New Horizons session. Here’s a description of the session from the program:
Early discussion of campus-based digital repositories focused on pre-print and post-print versions of faculty research papers. Many institutions have discovered strong community interest in disseminating other types of content as well – including audio, video, image research outputs, multimedia projects, and ancillary evidence such as datasets, etc. that might be created in the course of research and class work. This interest has been strengthened by requirements set by many federal agencies that data-sharing plans accompany grant applications. The New Horizons panel will explore the transformative potential of data-intensive scholarship as well as explore solutions for the depositing dilemma that redefine the repository within the library’s “story” and scope of services.
Session format was: three presentations, then a Q&A/discussion session, and all the while the internet’s a-Twitter.
Moderator’s opening note: university communication involves many different languages—humanities languages, sciences languages, administrator languages, faculty languages, etc etc. We need to speak all these languages as we build institutional repositories (IRs) and talk about sharing.
1. Sayeed Choudhury, Associate Dean for Library Digital Programs, The Johns Hopkins University
“A Data-centric View of the Academic Universe”
- “historical infrastructures…become ubiquitous, accessible, reliable, and transparent as they mature.” – CREW, Understanding Infrastructure
- “…they will do what we expect them to do and not do what we expect them not to do…” – Amy Friedlander, JEP Triple Helix
- Communities come together and build systems to solve their own problems. It’s really when these community systems come together that infrastructure emerges. (For example, regional vs. national railroads.)
Data are fuzzy and answers are approximate. This is a belief held both by scientists and by humanists! We don’t talk much about that similarity, which is (I imagine -vz) useful for interdisciplinary communication.
IRs are, or should be, nodes in a network, not networks unto themselves. (See also the previous post. This was a running theme of the conference.)
Remember the gopher protocol, way back when? File transfer and access, pre-web. The crucially good thing about openness of gopher: it made it easy to move its contents onto the web. So even if gopher isn’t used much anymore, its content isn’t lost but ported.
- Data are fundamentally different from collections.
- The scale and complexity of data mean that machines become necessary, with communities meeting higher-level knowledge-organization needs and machines meeting lower-level ones
- IRs are [only] the beginning of our journey
- Time to get off the e-horse!! (Sayeed is tired of e-publishing, e-formats, e-everything.) Let’s talk instead about the requirements of… faculty, administrators, everyone—and address those requirements.
A final quote: “The future is here. It’s just not widely distributed yet.” -William Gibson
2. Shawn Martin, Scholarly Communication Librarian, University of Pennsylvania
“Institutional Repository Personality Disorder: How Do We Cure It?”
Definition of personality disorder: “an enduring pattern of inner experience and behavior that deviates markedly from the expectations of the culture of the individual who exhibits it.” Talking to faculty about IRs is much like talking to them about cats. (The reaction is likely to be, “…huh?”)
Traditional arguments in favor of IRs: (1) archive forever! (2) marketing tools for departments to showcase research! (3) reclaiming intellectual property of the institution! And that’s all good and true. Here’s what they have at Penn: over 4,000 documents, 73 collections, 33,000 downloads per month. Sounds successful so far! So what now?
Now they provide services. These services include, stealthily, conversations with faculty. Some things they tell the Penn faculty: downloads are important; direct communication based on people finding your stuff is great; your Google rankings are important and directly influenced by having your stuff in an IR; a centralized place is easier to use than a department website. Most importantly: open access does not mean it isn’t peer reviewed.
All of the above are necessary to discuss with faculty, because they dispel some common myths about IRs. Hence the talk about curing the IR personality disorder. It is necessary to reframe our arguments, Shawn says. Faculty generally don’t see the benefits of open access to themselves, but they do see the opportunity of giving themselves higher profiles inside and outside Penn. They also do see the benefits of electronic publication and of including “non-traditional” materials in the repository (lecture series, proceedings, etc.).
What will a future IR look like? Well, we could look at IRs as the backbone of a new scholarly communication system. Backbones, however, aren’t necessarily what is most compelling to faculty. (A similar attitude for comparison: I don’t much care about electric grids, I just want the light to go on when I flip the switch.) Penn is seeing increased interest in SelectedWorks (a front-end, user-friendly tool for its IR), e-publishing possibilities, and “front-end” services. Though IRs may be an essential component, they’re not the selling point.
So, what is the selling point? These services offered by Penn:
- Getting your scholarship into Google;
- Creating your own website;
- Creating pretty online journals;
- Clearing copyright permissions;
- Uploading articles for you.
All of Penn’s services are “fringe” from most librarians’ perspectives, but to faculty they’re incredibly important. So we need to rethink how we sell IRs to faculty. Penn is trying to turn this framework around and make these services the “core” from a faculty perspective. They are:
- not advocating for either open or closed access;
- assessing scholarly needs and providing options;
- taking advantage of the greater dissemination allowed by open access;
- but conceding that closed access may provide prestige or tenure.
So at Penn they provide both closed and open access (closed-access journals get links, abstracts and other information in SelectedWorks). They are creating virtual collections of their faculty’s work and pushing it out onto the web. They also work with publishers to promote their university’s work that may be appearing in their journals. The repository folks do all this, not faculty.
3. Jennifer Campbell-Meier, Doctoral Student, University of Hawaii
“Storytelling and Institutional Repositories”
Jennifer has performed a comparative case study analysis of IR development at six institutions in the US and Canada. Many participants stated that they didn’t know how to respond when faculty members ask why they should submit materials to the IR. So Jennifer started thinking about storytelling. Stories can be springboards: they can act as visualization tools, contextualize change, and promote understanding.
Oddly, googling stories + libraries, we get many results—storytime, storytelling to dogs and so on—but not how to use storytelling in academic contexts. So Jennifer noted and recorded some opportunities for storytelling, and specific conversational triggers for them. Below are some examples.
Trigger: scholarly publishing. Story: the internet and scholarly publishing—IR as a tool for scholarly communication. Share stories with faculty about open access, etc.
Trigger: tenure. Story: IR benefits for faculty. Share stories with faculty and/or grad students about IR benefits to encourage use.
Trigger: grants. Story: faculty/library collaborations. Share stories about IR as a home for grant projects, a platform for research, an opportunity for collaboration.
Trigger: legislature. Story: showcasing what a college or university does. Share stories with administrators about the IR as a showcase for the scholarly output of the institution.
What hasn’t John Wilbanks done? Besides his current job running Science Commons at Creative Commons, Wilbanks is a research fellow at MIT and has worked at Harvard’s Berkman Center, as a legislative aide to a U.S. Representative, and in various capacities in the open access movement. His blog is Common Knowledge, part of ScienceBlogs.
At SPARC 2008, Wilbanks gave the opening keynote, and I couldn’t think of a better way to kick off a conference—thought-provoking, full of information and yet not so much that it bogs you down—just enough to get a lively conversation flowing. Below are some of my slightly episodic notes from the keynote. If I paraphrase (or quote) incorrectly, please point this out and I’ll be glad to change accordingly. Most of the below is either straight quotation (insofar as I could type fast enough while listening to him speak) or close paraphrase. My own inserted thoughts are italicized.
Keynote, John Wilbanks, Creative Commons and MIT.
Why is there a disconnect between planning to share and the actual sharing? Why aren’t individual repositories starting to federate into a network? (This kicked off a running theme of “an interoperable network of repositories is what we should be striving for; individual repositories themselves are a stepping stone toward that goal.”)
Disruptive services can’t be planned in advance; planned innovation tends to be incremental and slow… and not innovative. Disruptive processes on the network come from people hacking, not those planning to hack. (Related: process change comes more slowly than information product change.)
He seemed to say, it’s nice and all to plan repositories, but there’s something to be said for jumping in at the deep end. This was appropriate, I think, in the context of the conference: it was later counterbalanced by specific case studies. The implication seemed to be that, by the end of SPARC 2008, we all knew enough about what to do and what not to do to make some overall structural decisions and begin implementing.
Stable systems are resistant to change on multiple levels, with multiple fail-safes (redundancy). Pre-existing systems that have worked have blocks in place to prevent process disruption. Copyright locks the container of the facts in a scholarly work, even more so in a digital environment than on paper (digital environment more controllable). For example, many publishing contracts make it illegal to add hyperlinks to/from a given work—and this is technologically enforceable, as long as the work is hosted on a controllable server.
Copyright is being asserted on databases! But they’re often not creative works (for example, raw scientific data), and thus not subject to copyright. Nevertheless, copyright is asserted.
But data is integrated anyway, and we won’t be escaping from that. (Nor do/should we want to. Interoperability means dissemination of new knowledge means more new knowledge, sometimes in forms specifically enabled by the wide dissemination—think mashups.)
What do ideas addressed by Creative Commons (CC) mean in a world of integrated data?
There’s a tension between the demands of adding content and providing services. As an example, Wilbanks shows a Caveat Lector post in which Dorothea Salo describes changing a link in DSpace, which takes her an hour. And she’s no novice.
Reports from the front lines: building a commons is really, really hard. It takes dedicated, passionate people with strong points of view, who are willing to compromise on those points of view on a regular basis.
There are currently >1000 journals worldwide under a CC license. Individuals may use Scholars Copyright Integration, a single line of HTML code provided by CC, to add a standard copyright addendum to online work. But for privacy reasons, CC can’t keep data re: who uses it.
CC/Science Commons (SC) have been working not only with rights clearance (the easy part of copyright!) but also with database integration (databases integrated with each other, and into digital repositories). THAT’s the hard part. (Again, it’s all about interoperability, and the hard part seems, according to Wilbanks, to be worth investing a lot of effort in.) To this end, SC has written guidelines for writing database licenses.
There’s a real danger in using the law to achieve integrity, and citation, and playing fair. It’s more about norms. (That’s part of why the DMCA failed.)
The paper, or stand-alone database, as a container for information, is a bad metaphor. We are building a web for data—the “semantic web” (a better metaphor). Links help computers understand relationships between items (coffee –> coffee pot), but not between concepts (drinking coffee –> feeling awake). This is where semantic web tools come in.
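To make the item-versus-concept distinction concrete, here is a minimal sketch (mine, not from the talk) of the semantic-web idea of representing knowledge as subject–predicate–object triples; all names here (“drinking_coffee”, “causes”, and so on) are illustrative, and a real system would use RDF tooling rather than plain tuples.

```python
# A toy triple store: each entry is (subject, predicate, object).
# Plain hyperlinks only say "these two things are connected";
# triples also say *how* they are connected.
triples = [
    ("coffee", "brewed_in", "coffee_pot"),            # item -> item
    ("drinking_coffee", "causes", "feeling_awake"),   # concept -> concept
    ("feeling_awake", "enables", "late_night_cataloging"),
]

def related(subject, triples):
    """Return every (relation, object) pair explicitly stated for a subject."""
    return [(pred, obj) for subj, pred, obj in triples if subj == subject]

# A machine can now follow typed relations between concepts:
print(related("drinking_coffee", triples))  # [('causes', 'feeling_awake')]
print(related("feeling_awake", triples))    # [('enables', 'late_night_cataloging')]
```

The point of the sketch is only that the predicate carries the meaning a bare link lacks; that is the gap semantic web tools are meant to fill.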
Major complaint about the semantic web: too much front-loaded work. But maybe we’re too hung up on the labels—web 2.0, science 3.0—what about making Google work better, for instance? Think about it:
-Google finds stuff based on inbound links, and assigns relevance based on that.
-SC’s work on open source data integration yields a repository of ontologies, namespaces, and integrated databases. The goal of such data integration: e pluribus unum.
-We can transform complex queries into links! (Hello, SQL.) The links are ugly on the back, but the front end can be concise and pretty. And as long as our data is interoperable, we can affect Google’s search results in real and useful ways.
Two possible futures lie before us. Which will we choose: a network of repositories, or a bunch of islands? Push this further: what questions can only a network of populated repositories answer?
Hope: depositing data into an IR is not something a faculty committee mandates, but something [the benefits of which] the faculty member who shares gets. Mandates are great, payments [to authors, for depositing, when you can afford this] are better, but letting people who want to share outcompete people who don’t want to share is the best.
-Don’t wait. A lot of stuff needs to happen before these dreams become reality.
-Open access and IRs aren’t free as in speech, nor free as in beer, but free as in a puppy: I can give you a “free” pure-bred puppy, but you’ll be spending lots of money on that puppy for the next 15 years. (This conference’s attendance is encouraging evidence of key people being willing to invest in IRs.)
During the Q&A, someone asked: how do you talk to the faculty about the semantic web? Wilbanks said, you don’t. You talk to the people who care about the semantic web. To the faculty, you say “we’re providing a service that makes your materials more findable and more usable. All you have to do is provide us with materials and a hint or two about what they mean.”
In October of this year the UK-based Joint Information Systems Committee (JISC) released a final report resulting from a six-month digital preservation policy study they’d conducted earlier in the year. The report is available here, and appendices are here (both links lead to PDF files).
Although the study was performed in the UK and the report is chiefly aimed at UK audiences, the JISC investigators drew on an international set of data, and their findings will certainly be useful to large institutions outside the Isles.
For all that the report is long and detailed, the authors’ definition of digital preservation is impressively concise: “In contrast to printed materials, digital information will not survive and remain accessible by accident: it requires ongoing active management. [...] Digital preservation is the process of active management by which we ensure that a digital object will be accessible in the future.” (10)
I list some of their recommendations below, as thinking points that jumped out at me. (My present context: recent return from the SPARC Digital Repositories Meeting 2008 held in Baltimore last week, about which soon.) Much more information is available in the report itself. This is what the JISC investigators see as best practices for thinking through a digital preservation policy (DPP below) on an institutional level:
- Have a principles statement, and tie it in to the university’s stated overall aims
- Highlight connections between the DPP and other policies, practices, objectives that may be in place at the same institution; highlight also connections between the DPP and similar policies at other institutions
- Clearly state preservation objectives (archival requirements, long-term research prospects) and an intent to “deliver a reliable and authentic version to [the] user community” (19)
- Speak not only to preservation itself but also to user experience
- State explicitly which relevant governmental statutes the policy will adhere to (Freedom of Information Act, for example)
- Specify what kinds of materials will be preserved (can be presented in different groupings, for example organized by how complex preservation is for given objects, by formats, by priority)
- Specify transparency and accountability as goals, and provide venues for external entities to check on that
- Outline an implementation plan. This is possibly the most difficult step in the process, but crucial.
- The policy should be version controlled.
The report also addresses important topics like intellectual property, financial and staff responsibility, distributed services, standards compliance, auditing and risk assessment – the list goes on. Though the report is sixty pages long, it is an excellent source of information and a springboard for a detailed approach to creating – and most importantly implementing – a digital preservation policy.