Tagged: digital repositories
Here (PDF) is the document that describes BU’s current thinking on open access, and what we’re doing about it. Linked here is the version approved by the Faculty Council last September. We welcome and invite constructive feedback on this work in progress.
(Please note: I am personally involved in this project, but am acting more or less as the messenger. I am unlikely to be able to answer specific questions about policy; other people monitoring this post, however, may be more helpful. We’ll be discussing whatever comments we receive amongst ourselves, so if there is no overt reply, please be assured that your input is not only valued but actively being included in the process.)
We’re go! University Council Approves Open Access Plan, BU Today, 17 Feb 2009:
Boston University took a giant step towards greater access to academic scholarship and research on February 11, when the University Council voted to support an open access system that would make scholarly work of the faculty and staff available online to anyone, for free, as long as the authors are credited and the scholarship is not used for profit. […]
“This vote sends a very strong message of support for open and free exchange of scholarly work,” says [University Librarian Robert] Hudson. “Open access means that the results of research and scholarship can be made open and freely accessible to anyone. It really has increased the potential to showcase the research and scholarship of the University in ways that have not been evident to people.”
Of course, we’ll need to be implementing this, which is no trivial matter—just ask Dorothea Salo and the many, many other institutional repository managers out there. And BU is very aware of this:
“Open access will really highlight the tremendous productivity of our faculty,” says [MED professor of medicine Barbara] Millen. “Among the more important things needed to make it work is a collaboration between the libraries and our faculty to get their research onto the Web. It’s not an inconsequential task.”
Yep, they sure know it’s going to take a large amount of resources—and it looks like the university is willing to put in the effort to do this right. It’s a fantastic thing to be part of.
Any repository folk who happen to read this, please share your wisdom and the appropriate warnings. It’ll be a long (exciting!) haul.
Created by Jesse Dylan, this brief video provides an introduction to Creative Commons licensing. For more information, go to:
Jesse Dylan’s short video about Science Commons provides a good rationale for archiving scientific data in a location that makes it available to other scientists. It would be interesting to make similar arguments for other disciplines. Humanities? Social Sciences?
The program said this about the last three-paper session of the conference: “Digital repositories are part of a set of emerging publishing functions on many campuses. Campus publishing activities are becoming increasingly collaborative as libraries partner with departments, campus I.T., university presses, other campuses, and third-party organizations such as scholarly societies. The recent Ithaka-sponsored report on “University Publishing in a Digital Age” encourages the development of “university publishing strategies” for a more strategic approach to disseminating the work of faculty and staff, and the reinstatement of publishing as an activity that is mission-critical to the academic ambitions of every institution. Campuses that have not traditionally thought of themselves as “publishers” must grapple with the shifting definition of that term, while university presses and other dedicated publishers struggle to articulate the value of their traditional services in a tight marketplace. […]”
The session covered lessons learned in three very different higher-ed environments.
1. Rea Devakos, Coordinator of Scholarly Communication Initiatives, University of Toronto
“Building in Uncertain Times: News from the Great White North”
UT is running the ScholarsPortal project, which “allows you to query nearly 23 million references to scholarly journal articles from over 50 major index and abstract databases through a single search – it’s like a Google search of scholarly information sources.”
As you might expect, contributions of constituent databases to this portal, which covers all of Ontario, may vary greatly. Collaborating institutions are expected to differ in their areas of focus, history of tech use and platforms, but they should all be comfortable working remotely. Lean production environments are common, so information flows must be well managed. And yet the umbrella project must be structured loosely enough to accommodate the different contributors and their different levels of commitment.
2. Catherine Mitchell, Director, eScholarship Publishing Group, California Digital Library, University of California
“Let’s Stop Talking About Repositories: A Study in Perceived Use-Value, Communication and Publishing Services”
Let’s reframe the conversation away from repositories, and think about what makes an IR successful. Take the eScholarship repository: many deposits, MANY full-text downloads of content. So by those markers, it’s been successful. On the other hand, UC faculty system-wide generate ~26,000 publications a year, so by comparison, maybe the IR is not so successful.
Why the gap? Lack of visibility and lack of incentive.
Publishing needs at UC boil down to these broad categories:
- low budget journals;
- conference proposals and proceedings;
- working papers, previously published materials;
- disciplinary and departmental collections;
- faculty homepages, annual publication reports; and
- innovative digital scholarly publications.
So, maybe we should stop talking about the concept of repository and focus on publishing services provided, and their relevance to researchers. The IR deposit becomes a by-product of services rendered, rather than an end in itself.
So Mitchell’s publishing group conducted a marketing and outreach campaign, redesigned the access interface, and began collaborating more closely with UC Press. The marketing allowed them to build a network; they acquired an outreach and marketing coordinator and e-scholarship liaisons, and recruited local site administrators to help spread the word. They customized their message, responding to cultural differences among disciplines. They addressed the incentives and risks for ladder rank faculty at different career stages. And they also considered what Mitchell called unique challenges around interdisciplinary and disciplinary formation.
They paid attention to their communication strategy: clear messaging in the form of a logo, focus groups and training sessions; billing escholarship.org as a publishing, research and marketing platform. They formed strategic partnerships (such as the one with UC Press mentioned above), both to provide shared services and to acquire legitimacy by association. Most of all, they followed the pattern of “listen, talk, listen again.”
The interface redesign had very specific strategic goals itself: better contextualization and content aggregation; enhanced search functionality and results display; citation tools; and an emphasis on the services suite.
The collaboration with UC Press proved a large project in itself. There had already been a history of episodic, focused and opportunistic collaborative activities: escholarship editions, monographic series, the Mark Twain Project. But what about more sustained work?
In the strategic/business planning phase, it became clear that the two entities had different editorial interests and goals; different business models; different constituents; and not least, different cultures. So why force collaboration? Because, said Mitchell, the needs across UC encourage these particular separate organizations to work together. So they found a point of convergence: services. And they even found incentives for collaborating: UC Press gained associative legitimacy as a service provider (and thus opportunities for new business models were born); and the eScholarship office gained associative legitimacy as a publisher, as well as visibility.
Everybody wins. But oh, what a huge – and, it seems, gracefully executed – project!
3. Janet Sietmann, DigitalCommons Project Manager, and Teresa Fishel, Library Director, Macalester College [MN]
“Showcasing Student, Faculty, and Campus Publications; Promoting, Populating, and Publishing in a small liberal arts college IR”
Here, I’ll be honest, I gave my wrists a break and listened. If someone at BU is interested in what these folks said, please comment here; I’ll be happy to contact them and request a copy of their presentation, and summarize it on Digilib.
The second day of the SPARC Digital Repositories Meeting 2008 in Baltimore was no less exciting than the first, but it was shorter, and also contained less information immediately useful to us at BU. So I nursed my wrists, which had flared up with RSI for the first time in weeks (sign of a good, informative conference, no?) and took fewer notes.
Description from the program: “One of the challenges facing all repositories is the establishment of policies that positively affect the submission, accessibility, and re-use of materials. The wide spectrum of deposit mandates and recommendations currently in effect reflect the diverse nature of governmental and organizational funding objectives. This panel will provide three perspectives on these policies, representing current practices in Europe, Japan and the United States.”
1. David Prosser, Director, SPARC Europe
“Public Policy Drivers for Change in Europe”
The scholarly community, as Prosser sees it, is being affected by:
- the knowledge economy;
- accountability and assessment – value for money spent;
- e-science/e-research; and
- concerns regarding access to data and public sector information.
Measuring success can take many forms:
- impact in the relevant fields measured by number of citations;
- who is citing whom;
- number of downloads for each published item;
- patent registration; and
- rate of technology transfer.
The EU’s open access policies are still “young” and are continually being tested. It’s been accepted that some situations will require an embargo period before items are published in a freely accessible repository. This is considered a sub-optimal course of action, so the recommended embargo period is generally six months at most, with the ideal being zero: any embargo at all is a compromise, as far as open access advocates are concerned.
Prosser quoted Daniel Coit Gilman, the first president of Johns Hopkins University, as saying the following about the university press in 1878: “It is one of the noblest duties of a university to advance knowledge, and to diffuse it not merely among those who can attend the daily lectures–but far and wide.”
[VZ: To this I will add a quote from the Massachusetts Constitution, to which I was pointed recently:
Wisdom, and knowledge, as well as virtue, diffused generally among the body of the people, being necessary for the preservation of their rights and liberties; and as these depend on spreading the opportunities and advantages of education in the various parts of the country, and among the different orders of the people, it shall be the duty of legislatures and magistrates, in all future periods of this commonwealth, to cherish the interests of literature and the sciences, and all seminaries of them; especially the university at Cambridge, public schools and grammar schools in the towns; to encourage private societies and public institutions, rewards and immunities, for the promotion of agriculture, arts, sciences, commerce, trades, manufactures, and a natural history of the country…
Promotion, rather than hoarding. Preserving rights and liberties by disseminating knowledge. Clear enough.]
[The speaker who described the situation in Japan was too difficult for me to understand, alas, from the back of the room. Tried to find his slides, and failed.]
3. Bonnie Klein, Defense Technical Information Center, USA
“U.S. Federal Government Repositories & Public Access to Grant Research”
The U.S. is running some federal repositories: CENDI, science.gov, worldwidescience.org. All of these are concerned with federally funded grant research, and provide venues for disseminating publication requirements, as well as distribution of and access to research results. CENDI is an interagency working group of senior scientific and technical information (STI) managers from 13 U.S. federal agencies. WorldWideScience is more of a portal, and was launched in 2007.
In all, 26 government agencies fund over 1000 grant programs, information on all of which is available on grants.gov. The results of work funded by government grants often must be published and/or disseminated openly, unless they’re classified. They take many forms. Publications are the characteristic product (journal articles, peer-reviewed papers, books, dissertations, abstracts, interim and final tech reports). Other common products of federal-grant-funded work are websites, new networks and collaborations, technologies and techniques, inventions, patent applications, licenses, new equipment.
Klein listed some disadvantages of publishing results. I did not have time to write them down, but mostly they came down to information that needs to stay secret. There’s a slippery slope between classifying information for, say, security reasons and hoarding it, but that seems to be a problem inherent to knowledge work. I doubt there will ever come a day when we’ll have completely rigid classification criteria for knowledge, given that we keep coming up with new stuff. So we’ll just have to navigate situations as they come up.
This is the remainder of the notes from the first day of the SPARC Digital Repositories Meeting 2008. The Value-Added User Services session was moderated by Kathleen Shearer from the Canadian Association of Research Libraries.
The program description says: “Now that your digital repository is up and running, what’s next? The success of repositories will depend on the extent to which users value the services they offer. What types of services are being developed to take digital repositories beyond the static repository concept and make them more attractive for deposit, search, and reuse of the material? How can these services be created and maintained, and how can repository practitioners engage with service providers? This session will explore strategies for individual repositories, as well as national and international repository networks, to improve user experiences.”
This was the single most important theme of the conference for those of us in the planning stages of IR setup, so I took detailed notes that I’ll try to organize here.
1. Joan Giesecke, Dean of Libraries, and Paul Royster, Coordinator of Scholarly Communications, University of Nebraska-Lincoln
“Value-Adding Services Bundled through an Institutional Repository: A Successful Model”
A frequently encountered publishing philosophy is as follows: faculty publications that have commercial or market value should go to a university press or another publisher; whereas faculty (or student) research and manuscripts with little commercial market should be put into an IR. The difficult part is navigating the gray world between the extremes.
UNL’s Center for Digital Research in the Humanities (CDRH) offers not just IR services but full-fledged digital publication (see below), in addition to tools for text analysis.
They use CONTENTdm databases for images and video (good for architecture and art slides, campus museums, and the video archive for Nebraska’s educational public television).
They use, or at least are testing, an Encore search engine. The engine brings different databases together (there’s that interoperability; it’s a useful concept within an institution as well as across institutions). It harvests metadata from Dublin Core, TEI and EAD systems into the library’s MARC catalog, raising the visibility of diverse collections.
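To make the harvesting idea a little more concrete, here is a minimal sketch of a metadata crosswalk: taking a Dublin Core description and turning it into MARC-style fields. This is not how Encore works internally (nothing in the talk covered that). The mapping table is loosely modeled on the Library of Congress Dublin Core to MARC crosswalk and is an assumption for illustration, as are the function names and the sample record.

```python
# Illustrative crosswalk from unqualified Dublin Core elements to MARC
# (tag, subfield) pairs. Loosely modeled on the Library of Congress
# crosswalk; an assumption for illustration, not UNL's or Encore's setup.
DC_TO_MARC = {
    "title": ("245", "a"),
    "creator": ("720", "a"),
    "subject": ("653", "a"),
    "description": ("520", "a"),
    "publisher": ("260", "b"),
    "date": ("260", "c"),
    "identifier": ("856", "u"),  # assumes the identifier is a URL
    "rights": ("540", "a"),
}


def dc_to_marc(record):
    """Convert {element: [values]} into a sorted list of (tag, subfield, value)."""
    fields = []
    for element, values in record.items():
        if element not in DC_TO_MARC:
            continue  # unmapped elements are simply skipped in this sketch
        tag, code = DC_TO_MARC[element]
        for value in values:
            fields.append((tag, code, value))
    return sorted(fields)


def format_field(field):
    """Render one field in a human-readable 'TAG $code value' form."""
    tag, code, value = field
    return f"{tag} ${code} {value}"


# Hypothetical sample record, invented for this example.
sample = {
    "title": ["Hypothetical Field Notes"],
    "creator": ["Doe, Jane"],
    "subject": ["Ornithology", "Nebraska"],
    "coverage": ["Great Plains"],  # not in the mapping, so it is dropped
}

for field in dc_to_marc(sample):
    print(format_field(field))
```

A real harvester would also have to deal with TEI and EAD, which are full document formats rather than flat element lists, so the mapping step there is considerably more involved than a lookup table.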
UNL started their repository with the idea of self-archiving: “the articles will add themselves,” they thought (as many of us do). Well, that didn’t work: Paul Royster likened it to going fishing and expecting the fish to jump into the boat. OK, so model #2: Tom Sawyer whitewashes a fence. (Here’s where my notes are unclear. I believe they just did their best to make repositories sound cool: if you say everyone’s doing it, everyone will eventually be doing it. Whatever the approach, it didn’t work; and please, someone reading this, add to my shoddy description in the comments.)
Since neither of those methods produced results, and Dr. Royster had a lot of time on his hands (raucous laughter from the audience), they determined to make the faculty an offer they couldn’t refuse. They turned the do-it-yourself (DIY) model on its head, and came up with DIFM (do-it-for-me). Like mediated deposit, only more so. Here are the services CDRH offered:
- handling permissions (better and more consistent compliance with contracts!)
- scanning (that way there’s central control of quality, resolution, image type, OCR, etc)
- typesetting (they make articles look professional! This part—which sounds incredibly time-consuming—is, I believe, done by a student.)
- adding metadata (self-archiving authors sometimes fail to include things like abstracts, original publication citations, etc)
- uploading/posting materials to the IR (as Royster said, even a child can do it, but there’s not always a child available)
- usage reporting: “Your article has been downloaded N times” is very very valuable (though only if N>0!)
- promotion (solicit or place links from/to Wikipedia, Online Books Page out of Penn, WorldCat, subject or discipline-based websites)
A large part of how they’ve found their audience was described as hunting and gathering—a learned skill that they don’t necessarily teach in grad school! You find the published article first, THEN seek out the author, who is usually flattered and agrees to let you add it to the IR! Copyright allowing, of course.
The UNL repository managers also actively solicit and publish original materials, and this is among their most popular content. Open access dissertations are extremely popular; they are downloaded 60 times more often than restricted pay-access versions! Currently, however, UNL is gathering only about 20% of new dissertations. (The reasons for this were unclear to me.)
CDRH publishes book-length works that are otherwise unpublishable (too long, too narrow in topic, too etc.). Check out the online Dictionary of Invertebrate Zoology, which in print is over 380 pages long and sells for over $90! Or the beautifully illustrated Hopi Nation, which had been submitted to various presses over 25 years with no success. A multi-volume work of perceived limited interest with color plates—no press would touch it. They published it at UNL, and it was downloaded 523 times in the first five weeks. Awesome.
The possibilities there are many. Out of print books! Tractor testing (“we’re Nebraska, after all”), ornithology, whatever.
Royster explained exactly the resources they’ve dedicated to all this, but I did not catch the contents of that slide. I do know that CDRH consists of 9 staff members and 7 associated faculty members, and employs 10 students. However, the IR stuff is only a small part of what the Center does in terms of services and original research and development. So: impressive. (In fact, so impressive that people in the audience were slightly incredulous. Having been to UNL and visited the Center, I can attest that this is a bunch of extraordinarily well organized, smart folks whose productivity is truly impressive.)
The considerable benefits reaped by offering the services described above include:
- increased faculty participation;
- faster rate of content recruitment;
- greater degree of content usage;
- word-of-mouth recommendations by faculty;
- and, importantly, the library is where faculty come first for their electronic publication needs.
The other two papers in the session presented work being done in Europe and Japan. Most of this isn’t very relevant to us at BU at this stage, so I mainly listened. Only a few sparse notes from the rest of the session:
- Japanese institutions working with IRs have organized themselves into the DRF, or Digital Repository Federation.
- Usage leads to sustainability. When things are not used, they’re forgotten. (Seems obvious, yet the crowd’s reaction to this suggested that we tend to elide this.)
- Related: advocacy requires a fundamental usefulness to advocate for.
- The 2003 Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities is a document that defines what qualifies a repository contribution as open-access, and requests free and unrestricted access to sciences and human knowledge representation worldwide. Perhaps not so simple, but these principles are taken up by
- The DRIVER (Digital Repository Infrastructure Vision for European Research) Project. The project works toward open access to information through digital repository networks (emphasis mine -vz) in Europe and worldwide.
- To this end, DRIVER has developed D-NET, a “network evolution toolkit” available for download.
Some more notes from SPARC Digital Repositories Meeting 2008. This post covers the New Horizons session. Here’s a description of the session from the program:
Early discussion of campus-based digital repositories focused on pre-print and post-print versions of faculty research papers. Many institutions have discovered strong community interest in disseminating other types of content as well – including audio, video, image research outputs, multimedia projects, and ancillary evidence such as datasets, etc. that might be created in the course of research and class work. This interest has been strengthened by requirements set by many federal agencies that data-sharing plans accompany grant applications. The New Horizons panel will explore the transformative potential of data-intensive scholarship as well as explore solutions for the depositing dilemma that redefine the repository within the library’s “story” and scope of services.
Session format was: three presentations, then a Q&A/discussion session, and all the while the internet’s a-Twitter.
Moderator’s opening note: university communication involves many different languages—humanities languages, sciences languages, administrator languages, faculty languages, etc etc. We need to speak all these languages as we build institutional repositories (IRs) and talk about sharing.
1. Sayeed Choudhury, Associate Dean for Library Digital Programs, The Johns Hopkins University
“A Data-centric View of the Academic Universe”
– “historical infrastructures…become ubiquitous, accessible, reliable, and transparent as they mature.” – CREW, Understanding Infrastructure
– “…they will do what we expect them to do and not do what we expect them not to do…” – Amy Friedlander, JEP Triple Helix
– Communities come together and build systems to solve their own problems. It’s really when these community systems come together that infrastructure emerges. (For example, regional vs. national railroads.)
Data are fuzzy and answers are approximate. This is a belief held both by scientists and by humanists! We don’t talk much about that similarity, which is (I imagine -vz) useful for interdisciplinary communication.
IRs are, or should be, nodes in a network, not networks unto themselves. (See also the previous post. This was a running theme of the conference.)
Remember the gopher protocol, way back when? File transfer and access, pre-web. The crucially good thing about gopher’s openness: it made it easy to move its contents onto the web. So even if gopher isn’t used much anymore, its content hasn’t been lost, just ported.
– Data are fundamentally different from collections.
– The scale and complexity of data mean that machines become necessary, with communities meeting higher-level knowledge-organization needs and machines meeting lower-level ones
– IRs are [only] the beginning of our journey
– Time to get off the e-horse!! (Sayeed is tired of e-publishing, e-formats, e-everything.) Let’s talk instead about the requirements of… faculty, administrators, everyone, and address those requirements.
A final quote: “The future is here. It’s just not widely distributed yet.” –William Gibson
2. Shawn Martin, Scholarly Communication Librarian, University of Pennsylvania
“Institutional Repository Personality Disorder: How Do We Cure It?”
Definition of personality disorder: “an enduring pattern of inner experience and behavior that deviates markedly from the expectations of the culture of the individual who exhibits it.” Talking to faculty about IRs is much like talking to them about cats. (The reaction is likely to be, “…huh?”)
Traditional arguments in favor of IRs: (1) archive forever! (2) marketing tools for departments to showcase research! (3) reclaiming intellectual property of the institution! And that’s all good and true. Here’s what they have at Penn: over 4000 documents, 73 collections, 33000 downloads per month. Sounds successful up to now! So what now?
Now they provide services. These services include, stealthily, conversations with faculty. Some things they tell the Penn faculty: downloads are important; direct communication based on people finding your stuff is great; your Google rankings are important and directly influenced by having your stuff in an IR; a centralized place is easier to use than a department website. Most importantly: open access does not mean it isn’t peer reviewed.
All of the above are necessary to talk about with faculty, because they dispel some common myths about IRs. Hence, the talk about curing the IR personality disorder. It is necessary to reframe our arguments, Shawn says. Faculty generally don’t see the benefits of open access to them, but they do see the opportunity of giving themselves higher profiles inside and outside Penn. They also see the benefits of electronic publication and of including “non-traditional” materials in the repository (lecture series, proceedings, etc).
What will a future IR look like? Well, we could look at IRs as the backbone of a new scholarly communication system. Backbones, however, aren’t necessarily what is most compelling to faculty. (A similar attitude for comparison: I don’t much care about electric grids, I just want the light to go on when I flip the switch.) Penn is seeing increased interest in SelectedWorks (a front-end, user-friendly tool for its IR), e-publishing possibilities, and “front-end” services. Though IRs may be an essential component, they’re not the selling point.
So, what is the selling point? These services offered by Penn:
– Getting your scholarship into Google;
– Creating your own website;
– Creating pretty online journals;
– Clearing copyright permissions;
– Uploading articles for you.
All of Penn’s services are “fringe” from most librarians’ perspectives, but to faculty they’re incredibly important. So we need to rethink how we sell IRs to faculty. Penn is trying to turn this framework around and make these services the “core” from a faculty perspective. They are:
– not advocating for either open or closed access;
– assessing scholarly needs and providing options;
– taking advantage of the greater dissemination allowed by open access;
– but conceding that closed access may provide prestige or tenure.
So at Penn they provide both closed and open access (closed-access journals get links, abstracts and other information in SelectedWorks). They are creating virtual collections of their faculty’s work and pushing it out onto the web. They also work with publishers to promote their university’s work that may be appearing in their journals. The repository folks do all this, not faculty.
3. Jennifer Campbell-Meier, Doctoral Student, University of Hawaii
“Storytelling and Institutional Repositories”
Jennifer has performed a comparative case study analysis of IR development at six institutions in the US and Canada. Many participants stated that they didn’t know how to respond when faculty members asked why they should submit materials to the IR. So Jennifer started thinking about storytelling. Stories can be springboards: they can act as visualization tools, contextualize change, and promote understanding.
Oddly, googling stories + libraries, we get many results—storytime, storytelling to dogs and so on—but not how to use storytelling in academic contexts. So Jennifer noted and recorded some opportunities for storytelling, and specific conversational triggers for them. Below are some examples.
Trigger: scholarly publishing. Story: the internet and scholarly publishing—IR as a tool for scholarly communication. Share stories with faculty about open access, etc.
Trigger: tenure. Story: IR benefits for faculty. Share stories with faculty and/or grad students about IR benefits to encourage use.
Trigger: grants. Story: faculty/library collaborations. Share stories about IR as a home for grant projects, a platform for research, an opportunity for collaboration.
Trigger: legislature. Story: showcasing what a college or university does. Share stories with administrators about the IR as a showcase for the scholarly output of the institution.
What hasn’t John Wilbanks done? Besides his current job running Science Commons at Creative Commons, Wilbanks is a research fellow at MIT and has worked at Harvard’s Berkman Center, as a legislative aide to a U.S. Representative, and in various capacities in the open access movement. His blog is Common Knowledge, part of ScienceBlogs.
At SPARC 2008, Wilbanks gave the opening keynote, and I couldn’t think of a better way to kick off a conference—thought-provoking, full of information and yet not so much that it bogs you down—just enough to get a lively conversation flowing. Below are some of my slightly episodic notes from the keynote. If I paraphrase (or quote) incorrectly, please point this out and I’ll be glad to change accordingly. Most of the below is either straight quotation (insofar as I could type fast enough while listening to him speak) or close paraphrase. My own inserted thoughts are italicized.
Keynote, John Wilbanks, Creative Commons and MIT.
Why is there a disconnect between planning to share and the actual sharing? Why aren’t individual repositories starting to federate into a network? (This kicked off a running theme of “an interoperable network of repositories is what we should be striving for; individual repositories themselves are a stepping stone toward that goal.”)
Disruptive services can’t be planned in advance; planned innovation tends to be incremental and slow… and not innovative. Disruptive processes on the network come from people hacking, not those planning to hack. (Related: process change comes more slowly than information product change.)
He seemed to say, it’s nice and all to plan repositories, but there’s something to be said for jumping in at the deep end. This was appropriate, I think, in the context of the conference: it was later counterbalanced by specific case studies. The implication seemed to be that, by the end of SPARC 2008, we all knew enough about what to do and what not to do to make some overall structural decisions and begin implementing.
Stable systems are resistant to change on multiple levels, with multiple fail-safes (redundancy). Pre-existing systems that have worked have blocks in place to prevent process disruption. Copyright locks the container of the facts in a scholarly work, even more so in a digital environment than on paper (digital environment more controllable). For example, many publishing contracts make it illegal to add hyperlinks to/from a given work—and this is technologically enforceable, as long as the work is hosted on a controllable server.
Copyright is being asserted on databases! But they’re often not creative works (for example, raw scientific data), and thus not subject to copyright. Nevertheless, copyright is asserted.
But data is integrated anyway, and we won’t be escaping from that. (Nor do/should we want to. Interoperability means dissemination of new knowledge means more new knowledge, sometimes in forms specifically enabled by the wide dissemination—think mashups.)
What do ideas addressed by Creative Commons (CC) mean in a world of integrated data?
There’s a tension between the demands of adding content and providing services. As an example, Wilbanks showed a Caveat Lector post in which Dorothea Salo describes changing a link in DSpace, which took her an hour. And she’s no novice.
Reports from the front lines: building a commons is really, really hard. It takes dedicated, passionate people with strong points of view, who are willing to compromise on those points of view on a regular basis.
There are currently >1000 journals worldwide under a CC license. Individuals may use Scholars Copyright Integration, a single line of HTML code provided by CC, to add a standard copyright addendum to online work. But for privacy reasons, CC can’t keep data re: who uses it.
CC/Science Commons (SC) have been working not only with rights clearance (the easy part of copyright!) but also with database integration (databases integrated with each other, and into digital repositories). THAT’s the hard part. (Again, it’s all about interoperability, and the hard part seems, according to Wilbanks, to be worth investing a lot of effort in.) To this end, SC has written guidelines for writing database licenses.
There’s a real danger in using the law to achieve integrity, and citation, and playing fair. It’s more about norms. (That’s part of why the DMCA failed.)
The paper, or stand-alone database, as a container for information, is a bad metaphor. We are building a web for data—the “semantic web” (a better metaphor). Links help computers understand relationships between items (coffee –> coffee pot), but not between concepts (drinking coffee –> feeling awake). This is where semantic web tools come in.
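(A rough sketch of my own, not something Wilbanks showed: one common way semantic web tools express relationships is as subject/predicate/object "triples," where the predicate names what kind of relationship it is. The facts and predicate names below are made up for illustration, and real tools use standardized vocabularies rather than ad hoc strings.)

```python
# Each fact is a (subject, predicate, object) triple.
# The predicate is what a plain hyperlink lacks: it says HOW two things relate.
# All subjects, predicates, and objects here are hypothetical examples.
triples = [
    ("coffee", "is_brewed_in", "coffee pot"),
    ("drinking coffee", "causes", "feeling awake"),
    ("feeling awake", "helps_with", "taking conference notes"),
]


def objects_of(subject, predicate):
    """Return every object linked to `subject` by the named `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


print(objects_of("drinking coffee", "causes"))
```

Because the relationship is named, a program can ask "what does drinking coffee cause?" instead of just "what does this page link to?"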
Major complaint about the semantic web: too much front-loaded work. But maybe we’re too hung up on the labels—web 2.0, science 3.0—what about making Google work better, for instance? Think about it:
- Google finds stuff based on inbound links, and assigns relevance based on that.
- SC working on open source data integration yields a repository of ontologies, namespaces, and integrated databases. The goal of such data integration: e pluribus unum.
- We can transform complex queries into links! (Hello, SQL.) The links are ugly on the back, but the front end can be concise and pretty. And as long as our data is interoperable, we can affect Google’s search results in real and useful ways.
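(To make the "queries into links" idea concrete, here is a minimal sketch of my own, not anything shown in the keynote. It assumes a hypothetical repository search endpoint that accepts a query in a `q` parameter; the URL, the parameter name, and the table and column names in the query are all invented for illustration.)

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint: an assumption for this sketch, not a real service.
ENDPOINT = "https://repository.example.edu/search"


def query_to_link(query: str) -> str:
    """Pack a whole query into a single, shareable URL."""
    return f"{ENDPOINT}?{urlencode({'q': query})}"


def link_to_query(link: str) -> str:
    """Recover the original query text from such a URL."""
    return parse_qs(urlsplit(link).query)["q"][0]


# An invented SQL-style query; the schema is hypothetical.
query = "SELECT title FROM articles WHERE subject = 'open access' AND year >= 2008"
link = query_to_link(query)

print(link)                 # the ugly back end
print(link_to_query(link))  # the same query, recovered intact
```

The encoded link is the "ugly on the back" part; a front end only needs to show a short label and keep the long URL behind it.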
Two possible futures lie before us. Which will we choose: a network of repositories, or a bunch of islands? Push this further: what questions can only a network of populated repositories answer?
Hope: depositing data into an IR is not something a faculty committee mandates, but something [the benefits of which] the faculty member who shares gets. Mandates are great, payments [to authors, for depositing, when you can afford this] are better, but letting people who want to share outcompete people who don’t want to share is the best.
- Don’t wait. A lot of stuff needs to happen before these dreams become reality.
- Open access and IRs aren’t free as in speech, nor free as in beer, but free as in a puppy: I can give you a “free” pure-bred puppy, but you’ll be spending lots of money on that puppy for the next 15 years. (This conference’s attendance is encouraging evidence of key people being willing to invest in IRs.)
During the Q&A, someone asked: how do you talk to the faculty about the semantic web? Wilbanks said, you don’t. You talk to the people who care about the semantic web. To the faculty, you say “we’re providing a service that makes your materials more findable and more usable. All you have to do is provide us with materials and a hint or two about what they mean.”
(A note on terminology: “digital repository” and “institutional repository” aren’t the same thing, according to some; nevertheless, the two terms were used more or less interchangeably at this conference. Per convention around BU, I’ll use IR, for “institutional repository,” to refer to both below.)
I took so many notes that my wrists hurt from typing for the first time in months, so I thought I’d make an initial summary post and break it all down in later posts.
In attendance were, per SPARC’s website, “librarians, researchers, funders, administrators, government officials, publishers, and technologists from around the world.” In this case, “around the world” meant primarily North America, Europe and Japan. The crowd truly was that diverse, and conversation flowed both in traditional conference ways and on Twitter. This was the first professional event at which I saw Twitter – or any public, almost-real-time communication venue – widely used, and I have to say, it was quite handy for recording fleeting thoughts, and for gauging the interests of the crowd (at least the part of the crowd that was participating in that venue).
Discussion topics ranged widely, from the nuts and bolts (and fascinating new applications!) that help people run successful repositories to marketing participation in open access to faculty, to drumming up funding and figuring out exactly what kind of resources it takes to run an IR.
Some take-away lessons:
- It takes more resources than you think to set up and effectively run an IR. Seriously.
- Getting faculty to participate by depositing their work into a repository: really, really important. (Without content, what good is an IR?) Getting students to participate: even more important. Students are the faculty of the future, and they should be involved with open access before they plunge into the whirlwind life of a newly-minted faculty member.
- We need to ensure future interoperability of IRs with each other. That’s where a knowledge infrastructure will emerge, not within an individual IR.
- Just having an IR is not enough; value-added services are crucial. This deserves a post all its own, but one example of a value-added service is helping authors with copyright clearance.
- Faculty liaisons, champions and early adopters of the open access movement, are indispensable when seeking out content.
- Clearly outlined policies help everyone.
A lot to think on.