
DH09 Wednesday, session 1: managing information

June 24th, 2009 in DigiLib BLog, Preservation

First up, Melissa Terras of University College London. “Digital Curiosities: Resource Creation Via Amateur Digitization”

Melissa has spent a lot of time studying images, and in most cases was studying images in/from institutions. But what about collections (of all sorts, not just images) created by people who aren’t affiliated with institutions? They’re actually quite interesting, and Melissa studied them using the following methods:



SPARC 2008: John Wilbanks' keynote

December 1st, 2008 in DigiLib BLog

What hasn’t John Wilbanks done? Besides his current job running Science Commons at Creative Commons, Wilbanks is a research fellow at MIT and has worked at Harvard’s Berkman Center, as a legislative aide to a U.S. Representative, and in various capacities in the open access movement. His blog is Common Knowledge, part of ScienceBlogs.

At SPARC 2008, Wilbanks gave the opening keynote, and I couldn’t think of a better way to kick off a conference—thought-provoking, full of information and yet not so much that it bogs you down—just enough to get a lively conversation flowing. Below are some of my slightly episodic notes from the keynote. If I paraphrase (or quote) incorrectly, please point this out and I’ll be glad to change accordingly. Most of the below is either straight quotation (insofar as I could type fast enough while listening to him speak) or close paraphrase. My own inserted thoughts are italicized.

Keynote, John Wilbanks, Creative Commons and MIT.

Why is there a disconnect between planning to share and the actual sharing? Why aren’t individual repositories starting to federate into a network? (This kicked off a running theme of “an interoperable network of repositories is what we should be striving for; individual repositories themselves are a stepping stone toward that goal.”)

Disruptive services can’t be planned in advance; planned innovation tends to be incremental and slow… and not innovative. Disruptive processes on the network come from people hacking, not those planning to hack. (Related: process change comes more slowly than information product change.)

He seemed to say, it’s nice and all to plan repositories, but there’s something to be said for jumping in at the deep end. This was appropriate, I think, in the context of the conference: it was later counterbalanced by specific case studies. The implication seemed to be that, by the end of SPARC 2008, we all knew enough about what to do and what not to do to make some overall structural decisions and begin implementing.

Stable systems are resistant to change on multiple levels, with multiple fail-safes (redundancy). Pre-existing systems that have worked have blocks in place to prevent process disruption. Copyright locks the container of the facts in a scholarly work, even more so in a digital environment than on paper (a digital environment is more controllable). For example, many publishing contracts make it illegal to add hyperlinks to/from a given work, and this is technologically enforceable as long as the work is hosted on a controllable server.

Copyright is being asserted on databases! But they’re often not creative works (for example, raw scientific data), and thus not subject to copyright. Nevertheless, copyright is asserted.

But data is integrated anyway, and we won’t be escaping from that. (Nor do/should we want to. Interoperability means dissemination of new knowledge means more new knowledge, sometimes in forms specifically enabled by the wide dissemination—think mashups.)

What do ideas addressed by Creative Commons (CC) mean in a world of integrated data?

There’s a tension between the demands of adding content and providing services. As an example, Wilbanks shows a Caveat Lector post in which Dorothea Salo describes changing a link in DSpace, which takes her an hour. And she’s no novice.

Reports from the front lines: building a commons is really, really hard. It takes dedicated, passionate people with strong points of view, who are willing to compromise on those points of view on a regular basis.

There are currently >1000 journals worldwide under a CC license. Individuals may use Scholars Copyright Integration, a single line of HTML code provided by CC, to add a standard copyright addendum to online work. But for privacy reasons, CC can’t keep data re: who uses it.

CC/Science Commons (SC) have been working not only with rights clearance (the easy part of copyright!) but also with database integration (databases integrated with each other, and into digital repositories). THAT’s the hard part. (Again, it’s all about interoperability, and the hard part seems, according to Wilbanks, to be worth investing a lot of effort in.) To this end, SC has written guidelines for writing database licenses.

There’s a real danger in using the law to achieve integrity, and citation, and playing fair. It’s more about norms. (That’s part of why the DMCA failed.)

The paper, or stand-alone database, as a container for information, is a bad metaphor. We are building a web for data—the “semantic web” (a better metaphor). Links help computers understand relationships between items (coffee –> coffee pot), but not between concepts (drinking coffee –> feeling awake). This is where semantic web tools come in.
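To make the item-versus-concept distinction a little more concrete, here is a minimal sketch in Python. It is my own toy illustration, not something Wilbanks showed: the relationship names (`is_brewed_in`, `causes`) and the helper `objects_of` are hypothetical, and a real semantic web tool would use a standard vocabulary rather than ad hoc strings.

```python
# Toy illustration: a plain hyperlink only says "A points to B",
# while a semantic-web-style triple also names the relationship.
# All relationship names here are hypothetical, not a real ontology.

# A bare link: the computer knows the two items are connected, not how.
plain_links = [("coffee", "coffee pot")]

# Triples of (subject, relationship, object) make the connection explicit.
triples = [
    ("coffee", "is_brewed_in", "coffee pot"),
    ("drinking coffee", "causes", "feeling awake"),
]


def objects_of(subject, relationship, store):
    """Return every object tied to `subject` by `relationship` in `store`."""
    return [o for s, r, o in store if s == subject and r == relationship]


# With named relationships, "what does drinking coffee cause?" is answerable.
effects = objects_of("drinking coffee", "causes", triples)
print(effects)
```

The bare link list cannot answer the same question, because nothing in it says what kind of connection exists between the two items.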

Major complaint about the semantic web: too much front-loaded work. But maybe we’re too hung up on the labels—web 2.0, science 3.0—what about making Google work better, for instance? Think about it:

-Google finds stuff based on inbound links, and assigns relevance based on that.
-SC working on open source data integration yields a repository of ontologies, namespaces, and integrated databases. The goal of such data integration: e pluribus unum.
-We can transform complex queries into links! (Hello, SQL.) The links are ugly on the back end, but the front end can be concise and pretty. And as long as our data is interoperable, we can affect Google’s search results in real and useful ways.
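The idea of turning a query into a link can be sketched roughly as follows. This is only an illustration under my own assumptions: the endpoint URL, the `q` parameter name, and the sample query are hypothetical and do not describe any actual Science Commons service.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical query endpoint; not a real Science Commons URL.
BASE_URL = "https://repository.example.org/query"


def query_to_link(query: str) -> str:
    """Pack a query string into a URL so it can be shared like any link."""
    return BASE_URL + "?" + urlencode({"q": query})


def link_to_query(link: str) -> str:
    """Recover the original query from a link built by query_to_link."""
    return parse_qs(urlsplit(link).query)["q"][0]


# An invented example query against an invented table.
query = "SELECT gene, disease FROM associations WHERE organism = 'human'"
link = query_to_link(query)

print(link)                 # the "ugly" form that lives behind the scenes
print(link_to_query(link))  # the same query, recovered intact
```

A front end could hide the encoded URL behind a short, readable anchor text, which is the "concise and pretty" part; since the link is an ordinary URL, it can also be crawled and linked to like any other page.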

Two possible futures lie before us. Which will we choose: a network of repositories, or a bunch of islands? Push this further: what questions can only a network of populated repositories answer?

Hope: depositing data into an IR is not something a faculty committee mandates, but something [the benefits of which] the faculty member who shares gets. Mandates are great, payments [to authors, for depositing, when you can afford this] are better, but letting people who want to share outcompete people who don’t want to share is the best.


-Don’t wait. A lot of stuff needs to happen before these dreams become reality.
-Open access and IRs aren’t free as in speech, nor free as in beer, but free as in a puppy: I can give you a “free” pure-bred puppy, but you’ll be spending lots of money on that puppy for the next 15 years. (This conference’s attendance is encouraging evidence of key people being willing to invest in IRs.)

During the Q&A, someone asked: how do you talk to the faculty about the semantic web? Wilbanks said, you don’t. You talk to the people who care about the semantic web. To the faculty, you say “we’re providing a service that makes your materials more findable and more usable. All you have to do is provide us with materials and a hint or two about what they mean.”
