The Evolutionary Subject Tagging in the Humanities project team has published the project’s culminating white paper, Evolutionary Subject Tagging in the Humanities: Supporting Discovery and Examination in Digital Cultural Landscapes.
The research interest in “evolutionary subject tagging in humanities research” grew out of both an appreciation for the value of subject classification in organizing and discovering information and frustration with the limitations of currently available systems. Subject terms highlight relationships between information objects and focus attention on them, particularly within academic disciplines. However, the same terms can hide and blur relationships when a researcher tries to bridge multiple disciplines.
Driven by a desire to help humanities scholars discover and examine information more easily, the team began by articulating the problem and exploring how we might fix it. Would additional subject tags improve discoverability? Would layering subject terms from multiple disciplines help? Is there a way to merge them? Is translation between disciplinary thesauri required? If more subject terms are required, could we develop a scalable (sustainable) model for providing them? Would any of these efforts to improve the discoverability of humanities texts actually make those texts easier to examine?
Repeatedly throughout the project, team members found themselves challenging one another’s most basic assumptions about subject classification and about the use of subject terms to discover and examine information objects. Those conflicting opinions created a creative tension out of which the project and this paper emerged. They continue to engage the team and to shape the Libraries’ exploration of how to improve the discovery and examination of texts for humanities research.
In early November, Jack Ammerman and Vika Zafrin attended the Digital Library Federation Fall Forum 2010 in Palo Alto, California. While there, we led a working session on the subject of our grant project: evolutionary subject tagging in humanities research. We are grateful for the feedback the session participants provided. Below is a summary of references and questions we’ll need to consider. If you have any input on these, please email us!
Topics and questions to consider:
- Interaction between social tagging and “official” cataloging
- What’s the test/first corpus to tag?
- how to pick it while remaining as non-disciplinary as we can? (see latent semantic analysis below)
- Seeing subjects along with examples of those subjects within a structure can enable users to learn a taxonomy as they use it.
- How do disciplinary portals and sites describe, classify, and categorize information?
- Computational linguistics
Approaches to consider:
- Reverse-engineer bibliographies from citations?
- Look at latent semantic analysis (LSA)
- Do we want to measure the relatedness of objects, or to build a thesaurus?
- Because LSA does not depend on matching particular term strings, it might work for languages other than English
- Trusting mathematical models vs. trusting catalogers’ (or anyone else’s) point of view
- A strategy that limits us to English and doesn’t scale may be the wrong road.
- Relevance: exploring items related to other items, or along a taxonomy map?
- We need to pull together conceptually related terms while keeping distinct senses apart (“soul” in philosophy and “soul” in theology are different)
- Wikipedia disambiguation model?
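As a rough illustration of what latent semantic analysis does, here is a minimal Python sketch, not the project’s method. It builds a term-by-document count matrix, keeps the top k singular directions, and compares documents by cosine similarity in that reduced space. Assumptions, all ours: whitespace tokenization, raw counts with no tf-idf weighting, and a pure-Python power iteration on AᵀA standing in for a library SVD so the example has no dependencies. The function names and the four-document toy corpus are hypothetical.

```python
import math
import random
from collections import Counter


def term_document_matrix(docs):
    """Build a terms x documents count matrix (assumes whitespace tokenization)."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    counts = [Counter(d.lower().split()) for d in docs]
    return vocab, [[c[t] for c in counts] for t in vocab]


def top_eigenpairs(m, k, iters=1000, seed=0):
    """Top-k eigenpairs of a symmetric PSD matrix via power iteration + deflation."""
    n = len(m)
    m = [row[:] for row in m]  # work on a copy; deflation mutates it
    rng = random.Random(seed)
    pairs = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(n)]
        lam = 0.0
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                return pairs  # remaining spectrum is (numerically) zero
            v = [x / norm for x in w]
            lam = norm  # |M v| converges to the dominant eigenvalue
        pairs.append((lam, v))
        # Deflate: subtract the found component so the next pass finds the next one.
        for i in range(n):
            for j in range(n):
                m[i][j] -= lam * v[i] * v[j]
    return pairs


def lsa_document_vectors(docs, k=2):
    """Project documents into a k-dimensional latent space (rows of V_k * Sigma_k)."""
    _, a = term_document_matrix(docs)
    n_docs = len(docs)
    # A^T A holds document-by-document dot products; its eigenvectors are the
    # right singular vectors of A and its eigenvalues are the squared singular values.
    ata = [[sum(row[i] * row[j] for row in a) for j in range(n_docs)]
           for i in range(n_docs)]
    pairs = top_eigenpairs(ata, k)
    return [[math.sqrt(lam) * v[d] for lam, v in pairs] for d in range(n_docs)]


def cosine(x, y):
    dot = sum(p * q for p, q in zip(x, y))
    nx = math.sqrt(sum(p * p for p in x))
    ny = math.sqrt(sum(q * q for q in y))
    return dot / (nx * ny) if nx and ny else 0.0


# Hypothetical toy corpus: two "philosophy" and two "theology" documents.
docs = [
    "soul mind philosophy",
    "mind philosophy reason",
    "soul god theology",
    "god theology church",
]
vecs = lsa_document_vectors(docs, k=2)
print(cosine(vecs[0], vecs[1]))  # high: shared philosophical vocabulary
print(cosine(vecs[0], vecs[3]))  # much lower, though not zero
```

Documents 0 and 3 share no words, so their raw count vectors are orthogonal. In the reduced space they still pick up a small similarity through the chain of shared vocabulary (“soul” links documents 0 and 2, and “god theology” links 2 and 3). This is the kind of relatedness of objects LSA offers without a predefined vocabulary. The term-side factors (U_k Σ_k) would give the analogous relatedness between terms. A real corpus would call for a library SVD (for example `numpy.linalg.svd`) and some term weighting.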
Two models for how we might proceed: natural-language and mathematical.
We need to clearly articulate those models, and what we see to be the strengths and weaknesses of each model.
Engage consultants in that thinking: Are we missing something here? Is there a hybrid of these two models? Should we imagine moving forward with both at the same time? For different target audiences? As two projects? (What if we could compare the results of both on the same corpus?) If we move ahead with either approach, who needs to be involved? What barriers would each approach face? Which target audience would be best served by which approach?
Vika Zafrin and Jack Ammerman attended the 2010 Digital Humanities Start-Up Grants Project Directors Meeting on September 28. Below is the presentation that Vika made during the “Lightning Round” of project presentations. We were limited to three slides and two minutes:
Twitter Hashtag: #SUG2010