Linked Data for Cultural Heritage (An ALCTS Monograph). Eds. Ed Jones and Michele Seikel. Chicago: ALA Editions, 2016. 134p. $75.00 softcover (ISBN 978-0-8389-1439-7).
While linked data has been on the horizon for librarians, archivists, and other curators of cultural memory nearly since it was first expounded fifteen years ago, for many it has remained an abstraction.1 Jones and Seikel present six contributions by those engaged in implementing linked data projects across the cultural heritage landscape, seeking to bridge the gap between the idea of linked data and concrete applications that can be adopted at a local level. The focus is not on the technology of linked data, though each of the chapters discusses some technical issues relevant to the projects, but rather on how the technology can overcome the limits of earlier cultural metadata encoding systems (e.g., MARC) and what new challenges and opportunities it presents. By presenting studies of real-world implementations of linked data, this volume effectively communicates the progress made and gives a sense of what the technology could do for a local collection.
The collection is not a primer on linked data, a technical manual, or a guide to implementation, but each contribution does discuss some technical aspects. The introduction provides a brief overview of the basic structure of linked data, and individual chapters develop particular issues relevant to the projects described. These descriptions of the structure and syntax of linked data are sufficient to follow how the projects used them, but readers without previous familiarity with the topic may wish to review an introduction to linked data, such as Weese and Segal.2 Similarly, while the synopses of the individual projects discuss challenges met, the goal of the work is not to provide a roadmap for exposing local data as linked data, such as is provided by Hyvönen or van Hooland and Verborgh.3 Rather, the intent is to highlight the potentials and challenges of linked data for cultural memory institutions in their current historical moment, updating and expanding the brief Mitchell (2013), and complementing the even briefer Mitchell (2016).4
The challenges of converting existing data into linked data emerge as a common theme among the various projects. Taken as a whole, the volume shows that a number of tools are emerging that can help convert datasets but that, at present, human intervention is still needed, particularly where data in the originating record are ambiguous or the structure of the target linked dataset requires higher granularity. For example, Thorsen and Pattuelli, in describing their Linked Jazz program, note the development of a transcript analyzer that was used to process interview transcripts, find personal names, and generate triples with the predicate rel:knowsOf. The software could not assign more specific relationships, so the data was crowdsourced to refine those predicates to the likes of rel:collaboratedWith or rel:influencedBy. Godby, in describing OCLC’s testing of the conversion of MARC bibliographic records to linked data, notes that while published monographs could be converted with minimal intervention, more complex works (her example was a video of a live performance of Tchaikovsky’s ballet The Nutcracker, based on the tale by E. T. A. Hoffmann) required substantial intervention, e.g., disambiguating whether a personal name in a 700 field relates to the video, the performance, the ballet, or the tale.
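The two-stage workflow described for Linked Jazz can be pictured with a minimal sketch. It is not the project's actual software: the name list, the simple substring matching, the placeholder names, and the functions generate_knows_of and refine are all assumptions made for illustration. Only the predicates rel:knowsOf and rel:collaboratedWith come from the chapter as reported above.

```python
# Hypothetical list of known names; the review does not say how the real
# transcript analyzer recognizes people, so plain substring matching is assumed.
KNOWN_NAMES = ["Musician A", "Musician B", "Musician C"]


def generate_knows_of(interviewee, transcript, known_names=KNOWN_NAMES):
    """Stage 1 (automatic): emit a generic rel:knowsOf triple for every
    known name, other than the interviewee, that appears in the transcript."""
    triples = []
    for name in known_names:
        if name != interviewee and name in transcript:
            triples.append((interviewee, "rel:knowsOf", name))
    return triples


def refine(triples, judgments):
    """Stage 2 (human): replace the generic predicate wherever a reviewer
    supplied a more specific one; leave all other triples unchanged."""
    return [(s, judgments.get((s, o), p), o) for s, p, o in triples]


# Invented transcript text and an invented reviewer judgment.
transcript = "I first heard Musician B play uptown, and we later recorded together."
triples = generate_knows_of("Musician A", transcript)
refined = refine(triples, {("Musician A", "Musician B"): "rel:collaboratedWith"})

for triple in refined:
    print(triple)
```

The split mirrors the point made in the chapter: the automatic pass can establish only that one person mentions another, while the more specific predicate has to come from a human judgment.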
The need for controlled vocabularies appears as another key theme among the different projects. Contrary to earlier expectations that a kind of invisible hand would guide the selection of usable vocabularies in a free-web environment, the contributors share the position that carefully created and maintained vocabularies are necessary to connect local metadata with the larger linked data environment, which is one of the main reasons cultural memory institutions would convert their data to linked data in the first place (33–34). Authority control is the focus of O’Dell’s chapter, where she takes the perspective that, since authority control is a mature practice within librarianship, the creation, use, and maintenance of controlled vocabularies is an area where libraries are in a position to make a substantive contribution to the linked data community. Huerga and Lauruhn approach the need for authority control from the perspective of science, technology, and medicine (STM), particularly in view of a changing landscape where research data is increasingly openly available and pressure is growing for STM research to be reproducible. In particular, since several STM vocabularies are already available for linked data, and more are likely to be available soon, they point to the need for metadata specialists to select and apply appropriate vocabularies for local data and to map equivalencies and near-equivalencies of terms between different vocabularies.
The final two chapters share a concern for, among other things, how linked data representations of bibliographic entities can accommodate the Functional Requirements for Bibliographic Records (FRBR) work/expression/manifestation/item model. Godby, reporting on OCLC’s linked data conversion project, describes a working model for distinguishing works from manifestations by clustering records with (near-) identical 1xx and 245 fields. The cluster represents the work and is assigned appropriate relationships from the individual records, such as schema:about or schema:genre; members of the cluster are assigned the relationship schema:exampleOfWork, which suffices to identify them as manifestations; a relationship of schema:translationOfWork, derived from 041 and 240 fields, is sufficient to identify an expression; and so forth. McCallum, reporting on the development of the Bibliographic Framework Initiative (BIBFRAME) at the Library of Congress, compares the BIBFRAME model of work/instance/item with the FRBR model and notes the resulting issues, for example, that every BIBFRAME instance must have a relationship with a BIBFRAME work, whereas in data created in the MARC environment, work entities (i.e., authority files) were created only under certain conditions.
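The clustering idea reported for the OCLC project can be illustrated with a short sketch. It is an assumption-laden simplification, not OCLC's method: records are plain dictionaries with invented values, "near-identical" is reduced to an exact match after normalizing case and punctuation, and the identifiers rec1 and work:1 and the functions normalize, cluster_works, and to_triples are hypothetical. Only the field tags and schema:exampleOfWork come from the chapter as summarized above.

```python
import re
from collections import defaultdict


def normalize(value):
    """Lowercase, replace punctuation with spaces, and collapse whitespace so
    that trivially different headings compare as equal."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", value.lower()).split())


def cluster_works(records):
    """Group record ids by their normalized 1xx (main entry) and 245 (title)
    values; each resulting group stands for one work."""
    clusters = defaultdict(list)
    for rec in records:
        key = (normalize(rec["1xx"]), normalize(rec["245"]))
        clusters[key].append(rec["id"])
    return clusters


def to_triples(clusters):
    """Assign each cluster a hypothetical work identifier and link every
    member record to it with schema:exampleOfWork."""
    triples = []
    for n, (_key, ids) in enumerate(sorted(clusters.items()), start=1):
        work = f"work:{n}"
        for rec_id in ids:
            triples.append((rec_id, "schema:exampleOfWork", work))
    return triples


# Invented sample records, not real bibliographic data.
records = [
    {"id": "rec1", "1xx": "Example, Author.", "245": "A sample title /"},
    {"id": "rec2", "1xx": "Example, Author", "245": "A Sample Title"},
    {"id": "rec3", "1xx": "Other, Writer", "245": "Different title"},
]

triples = to_triples(cluster_works(records))
for triple in triples:
    print(triple)
```

A record whose fields differ in more than punctuation or case would land in its own cluster here, which is the kind of situation where the human intervention described earlier in the volume would still be needed.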
Altogether, the volume makes an important contribution to the literature on linked data applications for cultural memory institutions. Anyone considering a project to convert their local metadata to linked data will find current perspectives on such questions as what linked data can or cannot (yet) do, what kinds of tools exist to assist the conversion, what level of human intervention will be needed, why controlled vocabularies are needed, and how they can be found and selected. Not all the answers lie within its pages, but readers will be better able to understand the scope of their anticipated projects and predict challenges that are likely to arise.—Paul Ojennus (pojennus@whitworth.edu), Whitworth University, Spokane, Washington
References
1. Tim Berners-Lee, James Hendler, and Ora Lassila, “The Semantic Web,” Scientific American 284, no. 5 (2001): 34–43.
2. Keith P. Weese and Dan Segal, Libraries and the Semantic Web (San Rafael, CA: Morgan & Claypool, 2015).
3. Eero Hyvönen, Publishing and Using Cultural Heritage Linked Data on the Semantic Web (San Rafael, CA: Morgan & Claypool, 2012); Seth van Hooland and Ruben Verborgh, Linked Data for Libraries, Archives and Museums: How to Clean, Link and Publish Your Metadata (Chicago: Neal-Schuman, 2014).
4. Erik Mitchell, Library Linked Data: Research and Adoption (Chicago: ALA TechSource, 2013); Erik Mitchell, Library Linked Data: Early Activity and Development (Chicago: ALA TechSource, 2016).