User Tagging Behaviors in an OPAC
An Analysis of Seven Years of I-Share User Tags
User tagging services are underused in cultural heritage institutions despite their availability for over a decade. This study considers seven years of user tags from university and public institutions by comparing tagging service usage between institution types and qualitatively analyzing a selection of tags from the University of Illinois. Researchers found that overall, few users tag items in online catalogs, but the tags that are created are largely descriptive in nature, indicating the potential to improve discoverability for underdescribed materials, such as records that lack subject headings. With improved education on their use and purpose, tagging and annotation services can become important resources for cultural heritage institutions.
Providing discovery and access is at the heart of the library’s daily functions and depends largely on discovery systems, including the online public access catalog (OPAC), and on metadata, notably MARC records. As technologies advance, new and innovative opportunities arise to enhance access and discovery layers, and libraries have diligently experimented with them to adapt to some of these changes. One example is the user tagging service, a function that stemmed from the phenomenon of social tagging on the open web, often referred to as Web 2.0. User tagging has generated excitement and controversy in technical services because of the question: what role do uncontrolled user tags play in improving discovery and access in comparison to and in conjunction with the existing authority control of cataloging standards and practices?
This study explored user behavior when given the opportunity to tag within an OPAC environment and examined the purpose and reality of user tagging as a complementary service to traditional cataloging. Specifically, this study intended to capture and assess aspects of the context under which users are tagging materials, including categorizing tags based on their relationship to existing descriptive metadata and contextual relevance. To do this, researchers worked with the Consortium of Academic and Research Libraries in Illinois (CARLI) to gather bibliographic records and associated tags from the I-Share integrated library system and its VuFind discovery layer. First, the data were assessed as a whole to determine the distribution and frequency of user tagging across institution types. Next, a sample of the data was taken to classify and analyze tags in their context within the OPAC.
Literature Review
In his paper, “Tagging for Libraries: A Review of the Effectiveness of Tagging Systems for Library Catalogs,” Gerolimos outlined the emergence of trends within the study of tagging in information sciences literature. He addressed the increase of interest in tagging that began in the mid-2000s following the success of social networking sites like Facebook and Twitter.1 He tracked the shifts in research trends in the late 2000s and early 2010s towards implementation of tagging services within libraries and on websites dedicated to more traditional library materials, like Goodreads and LibraryThing.2 During this period, there was an emphasis on the comparison between user generated tags and controlled vocabularies, primarily the Library of Congress Subject Headings (LCSH), and divided perspectives on the validity and usefulness of the folksonomies for search and discovery.3
As Gerolimos’s review revealed, librarians and other information professionals were concerned with the nature of tags as an uncontrolled vocabulary, though many recognized the potential benefits, including a more inclusive vocabulary of description, facilitating serendipitous discovery, and the potential to alleviate costs when the implementation of a controlled vocabulary is not viable.4 He concluded that research on the use of tags in the library catalog should reach beyond “determining the quality of user tags compared to subject headings,” and expand to answer broader questions:5
How did the tag system manage to transfer that feeling of “importance” in creating online content and describing resources to its users...? To what extent is the effort of tag assignment to document records based on real-time need to augment the search capabilities of OPACs? At what level are users infused with the willingness to provide keywords to enhance . . . the search/research options of other users with the use of tags? And how likely is it that the subsequent user will benefit from the keywords chosen by the one before him?6
Since Gerolimos’s review, researchers have expanded the breadth of their inquiry into tagging and the behaviors surrounding the practice. Syn and Spring addressed methods for determining the potential of user generated tags to classify a collection based on metrics intended to determine user agreement and remove terms that are too broad or narrow.7 Joorabchi, English, and Mahdi investigated the feasibility of integrating tags and linked data methods to improve issues of inconsistency within such uncontrolled, but valuable, vocabularies. Still other researchers have studied influences on user tagging behaviors in a variety of environments, focusing on the motivations behind the act of tagging itself.8
This study’s scope was to expand upon such research, interrogating and applying observations on user tagging behaviors broadly. In analyzing these behaviors, researchers looked back and expanded on previous investigations into the relative value and usability of user tags as a unique descriptive resource alongside traditional cataloging, addressing several of the questions Gerolimos proposed. This study focused on the tagging behaviors of users in academic library OPACs and considered the context within which tags are made, the type of tag, and the implications of user tagging trends. As a result, the researchers designed this study to address the following questions:
- To what degree are users adding tags in an OPAC if the system allows such functionality?
- What types of tags are being added and in what context?
Additionally, the researchers sought to explore how this study might inform current discussion surrounding the following questions:
- Can libraries utilize user added tags to improve discovery and access services?
- Are tagging services still valid and useful in the age of linked open data?
For the purposes of this study, CARLI provided researchers with data in the form of a tab-delimited file, listing as one unit the bibliographic record number and prefix indicating the holding institution, the number of users who had added tags, the total number of tags added, and a list of all tags added to the record. The data were drawn from eighty-nine institutions participating in I-Share, the collective integrated library system and shared OPAC offered by CARLI, and reflected all tags created through the service’s implementation of the VuFind discovery layer from June 2010 to March 2017, when the data were collected. Due to the nature of the data, researchers identified four data types: institution, bibliographic record, number of users who added a tag(s) to a record, and the tag(s) added. By defining these data types, researchers were able to examine both the individual types and the relationships between each type.
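As a rough illustration of how a file of this shape might be read, the sketch below parses one tab-delimited line into the four data types. The exact column layout is not documented here, so the prefix format (`UIU.12345`), the semicolon separator between tags, and the `TagRow` and `parse_rows` names are all hypothetical assumptions rather than CARLI's actual export format.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class TagRow:
    institution: str  # prefix identifying the holding institution (format assumed)
    record_id: str    # bibliographic record number
    user_count: int   # number of users who added tags to the record
    tag_count: int    # total number of tags added to the record
    tags: list        # all tags added to the record


def parse_rows(handle, tag_separator=";"):
    """Parse tab-delimited lines; the column order and tag separator are assumptions."""
    rows = []
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for cols in reader:
        if len(cols) < 4:
            continue  # skip blank or malformed lines
        institution, _, record_id = cols[0].partition(".")
        tags = [t.strip() for t in cols[3].split(tag_separator) if t.strip()]
        rows.append(TagRow(institution, record_id, int(cols[1]), int(cols[2]), tags))
    return rows


# Hypothetical example line, not taken from the study's data.
sample = "UIU.12345\t2\t3\tdrama; comedy; to read\n"
rows = parse_rows(io.StringIO(sample))
```

Quoting is disabled in the reader because tags may themselves contain quotation marks.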
Having arranged the data in this manner, the researchers designed a two-part approach to the data analysis. First, researchers grouped the data based on institution type using the Carnegie Classification of Institutions’ Basic Classification guidelines to conduct a quantitative analysis of all data types.9 The Carnegie Classification of Institutions was selected for its consistency and accuracy as an ongoing standard of categorization of institutions of higher education. Second, a sample set of the data was identified and the associated tags categorized based on a set of categories identified by the researchers.
For the first analysis, the data consisted of 286,805 tags, 157,215 records, and 167,095 users from eighty-nine institutions. The institutions were divided into groups based on the five Basic Classifications defined by the Carnegie Classification: Doctoral Universities, Master’s Colleges and Universities, Baccalaureate Colleges, Associate’s Colleges, and Special Focus/Other.10 Within these categories, the total tags, users, and records were compiled for each individual institution, the five institution categories, and the data set as a whole (see Appendix A). These totals were used to calculate the number of tags appearing per record on average, the number of users adding tags per record on average, and the number of tags being added per user on average. These three averages were calculated for individual institutions, institutional categories, and the data set as a whole.
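The three averages described above are simple ratios of the compiled totals. A minimal sketch, assuming each average is the ratio of the two relevant totals, and using the data set totals reported in this section; the `tagging_ratios` function name is illustrative only:

```python
def tagging_ratios(tags, records, users):
    """Return the three averages as ratios of totals, rounded to three places."""
    return {
        "tags_per_record": round(tags / records, 3),
        "users_per_record": round(users / records, 3),
        "tags_per_user": round(tags / users, 3),
    }


# Totals for the full I-Share data set as reported in the text.
overall = tagging_ratios(tags=286_805, records=157_215, users=167_095)
```

The same function can be applied to the totals for a single institution or for an institutional category.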
For the second analysis, data from the University of Illinois (U of I) was selected as a sample from the full CARLI data (see table 1). To work with this sample, researchers isolated the bibliographic record numbers for the records associated with U of I and ran a report to pull the associated MARC 245 ($a and $b), 100 ($a), 650 (all subfields), 651 (all subfields), and 655 (all subfields) data fields that represent the title, author, and subjects of each record. The resulting data set was compiled and uploaded into OpenRefine, an open source application for data cleaning and exploration. The researchers used the faceting feature to identify records that lacked values in the 650, 651, or 655 fields (i.e., any subject headings). These records were chosen for the sample and resulted in 2,595 tags, 1,237 users, and 1,207 records.
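The researchers performed this selection with OpenRefine's faceting feature. Purely as an illustration of the same filtering logic outside OpenRefine, the sketch below keeps records that have no non-blank value in the 650, 651, or 655 fields. The dictionary layout, record identifiers, and field values are hypothetical.

```python
SUBJECT_FIELDS = ("650", "651", "655")


def lacks_subject_headings(record):
    """True when none of the subject fields holds a non-blank value."""
    for tag in SUBJECT_FIELDS:
        values = record.get(tag) or []
        if any(value.strip() for value in values):
            return False
    return True


# Hypothetical records keyed by MARC field tag; subject fields hold lists of strings.
records = [
    {"id": "rec1", "245": "Example title one", "100": "Author, A.", "650": ["Example subject"]},
    {"id": "rec2", "245": "Example title two", "100": "Author, B."},
    {"id": "rec3", "245": "Example title three", "651": [], "655": [""]},
]

sample_ids = [r["id"] for r in records if lacks_subject_headings(r)]
```

A record with an empty or blank subject field is treated the same as a record with no subject field at all.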
To contextualize the tags associated with U of I’s sample, OpenRefine’s faceting and clustering functions were used to produce a list of unique tags. In OpenRefine, the faceting function identifies each unique string value in a column and returns the number of times each string appears in the column. The clustering function can then be used to reconcile string values that are marked as similar according to an algorithm that determines “sameness” using a key collision method called fingerprinting.11 For this process, the researchers removed extra whitespace and punctuation at the beginning and end of strings. No tags were changed in regard to case or spelling to retain as much original context as possible.
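To make the key collision idea concrete, the following is a simplified approximation of a fingerprint key, not OpenRefine's own code: trim, lowercase, strip accents and ASCII punctuation, then sort and deduplicate the whitespace-separated tokens. Strings that share a key are proposed as one cluster. The original tag strings are never altered, which is consistent with the researchers' decision to leave case and spelling unchanged.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Simplified fingerprint key (an approximation of the key collision method)."""
    s = unicodedata.normalize("NFKD", value.strip().lower())
    s = "".join(ch for ch in s if not unicodedata.combining(ch))  # drop accents
    s = "".join(ch for ch in s if ch not in string.punctuation)   # ASCII punctuation only
    return " ".join(sorted(set(s.split())))


def cluster(values):
    """Group distinct strings that collide on the same fingerprint key."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return {key: sorted(vals) for key, vals in groups.items() if len(vals) > 1}


# Hypothetical tags for illustration.
clusters = cluster(["Science Fiction", "science fiction.", "fiction, science", "drama"])
```

Because the key only groups candidates, a reviewer still decides whether the members of each cluster should be treated as the same tag.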
Researchers then performed a cursory overview of the resulting list of unique tags and identified common themes from which categories could be determined. Based on these observations, researchers identified seven clear categories (see table 2). All tags remaining after the initial sort were assessed against their full bibliographic record and sorted to the best of the researchers’ abilities. The remaining tags following this secondary sort were grouped into a final category, Other.
Institutional Classification
Of the eighty-nine institutions identified within the data set, researchers identified ten doctoral universities, twenty-five master’s colleges and universities, fourteen baccalaureate colleges, twenty-four associate’s colleges, and sixteen special focus/other. After classifying all institutions, the number of individual tags, records, and users were quantified at the institution level and then averaged within each category. These results showed that on average, institutions classified as doctoral universities had the highest record, user, and tag counts when compared to other institutions and accounted for 52 percent of all records, 63 percent of all users, and 54 percent of all tags (see figure 1).
Despite representing only 11 percent of the participating institutions, doctoral universities were responsible for the bulk of the cumulative data in all three types. This phenomenon reflected the relative sizes of these institutions when considering the number of students, staff, and faculty (users) and volumes held (records). Larger collections and a greater number of potential users increase the overall tag output. The discrepancy in size of the collection and potential user pool between institution types did not appear to affect the likelihood of users adding tags to records as evidenced by an assessment of the relationships between each data type (see figure 2).
As illustrated in figure 2, researchers calculated the average number of users adding tags per record, tags added per record, and user to tag ratio. These relationships did not show a significant variation across institution types, thereby indicating a consistency with which users across institution types applied tags to records. This trend exhibited an independence from the relative size of the potential user group or institutional collection.
Subset Determination
To analyze the tags, researchers extracted data associated with U of I. U of I was categorized as a doctoral university and had 21,776 records, 22,863 users, and 37,706 tags total. Compared to other doctoral universities, the ratios of users per record (1.05:1), tags per record (1.732:1), and tags per user (1.649:1) for U of I’s data were well within the expected results.
Of the 21,776 records, researchers identified 1,207 records lacking subject headings, representing approximately 6 percent of the U of I data and 0.8 percent of the full I-Share data (see table 1). Some of these are brief records, and others describe literature, which normally does not receive subject headings. These records were extracted as a subset of the full data to be used for qualitative analysis on the basis that users would have tagged these materials under significantly less influence by the catalog records. The same quantitative analyses used for the full data set were applied to this subset, and the results were compared to the rest of the U of I data.
In a comparison of the records lacking subject headings against the full U of I records, on average those without subject headings had a higher ratio of tags per record (2.136:1). When comparing the number of tags per record, both sets showed similar trends. As shown in figure 3, an analysis of the number of tags per record for the total records from U of I showed that approximately 62.12 percent of records had only one tag, while the maximum number of tags for a single record was thirty-seven. Comparatively, when only the records lacking subject headings were analyzed, approximately 62.06 percent of the records had only one tag, while the maximum number of tags for a single record was fourteen.
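The distribution figures above (share of records with exactly one tag, maximum, and average) can be derived from a list of per-record tag counts. The sketch below shows one way to do so; the `counts` list is hypothetical illustration data, not the study's data.

```python
from collections import Counter


def tag_count_distribution(tags_per_record):
    """Summarize how many tags each record carries."""
    freq = Counter(tags_per_record)
    total = len(tags_per_record)
    return {
        "percent_with_one_tag": round(100 * freq[1] / total, 2),
        "max_tags": max(tags_per_record),
        "mean_tags": round(sum(tags_per_record) / total, 3),
    }


# Hypothetical per-record tag counts for eight records.
counts = [1, 1, 1, 2, 1, 3, 1, 14]
summary = tag_count_distribution(counts)
```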
Tag Categorization
After sorting tags into the previously identified eight categories, researchers analyzed the resulting groupings and found that tags fell overwhelmingly into the Content Description category (54.22 percent). The second largest category, Title Words (22.04 percent), included a number of tags that could logically have been categorized as Content Description on the basis that titles are generally considered to be descriptive of a work’s contents. Researchers determined that the majority of the tags broadly described the contents of the resources (see table 3). The prevalence of descriptive tags indicated that many users had clear objectives when they added tags.
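The percentages in table 3 follow directly from the category counts. As a brief check, the sketch below recomputes each share from the counts given in table 3; the variable names are illustrative.

```python
# Tag counts per category, as listed in table 3.
category_counts = {
    "Content Description": 1407,
    "Title Words": 572,
    "User Commentary": 198,
    "Creator Name": 173,
    "Course Information": 85,
    "Other": 76,
    "Object Description": 58,
    "Call Number/Location": 26,
}

total_tags = sum(category_counts.values())
shares = {
    name: round(100 * count / total_tags, 2)
    for name, count in category_counts.items()
}
```

The counts sum to 2,595, the number of tags in the subset without subject headings shown in table 1.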
To further analyze the results of categorization, researchers extracted lists of all unique tags and their frequency of occurrence from the I-Share data, the U of I data, the full set of records without subject headings, and those tags categorized under Content Description (see Appendix B). In comparing the top thirty most frequently occurring tags, researchers recognized a variation in the specificity of the tags from the full data set and the U of I data and those from the subset and Content Description category. The tags for the I-Share records and the U of I records appeared to be more general, with some user commentary such as “to read” and initials, plus notes about the item’s intended use (“research” or “paper”). The subset and Content Description tags exhibited a greater degree of specificity, focused more on describing the genre of the resources with terms such as “Drama,” “comedy,” and “romance.” This sharpening of specificity indicated to researchers that users’ descriptive tagging behaviors became more pointed and purposeful when the subject headings in the catalog records were limited or non-existent.
User tagging has a long history of debate among the cultural heritage community in relation to the service’s potential for enhancing access and discoverability of materials. Assessment of the I-Share user tags indicated a limited use of tagging services by users across academic institution types, with the likelihood of users to tag remaining relatively standard across institution types. Although 157,215 individual item records were represented in this study, this is a modest percentage of the combined holdings of the eighty-nine participating institutions that represent a collective 14.7 million unique bibliographic records and 38.1 million item records. The small portion of materials being tagged could be attributed to a number of factors, including lack of user awareness of tagging services, lack of user education on the use of tagging services, lack of user interest in tagging services, and lack of use cases on how to use user tags in cataloging and/or discovery services. Regardless, several trends emerged from the data collected via I-Share that merit discussion.
User purpose for tags appears varied but can largely be understood to fall into three behaviors: adding context to described or under-described materials, creating a personal collection for research or reference, and indicating personal perception and/or future intentions. The presence of tags such as “jkbnhs,” which appears a total of 327 times in the full I-Share data set, indicates a behavior of collecting materials through personalized tags. Additionally, tags such as “diss” and “ARTF101” indicate a variation on this collecting behavior, grouping items based on relevance to research or coursework.
The prevalence of descriptive tags indicates a desire to enhance the description of records both for public and personal use. Annotations have been broadly defined to include any type of marking or notation made with the purpose of indicating observations, comments, and intentions. Under this definition, the behavior of users tagging records in the OPAC is a form of annotation with limited functionality. One constraint on the functionality of VuFind’s tagging service is how tags are processed and added to the catalog. To add a single word tag, users need only type the word into the designated search box. To add a phrase, users must enclose the phrase in quotes (see table 4).
The result is that some users appear to have followed the input requirements for phrases while others did not, producing several individual tags that, when read together, complete a full annotative thought. These actions account for the variation in the number of tags per record. They also support the observation that a lack of user education on how to use tagging services affects not only the perception of the nature and meaning of a tag or tags, but also the interpretation of the relevancy of tags to both users and library staff. This is evidenced by the researchers’ disregard for individual tags that are considered stop words in the analysis of the most frequently occurring tags: important context is lost without a reassessment of the context in which those tags exist.
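The input behavior summarized in table 4 can be sketched as a small tokenizer: quoted phrases become one tag, and every other whitespace-separated word becomes its own tag. This is an illustrative reimplementation of the observed behavior, not VuFind's actual code, and the `split_tag_input` name is hypothetical.

```python
import re

# A double-quoted phrase, or otherwise a run of non-whitespace characters.
_TOKEN = re.compile(r'"([^"]*)"|(\S+)')


def split_tag_input(text):
    """Split user input into tags: quoted phrases stay whole, other words split."""
    # Treat typographic double quotes like straight ones (an assumption).
    text = text.replace("\u201c", '"').replace("\u201d", '"')
    tags = []
    for phrase, word in _TOKEN.findall(text):
        tag = phrase.strip() if phrase else word
        if tag:
            tags.append(tag)
    return tags


single = split_tag_input("book")
quoted = split_tag_input('"to check out"')
unquoted = split_tag_input("things I'm interested in")
```

The unquoted phrase yields four separate tags, which shows how a single annotative thought can be scattered across several tags on one record.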
When first introduced in the early 2000s, user tagging services were regarded as one of the direct implementations of Web 2.0 utility and welcomed by the library and cultural heritage community.12 This study examined users’ tagging behaviors in an OPAC by analyzing user tags added to the CARLI integrated library system from 2010 to 2017. Data analysis revealed that the tagging service is not used as much as anticipated, and that only a small number of CARLI records include user tags.
When examined closely, the study found that users create tags largely for descriptive purposes, although many tags indicate personal annotation when applied. This trend has led some researchers to question whether user tagging services are still desirable in the era of linked open data. However, based on this study’s findings, researchers believe there are ways to improve user tagging services. They encourage libraries to explore other options that facilitate the incorporation of user tagging into the main library services.
First, the analysis revealed that users added tags for a variety of purposes, all of which could be broadly considered annotations. Recently, the W3C Web Annotation Working Group published a data model and vocabulary for web annotations.13
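To illustrate how a user tag could be expressed under that data model, the sketch below builds a minimal annotation with a tagging motivation and a textual body. The `tag_to_annotation` function and both example.org URLs are hypothetical; this is one possible mapping, not a description of how any existing catalog stores tags.

```python
import json


def tag_to_annotation(tag, record_url, annotation_id):
    """Express one user tag as a Web Annotation (JSON-LD) dictionary."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": annotation_id,       # IRI identifying the annotation (hypothetical)
        "type": "Annotation",
        "motivation": "tagging",
        "body": {
            "type": "TextualBody",
            "value": tag,
            "purpose": "tagging",
        },
        "target": record_url,      # the catalog record being tagged (hypothetical)
    }


annotation = tag_to_annotation(
    "to check out",
    "https://example.org/catalog/record/12345",
    "https://example.org/annotations/1",
)
serialized = json.dumps(annotation, indent=2)
```

Representing a phrase as a single body value avoids the word-splitting problem described in table 4.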
Second, based on the limited use of user tagging services and the generally low quality of tags, libraries should seek to improve user education on the use and purpose of tagging and/or annotating in the OPAC. Users cannot use the service to full advantage nor provide quality tags when they are not aware of the service or how to use it. Coordinated instruction opportunities with public services or library instruction departments and a readily usable web document could provide the education necessary to fully utilize tagging or annotation services.
Third, because tags are uncontrolled, there is a certain limitation on integrating tags into a library’s bibliographic records. However, tags could still be used as part of the discovery services. VuFind version 4.3 includes user tags as a search option, in addition to more traditional search methods.14 The inclusion of tags as an indexed and searchable information source may aid users in discovering items when using natural language queries that are more familiar to them than library-specific controlled vocabularies, such as Library of Congress Subject Headings. Because user tags tap into users’ natural language habits, they not only provide an alternate descriptive vocabulary, but also capture the unique perspectives and language of the users providing them.
While user-tagging services have been available since the early 2000s, they are underused for various reasons. As libraries and other cultural heritage institutions move towards adopting linked data and web technologies, it is time to reevaluate the service and find ways to better integrate tags, as a unique and user-reflective resource, into our discovery services to improve access to under-cataloged library materials and promote scholarly communication.
Figure 1. Percent of Cumulative Data by Institution Type
Figure 2. Relationship between Data Types
Figure 3. Frequency of Tags per Record
Table 1. University of Illinois at Urbana-Champaign Full and Sample Data

| Data types | Total U of I data | U of I Data without Subject Headings |
|---|---|---|
| Records | 21,776 | 1,207 |
| Users | 22,863 | 1,245 |
| Tags | 37,706 | 2,595 |
| Unique tags | 8,883 | 1,083 |
| Tags per record: minimum | 1 | 1 |
| Tags per record: maximum | 37 | 14 |
| Tags per record: average | 1.732 | 2.136 |

Table 2. Tag Categories

| Category | Definition | Example |
|---|---|---|
| Content Description | Describes or addresses what the work is “about” | action, romance |
| Title Words | Matches a word(s) in the title of the work as it appears in the 245 field | Bhagwad Gita |
| Creator Name | Matches the name(s) of the work’s creator(s) as they appear in the 100 field | kafka, Calvino |
| User Commentary | User notes, intentions, actions, and evaluations | diss, REQUEST |
| Course Information | Indicates a course name and/or number | ARTF101, AmLit |
| Object Description | Describes or addresses the physical or digital object | e-book, map |
| Call Number/Location | Indicates the call number or physical location of the object | L-OSF, stacks |

Table 3. Results of Tag Categorization

| Category | Tag Count | % of Total Tags |
|---|---|---|
| Content Description | 1,407 | 54.22 |
| Title Words | 572 | 22.04 |
| User Commentary | 198 | 7.63 |
| Creator Name | 173 | 6.67 |
| Course Information | 85 | 3.28 |
| Other | 76 | 2.93 |
| Object Description | 58 | 2.24 |
| Call Number/Location | 26 | 1.00 |

* Note: Percentages calculated using the number of tags from the U of I Data without Subject Headings (see table 1).

Table 4. User Input Effect on Tag Output

| User Input | Resulting Tag(s) |
|---|---|
| book | book |
| “to check out” | to check out |
| things I’m interested in | things, I’m, interested, in |
