
The Stanford University Libraries are thrilled to announce the registration of a trademark for Sinopia, an open-source technology that enables library professionals from around the world to create and share linked data descriptions on one central platform. The descriptions created with Sinopia will enhance search results from individual library catalogs as data from external sources is included.
All records in a library catalog hold descriptive information known as “metadata” that helps to identify and categorize each item, making library resources easier to find. For example, a single query in SearchWorks, the Libraries’ online catalog, will yield all materials pertaining to American biochemist and Stanford University professor Paul Berg – his papers in Special Collections, an oral history, interviews, photographs, research articles, books about him, video recordings, conference papers, symposium posters, and more – an amazing 1,079 results. Library catalogs have always provided access to a wealth of resources through the metadata descriptions they contain.
Linking data in descriptions within the Libraries’ collections to information available on the open web can transform library search experiences. To continue the same example, Paul Berg won the Nobel Prize in Chemistry in 1980 alongside Walter Gilbert and Frederick Sanger, yet a library patron would need to search elsewhere for this information. With linked data, users could be presented with co-author graphs, an academic tree, and citation statistics powered by Scholia, side by side with catalog search results. With linked open data descriptions for library resources created in Sinopia and held at Stanford University Libraries, it is possible to establish relationships between Stanford collections and data available on the open web.
Linked open data can be leveraged further outside of the library catalog in innovative ways to support research and new digital services. Projects like Know Systemic Racism utilized linked open data to create a knowledge graph that gathers and shares data illustrating how interconnected systems of racism work against the Black community in the State of California. The GND Explorer tool visualizes relationships between entities in the German authority file, a dataset that helps consistently identify people, organizations, resources, and subjects.
In development since 2019 in collaboration with Cornell University, Harvard University, the Library of Congress, the School of Library and Information Science at the University of Iowa, and the Program for Cooperative Cataloging, Sinopia is a first step toward a full-fledged linked data production environment. Accordingly, it is named for the reddish-brown pigment commonly used by artists for the underdrawings of paintings and frescoes. Sinopia currently supports over 2,700 users from 48 countries and 55 institutions. For libraries that are now able to use Sinopia, the software is a free, hands-on way to learn linked data best practices in a shared cataloging environment.
Feedback from Sinopia’s user community guides its development, with forward-thinking librarians and technologists eager to leverage artificial intelligence (AI) and large language models to complement Sinopia’s existing functionality. Some potential uses of AI technologies include detecting duplicate descriptions; automatically generating new descriptions from existing data, title page scans, or photographs; and supporting prompt design for chat-based workflows and user interfaces. Black@Stanford is a prototype question-and-answer application that contains oral history transcripts from Black Stanford students and faculty. It answers questions posed by users through a chat interface and serves as an example of how AI can be used to surface library collections in novel ways.
As collections quickly grow in size and scope, the Stanford University Libraries are part of a larger trend to improve search and discovery experiences for library users. Harvard Library and Yale Library are also investigating ways to save researchers time by strengthening their library catalogs so that search extends beyond keyword matching and infers meaning and relationships from a search request. Linked open data makes these inferences more reliable and enables emerging tools to aggregate data from across datasets to highlight previously hidden relationships between sources.