Over the past five years, the research landscape in UK higher education (UKHE) has changed in several important ways. Institutional self-archiving mandates are increasingly common, in part due to the RCUK position paper on improved access to research outputs [1], with many institutions choosing to maintain online repositories containing their publications. Likewise, the move to the Research Excellence Framework (REF) has placed an increased emphasis on the collection of research data for assessment purposes, but at present this data is collected to support central planning and funding decisions rather than for the direct benefit of the researchers whom it describes.
The aim of the dotAC project is to develop a prototype demonstrator that synthesises research data from these sources (research publication metadata from institutional repositories, and research council data) and presents it to the end user through an interface that allows them to explore the state of the research landscape in UKHE.
The Open Linked Data initiative has in recent years provided a key focus on producing easily accessible resources on the Semantic Web. A number of significant datasets have been published, including numerous cross-linkages that enable the integration of these resources to form the emerging “Web of Data”. When information is published in line with Linked Data guidelines, its value and usefulness can be greatly enhanced through interlinking with other data sources, and the data can readily be consumed by a wide variety of tools and services.
Best practice prescribes that all non-information resources (e.g. real-world entities such as people, places or publications) are given URI identifiers that are resolvable using HTTP. When dereferencing such an identifier, the user or client application is redirected as appropriate to an information resource that provides a detailed description of that entity, either in a structured data format such as RDF, or in HTML for human consumption. As a result, semantic descriptions of resources are becoming easily available, with cross-linking enabling the traversal of datasets in the Web of Data in a manner analogous to that of navigating the World Wide Web by following hyperlinks between documents.
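The redirection step described above can be illustrated with a minimal sketch that simulates, without any network access, how a server might answer a request for an entity URI. The URI layout (/id/ for the entity, /data/ for its RDF description, /page/ for its HTML description), the example.org host and the function names are assumptions made for illustration; they are not taken from any particular repository.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical URI layout (an assumption for this sketch):
#   /id/<name>    identifies the real-world entity (non-information resource)
#   /data/<name>  RDF document describing the entity
#   /page/<name>  HTML document describing the entity
RDF_TYPES = ("application/rdf+xml", "text/turtle")


def parse_accept(header):
    """Return the media types in an Accept header, highest q-value first."""
    entries = []
    for position, part in enumerate(header.split(",")):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0
        for field in fields[1:]:
            name, _, value = field.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > 0:
            # Sort by descending q, then by position in the header.
            entries.append((-q, position, media_type))
    return [media_type for _, _, media_type in sorted(entries)]


def dereference(uri, accept="text/html"):
    """Simulate a GET on an entity URI.

    Returns (status, location): a 303 redirect to the document that
    describes the entity, or (404, None) for an unknown path.
    """
    parts = urlsplit(uri)
    if not parts.path.startswith("/id/"):
        return 404, None
    name = parts.path[len("/id/"):]
    target = "page"  # default to the human-readable description
    for media_type in parse_accept(accept):
        if media_type in RDF_TYPES:
            target = "data"
            break
        if media_type in ("text/html", "*/*"):
            break
    location = urlunsplit((parts.scheme, parts.netloc, f"/{target}/{name}", "", ""))
    return 303, location
```

A client that asks for RDF, for example dereference("http://example.org/id/alice", "application/rdf+xml"), is redirected to the /data/ document, while a browser sending "text/html" is redirected to the /page/ document. Both documents describe the same entity, which keeps its single /id/ identifier.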
The proliferation of heterogeneous data sources presents a challenge to data integration on the scale required by this project; even if the data is exposed as linked open semantic data, there is no guarantee that different sources will choose the same identifiers for objects. Indeed, each data source will wish to use its own identifier, so that other internal systems can continue unchanged. This co-reference problem, that of determining whether or not two different identifiers refer to the same object, presents an obstacle to the widespread use and adoption of linked open data. Institutional repositories are on the verge of providing linked data, but there is no widespread coordination between repositories. This has the effect of hindering cross-repository access and browsing (it is not possible to consistently link from an author’s publications in one repository to their publications in another), especially in the case of a researcher who moves from one institution to another.
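One way to picture the co-reference problem is as the maintenance of equivalence classes over identifiers: each assertion that two URIs denote the same object merges their classes, and a query asks whether two URIs fall in the same class. The sketch below uses a simple union-find structure for this. It is an illustration under stated assumptions, not the project's method: the class name, its methods and the example URIs are all hypothetical, and it says nothing about how equivalences are discovered in the first place.

```python
class CoreferenceIndex:
    """Groups URIs asserted to denote the same object (union-find)."""

    def __init__(self):
        self._parent = {}

    def _find(self, uri):
        """Return the representative URI of the class containing `uri`."""
        self._parent.setdefault(uri, uri)
        root = uri
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every URI on the path directly at the root.
        while self._parent[uri] != root:
            self._parent[uri], uri = root, self._parent[uri]
        return root

    def assert_same(self, a, b):
        """Record that URIs `a` and `b` refer to the same object."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same(self, a, b):
        """Return True if `a` and `b` are known to be co-referent."""
        return self._find(a) == self._find(b)

    def bundle(self, uri):
        """Return every known URI that is co-referent with `uri`."""
        root = self._find(uri)
        return {u for u in list(self._parent) if self._find(u) == root}


# Hypothetical identifiers for one researcher in three separate sources.
repo_a = "http://repo-a.example/person/42"
repo_b = "http://repo-b.example/author/jsmith"
funder = "http://funder.example/investigator/9"

index = CoreferenceIndex()
index.assert_same(repo_a, repo_b)  # e.g. the researcher moved institution
index.assert_same(repo_b, funder)
```

Because equivalence is transitive, index.same(repo_a, funder) holds even though that pair was never asserted directly, and index.bundle(repo_a) returns all three identifiers. Each source keeps its own identifier, while a consumer can still link an author's publications across repositories.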