As this is my first post for the Library Research Plus blog, I’d like to introduce myself. I’m Bill Ayres, the new Strategic Lead for Research Data Management, based here in the Main Library since January 2020. I previously worked in IT for over fifteen years, most of them in HE, and more recently as Research Data Manager at the University of Salford. I’m passionate about open research in general, and about how it can connect researchers to foster cross-disciplinary projects and bring real-world benefits to people who may not otherwise have access to scholarly findings, outputs and data (especially data).
One important thing:
Usually, I would link out to various examples and case studies When I Talk About Data Repositories, but I’m going to be more general here. We are currently considering various options that will provide a fully-fledged data repository for the University – this is a very good thing – so in the interests of impartiality and fairness I won’t mention any specific platforms or technology suppliers.
One less important thing:
Apologies to Haruki Murakami for borrowing / mangling his book title for this post. It felt like a good idea when I thought of it late at night, a bit less so now (but I can’t think of a better one).
What good can a good institutional data repository do?
From a system perspective, and for the libraries that run the service, it provides a home for “curated” files, datasets and other resources that support research findings and publications. It shouldn’t be a dumping ground for everything, but a place for these important research assets that allows them to be stored, published and preserved.
With some funders mandating that data supporting publications be made available open access, and others recommending this, a data repository can also provide a straightforward option to ensure that compliance is covered.
It can be a powerful complement to the main institutional repository, one that can link to it in many ways and provide an alternative route for discovery and reuse of outputs, while also having its own character and profile.
There are clear and logical integrations that the repository can have with other useful systems, e.g. a main institutional repository, connecting outputs and data so that there are persistent links between them. There are opportunities for reporting and metrics that examine the ways people search for and discover data and published outputs, and how these may differ. There is also an opportunity to add the home institution’s branding to data and create or strengthen an association that may not occur if data is always hosted on external or publisher repositories. These benefits of integration can also extend outwards to researcher-focused platforms, e.g. ORCID, DataCite and similar.
And from a security and administration perspective, implementing an institutional data repository can help ensure that research data is safe, secure, covered by ethics and related policies, and can also be subject to review or checking where appropriate.
What I’ve talked about so far – a focus on integration, compliance, security, and review processes – is all great from the point of view of the institutional teams who “own” the data repository, and we need these to effectively manage and support it. But experience tells us that any system or service intended, primarily, for use by researchers and academics has to provide real-world benefits to them, and *crucially* be easy to use, or they will not utilise it or engage with the service offering it relates to. That adds up to a wasted investment for the institution, and a missed opportunity to give researchers a great platform.
So what should the institutional data repository be doing for its primary users, researchers?
Alongside the ease of use mentioned, it can fill a fundamental gap by providing a platform to publish data more quickly, easily and effectively than via other routes. For a long time, efforts have been focused on publishing research outputs as the final part of the lifecycle. But a good data repository can facilitate a “just in time” ability to make data available to a wider audience throughout the research lifecycle. Adopting a light-touch approach to curation of data deposits means that researchers can choose to share initial data, illustrate novel mid-project findings with relevant datasets, and, looking past their standard data types, share conference resources like posters, slides, or videos.
Talking about video, a data repository can provide an excellent place to store and showcase file types that can bring research to life: images, audio files, video and film clips, and in some cases there will be functionality that can preview or render 3D models and complex graphical files.
Increasingly, data repositories also provide the ability for researchers to create collections of their own (or related) data outputs. A curated selection of datasets that links to similar open access resources created by others in the discipline can provide a resource with great potential for reuse or further investigation.
Researchers often need a place to store data for the longer term too. Funders and institutional policies may mandate a 5-year, 10-year, or even indefinite preservation requirement for research data. It can make good technical and practical sense to integrate digital preservation into a data repository, from straightforward bit-level preservation (checking that stored files remain exactly as deposited) to more holistic solutions which will automatically convert file types and formats as applications and technology move on. An institutional data preservation option can give researchers peace of mind that their data will survive for the long term.
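For anyone curious what bit-level preservation looks like in practice, here is a minimal Python sketch of a "fixity check". It is an illustration only and not how any particular repository platform does it: the function names (`sha256_of`, `build_manifest`, `check_fixity`) are made up for this example, and it assumes a deposit is simply a folder of files. The idea is to record a checksum for every file at deposit time, then recompute the checksums later and flag anything missing or changed.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> dict[str, str]:
    """Record a checksum for every file under `folder` (hypothetical deposit step)."""
    return {
        str(p.relative_to(folder)): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def check_fixity(folder: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose content has changed."""
    problems = []
    for rel_path, expected in manifest.items():
        target = folder / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel_path)
    return problems
```

In use, `build_manifest` would run once when data is deposited and `check_fixity` on a schedule afterwards. An empty list means every file is still bit-for-bit identical to what was deposited. A real preservation service would typically also keep multiple copies so that a damaged file can be repaired, which this sketch does not attempt.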
From a perspective beyond the home institution
As a final thought on this topic, I’d like to reflect on the principles at the heart of open research and open data: making that data FAIR (Findable, Accessible, Interoperable and Reusable) and Open. Beyond the anticipated audience of researchers and academic investigators, a great data repository can be a powerful gateway for access and reuse by researchers in the developing world, healthcare professionals, or members of the public. We often forget that the costs of journal subscriptions or other payment models to access outputs and data act as an impassable barrier for institutions or individuals that are unable to pay them. It’s our duty to make as wide a range of research data as possible freely and easily available, as this can have benefits that go far beyond the original investigation or discipline.