You might have seen we recently released the first episode of our open research podcast Opening Remarks. This is something we’ve been talking about doing for a while, but the transition to working from home sped things up a little bit. We now spend a lot of our time talking to each other on platforms that enable audio recording, so our feeling was this would be a good opportunity to put that technology to good use.
The idea behind Opening Remarks is simple – we want to have conversations with colleagues from across the University about open research; how open research is supported and facilitated, but also how researchers embed open principles in their practice. We want these conversations to be informal, interesting and informative.
Our intention is to record six episodes in this initial series, covering research data, open access, research communications, metrics and lots more besides. We’d be keen to hear from you about what you think we should be talking about, and we’d be even keener to hear from you if you’d like to be a guest! Come and talk to us about the open research that you do!
The first episode is already available on iTunes and, pending successful reviews, should be available on Stitcher, Spotify and Google’s podcast player in the next couple of days. Do give it a listen and let us know what you think! You can contact us on Twitter at @UoMLibResearch or email us at firstname.lastname@example.org.
Opening Remarks is hosted by Clare Liggins and Steve Carlton, two Research Services Librarians with very little broadcast experience but lots of enthusiasm.
I’ve been a Research Services Librarian at Manchester since January 2019, specialising in open access and research communications. Before I arrived at Manchester I’d been working in open access at several other institutions across the north west, including spells at the University of Liverpool and the University of Salford.
I’m interested in open research and its potential to help researchers reach broader audiences, and outside work I’m into professional wrestling, non-league football, the music of Arthur Russell and the Australian TV soap Neighbours. If I can find a way to talk about any of those things in the podcast, I will.
I’m a Research Services Librarian in the Research Data Management Team. I’ve been working at the University since January 2019 (Steve and I started on the same day) and am interested in anything to do with promoting the effective practice of Research Data Management, including training, as well as anything to do with Open Research.
My background is in Literature and writing, and before working at the University I was a Law Librarian. Due to my background, I am also interested in finding ways of working with these areas to adopt Research Data Management processes more widely.
In my spare time I enjoy reading books about feminist writers, spotting beautiful furniture in films from the 1950s, cooking recipes written by Nigel Slater and making up voices for my cat.
In this first episode of Opening Remarks, we talk about the perils of working from home in the summer, then invite our colleagues to talk to us about research data management for an hour. We’re joined by Chris, Eleanor and Bill to cover the complexities of supporting research data management across disciplines and the joys of checking data management plans, and to talk up some of the services we offer. We also get a bit excited talking about the impending arrival of an institutional data repository.
Music by Mike Liggins
Artwork by Elizabeth Carlton
This year I was delighted to attend and present a poster at IDCC 2020, which put together a truly thought-provoking line-up of speakers and topics, as well as a number of wonderful opportunities to sample some of Dublin’s cultural attractions. Even more than the delights of the “fair city”, I was especially interested in one important theme of the conference which explored supporting a FAIR data culture. Inspired by the many valuable contributions, this post outlines some of the key insights presented on this topic.
An excellent hook around which to frame this review is, I think, offered by the figure below capturing results from the FAIRsFAIR open consultation on Policy and Practice, which was one focus of Joy Davidson’s illuminating overview of developments in this area. The top factor influencing researchers to make data FAIR, when we take both positive points on the scale together, is the level of support provided.
So, let’s take a closer look at some of the key developments and opportunities for data services to enhance support for FAIR culture, bearing in mind of course that, when it comes to shaping service developments, local solutions must be informed by local contexts taking into account factors such as research strategy, available resources and service demand.
Enhancing the FAIR Support Infrastructure
That making data FAIR is an endeavour shared by researchers and data services was neatly illustrated by Sarah Jones. Her conclusion that equal, if not more, responsibility lies with data services gives cause to reflect on where and how we may need to raise our capabilities.
Let’s look here at three areas of opportunity for developing our support mechanisms around data stewardship, institutional repositories, and training.
Professionalising Data Stewardship
In 2016, Barend Mons predicted that 500,000 data stewards would need to be trained in Europe over the following decade to ensure effective research data management. Given this sort of estimate, it’s clear that our ability to build and scale data stewardship capability will be critical if we agree that data stewardship and data management skills are key enablers for research. Two particularly interesting developments in this area were presented.
Mijke Jetten outlined one project that examined the data steward function in terms of tasks and responsibilities, and the competencies required to deliver on these. The objective is a common job description, which then offers a foundation from which to develop a customised training and development pathway – informed of course by FAIR data principles, since alignment with FAIR is seen as a central tenet of good data stewardship. Although the project focused on life sciences in the Netherlands, its insights are highly transferable to other research domains.
Equally transferable is the pilot project highlighted by the Conference’s “best paper” from Virginia Tech, which described an innovative approach to addressing the challenge of effectively resourcing support across the data lifecycle in the context of ever-growing demand for services. Driven by the University Libraries, the DataBridge programme trains and mentors students in key data science skills to work across disciplines on real-world research data challenges. This approach not only provides valuable and scalable support for the research process, but serves also to develop data champions of the future, skilled and invested in FAIR data principles.
Leveraging Institutional Data Repositories
As a key part of the research data infrastructure, it’s clear that institutional data repositories (IRs) have an important role to play in promoting FAIR. Of course, researcher engagement and expertise are crucial to this end – as we rely on them to create good metadata and documentation that will facilitate discovery and re-use of their data.
In terms of fostering engagement, inspiring trust in an IR would seem to be an important fundamental, and formal certification is one way to build researchers’ confidence that their data will be well-cared for in the longer term by their repository. Ilona von Stein outlined one such certification framework, the CoreTrustSeal, which seems particularly useful since there’s a strong overlap between its requirements and FAIR principles. In terms of enhancing a repository’s reputation, one important post-Conference development worth noting is the recent publication of the TRUST Principles for digital repositories which offers a framework for guiding best practice and demonstrating IR trustworthiness.
Ilona also pointed to ongoing developments in terms of tools to support pre- and post-deposit assessment of data FAIRness. SATIFYD, for example, is an online questionnaire that helps researchers evaluate, at pre-deposit stage, how FAIR their dataset is and offers tips to make it more so. Developed by DANS, a prototype of this manual self-assessment tool is currently available with plans in the offing to enable customisation for local contexts and training. One to watch out for too is the development of a post-publication, automated evaluation tool to assess datasets for their level of FAIRness over time and create a scoring system to indicate how a given dataset performs against a set of endorsed FAIR metrics.
Another fundamental to think about is how skilled our researchers may or may not be when it comes to metadata creation as well as their level of tolerance for this task. Joao Castro made the point that researchers typically regard spending more than 20 minutes on this activity as time-consuming.
This observation came out of a project at the University of Porto to engage researchers in the RDM process and underlines the need to think creatively about how we, as data professionals, can enhance the support we offer. Joao described how the provision of a consultancy-type service had been explored to support researchers in using domain-specific metadata to describe their data. Underpinned by DENDRO, an open-source collaborative RDM platform, this service was well received by researchers across a range of disciplines and served to develop their knowledge and skills in metadata production, as well as raising FAIR awareness more broadly.
Maximising Training Impact
Of course, beyond raising awareness it’s clear that the upskilling of researchers through curriculum development and training is an essential step on the road to FAIR – a key question, however, is how do we make the most of our training efforts?
Daniel Bangert helpfully summarised findings from a landscape analysis of FAIR in higher education institutions and recommended focusing FAIR training initiatives on early career researchers (ECRs). This would seem to be a particularly powerful approach for affecting ‘ground up’ culture change, since ECRs are typically involved in operational aspects of research and will become the influential researchers of tomorrow.
This same report suggests that training and communication regarding FAIR should be couched within the wider framework of research integrity and open research. Framing data management training initiatives in this way provides important context and pre-empts the risk that it will be seen purely as a compliance issue.
As an interesting aside, an extensive research integrity landscape study, commissioned by UK Research and Innovation and published post-Conference, identified ‘open data management’ as the overall most popular topic for future training – a useful channel perhaps then through which to deliver and extend reach in the UK context at least.
Both Daniel and Elizabeth Newbold highlighted the need to draw on and share best practices and existing materials, where available. Subsequent workshop discussions strongly agreed with this sentiment but noted the challenges in finding and/or repurposing existing FAIR training, guidance and resources e.g. for a specific audience or level of knowledge. Indeed, it would seem sensible that FAIR principles should be applied to FAIR training materials!
So, hopefully plenty of food for thought and ideas for practical next steps here to adapt for your local context, wherever you are on the road to FAIR. While the challenges to creating a FAIR data culture are many, broad and complex, we can take heart not only from the many examples of sterling work underway, but also from the highly collaborative spirit across the data services community. In the context of increasing demands on tight resources, this will serve us well as we drive the FAIR agenda.
The University is purchasing a new costing tool for research projects. In order to provide some more information about the costing tool and what it can be used for, we sat down to have a conversation about how helpful it will be for costing research projects, with a focus on research data.
This podcast brings together people working in Research IT, in the Research Support Offices and in the Research Data Management team in the Library. We talk about the costing tool, the finance implications of proper costing, the viewpoints of various funders on managing costing requirements at the start of a project, and how a data management plan (DMP) can help.
It’s been a year since we launched Open Access+, an enhancement to our open access support services that aims to help University of Manchester researchers raise the visibility of their work. Since March 2019, 397 papers have been opted in, we’ve tweeted over 2,000 times from @UoMOpenAccess and generated 380 unique Communities of Attention reports. You might even have seen Scott Taylor’s excellent UKSG Insights article about the service.
The idea behind the Communities of Attention reports was simple. If a Twitter account is tweeting frequently about papers published in x journal, it’s likely that the account is either a) a bot, b) the journal’s marketing team or, more interestingly, c) someone who is very interested in research in that field. This approach obviously works better for journals with a narrower scope, though there’s a lot to be said for broadening your network. Armed with this information, our researchers could (hopefully) identify the leading voices in their field or at least find some useful accounts to follow.
You might have noticed that I’ve been talking about the Communities of Attention reports in the past tense. Here’s why. The reports were generated via a time-consuming process involving Python scripts, APIs and lots of manual editing of CSV files. We received some positive feedback and, as there wasn’t any other way for our researchers to get this information, we thought this work was worthwhile.
Recently, partly inspired by the work we’ve been doing (we think!), Altmetric introduced the new “Mention Sources” feature. As a result, you’re now able to build your own much more sophisticated Communities of Attention report in just a few clicks. It’s really cool! You can select multiple journals and see which Twitter accounts, news platforms, blogs, etc. mention their papers most frequently. And much more besides. Here’s a short video of the feature in action.
In this video, I search for who’s tweeted most frequently about papers published in the journal Acta Astronautica and then drill down so I can see the top account and the associated tweets.
Rather than replicating what the Altmetric Explorer now does and presenting that information in a spreadsheet, we’ve decided it’d be better to just point our researchers to the Altmetric Explorer. Where we used to include Communities of Attention reports in emails to our researchers, we now include some instructions on making use of the new feature instead.
The Open Access+ service continues to go from strength to strength, however, and moving away from generating and circulating Communities of Attention reports will give us an opportunity to focus on more useful activities that will help our researchers raise the visibility of their work. We have exciting plans for the future that will help us do this. Watch this space!
As this is my first post for the Library Research Plus blog I’d like to introduce myself. I’m Bill Ayres, the new Strategic Lead for Research Data Management based here in the Main Library since January 2020. I previously worked in IT for over fifteen years, most of them in HE, and then more lately as Research Data Manager at the University of Salford. I’m passionate about open research in general, and how this can connect researchers to foster cross-disciplinary projects and also have real-world benefits for people who may not otherwise have access to scholarly findings, outputs and data (especially data).
One important thing:
Usually, I would link out to various examples and case studies When I Talk About Data Repositories, but I’m going to be more general here. We are currently considering various options that will provide a fully-fledged data repository for the University – this is a very good thing – so in the interests of impartiality and fairness I won’t mention any specific platforms or technology suppliers.
One less important thing:
Apologies to Haruki Murakami for borrowing / mangling his book title for this post. It felt like a good idea when I thought of it late at night, a bit less so now (but I can’t think of a better one).
What good can a good institutional data repository do?
From a system perspective, and for the libraries that run the service, it provides a home for “curated” files, datasets and other resources that support research findings and publications. It shouldn’t be a dumping ground for everything, but a place for these important research assets that allows them to be stored, published and preserved.
With some funders mandating that data supporting publications be made available open access, and others recommending this, a data repository can also provide a straightforward option to ensure that compliance is covered.
It can be a powerful complementary system to the main institutional repository, one which can link to this in many ways and provide an alternative route for discovery and reuse of outputs, but also have its own character and profile.
There are clear and logical integrations that the repository can have with other useful systems e.g. a main institutional repository to connect outputs and data so that there are persistent links between them. There are opportunities there for reporting and metrics that examine ways people search for and discover data and published outputs, and how these may differ. There is also an opportunity to add the home institution branding to data and create or strengthen an association that may not occur if data is always hosted on external or publisher repositories. These benefits of integration can also extend outwards to researcher focused platforms e.g. ORCID, DataCite and similar.
And from a security and administration perspective, implementing an institutional data repository can help ensure that research data is safe, secure, covered by ethics and related policies, and also can be subject to review or checking where appropriate.
What I’ve talked about so far – a focus on integration, compliance, security, and review processes – is all great from the point of view of the institutional teams who “own” the data repository, and we need these to effectively manage and support it. But experience tells us that any system or service intended, primarily, for use by researchers and academics has to provide real-world benefits to them, and *crucially* be easy to use, or they will not utilise it or engage with the service offering it relates to. That adds up to a wasted investment for the institution, and a missed opportunity to give researchers a great platform.
So what should the institutional data repository be doing for its primary users, researchers?
Alongside the ease of use mentioned, it can fill a fundamental gap by providing a platform to publish data more quickly, easily and effectively than via other routes. For a long time, efforts have focused on publishing research outputs as the final part of the lifecycle. But a good data repository can facilitate a “just in time” ability to make data available to a wider audience throughout the research lifecycle. Adopting a light-touch approach to curation of data deposits means that researchers can choose to share initial data, illustrate novel mid-project findings with relevant datasets and, beyond their standard data types, share conference resources such as posters, slides or videos.
Talking about video, a data repository can provide an excellent place to store and showcase file types that can bring research to life: images, audio files, video and film clips, and in some cases there will be functionality that can preview or render 3D models and complex graphical files.
Increasingly data repositories also provide the ability for researchers to create collections of their own (or related) data outputs; a curated selection of datasets that links to similar open access resources created by others in the discipline can provide a resource with great potential for reuse or further investigation.
Researchers often need a place to store data for the longer term too. Funders and institutional policies may mandate a 5 year, 10 year, or even indefinite preservation requirement for research data. It can make good technical and practical sense to integrate digital preservation into a data repository, from straightforward bit-level preservation to more holistic solutions which will automatically convert file types and formats as applications and technology move on. An institutional data preservation option can give researchers peace of mind that their data will survive for the long term.
From a perspective beyond the home institution
As a final thought on this topic, I’d like to reflect back on the principles which are at the heart of open research and open data, in making that data FAIR (Findable, Accessible, Interoperable and Reusable) and Open. Beyond the anticipated audience of researchers and academic investigators, a great data repository can be a powerful gateway for access and reuse by researchers in the developing world, healthcare professionals, or by members of the public. We often forget that the costs of journal subscriptions or other payment models to access outputs and data act as an impassable barrier to institutions or individuals that are unable to pay them. It’s our duty to make as wide a range of research data as possible freely and easily available as this can have benefits that go far beyond the original investigation or discipline.
Open Access Week is here again. Beyond all of the activities we’ve got planned, the week also gives us an opportunity to reflect on the progress that has been made in recent years. We’re also looking to the future and thinking about how we can help our researchers take advantage of all of the benefits of making their research openly available.
Recently I was privileged to be able to present at 6:AM, the Altmetrics conference, on the Library’s Open Access+ service. This service relies heavily on altmetrics of various kinds, and my talk (I hope) offered a useful case study for how altmetrics can be used to help remove barriers to research. Our work has predominantly focused on removing the paywall, arguably the most important barrier. But we’re now looking to help researchers reach their audiences more effectively, whoever they may be and whatever their barriers might look like.
In my talk I spoke a bit about what those barriers might be, and gave Discoverability and Language (someone used the term “comprehensibility” elsewhere at the conference and this is a better term) as examples of further barriers that might prevent audiences from engaging with research. These are the barriers that OA+ attempts to help our researchers remove, or at least dismantle a little bit.
Open Access+ is an opt-in enhancement to the open access support that we provide through the Open Access Gateway. Researchers that want to take advantage of OA+ can check the box that says “I would like to receive customised guidance to help raise the visibility of my paper once published, and for the Library to promote the paper via its social media channels”.
This funnels the paper down a slightly different workflow than the one we use for non-OA+ papers. I’ve broken down what OA+ does into the following three categories in order to stop this post from becoming very long.
Signposting

All authors using the OA Gateway, whether they opt in to OA+ or not, are presented with a deposit success screen. This signposts them to useful tools, platforms and services that can help raise the visibility of their work and help them reach their audiences.
We encourage researchers to think about creating non-technical summaries on Kudos, as well as looking into whether The Conversation might be a useful platform for them to talk about their work. There are services across the University that can help too: Engagement@Manchester can advise on public engagement activities and Policy@Manchester can help researchers to get their work in front of policymakers.
Connecting

The first thing we do when a researcher opts in to OA+ is generate a Communities of Attention report. We take the last 1000 DOIs from the journal that the article has been accepted in and push them through (technical term) the Altmetric API using a fancy Python script that we developed in-house. After a bit of wrangling from our Research Metrics team, we’re left with a spreadsheet that shows the Twitter accounts that tweet most often about papers in that journal, as well as blogs and news platforms that frequently mention papers in that journal too.
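The heart of that report-building step is just a frequency count. Here’s a much-simplified, hypothetical sketch of the ranking part (the real in-house script also calls the Altmetric API per DOI and handles blogs and news sources; `top_tweeters` and the sample handles below are illustrative, not our actual code or data):

```python
from collections import Counter

def top_tweeters(mentions, n=10):
    """Rank Twitter accounts by how often they mention a journal's papers.

    `mentions` is a list of (doi, handle) pairs -- e.g. one pair per tweet
    gathered for each of the journal's recent DOIs.
    """
    counts = Counter(handle for _doi, handle in mentions)
    return counts.most_common(n)

# Hypothetical example: three tweets about two papers from one journal.
sample = [
    ("10.1000/j.1", "@space_bot"),
    ("10.1000/j.2", "@space_bot"),
    ("10.1000/j.2", "@dr_astro"),
]
print(top_tweeters(sample, n=2))  # [('@space_bot', 2), ('@dr_astro', 1)]
```

Accounts appearing at the top of a list like this are exactly the candidates described earlier: bots, journal marketing accounts, or genuinely interested researchers worth following.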
We send these reports to the author that deposited the paper within 48 hours of us receiving their AAM via the OA Gateway. The reports help researchers to build their online networks, giving (hopefully) helpful suggestions as to which Twitter accounts to follow, and which blogs and news platforms they might be interested in keeping an eye on. These blogs and news platforms might be useful when thinking about their research communications plans too.
We use these reports later on for a slightly different purpose too.
Amplifying

The number of scholarly research articles being published continues to increase, and it’s becoming even harder to keep track of the research in your field, let alone adjacent fields. In an increasingly crowded “marketplace”, it’s getting harder for researchers to get attention for their work and there’s pressure on them to take more responsibility for effectively disseminating their findings.
Twitter’s really useful for sharing research. Many researchers are really good at Twitter, but we often hear from those who feel uncomfortable promoting their own work. Or who don’t really like the idea of using social media full stop. If an author opts in to OA+, we’ll put together a tweet thread about their paper once it’s been published.
We bring together and surface a load of the stuff that we promote and support right across the Research Services Division in this thread. We link to the OA version of the paper and research datasets. We include interesting Altmetric mentions as well – blog posts and news articles often report on research findings using much more accessible language. We tag in authors, research funders and other Faculty/School/research group accounts and include any subject-specific hashtags that we can find.
We try and help people to decide whether the paper might be of interest to them by including some snippets from full text in the thread too. We use a tool called Scholarcy for this – it uses AI to break a paper down into its most significant chunks. This 3-4 tweet abstract gives a bit of an introduction to the paper and aims to persuade people idly scrolling through Twitter that they should click on the link and read the full text.
Finally, we go back to the Communities of Attention report we prepared earlier and tag in some of the Twitter accounts that we identified. We were a bit nervous about this approach (are we spamming people?) but the feedback we’ve had suggests that this is really useful for both the author and the people tagged in. Phew! Generally, feedback for these threads has been great.
Conclusion

As I said in my talk at 6:AM, OA+ isn’t going to improve how we do research communications at Manchester overnight. There’s lots more that we can do to help our researchers’ ideas “travel further” (that’s an expression that I stole from Andy Miah), and audiences that we haven’t quite cracked yet. We’re already thinking about the next phase of the project and what that might look like: should we just go ahead and tweet about every paper we get through the Open Access Gateway? Is that even possible? Should we think about using other platforms to reach those who aren’t on Twitter? There’s lots to think about.
In the meantime, we are noticing subtle changes in behaviour. Researchers are starting to adopt some of our techniques when it comes to tweeting about their work, and we’re getting good engagement with our tweet threads. We’re also increasingly getting suggestions from researchers for things they’d like us to include in these threads, which makes things a lot easier for us! We’d like to encourage more collaboration when it comes to putting these threads together, and that’s something we’ll be trying to facilitate moving forward.
Journal format theses are becoming ever more popular, enabling the incorporation of work suitable for publication in a peer-reviewed journal. This increase in popularity has led to concerns that some eTheses may not adhere to publisher self-archiving policies. This is particularly relevant for us as the University is committed to ensuring as wide an audience as possible can read and access research outputs, and has an Open Access policy requiring all Postgraduate Research eTheses to be made Open Access no longer than 12 months after submission.
We decided to investigate whether this concern was warranted and determine whether there was a need for our team to increase knowledge of self-archiving amongst our students. We found that a total of 671 journal format theses had been submitted, with the majority of these (575) from students in the Faculty of Science and Engineering. Of these, a representative sample of 50 was taken for analysis, and we looked at whether the correct version and embargo period had been used. The results show that 8% of students had included an incorrect version of the paper and 34% had applied the wrong embargo period.
Following these results, we decided to provide additional guidance on our website to advise students on how to make their work Open Access while still meeting publisher requirements around self-archiving.
We added a new page explaining additional considerations for journal format theses and produced a detailed, downloadable guidance document. This document explains where to find information about the publisher’s self-archiving policy and how to apply this information. We also created a decision tree using Typeform, which is a more interactive way to determine how to comply with the publisher policy and also acts as a prompt to ensure students have obtained all the information they require.
We hope that this new guidance will assist those students submitting a journal format thesis and minimise the risk that students will include the wrong article version or apply an incorrect embargo. Of course, students can always contact us for further support.
Making data more findable is the bedrock of much of research data management and we aim to make this easy and simple for researchers to do in practice. Ever on the look out to do just this, we were delighted to spot an opportunity to take our University’s data catalogue to the next level.
The data catalogue comprises our CRIS (Pure) Datasets module, which allows researchers to capture details of their datasets, and the public facing portal (Research Explorer), which allows these datasets to be searched. When the data catalogue was originally set up it could be populated either by automated metadata feeds for datasets deposited in the data repository recommended by The University of Manchester, or by manually inputting metadata for datasets deposited in external data repositories. However, recognising that this manual input duplicates effort, is time consuming and requires some familiarity with Pure, we began to think about how we could make this process faster and easier for researchers.
Our solution? A Research Data Gateway.
Gateway to data heaven
The Research Data Gateway service allows researchers to input a dataset DOI to an online form, view the associated metadata to confirm its accuracy, and then submit the dataset record to the Library, which populates Pure on the researcher’s behalf. Wherever possible our Content, Collections and Discovery (CCD) team enriches the record by linking it with related research outputs, such as articles or conference proceedings, and the record displays in both Research Explorer and all relevant Researcher Profiles.
The screen capture below illustrates how the Research Data Gateway works in practice from the researcher’s perspective up to the point of submission, a process that usually takes about 15 seconds!
Figure 1: Animated screen capture of Research Data Gateway
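Under the hood, the metadata-preview step amounts to resolving the DOI and pulling out the fields a researcher would confirm before submitting. The payload shape below follows the public DataCite REST API (GET https://api.datacite.org/dois/{doi}); whether the Gateway itself uses this API is our assumption for illustration, not something stated above:

```python
# Sketch of the preview step: given a DataCite-style JSON record for a
# dataset DOI, extract the fields a researcher would confirm before
# submitting. Payload shape follows the public DataCite REST API; the
# Gateway's actual internals may differ.

def preview_fields(datacite_record: dict) -> dict:
    attrs = datacite_record["data"]["attributes"]
    return {
        "doi": attrs.get("doi"),
        "title": (attrs.get("titles") or [{}])[0].get("title"),
        "publisher": attrs.get("publisher"),
        "year": attrs.get("publicationYear"),
        "creators": [c.get("name") for c in attrs.get("creators", [])],
    }

# A made-up record in the DataCite response shape:
sample = {"data": {"attributes": {
    "doi": "10.5281/zenodo.0000000",
    "titles": [{"title": "Example dataset"}],
    "publisher": "Zenodo",
    "publicationYear": 2020,
    "creators": [{"name": "Doe, Jane"}],
}}}
print(preview_fields(sample)["title"])  # -> Example dataset
```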
In addition to delivering a service that reduces researchers’ workload, the Research Data Gateway increases the discoverability and visibility of externally deposited datasets together with their associated publications. In turn, this increases the likelihood that these research outputs will be found, re-used and cited. Moreover, since most funders and an increasing number of journals require the data that underlies papers to be shared, the Gateway helps researchers reap the maximum reward from this requirement.
The nuts and bolts
As you can see from above, this is a very straightforward process from the researcher’s perspective, but of course, behind the scenes there’s a little more going on.
As with most successful initiatives, making the Research Data Gateway happen was a truly collaborative effort involving a partnership across the Library’s Research Services (RS), Digital Technologies and Services (DTS) and Content, Collections and Discovery (CCD) teams, and the University’s Pure Support team. And this collaboration continues now in the ongoing management of the service. All Gateway-related processes have been documented and we’ve used a RACI matrix to agree which teams would be Responsible, Accountable, Consulted and Informed for any issues or enquiries that might arise.
Some technical challenges and work-arounds
As might be expected, we encountered a number of small but challenging issues along the way:
Datasets may be associated with tens or even hundreds of contributors, which can make these records time-consuming to validate; high energy physics datasets were a particular case in point. For efficiency, our solution is to record individual contributors from this University and then add the name of the collaboration group.
Multiple requests for a single dataset record are sometimes submitted to Pure especially if a record has multiple contributors. To resolve this, approvals by the CCD team include a check for duplicates, and the service informs relevant researchers before rationalising any duplicates to a single record.
A limitation of the Gateway is that it doesn’t accommodate datasets without a DOI. So further work is needed to accommodate repositories, such as GenBank, that assign other types of unique and persistent identifiers.
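The duplicate check described above boils down to comparing normalised DOIs across submissions. A minimal sketch, assuming each submission is a dict with a "doi" key (the real service's data model isn't described in the post):

```python
# Sketch of the duplicate check: multiple submissions for the same dataset
# collapse to one record by comparing normalised DOIs. The record structure
# (a dict with a "doi" key) is assumed for illustration.

def normalise_doi(doi: str) -> str:
    """Lower-case a DOI and strip common resolver prefixes."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi

def deduplicate(submissions: list[dict]) -> list[dict]:
    """Keep the first submission for each distinct normalised DOI."""
    seen, unique = set(), []
    for record in submissions:
        key = normalise_doi(record["doi"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

# Two submissions for the same dataset, written differently:
requests = [{"doi": "10.1234/abc"}, {"doi": "https://doi.org/10.1234/ABC"}]
print(len(deduplicate(requests)))  # -> 1
```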
Feedback on the Gateway has been consistently positive from researchers and research support staff; its purpose and simple effectiveness have been well-received and warmly appreciated.
However, getting researchers engaged takes time, persistence and the right angle from a communications perspective. It’s clear that researchers may not perceive a strong incentive to record datasets they’ve already shared elsewhere. Many are time poor and might reasonably question the benefit of also generating an institutional record. Effective promotion therefore remains key to generating interest and engagement with the new Gateway service.
We’re framing our promotional message around how researchers can efficiently raise the profile of their research outputs using a suite of services, including our Research Data Gateway, our Open Access Gateway, the Pure/ORCID integration and automated reporting to Researchfish on their behalf. This joined-up message explains how the Library will help them raise their profile in return for – literally – a few seconds of their time.
We’re also tracking and targeting researchers who manually create dataset records in Pure to flag how the Research Data Gateway can save them significant time and effort.
In addition, to further reinforce the benefits of creating an institutional record, we ran a complementary, follow-up project using Scholix to find and record externally deposited datasets without the need for any researcher input. Seeing these dataset records surface in their Researcher Profiles, together with links to related research outputs is a useful means of generating interest and incentivising engagement.
These two approaches have now combined to deliver more than 5,000 data catalogue records and growing, with significant interlinking with the wider scholarly record. As noted, both routes have their limitations and so we remain on the lookout for creative ways to progress this work further, fill any gaps and make data ever more findable.
Love is all around us this week it seems. Coinciding with Valentine’s Day, by chance or otherwise, this is also Love Data Week. So, we thought we’d share how we’ve been loving our data by making it more visible, shareable and re-usable!
This is an area of growing interest across the RDM community. If you, like us, are kept awake at night by questions such as “How do you identify your institution’s datasets in external repositories?” or “What’s the most efficient way to populate your CRIS with metadata for those datasets?”, then read on to learn how we’ve been meeting these sorts of challenges.
At the University of Manchester (UoM), the Library’s Research Data Management team has been using Scholix to find UoM researcher data records and make them available in the University’s data catalogue and Researcher Profiles, which are publicly available and serve as a showcase for the University’s research.
We saw here an opportunity not only to further increase the visibility of the University’s research outputs but also to encourage researchers to regard data more seriously as a research output. We also had in mind the FAIR Principles and were keen to support best practice by researchers in making their data more findable.
The headline result is the addition of more than 4,500 data records to the UoM CRIS (Pure), with reciprocal links between associated data and publication records also being created to enrich the University’s scholarly record.
So how did we go about this…
Following the launch in 2017 of the University’s Pure Datasets module, which underpins our institutional data catalogue (Research Explorer) and automatically populates Researcher Profiles, we created services to help researchers record their data in Pure with as little manual effort as possible. (To illustrate, see my companion blog post: Finding Data, Made Simple: Building a Research Data Gateway.) We’re delighted to see these services being well-received and used by our research community!
But what about historical data, we wondered?
We knew most researchers wouldn’t have the time or inclination to record details of all their previous data without a strong incentive and, in any case, we wanted to spare them this effort if at all possible. We decided to investigate just how daunting or not this task might be and made the happy discovery that the Scholix initiative had done lots of the work for us by creating a huge database linking scholarly literature with their associated datasets.
Working with a number of key internal and external partners, we used open APIs to automate or part-automate the process of getting from article metadata to tailored data records (see Figure 1).
Figure 1. Process summary: making research data visible
To generate and process the article metadata from Scopus we partnered with the Library’s Research Metrics, and Digital Technologies and Services teams. We submitted the article DOIs to Scholix via its open API which returned metadata (including DOIs) of the associated research data. Then using the DataCite open API we part-automated the creation of tailored data records that mirrored the Pure submission template (i.e. the records contained the relevant metadata in the same order). This saved our Content, Collections and Discovery team lots of time when manually inputting the details to Pure, before validating the records to make them visible in Research Explorer and Researcher Profiles.
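The article-to-dataset step of this pipeline can be sketched as a small function over a Scholix response. The field names below (Target objects with Type and Identifier lists) follow the published Scholix link metadata schema, but treat the exact shape as an assumption to verify against the live Scholexplorer API:

```python
# Sketch of one pipeline step: given Scholix-style link records returned
# for an article DOI, collect the DOIs of the linked datasets. Field names
# follow the Scholix link schema; verify against the live API before use.

def linked_dataset_dois(scholix_links: list[dict]) -> list[str]:
    """Return the sorted, de-duplicated DOIs of dataset targets."""
    dois = set()
    for link in scholix_links:
        target = link.get("Target", {})
        if target.get("Type", {}).get("Name", "").lower() != "dataset":
            continue  # skip links to publications or other output types
        for ident in target.get("Identifier", []):
            if ident.get("IDScheme", "").lower() == "doi":
                dois.add(ident["ID"])
    return sorted(dois)

# Made-up links: one dataset target, one publication target.
sample_links = [
    {"Target": {"Type": {"Name": "dataset"},
                "Identifier": [{"ID": "10.5281/zenodo.1234567",
                                "IDScheme": "doi"}]}},
    {"Target": {"Type": {"Name": "publication"},
                "Identifier": [{"ID": "10.1000/article",
                                "IDScheme": "doi"}]}},
]
print(linked_dataset_dois(sample_links))  # -> ['10.5281/zenodo.1234567']
```

Each returned dataset DOI would then be resolved against the DataCite API to build the tailored record mirroring the Pure submission template.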
Partnering with the University’s Directorate of Research and Business Engagement and Elsevier, we followed the same steps to process the records sourced from Pure. Elsevier was also able to prepare tailored data records for bulk upload directly into Pure which further streamlined the process.
Some challenges and lessons learned…
Manchester researchers like to share, especially if we can make it easy for them! Seeing the amount of data being shared across the institution is bringing us a lot of joy and a real sense of return on investment. That investment, in staff time, amounts to approximately 16 FTE weeks to upload, validate and link data in Pure, plus some additional time to plan and implement workflows. Cross-team working has been critical in bringing this project towards successful completion, with progress relying on the combined expertise of seven teams. In our view, the results more than justify this investment.
Of course, there are limitations to be addressed and technical challenges to navigate.
Initiatives, such as the COPDESS Enabling FAIR Data Project, that are bringing together relevant stakeholders (data communities, publishers, repositories and data ecosystem infrastructure) will help ensure that community-agreed metadata is properly recorded by publishers and repositories, so that it can feed into initiatives like Scholix and make our ‘downstream’ work ever more seamless. Widespread adoption of open identifiers would also make our work much easier and faster, in particular identifiers for researchers (ORCID) and research organisations (ROR). As ever, increased interoperability and automation of systems would be a significant step forward.
There are practical considerations as well. For instance, how do we treat data records with many researchers, which are more time-consuming to handle? How do we prepare researchers with lots of datasets for the addition of many records to their Researcher Profiles, when there is such variation in norms and preferences across disciplines and individuals? How should we handle data collections? What do we do about repositories such as ArrayExpress that use accession numbers rather than DOIs, given that Scholix can’t identify data from such sources? And since Scholix only finds data linked to a research article, how do we find data that are independent assets? If we are really serious about data being outputs in their own right then we need to develop a way of doing this.
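A first practical step towards handling non-DOI repositories is simply recognising what kind of identifier you have been given. The patterns below are illustrative rather than exhaustive (the DOI pattern follows the common "10.registrant/suffix" shape; the accession patterns are simplified):

```python
import re

# Sketch of a first-pass identifier classifier, distinguishing DOIs from
# two common accession-number styles. Patterns are simplified and
# illustrative, not an authoritative validator for any of these schemes.

DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")           # e.g. 10.5061/dryad.abc123
ARRAYEXPRESS_RE = re.compile(r"^E-[A-Z]{4}-\d+$")   # e.g. E-MTAB-1234
GENBANK_RE = re.compile(r"^[A-Z]{1,2}\d{5,8}(\.\d+)?$")  # e.g. AF231982

def classify_identifier(value: str) -> str:
    value = value.strip()
    if DOI_RE.match(value):
        return "doi"
    if ARRAYEXPRESS_RE.match(value):
        return "arrayexpress-accession"
    if GENBANK_RE.match(value):
        return "genbank-accession"
    return "unknown"

print(classify_identifier("10.5061/dryad.abc123"))  # -> doi
print(classify_identifier("E-MTAB-1234"))           # -> arrayexpress-accession
```

Records classified as accessions could then be routed to a manual workflow rather than the DOI-based pipeline.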
So, there’s lots more work to be done and plenty of challenges to keep us busy and interested.
In terms of the current phase of the project, processing is complete for data records associated with UoM papers from Scopus, with Pure records well underway. Researcher engagement is growing, with plenty of best practice in evidence. With REF 2021 in our sights, we’re also delighted to be making a clear contribution towards the research environment indicators for Open Data.
Update: We are openly sharing the code created for this project via GitHub so that others can also benefit from our approach.