Finding Data, Made Simple: Building a Research Data Gateway

Making data more findable is the bedrock of much of research data management and we aim to make this easy and simple for researchers to do in practice. Ever on the look out to do just this, we were delighted to spot an opportunity to take our University’s data catalogue to the next level.

The data catalogue comprises our CRIS (Pure) Datasets module, which allows researchers to capture details of their datasets, and the public facing portal (Research Explorer), which allows these datasets to be searched. When the data catalogue was originally set up it could be populated either by automated metadata feeds for datasets deposited in the data repository recommended by The University of Manchester, or by manually inputting metadata for datasets deposited in external data repositories. However, recognising that this manual input duplicates effort, is time consuming and requires some familiarity with Pure, we began to think about how we could make this process faster and easier for researchers.

Our solution? A Research Data Gateway.

Gateway to data heaven

The Research Data Gateway service allows researchers to input a dataset DOI to an online form, view the associated metadata to confirm its veracity, and then submit the dataset record to the Library, who populates Pure on the researcher’s behalf. Wherever possible our Content, Collections and Discovery (CCD) team enriches the record by linking with related research outputs, such as articles or conference proceedings, and this record displays in both Research Explorer and all relevant Researcher Profiles.

The screen capture below illustrates how the Research Data Gateway works in practice from the researcher’s perspective up to the point of submission, a process that usually takes about 15 seconds!

Figure 1: Animated screen capture of Research Data Gateway

ResearchDataGateway

In addition to delivering a service that reduces researchers’ workload, the Research Data Gateway increases the discoverability and visibility of externally deposited datasets together with their associated publications. In turn, this increases the likelihood that these research outputs will be found, re-used and cited. Moreover, since most funders and an increasing number of journals require the data that underlies papers to be shared, the Gateway helps researchers reap the maximum reward from this requirement.

The nuts and bolts

As you can see from above this is a very straight-forward process from the researcher’s perspective, but of course, behind the scenes there’s a little more going on.

The basic workflow looks like this:

BLG_Gateway_Workflow_V1a

BLG_Gateway_Workflow_V1b

Once validated, the new dataset record automatically displays in both Research Explorer and the relevant Researcher Profile(s).

As with most successful initiatives, making the Research Data Gateway happen was a truly collaborative effort involving a partnership across the Library’s Research Services (RS), Digital Technologies and Services (DTS) and Content, Collections and Discovery (CCD) teams, and the University’s Pure Support team. And this collaboration continues now in the ongoing management of the service. All Gateway-related processes have been documented and we’ve used a RACI matrix to agree which teams would be Responsible, Accountable, Consulted and Informed for any issues or enquiries that might arise.

Some technical challenges and work-arounds

As might be expected, we encountered a number of small but challenging issues along the way:

  • Datasets may be associated with tens or even hundreds of contributors which can make these records time-consuming to validate. This was a particular problem for high energy physics datasets for instance. For efficiency, our solution is to record individual contributors from this University, and then add the name of the collaboration group.
  • Datasets may include mathematical symbols that don’t display correctly when records are created in Pure. Our solution has been to use MathJax, an open-source JavaScript display engine, that renders well with many modern browsers.
  • Multiple requests for a single dataset record are sometimes submitted to Pure especially if a record has multiple contributors. To resolve this, approvals by the CCD team include a check for duplicates, and the service informs relevant researchers before rationalising any duplicates to a single record.
  • A limitation of the Gateway is that it doesn’t accommodate datasets without a DOI. So further work is needed to accommodate repositories, such as GenBank, that assign other types of unique and persistent identifiers.

Some reflections

Feedback on the Gateway has been consistently positive from researchers and research support staff; its purpose and simple effectiveness have been well-received and warmly appreciated.

However, getting researchers engaged takes time, persistence and the right angle from a communications perspective. It’s clear that researchers may not perceive a strong incentive to record datasets they’ve already shared elsewhere. Many are time poor and might reasonably question the benefit of also generating an institutional record. Therefore effective promotion continues to be key in terms of generating interest and engagement with the new Gateway service.

We’re framing our promotional message around how researchers can efficiently raise the profile of their research outputs using a suite of services including our Research Data Gateway, our Open Access Gateway, the Pure/ORCID integration, and benefit from automated reporting on their behalf to Researchfish. This promotes a joined up message explaining how the Library will help them raise their profile in return for – literally – a few seconds of their time.

We’re also tracking and targeting researchers who manually create dataset records in Pure to flag how the Research Data Gateway can save them significant time and effort.

In addition, to further reinforce the benefits of creating an institutional record, we ran a complementary, follow-up project using Scholix to find and record externally deposited datasets without the need for any researcher input. Seeing these dataset records surface in their Researcher Profiles, together with links to related research outputs is a useful means of generating interest and incentivising engagement.

To learn how we did this see my companion blog post: From Couch to Almost 5K: Raising Research Data Visibility at The University of Manchester .

These two approaches have now combined to deliver more than 5,000 data catalogue records and growing, with significant interlinking with the wider scholarly record. As noted, both routes have their limitations and so we remain on the lookout for creative ways to progress this work further, fill any gaps and make data ever more findable.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s