Monthly Archives: February 2019

From Couch to Almost 5K: Raising Research Data Visibility at The University of Manchester

Love is all around us this week it seems. Coinciding with Valentine’s Day, by chance or otherwise, this is also Love Data Week. So, we thought we’d share how we’ve been loving our data by making it more visible, shareable and re-usable!

This is an area of growing interest across the RDM community. If you, like us, are kept awake at night by questions such as how to identify your institution’s datasets in external repositories, or what the most efficient way is to populate your CRIS with metadata for those datasets, then read on to learn how we’ve been meeting these sorts of challenges.

At the University of Manchester (UoM), the Library’s Research Data Management team has been using Scholix to find UoM researcher data records and make them available in the University’s data catalogue and Researcher Profiles, which are publicly available and serve as a showcase for the University’s research.

We saw here an opportunity not only to increase further the visibility of the University’s research outputs but also to encourage researchers to regard data more seriously as a research output. We also had in mind the FAIR Principles and were keen to support best practice by researchers in making their data more findable.

The headline result is the addition of more than 4,500 data records to the UoM CRIS (Pure), with reciprocal links between associated data and publication records also being created to enrich the University’s scholarly record.

So how did we go about this…

Following the launch in 2017 of the University’s Pure Datasets module, which underpins our institutional data catalogue (Research Explorer) and automatically populates Researcher Profiles, we created services to help researchers record their data in Pure with as little manual effort as possible. We’re delighted to see these services being well-received and used by our research community!

But what about historical data, we wondered?

We knew most researchers wouldn’t have the time or inclination to record details of all their previous data without a strong incentive and, in any case, we wanted to spare them this effort if at all possible. We decided to investigate just how daunting or not this task might be and made the happy discovery that the Scholix initiative had done lots of the work for us by creating a huge database linking scholarly literature with its associated datasets.

Working with a number of key internal and external partners, we used open APIs to fully or partly automate the process of getting from article metadata to tailored data records (see Figure 1).

Figure 1. Process summary: making research data visible

To generate and process the article metadata from Scopus we partnered with the Library’s Research Metrics, and Digital Technologies and Services teams. We submitted the article DOIs to Scholix via its open API which returned metadata (including DOIs) of the associated research data. Then using the DataCite open API we part-automated the creation of tailored data records that mirrored the Pure submission template (i.e. the records contained the relevant metadata in the same order). This saved our Content, Collections and Discovery team lots of time when manually inputting the details to Pure, before validating the records to make them visible in Research Explorer and Researcher Profiles.

Partnering with the University’s Directorate of Research and Business Engagement and Elsevier, we followed the same steps to process the records sourced from Pure. Elsevier was also able to prepare tailored data records for bulk upload directly into Pure which further streamlined the process.
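The steps described above (article DOI, to linked dataset DOIs via Scholix, to a tailored record via DataCite) can be illustrated with a minimal Python sketch. This is not the code used in the project. The endpoint URLs, the `sourcePid` parameter, the shape of the Scholix link records, the template field order and the sample DOIs are all assumptions made for illustration, so check them against the current Scholix and DataCite documentation before relying on them. The sketch works on already-fetched JSON and makes no network calls.

```python
from urllib.parse import urlencode

# Assumed endpoints (verify against the current API documentation).
SCHOLIX_LINKS_URL = "https://api.scholexplorer.openaire.eu/v2/Links"
DATACITE_DOIS_URL = "https://api.datacite.org/dois"

# Hypothetical field order standing in for the Pure submission template.
TEMPLATE_FIELDS = ["title", "creators", "publisher", "year", "doi"]


def scholix_query_url(article_doi):
    """Build a Scholix query URL for links whose source is this article DOI."""
    return SCHOLIX_LINKS_URL + "?" + urlencode({"sourcePid": article_doi})


def datacite_url(dataset_doi):
    """Build the DataCite URL that returns metadata for one dataset DOI."""
    return f"{DATACITE_DOIS_URL}/{dataset_doi}"


def dataset_dois(scholix_links):
    """Collect unique dataset DOIs from Scholix link records (assumed shape)."""
    found = []
    for link in scholix_links:
        target = link.get("target", {})
        if target.get("Type", "").lower() != "dataset":
            continue  # skip links that point at other publications
        for ident in target.get("Identifier", []):
            if ident.get("IDScheme", "").lower() == "doi":
                doi = ident["ID"].lower()
                if doi not in found:
                    found.append(doi)
    return found


def tailored_record(datacite_json):
    """Reorder DataCite attributes to match the template field order."""
    attrs = datacite_json["data"]["attributes"]
    titles = attrs.get("titles") or []
    values = {
        "title": titles[0]["title"] if titles else "",
        "creators": "; ".join(c["name"] for c in attrs.get("creators", [])),
        "publisher": attrs.get("publisher", ""),
        "year": attrs.get("publicationYear", ""),
        "doi": attrs.get("doi", ""),
    }
    return [(field, values[field]) for field in TEMPLATE_FIELDS]


# Made-up sample responses (the 10.1234 DOIs are placeholders, not real records).
sample_links = [
    {"target": {"Type": "dataset",
                "Identifier": [{"ID": "10.1234/DATA.1", "IDScheme": "doi"}]}},
    {"target": {"Type": "publication",
                "Identifier": [{"ID": "10.1234/paper.2", "IDScheme": "doi"}]}},
]
sample_datacite = {"data": {"attributes": {
    "doi": "10.1234/data.1",
    "titles": [{"title": "Example survey data"}],
    "creators": [{"name": "Smith, A."}, {"name": "Jones, B."}],
    "publisher": "Example Repository",
    "publicationYear": 2018,
}}}

for doi in dataset_dois(sample_links):
    print(datacite_url(doi))
for field, value in tailored_record(sample_datacite):
    print(f"{field}: {value}")
```

Keeping the template order in one list means that a change to the submission form only needs a change in one place, and the output can be pasted or bulk-loaded field by field.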

Some challenges and lessons learned…

Manchester researchers like to share, especially if we can make it easy for them! Seeing the amount of data being shared across the institution is bringing us a lot of joy and a real sense of return on investment. The investment in staff time amounts to approximately 16 FTE weeks to upload, validate and link data in Pure, plus some additional time to plan and implement workflows. Cross-team working has been critical in bringing this project towards successful completion, with progress relying on the combined expertise of seven teams. In our view, the results more than justify this investment.

Of course, there are limitations to be addressed and technical challenges to navigate.

Initiatives such as the COPDESS Enabling FAIR Data Project, which are bringing together relevant stakeholders (data communities, publishers, repositories and data ecosystem infrastructure), will help ensure that community-agreed metadata is properly recorded by publishers and repositories, so that it can feed into initiatives like Scholix and make our ‘downstream’ work ever more seamless. Widespread adoption of open identifiers would also make our work much easier and faster, in particular identifiers for researchers (ORCID) and research organisations (ROR). As ever, increased interoperability and automation of systems would be a significant step forward.

There are practical considerations as well. For instance, how do we treat data records with many researchers, which are more time-consuming to handle? How do we prepare researchers with lots of datasets for the addition of many records to their Researcher Profiles when there is such variation in norms and preferences across disciplines and individuals? How should we handle data collections? What do we do about repositories such as ArrayExpress that use accession numbers rather than DOIs, given that Scholix can’t identify data from such sources? And since Scholix only finds data which are linked to a research article, how do we find data which are independent assets? If we are really serious about data being an output in their own right then we need to develop a way of doing this.

So, there’s lots more work to be done and plenty of challenges to keep us busy and interested.

In terms of the current phase of the project, processing is complete for data records associated with UoM papers from Scopus, with Pure records well underway. Researcher engagement is growing, with plenty of best practice in evidence. With REF 2021 in our sights, we’re also delighted to be making a clear contribution towards the research environment indicators for Open Data.

Working with the UK Data Service to support researchers with managing and sharing research data from human participants

Following the introduction of GDPR last May, the Research Services team has been getting more and more enquiries about how to handle sensitive data, so we invited Dr Scott Summers from the UK Data Service (UKDS) to visit us and deliver a one-day workshop on ‘Managing and sharing research data from human participants’. My colleague, Chris Gibson, worked with Scott to develop and arrange the session. It was a thoroughly engaging and informative day, with lots of opportunity for discussion.

The workshop attracted a group of 30 who came along to learn more about best practice for managing personal data. We invited colleagues from across all faculties and ensured that there was a mix of established and early career researchers, postgraduate researchers and professional services staff who support research data management. As well as getting advice to help with data management, the aim was to gather feedback from attendees to help us to shape sessions that can be delivered as part of the Library’s My Research Essentials programme by staff from across the University including Research Services, Information Governance and Research IT.

As a fairly new addition to the Research Services team, I was keen to attend this workshop. The management of research data from human participants is a complex issue so any opportunity to work with the experts in this field is very valuable. My job involves working with data management plans for projects which often include personal data so gaining a deeper understanding of the issues involved will help me to provide more detailed advice and guidance.

The workshop began with looking at the ethical and legal context around gathering data. This is something that has been brought sharply into focus with the introduction of GDPR. We use ‘public task’ as our lawful basis for processing data but it was interesting to hear that ‘consent’ may be more prevalent as the preferred grounds in some EU countries. Using public task as a basis provides our participants with reassurance that the research is being undertaken in the public interest and means researchers are not bound by the requirement to refresh consent.

The session on informed consent led to lively discussion about how to be clear and specific about how and what data will be used when research may change throughout a project. One solution for longitudinal studies may be process consent – including multiple points of consent in the study design to reflect potential changing attitudes of participants. Staged consent is an option for those wanting to share data but give participants options. The main point that arose from this session is that we should aim to give participants as much control over their data as possible without making the research project so complicated as to be unworkable.

The final session generated debate around whether we can ever truly anonymise personal data. We worked through exercises in anonymising data. It quickly became apparent that when dealing with information relating to people, there are many aspects that could be identifying and in combination even seemingly generic descriptors can quickly narrow down to a small subset of participants. For example, ‘Research Officer’ is a term that could apply to a large group of people but mention this in relation to ‘University of Manchester Library’ and it quickly reduces to a subset of 3 people! The general consensus was that referring to data as ‘de-identified’ or ‘de-personalised’ would be more accurate but that these descriptions may not be as reassuring to the participants so it is imperative that consent forms are clear and unambiguous about how data will be used.
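The narrowing effect described above can be illustrated with a small Python sketch. The staff list, the field names and the helper functions `matching` and `smallest_group` are all invented for illustration and are not drawn from the workshop exercises. The sketch counts how many records match a set of descriptors, then reports the size of the smallest group that shares the same combination of values, which is the idea behind k-anonymity checks.

```python
from collections import Counter

# Invented staff list: role and unit are the only descriptors recorded.
records = (
    [{"role": "Research Officer", "unit": "Library"}] * 3
    + [{"role": "Research Officer", "unit": "Engineering"}] * 4
    + [{"role": "Lecturer", "unit": "Engineering"}] * 5
    + [{"role": "Librarian", "unit": "Library"}] * 6
)


def matching(rows, **descriptors):
    """Return the rows whose values equal every given descriptor."""
    return [r for r in rows
            if all(r[key] == value for key, value in descriptors.items())]


def smallest_group(rows, fields):
    """Size of the smallest group of rows sharing the same values for fields."""
    groups = Counter(tuple(r[f] for f in fields) for r in rows)
    return min(groups.values())


# One descriptor alone still covers a fairly large group...
print(len(matching(records, role="Research Officer")))                  # 7
# ...but adding a second descriptor narrows it sharply.
print(len(matching(records, role="Research Officer", unit="Library")))  # 3

# Smallest group anyone could hide in, per choice of published fields.
print(smallest_group(records, ["role"]))          # 5
print(smallest_group(records, ["role", "unit"]))  # 3
```

Running a check like this before sharing shows which combinations of fields leave very small groups, and those are the ones to generalise or withhold.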

At the end of the session it was great to hear lots of positive feedback from researchers across many disciplines that the workshop took what could be quite a dry topic and made it engaging with numerous opportunities for discussion.

Our second workshop with Scott Summers is due to take place on 26th February and we are looking forward to gaining more feedback and insights into how we can enhance the support we deliver to researchers who are managing research data from human participants – so, watch this space!

Plan S feedback

On Friday I submitted the University of Manchester’s feedback on Plan S. We’d invited feedback from across campus so our response reflects views from a wide range of academic disciplines as well as those from the Library.

Our response could be summed up informally as ‘Yes, but…’, i.e. we agree with the overall aim but, as always, the devil’s in the detail.

Our Humanities colleagues expressed a number of reservations but noted “we are strongly in favour of Open Access publishing” and “we very much welcome the pressure, from universities and funders, on publishers to effect more immediate and less costly access to our research findings”.

The response from the Faculty of Biology, Medicine and Health also flagged concerns but stated “if Plan S is watered down, the pressure exerted on journal publishers may not be acute enough to force a profound shift in business model”.

A number of the concerns raised assumed that Plan S would launch with the status quo unchanged. Updates from the Library have tried to reassure our academic colleagues that there’s work going on ‘behind the scenes’ which makes this unlikely and remind them that UK funder OA policies may not be exact replicas of Plan S.

We’ve been here before in the sense that when the UK Research Councils announced that a new OA policy would be adopted from April 2013, publishers amended their OA offer to accommodate the new policy requirements. Not every publisher of Manchester outputs did, but things did shift. For large publishers this happened fairly quickly, but for smaller publishers this took a bit longer, and in some cases required nudging by their academic authors.

It’s worth reflecting on how that policy played out as we consider Plan S: put simply, it cost a lot of money and most publishers didn’t provide options that fully met the Green OA requirements.

The key points in our response are concerns about affordability, Green OA requirements and the current ‘one size fits all’ approach. You can read it here: UoM_Plan-S_feedback.