
Using open citation data to identify new research opportunities

This year’s Annual Forum for the LIS-BIBLIOMETRICS mailing list took place at the British Library Knowledge Centre on 29 January 2019, and focused on the topic of ‘Open Metrics and Measuring Openness’.

As well as two keynotes, three parallel workshop sessions and a panel discussion featuring four participants, a session featuring five-minute ‘lightning talks’ gave nine of us a chance to give presentations on work relevant to the Forum topic.

Image 1
Stephen Pearson giving presentation at LIS-BIBLIOMETRICS Annual Forum 2019

Unlike most of the other offerings at the Forum, my talk wasn’t about the use of metrics in evaluations or about how to measure openness. Instead, I talked about the way in which large open sets of citation data can give interesting information about patterns of citation. I reported on some initial exploratory work we’ve done to see whether this information can help identify new research opportunities.

Where does inspiration for research come from?

Image 2
Isaac Newton

Inspiration for research can come from many different sources. For example, going back about 350 years, the act of noticing that an apple always falls perpendicularly to the ground could lead you to muse whether the earth had some power of attraction which caused this, and ultimately develop the law of universal gravitation.

Of course it’s still possible to come up with new ideas and discoveries on the basis of such ‘Eureka’ moments. But an increasing focus on interdisciplinarity has led to a situation of which it’s been said that ‘revolutionary scientific discoveries … are often the result of connecting ideas that have their origin in different disciplines.’

That quotation is from an article entitled ‘Interdisciplinary Research Boosted by Serendipity’. But do we have to rely on serendipity to discover that ideas from one discipline can be profitably applied in another field in a novel fashion? Or is there a way of systematically identifying such potential links?

I’d suggest that the technique of bibliographic coupling can help.

Bibliographic coupling

Image 3
Diagram illustrating bibliographic coupling

If two documents have references in common, as Documents A and B do, then those documents are ‘bibliographically coupled’, and the more references they share, the stronger the coupling. There’s then at least a possibility that the two Citing Documents are using similar approaches to the research questions they’re respectively addressing.

In many cases the two Citing Documents will be by researchers who are addressing the same research question, or very closely related questions, and so the sharing of references has no deeper significance. A potentially more significant scenario occurs when the two Citing Documents are by researchers working in somewhat different fields. In that situation, the bibliographic coupling is a pointer to at least the possibility of a previously unidentified cross-disciplinary research connection.
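As a minimal sketch of the idea, the coupling strength between two documents can be taken as the number of references they share. The function name and the DOIs below are made up for illustration and are not drawn from our data.

```python
def coupling_strength(refs_a, refs_b):
    """Return the number of cited references that two documents share."""
    return len(set(refs_a) & set(refs_b))


# Hypothetical reference lists (made-up DOIs, for illustration only).
doc_a_refs = ["10.1000/ref1", "10.1000/ref2", "10.1000/ref3"]
doc_b_refs = ["10.1000/ref2", "10.1000/ref3", "10.1000/ref4"]

strength = coupling_strength(doc_a_refs, doc_b_refs)
print(strength)  # 2: the two documents share ref2 and ref3
```

A strength of zero means the documents are not coupled at all; any threshold above that is a choice made by whoever is doing the analysis.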

COCI, the OpenCitations Index of Crossref open DOI-to-DOI citations

So, what citation database could we use to try to identify such connections? The best-known ones are the commercial products Web of Science and Scopus, but there are at least two barriers to using them for this type of work. The first is the need to pay for a subscription to use them at all. The second is the limit on the number of records one can download, which makes it difficult to amass a dataset big enough to allow the required ‘mining’ for links.

We used COCI, a dataset created by the OpenCitations organisation. This was originally known as the Crossref Open Citations Index but it’s now the OpenCitations Index of Crossref open DOI-to-DOI citations.

Updated at least every six months, this dataset comprises all the DOI-to-DOI citations specified by open references in Crossref, which currently amounts to almost 450 million DOI-to-DOI citation links based on over 45 million bibliographic resources.

Looked at in terms of proportional coverage and from our own institution’s point of view, it includes c.8,000 University of Manchester papers published since 2014 (around one-third of all Manchester papers from that period), together with their cited references.

This figure gives an example of the information which COCI provides, for each of the more than 45 million bibliographic resources whose DOIs it contains.

Image 4
Diagram illustrating information in COCI

The data is available in several formats. We downloaded it as a CSV ‘dump’, and then filtered it to extract only those records where the DOI of the citing paper matched that of a paper in our own institutional publication records. This table gives some examples of the information which we then combined with the COCI information.

Image 4B
Table illustrating Manchester metadata for enhancing information from COCI

This meant that we were able to create a rich dataset which comprised c.8,000 records for University of Manchester papers, each in the following format.

Image 5
Diagram illustrating information in COCI enhanced with Manchester metadata
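The filtering and enrichment steps described above could be sketched roughly as follows. This is an illustration rather than the program we actually ran: the `citing` and `cited` column names are assumed for the CSV dump, the `faculty` and `school` metadata fields are hypothetical, and all the DOIs are invented.

```python
import csv
import io


def filter_and_enrich(citation_rows, institutional_metadata):
    """Keep citations whose citing DOI is in our own records.

    Returns one record per institutional paper, holding its metadata
    plus the set of DOIs it cites.
    """
    papers = {}
    for row in citation_rows:
        citing = row["citing"].strip().lower()
        if citing not in institutional_metadata:
            continue  # citing paper is not one of ours
        record = papers.setdefault(
            citing,
            {**institutional_metadata[citing], "doi": citing, "references": set()},
        )
        record["references"].add(row["cited"].strip().lower())
    return papers


# Tiny stand-in for the CSV dump (assumed column names, invented DOIs).
sample_dump = io.StringIO(
    "citing,cited\n"
    "10.1000/man1,10.1000/ref1\n"
    "10.1000/man1,10.1000/ref2\n"
    "10.1000/other,10.1000/ref1\n"
    "10.1000/man2,10.1000/ref2\n"
)

# Hypothetical institutional metadata, keyed by lower-cased DOI.
metadata = {
    "10.1000/man1": {"faculty": "Faculty A", "school": "School X"},
    "10.1000/man2": {"faculty": "Faculty B", "school": "School Y"},
}

papers = filter_and_enrich(csv.DictReader(sample_dump), metadata)
```

DOIs are lower-cased on both sides before matching because they are case-insensitive, and reading the dump row by row avoids loading the whole file into memory.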

Pointers to new research collaboration possibilities?

A colleague from the Library’s Digital Technologies and Services team wrote a program which pulled out all pairs of bibliographically coupled papers which had at least two references in common but where the authors came from different Faculties. We then used the free VOSviewer software to produce the following visualisation of bibliographic coupling links between publications by Schools/Divisions in different Faculties. (The closer two nodes are, the stronger the bibliographic coupling between them.)

Image 6
Network visualisation of bibliographical coupling links between publications
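For readers who want to try the pair-extraction step themselves, here is a rough sketch of one way to do it. It is not our colleague’s actual program: the record layout (`faculty`, `school`, `references`) and the sample data are hypothetical, and counting coupled pairs per pair of Schools is just one plausible way of producing link strengths for a network tool.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cross_faculty_coupled_pairs(papers, min_shared=2):
    """Return {(paper_a, paper_b): shared_count} for coupled papers
    from different faculties with at least `min_shared` shared references."""
    # Inverted index: cited reference -> papers that cite it.
    citing_by_ref = defaultdict(set)
    for paper_id, record in papers.items():
        for ref in record["references"]:
            citing_by_ref[ref].add(paper_id)

    # Each reference adds 1 to every pair of papers that both cite it.
    shared = Counter()
    for citing in citing_by_ref.values():
        for a, b in combinations(sorted(citing), 2):
            shared[(a, b)] += 1

    return {
        pair: count
        for pair, count in shared.items()
        if count >= min_shared
        and papers[pair[0]]["faculty"] != papers[pair[1]]["faculty"]
    }


def school_link_strengths(papers, pairs):
    """Count coupled paper pairs for each pair of schools (an assumed aggregation)."""
    links = Counter()
    for a, b in pairs:
        key = tuple(sorted((papers[a]["school"], papers[b]["school"])))
        links[key] += 1
    return links


# Hypothetical records, for illustration only.
sample_papers = {
    "p1": {"faculty": "Faculty A", "school": "School X", "references": {"r1", "r2", "r3"}},
    "p2": {"faculty": "Faculty B", "school": "School Y", "references": {"r2", "r3", "r4"}},
    "p3": {"faculty": "Faculty A", "school": "School Z", "references": {"r1", "r2", "r3"}},
    "p4": {"faculty": "Faculty B", "school": "School Y", "references": {"r4"}},
}

pairs = cross_faculty_coupled_pairs(sample_papers)
links = school_link_strengths(sample_papers, pairs)
```

Building the inverted index first means only papers that actually share a reference are ever compared, which matters once there are thousands of papers. In the sample data, p1 and p3 share three references but are excluded because they belong to the same faculty.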

What does the visualisation show? Here are two examples.

  • The two closely juxtaposed purple nodes at the bottom of the visualisation show that papers from the Division of Information, Imaging and Data Sciences (in the Faculty of Biology, Medicine and Health) and the School of Electrical and Electronic Engineering (in the Faculty of Science and Engineering) have references in common.
  • The two closely juxtaposed green nodes at the left of the visualisation show that papers from the School of Mechanical, Aerospace and Civil Engineering (in the Faculty of Science and Engineering) and the School of Environment, Education and Development (in the Faculty of Humanities) have references in common.

Do they highlight previously unsuspected opportunities for innovative new cross-disciplinary research? Unfortunately, no.

  • The juxtaposed purple nodes simply reflected the fact that closely related algorithmic approaches to medical diagnosis and to computer vision are used in both the Division of Information, Imaging and Data Sciences and the School of Electrical and Electronic Engineering. Although this is interesting, it’s not the kind of unsuspected connection we’d hoped to uncover.
  • Similarly, the juxtaposed green nodes show that approaches to the optimisation of land, water and energy use are an area of interest both to researchers in Civil Engineering and to those in Environment, Education and Development. Again, this isn’t an unsuspected connection which the bibliographic coupling has surprisingly brought to light.

Image 7
Nobel Prize Medal (Nobel-Prize CC-BY Abhijit Bhaduri via Flickr)

We’re not expecting Manchester’s next Nobel Prize winners to use their acceptance speeches to thank us for bibliometric work which first alerted them to the possibility of a ground-breaking collaboration any time soon. However, the way in which this work highlighted related research being carried out in different Faculties (however unsurprising the specific examples) serves as an encouraging proof of concept.