All posts by Stephen Pearson

About Stephen Pearson

I qualified as a librarian in 1990, worked as Research Assistant at the Library and Information Statistics Unit (LISU) at Loughborough University for a year, and since 1991 have worked at The University of Manchester Library in various roles. A subject librarian in the Arts Team for much of this time, I took up the role of Research Information Analyst in the summer of 2012. As part of a change in focus for the Library's support of research and learning, from subject-oriented to service-oriented, I have initiated and developed the Library's provision of citation services to the University.

Using open citation data to identify new research opportunities

This year’s Annual Forum for the LIS-BIBLIOMETRICS mailing list took place at the British Library Knowledge Centre on 29 January 2019, and focused on the topic of ‘Open Metrics and Measuring Openness’.

As well as two keynotes, three parallel workshop sessions and a four-person panel discussion, a session of five-minute ‘lightning talks’ gave nine of us the chance to present work relevant to the Forum topic.

Image 1
Stephen Pearson giving presentation at LIS-BIBLIOMETRICS Annual Forum 2019

Unlike most of the other offerings at the Forum, my talk wasn’t about the use of metrics in evaluations or about how to measure openness. Instead, I talked about the way in which large open sets of citation data can give interesting information about patterns of citation. I reported on some initial exploratory work we’ve done to see whether this information can help identify new research opportunities.

Where does inspiration for research come from?

Image 2
Isaac Newton

Inspiration for research can come from many different sources. For example, going back about 350 years, the act of noticing that an apple always falls perpendicularly to the ground could lead you to wonder whether the earth had some power of attraction which caused this, and ultimately to develop the law of universal gravitation.

Of course it’s still possible to come up with new ideas and discoveries on the basis of such ‘Eureka’ moments. But an increasing focus on interdisciplinarity has led to a situation of which it’s been said that ‘revolutionary scientific discoveries … are often the result of connecting ideas that have their origin in different disciplines.’

That quotation is from an article entitled ‘Interdisciplinary Research Boosted by Serendipity’. But do we have to rely on serendipity to discover that ideas from one discipline can be profitably applied in another field in a novel fashion? Or is there a way of systematically identifying such potential links?

I’d suggest that the technique of bibliographic coupling can help.

Bibliographic coupling

Image 3
Diagram illustrating bibliographic coupling

If two documents share several references, as Documents A and B do in the diagram, then those documents are ‘bibliographically coupled’. And there’s at least a possibility that the two Citing Documents are using similar approaches to the research questions they’re respectively addressing.

In many cases the two Citing Documents will be by researchers who are addressing the same research question, or very closely related questions, and so the sharing of references has no deeper significance. A potentially more significant scenario occurs when the two Citing Documents are by researchers working in somewhat different fields. In that situation, the bibliographic coupling is a pointer to at least the possibility of a previously unidentified cross-disciplinary research connection.
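In code, the coupling strength of two documents is simply the size of the intersection of their reference lists. A minimal sketch (the DOIs here are invented placeholders):

```python
# Bibliographic coupling strength: the number of references two documents share.
def coupling_strength(refs_a, refs_b):
    """Return the number of cited works common to both reference lists."""
    return len(set(refs_a) & set(refs_b))

# Invented placeholder DOIs for illustration.
doc_a = ["10.1/x", "10.1/y", "10.1/z"]
doc_b = ["10.1/y", "10.1/z", "10.1/w"]

print(coupling_strength(doc_a, doc_b))  # → 2: Documents A and B are coupled
```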

COCI, the OpenCitations Index of Crossref open DOI-to-DOI citations

So, what citation database could we use to try to identify such connections? The best-known ones are the commercial products Web of Science and Scopus, but there are at least two barriers to using them for this type of work. The first is the need to pay a subscription to use them at all. The second is the limit on the number of records one can download, which makes it difficult to amass a dataset big enough to allow the required ‘mining’ for links.

We used COCI, a dataset created by the OpenCitations organisation. This was originally known as the Crossref Open Citations Index but it’s now the OpenCitations Index of Crossref open DOI-to-DOI citations.

Updated at least every six months, this dataset comprises all the DOI-to-DOI citations specified by open references in Crossref, which currently amounts to almost 450 million DOI-to-DOI citation links based on over 45 million bibliographic resources.
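As well as bulk downloads, OpenCitations exposes COCI through a REST API. The sketch below builds the endpoint URL for listing a paper’s outgoing references; the v1 endpoint path was correct at the time of writing, but check the OpenCitations documentation before relying on it:

```python
import json
import urllib.request

# Base URL of the COCI REST API (v1 at the time of writing; see
# opencitations.net for the current version and full documentation).
COCI_API = "https://opencitations.net/index/coci/api/v1"

def references_url(doi):
    """Endpoint listing the open DOI-to-DOI references of one paper."""
    return f"{COCI_API}/references/{doi}"

def fetch_references(doi):
    """Fetch a paper's outgoing citation links as a list of dicts;
    each dict includes the 'citing' and 'cited' DOIs (network call)."""
    with urllib.request.urlopen(references_url(doi)) as resp:
        return json.load(resp)

# Example usage (network call, so not run here):
# links = fetch_references("10.1186/1756-8722-6-59")
```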

From our own institution’s point of view, it covers c.8,000 University of Manchester papers published since 2014 (around one-third of the total), together with their citation references.

This figure gives an example of the information which COCI provides, for each of the more than 45 million bibliographic resources whose DOIs it contains.

Image 4
Diagram illustrating information in COCI

The data is available in several formats. We downloaded it as a CSV ‘dump’, and then filtered it to extract only those records where the DOI of the citing paper matched that of a paper in our own institutional publication records. This table gives some examples of the information which we then combined with the COCI information.

Image 4B
Table illustrating Manchester metadata for enhancing information from COCI

This meant that we were able to create a rich dataset which comprised c.8,000 records for University of Manchester papers, each in the following format.

Image 5
Diagram illustrating information in COCI enhanced with Manchester metadata
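The filter-and-enhance steps above can be sketched in a few lines of Python. The column names follow the layout of the COCI CSV dump; the DOIs and the institutional metadata are invented placeholders:

```python
import csv
import io

# A tiny stand-in for the COCI CSV dump (real dumps use columns such as
# oci, citing, cited, creation, timespan; all DOIs here are invented).
coci_csv = """oci,citing,cited,creation,timespan
0200-0100,10.1/manc1,10.1/ref1,2016-03,P2Y
0200-0101,10.1/manc1,10.1/ref2,2016-03,P3Y
0200-0102,10.1/other,10.1/ref1,2017-07,P1Y
"""

# Stand-in for our institutional publication records, keyed by DOI
# (the metadata fields are invented placeholders).
manchester = {
    "10.1/manc1": {"title": "An example paper", "faculty": "Science and Engineering"},
}

# 1. Keep only citation links whose *citing* DOI is one of ours.
# 2. Attach our own metadata to each retained link.
enhanced = []
for row in csv.DictReader(io.StringIO(coci_csv)):
    meta = manchester.get(row["citing"])
    if meta is not None:
        enhanced.append({**row, **meta})

for rec in enhanced:
    print(rec["citing"], "cites", rec["cited"], "|", rec["faculty"])
```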

Pointers to new research collaboration possibilities?

A colleague from the Library’s Digital Technologies and Services team wrote a program which pulled out all pairs of bibliographically coupled papers that had at least two references in common but whose authors came from different Faculties. We then used the free VOSviewer software to produce the following visualisation of bibliographic coupling links between publications by Schools/Divisions in different Faculties. (The closer two nodes are, the stronger the bibliographic coupling between them.)
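The extraction step can be sketched like this (a simplified reconstruction, not the actual program; the papers, Faculties and DOIs are invented):

```python
from itertools import combinations

# Each enhanced record: the paper's DOI, its authors' Faculty, and the set
# of DOIs it cites (all values here are invented placeholders).
papers = {
    "10.1/a": {"faculty": "Science and Engineering",
               "refs": {"10.1/r1", "10.1/r2", "10.1/r3"}},
    "10.1/b": {"faculty": "Biology, Medicine and Health",
               "refs": {"10.1/r1", "10.1/r2"}},
    "10.1/c": {"faculty": "Science and Engineering",
               "refs": {"10.1/r1", "10.1/r2"}},
}

# All pairs with at least two shared references AND authors in different Faculties.
MIN_SHARED = 2
coupled_pairs = [
    (a, b, len(papers[a]["refs"] & papers[b]["refs"]))
    for a, b in combinations(papers, 2)
    if papers[a]["faculty"] != papers[b]["faculty"]
    and len(papers[a]["refs"] & papers[b]["refs"]) >= MIN_SHARED
]

for citing_a, citing_b, strength in coupled_pairs:
    print(citing_a, citing_b, "coupling strength:", strength)
```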

Image 6
Network visualisation of bibliographical coupling links between publications

What does this show? Here are two examples.

  • The two closely juxtaposed purple nodes at the bottom of the visualisation show that papers from the Division of Information, Imaging and Data Sciences (in the Faculty of Biology, Medicine and Health) and the School of Electrical and Electronic Engineering (in the Faculty of Science and Engineering) shared references in common.
  • The two closely juxtaposed green nodes at the left of the visualisation show that papers from the School of Mechanical, Aerospace and Civil Engineering (in the Faculty of Science and Engineering) and the School of Environment, Education and Development (in the Faculty of Humanities) shared references in common.

Do they highlight previously unsuspected opportunities for innovative new cross-disciplinary research? Unfortunately, no.

  • The juxtaposed purple nodes simply reflected the fact that closely related algorithmic approaches to medical diagnosis and to computer vision are used in both the Division of Information, Imaging and Data Sciences and the School of Electrical and Electronic Engineering. Although this is interesting, it’s not the kind of unsuspected connection we’d hoped to uncover.
  • Similarly, the juxtaposed green nodes show that approaches to the optimisation of land, water and energy use are an area of interest both to researchers in Civil Engineering and to those in Environment, Education and Development. Again, this isn’t an unsuspected connection which the bibliographic coupling has surprisingly brought to light.
Image 7
Nobel Prize Medal (Nobel-Prize CC-BY Abhijit Bhaduri via Flickr)

We’re not expecting to hear Manchester’s next Nobel Prize winners thank us, in their acceptance speeches, for bibliometric work that first alerted them to the possibility of a ground-breaking collaboration. However, the way in which this work highlighted related research being carried out in different Faculties (however unsurprising the specific examples) serves as an encouraging proof of concept.

World map with Shanghai Ranking

Academic Ranking of World Universities: the Shanghai Jiao Tong table

Where in the world?

Shanghai Jiao Tong University (SJTU) has just published its latest annual Academic Ranking of World Universities (ARWU). This is one of the most widely used global league tables for universities. The top university (for the twelfth year running) is Harvard. The US has 52 universities in the Top 100, and the next-best performing country is the UK, with eight. Cambridge is the top UK university this year – ranked 5th in the world.

Our own university, Manchester, has risen three places to 38th in the world, and ranks 5th in the UK. Like many other universities, Manchester is keen to rise even higher in the rankings, and the Library’s Citation Services team is providing expert bibliometric analysis to help inform discussion of how to achieve this.

Measure for measure

The many global league tables, such as the ARWU and those produced by Times Higher Education and QS, all use different metrics for ranking universities, ranging from the number of publications produced by a university to its reputation for research excellence among its peers as measured by a survey.

Most of the metrics which the ARWU uses are based on how many articles a university publishes in top journals or how many citations these articles receive. These include

  • how many articles a university publishes in science and social science journals covered by the Web of Science, the longest-established multidisciplinary bibliographic database
  • how many articles a university publishes in Nature and Science, generally considered the most prestigious of all multidisciplinary science journals
  • how many of a university’s academics are included in the list (based on Web of Science) of Highly Cited Researchers.

Open the box!

A frequent early criticism of the ARWU was that, although SJTU gave details of the elements used for creating the ranking, it did not explain how these elements were converted into scores, and so the results were not reproducible. However, recent research has succeeded in reproducing the results, and has thus ‘opened the black box’ of the ARWU.