
Using open citation data to identify new research opportunities

This year’s Annual Forum for the LIS-BIBLIOMETRICS mailing list took place at the British Library Knowledge Centre on 29 January 2019, and focused on the topic of ‘Open Metrics and Measuring Openness’.

As well as two keynotes, three parallel workshop sessions and a panel discussion featuring four participants, a session featuring five-minute ‘lightning talks’ gave nine of us a chance to give presentations on work relevant to the Forum topic.

Image 1
Stephen Pearson giving presentation at LIS-BIBLIOMETRICS Annual Forum 2019

Unlike most of the other offerings at the Forum, my talk wasn’t about the use of metrics in evaluations or about how to measure openness. Instead, I talked about the way in which large open sets of citation data can give interesting information about patterns of citation. I reported on some initial exploratory work we’ve done to see whether this information can help identify new research opportunities.

Where does inspiration for research come from?

Image 2
Isaac Newton

Inspiration for research can come from many different sources. For example, going back about 350 years, the act of noticing that an apple always falls perpendicularly to the ground could lead you to muse whether the earth had some power of attraction which caused this, and ultimately develop the law of universal gravitation.

Of course it’s still possible to come up with new ideas and discoveries on the basis of such ‘Eureka’ moments. But with the increasing focus on interdisciplinarity, it’s been said that ‘revolutionary scientific discoveries … are often the result of connecting ideas that have their origin in different disciplines.’

That quotation is from an article entitled ‘Interdisciplinary Research Boosted by Serendipity’. But do we have to rely on serendipity to discover that ideas from one discipline can be profitably applied in another field in a novel fashion? Or is there a way of systematically identifying such potential links?

I’d suggest that the technique of bibliographic coupling can help.

Bibliographic coupling

Image 3
Diagram illustrating bibliographic coupling

If two documents share several references, as Documents A and B do, then those documents are ‘bibliographically coupled’. In that case there’s at least a possibility that the two Citing Documents are using similar approaches to the research questions they’re respectively addressing.

In many cases the two Citing Documents will be by researchers who are addressing the same research question, or very closely related questions, and so the sharing of references has no deeper significance. A potentially more significant scenario occurs when the two Citing Documents are by researchers working in somewhat different fields. In that situation, the bibliographic coupling is a pointer to at least the possibility of a previously unidentified cross-disciplinary research connection.
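To make the idea concrete, here is a minimal sketch of how the coupling strength between two citing documents could be counted: it is simply the number of references they have in common. The function name and the DOIs are hypothetical, chosen only for illustration.

```python
def coupling_strength(refs_a, refs_b):
    """Number of references that two citing documents have in common."""
    return len(set(refs_a) & set(refs_b))


# Hypothetical reference lists (DOIs invented for illustration only).
doc_a = ["10.1000/ref1", "10.1000/ref2", "10.1000/ref3"]
doc_b = ["10.1000/ref2", "10.1000/ref3", "10.1000/ref4"]

print(coupling_strength(doc_a, doc_b))  # 2
```

A strength of zero means the two documents are not coupled at all; the higher the count, the stronger the hint that they draw on the same prior work.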

COCI, the OpenCitations Index of Crossref open DOI-to-DOI citations

So, what citation database could we use to try to identify such connections? The best-known ones are the commercial products Web of Science and Scopus, but there are at least two barriers to using them for this type of work. The first is the need to pay for a subscription to use them at all. The second is the limit on the number of records one can download, which makes it difficult to amass a dataset big enough to allow the required ‘mining’ for links.

We used COCI, a dataset created by the OpenCitations organisation. This was originally known as the Crossref Open Citations Index but it’s now the OpenCitations Index of Crossref open DOI-to-DOI citations.

Updated at least every six months, this dataset comprises all the DOI-to-DOI citations specified by open references in Crossref, which currently amounts to almost 450 million DOI-to-DOI citation links based on over 45 million bibliographic resources.

Looked at in terms of proportional coverage and from our own institution’s point of view, it comprises c.8,000 University of Manchester papers published since 2014 (which makes up around one-third of the total), together with their citation references.

This figure gives an example of the information which COCI provides, for each of the more than 45 million bibliographic resources whose DOIs it contains.

Image 4
Diagram illustrating information in COCI

The data is available in several formats. We downloaded it as a CSV ‘dump’, and then filtered it to extract only those records where the DOI of the citing paper matched that of a paper in our own institutional publication records. This table gives some examples of the information which we then combined with the COCI information.

Image 4B
Table illustrating Manchester metadata for enhancing information from COCI

This meant that we were able to create a rich dataset which comprised c.8,000 records for University of Manchester papers, each in the following format.

Image 5
Diagram illustrating information in COCI enhanced with Manchester metadata
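The filter-and-combine step described above might look something like the following. This is only a sketch, not the process we actually ran: the column names `citing` and `cited` are assumptions that should be checked against the header of the downloaded dump, and the DOIs, Faculty names and School names are invented placeholders.

```python
import csv
import io

# Tiny stand-in for the COCI CSV dump. The column names 'citing' and 'cited'
# are assumed; check them against the header of the file you download.
coci_csv = io.StringIO(
    "citing,cited\n"
    "10.1000/man1,10.1000/ref1\n"
    "10.1000/man1,10.1000/ref2\n"
    "10.1000/other,10.1000/ref1\n"
    "10.1000/MAN2,10.1000/ref2\n"
)

# Hypothetical institutional publication records, keyed by lower-cased DOI.
institutional = {
    "10.1000/man1": {"faculty": "Faculty X", "school": "School P"},
    "10.1000/man2": {"faculty": "Faculty Y", "school": "School Q"},
}


def build_dataset(coci_file, records):
    """Keep citation rows whose citing DOI is in `records`, grouped per paper."""
    dataset = {}
    for row in csv.DictReader(coci_file):
        doi = row["citing"].strip().lower()  # DOIs are case-insensitive
        if doi in records:
            # First time we see a paper, copy its metadata and start a list.
            entry = dataset.setdefault(doi, {**records[doi], "references": []})
            entry["references"].append(row["cited"].strip().lower())
    return dataset


dataset = build_dataset(coci_csv, institutional)
for doi, entry in dataset.items():
    print(doi, entry["school"], entry["references"])
```

Reading the file row by row, rather than loading it whole, matters for a dump of this size: only the rows for our own papers are ever kept in memory.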

Pointers to new research collaboration possibilities?

A colleague from the Library’s Digital Technologies and Services team wrote a program which pulled out all pairs of bibliographically coupled papers which had at least two references in common but where the authors came from different Faculties. We then used the free VOSviewer software to produce the following visualisation of bibliographical coupling links between publications by Schools/Divisions in different Faculties. (The closer together two nodes are, the stronger the bibliographic coupling between them.)

Image 6
Network visualisation of bibliographical coupling links between publications
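The pair-finding step that fed this visualisation can be sketched roughly as follows. This is not our colleague’s program, just one plausible way of doing it, and the paper identifiers, Faculty names and reference IDs are all hypothetical. The threshold of two shared references matches the one described above.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical records: citing DOI -> (faculty, set of cited references).
papers = {
    "10.1000/p1": ("Faculty X", {"r1", "r2", "r3"}),
    "10.1000/p2": ("Faculty Y", {"r2", "r3", "r4"}),
    "10.1000/p3": ("Faculty X", {"r1", "r2"}),
    "10.1000/p4": ("Faculty Y", {"r4", "r5"}),
}


def cross_faculty_pairs(papers, min_shared=2):
    """Return {(doi_a, doi_b): shared_count} for coupled cross-Faculty pairs."""
    # Inverted index: cited reference -> papers citing it, so that only
    # papers sharing at least one reference are ever compared.
    cited_by = defaultdict(set)
    for doi, (_, refs) in papers.items():
        for ref in refs:
            cited_by[ref].add(doi)

    # Count how many references each pair of papers has in common.
    shared = Counter()
    for citers in cited_by.values():
        for a, b in combinations(sorted(citers), 2):
            shared[(a, b)] += 1

    # Keep pairs above the threshold whose papers sit in different Faculties.
    return {
        pair: n
        for pair, n in shared.items()
        if n >= min_shared and papers[pair[0]][0] != papers[pair[1]][0]
    }


print(cross_faculty_pairs(papers))
```

In this toy data, p1 and p3 also share two references, but they belong to the same Faculty and so are filtered out.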

What does this show? Here are two examples.

  • The two closely juxtaposed purple nodes at the bottom of the visualisation show that papers from the Division of Information, Imaging and Data Sciences (in the Faculty of Biology, Medicine and Health) and the School of Electrical and Electronic Engineering (in the Faculty of Science and Engineering) had references in common.
  • The two closely juxtaposed green nodes at the left of the visualisation show that papers from the School of Mechanical, Aerospace and Civil Engineering (in the Faculty of Science and Engineering) and the School of Environment, Education and Development (in the Faculty of Humanities) had references in common.

Do they highlight previously unsuspected opportunities for innovative new cross-disciplinary research? Unfortunately, no.

  • The juxtaposed purple nodes simply reflected the fact that closely related algorithmic approaches to medical diagnosis and to computer vision are used in both the Division of Information, Imaging and Data Sciences and the School of Electrical and Electronic Engineering. Although this is interesting, it’s not the kind of unsuspected connection we’d hoped to uncover.
  • Similarly, the juxtaposed green nodes show that approaches to the optimisation of land, water and energy use are an area of interest both to researchers in Civil Engineering and to those in Environment, Education and Development. Again, this isn’t an unsuspected connection which the bibliographic coupling has surprisingly brought to light.

Image 7
Nobel Prize Medal (Nobel-Prize CC-BY Abhijit Bhaduri via Flickr)

We’re not expecting Manchester’s next Nobel Prize winners to use their acceptance speeches to thank us for bibliometric work which first alerted them to the possibility of a ground-breaking collaboration any time soon. However, the way in which this work highlighted related research being carried out in different Faculties (however unsurprising the specific examples) serves as an encouraging proof of concept.

Specialist financial databases visualised as a tube map

Visualisation of library provision: a worked example using specialist financial databases

At The University of Manchester Library we subscribe to many database resources, containing vast amounts of structured data, organised by further descriptive data, or metadata. These descriptions can be thought of as many dimensions or variables, and it is important to focus on just a few to begin with.

Research students frequently need to consult our large and rich selection of specialist business and financial databases to collect data and to shape their studies. There are over fifty databases that I would consider particularly relevant to that field, and they are also of interest to a wider audience. New researchers would benefit from a quicker way to work out which database can answer their query, potentially saving hours of trawling through the wrong resource.

As an experiment, I created this diagram of specialist financial databases in the style of a topological tube network:

Specialist financial databases visualised as a tube map

I will explain the process I followed in planning and constructing this diagram below, but first I will briefly explain what it shows. Seven research areas that require the use of specialist financial and business databases are represented as tube lines. The viewer can follow each of these lines through the various database products, which are shown as stations. The places where researchers must be to use each database are shown as zones.

Identifying the content

With so many factors to consider, I focused on the three that are most important, or that need answering first:

  1. Research subject area (such as corporate governance, or economics)
  2. Geographical coverage
  3. Access location (in the Library or through the web).

Further factors that I would like to consider include:

  • Historical coverage
  • Type of companies or equities covered (quoted, private, banks)
  • Consideration of survivorship bias (active or dead companies)
  • Type of data provided (numerical, reports).

These seven factors still only scratch the surface when choosing a business data source, but it is a start. I had already created a table with a list of the 50-plus relevant databases and columns for each of those factors (Figure a), which I used to gain a better understanding of the resources I work with when I came into post.

I worked with my colleague Xia Hong to reduce this table to the 21 most important databases and the three most important factors listed above (Figure b). The research areas were marked against each database as simple yes or no matches, ready for deciding which lines would pass through which stations.

Business database visualisation planning
Planning the visualisation: (a) original table, (b) reduced table, (c) sketch by hand, (d) sketch in PowerPoint

Designing the structure

I decided to use good old pen and paper when it came to drawing out the layout. (See Figure c.) Network-building software exists, but I decided that its learning curve would be too steep for the benefits, since the hand-drawn approach worked for me. I started with the database that matched the most research areas (ThomsonONE.com) and drew outwards from there.
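I did this by hand, but the same starting point could be worked out from the yes/no table with a few lines of code. The sketch below is purely illustrative: the database names, the third research area and the matches are made-up placeholders, not the contents of the real table.

```python
# Hypothetical yes/no table: database -> research areas marked as "yes".
matches = {
    "Database A": {"corporate governance", "economics", "area 3"},
    "Database B": {"economics"},
    "Database C": {"corporate governance", "economics"},
}


def drawing_order(matches):
    """Databases sorted by number of matched research areas, most first."""
    # Ties are broken alphabetically so the order is reproducible.
    return sorted(matches, key=lambda db: (-len(matches[db]), db))


def stations_on_line(matches, area):
    """Databases (stations) that a given research-area line passes through."""
    return [db for db in drawing_order(matches) if area in matches[db]]


print(drawing_order(matches)[0])
print(stations_on_line(matches, "economics"))
```

The first database in the drawing order is the natural hub to place in the middle of the page, with each line then drawn outwards through its remaining stations.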

Next, I entered the structure into PowerPoint, as it was the fastest tool available that I knew how to use (Figure d). This clearer format was used for checking the content for accuracy and omissions. The layout of the objects was refined in this tool, before employing CorelDRAW for the final markup.

The final design has these features:

  • Stations: database products, with symbol “W” for those with WRDS portal entry
  • Lines: research subject areas, coloured with University branding
  • Zones: access location, with inner zone 1 for databases you need to come into the Library to use; zone 2 for web access only on-campus; and zone 3 for web access from anywhere
  • Position: the very top is North American coverage, the left China, above the middle is Europe, and the rest is international.

Sadly, there is no river, which I could have used to separate North America from the other continents!

Where next?

This diagram is busy enough that no more information could be added without compromising its readability. There is more information on the Library website subject guide page covering these databases, which is the first port of call for a student enquiry. After that, all current students and staff of The University of Manchester are welcome to attend a research consultation session, where an expert from the Research Services team will be available.


It is difficult to convey lots of structured information. If we focus on just the initial or most important factors, we can produce something that is helpful and appealing.

See also earlier post: Why are there so many business databases?