Categories
Discussion Report

Discussing digital scholarship at the second Digital Humanities Library Lab

This month I delivered Digital Humanities Second Library Lab, a hands-on showcase of digital library collections and tools designed to support innovative research using computational methods. This three-hour session followed on from a previous event I ran in March and concluded a short run of events that form part of DH@Manchester.

The aim of the workshop was to inspire researchers at all levels to gain practical experience with tools and techniques, so that they could go on to develop individual research projects with these or similar collections. Participants needed no technical experience beyond basic office and web browsing skills. The workshop plan and instructions are available online.

What projects and collections did we look at?

The three activities focused on image searching, analysing text and analysing colour. We looked at projects including the following.

  1. Broadside Ballads Online from the Bodleian Libraries (University of Oxford), a digital collection of English printed ballad-sheets from between the 16th and 20th centuries that includes a feature to search for an image within an image. The collection includes digitised ballad-sheets from The University of Manchester Library’s Special Collections following work by visiting researcher Dr Giles Bergel with the John Rylands Research Institute.
  2. JSTOR Text Analyzer from JSTOR Labs, a beta tool that identifies the topics of any document you give it and recommends articles and chapters from JSTOR on the same topics.
  3. Robots Reading Vogue from Yale University Library’s Digital Humanities Lab, a collection of tools to interrogate the text within the entire U.S. Vogue Archive (ProQuest) and its front covers, such as a topic modeller, N-gram viewer and various colour analysis methods.
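An N-gram viewer such as the one in Robots Reading Vogue plots how often a word or phrase appears over time. Its actual implementation isn't described here, so the following is only a minimal Python sketch of the idea. It assumes the archive has already been reduced to hypothetical (year, text) pairs, and the function names are illustrative.

```python
import re
from collections import Counter


def ngrams(tokens, n):
    """All consecutive n-word sequences in a token list, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_share_by_year(docs, phrase):
    """docs: iterable of (year, text) pairs (hypothetical input format).

    Returns {year: occurrences of `phrase` per million n-grams of the same length},
    so years with more text are not automatically favoured.
    """
    n = len(phrase.split())
    target = " ".join(phrase.lower().split())
    hits, totals = Counter(), Counter()
    for year, text in docs:
        grams = ngrams(re.findall(r"[a-z']+", text.lower()), n)
        totals[year] += len(grams)
        hits[year] += sum(1 for g in grams if g == target)
    return {year: 1_000_000 * hits[year] / totals[year]
            for year in sorted(totals) if totals[year]}
```

Normalising by the total number of n-grams per year is what lets a viewer compare a busy decade with a quiet one. Raw counts would mostly track how much text survives from each year.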

While developing this workshop, I created a project of my own to visualise the average colour of the front covers of all full-colour issues of Illustrated London News (Gale Cengage). Only a few short Python scripts were needed to extract this information from the collection and display it in an interactive web page. This allowed us to look for trends in particular hues, such as the more frequent use of reds in December issues.
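The original scripts are not reproduced here. As a minimal sketch of the approach, assume each cover has already been decoded into a list of (R, G, B) tuples, for example with Pillow's `Image.open(path).convert("RGB").getdata()`. The code below computes each cover's average colour and the share of "reddish" covers per month. The function names, the hue range and the saturation threshold are all illustrative assumptions, not the values used in the project.

```python
import colorsys
from collections import defaultdict
from statistics import mean


def average_colour(pixels):
    """Mean (r, g, b) of an iterable of 0-255 RGB tuples, rounded to ints."""
    pixels = list(pixels)
    if not pixels:
        raise ValueError("no pixels")
    return tuple(round(mean(p[i] for p in pixels)) for i in range(3))


def is_reddish(rgb, min_saturation=0.2):
    """True if the colour's hue lies within 20 degrees of pure red (assumed range)."""
    h, s, _v = colorsys.rgb_to_hsv(*(c / 255 for c in rgb))
    hue = h * 360
    return s >= min_saturation and (hue < 20 or hue > 340)


def red_share_by_month(covers):
    """covers: iterable of (ISO date string, pixels). Returns {month: fraction reddish}."""
    counts = defaultdict(lambda: [0, 0])  # month -> [reddish covers, all covers]
    for iso_date, pixels in covers:
        month = int(iso_date[5:7])
        counts[month][1] += 1
        if is_reddish(average_colour(pixels)):
            counts[month][0] += 1
    return {m: red / total for m, (red, total) in sorted(counts.items())}
```

A plain RGB mean tends to produce muddy, low-saturation colours for busy covers. That is why the sketch checks saturation as well as hue before calling a cover red.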


What did we learn?

After each activity we discussed some of the issues raised. (Incidentally, I captured key points on a Smart Kapp digital flipchart or smart whiteboard, continuing the “Digital First” principles that Library colleagues are adopting.)

  • Image analysis and computer vision have many potential applications with library collections, such as identifying where printed or handwritten text occurs in an image, facial recognition, and detecting patterns or differences between editions or issues within a series.
  • For image analysis systems to work best, the image sets will need to be carefully curated and the algorithms carefully trained. This is a time-consuming process.
  • The text analyser worked quite well but, as with the image search, was not perfect. It is important to find out precisely what “goes wrong” and why.
  • Other applications for the text analysis tool include checking a grant application for gaps in topics you think should be covered, following the development of a thesis, or, for lecturers, checking students’ use of references in submitted papers.
  • Being able to visualise an entire collection in one display (and then dive into the content) gives a sense of what is there before deciding which physical items are worth the trouble of visiting and retrieving. Whitelaw (2015) suggests that such “generous interfaces” can open up a broader, less prescriptive view of a collection than the traditional web search.
  • It would be even more useful to compare different collections or publications against each other, but this can be difficult when multiple licence holders or publishers are involved, each with different technical or legal restrictions to address.
  • Programming or other technical skills would need to be learned in order to develop or apply many tools. Alternatively, technical specialists would need to work in partnership with researchers, perhaps utilising the University’s Research IT service or the Library’s Digital Technologies & Services division.

Summary

Digital or computational tools and techniques are increasingly being applied to arts, humanities and social science methods. Many of the collections at The University of Manchester Library have potential for stimulating interdisciplinary research. Such Digital Scholarship projects would often require a greater level of technical knowledge or skill than many research groups might currently possess, so further training or provision for technical support might be necessary.

References

Whitelaw, M. (2015) ‘Generous Interfaces for Digital Cultural Collections’, Digital Humanities Quarterly, 9(1). Available at: http://www.digitalhumanities.org/dhq/vol/9/1/000205/000205.html (Accessed: 25 May 2017).

Categories
Report

Exploring digital collections at the first Digital Humanities Library Lab

A new pilot workshop, the first Digital Humanities Library Lab, ran on 3 March 2017. This engaging and informative cross-disciplinary event offered a dozen researchers the chance to explore and discuss new tools and digital text collections from The University of Manchester Library, inspiring the development of future Digital Humanities computational research methods.

The afternoon comprised three activities.

  1. Spelling and printing variations when searching Jisc Historical Texts
  2. Visualising themes in longform scholarly outputs using the JSTOR Topicgraph tool
  3. A UK first: using an API to access previously unavailable content from Adam Matthew Digital’s Mass Observation
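Searches of early printed texts often miss hits because the same word can be printed several ways. How Jisc Historical Texts handles this isn't described here, so the following is only a hypothetical Python illustration. It folds a few well-known early modern conventions (long s, u/v, i/j, and "vv" for w) into a single matching key, so that variant spellings of a query can be found. The function names and the list of rules are assumptions, and real variation is far richer.

```python
def variant_key(word: str) -> str:
    """Fold some common early modern spelling and letterform variants into one key."""
    w = word.lower()
    w = w.replace("ſ", "s")   # long s (ſ) treated as s
    w = w.replace("vv", "w")  # 'vv' printed for w; must run before the v -> u rule
    w = w.replace("j", "i")   # i and j used interchangeably
    w = w.replace("v", "u")   # u and v used interchangeably
    return w


def find_variants(query, words):
    """Return the words whose folded key matches the folded query."""
    key = variant_key(query)
    return [w for w in words if variant_key(w) == key]
```

For example, `find_variants("love", ["loue", "love", "lover", "LOVE"])` keeps "loue", "love" and "LOVE" but not "lover". Folding both sides to a common key, rather than listing every variant, keeps the rules in one place.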

The workshop instructions are available online for all to view, and the Library is looking to run a similar event again in May. What would you like to see covered next? Please get in touch with DH@Manchester or the Library’s DH Project Officer Phil Reed directly, or leave a comment below.

Support and seedcorn funding for Faculty of Humanities researchers

The Digital Humanities Project Call 2016-2017 has just been announced. This year DH@Manchester are focusing on developing new projects in two specific areas:

  • innovative projects drawing on the Library’s extensive electronic collections
  • cutting-edge research which can be developed in partnership with colleagues in the School of Computer Science (including text mining, linked data, image processing, and data visualization).

The closing date is Wednesday, 22 March 2017. View the Project Call page for more information.

Categories
Announcement Technology

Library data-merging pilot grows to exciting partnership opportunity

A technical prototype I developed for the Business Data Service has become the driving force behind a new research project post, bringing together partners from outside The University of Manchester Library.

What is the basic premise?

To develop a collection of tools that bring together commercially available databases from separate suppliers for use in leading, innovative research, applying specialist knowledge of the field so that the data are combined accurately and efficiently.

Who are the partners?

Why is this new post useful?

After spending money on expensive data sets, we need to make the most out of them. It is critical to use them together in order to unlock their full research value. In the case of some specialist resources, this activity is non-trivial.

Why is joining these datasets difficult?

There is no easy way to use the data from these different sources together: they share no common index.

Identifying companies across different databases is difficult because the codes used within one platform usually do not correspond to those used in another. There are good reasons why a platform does this (protecting its intellectual property is one), but it makes the work harder for researchers, who sometimes resort to checking company name matches by eye, one at a time!
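Where no shared identifier exists, a script can at least shorten that list before a person confirms each match. Here is a minimal sketch using Python's standard `difflib`. The legal-suffix list and the 0.85 similarity cutoff are arbitrary assumptions, and the function names are hypothetical. The output is a shortlist for checking by eye, not a final match.

```python
import difflib
import re

# Legal-form words that often differ between databases (illustrative, not exhaustive).
LEGAL_SUFFIXES = {"plc", "inc", "ltd", "limited", "corp", "corporation", "co", "company", "the"}


def normalise_name(name: str) -> str:
    """Lower-case, drop punctuation and remove legal-form words."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def candidate_matches(name, candidates, cutoff=0.85, limit=3):
    """Return original candidate names whose normalised form is close to `name`."""
    by_key = {}
    for cand in candidates:
        by_key.setdefault(normalise_name(cand), []).append(cand)
    keys = difflib.get_close_matches(normalise_name(name), list(by_key), n=limit, cutoff=cutoff)
    return [cand for key in keys for cand in by_key[key]]
```

For example, `candidate_matches("Apple Inc.", ["APPLE INC", "Applied Materials Inc", "BAE Systems plc"])` returns only `["APPLE INC"]`. A lower cutoff finds more candidates at the cost of more false positives to reject.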

Writing code to map these identifiers, where cross-checking is possible, requires the software developer to know the various identification codes in use, such as CUSIP, ISIN, SEDOL and assorted ticker symbols, some of which can change over time or be complicated in other ways. It also requires a close working relationship with the curators of these databases at the University; this is found in the Library’s Business Data Service team, whose expertise is well respected and appreciated by its users.
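Where a shared identifier does exist, part of the mapping can be automated. The following minimal Python sketch assumes two hypothetical exports, each a list of records with an internal `id` and an `isin` field. It validates ISINs using their Luhn-based check digit, recovers the CUSIP embedded in US and Canadian ISINs, and builds a crosswalk between the two sets of internal IDs. It ignores the harder cases mentioned above, such as identifiers that change over time.

```python
def isin_is_valid(isin: str) -> bool:
    """Check the format and the Luhn-based check digit of a 12-character ISIN."""
    isin = isin.strip().upper()
    if (len(isin) != 12 or not isin.isascii() or not isin.isalnum()
            or not isin[:2].isalpha() or not isin[-1].isdigit()):
        return False
    # Letters become two-digit numbers (A=10 ... Z=35); digits stay as they are.
    digits = "".join(str(int(ch, 36)) for ch in isin)
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def cusip_from_isin(isin: str):
    """US and Canadian ISINs embed the 9-character CUSIP after the country code."""
    isin = isin.strip().upper()
    if isin_is_valid(isin) and isin[:2] in ("US", "CA"):
        return isin[2:11]
    return None


def build_crosswalk(db_a, db_b):
    """Map internal IDs in db_a to internal IDs in db_b via a shared, valid ISIN."""
    b_by_isin = {rec["isin"].strip().upper(): rec["id"]
                 for rec in db_b if isin_is_valid(rec["isin"])}
    crosswalk, unmatched = {}, []
    for rec in db_a:
        isin = rec["isin"].strip().upper()
        if isin_is_valid(isin) and isin in b_by_isin:
            crosswalk[rec["id"]] = b_by_isin[isin]
        else:
            unmatched.append(rec["id"])  # left for manual checking
    return crosswalk, unmatched
```

For example, Apple's ISIN US0378331005 passes the check and `cusip_from_isin` returns 037833100. A mistyped US0378331006 fails the check, so that record is left unmatched instead of being silently joined to the wrong company.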

How will it happen?

As part of the project funding application, a new post was created. It sits outside the Library but depends on Library staff’s curation skills and knowledge of the Library’s specialist financial databases. In this post I will use my skills as a software developer, and my experience of working in the Library, to write new tools that combine access to the project’s datasets as the products become available and as the researchers need them.

I’ll still be working my usual job in the Library as well, so nothing is lost from the Business Data Service.

Where might it lead?

The primary objective is to publish new research on topics covering institutional investors, financial innovation and the “real economy”.

Once the research is published, we can develop new teaching topics and further broaden access to the University’s data sets with these tools, introducing them to new audiences in other subject areas.

Categories
Report

Improving research outcomes with Early European (and English) Books

Screen capture of the Early European Books website

It’s not often that a single event appeals to three of my interests, but the recent Jisc-ProQuest symposium on Early European Books Online did just that. I develop research collections and support digital humanities scholarship, while my own research looks at 17th-century texts using corpus linguistics tools: Early European Books brings all three neatly together.
I was slightly suspicious that I might be attending a sales demonstration, but Lorraine Estelle, from Jisc Collections, opened by announcing that Jisc has licensed this resource in perpetuity, making it freely available to the UK HE community via the Historical Texts platform.

‘E-books are the research data of Humanities’

The presentation by Paul Ayris (delivered on the day by Ben Meunier) put EEB firmly in the context of research data management, referencing the LERU 2013 roadmap and ‘science 2.0’ for Arts and Humanities scholars, and noting that e-books are our research data. New forms of text analysis are possible using text and data mining techniques. He described EEB as ‘a defining project, delivering on the science 2.0 agenda for Arts and Humanities’.

The Text Creation Partnership, and others

Artist’s impression of XML files of early European books

What has really enabled text mining is not just the existence of EEB or EEBO, however, but the extraordinary partnership that has produced EEBO-TCP. The Text Creation Partnership has involved re-keying the text images to create standardized, accurate XML/SGML encoded electronic text editions of early printed books. This work, and the resulting text files, are jointly funded and owned by more than 150 libraries worldwide.
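Because the TCP texts are encoded as XML, simple corpus counts need only a few lines of code. The following is a minimal Python sketch that assumes a file in TEI P5 XML with the standard TEI namespace. TCP texts circulate in more than one format, so check the element names against the files you actually have. The inline sample is invented for illustration, and the function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented miniature TEI document, for illustration only.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt><title>A sample</title></titleStmt></fileDesc></teiHeader>
  <text><body><p>The loue of God and the loue of our neighbour.</p></body></text>
</TEI>"""


def body_text(xml_string):
    """Text content of the <text> element, skipping the metadata in <teiHeader>."""
    root = ET.fromstring(xml_string)
    text_el = root.find("tei:text", TEI_NS)
    if text_el is None:
        return ""
    return " ".join(text_el.itertext())


def word_frequencies(xml_string):
    """Lower-cased word counts; keeps long s (ſ) so original spellings survive."""
    return Counter(re.findall(r"[a-zſ]+", body_text(xml_string).lower()))
```

Here `word_frequencies(SAMPLE)["loue"]` is 2, while the title in the header is not counted. Restricting the count to the `<text>` element keeps catalogue metadata out of the corpus.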

Matthew Brack gave us a very practical assessment of project management in the Wellcome Library’s digitisation projects. Andrew Pettegree talked about the Universal Short Title Catalogue (USTC) and the challenges of comprehensive coverage.

At the lively round-table discussion we considered the difficulty of citing and measuring the impact of electronic sources, and the [un]reliability (46%) of OCR software.
It’s clear to me that we need a partnership approach with librarians, providers, and scholars working to counteract misperceptions of ‘easy’ e-scholarship and/or lazy searching, and issues with citation of e-resources. After all, Digital Humanities research is typically a highly collaborative activity.
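One common way to put a number on OCR reliability is character accuracy. This is one minus the edit (Levenshtein) distance between the OCR output and an accurate transcription, such as a re-keyed TCP text, divided by the length of that transcription. The source does not say how the 46% figure was measured, so this is only a generic Python sketch. The function names are hypothetical and the sample strings are invented.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def character_accuracy(ocr: str, reference: str) -> float:
    """1 - (edit distance / reference length), floored at 0."""
    if not reference:
        return 1.0 if not ocr else 0.0
    return max(0.0, 1 - levenshtein(ocr, reference) / len(reference))
```

For instance, an OCR reading "tbe lone of Gop" against the transcription "the loue of God" has three substitutions in fifteen characters, giving an accuracy of 0.8. Word-level accuracy is usually lower than character accuracy, because a single wrong character spoils the whole word for searching.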

University of Manchester users can access the first four collections in Early European Books via the Library’s A-Z of databases pages.