Generating Interpretation of Graphs in Scientific Articles Using Deep Learning
Introduction and Motivation
Data is growing exponentially worldwide: 90 % of it has been generated in the last two years alone [1]. This trend will continue at the same pace over the next few years and produce amounts of data that the human brain can hardly imagine. This inevitably makes us wonder how we will deal with data in the future and what that data will look like.
Balnojan [2], like many other academics and data specialists, predicts that much of the data we will be working with in six years' time will be in image form. The main purpose of this data will be to give consumers an easy way to find desired products online through images on their mobile devices. We can make sense of such image data through computer vision and deep learning technologies. These technologies have improved immensely in recent years; in many cases they even surpass human vision. This suggests that we are on the right path to handling this vast amount of data in real time through AI technologies.
Computer vision's most impressive use cases range from cancer detection in medicine and performance assessment in sports to autonomous vehicles [3]. Inspired by the advances in deep learning technologies that enable us to improve our lives, I, too, would like to report on an interesting use case of deep learning on image data.
This use case came about as part of my master's thesis at the Berlin School of Economics and Law. Parallel to the aforementioned exponential growth of data, the number of scientific papers published worldwide is also growing rapidly [4]. Academics therefore face the challenge of an exhaustive literature review; in fact, it is no longer merely a challenge but has become a nearly impossible task. When working on scientific research, we need to make sure that we go through all the available scientific work so that we can differentiate our own contribution from what has already been published. This becomes even harder when we consider that only approximately 30 % of the results delivered by online libraries are relevant to the searched topic.
This is where the CauseMiner software [5] comes to help. Developed by Müller and Hütteman [5], it uses natural language processing, linguistic rules, and text mining to analyse the contents of scientific articles and extract the main ideas behind the research. This way, researchers can quickly decide whether a scientific paper is relevant and thus worth their time.
But the main hypotheses or ideas behind scientific work are not only expressed through text; researchers often also provide visual graphs that concisely express the main hypotheses and their relationships through nodes and edges (see Figure 1). These graphs are also known as graphical research models. Having the contents of these graphs as an output of the CauseMiner software would be a valuable extension. This was the goal of the master's thesis: to take the image data of these graphical research models and extract the information using deep learning models.
Methodology
In this article, I would like to show, step by step, how you can use deep learning to extract insightful information from image data.
The main goal of the thesis was to assess the capability of deep learning technologies to extract the structure of graphical research models in image form. I would like to point out that we are only interested in the structure of these graphs; the text they contain can be extracted with other computer vision technologies, such as Tesseract [7].
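As a brief illustration, text extraction of this kind takes only a few lines. Here is a minimal sketch using the pytesseract wrapper around Tesseract; the file name is made up, and this step is separate from the thesis pipeline itself.

```python
# Minimal sketch: reading the text inside a graph image with Tesseract,
# via the pytesseract wrapper. "research_model.png" is an illustrative name.
from PIL import Image
import pytesseract

image = Image.open("research_model.png")
print(pytesseract.image_to_string(image))
```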
First, I needed image data extracted from research papers. Clark and Divvala [8] have developed a framework that extracts all figures from a corpus of papers, which provides a good amount of "real" graphical research models. I went through all these images and picked out those that could be classified as graphical research models; these then became part of the dataset I used for my deep learning models.
Unfortunately, my dataset was not yet ready for the deep learning models. During my research I found that the most promising deep learning approaches for extracting information from graphs in text form were image captioning models and instance segmentation models.
Image captioning models generate a textual description of images. For this approach I could generate the required data with a Python library, Graphviz [9], instead of writing a description for every image myself.
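To give an idea of how such image–description pairs can be produced, here is a minimal sketch using the graphviz Python package; the node labels and file name are made up.

```python
# Minimal sketch: generating a synthetic graphical research model with the
# graphviz Python package, plus a textual description of its structure.
from graphviz import Digraph

nodes = ["Trust", "Satisfaction", "Loyalty"]                 # illustrative labels
edges = [("Trust", "Satisfaction"), ("Satisfaction", "Loyalty")]

g = Digraph(format="png")
for name in nodes:
    g.node(name, shape="box")
for src, dst in edges:
    g.edge(src, dst)
g.render("synthetic_model")  # writes synthetic_model and synthetic_model.png

# The matching description is simply the edge list in text form:
caption = "; ".join(f"{src} -> {dst}" for src, dst in edges)
print(caption)  # Trust -> Satisfaction; Satisfaction -> Loyalty
```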
For the instance segmentation model, on the other hand, I could not generate any data and had to annotate it myself. Instance segmentation models are deep learning models that detect objects in images and generate masks on top of them; in other words, they assign every pixel to a class. Because the amount of available data was limited, I decided to detect only a few classes: nodes, edges (lines), arrows, and edge labels. The annotation was done with an open-source tool, the VGG Image Annotator [10].
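For readers who want to reuse such annotations, the sketch below converts a VIA JSON export into per-instance binary masks. It assumes VIA's common 2.x export format with polygon regions and a project-specific region attribute called "label"; adjust the keys to your own project.

```python
# Sketch: turning a VGG Image Annotator (VIA) JSON export into one binary
# mask per annotated region. Assumes VIA 2.x polygon regions and a region
# attribute named "label" -- both project-specific assumptions.
import json
import numpy as np
from PIL import Image, ImageDraw

with open("via_annotations.json") as f:
    annotations = json.load(f)

masks = []  # (filename, class label, binary mask) triples
for entry in annotations.values():
    image = Image.open(entry["filename"])
    for region in entry["regions"]:
        shape = region["shape_attributes"]             # polygon coordinates
        label = region["region_attributes"]["label"]   # e.g. "node", "arrow"
        points = list(zip(shape["all_points_x"], shape["all_points_y"]))
        mask = Image.new("1", image.size, 0)
        ImageDraw.Draw(mask).polygon(points, outline=1, fill=1)
        masks.append((entry["filename"], label, np.array(mask)))
```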
After my datasets were ready, I was finally free to experiment with deep learning models. As an image captioning model, I chose the attention-based captioning model developed by Xu et al. [11]. This approach turned out not to be successful with my data, so I will not dive further into it here; suffice it to say that this technology was not able to generate the structure of graphical research models.
The instance segmentation model, on the other hand, delivered very promising results. I used Mask R-CNN [12], an extension of the object detection model Faster R-CNN [12]. I chose this model because it was one of the state-of-the-art solutions at the time I was working on the thesis and because of the extensive literature available on it.
A short explanation of the Mask R-CNN model architecture:
An image is processed as follows: first, it is passed to a Convolutional Neural Network (CNN), the backbone network for feature extraction. Convolutional neural networks are multi-layer neural networks applied to visual imagery; they learn local patterns or features in images that they can later recognize again in different locations [13]. The extracted features are fed to the Region Proposal Network, which creates anchor boxes containing potential objects to detect. The RoIAlign layer preserves the spatial alignment of each proposed region. The proposed regions are then processed by two parallel heads: fully connected layers for object classification and bounding-box refinement, and a small fully convolutional network (FCN), the Mask R-CNN branch, which generates the segmentation masks [14].
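The thesis relied on an existing Mask R-CNN implementation [12]. As an illustration only, here is a sketch of how such a model could be set up with today's torchvision, configured for the four graph classes plus background; it is not the implementation used in the thesis.

```python
# Sketch: configuring a Mask R-CNN (ResNet-50 FPN backbone) for the four
# graph classes plus background, using torchvision. This mirrors the
# architecture described above, not the exact thesis implementation.
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

NUM_CLASSES = 5  # background + node, edge (line), arrow, edge label

model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")

# Replace the box head (object classification + bounding-box refinement) ...
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, NUM_CLASSES)

# ... and the mask head (the FCN branch that generates segmentation masks).
in_channels = model.roi_heads.mask_predictor.conv5_mask.in_channels
model.roi_heads.mask_predictor = MaskRCNNPredictor(in_channels, 256, NUM_CLASSES)
```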
I trained the model on 245 annotated images of graphs, expanded to 435 images through augmentation, and achieved an overall accuracy of 87 %. This is a very satisfying result considering the limited amount of training data available.
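The exact augmentations are not detailed here, but a typical example is a horizontal flip that transforms the polygon annotations along with the image; the sketch below illustrates the principle.

```python
# Sketch: a horizontal flip that keeps polygon annotations consistent with
# the flipped image. The thesis does not specify its exact augmentations;
# this only illustrates the idea.
from PIL import Image

def flip_horizontal(image, polygons):
    """polygons: list of point lists [(x, y), ...] in pixel coordinates."""
    flipped = image.transpose(Image.Transpose.FLIP_LEFT_RIGHT)
    w = image.width
    flipped_polygons = [[(w - 1 - x, y) for x, y in poly] for poly in polygons]
    return flipped, flipped_polygons
```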
Figure 3 demonstrates the tasks performed by Mask R-CNN: object detection, semantic segmentation, and instance segmentation.
As you might have already noticed, this deep learning model only delivers the objects with their segmentation masks and classes, not the desired graph structure. I could, however, use the information provided by Mask R-CNN to generate the graph structures (see Figure 4). One way to do this is sketched below.
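The following is a hedged sketch of such post-processing, simplified to bounding boxes: each detected line is connected to the nodes nearest its endpoints, and the nearest arrow head decides the direction. All function and field names are illustrative, and the real pipeline works on masks rather than plain boxes.

```python
# Sketch: deriving an edge list from Mask R-CNN detections, simplified to
# bounding boxes. Each detection is a dict {"class", "box", "label"} where
# "label" would come from OCR on the node text. Illustrative only.
import math

def center(box):
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def nearest(point, detections):
    return min(detections, key=lambda d: math.dist(point, center(d["box"])))

def build_edges(detections):
    nodes  = [d for d in detections if d["class"] == "node"]
    lines  = [d for d in detections if d["class"] == "line"]
    arrows = [d for d in detections if d["class"] == "arrow"]

    edges = []
    for line in lines:
        x1, y1, x2, y2 = line["box"]
        # Diagonal box corners roughly approximate the line's two endpoints.
        node_a = nearest((x1, y1), nodes)
        node_b = nearest((x2, y2), nodes)
        # The node closest to the line's arrow head is the target of the edge.
        arrow = nearest(center(line["box"]), arrows)
        target = nearest(center(arrow["box"]), [node_a, node_b])
        source = node_a if target is node_b else node_b
        edges.append((source["label"], target["label"]))
    return edges
```

The result is a list of (source, target) pairs, i.e. the directed structure of the graph.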
Let's wrap up the main steps we took to arrive at the generated graph structures:

1. Collect graph images extracted from scientific papers and annotate the nodes, lines, arrows, and edge labels.
2. Train the Mask R-CNN model on the annotated and augmented images.
3. Run the trained model to obtain a class, bounding box, and segmentation mask for every detected object.
4. Match lines and arrows to their nearest nodes to reconstruct the connections and their directions.
This is what we have retrieved from the graph shown in Figure 5:
Note that the order of line and arrow determines the direction of the connection, as shown by the red arrows in Figure 6.
Now that we have extracted the information behind graphical research models, with a little refinement we can add it to the CauseMiner software.
Final Words
We have gone through all the steps of solving a problem with deep learning: from creating an appropriate dataset to training a deep learning model and using its output to generate insightful information. I would like to point out how powerful data can be and how much knowledge we can extract from it. With a little imagination, we can probably find many use cases for different datasets, and who knows what we can achieve…