An automatic system for extracting figure-caption pair from medical documents: a six-fold approach

PeerJ Computer Science

Introduction

  • In this article, a new and effective six-fold approach is presented to extract figures and their related captions from scanned medical documents. In contrast to previous methods, the proposed method does not directly examine the raw graphical objects stored in the scanned files.

  • Edges are first extracted from the scanned document using the à trous wavelet transform. The method then distinguishes the text elements from the graphical elements of the scanned file and applies maximally stable extremal regions (MSER) connected component analysis to both in order to identify individual figures and captions. Text and graphics are separated using a multi-layer perceptron (MLP), which is trained to recognize the text parts of the scanned file so that the graphical parts can be identified easily. The bounding-box concept is used to create a separate block for every figure-caption pair (a rough sketch of these stages follows this list).

  • The proposed system is tested on a self-created dataset comprising pages from five open-access books: “Brain and Human Body Modelling 2021” by Sergey Makarov, Gregory Noetscher and Aapo Nummenmaa (Makarov, Noetscher & Nummenmaa, 2023); “Healthcare and Disease Burden in Africa” by Ilha Niohuru (Niohuru, 2023); “All-Optical Methods to Study Neuronal Function” by Eirini Papagiakoumou (Papagiakoumou, 2023); “RNA, the Epicenter of Genetic Information” by John Mattick and Paulo Amaral (Mattick & Amaral, 2023); and “Illustrated Manual of Pediatric Dermatology” by Susan Bayliss Mallory, Alanna Bree and Peggy Chern (Mallory, Bree & Chern, 2005).
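As a rough, self-contained illustration of the edge-detection and region-extraction stages summarized above (not the authors' implementation), the Python sketch below assumes OpenCV's MSER detector and a single B3-spline à trous detail level; the file name and the function names `atrous_detail` and `mser_boxes` are illustrative.

```python
import cv2
import numpy as np
from scipy.ndimage import convolve

def atrous_detail(gray, level=1):
    """One detail level of the a trous (undecimated) wavelet transform:
    the difference between the image and a B3-spline smoothed copy whose
    kernel taps are spaced 2**(level-1) pixels apart ("holes")."""
    b3 = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0
    step = 2 ** (level - 1)
    kernel = np.zeros(4 * step + 1)
    kernel[::step] = b3                      # insert the holes
    smooth = convolve(gray, kernel[None, :], mode='nearest')
    smooth = convolve(smooth, kernel[:, None], mode='nearest')
    return gray - smooth                     # edge (detail) layer

def mser_boxes(gray_u8):
    """Bounding boxes of maximally stable extremal regions."""
    mser = cv2.MSER_create()
    regions, bboxes = mser.detectRegions(gray_u8)
    return bboxes                            # one (x, y, w, h) per region

page = cv2.imread('scanned_page.png', cv2.IMREAD_GRAYSCALE).astype(np.float64)
edges = atrous_detail(page)
edges_u8 = cv2.normalize(edges, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
boxes = mser_boxes(edges_u8)                 # candidate text/graphic regions
```

Each candidate box would then be classified as text or graphics (the method uses an MLP over the sixteen shape features listed under “Graphic objects detection”) before figure-caption pairs are grouped.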

Materials & Methods

Pre-processing

Edge detection with the undecimated wavelet transform

À trous undecimated wavelet transform

Edge detection

Connected Component (CC) extraction using MSER

Graphic objects detection

  1. H-position: Horizontal position, i.e., the count of pixels from the left edge of the image to the center of the smallest bounding box that encloses all character pixels.

  2. V-position: Vertical position, i.e., the count of pixels from the bottom edge of the image to the bounding box described in 1 above.

  3. Width: The width of the bounding box, in pixels.

  4. Height: The height of the bounding box, in pixels.

  5. Total pixel: The total number of character image pixels.

  6. Mean H-position: Mean horizontal position of all character pixels relative to the centroid of the bounding box, divided by the width of the box. This feature has a negative value if the image is left-heavy, e.g., the letter F.

  7. Mean V-position: Mean vertical position of all character pixels relative to the centroid of the bounding box, divided by the height of the box.

  8. Mean SQ-H: The mean squared value of the horizontal distances computed in 6 above. This feature has a larger value for images whose pixels are spread broadly in the horizontal direction, e.g., the letters M or W.

  9. Mean SQ-V: The mean squared value of the vertical distances computed in 7 above.

  10. Mean PROD-HV: The mean product of the horizontal and vertical distances for every character pixel, as computed in 6 and 7 above. This feature has a negative value for top-left to bottom-right diagonal lines and a positive value for bottom-left to top-right diagonal lines.

  11. H-variance: The mean value, over all character pixels, of the squared horizontal distance multiplied by the vertical distance. This measures the correlation of the horizontal variance with the vertical position.

  12. V-variance: The mean value, over all character pixels, of the squared vertical distance multiplied by the horizontal distance. This measures the correlation of the vertical variance with the horizontal position.

  13. Mean V-edge: The mean number of edges (a character pixel immediately to the right of either the image boundary or a non-character pixel) encountered when scanning from left to right at all vertical positions within the bounding box. This feature distinguishes letters such as “L” or “I” from letters such as “M” or “W”.

  14. Sum V-edge: The sum of the vertical positions of the edges counted in 13 above. If there are more edges towards the top of the bounding box, as in the letter “Y”, this feature gives a higher value.

  15. Mean H-edge: The mean number of edges (a character pixel immediately above either the image boundary or a non-character pixel) encountered when scanning from bottom to top at all horizontal positions within the character’s bounding box.

  16. Sum H-edge: The sum of the horizontal positions of the edges counted in 15 above. (A sketch that computes a subset of these features appears after this list.)
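To make the feature definitions concrete, here is a minimal sketch that computes a representative subset of them for a binary character image (nonzero = character pixel). It is an illustrative reconstruction, not the published code; the helper name `char_features` and the row/column conventions are assumptions.

```python
import numpy as np

def char_features(img):
    """Compute a representative subset of the 16 features above for a
    binary character image. Rows are assumed to grow downward, as in
    NumPy image arrays."""
    ys, xs = np.nonzero(img)
    x0, x1 = xs.min(), xs.max()                  # smallest bounding box
    y0, y1 = ys.min(), ys.max()
    width, height = x1 - x0 + 1, y1 - y0 + 1     # features 3 and 4
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0    # box centroid
    dx = (xs - cx) / width                       # horizontal distances (6)
    dy = (cy - ys) / height                      # vertical distances (7), up = positive
    # Feature 13: character pixels whose left neighbour is background
    # (or the image boundary), averaged over the rows of the box.
    edge = np.zeros(img.shape, dtype=bool)
    edge[:, 0] = img[:, 0] > 0
    edge[:, 1:] = (img[:, 1:] > 0) & (img[:, :-1] == 0)
    mean_v_edge = edge[y0:y1 + 1].sum() / float(height)
    return {
        'h_position': cx,                        # 1: box centre from left edge
        'v_position': img.shape[0] - 1 - y1,     # 2: box bottom from image bottom
        'width': width, 'height': height,        # 3, 4
        'total_pixels': xs.size,                 # 5
        'mean_h': dx.mean(),                     # 6
        'mean_v': dy.mean(),                     # 7
        'mean_sq_h': (dx ** 2).mean(),           # 8
        'mean_sq_v': (dy ** 2).mean(),           # 9
        'mean_prod_hv': (dx * dy).mean(),        # 10
        'h_variance': (dx ** 2 * dy).mean(),     # 11
        'v_variance': (dy ** 2 * dx).mean(),     # 12
        'mean_v_edge': mean_v_edge,              # 13
    }
```

An MLP (for example, scikit-learn’s MLPClassifier) trained on such 16-dimensional vectors can then label each connected component as text or graphics, as the method describes.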

Detection of caption

Figure-caption pair extraction

Results

Pre-processing

Edge detection

Text and graphics bounding box detection using MSER connected component

Detection of graphic element

Figure-caption pair extraction

Discussion

Conclusions

Additional Information and Declarations

Competing Interests

Jyotismita Chaki is a Section Editor of PeerJ Computer Science.

Author Contributions

Jyotismita Chaki conceived and designed the experiments, performed the experiments, analyzed the data, performed the computation work, prepared figures and/or tables, authored or reviewed drafts of the article, and approved the final draft.

Data Availability

The following information was supplied regarding data availability:

The figure-caption recognition code is available at GitHub and Zenodo: https://github.com/Jyotismita-1/figure-caption.

Jyotismita Chaki. (2023). figure-caption recognition. https://doi.org/10.5281/zenodo.7527836.

The data is available at figshare: Chaki, Jyotismita (2023). Medical dataset. figshare. Dataset. https://doi.org/10.6084/m9.figshare.21894681.v1.

Funding

The authors received no funding for this work.
