Sometimes the input to document-processing tasks such as OCR, table detection or text segmentation is a scan or a hand-held photo that does not have an ideal perspective: the document is rotated or spatially distorted in some way (a warped document). If you are looking for my recommendations, go straight to the last section of this article, Summary and recommendations. This article was inspired by the list of OCR-related projects collected in awesome-ocr. To give readers a sense of each project's popularity, its number of GitHub stars is included (as of Feb 2022, when this article was written). To distinguish actively developed projects from those that no longer receive commits, the date of the last commit is included as well.
Typical approach for deskewing
Deskewing is typically done by using Canny edge detection and the Hough transform to determine the angle of rotation (skew), and then rotating the image by that angle in the opposite direction.
Typical approach for dewarping
Reconstructing the spatial (3D) structure of the document is typically done with a deep learning approach.
List of approaches presented in this article
- Summary and recommendations
Dewarping
Page dewarp (1.1k stars)
Last commit: Oct 2016, but a reworked version is actively developed (last commit: 24 Jan 2022)
page_dewarp - Page dewarping and thresholding using a “cubic sheet” model
Read more here: Page dewarping
NOTE: It is written in Python 2.
Since mzucker's original work was written in Python 2 and not developed further, there was an initiative to renovate the original scripts. The result is page-dewarp, which is available on PyPI and is pip-installable (pip install page-dewarp).
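As we read mzucker's writeup, the "cubic sheet" model treats the page as a surface whose depth varies only along the horizontal direction, following a cubic curve that is zero at both page edges and is controlled by the slopes α and β at those edges. Solving those four constraints gives f(x) = (α+β)x³ − (2α+β)x² + αx. page_dewarp then searches for the slopes, together with the page pose and the text-line positions, so that the projected model matches the detected text. The sketch below covers only the surface model, assuming x is normalized to [0, 1]; the function names are hypothetical and are not page_dewarp's API.

```python
import numpy as np


def cubic_sheet_coeffs(alpha, beta):
    """Coefficients (highest power first) of the cubic f with
    f(0) = 0, f(1) = 0, f'(0) = alpha, f'(1) = beta."""
    return np.array([alpha + beta, -2.0 * alpha - beta, alpha, 0.0])


def sheet_depth(x, alpha, beta):
    """Out-of-plane displacement of the page at normalized horizontal position x."""
    return np.polyval(cubic_sheet_coeffs(alpha, beta), x)


def lift_to_3d(page_xy, alpha, beta):
    """Turn flat page coordinates (x, y) into 3D points (x, y, f(x)); y does not affect depth."""
    page_xy = np.asarray(page_xy, dtype=float)
    z = sheet_depth(page_xy[:, 0], alpha, beta)
    return np.column_stack([page_xy, z])
```

The 3D points from `lift_to_3d` would then be projected into the photo with a camera model (for example OpenCV's `cv2.projectPoints`) and compared with the detected text to score a candidate (α, β, pose).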
MORAN (579 stars)
Last commit: 30 Jul 2019
MORAN_v2 - A Multi-Object Rectified Attention Network for Scene Text Recognition
Written in Python, using PyTorch
NOTE: The project is only free for academic research purposes.
DewarpNet (291 stars)
Last commit: 6 Sep 2021
See the DewarpNet project web page. Here is how the authors characterize their solution in the abstract of the paper:
DewarpNet, a deep learning approach for document image unwarping from a single image. Our insight is that the 3D geometry of the document not only determines the warping of its texture but also causes the illumination effects. Therefore, our novelty resides on the explicit modeling of 3D shape for document paper in an end-to-end pipeline.
DewarpNet pre-trained models are available for download from Google Drive.
Document Image Dewarping - algorithm (241 stars)
Last commit: 30 Sep 2019
Document-Image-Dewarping - Document image dewarping based on text lines and line segments.
The repository does not contain public source code, only a description of the algorithm and an executable available for download.
Unproject Text (104 stars)
Last commit: 13 Oct 2016
unproject_text - Perspective recovery of text using transformed ellipses.
It is not exactly dewarping, but perspective correction does more than determine document rotation, which is why it was placed in the dewarping section.
Written in Python, it is fairly lightweight, using numpy, scipy, cv2,…
In a nutshell, letters are replaced with ellipses and the axes of ellipses are used to determine what affine transformation is needed to correct perspective:
Image source: repository owner’s writeup
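To make the ellipse idea concrete, here is a simplified numpy sketch, not the repository's algorithm: it fits an ellipse to the coordinates of a single blob (in practice, the pixels of one letter, e.g. `np.argwhere(mask)`) via its second moments and builds the linear map that turns that ellipse back into a circle. If the blob was a roughly circular shape distorted by an affine transform, applying this map undoes the distortion up to rotation and scale. The function names are hypothetical.

```python
import numpy as np


def fit_ellipse_moments(points):
    """Fit an ellipse to 2D points using their mean and covariance (second moments)."""
    pts = np.asarray(points, dtype=float)
    center = pts.mean(axis=0)
    cov = np.cov(pts - center, rowvar=False)
    # Eigenvectors give the ellipse axis directions; sqrt of eigenvalues is
    # proportional to the axis lengths
    eigvals, eigvecs = np.linalg.eigh(cov)
    return center, np.sqrt(eigvals), eigvecs


def circularizing_map(points):
    """2x2 linear map that sends the fitted ellipse to a circle (whitening transform)."""
    _, axes, vecs = fit_ellipse_moments(points)
    # Rotate into the axis frame, scale both axes to the same length, rotate back
    return vecs @ np.diag(1.0 / axes) @ vecs.T
```

A single blob cannot tell rotation apart from a true circle, which is one reason a real method has to combine evidence from many letters.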
Docuwarp (83 stars)
Last commit: 18 Oct 2021
Docuwarp - An application of high resolution GANs to dewarp images of perturbed documents. This project is focused on dewarping document images through the usage of pix2pixHD, a GAN that is useful for general image to image translation. The objective is to take images of documents that are warped, folded, crumpled, etc. and convert the image to a “dewarped” state by using pix2pixHD to train and perform inference.
Written in Python.
Book content segmentation and dewarping (under construction) (11 stars)
Last update of code: 2018
Book Content Segmentation and Dewarping - Using an FCN (fully convolutional network) to segment the image into 3 parts (left page, right page and background).
Segmentation demo is available here: https://raymondmgwx.github.io/?e=Project_BookContent&&theme=Image-Process-Content.
NOTE: data augmentation and the dewarping algorithm are still on this project's TODO list.
Deskewing
Unpaper (770 stars)
Last commit: 21 Jan 2022
unpaper is a post-processing tool for scanned sheets of paper, especially for book pages that have been scanned from previously created photocopies.
The main purpose is to make scanned book pages better readable on screen after conversion to PDF. Additionally, unpaper might be useful to enhance the quality of scanned pages before performing optical character recognition (OCR).
unpaper tries to clean scanned images by removing dark edges that appeared through scanning or copying on areas outside the actual page content (e.g. dark areas between the left-hand-side and the right-hand-side of a double-sided book-page scan).
The program also tries to detect misaligned centering and rotation of pages and will automatically straighten each page by rotating it to the correct angle. This process is called “deskewing”.
Written mostly in C.
Alyn (222 stars)
Last commit: 14 Jun 2017
Alyn - Skew detection and correction in images containing text. It uses Canny Edge Detection and Hough Transform to determine skew.
How Alyn's skew detection works:
- Converts the image to greyscale
- Performs Canny edge detection on the image
- Computes the Hough transform
- Determines the peaks
- Determines the deviation of each peak from a 45-degree angle
- Segregates the detected peaks into bins
- Chooses the probable skew angle using the values in the bins
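The vote-then-bin idea in these steps can be sketched with numpy alone. This is a simplified illustration, not Alyn's code: it assumes the edge map is already computed (for example with Canny), skips Alyn's deviation-from-45° step, takes the strongest accumulator cells as peaks without non-maximum suppression, and uses a hypothetical 1° bin width. Angles are in image coordinates (y pointing down), so a positive skew means lines descend to the right.

```python
import numpy as np


def hough_accumulator(edges, thetas):
    """Hough voting: every edge pixel votes for rho = x*cos(theta) + y*sin(theta)."""
    ys, xs = np.nonzero(edges)
    diag = int(np.ceil(np.hypot(*edges.shape)))
    acc = np.zeros((2 * diag + 1, len(thetas)), dtype=np.int64)
    for j, theta in enumerate(thetas):
        # Shift rho by diag so that negative distances map to valid row indices
        rho = np.round(xs * np.cos(theta) + ys * np.sin(theta)).astype(int) + diag
        acc[:, j] = np.bincount(rho, minlength=acc.shape[0])
    return acc


def estimate_skew(edges, num_peaks=10, bin_width_deg=1.0):
    """Vote, take the strongest Hough cells as peaks, bin their angles, pick the fullest bin."""
    # Normals between 45 and 135 degrees, i.e. lines within +/-45 degrees of horizontal
    thetas = np.deg2rad(np.arange(45.0, 135.0, 0.1))
    acc = hough_accumulator(edges, thetas)
    # Peaks: the num_peaks cells with the most votes
    flat_idx = np.argsort(acc, axis=None)[::-1][:num_peaks]
    _, theta_idx = np.unravel_index(flat_idx, acc.shape)
    skews = np.rad2deg(thetas[theta_idx]) - 90.0
    # Segregate the candidate skews into fixed-width bins and keep the most populated one
    bins = np.floor(skews / bin_width_deg).astype(int)
    labels, counts = np.unique(bins, return_counts=True)
    best = labels[np.argmax(counts)]
    return float(np.mean(skews[bins == best]))
```

Binning before averaging means a handful of peaks from unrelated structures (a stray diagonal, a picture border) end up in their own bins instead of pulling the mean away from the dominant text-line angle.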
Alyn is written in Python and can be installed with pip (pip install alyn).
Deskew (211 stars)
Last commit: 10 Feb 2022
galfar/deskew (102 stars)
Last commit: 6 Jan 2022
galfar/deskew - Deskew is a command-line tool for deskewing scanned text documents. It uses the Hough transform to detect “text lines” in the image. As output, you get an image rotated so that the lines are horizontal.
Binaries are built for these platforms: Win64, Win32, Linux 64bit, macOS, Linux ARMv7. A GUI frontend for this CLI tool is available as well (Windows, Linux, and macOS).
NOTE: It is written in Pascal.
Skew correction (12 stars)
skew_correction - Deskewing images with slanted content by finding the deviation using Canny Edge Detection.
Deskewing (8 stars)
Last commit: 12 Jan 2014
deskewing - Contains code to deskew images using MLPs, LSTMs and LLS transformations. Written in Python.
Text deskewing (5 stars)
Last commit: 9 Mar 2018
text_deskewing - Rotate text images if they are not straight for better text detection and recognition. Uses Canny Edge Detection and probabilistic Hough Transform.
It is written in Python, and the repository does not contain much code, so it is easy to follow and to learn how these simple techniques can be used to deskew text.
Summary and recommendations
What to use for Deskewing?
- If you need to deskew and additionally clean up scanning artefacts in the document, use: unpaper
- If you just need to correct the rotation of the document, use: Alyn or deskew
- If you want to learn how edge detection and the Hough transform are used for document deskewing, you might want to have a look at: text_deskewing
What to use for Unwarping and Deskewing?
- For dewarping book pages with smooth bends, consider using page-dewarp (a renovated version of the popular page_dewarp).
- For more complex dewarping, e.g. folded pages, use deep learning based solutions such as DewarpNet or Docuwarp.
- If you are working with flat pages and just need to correct perspective, unproject_text might be the right tool for you.
What to use for Document Segmentation?
Document segmentation was outside the scope of this article. You can check the Document Segmentation section of awesome-ocr.
- awesome-ocr - rich collection of OCR-related projects and tools
- Image used in the header of this article comes from the paper DewarpNet: Single-Image Document Unwarping With Stacked 3D and 2D Regression Networks. Sagnik Das, Ke Ma, Zhixin Shu, Dimitris Samaras, Roy Shilkrot; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 131-140
- All other images in the article come from the project owners or related sources (papers, articles)