Document Recognition from Photos: 9 Challenges

Across our customer projects using PRISM Classify & Content (for content recognition), we have encountered a wide range of documents.

Documents photographed with a cell phone camera pose special challenges. Blurred, skewed, distorted, or crumpled: many things can interfere with recognition. To process this source material successfully, we have developed a pipeline that helps us interpret it reliably. It is adapted to each project, but generally consists of the following nine steps:

  1. Black/white conversion
    Color tones are removed as a basis for better text recognition (of course, only if color markings are irrelevant for the project in question).
  2. Brightness and contrast correction
    To make the text stand out clearly from the background, both values are adjusted.
  3. Sharpness
    Where the edges of the characters are not crisp enough, we sharpen them.
  4. Horizontal alignment (rotation)
    In customer photos especially, documents are often not aligned horizontally or vertically. We rotate them so that text lines can be found reliably.
  5. Removal of noise artifacts
    Images degraded by camera noise or file format compression can be improved before recognition by using AI to reduce the artifacts.
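
To make steps 1 and 2 more concrete, here is a minimal sketch in plain Python. It is not our production code: the function names, the tiny sample image, and the fixed threshold of 128 are assumptions chosen for illustration. The grayscale weights are the common ITU-R BT.601 luminance coefficients.

```python
def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) tuples to rows of luminance values (0-255)."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


def stretch_contrast(gray_image):
    """Linearly rescale so the darkest pixel becomes 0 and the brightest 255."""
    flat = [p for row in gray_image for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # uniform image: nothing to stretch
        return [row[:] for row in gray_image]
    scale = 255 / (hi - lo)
    return [[round((p - lo) * scale) for p in row] for row in gray_image]


def binarize(gray_image, threshold=128):
    """Map each pixel to 0 (ink) or 255 (paper) using a fixed threshold (assumed)."""
    return [[0 if p < threshold else 255 for p in row] for row in gray_image]


# A tiny low-contrast "photo": grey text pixels on a slightly lighter background.
photo = [
    [(120, 120, 120), (160, 160, 160)],
    [(160, 160, 160), (120, 120, 120)],
]
gray = to_grayscale(photo)
stretched = stretch_contrast(gray)
binary = binarize(stretched)
```

A fixed global threshold is the simplest possible choice. Photos with uneven lighting usually need a locally adaptive threshold instead.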

The order of these steps is varied for each project, and the variant that gives the best results for the customer is used.
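
One way to picture this search for the best order is a brute-force comparison of all orderings. The sketch below is only an illustration under assumptions: `best_step_order`, the two toy steps, and the `contrast` score are hypothetical stand-ins. In a real project the score would more likely be something like recognition accuracy on sample documents.

```python
from itertools import permutations


def best_step_order(image, steps, score):
    """Try every ordering of `steps`; return (names of best order, best score)."""
    best_names, best_score = None, float("-inf")
    for order in permutations(steps):
        result = image
        for _name, step in order:
            result = step(result)
        current = score(result)
        if current > best_score:
            best_names, best_score = [name for name, _ in order], current
    return best_names, best_score


# Toy steps on a one-dimensional "image" (a list of pixel values).
def brighten(pixels):
    return [min(255, p + 100) for p in pixels]


def stretch(pixels):
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return list(pixels)
    return [round((p - lo) * 255 / (hi - lo)) for p in pixels]


def contrast(pixels):
    """Hypothetical quality score: spread between brightest and darkest pixel."""
    return max(pixels) - min(pixels)


order, quality = best_step_order(
    [100, 200], [("brighten", brighten), ("stretch", stretch)], contrast
)
```

Here brightening before stretching wins, because brightening last would reduce the contrast again. Note that the number of orderings grows factorially: five steps already yield 120 variants.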

The remaining four steps are the most complex, which is why we describe them in more detail.

6. Depth of field correction

When a document is photographed from a certain angle, only part of it may be in the camera's focus. The example to the right is, of course, an extreme case (although you'd be surprised at the customer photos we've already seen), and it illustrates the problem well:

The depth of field in this shot is very shallow, and therefore only the front or rear part could be captured in focus. The front (green) was chosen – making the rear (red) unreadable.

In the example shown, it is very difficult to extract any information from the blurred part. In less extreme cases, however, our AI copes very well by correcting only the affected part of the shot. This requires a smooth correction in which the sharpening filters gradually increase in strength. The bottom line is that this kind of shooting error can be counteracted quite well.
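
The idea of a smooth correction with increasing sharpness can be sketched as follows. This is a simplified illustration, not our AI-based method: it assumes the first image row is in focus and the last row is the most blurred, and it uses a basic unsharp mask along each row. The names `sharpen_row`, `graduated_sharpen`, and `max_amount` are made up for this example.

```python
def sharpen_row(row, amount):
    """Unsharp mask on one row: pixel + amount * (pixel - local 3-pixel mean)."""
    out = []
    last = len(row) - 1
    for x, p in enumerate(row):
        left = row[max(x - 1, 0)]      # repeat the edge pixel at the borders
        right = row[min(x + 1, last)]
        blurred = (left + p + right) / 3
        value = p + amount * (p - blurred)
        out.append(max(0, min(255, round(value))))  # clamp to valid range
    return out


def graduated_sharpen(gray_image, max_amount=2.0):
    """Sharpen rows progressively: first row untouched, last row gets max_amount."""
    last_row = max(len(gray_image) - 1, 1)
    return [
        sharpen_row(row, max_amount * y / last_row)
        for y, row in enumerate(gray_image)
    ]


# Three identical rows containing a soft edge between dark and light.
soft = [[100, 100, 160, 160] for _ in range(3)]
sharpened = graduated_sharpen(soft)
```

The edge in the last row ends up with the strongest contrast, while the first row stays unchanged. Sharpening of this kind can only emphasize edges that are still present; it cannot restore detail that the blur has destroyed completely.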

7. Perspective correction

In a situation similar to the previous one, the angle may be poorly chosen, but the content is at least sharp (or has been sharpened in step 6).

Nevertheless, text recognition faces a completely different hurdle with such a photo: the text converges to the right in a trapezoidal shape, so the characters are much larger on the left side than at the end of each line.

Here too, we have trained an AI to apply the appropriate geometric corrections. The text lines that converge radially are made parallel again, and the document as a whole can be reshaped correctly based on the detected line geometry.
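
The geometric part of such a correction is a projective transformation (homography). The sketch below assumes that the four corners of the document have already been found, which in our pipeline is the job of the trained AI; it then only shows how a trapezoid can be mapped back onto a rectangle. The function names and the sample coordinates are illustrative assumptions.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography(src, dst):
    """Return the 8 coefficients mapping four src points onto four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    return solve_linear_system(a, b)


def warp_point(h, x, y):
    """Apply the homography to one point, including the projective division."""
    w = h[6] * x + h[7] * y + 1
    return ((h[0] * x + h[1] * y + h[2]) / w, (h[3] * x + h[4] * y + h[5]) / w)


# Document corners in the photo: tall on the left, compressed on the right.
src = [(0, 0), (100, 20), (100, 80), (0, 100)]
# Where those corners belong in the corrected, rectangular image.
dst = [(0, 0), (100, 0), (100, 100), (0, 100)]
h = homography(src, dst)
```

With `h` computed, `warp_point(h, x, y)` tells where any photo coordinate lands in the corrected image. The sketch fails with a division by zero if three of the four corners lie on one line.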

8. Correction of distorted areas

The challenge from step 7 can get even harder: in the picture on the right, you can see that the paper hangs over the edge of a table at a certain point. As a result, a bend starts there, and the depth of field can aggravate the problem further.

Targeted training of the AI on such special cases gave us surprisingly reliable results. The recognition of text lines pays off in particular here: for a well-trained AI, correcting the unusual deformation is easy. A challenge that would be a Sisyphean task with conventional, rule-based programming clearly shows the advantages of self-learning AI networks.
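
To illustrate why detected text lines are so useful for this kind of correction, here is a strongly simplified sketch. It assumes that the vertical position of one text line (its baseline) is already known for every image column, and it straightens the image by shifting each column vertically. The real deformation is two-dimensional and is handled by our trained network; `straighten_columns` and the sample data are hypothetical.

```python
def straighten_columns(gray_image, baseline, background=255):
    """Shift each column vertically so the detected baseline becomes flat.

    baseline[x] is the row index where the text line sits in column x.
    """
    height, width = len(gray_image), len(gray_image[0])
    target = min(baseline)  # align every column to the topmost baseline point
    out = [[background] * width for _ in range(height)]
    for x in range(width):
        shift = baseline[x] - target
        for y in range(height):
            src_y = y + shift
            if 0 <= src_y < height:  # pixels shifted in from outside stay background
                out[y][x] = gray_image[src_y][x]
    return out


# A text line (0 = ink) that bends downward towards the right.
bent = [
    [255, 255, 255, 255],
    [0, 0, 255, 255],
    [255, 255, 0, 255],
    [255, 255, 255, 0],
]
flat = straighten_columns(bent, baseline=[1, 1, 2, 3])
```

After the correction, all ink pixels of the line sit in the same row, which is what a text recognizer expects.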

9. Wrinkling correction (Dewarping)

Some customers seem to like to carry their documents around in their pockets before submitting them for further processing. Photographed documents like the one pictured on the left do indeed occur. And even if they first appear to be an insurmountable obstacle for text recognition, we can reassure you: It works!

In fact, it works so well that we ourselves were quite surprised by the results of our AI. And we’ll explore this in more detail in another blog post, where we’ll share some before/after pictures.

We hope this has given you an entertaining insight into our AI-powered document processing from photographs. If you’re facing similar challenges yourself, or now feel that we can handle your complex projects, feel free to drop us a line!

Harald Kerschhofer

Harald was one of the first developers at LinkThat and has been producing creative content for and about our products since completing his media studies.