I am working on an OCR pipeline that extracts structured information from scanned identity documents. The system performs reasonably well on high-quality images, but accuracy drops significantly on inputs with:

  • Blurry or low-resolution scans

  • Uneven lighting and shadows

  • Rotated or skewed images

  • Compression artifacts from mobile uploads

The main fields I need to extract are names, dates, document numbers, and addresses.

So far, I have tried image preprocessing techniques such as resizing, denoising, contrast enhancement, and deskewing before running OCR. While these help in some cases, there are still frequent recognition errors on critical fields.
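To illustrate the contrast enhancement step mentioned above, here is a minimal, dependency-free sketch of a percentile-based contrast stretch. The function name `stretch_contrast`, the 2/98 percentile defaults, and the list-of-rows grayscale representation are assumptions made for the example, not the actual pipeline code; a real system would more likely do this with an image library.

```python
def stretch_contrast(image, low_pct=2.0, high_pct=98.0):
    """Linearly rescale 8-bit grayscale values so that the low_pct and
    high_pct percentiles map to 0 and 255 (values outside are clipped).

    `image` is a list of rows, each row a list of ints in 0..255.
    (Hypothetical representation chosen to keep the sketch self-contained.)
    """
    pixels = sorted(value for row in image for value in row)
    if not pixels:
        return []

    def percentile(pct):
        # Nearest-rank style lookup into the sorted pixel values.
        index = round((pct / 100.0) * (len(pixels) - 1))
        return pixels[index]

    lo = percentile(low_pct)
    hi = percentile(high_pct)
    if hi <= lo:
        # Flat image: nothing to stretch, return an unchanged copy.
        return [list(row) for row in image]

    scale = 255.0 / (hi - lo)
    return [
        [min(255, max(0, round((value - lo) * scale))) for value in row]
        for row in image
    ]


if __name__ == "__main__":
    # A dull, low-contrast 2x2 patch with values squeezed into 100..130.
    patch = [[100, 110], [120, 130]]
    print(stretch_contrast(patch, low_pct=0, high_pct=100))
```

Clipping at percentiles rather than at the absolute minimum and maximum keeps a few very dark or very bright pixels (for example, scanner borders or glare spots) from dominating the stretch. Note that this is a global operation, so it does not correct uneven lighting across the page on its own.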

For those who have built production OCR systems, what preprocessing techniques or OCR architectures have given the biggest improvement in accuracy for low-quality document images? Are transformer-based OCR models significantly better than traditional OCR engines in this scenario?