Konomic

OCR (Optical Character Recognition)

OCR is a technology that converts images of text (scanned documents, photos, or image-only PDFs) into editable, searchable, machine-readable text.

How OCR works

OCR engines analyze the shapes of characters in an image, match them against known letter patterns, and output digital text. Modern OCR engines use machine learning models trained on millions of text samples, which keeps them accurate across dozens of fonts, languages, and document layouts.
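To make the idea of matching shapes to known letter patterns concrete, here is a minimal sketch of classic template matching. The tiny 3x5 bitmaps and the function names are invented for illustration. Modern engines, as noted above, rely on trained models rather than fixed templates like these.

```python
# Hypothetical 3x5 bitmaps ('#' = ink, '.' = background) for a tiny alphabet.
TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
    "O": ["###", "#.#", "#.#", "#.#", "###"],
}


def hamming(a, b):
    """Count the pixels at which two same-sized glyph bitmaps differ."""
    return sum(
        pa != pb
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )


def recognize(glyph):
    """Return the template letter whose bitmap differs least from `glyph`."""
    return min(TEMPLATES, key=lambda letter: hamming(glyph, TEMPLATES[letter]))


# A noisy "T": one pixel of the top bar is missing.
noisy_t = ["##.", ".#.", ".#.", ".#.", ".#."]
print(recognize(noisy_t))  # prints "T"
```

The nearest template wins even when a pixel is wrong, which is roughly why small amounts of scan noise do not necessarily change the recognized character.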

The typical OCR pipeline has four stages: preprocessing (deskewing, denoising, binarization), layout analysis (finding columns, paragraphs, tables), character recognition (identifying individual letters), and postprocessing (dictionary correction, formatting recovery).
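As an illustration of the preprocessing stage, the sketch below binarizes a grayscale image with Otsu's method, one common way to choose a threshold. The text above does not say which binarization method any particular engine uses, so treat the choice of Otsu, the function names, and the sample pixel values as assumptions.

```python
def otsu_threshold(pixels):
    """Return the grey level (0-255) that best splits dark from light pixels.

    Otsu's method picks the threshold that maximizes between-class variance.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    weight_dark, sum_dark = 0, 0
    for t in range(256):
        weight_dark += hist[t]            # pixels with value <= t
        sum_dark += t * hist[t]
        if weight_dark == 0:
            continue                      # no dark class yet
        weight_light = total - weight_dark
        if weight_light == 0:
            break                         # every pixel is already "dark"
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        var_between = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image):
    """Map a grayscale image (rows of 0-255 ints) to 1 = ink, 0 = background."""
    t = otsu_threshold([p for row in image for p in row])
    return [[1 if p <= t else 0 for p in row] for row in image]


# Made-up sample: a dark stroke on a light, slightly uneven background.
image = [
    [250, 245, 30, 240],
    [248, 25, 35, 243],
    [252, 247, 28, 249],
]
for row in binarize(image):
    print(row)
```

Later stages (layout analysis and character recognition) would then work on the clean ink/background mask instead of the raw grey values.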

When you need OCR

  • Converting scanned paper documents into searchable PDFs
  • Extracting text from photographs of books, receipts, or whiteboards
  • Making old document archives searchable
  • Reading text from screenshots for accessibility
  • Digitizing forms and invoices for data entry

Accuracy and language support

Modern OCR typically achieves 95-99% accuracy on clean printed text. Accuracy drops with poor scan quality, unusual fonts, handwriting, or complex layouts. Tesseract 5 (the open-source engine Konomic uses) supports 100+ languages across scripts including Latin, Cyrillic, CJK, Arabic, and Hebrew.
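One common way to put a number on OCR accuracy is at the character level: count the edits needed to turn the OCR output into a known-correct transcript, then divide by the transcript length. The sketch below assumes that definition; the figures quoted above may have been measured differently, and the sample strings are made up.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def char_accuracy(reference, ocr_output):
    """Return 1 - (edit distance / reference length), clamped to [0, 1]."""
    if not reference:
        return 1.0 if not ocr_output else 0.0
    errors = edit_distance(reference, ocr_output)
    return max(0.0, 1.0 - errors / len(reference))


# Typical confusions: I/l, l/1, and 0/O.
reference = "Invoice total: 100.00"
ocr_output = "lnvoice tota1: 100.O0"
print(edit_distance(reference, ocr_output))
print(f"{char_accuracy(reference, ocr_output):.1%}")
```

Measured this way, a short string with a few look-alike substitutions scores well below the 95-99% range, which shows how quickly errors add up on short fields such as invoice totals.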

OCR vs. text extraction: the difference

A digitally created PDF already contains text that can be selected and copied, so no OCR is needed. OCR is required only when the text exists purely as an image (scanned pages, photos). If you can't select text in a PDF by clicking and dragging, you probably need OCR.
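The click-and-drag check can be approximated in code: extract each page's text layer and flag pages where almost nothing comes back. The sketch below assumes the per-page strings have already been extracted by a PDF library (for example, pypdf's `page.extract_text()`). The function name and the 20-character cutoff are hypothetical choices, not settings from any tool.

```python
def pages_needing_ocr(page_texts, min_chars=20):
    """Return 0-based indexes of pages whose text layer looks empty.

    `page_texts` holds one already-extracted string per page. `min_chars`
    is an assumed cutoff: pages with fewer non-whitespace characters are
    treated as image-only and flagged for OCR.
    """
    flagged = []
    for index, text in enumerate(page_texts):
        visible = "".join(text.split())  # drop all whitespace
        if len(visible) < min_chars:
            flagged.append(index)
    return flagged


# Made-up example: page 0 has a real text layer, pages 1 and 2 do not.
pages = [
    "A digitally created page with plenty of selectable text.",
    "",
    "  \n ",
]
print(pages_needing_ocr(pages))  # prints [1, 2]
```

A cutoff like this is only a heuristic. A scanned page that carries a short digital stamp or page number could still slip through, so the threshold may need tuning per document set.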
