What is OCR? (Optical Character Recognition)

OCR (Optical Character Recognition)

OCR is a technology that converts images of text, such as scanned documents, photos, or image-only PDFs, into editable, searchable, machine-readable text.

How OCR works

OCR engines analyze the shapes of characters in an image, match them to known letter patterns, and output digital text. Modern OCR engines use machine learning models trained on millions of text samples, which keeps them accurate across many fonts, languages, and document layouts.
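The idea of matching character shapes to known letter patterns can be illustrated with a toy template matcher. This is only a sketch of the classic approach, not how a modern machine-learning engine works internally: the 3x5 bitmaps, the `TEMPLATES` table, and the `hamming` and `recognize` helpers are all invented for this example.

```python
# Toy template matching: each glyph is a 3x5 bitmap written as rows of "#" and ".".
# These templates are made up for illustration and come from no real OCR engine.
TEMPLATES = {
    "I": ("###",
          ".#.",
          ".#.",
          ".#.",
          "###"),
    "L": ("#..",
          "#..",
          "#..",
          "#..",
          "###"),
    "T": ("###",
          ".#.",
          ".#.",
          ".#.",
          ".#."),
}


def hamming(a, b):
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(pa != pb
               for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))


def recognize(glyph):
    """Return the label of the template with the fewest differing pixels."""
    return min(TEMPLATES, key=lambda label: hamming(glyph, TEMPLATES[label]))


# A "T" with one pixel lost to scan noise in the fourth row.
noisy_t = ("###",
           ".#.",
           ".#.",
           "...",
           ".#.")

print(recognize(noisy_t))
```

The noisy glyph differs from the "T" template by one pixel, from "I" by three, and from "L" by nine, so the nearest template is still "T".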

The typical OCR pipeline has four stages: preprocessing (deskewing, denoising, binarization), layout analysis (finding columns, paragraphs, tables), character recognition (identifying individual letters), and postprocessing (dictionary correction, formatting recovery).
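Two of these stages are simple enough to sketch with the Python standard library alone: binarization from preprocessing and dictionary correction from postprocessing. The function names, the fixed threshold of 128, the 0.75 similarity cutoff, and the tiny vocabulary are assumptions chosen for illustration. Real engines typically use adaptive thresholds and much larger language models.

```python
from difflib import get_close_matches


def binarize(gray, threshold=128):
    """Preprocessing: map grayscale pixels (0-255) to 1 for ink, 0 for background."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def correct_word(word, vocabulary, cutoff=0.75):
    """Postprocessing: replace a word with its closest vocabulary entry, if one is close enough."""
    if word.lower() in vocabulary:
        return word
    matches = get_close_matches(word.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


# A 3x3 grayscale patch: dark pixels (low values) are ink.
gray = [[250, 30, 245],
        [240, 25, 235],
        [20, 35, 40]]
binary = binarize(gray)
print(binary)

# Typical recognition slips: "l" read for "i", "1" read for "l".
vocabulary = ["invoice", "total", "amount"]
raw = "lnvoice tota1 amount 2024"
cleaned = " ".join(correct_word(w, vocabulary) for w in raw.split())
print(cleaned)
```

Words with no close vocabulary match, such as the number `2024`, pass through unchanged, which keeps the correction step from mangling figures and names.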

When you need OCR

Converting scanned paper documents into searchable PDFs
Extracting text from photographs of books, receipts, or whiteboards
Making old document archives searchable
Reading text from screenshots for accessibility
Digitizing forms and invoices for data entry

Accuracy and language support

Modern OCR achieves 95-99% accuracy on clean printed text. Accuracy drops with poor scan quality, unusual fonts, handwriting, or complex layouts. Tesseract 5 (the open-source engine Konomic uses) supports more than 100 languages, covering Latin, Cyrillic, CJK, Arabic, and Hebrew scripts.
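The text does not say how these accuracy figures are measured. One common reading is character-level accuracy: one minus the edit distance between the OCR output and a known-correct reference, divided by the reference length. The sketch below assumes that definition, and the sample strings and function names are invented for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def char_accuracy(reference, ocr_output):
    """Character accuracy as 1 - (edit distance / reference length), floored at 0."""
    if not reference:
        return 1.0 if not ocr_output else 0.0
    errors = edit_distance(reference, ocr_output)
    return max(0.0, 1 - errors / len(reference))


reference = "Optical Character Recognition"
ocr_output = "0ptical Character Recogniti0n"  # two letter O's misread as zeros
print(f"{char_accuracy(reference, ocr_output):.1%}")
```

Two wrong characters in a 29-character reference give about 93.1%, which shows how quickly a few misreads pull a short string below the 95% mark.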

OCR vs text extraction — the difference

A digitally created PDF already contains text that can be selected and copied, so no OCR is needed. OCR is required only when the text exists solely as an image (scanned pages, photos). If you can't select text in a PDF by clicking and dragging, you probably need OCR.
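The same check can be automated once a PDF library has extracted whatever text layer each page has. The sketch below assumes the per-page strings are already available (the extraction step itself is out of scope here). The `pages_needing_ocr` helper and its `min_chars` threshold of 20 are hypothetical choices, not a standard rule.

```python
def pages_needing_ocr(page_texts, min_chars=20):
    """Return 1-based page numbers whose extracted text layer is empty or nearly so."""
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        # Ignore whitespace so blank lines do not count as content.
        if len("".join(text.split())) < min_chars
    ]


# Hypothetical extraction results: page 1 has a text layer, pages 2 and 3 are scans.
page_texts = [
    "Invoice 1042\nBilled to: Example Ltd.\nTotal due: 310.00",
    "",
    "  \n ",
]
flagged = pages_needing_ocr(page_texts)
print(flagged)
```

A threshold above zero helps with scanned pages that carry only a stray page number or stamp as real text. The right value depends on the documents involved.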
