Pulse AI Blog - Why LLMs Suck at OCR

LLMs suck at complex OCR, and probably will for a while. LLMs are excellent for many text-generation or summarization tasks, but they falter at the precise, detail-oriented job of OCR, especially when dealing with complicated layouts, unusual fonts, or tables. These models get lazy: they often stop following prompt instructions across hundreds of pages, fail to parse information, and "think" too much.

LLMs process images through high-dimensional embeddings, essentially creating abstract representations that prioritize semantic understanding over precise character recognition.

Consider a simple table cell containing "1,234.56". The LLM might understand this represents a number in the thousands, but lose critical information about:

Exact decimal placement
Whether commas or periods are used as separators
Font characteristics indicating special meaning
Alignment within the cell (right-aligned for numbers, etc.)
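To make the separator point concrete, here is a minimal Python sketch (not from the original post) showing that the same characters can mean values 1,000 times apart depending on which convention is assumed. `parse_number` is a hypothetical helper written for illustration; it only handles the two separators it is given, whereas real locale-aware parsing covers far more cases.

```python
from decimal import Decimal


def parse_number(text: str, thousands_sep: str, decimal_sep: str) -> Decimal:
    """Parse a numeric string under an explicit separator convention.

    Hypothetical helper for illustration only: it strips the thousands
    separator, then normalizes the decimal separator to ".".
    """
    cleaned = text.replace(thousands_sep, "")
    if decimal_sep != ".":
        cleaned = cleaned.replace(decimal_sep, ".")
    return Decimal(cleaned)


# The same value written under two conventions.
us = parse_number("1,234.56", thousands_sep=",", decimal_sep=".")
eu = parse_number("1.234,56", thousands_sep=".", decimal_sep=",")
print(us, eu, us == eu)  # 1234.56 1234.56 True

# The same characters "1,234" read under each convention.
print(parse_number("1,234", ",", "."))  # 1234
print(parse_number("1,234", ".", ","))  # 1.234
```

If an OCR step keeps the digits but loses which mark is the decimal separator, a downstream parser has no way to tell these readings apart.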

https://news.ycombinator.com/item?id=42966958

https://www.runpulse.com/blog/why-llms-suck-at-ocr
February 12, 2025 at 10:56:59 AM EST
llm gemini pdf ocr
Shaarli · The personal, minimalist, super fast, database-free, bookmarking service by the Shaarli community · Documentation