
Should LLMs just treat text content as an image?

https://news.ycombinator.com/item?id=45652952

According to the DeepSeek paper, you can pull out 10 text tokens from a single image token with near-100% accuracy. In other words, a model’s internal representation of an image is ten times as efficient as its internal representation of text. Does this mean that models shouldn’t consume text at all? When I paste a few paragraphs into ChatGPT, would it be more efficient to convert that into an image of text before sending it to the model? Can we supply 10x or 20x more data to a model at inference time by supplying it as an image of text instead of text itself?

https://www.seangoedecke.com/text-tokens-as-image-tokens/
June 4, 2026 at 4:10:54 PM EDT
ocr ai llm pdf
Shaarli · The personal, minimalist, super fast, database-free, bookmarking service by the Shaarli community · Documentation