12 results tagged pdf

Pulse AI Blog - Why LLMs Suck at OCR

LLMs suck at complex OCR, and probably will for a while. LLMs are excellent for many text-generation or summarization tasks, but they falter at the precise, detail-oriented job of OCR, especially when dealing with complicated layouts, unusual fonts, or tables. These models get lazy, often not following prompt instructions across hundreds of pages, failing to parse information, and “thinking” too much.

LLMs process images through high-dimensional embeddings, essentially creating abstract representations that prioritize semantic understanding over precise character recognition.

Consider a simple table cell containing "1,234.56". The LLM might understand this represents a number in the thousands, but lose critical information about:

  • Exact decimal placement
  • Whether commas or periods are used as separators
  • Font characteristics indicating special meaning
  • Alignment within the cell (right-aligned for numbers, etc.)
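
To make the separator point concrete, here is a minimal Python sketch (not taken from the linked article) of how the same digits change value when the wrong separator convention is assumed. The helper name `parse_number` and its parameters are hypothetical, chosen only for illustration.

```python
from decimal import Decimal


def parse_number(text: str, decimal_sep: str = ".", group_sep: str = ",") -> Decimal:
    """Parse a formatted number under an explicitly stated separator convention.

    Hypothetical helper for illustration; real documents need locale-aware parsing.
    """
    # Drop the grouping separator first, then normalize the decimal separator.
    cleaned = text.strip().replace(group_sep, "").replace(decimal_sep, ".")
    return Decimal(cleaned)


# US-style text read with the US convention.
us = parse_number("1,234.56")

# European-style text read with the matching convention.
eu = parse_number("1.234,56", decimal_sep=",", group_sep=".")

# European-style text read with the US convention: off by a factor of 1000.
misread = parse_number("1.234,56")

print(us, eu, misread)
```

An OCR step that preserves the digits but swaps or drops a separator produces exactly this kind of silent, plausible-looking error.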

https://news.ycombinator.com/item?id=42966958

https://www.runpulse.com/blog/why-llms-suck-at-ocr
February 12, 2025 at 10:56:59 AM EST *
llm gemini pdf ocr

Ingesting Millions of PDFs and why Gemini 2.0 Changes Everything

Markdown extraction is just the first step. For documents to be effectively used in RAG pipelines, they must be split into smaller, semantically meaningful chunks.

Recent studies have shown that using large language models (LLMs) for this task can outperform other strategies in terms of retrieval accuracy. This intuitively makes sense - LLMs excel at understanding context and identifying natural boundaries in text, making them well-suited for generating semantically meaningful chunks.

The problem? Cost. Until now, LLM-based chunking has been prohibitively expensive. With Gemini Flash 2.0, however, the game changes again: its pricing makes it feasible to chunk documents at scale.
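
For comparison, the sketch below shows a plain heuristic baseline for the chunking step. It is not the LLM-based chunking the article describes. It splits extracted Markdown at headings and packs paragraphs up to a size limit. The function name `chunk_markdown` and the `max_chars` parameter are assumptions made for illustration.

```python
import re


def chunk_markdown(markdown: str, max_chars: int = 1000) -> list[str]:
    """Split Markdown into chunks at headings, packing paragraphs up to max_chars.

    Heuristic stand-in only; an LLM-based chunker would pick boundaries by meaning.
    """
    # Split just before each heading line so a heading stays with its body.
    sections = re.split(r"^(?=#{1,6} )", markdown, flags=re.MULTILINE)
    chunks: list[str] = []
    for section in sections:
        current = ""
        for para in re.split(r"\n\s*\n", section.strip()):
            para = para.strip()
            if not para:
                continue
            candidate = f"{current}\n\n{para}" if current else para
            if current and len(candidate) > max_chars:
                # Adding this paragraph would exceed the limit: close the chunk.
                chunks.append(current)
                current = para
            else:
                current = candidate
        if current:
            chunks.append(current)
    return chunks


doc = "# Intro\n\nFirst paragraph.\n\n## Details\n\nSecond paragraph."
chunks = chunk_markdown(doc)
print(chunks)
```

A single paragraph longer than `max_chars` is kept whole in this sketch; a production splitter would need a fallback for that case.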

https://news.ycombinator.com/item?id=42952605

(Disclaimer: I am CEO of LlamaIndex, which includes LlamaParse.)
Nice article! We're actively benchmarking Gemini 2.0 right now and if the results are as good as implied by this article, heck we'll adapt and improve upon it. Our goal (and in fact the reason our parser works so well) is to always use and stay on top of the latest SOTA models and tech :) - we blend LLM/VLM tech with best-in-class heuristic techniques.

Some quick notes:

  1. I'm glad that LlamaParse is mentioned in the article, but it's not mentioned in the performance benchmarks. I'm pretty confident that our most accurate modes are at the top of the table benchmark - our stuff is pretty good.

  2. There's a long tail of issues beyond just tables - this includes fonts, headers/footers, ability to recognize charts/images/form fields, and as other posters said, the ability to have fine-grained bounding boxes on the source elements. We've optimized our parser to tackle all of these modes, and we need proper benchmarks for that.

  3. DIY'ing your own pipeline to run a VLM at scale to parse docs is surprisingly challenging. You need to orchestrate a robust system that can screenshot a bunch of pages at the right resolution (which can be quite slow), tune the prompts, and make sure you're obeying rate limits + can retry on failure.
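
The "retry on failure" part of such a pipeline might look like the sketch below: exponential backoff with jitter around a single page-parsing call. `RateLimitError`, `call_with_retry`, and `flaky_parse_page` are hypothetical names used for illustration; they are not part of LlamaParse or any Gemini client.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error a VLM client might raise when it is rate limited."""


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on RateLimitError with exponential backoff plus jitter."""
    attempt = 0
    while True:
        try:
            return fn()
        except RateLimitError:
            attempt += 1
            if attempt >= max_attempts:
                raise  # Out of attempts: surface the error to the caller.
            # Delay doubles on each retry; jitter spreads out concurrent workers.
            sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay))


# Simulated page parser that is rate limited twice before succeeding.
calls = {"n": 0}


def flaky_parse_page() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError("429")
    return "# page markdown"


delays: list[float] = []  # Record delays instead of actually sleeping.
result = call_with_retry(flaky_parse_page, base_delay=0.01, sleep=delays.append)
print(result, delays)
```

Passing `sleep` as a parameter keeps the helper testable without real waiting; in a real pipeline the default `time.sleep` would apply.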

https://www.sergey.fyi/articles/gemini-flash-2
February 12, 2025 at 10:51:05 AM EST *
llm pdf google gemini ocr

Poppler - PDF Rendering Library

Poppler is a PDF rendering library based on the xpdf-3.0 code base.

https://poppler.freedesktop.org/
July 15, 2024 at 2:35:09 PM EDT *
pdf

SVG to EMF | CloudConvert

SVG to EMF Converter - CloudConvert is a free & fast online file conversion service.

https://graphicdesign.stackexchange.com/questions/60996/illustrator-emf-export-loses-precision

https://cloudconvert.com/svg-to-emf
April 22, 2023 at 12:40:17 PM EDT *
pdf svg emf microsoft

PDF document creation with Markup languages | PMPERRY [blogs.perl.org]

New, powerful features have recently been added to PDF::Builder and PDF::Table, enabling faster and easier high-level generation of PDF documents. The versions are respectively 3.025 and 1.005, and are available on CPAN.

As well as the ability to "pour" text into a document's defined page areas, and have it flow easily over pages, this new version also enables high level text formatting with markup languages, as well as much-enhanced font management. The markup supports Markdown (via Text::Markdown) and a large (and growing) subset of HTML/CSS, as well as simple paragraphs-only markup, and for the first time can be used to format cells in PDF::Table. This is far from the final version, as many improvements are in the pipeline for this functionality, and are expected to be released over the coming one to two years. These will include, among other things, proper word hyphenation and probably some form of paragraph shaping, such as Knuth-Plass. The full list (at this time) is at https://github.com/PhilterPaper/Perl-PDF-Builder/issues/195 .

https://blogs.perl.org/users/pmperry/2023/01/pdf-document-creation-with-markup-languages.html
February 3, 2023 at 2:19:17 PM EST *
perl pdf markdown

Sejda helps with your PDF tasks

https://www.sejda.com/
August 1, 2017 at 4:06:53 PM EDT *
pdf

Remove passwords from PDF files | At The Core

http://blog.marcus-brinkmann.de/2011/06/08/remove-password-from-pdf/
December 6, 2011 at 3:53:02 PM EST *
pdf

Multivalent Home Page

http://multivalent.sourceforge.net/
November 4, 2010 at 11:50:47 AM EDT *
java pdf

List of PDF Editing tools for Ubuntu  | Ubuntu Geek

http://www.ubuntugeek.com/list-of-pdf-editing-tools-for-ubuntu.html
October 21, 2009 at 9:25:39 AM EDT *
pdf

PDF to TIFF conversion : pdf, convert, tiff

http://www.experts-exchange.com/Web_Development/Document_Imaging/Adobe_Acrobat/Q_21347292.html
February 13, 2008 at 1:25:54 PM EST *
pdf

Internet Explorer file downloads over SSL do not work with the cache control headers

http://support.microsoft.com/?kbid=323308
September 27, 2007 at 2:56:26 PM EDT *
microsoft iis ssl pdf

java.net: Generating PDFs for Fun and Profit with Flying Saucer and iText

http://today.java.net/pub/a/today/2007/06/26/generating-pdfs-with-flying-saucer-and-itext.html
June 27, 2007 at 8:53:44 AM EDT *
java pdf