Generating Content with ChatGPT - Perl Hacks
Back in January, I wrote a blog post about adding JSON-LD to your web pages to make it easier for Google to understand what they were about. The example I used was my ReadABooker site, which encourages people to read more Booker Prize shortlisted novels (and to do so by buying them using my Amazon…
How did *thinking* reasoning LLMs go from a GitHub experiment 4 months ago to every major company offering super-advanced thinking models that can iterate on code and plan it internally only 4 months later? It seems a bit fast. Was it already developed by major companies, but unreleased? : MLQuestions
It was like a revelation when chain-of-thought AI became viral news as a GitHub project that supposedly competed with SOTA models with only 2 developers and some nifty prompting...
Did all the companies just jump on the bandwagon and weave it into GPT / Gemini / Claude in a hurry?
Did those companies already have e.g. Gemini 2.5 Pro thinking in development 4 months ago and we just didn't know?
A simple search engine from scratch* | Max Bernstein
*if you include word2vec.
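The post builds retrieval on top of word2vec-style vectors. As a rough illustration of the core ranking idea only (my sketch; the post's own code differs), embed a document or query as the mean of its word vectors and score by cosine similarity:

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Assumes a %vec table mapping words to same-length numeric vectors,
# e.g. loaded from a word2vec file (not shown here).
sub mean_vector {
    my @vecs = @_ or return;
    my $dim = @{ $vecs[0] };
    return [ map { my $i = $_; sum(map $_->[$i], @vecs) / @vecs } 0 .. $dim - 1 ];
}

sub cosine {
    my ($x, $y) = @_;
    my ($dot, $nx, $ny) = (0, 0, 0);
    for my $i (0 .. $#$x) {
        $dot += $x->[$i] * $y->[$i];
        $nx  += $x->[$i] ** 2;
        $ny  += $y->[$i] ** 2;
    }
    my $norm = sqrt($nx) * sqrt($ny);
    return $norm ? $dot / $norm : 0;
}

# Embed a text as the mean of its known word vectors.
sub embed {
    my ($text, $vec) = @_;
    my @known = grep { defined } map { $vec->{ lc $_ } } split /\W+/, $text;
    return @known ? mean_vector(@known) : undef;
}
```

Ranking is then just sorting documents by cosine($query_vec, $doc_vec), descending.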
Why the Coolest Job in Tech Might Actually Be in a Bank
For tech and AI talent, jobs at financial services companies are more desirable than they have ever been. Banks have been working hard to make it happen.
Presentation Slide Templates | Beautiful.ai
Build your next presentation in minutes with our free slide templates! No matter what you’re creating, Beautiful.ai has the template for you.
Personal Software: The Unbundling of the Programmer?
Why LLMs will transform development but not how you think
It's about how AI tools are enabling a new category of software that simply couldn't exist before.
When someone can describe their specific needs conversationally and receive working code in response, the economics of personal software development shift dramatically.
Think of it this way: just as spreadsheets enabled non-programmers to perform complex calculations and data analysis, AI-assisted development tools are enabling non-programmers to create personal software solutions.
Pulse AI Blog - Why LLMs Suck at OCR
LLMs suck at complex OCR, and probably will for a while. LLMs are excellent for many text-generation or summarization tasks, but they falter at the precise, detail-oriented job of OCR—especially when dealing with complicated layouts, unusual fonts, or tables. These models get lazy, often not following prompt instructions across hundreds of pages, failing to parse information, and “thinking” too much.
LLMs process images through high-dimensional embeddings, essentially creating abstract representations that prioritize semantic understanding over precise character recognition.
Consider a simple table cell containing "1,234.56". The LLM might understand this represents a number in the thousands, but lose critical information about (see the sketch after this list):
- Exact decimal placement
- Whether commas or periods are used as separators
- Font characteristics indicating special meaning
- Alignment within the cell (right-aligned for numbers, etc.)
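To make that concrete, here's a toy sketch (mine, not from the Pulse post): score a plausible misread of that cell where the model preserves the "meaning" but swaps the separators into the European convention. The character error rate is far from zero even though the two strings look semantically interchangeable.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Classic dynamic-programming Levenshtein distance between two strings.
sub levenshtein {
    my ($x, $y) = @_;
    my @x = split //, $x;
    my @y = split //, $y;
    my @prev = (0 .. scalar @y);
    for my $i (1 .. @x) {
        my @cur = ($i);
        for my $j (1 .. @y) {
            my $cost = $x[ $i - 1 ] eq $y[ $j - 1 ] ? 0 : 1;
            push @cur, min($prev[$j] + 1, $cur[ $j - 1 ] + 1, $prev[ $j - 1 ] + $cost);
        }
        @prev = @cur;
    }
    return $prev[-1];
}

my $truth = '1,234.56';    # what the cell actually contains
my $guess = '1.234,56';    # same "number", separators swapped (EU style)

# Character error rate: edit distance normalised by reference length.
my $cer = levenshtein($truth, $guess) / length $truth;
printf "CER: %.2f\n", $cer;    # 0.25 - a quarter of the characters are wrong
```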
Ingesting Millions of PDFs and why Gemini 2.0 Changes Everything
Markdown extraction is just the first step. For documents to be effectively used in RAG pipelines, they must be split into smaller, semantically meaningful chunks.
Recent studies have shown that using large language models (LLMs) for this task can outperform other strategies in terms of retrieval accuracy. This intuitively makes sense - LLMs excel at understanding context and identifying natural boundaries in text, making them well-suited for generating semantically meaningful chunks.
The problem? Cost. Until now, LLM-based chunking has been prohibitively expensive. With Gemini Flash 2.0, however, the game changes again: its pricing makes it feasible to chunk documents at scale.
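For a sense of what "LLM-based chunking" means in practice, here's a minimal sketch (mine, not from the article). It asks Gemini Flash to mark semantic boundaries with an explicit separator token; the endpoint shape and model name follow Google's public REST docs at the time of writing, so treat them as assumptions:

```perl
use strict;
use warnings;
use Mojo::UserAgent;

my $key   = $ENV{GEMINI_API_KEY} or die "Set GEMINI_API_KEY\n";
my $model = 'gemini-2.0-flash';
my $url   = "https://generativelanguage.googleapis.com/v1beta/"
          . "models/$model:generateContent?key=$key";

my $markdown = do { local $/; <> };    # a page of Markdown on STDIN
my $prompt   = <<"PROMPT";
Split the following Markdown into semantically coherent chunks of
roughly 250-500 words. Do not rewrite anything; copy the text verbatim
and insert a line containing only <<<CHUNK>>> between chunks.

$markdown
PROMPT

# One generateContent call per page; the response text comes back with
# our separator token at each semantic boundary.
my $tx = Mojo::UserAgent->new->post($url => json =>
    { contents => [ { parts => [ { text => $prompt } ] } ] });
die $tx->result->message unless $tx->result->is_success;

my $text   = $tx->result->json->{candidates}[0]{content}{parts}[0]{text};
my @chunks = split /^<<<CHUNK>>>$/m, $text;
printf "%d chunks\n", scalar @chunks;
```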
https://news.ycombinator.com/item?id=42952605
(Disclaimer: I am CEO of LlamaIndex, which includes LlamaParse.)
Nice article! We're actively benchmarking Gemini 2.0 right now and if the results are as good as implied by this article, heck we'll adapt and improve upon it. Our goal (and in fact the reason our parser works so well) is to always use and stay on top of the latest SOTA models and tech :) - we blend LLM/VLM tech with best-in-class heuristic techniques.
Some quick notes: 1. I'm glad that LlamaParse is mentioned in the article, but it's not mentioned in the performance benchmarks. I'm pretty confident that our most accurate modes are at the top of the benchmark table - our stuff is pretty good.
2. There's a long tail of issues beyond just tables - this includes fonts, headers/footers, the ability to recognize charts/images/form fields and, as other posters said, the ability to have fine-grained bounding boxes on the source elements. We've optimized our parser to tackle all of these modes, and we need proper benchmarks for that.
3. DIY'ing your own pipeline to run a VLM at scale to parse docs is surprisingly challenging. You need to orchestrate a robust system that can screenshot a bunch of pages at the right resolution (which can be quite slow), tune the prompts, and make sure you're obeying rate limits and can retry on failure.
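That last point (rate limits and retries) is generic plumbing. A minimal sketch of the usual approach, exponential backoff with jitter, not tied to any particular vendor (the names here are my own):

```perl
use strict;
use warnings;

# Retry a request with exponential backoff plus jitter. $request is a
# code ref returning (HTTP status, body).
sub with_retries {
    my ($request, %opt) = @_;
    my $max  = $opt{max_tries}  // 5;
    my $base = $opt{base_delay} // 1;    # seconds
    for my $try (1 .. $max) {
        my ($status, $body) = $request->();
        return $body if $status == 200;
        # 429 (rate limited) and 5xx are worth retrying; anything else
        # deserves its own handling in a real system.
        die "Giving up after $try tries (status $status)\n" if $try == $max;
        sleep int($base * 2 ** ($try - 1) + rand() + 0.5);
    }
}

# Usage: my $json = with_retries(sub { ... screenshot page + VLM call ... });
```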
Which AI to Use Now: An Updated Opinionated Guide
Picking your general-purpose AI
Also:
https://www.oneusefulthing.org/p/doing-stuff-with-ai-opinionated-midyear
Collaborators needed to bring full OpenAI support to Perl
Thus, that module was deprecated in favor of Nelson's OpenAPI::Client::OpenAI module. Throw the 13K+ line OpenAPI spec for OpenAI at it and it just works. Further, the module is pretty much a single Perl class rather than a bunch of hand-crafted code.
CPAN authors know it can be hard to keep modules up-to-date (mea culpa, mea culpa!) and this module is no exception. I need this module, so I offered to collaborate and created a PR to update it to version 2.0.0 of the OpenAI spec. It now passes all the tests (for those wondering, you need an OpenAI key and it costs $0.04 USD to run the test suite).
In trying to build a Whisper pipeline for that, I found that I couldn't. There was a PR for Whisper support for the older module, but for the newer one, I can't figure out how to get it to issue a request with multipart/form-data support. I've noted the issue in the PR.
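For anyone hitting the same wall, one possible workaround while the PR is sorted out (my sketch, not part of OpenAPI::Client::OpenAI) is to drop down to Mojo::UserAgent, whose form generator switches to multipart/form-data whenever a field value is a hash ref with a file key:

```perl
use strict;
use warnings;
use Mojo::UserAgent;

# Call the Whisper transcription endpoint directly. Mojo::UserAgent's
# "form" generator sends multipart/form-data automatically when a
# field value is a hash ref containing a "file" key.
my $key = $ENV{OPENAI_API_KEY} or die "Set OPENAI_API_KEY\n";
my $tx  = Mojo::UserAgent->new->post(
    'https://api.openai.com/v1/audio/transcriptions'
        => { Authorization => "Bearer $key" }
        => form => {
            model => 'whisper-1',
            file  => { file => 'meeting.mp3' },    # path to local audio
        },
);
die $tx->result->message unless $tx->result->is_success;
print $tx->result->json->{text}, "\n";
```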
Photoshop for text — Steph Ango
In the near future, transforming text over an entire document will become as commonplace as filtering images.
Up until now, text editors have been focused on input. The next evolution of text editors will make it easy to alter, summarize and lengthen text. You’ll be able to do this for entire documents, not just individual sentences or paragraphs. The filters will be instantaneous and as good as if you wrote the text yourself. You will also be able to do this with local files, on your device, without relying on remote servers.
In “A camera for ideas”, I coined the term synthography to describe synthetic images created with generative models.
Text generator plugin for Obsidian to generate text content using GPT-3 (OpenAI) | GitHub - nhaouari/obsidian-textgenerator-plugin
Text generator is a handy plugin for Obsidian that helps you generate text content using GPT-3 (OpenAI).
Use Text Generator to generate ideas, attractive titles, summaries, outlines, and whole paragraphs based on your knowledge database.
14islands | The art of prompting: An introduction to Midjourney
A great deal of my learning and inspiration comes from the great content by Yubin Ma at AiTuts, where you can learn more about prompting and view a myriad of examples.
GitHub - abi/screenshot-to-code: Drop in a screenshot and convert it to clean code (HTML/Tailwind/React/Vue)
Drop in a screenshot and convert it to clean code (HTML/Tailwind/React/Vue).
Ask HN: Tutorial on LLM / already grasp neural nets | Hacker News
I've watched the 4 videos from 3blue1brown on neural nets. The web and YouTube are awash with mediocre videos on Large Language Models. I'm looking for a good one.
This is part of a longer series but is maybe the single best video I know of on the topic:
https://youtu.be/kCc8FmEb1nY?si=zmBleKwlpV06O3Mw
I thought this video from Stephen Wolfram was also quite good:
https://www.youtube.com/live/flXrLGPY3SU?si=SrP1EJFMPJqVCFPL
GitHub - varunshenoy/opendream: An extensible, easy-to-use, and portable diffusion web UI 👨‍🎨
An extensible, easy-to-use, and portable diffusion web UI 👨‍🎨
LLM: A CLI utility and Python library for interacting with Large Language Models
A CLI utility and Python library for interacting with Large Language Models, both via remote APIs and models that can be installed and run on your own machine.
What are embeddings?
A deep-dive into machine learning embeddings.
How to Use AI to Do Stuff: An Opinionated Guide
Covering the state of play as of Summer 2023.
Patterns for Building LLM-based Systems & Products
Evals, RAG, fine-tuning, caching, guardrails, defensive UX, and collecting user feedback.