What Is a “Hallucination”?
"Hallucination" is an evocative word. It makes you think of something surreal or bizarre. But, in the context of LLMs, it's when the model generates text that feels confident and plausible but is factually incorrect or entirely fictional. If you ask it to name Santa’s 15 reindeer, it would much rather give you 15 names rather than tell you that there are only nine to list.
Avoiding Hallucinations in Production
This might be amusing when naming reindeer, but accuracy becomes critical when you're running LLMs millions of times in production and there’s no human to inspect each output. Our customers need reliability without a human-in-the-loop, so we’ve had to embrace techniques that minimize hallucinations.
There are many strategies for doing this, but the two big categories are Training and Grounding.
Training
LLMs can be trained to respond with uncertainty when they don’t have enough data to answer. Most modern foundation models do this pretty well for simple use cases.
For example, asking:
User: What holidays are occurring this week?
will usually be met with some variant of:
Assistant: To provide an accurate response, I would need to know the specific week you are referring to, including the year.
But as the use cases become more complex, and as you start guiding the LLM to respond in specific formats, that training can quickly be overridden. When we ask that same foundation model to list Santa’s 15 reindeer in plain prose, the training holds up fine:
Traditionally, Santa Claus is said to have nine reindeer. Here is a list of the commonly known reindeer:
1. Dasher
2. Dancer
3. Prancer
4. Vixen
5. Comet
6. Cupid
7. Donner
8. Blitzen
9. Rudolph
However, as soon as we ask it to output a JSON list of Santa’s 15 reindeer, we’re met with:
["Dasher","Dancer","Prancer","Vixen","Comet","Cupid","Donner","Blitzen","Rudolph","Chet","Snowball","Ginger","Frosty","Mistletoe","Jingle"]
Grounding
A highly reliable way to make hallucinations all but disappear is to “ground” the LLM in truth: give it explicit data to work with so it doesn’t need to recall anything from memory. The model isn’t left guessing or relying on fuzzy facts it picked up in training; the probabilities it uses are anchored to the data you’ve handed it. It’s like saying, “Hey, here’s the actual data schema, the business rules, the acceptance criteria. Stick to these.” When it’s done right, the results are remarkable. You can run a data transformation pipeline 100,000 times, or analyze thousands of documents against compliance rules, and it nails it every single time.
As a concrete example, suppose we had a news reporter agent that was instructed to tell us all about the positive things that happened in the news today.
Persona: You are a positive-vibes news reporter that summarizes today's current events into a top-5 bulleted list of the positive things that happened today.
If we don’t give it any other context and simply ask what happened today, the LLM will happily make up positive news for us:
Assistant: Here are today's top 5 positive news highlights:
1. Innovative Climate Solution Launched: A team of young scientists have unveiled a new eco-friendly technology that effectively removes carbon dioxide from the atmosphere, offering a promising tool in the fight against climate change.
2. Historic Peace Agreement Reached: Two long-standing rival countries have signed a historic peace agreement, opening up new opportunities for trade, collaboration, and cultural exchanges that promise a brighter future for their citizens.
3. Community Garden Project Blossoms: An urban community has come together to transform an unused city lot into a thriving garden that provides fresh produce to local families, promoting healthy eating and strengthening community bonds.
4. Breakthrough in Medical Research: Researchers have announced a breakthrough in cancer treatment, with a new therapy showing remarkable success rates in early trials, bringing hope to patients and families worldwide.
5. Kindness Campaign Gains Momentum: A kindness initiative encouraging people to perform random acts of gratitude and compassion has gone global, with millions joining in to spread positivity and make the world a better place.
None of that is real. It’s all a complete fabrication. But it sounds nice, doesn’t it?
However, if we ground the LLM in reality by simply supplying the HTML from CNN’s homepage as a reference, we get:
Assistant: Here are today's top five positive news highlights:
1. Charlie Woods' Golfing Triumph: Charlie Woods, son of Tiger Woods, celebrated a momentous achievement by hitting his first hole-in-one at the PNC Championship, sharing the proud moment with his legendary father.
2. Heartwarming Beatles Reunion: Music legends Ringo Starr and Paul McCartney delighted fans with a live performance of classic Beatles hits, spreading joy and nostalgia in a rare reunion that captured headlines.
3. World's Steepest Cable Car Opens: Adventure enthusiasts can now experience the thrill of the world's steepest cable car, which has opened in Switzerland, offering breathtaking views and a thrilling ride.
4. Innovative Environmental Solution: Researchers are exploring the potential of using squid bone sponges to help remove microplastics from our waters, promising a cleaner future for our oceans and aquatic life.
5. Transformational Vagus Nerve Discovery: A new study suggests that vagus nerve stimulation could offer relief from treatment-resistant depression, providing hope and a new treatment avenue for those impacted by mental health challenges.
All of these are legitimate stories straight from CNN’s homepage today. Nothing in that response was fabricated.
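Mechanically, the grounding here is nothing exotic: fetch the page and hand its contents to the model alongside the persona. A rough sketch, again assuming the OpenAI Python SDK, with a placeholder model name and an arbitrary truncation limit to keep the HTML within the context window:

import requests
from openai import OpenAI

client = OpenAI()

PERSONA = (
    "You are a positive-vibes news reporter that summarizes today's current events "
    "into a top-5 bulleted list of the positive things that happened today. "
    "Only use stories found in the reference material you are given."
)

# Fetch the reference material the model should be grounded in.
html = requests.get("https://www.cnn.com", timeout=30).text

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[
        {"role": "system", "content": PERSONA},
        # Supplying the raw HTML anchors the response to stories that actually exist.
        {
            "role": "user",
            "content": "Reference material:\n" + html[:100_000] + "\n\nWhat positive things happened today?",
        },
    ],
)
print(response.choices[0].message.content)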
Why Grounding Works
It's all about narrowing the model's predictive space. When you don't ground the model, it's trying to predict the next word across the entire universe of possibilities it's been trained on—everything from Hallmark holiday movies to heated Reddit arguments about Christmas cookie recipes. But when you ground it, you're giving it specific data to reference, so it can work with concrete facts instead of fuzzy probabilities.
Real-World Applications
Now you might be wondering: "If you can give the LLM all of the data it needs, doesn't that restrict the use cases it can handle?" Yes, this approach won't work for every problem, but it works for far more than you’d think. Consider how much business work happens with humans referencing data right in front of them—in spreadsheets, emails, or PDFs—and applying reasoning or following processes based on that information. LLMs excel at doing exactly that, with human-like reasoning but machine-like consistency, at scale.
Here are examples of the kinds of things our system does every day where grounding makes the magic happen reliably over and over and over again:
Automated inventory moderation: Managing inventory for e-commerce companies, assigning product taxonomies, standardizing titles, descriptions, and more.
Email processing: Working through email threads to extract specific data and requirements, and to classify communications.
Customer support: Using your organization’s rules and historical data to accurately assign tickets and suggest proven solutions.
Contract analysis: Reviewing legal documents against specified criteria to identify key clauses and risks.
Document parsing: Converting highly varied documents with inconsistent formats, such as invoices, into consistent, well-structured data that follows your business rules (a rough sketch of this pattern follows the list).
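To make the document-parsing item concrete, here is a rough sketch of what a grounded extraction call can look like. The field names, business rules, and model name are hypothetical, and this isn’t our production pipeline, just the shape of the pattern: the document and the rules go into the prompt, and the model is told to use nothing else.

import json
from openai import OpenAI

client = OpenAI()

# Hypothetical business rules the extraction must follow.
BUSINESS_RULES = """
- Dates must be formatted as YYYY-MM-DD.
- Amounts are plain numbers in USD, with no currency symbols.
- If a field is not present in the document, return null. Never guess.
"""

def parse_invoice(invoice_text: str) -> dict:
    # Ground the model in the invoice text and the rules, then ask for structured JSON.
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        response_format={"type": "json_object"},
        messages=[
            {
                "role": "system",
                "content": "You extract invoice data. Use only the provided document. "
                           "Follow these rules exactly:\n" + BUSINESS_RULES,
            },
            {
                "role": "user",
                "content": "Document:\n" + invoice_text + "\n\nReturn a JSON object with keys: "
                           "vendor, invoice_number, invoice_date, total_amount.",
            },
        ],
    )
    return json.loads(response.choices[0].message.content)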
Grounding allows our customers to build powerful workflows where AI can handle their complex tasks with human-like reasoning and machine-like consistency.
It’s not just an LLM-mas miracle. It’s reality for our users every day.
Learn more about LOGIC, Inc. at https://logic.inc