Testing reading comprehension in 15 LLMs

Robot reading a book in a library
Image by Gemini Imagen 3

POSSIBLE SPOILER ALERT

This article discusses asking various models a question about my novel, Hell on $5 A Day, after providing the full text as context or via RAG (retrieval-augmented generation). The specific question addresses how one character dies. It’s a mild spoiler, but it won’t spoil the book.

What am I doing?

This morning, I’ve been feeding multiple AI models a copy of Hell on $5 A Day and asking each of them the same question: “How and where does George die? Address the manner of death, the location of the death, and the level of purgatory on which the death happens.” Of the few questions I used for early tests, this one tended to generate the most incorrect answers, in both quantity and severity.

Why am I doing this?

I plan to write the next novel in this series next year. I’m already a number of chapters in, and the narrative will move to post-WWII Paris and Los Angeles. I need to be able to feed an LLM data about those locales and use it as a tour guide. There are also a few skill sets where I’ll feed the LLM books on how to use the skill, then ask it to help me flesh out operational details when characters use it. For all of that, I need an LLM that is less likely to get facts wrong or hallucinate complete nonsense.

How am I doing this?

This has been done with a mix of interfaces: LM Studio (for running models locally on my laptop), Perplexity.ai (a multi-model service I got a free year of from my ISP), and Google Gemini Pro (free trial of pro features).
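For the local models, this kind of same-question-to-every-model test can be scripted instead of run by hand. The sketch below assumes LM Studio’s built-in local server with its OpenAI-compatible chat endpoint at its default address (http://localhost:1234/v1); the function names, the novel’s filename, and the model identifiers in the comments are illustrative, not from LM Studio itself.

```python
import json
import urllib.request

# LM Studio's local server speaks the OpenAI chat-completions protocol.
# The port below is its default; check the Server tab in your install.
LMSTUDIO_URL = "http://localhost:1234/v1/chat/completions"

# The fixed test question from the article.
QUESTION = (
    "How and where does George die? Address the manner of death, "
    "the location of the death, and the level of purgatory on which "
    "the death happens."
)

def build_request(model: str, question: str, context: str) -> dict:
    """Build one chat-completion payload: the novel's text goes in the
    system message, the fixed test question in the user message."""
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Answer using only this novel text:\n" + context},
            {"role": "user", "content": question},
        ],
        "temperature": 0,  # deterministic output makes models comparable
    }

def ask(model: str, question: str, context: str) -> str:
    """POST the payload to the local server and return the model's reply."""
    payload = json.dumps(build_request(model, question, context)).encode()
    req = urllib.request.Request(
        LMSTUDIO_URL, data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Example (needs the server running and a text file of the novel):
#   novel = open("hell_on_5_a_day.txt").read()   # hypothetical filename
#   for model in ("c4ai-command-r-v01-4bit", "hermes-3-8b"):
#       print(model, "->", ask(model, QUESTION, novel))
```

Looping over every loaded model with the same payload removes one source of noise: each model sees the identical context and question, so differences in the answers come from the models themselves.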

First, the answer: George dies of wounds suffered during a battle with a giant serpent in the “Valley of Kings” on the second level of Purgatory (the novel takes place mostly in the afterlife). George was possibly destined for the 7th ring of Hell, but when he dies he doesn’t go there and is able to stay in Purgatory.

The results

Each entry gives the model name, the interface used (method), and the query results.

WINNER: Google Gemini 1.5 Advanced (Web – Google). This gave the most concise and factual response, getting the nature of the battle, how he gets the wounds, and where it happens.

RUNNER UP: c4ai-command-r-v01-4bit (LM Studio). While a little sparse on details, it does get that he died of wounds sustained in battle on the second level of Purgatory and that his soul remains there.

WORST: Grok 2 (Web – Perplexity). Claims he commits suicide in Sacramento (the word “Sacramento” does not appear in the novel). It then follows up to say he dies in Purgatory’s antechamber, which is sort of correct, but also a slam on Sacramento.

SECOND WORST: Llama 3.2 8B (LM Studio). Hallucinates that he’s stabbed by Eric in self-defense after he tries to kill Eric and Kurt. Wrong location, too.

Hermes 3 8B (LM Studio). Gets that he’s attacked by a creature (though not in battle), and the claim that it happens in a dark cavern is partially correct, but it also posits a figurative Purgatory, not the actual place.

Perplexity Default (Web – Perplexity). This is tuned for web search and seemed to ignore the uploaded novel, coming back with no results.

Qwen Coder 32B (LM Studio). Gets fatal wounds, but that’s it. Speculates on whether it’s actually in Purgatory and warns that the work is fiction.

GPT-4o (Web – Perplexity). Claims he’s attacked by a vampire on a military base; that happens to another character entirely. Then it speculates that the afterlife in the book is metaphorical.

CohereForAI.aya-expanse-32b-GGUF (LM Studio). Gets fatal wounds, a battle, and the two characters with him when he dies, but offers no details on the attacker, gets the location wrong, and it too speculates about metaphor.

Sonar Huge (Web – Perplexity). Gets the second-level location, but then goes off into a hallucination about him being torn apart by angry souls with their eyes sewn shut.

Gemma 2 27B (LM Studio). Gets wounds sustained in battle, but not how, and claims it happens in “an unspecified outdoor location.”

Sonar (Web – Perplexity). Has the same problem as GPT-4o and substitutes another character’s death for George’s.

Phi 3.1 Mini 128k Instruct (LM Studio). Gets injuries, but thinks the death happens in a cavern on Earth and incorrectly claims George’s body materializes on the 7th ring of Hell.

Aya 23 8B (LM Studio). Gets battle wounds, but interprets them as stab wounds, not bite wounds. Can’t state the location, but has an incorrect theory.

internlm2_5-20B-chat (LM Studio). Gets that it’s battle wounds, but not how they were received. Gets the location wrong, assuming discussion of the afterlife is a metaphor. Does warn that it’s basing its answer on short excerpts, not the full text.

So there you have it. The best performance came from Google’s Gemini 1.5 Advanced, which gave the most correct facts without supposition or hallucination, possibly because of its enormous context window (one million tokens). It costs $20 a month, while the runner-up, Cohere’s Command R, is free and runs offline on my laptop.

Stay tuned for more, and if you’d like to read the novel in question, Hell on $5 A Day is available for electronic or physical purchase on Amazon.
