Tom MacWright

2025@macwright.com

Hallucination city

My current feeling about using LLMs for coding problems is:

  1. If it's a refactoring problem that could be done with ast-grep, LLMs will be able to do it but they'll take longer and be worse at it. But it's still worth using them sometimes if I'm feeling lazy.
  2. If it's a coding problem that isn't in the hard parts of TypeScript and doesn't need lots of context, the LLMs have a 70% chance of getting it right, but they'll leave their calling cards, like sloppy types, redundant methods, copypasta code, and comments.
  3. If it's something that I haven't been able to figure out by reading the documentation, the odds that the LLMs are going to waste my time are 80%.

Here's the third case. I wanted to look up something I was pretty sure I had written on Mastodon and the search wasn't working. For fun, I thought I could export my archive and search them with DuckDB. The export was easy but the archives are shaped like

{
  "@context": [
  ],
  "id": "outbox.json",
  "type": "OrderedCollection",
  "totalItems": 1227,
  "orderedItems": [
    {
      "id": "https://mastodon.social/users/tmcw/statuses/1680721/activity",
      "type": "Create",
      "actor": "https://mastodon.social/users/tmcw",
      "published": "2017-04-05T15:27:06Z",

So, instead of being a nice root-level array like DuckDB would want, the information that I want to look at is under the orderedItems key. How do I do this? Let's ask from LLMs.

My question:

Let's say I'm using DuckDB and want to load some data like this:

{
   "entries": [
      {"a":1},
      {"a":2}
   ]
}

In a file called outbox.json. And I want the entries part to be essentially what's read into a table, so that I have a table with column 'a' and rows with values 1, 2

The LLMs

Claude 4.5 😵‍💫 Hallucination:

SELECT * FROM read_json('file.json', records='entries');

The records parameter can be 'auto', 'true', or 'false'. This is made up.

ChatGPT 5 😵‍💫 Hallucination

SELECT *
FROM read_json_auto('data.json', jsonpath='$.entries');

There is no jsonpath parameter. That's not a thing.

ChatGPT 5 😵‍💫 Hallucination

-- read the JSON file as a single JSON object
WITH raw AS (
  SELECT read_json('data.json') AS doc
)
-- extract and expand the 'entries' array
SELECT *
FROM raw,
LATERAL json_each(doc->'entries');

You can't use read_json like that because it's a table function, not a scalar function.

Kimi K2

SELECT *
FROM (
  SELECT unnest(json_extract(read_json('data.json'), '$.entries')) AS entry
) AS t
SELECT entry['a'] AS a
FROM t;

Wrong for the same reason as ChatGPT 5.

Qwen 3 Coder

SELECT * FROM read_json_objects('outbox.json', '$.entries');

This hallucinates that read_json_objects has a second parameter that takes a jsonpath expression. It doesn't.

GLM-4.6

CREATE TABLE entries AS SELECT * FROM read_json_auto('outbox.json', json_format='auto', auto_detect=true);

This hallucinates a json_format parameter for read_json_auto.

Gemini 2.5 Flash

SELECT
  (json_each.value)->>'a' AS a
FROM
  read_json_objects('outbox.json') AS t,
  json_each(t.entries) AS json_each;

This is the closest attempt I think because it is valid SQL and is able to run. It doesn't accomplish the goal, but it doesn't hallucinate.


In conclusion, the correct solution was this, which Claude gave as its 4th alternative solution.

SELECT unnest(entries, recursive := true)
FROM read_json('file.json');