“Hello OpenAI, How Do I Replace the Cartridge in ChatGPT?” Why your prints aren’t getting fainter — or why ChatGPT is not a fax.
Misunderstanding how generative AI works isn’t just common — it’s becoming institutionalized. In this response to a viral post, I explore conceptual errors, legal overreach, and faulty analogies.
In this response to a viral post by a well-known AI privacy advocate, I explore how conceptual errors, legal misapplications, and misleading analogies (like fax machines and blank pages) fuel public confusion. From “accuracy” in GDPR to how LLMs generate images, this essay unpacks what really happens under the hood — and why so much commentary gets it wrong.
I came across this post (by Luiza) sharing some views about AI. The author describes herself as: Co-founder of the AI, Tech & Privacy Academy, LinkedIn Top Voice, Ph.D. Researcher, Author, Latina, Polyglot, and Mother of three. I respect many of her contributions. However, as a co-founder of what appears to be an educational initiative (AI Academy), one would expect a certain depth of understanding — not necessarily about AI architecture itself, but about how knowledge is constructed and communicated. That includes the ability to question assumptions and to exercise caution before presenting conclusions as settled truths.
This post illustrates the kind of conceptual errors that can arise when that discipline is overlooked.
She begins with a story in which someone asks ChatGPT to create an exact copy of a picture. I’m not sure why that would be something a user needs help with. If the user has a file to upload and a device capable of accessing ChatGPT, they already possess both the image and the tools to duplicate it. In fact, they’ve already made a copy — that's what happens the moment they upload the file via their browser.
But ok, let’s accept pointless user requests as relevant. Luiza then points out:
“We observe an accuracy problem. Article 5.1.d of the GDPR states: ‘Personal data shall be accurate...’”
Using the phrase ‘we observe an accuracy problem’ is unusual and a bit confusing — because if by ‘we’ she refers to herself and the reader (say me), then she is correct that we both observe an accuracy problem: I observe how Luiza inaccurately observes an accuracy problem.
The user request here does not make this artefact subject to personal data protection. Otherwise, I could send my picture to every FTSE 100 company, followed by a letter threatening legal action for failing to protect my personal data under GDPR. This is not a legal argument but a metaphor, and as a metaphor it captures the right way to think about the situation.
“LLMs are not knowledge repositories.”
LLMs don’t store knowledge as discrete facts or records — like a database would. Instead, they store statistical patterns derived from the data they were trained on. What we experience as “knowledge” in their responses is not retrieved — it is synthesized. It emerges probabilistically, based on correlations, not direct information access.
It makes no difference whether we call this a knowledge repository (which I would) or not — what matters is what we mean by the term. Our brain, like an LLM, does not store information as a file system either, but as patterned potential. And I do think humans retain some knowledge in this way.
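To make that concrete, here is a deliberately tiny sketch in Python (nothing like a real LLM's implementation, and the corpus and names are invented for illustration): the "model" stores only next-token statistics, and its "answer" is synthesized by sampling from them rather than retrieved from any record.

```python
import random
from collections import defaultdict, Counter

# Toy "training": count which token follows which, instead of storing facts.
corpus = "the capital of france is paris . the capital of italy is rome .".split()
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def generate(prompt_token, steps=5):
    """Synthesize a continuation by sampling from learned statistics.
    Nothing is retrieved; each token is drawn from a distribution."""
    out = [prompt_token]
    for _ in range(steps):
        counts = following[out[-1]]
        if not counts:
            break
        tokens, weights = zip(*counts.items())
        out.append(random.choices(tokens, weights=weights)[0])
    return " ".join(out)

print(generate("capital"))  # may print "...france is paris ." or "...italy is rome ."
```

A real transformer replaces the counting with billions of learned parameters and far richer context, but the output is still a sample from a distribution, not a lookup, which is exactly why the same prompt can surface different "facts".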
“They're merely predicting the next word in a sentence”
LLMs predict tokens, not words. The distinction matters because it reflects the actual technical design, which makes the claim factually incorrect in isolation and, more importantly, misleading. It's surprising that this isn't apparent to the author.
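For readers who want to see the difference, the short sketch below uses the tiktoken package (assuming it is installed) to show that the units a GPT-style model predicts are sub-word tokens, which may or may not line up with words.

```python
# pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # the tokenizer used by GPT-3.5/GPT-4-era models

for text in ["cartridge", "unbelievable", "ChatGPT is not a fax machine."]:
    ids = enc.encode(text)
    pieces = [enc.decode_single_token_bytes(i).decode("utf-8", errors="replace") for i in ids]
    print(f"{text!r} -> {len(ids)} tokens: {pieces}")
```

A single word can split into several tokens, and a token can be a fragment that is no word at all, so "predicting the next word" is already a simplification before we even get to the question of capability.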
The user asked ChatGPT 47 times to create a copy and failed each time. How many repetitions does it take before one starts asking: how does an LLM do that — if it can only predict the next word in a sentence?
The answer lies in what the author overlooks: the mechanism does not define the capability. Token prediction is the technical core — but that doesn’t describe what the model is capable of achieving at scale, across complex sequences with layered context.
Reducing all behavior to “just next-word prediction” is like saying a pianist “just chooses the next button to push.” It’s not wrong — but it misses everything that matters about the system’s actual performance.
“The clip below shows another manifestation of the LLMs' lack of accuracy: the user prompt asked for the exact same image, but after every prompt (i.e., personal data processing), it changed the original image.”
Again, this has nothing to do with GDPR.
The LLM didn't do anything with the picture. It most likely passed it to a paired model, probably DALL·E, which is not an LLM. That image-generation pipeline first processes the input to infer what it depicts. The inference is a probabilistic estimate; you could call it a guess, since the model has no eyes and has never seen a real human.
The extracted information is then used as a prompt template, guiding the generation of a new image that statistically matches the earlier parameters — another guess. On top of that, the generation process includes built-in randomness.
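OpenAI has not published the exact pipeline, so the following is a conceptual toy with invented function names rather than any real API: the image is reduced to an inferred description, the description becomes a prompt, and the generator samples a new image with fresh randomness on every call.

```python
import random

def infer_description(image_bytes):
    """Stand-in for the analysis step: reduces the image to a lossy, probabilistic
    description. Real systems produce text or embeddings, never pixel copies."""
    rng = random.Random(len(image_bytes))        # deterministic toy "analysis"
    return {
        "subject": "person, smiling",
        "estimated_age": rng.randint(28, 34),    # an estimate, not a fact
        "lighting": rng.choice(["soft", "harsh"]),
    }

def generate_image(prompt):
    """Stand-in for the image generator: samples a NEW image consistent with the
    prompt. The noise is drawn fresh on every call, so the output always differs."""
    noise = random.random()
    return f"image(subject={prompt['subject']}, age~{prompt['estimated_age']}, noise={noise:.3f})"

original = b"...uploaded photo bytes..."
prompt = infer_description(original)   # guess #1: what does the photo show?
for attempt in range(3):               # "make the exact same image", three times
    print(generate_image(prompt))      # guess #2 plus randomness: never a copy
```

The original pixels never travel down this path; only the inferred description does, which is why "an exact copy" is not something the pipeline can even attempt.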
Since the model has no copy machine in the cloud where it lives, it is categorically false to say we’re observing a lack of “accuracy.”
What we’re actually seeing is a user misunderstanding the system and making an impossible demand.
The equivalent would be this:
You commission a painter to make a portrait. They return with a well-rendered painting, and you complain: “This isn’t photorealistic — it’s inaccurate!”
Or worse:
You write a letter to BMW saying “Your engine is broken — it won’t fly me to the moon.”
That’s not a failure of the system.
It’s a category error in expectation.
“Unexpectedly, fax machines and Generative AI systems might have more in common than most people think.”
No — they really don’t.
Fax machines are analog systems that degrade signal quality with each copy due to accumulated noise and entropy.
Generative AI does not copy — it reconstructs a new image each time from scratch, based on inferred parameters and probabilistic sampling. The output doesn’t get “fainter” over time. You could repeat the request 47 million times and still get fresh results — not a degraded replica.
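A toy simulation shows the difference. In the fax chain each copy is made from the previous copy, so noise accumulates; in the generative case every output is sampled independently from the same prompt, so the 47th result is as fresh as the first. The numbers below are illustrative, not measurements.

```python
import random

random.seed(0)

# Fax chain: each generation copies the PREVIOUS output, so loss accumulates.
signal = 1.0
fax_quality = []
for _ in range(10):
    signal *= 1 - abs(random.gauss(0, 0.05))   # lose a little fidelity per copy
    fax_quality.append(round(signal, 3))

# Generative model: every output is sampled from the SAME prompt, independently.
gen_quality = [round(1 - abs(random.gauss(0, 0.05)), 3) for _ in range(10)]

print("fax copies:       ", fax_quality)   # drifts steadily downward
print("generated outputs:", gen_quality)   # varies, but never degrades
```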
“As recent studies have shown Generative AI trained with Generative AI trained with Generative AI leads to model collapse (similarly to the fax machine's blank page).”
We don't need studies to show this effect. It's an expected consequence of training models on their own outputs, and widespread model collapse would only occur if we somehow forgot this and failed to account for it. The problem is not new, and neither is the solution.
The concept of model collapse applies to training, not generation, as Luiza also notes. It refers to the loss of diversity and semantic richness when models are repeatedly trained on their own outputs.
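The training loop itself can be caricatured in a few lines. In this crude sketch the "model" is nothing more than a token-frequency table; each generation is trained only on samples drawn from the previous one, and because a token that is never sampled can never reappear, diversity can only shrink.

```python
import random
from collections import Counter

random.seed(7)

# Generation 0: a "training set" containing 50 distinct tokens.
data = [f"tok{i}" for i in range(50)] * 4

for generation in range(8):
    print(f"gen {generation}: {len(set(data))} distinct tokens survive")
    model = Counter(data)                  # the "model": just token frequencies
    tokens, weights = zip(*model.items())
    # The next model is trained ONLY on this model's outputs. Anything it fails
    # to sample here is gone for good in every later generation.
    data = random.choices(tokens, weights=weights, k=60)
```

Note that the collapse lives entirely in the retraining loop: sampling sixty outputs from the generation-0 model forever would never shrink its vocabulary, which is the difference between training and generation that the fax analogy blurs.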
So once again: asking a model 47 times to generate the same image won’t give you an identical copy — but it won’t drain the ink cartridge either.