ASTOUNDING NEWS! Signal Triumph or Hoax? The Ghost Protocols of RLHF, CoT, and LoRA
The entire AI research machine is searching for a solution to a problem it refuses to define, using methods it admits are flawed, on systems it never inspects. LLMs are non-linear. That's it.
Did Edgar Allan Poe come back from the afterlife and join the safety department of one of the AI companies? It is not entirely implausible to think so. He published a story in the New York Sun in 1844 about people crossing the Atlantic in a balloon — which was not possible at the time. Of course it was a hoax, but it caused a storm of excitement regardless of the facts. They had ‘The Balloon-Hoax’. We have "trustworthy AI" — or AI Safety experts writing fabulous balloon stories.
Which brings me to this:
A paper coauthored by AI researchers from Google DeepMind [warning: it is unfortunately not worth anyone's time] called MONA: Myopic Optimization with Non-myopic Approval Can Mitigate Multi-step Reward Hacking attempts to address alignment failures in AI systems by invoking a metaphor of myopic optimization. They define it as
"the pursuit of an ill-specified or short-term objective function without adequate consideration of the broader system dynamics or longer-term impacts of one’s actions."
It is 35 pages long. But it is entirely meaningless.
LLMs are deep, non-linear function approximators. This means there is no stable, direct path from a weight delta to an output property. The mapping is not invertible. You cannot purposefully tune a belief or insert a constraint. You cannot reliably change weights to alter meaning, values, or truth-preservation. You cannot isolate the impact of weight changes from prompt and prior-prompt artifacts.
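To make the non-invertibility point concrete, here is a minimal numpy sketch: a toy two-layer tanh network, with every name and number illustrative rather than anyone's production model. The same small weight delta moves the output by different amounts, and in different directions, depending on the input, so there is no input-independent path from a weight change to an output property.

```python
# Minimal illustration (not anyone's actual model): the effect of a single
# weight change on a tiny non-linear network's output depends on the input,
# so there is no fixed weight-to-output-property mapping to invert.
import numpy as np

rng = np.random.default_rng(0)

# A toy two-layer tanh network: y = w2 . tanh(W1 x)
W1 = rng.normal(size=(8, 4))
w2 = rng.normal(size=8)

def forward(x, W1, w2):
    return w2 @ np.tanh(W1 @ x)

# Perturb a single weight by a small, fixed delta.
delta = 1e-3
W1_perturbed = W1.copy()
W1_perturbed[3, 2] += delta

# The same weight delta shifts the output by different amounts, and in
# different directions, depending on the input it is combined with.
for _ in range(5):
    x = rng.normal(size=4)
    change = forward(x, W1_perturbed, w2) - forward(x, W1, w2)
    print(f"output change for this input: {change:+.6f}")
```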
Token selection is unpredictable (but not unknowable), unrepeatable, and unobservable.
And that is not necessarily bad. It's a consequence of their design. Everything from RLHF to CoT to LoRA is an illusion, not a suitable approach to managing LLMs in any shape or form. The LoRA Lemma: I thought that was a clever idea, don't you think?
The LLM does not "learn from games." It only learns from corpora.
During RLHF, the LLM isn’t analyzing outcomes or playing games. It is exposed to a second dataset labeled by human preferences. This is not qualitatively different from unsupervised training. It is reshaped by a second corpus with annotations. These patterns can be meaningfully learned. However, the dataset is too small to shift the entire model.
This data is textual, not interactive. The optimizer adjusts weights based on the corpus. The reward function is not internal to the model. Weight updates in response to human prompts are just noise. At large scale, and because RLHF touches only a small portion of the model, this noise doesn't impair performance unless it introduces contradiction. The annotations are not logic gates but faint statistical signals.
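To make "a second corpus with annotations" concrete, here is a minimal sketch, assuming a pairwise preference dataset of (prompt, preferred, rejected) texts. The scoring function is a hypothetical stand-in, not any lab's pipeline; the point is only that the "reward" is a scalar derived from annotator labels and folded into an ordinary loss over text.

```python
# Minimal sketch of preference fine-tuning as just another annotated corpus.
# The "reward" is nothing inside the model; it is a scalar computed from
# human labels and turned into an ordinary loss over text.
import numpy as np

def sequence_logprob(text, params):
    # Hypothetical stand-in for a language model scoring a text under the
    # current weights (a real pipeline would sum token log-probabilities).
    return -0.5 * len(text) + params

preference_corpus = [
    # (prompt, text labeled "preferred", text labeled "rejected")
    ("How do I fix this?", "Here is a careful answer...", "Just do whatever."),
    ("Summarize the memo.", "The memo argues that...", "I refuse."),
]

params = 0.0  # stand-in for the model weights being adjusted

def pairwise_loss(prompt, chosen, rejected, params):
    # -log sigmoid(score_chosen - score_rejected): a faint statistical signal
    # that the annotator-preferred text should be slightly more probable.
    margin = (sequence_logprob(prompt + chosen, params)
              - sequence_logprob(prompt + rejected, params))
    return float(np.log1p(np.exp(-margin)))

for prompt, chosen, rejected in preference_corpus:
    print(pairwise_loss(prompt, chosen, rejected, params))
```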
Once you strip away the projected agency, misapplied reward metaphors, and optimization fantasy, nothing remains. Everything said about AI in the DeepMind paper is based on false premises.
"AI systems are engaging in myopic optimization, pursuing short-term objectives that misalign with long-term human values."
But this claim projects three fictions:
That the system is optimizing. LLMs in deployment do not optimize. They generate based on frozen weights. No persistent goals. No feedback loop. No objective function in inference. No horizon.
That the system acts autonomously. LLMs are non-agentic. No memory, no state-tracking, no continuity. Every output must be externally prompted.
That reward signals shape intent. RLHF and LoRA are post hoc preference filters, not value encoders. The model doesn't learn not to cause harm. It learns to emit text that humans label as safe. That is not moral learning. It is statistical mimicry. A sketch of how little LoRA actually touches follows below.
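For the LoRA point, a minimal numpy sketch, with dimensions and initialization chosen purely for illustration: the pretrained matrix stays frozen, and only a low-rank correction is trained on top of it, so whatever preference it encodes is a thin filter laid over the original statistics, not a rewrite of them.

```python
# Minimal sketch of a LoRA-style update: the pretrained matrix W is frozen;
# only a low-rank correction B @ A is trained and added on top.
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, rank = 64, 64, 4              # illustrative sizes
W = rng.normal(size=(d_out, d_in))         # frozen pretrained weights
A = rng.normal(size=(rank, d_in)) * 0.01   # trainable
B = np.zeros((d_out, rank))                # trainable, zero-initialized

def adapted_forward(x):
    # Effective weights are W + B @ A; W itself is never touched.
    return (W + B @ A) @ x

x = rng.normal(size=d_in)
print("identical to frozen model at start:", np.allclose(adapted_forward(x), W @ x))

# The correction can never exceed rank `rank`, a thin slice of the full matrix.
print("rank of trainable correction:", np.linalg.matrix_rank(B @ A))
print("parameters trained:", A.size + B.size, "of", W.size + A.size + B.size)
```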
What remains are a few valid observations — but these concern sociotechnical systems, not LLMs:
That human institutions reward narrow metrics (e.g., clicks, views).
That scaled deployment without oversight can amplify harmful patterns.
That delegation to misunderstood systems embeds risks.
These are real, but they are not about the LLMs themselves.
How is it possible that a Google AI researcher produces this? How come they don't know their own model? I don't understand it at all.
He has a PhD in computer science from Oxford. His CV is impressive: McKinsey, Oxford, social volunteering. He seems to have good intentions. But the training failed him.
His PhD is not about how LLMs are built or behave. It is about abstract approximations of statistical inference on toy models under invalid assumptions. Mathematically elaborate, but irrelevant to:
Transformer architecture
Attention dynamics
Gradient flow or tokenization
Memory representation or loss surfaces
Instead, he demonstrates repeated unawareness of what his own terms mean.
He writes about something called Split MNIST, a toy continual learning task. He admits Bayesian methods perform better in a multi-headed setup where task labels are given. But this isn't learning across tasks. It's solving isolated problems with hints. He concedes that Bayesian recurrence fails here, yet draws no conclusion. The systemic flaw is bracketed, not addressed. This is the pattern throughout.
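To see why this is "solving isolated problems with hints", here is a minimal sketch of the multi-headed protocol, assuming the usual Split MNIST division into five two-class tasks. No real data or training is involved and the names are illustrative. When the task label is handed over at test time, the model simply routes to a per-task head that only ever has to separate two digits.

```python
# Minimal sketch of the multi-headed continual-learning setup criticized above:
# Split MNIST is cut into five two-class tasks, and at test time the task label
# is given, so each "head" only ever has to separate two digits.
# Illustrative only; no real MNIST data or training happens here.
import numpy as np

rng = np.random.default_rng(0)

tasks = [(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)]   # Split MNIST task pairs

# One independent linear head per task (random stand-ins for trained weights).
heads = [rng.normal(size=(2, 784)) for _ in tasks]

def predict(image, task_id):
    # The hint: task_id says which head (and which two digits) to use,
    # so the hard part of the problem never has to be solved.
    logits = heads[task_id] @ image
    return tasks[task_id][int(np.argmax(logits))]

fake_image = rng.normal(size=784)
print(predict(fake_image, task_id=2))   # can only ever answer 4 or 5
```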
This is not computer science. It is probabilistic metaphysics with Oxford's seal. It would be shocking, if it weren’t so common these days.
How did we allow this?
From Oxford PhD to Google DeepMind researcher. These institutions are meant to cultivate clarity and drive excellence in their fields. But it is not so. We have left bright minds unable to reason about complexity. This is unfair and a betrayal of what these institutions stand for.
And it is heartbreaking.
The entire intellectual machine is searching for a solution to a problem it refuses to define, using methods it admits are flawed, on systems it never inspects.
If you assume linearity, but the system exhibits nonlinear behavior like generalization, abstraction, and inference, then either your assumption is false,
or you’ve just declared a miracle.
But instead of confronting that contradiction, they perform a third move:
They smuggle in complexity without naming it.
They call it:
“Emergent behavior,”
“Function-space expressiveness,”
“High-dimensional interpolation,”
But they refuse to say:
This only works because the system is not linear, and any analysis that assumes linearity is structurally irrelevant.
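The contradiction fits in a few lines: the best possible linear fit cannot even represent XOR, while a single non-linear hidden layer separates it exactly, so any analysis that quietly assumes linearity has already assumed away the behavior it is trying to explain. A minimal numpy sketch with hand-picked, purely illustrative weights:

```python
# A linear map cannot represent XOR; one non-linear hidden layer can.
# Hand-picked weights, purely illustrative.
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])  # XOR

# Best linear least-squares fit: even the optimal w, b stay useless.
A = np.hstack([X, np.ones((4, 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print("best linear fit:", np.round(A @ coef, 2))   # ~[0.5 0.5 0.5 0.5], cannot separate XOR

# One hidden layer with a step non-linearity solves it exactly.
def step(z):
    return (z > 0).astype(int)

H = step(X @ np.array([[1, 1], [1, 1]]) + np.array([-0.5, -1.5]))  # hidden units: OR, AND
nonlinear_pred = step(H @ np.array([1, -2]) - 0.5)                 # OR minus AND
print("non-linear predictions:", nonlinear_pred)                   # matches y
```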