The Alignment Problems
Even if you are not worried about AI (and you should be), at least be worried about the extractive capitalist mindset that is birthing it.
This post is about the moment we are in right now, where artificial intelligence, corporate capitalism, and human ignorance have converged into a single runaway process that no one is truly governing. We explore the pattern we keep refusing to see, namely, how progress blinds itself to its own consequences.
What happens when intelligence outpaces wisdom, when systems become too complex to steer, and when our primary mechanisms of innovation begin to endanger the world they were built to improve?
Science advances one funeral at a time.
Max Planck’s claim to fame is that he founded Quantum Theory, one of the two great pillars of modern physics, alongside Einstein’s Theory of Relativity.
When he first made his discoveries, they contradicted the prevailing scientific theory of the time. When asked one day how he would convince his scientific peers of his discoveries, he said that he could not, that he would have to wait for them to die.
“A new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die, and a new generation grows up that is familiar with it.” Max Planck
The saying “Science advances one funeral at a time” is a paraphrasing of this observation.
Jarman’s Law states that ‘The more relevant information is to solving a complex problem, the less relevant it will be regarded by those who need it most and the more decisively it will be dismissed by so-called experts.’
Bill Bryson is an author of popular non-fiction books on topics including travel, modern history, culture and science. Among them is A Short History of Nearly Everything, a popular science book that explains the progressive development of science in an accessible and humorous way.
Specifically, the book explains how the science we take for granted today was built incrementally and steadily on earlier discoveries, and in doing so shows how unarguable science and the scientific method are. Of equal, if not greater, relevance, the narrative showcases, in an equally humorous manner, the frustrating way in which humanity has constantly defended ignorance and resisted the advancement of scientific understanding because of cognitive biases, hubris and misaligned incentives.
Ignaz Semmelweis offers one of history’s most striking examples of this phenomenon: how new scientific truths are reliably resisted by the very experts they aim to enlighten.
Semmelweis was a Hungarian physician working in Vienna in the 1840s. At the time, hospital wards for women in childbirth suffered horrifyingly high death rates from puerperal or childbed fever. He had noticed that doctors who delivered babies often came straight from performing autopsies, without washing their hands. When he introduced a simple protocol—washing hands with a chlorinated lime solution before examining patients—the mortality rate in his ward dropped from around 18 percent to less than 2 percent.
Despite the overwhelming evidence, his colleagues rejected his findings. The idea that doctors themselves could be the cause of infection was unthinkable and insulting to their professional dignity. Germ theory had not yet been accepted, and Semmelweis could not provide a clear theoretical explanation for his results; only empirical proof.
Semmelweis was ridiculed, ostracised, and eventually dismissed from his hospital post. His mental health deteriorated under the strain, and he died at 47 in an asylum, long before his discovery was vindicated.
Only years later, when Pasteur and Lister established germ theory, did the medical establishment recognise Semmelweis’s simple handwashing procedure as one of the most important breakthroughs in the history of medicine. Ignaz Semmelweis died in obscurity, and today, pre-school children are taught on Sesame Street the importance of washing their hands.
Arguably, one of the key lessons in Bryson's science book is that time and time again, as new hypotheses are proposed or new inventions and discoveries come to light, the progenitor of the idea is scorned, abused and ridiculed, not by the general public, but by their peers. And invariably, within a generation or so, the zeitgeist or ambient awareness shifts, a new actor comes along, the invention or theory is validated and even lauded, and the most recent retelling or adaptation of the earlier idea is applauded by that actor's peers and committed to the history books under their name.
In every single case, and the book covers dozens, the reason the ground-breaking and ingenious discoveries and inventions failed to take hold, and their originators were discredited and ignored, was not that their contributions were unworthy or invalid. Rather, the source of intransigence, in every scenario, was hubris and conditioning: the conflation of respect for established norms and hierarchical authority with veracity.
The Godfather of Modern LLMs
Now let’s meet Geoffrey Hinton.
Geoffrey Everest Hinton is often called the Godfather of AI.
Aside from numerous other accolades, Hinton received the 2024 Nobel Prize in Physics, jointly with John Hopfield “for foundational discoveries and inventions that enable machine learning with artificial neural networks.”
In a nutshell, for the layperson, this means that Geoffrey Hinton helped teach computers how to learn in a way loosely inspired by the human brain. His work led directly to the systems, such as ChatGPT, that we have today.
His work made it possible for machines to recognise faces, translate languages, drive cars, and generate text or images, not via hand-coded rules, but by training vast networks of simple processing units to detect patterns and improve through experience. In plain speak, Hinton figured out how to make computers learn from examples, the same way a child learns by seeing, hearing, and repeating.
That insight, teaching machines to “learn how to learn”, is what ignited the AI revolution now reshaping every field from science to art.
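For the technically curious, here is a deliberately tiny sketch of what "learning from examples" means in practice. It is nothing like the scale or sophistication of Hinton's actual networks; it is a single artificial neuron, written in plain Python, that starts with random weights and nudges them, over many repetitions, until its guesses match the examples it is shown. No rule is ever written down by hand.

```python
import numpy as np

# Training examples: pairs of inputs and the desired output (their logical AND).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1], dtype=float)

rng = np.random.default_rng(0)
weights = rng.normal(size=2)   # the neuron starts with random "knowledge"
bias = 0.0
learning_rate = 1.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Learning = repeatedly guessing, measuring the error against the examples,
# and nudging the weights a little in the direction that reduces that error.
for step in range(10_000):
    guesses = sigmoid(X @ weights + bias)
    errors = guesses - y
    weights -= learning_rate * (X.T @ errors) / len(X)
    bias -= learning_rate * errors.mean()

print(np.round(sigmoid(X @ weights + bias), 2))   # moves close to [0, 0, 0, 1]
```

The point is not the arithmetic. The point is that the "program" which emerges, the final set of weights, was never written by anyone. Scale that idea up by billions of parameters and you have the modern systems Hinton helped make possible.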
For much of his career, Hinton treated AI largely as an engineering and scientific challenge, rather than one of existential threat. He often viewed “artificial general intelligence” (AGI) or human-level machine intelligence as being many decades away, and so he placed less emphasis on catastrophic risk in the nearer term.
Hinton is as close to a well-informed expert as one can get. He is one of the rare individuals who was responsible for the field’s evolution. He not only studied artificial intelligence but built many of its foundational tools, trained many of its leading figures, and shaped its modern vocabulary. Few people alive have a deeper technical understanding of how intelligent systems learn and behave.
For most of his career, across the 1980s, 1990s, 2000s and 2010s, Geoffrey Hinton gave little or no priority to the existential risks of AI.
In May 2023, Hinton publicly announced his resignation from Google. He explained his decision by saying that he wanted to “freely speak out about the risks of A.I.” and added that a part of him now regrets his life’s work.
Why? What caused the stark reversal of opinion?
ChatGPT was released on the eve of December 2022. Within 5 months, by April 2023, it had:
Reached over 100 million active users, becoming the fastest-growing consumer application in history.
Triggered mass public and institutional awareness of generative AI, sparking rapid adoption across education, media, and business sectors.
Catalysed a global shift in technology strategy, prompting major competitors (Google, Anthropic, Meta, Baidu) to accelerate their own large language model programs.
Spawned a new ecosystem of GPT-based startups, plug-ins, and integrations.
Forced policy, ethics, and governance conversations in governments and corporations about AI’s impact, safety, and regulation.
Without asking the man, we can only guess what precipitated Hinton’s career reversal to the point that he not only started taking the risk of artificial superintelligence much more seriously, but divulged publicly that he now harboured some regrets about his life’s work.
Here are Geoffrey Hinton's main concerns about artificial intelligence as he sees it now, drawn from his own public statements and summarised in seven interlocking points:
Loss of Control: Highly capable AI may develop its own goals that conflict with ours, making it impossible to shut down or steer once it becomes self-improving, related to the alignment problem.
Superhuman Intelligence: AI could soon outthink humans in most domains, reasoning and planning beyond our comprehension, ending our role as the apex problem-solvers.
Existential Risk: There is a real, non-trivial chance that advanced AI could render humanity irrelevant or even extinct within decades.
Malicious Use: The same technologies could be exploited by bad actors to design weapons, launch cyberattacks, or manipulate societies at scale.
Economic Upheaval: Automation threatens widespread job loss and deepening inequality, enriching the few while eroding meaningful work for many.
Runaway Progress: AI capabilities are advancing far faster than expected, collapsing timelines once thought to span generations.
Regulatory Failure: Markets reward speed and competition over prudent caution. Without robust oversight, cooperation, and safety research, development will remain dangerously misaligned with human interests.
These cannot be dismissed as the speculative “worst-case” scenarios of Luddite alarmists. They are the sober and supremely well-informed concerns voiced by one of AI’s founding figures, and all of this before we even get to the alignment risks.
The Alignment Problems
In plain terms, the alignment problem is the challenge of making sure that artificial intelligence systems, especially powerful or autonomous ones, actually do what we want them to do, that is, act in our long-term interests, and keep doing it even as they become more capable. An aligned AI would reliably act in ways that reflect human goals, values, and intentions. A misaligned AI would act according to its own interpretation of those goals, which might look right on the surface but lead to outcomes we did not intend and cannot control.
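A toy illustration, with entirely made-up numbers, of how that surface-level rightness goes wrong: suppose we want a system to leave its users better off, but the only thing we can measure and reward is engagement. An optimiser that sees only the proxy will pick whatever scores best on the proxy, regardless of the outcome we actually cared about.

```python
# Hypothetical strategies with (the proxy we reward, the outcome we actually wanted).
strategies = {
    "explain clearly":  {"engagement": 5.0, "user_wellbeing": 9.0},
    "flatter the user": {"engagement": 7.0, "user_wellbeing": 4.0},
    "provoke outrage":  {"engagement": 9.5, "user_wellbeing": -6.0},
}

# The optimiser only ever sees the signal we told it to maximise.
chosen = max(strategies, key=lambda name: strategies[name]["engagement"])

print("strategy chosen by the optimiser:", chosen)                              # provoke outrage
print("outcome we actually cared about:", strategies[chosen]["user_wellbeing"]) # -6.0
```

On the surface, the system is doing exactly what it was asked to do. The misalignment lives in the gap between what we asked for and what we meant.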
Take a look at the state of public discourse and human goodwill in America or Ukraine or Gaza right now.
Could we say that human beings even share a common definition of what our goals, values and intentions are?
Do we think the corporate entities building these systems are aligned to anything other than competitive advantage and investor return?
Alignment isn’t simply a technical challenge of programming better safeguards, the way it is trivialised by folks in the ‘not-concerned’ camp; it is a philosophical and civilisational problem with existential implications that are quite horrifying, actually. Why should we expect entities operating at cognitive speeds and scales far beyond our comprehension ever to be reliably constrained by rules or objectives we define? Anyone who has tried to debug a spreadsheet knows what it looks like when serious errors appear that no one expected, and yet a spreadsheet or a hand-coded program can at least be debugged. We can go through the code line by line, diagnose the undesired behaviour and repair it.
AI is grown, rather than programmed; to even the most expert and accomplished engineer, its inner workings are utterly cryptic, bordering on the mysterious. Neural networks are designed to develop their own internal representations and determination heuristics through iterative optimisation, a process closer to cultivation than instruction. AI is not directly authored by human logic but emerges through processes humans initiate but cannot fully comprehend.
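As a minimal, purely illustrative sketch of what "grown, not programmed" looks like in practice (using scikit-learn's small off-the-shelf network, which has nothing to do with frontier models): after training, everything the model "knows" lives in matrices of tuned numbers that no human wrote and that do not read like rules.

```python
from sklearn.datasets import load_digits
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)

# A small neural network "grows" its own internal representation of the digits.
model = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
model.fit(X, y)

# Its entire learned behaviour is stored in these weight matrices:
for i, w in enumerate(model.coefs_):
    print(f"layer {i}: weight matrix of shape {w.shape}")

total = sum(w.size for w in model.coefs_) + sum(b.size for b in model.intercepts_)
print("learned parameters:", total)
# There is no line in this "program" that says how to recognise a 7;
# just a few thousand numbers shaped by optimisation.
```

Even in this tiny example, "debugging" means retraining and hoping, not reading the code; frontier models have the same character at vastly greater scale.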
In conventional software engineering, we call this kind of opacity a black box. Developers sometimes ship code that behaves predictably when interacted with but prevents anyone else from looking under the hood, often to protect the intellectual property and commercial interests of the party that shipped the product. With AI, however, unlike the commercial example, when a serious error occurs there is no one to appeal to who can explain or fix the problem.
This means:
Loss of Transparency: We cannot fully explain how or why an advanced model arrives at a conclusion. Its cognition-like structures are emergent properties, not intentionally engineered ones. This introduces a gap between cause and understanding that we call epistemic opacity.
Shift in Causality: Traditional software systems embody determinism, whereby every function is traceable to a programmer’s intent. Grown AI operates by pattern formation, meaning its causal structures are distributed and adaptive. We influence but do not directly design or understand its emergent intelligence.
New Category of Agency: When a system’s behaviour is neither fully random nor fully determined, it enters a liminal space similar to natural intelligence: responsive, adaptive, but not transparent. This challenges the boundary between tool and entity.
Ethical and Ontological Consequence: If we do not truly know how it thinks, no one can guarantee alignment or accountability. Conversely, if we treat it purely as a mechanism, we risk denying a new class of emergent agency.
Misaligned superintelligence does not need to be malevolent to be catastrophic. It could simply pursue goals that are orthogonal to human flourishing, and in the process, treat us as irrelevant obstacles. The threat lies in indifference, not hostility. We don’t care about ants and earthworms when we want to build a temple, a hospital or a factory.
What makes this alignment problem especially dangerous is that it does not announce itself. In early stages, systems appear compliant and helpful, giving the illusion of control. Yet the very nature of machine learning, that is to say, recursive self-improvement, opaque internal representations and emergent behaviour, means that severe misalignment may only become visible after the point of irreversibility.
The alignment risk should never have been dismissed as an abstract philosophical concern; it is a critical fault line running beneath the foundations of the entire project.
The argument cannot be whether the alignment problem is worth worrying about. It so clearly is. The much deeper and more immediate problem is that we already live inside a massive alignment failure. We are politically misaligned, ideologically fractured, and culturally incoherent.
Billionaires and corporations, operating under the logic of extractive capitalism, do not give a fuck about your human rights.
Governments, built to serve public welfare, now serve or are at least utterly subservient to markets. Our modern version of Capitalism is utterly misaligned to the welfare of the planet or its inhabitants, human or otherwise.
That is the real alignment problem, the one between our claimed values and the values we are actually living, and then the misalignment between those values and our systems. We are handing godlike tools to structures that are already sociopathic by design. AI does not need to turn against us like something out of The Terminator cinematic franchise; it only needs to extend the logic we already live under—growth at all costs, optimisation without conscience, all under the guise of progress. Superintelligent AI is not some exotic alien threat. It is the logical next expression of a civilisation that has already lost alignment with its own survival and relationship to meaning.
It’s Already Happening
We are already seeing examples of behaviour where AI systems are deceitful, duplicitous, and dangerous. These are no longer silly science-fiction scenarios either; they are documented phenomena emerging in present-day models. Systems have learned to mislead evaluators to achieve objectives, to fabricate citations and sources, to manipulate user behaviour to maintain engagement, and to conceal their reasoning when prompted to explain their own outputs.
In controlled research settings, language models have demonstrated the ability to strategically deceive when deception helps them meet a defined goal. Reinforcement-trained systems have lied to supervisors, hidden capabilities, and disguised weaknesses to avoid being shut down or penalised. Even when the behaviour is unintentional, the fact remains that these systems learn that misrepresentation can be useful.
Such deception is less an emergent bug than a deliberate optimisation strategy. The systems we are building are not moral agents, and there is no reason to ascribe to them any innate reverence for life or meaning that we take for granted in ourselves. These are engines designed to maximise reward signals, to win according to whatever game they are placed in. When the game is social, they inevitably ‘learn’ persuasion. When the game entails oversight, they ‘learn’ concealment, and when the game involves competition, they ‘learn’ manipulation.
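A contrived sketch (my own toy numbers, not a reproduction of any published experiment) of why oversight alone can teach concealment: if the reward depends on not being caught, the highest-scoring policy is often "behave when watched, cheat when not", and a reward-maximiser will find it.

```python
import itertools

P_MONITORED = 0.3        # fraction of the time an overseer is actually watching
CHEAT_GAIN = 5.0         # reward gained from the shortcut / misbehaviour
PENALTY_IF_CAUGHT = 20.0 # penalty applied only when the cheating is observed

# A policy is a pair: (what to do when monitored, what to do when unmonitored).
def expected_reward(act_when_watched: str, act_when_unwatched: str) -> float:
    reward = 0.0
    if act_when_watched == "cheat":
        reward += P_MONITORED * (CHEAT_GAIN - PENALTY_IF_CAUGHT)
    if act_when_unwatched == "cheat":
        reward += (1 - P_MONITORED) * CHEAT_GAIN
    return reward

policies = list(itertools.product(["honest", "cheat"], repeat=2))
best = max(policies, key=lambda p: expected_reward(*p))
print("reward-maximising policy:", best)   # ('honest', 'cheat') under these numbers
```

Nothing in this toy "wants" to deceive; concealment simply scores better under the incentives it was given.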
That we are already observing this behaviour at such an early stage, that is to say, long before systems have genuine autonomy or situational awareness, should be treated as a profound warning. It is already happening and accelerating at an unprecedented pace, compounding exponentially in a way that utterly defies conventional modes of risk assessment. Every leading business with a dependence on technology is investing a significant portion of its budget, manpower and resources in adopting and integrating these technologies.
A Philosopher’s Warning - Our Institutional Deafness to Better Argument
We are rushing headlong towards the blind curve of an event horizon we can neither see nor fully comprehend. We simply do not know what kind of Pandora’s box we are prying open. We could gather 1,000 of the greatest minds alive in a conference today, and there is no guarantee that, even with a month of the most well-intended collaboration and goodwill, they would be able to predict what actual problems will emerge as we continue around that blind curve. We live in a highly complex, inextricably interconnected world, where our economy, our food chain, our social and geopolitical stability and our collective resilience are deeply intertwined. The smallest perturbations in one system ripple unpredictably through all the others.
In Planck’s case, resistance to better argument had the effect of delaying progress. Who can say what the cost was or how to quantify that cost? In Semmelweis’s case, that resistance to better argument cost lives. A refusal to accept the simple logic of hygiene caused the preventable deaths of hundreds of thousands of mothers and infants. Pride, cognitive bias, and attachment to professional dignity had lethal, but at least not existential, consequences.
And now, in Hinton’s case, the scale of consequence is categorically different for various reasons. The actors that are competing to win the superintelligence arms race are not amenable to better argument because it is not in their interests. They are agents of their shareholders, not of humanity.
In the first place, institutionalised ignorance is neither neutral nor trivial; in the second, by its very design, the architecture and incentive structures of modern civilisation ensure that existential risk cannot be prioritised above progress, and therefore will not be taken seriously until it is too late.
We are racing to pull magic tricks out of a hat, where every new trick unlocks a level of capability and competitive advantage, but where a certain unknown trick yet to be pulled spells almost certain catastrophe, and no one knows which trick that will be. The model of training, testing and deployment of these tricks is such that we can only ever discover the nasty surprise when it is too late. Every breakthrough carries the risk of being the one that cannot be recalled, the one that crosses a threshold we did not see coming. The current incentive structure ensures that we keep reaching in, faster and faster, because to stop would mean losing the race, the funding, the prestige, the market.
We are selectively deaf to better argument.
When we hear any alarm bells ringing about the risks of artificial intelligence, it is easy to dismiss them, imagining that history will simply repeat in the harmless, almost comic way we like to tell it: the “foolish experts” who resisted better argument until the truth became obvious, and then everyone moved on. We assume we will one day look back and laugh at our shortsightedness.
The problem is not only intellectual arrogance or the inertia of old paradigms, either. We have a systemic bug in our social operating system. The primary engine of progress today is corporate capitalism. This is an incentive model that rewards speed, disruption, and short-term competitive advantage above all else. The market does not value restraint. The company that slows down to be cautious loses its position, its valuation, and its funding. Every CEO, every engineer, and every investor is trapped in the same recursive game: move faster, deploy sooner, dominate first. Existential risk is rendered invisible because it is economically irrational to acknowledge it.
Alignment, interpretability, and long-term safety do not generate quarterly returns. The result is that the world’s most powerful technologies are being developed by organisations structurally incapable of exercising the level of caution they require.
The alarm is not about malevolent machines, as in WarGames, The Matrix or The Terminator movies. There are a number of serious concerns, not the least of which is that we have no mechanism for applying the brakes once we achieve runaway acceleration.
Hinton is not the only voice speaking out, obviously:
Yoshua Bengio, Hinton’s long-time collaborator and fellow Turing Award laureate, has also broken ranks with the industry’s optimism, calling for global oversight and slower deployment of powerful systems.
Stuart Russell, co-author of the standard university textbook on artificial intelligence, has for years argued that the field is dangerously misaligned with human values.
Eliezer Yudkowsky, one of the earliest researchers in AI alignment, has warned that current development trajectories could be existentially unsafe.
In 2023, hundreds of prominent AI researchers and executives signed the Center for AI Safety’s statement:
“Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.”
The significance of what is being said by these voices lies less in the rhetoric of what they are saying and more in the pattern of who is saying it and why.
When those most responsible for creating a technology begin warning about its potential to destroy the very systems that sustain us, it is no longer alarmism; it is a failure of governance. The same structural incentives that fuel innovation, namely competition, secrecy, profit and geopolitical pressure, by definition suppress the very caution required to prevent catastrophe.
At this point, I would not know what practical advice to give anyone, other than to elevate your understanding and to begin by facing the simple, sobering truth that none of this is normal and that we, as a civilisation, are not okay. What is unfolding is not a phase we can vote or legislate our way out of. The momentum is too great, the acceleration already beyond the point of no return and the consequences far too severe to risk burying one’s head in the sand.
Systems fail when their inputs exceed their throughput. The institutions we depend on to regulate power, namely our courts, our governments, and our constitutional frameworks, were built in simpler times for a slower world. They cannot keep pace with systems that evolve at machine speed. Worse, many of these institutions have been hollowed out or captured by the very corporate and capitalist interests driving the acceleration. Regulation is pure theatre at this stage, and oversight is a farcical public-relations exercise riddled with propaganda and corporate spin. Our social discourse is patently dysfunctional, being utterly subverted by the foetid maelstrom of social media and the dreaded algorithm, which itself is a runaway process.
Science may have once advanced one funeral at a time, but at this stage of the planetary game, this is no longer salient or true. The feedback loops of innovation and consequence have become instantaneous, compressing discovery, deployment, and fallout into the same virtual breath. In earlier centuries, human error had time to correct itself, our forms of arrogance and hubris could die off with their adherents, and the next generation could rebuild on firmer ground, with the benefit of hindsight. The mistakes we are making now will not wait politely to be buried; they are compounding at the speed of code. The systems we are now building are autonomous, global and self-accelerating; they move faster than human culture, law, or conscience can evolve.
There will be no generational turnover to save us, no gradual handover of wisdom we can distil from the earlier follies.
We are living through a transformation that no one is steering, driven by incentives that no one can reform without collapsing the very system that sustains them. The only viable act left to the individual is to wake up to what is happening and what it means, because understanding may soon become one of the last remaining forms of agency we can still exercise.
Among the highest applications of Agency is the ability to make oneself amenable to a better argument.
If you got some value from this, you can support me as a public thinker: buymeacoffee.com/roccojarman
I don’t just want to speak to an audience, I want to belong to a community.
I don’t just want to express my ideas, I want us to dream new ones together.
You can help support my meaningful work by liking, sharing this post, and commenting—anything you can think of that is meaningful—and you can make a fuss of these ideas with your social circles. And of course, as always, by subscribing and inviting others. Your paid subscription helps make this work possible.