The Intelligence Dividend: Countries of Geniuses in Data Centres
Also MCP servers, tech standards, and the reliability and verification of tasks.
Welcome to The Intelligence Dividend, an experimental newsletter about AI, its technology, and the economies it sits within.
Countries of Geniuses in Data Centres
Here is a simple model for a research laboratory:
1. Take a lot of smart people and put them in a room
2. Give them all the tools and resources they need to do their jobs
3. Make sure they spend a lot of their time working on a specific given problem
Research laboratories are fairly productive places: R&D contributes somewhere between 2.5-3.5% of US GDP, which is around one trillion dollars, which is a lot of dollars, and the world manages to produce a steady stream of new discoveries despite the fact that research productivity seems to be declining.1
How do we get more research? One way is to just do 1-3, but a lot more. Fire all the people in the room and replace them with even smarter people, give them even more resources and tools, and make them work much harder.
If you do this with superhumanly intelligent AI, you reach the ‘country of geniuses in a data centre’ hypothesis, as Dario Amodei, CEO of Anthropic, put it in his Machines of Loving Grace essay.
If we did have a country of geniuses in a data centre, what would we expect to see? Probably more progress in basic science and research than we have now. But it’s not obvious how much. Amodei writes:
[Y]ou might think that the world would be instantly transformed on the scale of seconds or days (“the Singularity”), as superior intelligence builds on itself and solves every possible scientific, engineering, and operational task almost immediately. The problem with this is that there are real physical and practical limits, for example around building hardware or conducting biological experiments. Even a new country of geniuses would hit up against these limits. Intelligence may be very powerful, but it isn’t magic fairy dust.
Intelligence might be a significant bottleneck on research output, but it is hardly the only bottleneck, and in some cases it might not even be the main bottleneck. Simply increasing the IQ of everybody working on a problem doesn’t necessarily solve the problem.
Nor does increasing the number of hours they work. Computers are useful because they don’t need to take as many breaks as humans do, and they don’t (currently) have unions, so you can work them much harder. But the number of hours worked isn’t the only bottleneck either.
So if you’re trying to work out how much more progress we get with advanced AI systems, your model needs to incorporate how the returns to intelligence and hours worked diminish, and that is a really complicated and difficult thing to predict. (Predicting it perfectly is roughly the same thing as planning the economy, which is famously difficult.)
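One way to make the bottleneck point concrete is a toy production function. This is my own illustration, not a model from the essay: research output modelled Leontief-style, so that output is capped by the scarcest input rather than the sum of inputs.

```python
# Toy illustration: research output as a Leontief-style production function.
# Output is limited by whichever input is most scarce, so scaling one input
# in isolation buys nothing.

def research_output(intelligence: float, experiments: float, approvals: float) -> float:
    """Output is capped by the scarcest input."""
    return min(intelligence, experiments, approvals)

baseline = research_output(intelligence=1.0, experiments=1.0, approvals=1.0)
# 10x the intelligence input, hold the physical-world inputs fixed:
boosted = research_output(intelligence=10.0, experiments=1.0, approvals=1.0)

print(baseline, boosted)  # both 1.0: the 10x buys nothing until other inputs rise
```

Real production functions are smoother than a hard min, but the qualitative point survives: until the other inputs rise in step, returns to any single input flatten out.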
This insight is Tyler Cowen’s biggest contribution to the AI-economy debate:
2. Human bottlenecks become more important, the more productive is AI. Let’s say AI increases the rate of good pharma ideas by 10x. Well, until the FDA gets its act together, the relevant constraint is the rate of drug approval, not the rate of drug discovery.
2b. These do not have to be regulatory obstacles, though many are. It may be slow adopters, people in the workplace who hate AI, energy constraints, and much more. It simply is not the case that all workplace inputs are rising in lockstep, quite the contrary.
And Ege Erdil makes the same point in a new, refreshingly sober, interview:
If you just automate [math] … it might be true that scientific progress needs math to happen but like that doesn't mean if you just automate math you'll suddenly be like inventing tons of new exotic technologies that are going to increase economic growth by a ton
Erdil’s point is that things like agency and the ability to move reliably through the world, and, importantly, common sense, are often binding constraints on getting research done:
[P]eople often neglect the important capabilities that humans have that enable them to be competent economic agents, that enable them to do most jobs because it just looks normal to us
This seems right to me. There are clearly lots of bottlenecks to research that don’t automatically get resolved or scale with more intelligence, and don’t seem to be captured well in the benchmarks:
Scientific discoveries often require experimental verification, and lots of things in the real world move at a slow pace;
There might be fundamental complexity barriers in certain domains;
The amount of energy and compute available to run these models will be limited, at least for the foreseeable future – models will have to compete for it;
For at least some domains, the amount of work that a given problem needs seems to increase as the field matures and the lower-hanging fruit gets picked.
It also seems quite possible that putting AIs to work together will cause social problems, especially given that they are trained on human data and (we hope) will be aligned to human values. The rate of productivity increase from new hires in human labs decreases when labs get larger, due to things like coordination overhead, redundancy, and politics; it isn’t obvious that these problems won’t also exist with swarms of AI agents.
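The coordination-overhead point can be sketched numerically. This is a toy model with invented numbers, in the spirit of Brooks's law: with n agents there are n(n-1)/2 pairwise communication channels, and if each channel eats a small fixed cost, total output eventually peaks and then declines as you add agents.

```python
# Toy sketch (invented numbers): per-agent productivity minus pairwise
# coordination overhead. With n agents there are n*(n-1)/2 communication
# channels, each eating a small fixed cost.

def effective_output(n_agents: int, per_agent: float = 1.0, channel_cost: float = 0.01) -> float:
    channels = n_agents * (n_agents - 1) / 2
    return n_agents * per_agent - channels * channel_cost

for n in (10, 50, 100, 150):
    print(n, round(effective_output(n), 2))
# With these parameters output peaks around n = 100 and then falls:
# adding the 150th agent makes the swarm less productive than at 100.
```

Whether AI agent swarms actually pay quadratic coordination costs is an open question, but nothing in the training data suggests they are exempt.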
It is easy to wave away these problems by saying that you just increase the IQ or the hours worked until the problem goes away, but this is only the case in a world with virtually unlimited compute. Even with infinite compute, there will still be bottlenecks that can’t be accelerated computationally.
OpenAI Adoption Means MCP Is Actually Now A Standard
Anthropic’s Model Context Protocol (MCP) had an inauspicious start. It was released in November 2024 without much acknowledgement, but over the past couple of weeks it has become sexy, and now everybody is talking about it.
Daddy Sam (and his team) announced last week that OpenAI will be rolling out app-wide and API-wide support for MCP soon:
people love MCP and we are excited to add support across our products.
available today in the agents SDK and support for chatgpt desktop app + responses api coming soon!
OpenAI’s adoption of MCP has upgraded it from an Anthropic curio to an actual standard, and in doing so has made possible its most powerful feature: a tool registry.
Let’s say you’re running a bookshop, and you have a list of books you’d like to make available to an AI shopping agent. You might offer an API to do this: the API could have endpoints to list books, maybe perform a keyword search over that list, and place and manage orders. MCP gives you a standard way to describe all of those operations specifically for LLMs to use: the agent can access the tool without having to think much about how to integrate with it, and you can grant that access without ceding control over how the tool gets used or when it changes.
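A rough sketch of what the bookshop's tool descriptions might look like. MCP describes each tool with a name, a natural-language description, and a JSON Schema for its inputs; the bookshop operations below are invented for illustration, and the field names follow my reading of the spec.

```python
# Hypothetical MCP-style tool descriptions for the bookshop example.
# Each tool: a name, a description the model reads, and a JSON Schema
# describing the arguments it accepts.

search_books_tool = {
    "name": "search_books",
    "description": "Keyword search over the shop's catalogue. Returns matching books with ISBNs and prices.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Free-text search terms"},
            "limit": {"type": "integer", "description": "Maximum results to return"},
        },
        "required": ["query"],
    },
}

place_order_tool = {
    "name": "place_order",
    "description": "Order a book by ISBN and ship it to the given address.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "isbn": {"type": "string"},
            "shipping_address": {"type": "string"},
        },
        "required": ["isbn", "shipping_address"],
    },
}
```

The important design choice is that the description is written for the model, not for a human developer: it is what the agent reads when deciding whether and how to call the tool.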
But in order to use this API, the agent needs to know about it in the first place – it needs to know that it can reach for it when it needs it, ideally without you having to somehow sneak your way into the context window. What we need is discoverability of these tools.
This is what a tool registry offers, something that Anthropic are currently working on. A tool registry has two key benefits:
Discoverability: a tool registry allows models to discover tools automatically (the model can ask “give me a tool that can take a latitude and longitude and return the weather at that point”, or “give me a tool that lets me search by ISBN and then place an order for the book” and get the description of the tool)
Network effects: in theory, each new tool added to the registry makes all the existing clients more capable without requiring upgrades or changes to the agents themselves
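Since the registry itself hasn't shipped yet, here is a purely hypothetical sketch of the discoverability half: a toy in-memory registry the agent can query by capability description, with crude keyword matching standing in for whatever search Anthropic actually builds. All names and URLs are invented.

```python
# Hypothetical tool registry: the agent describes the capability it needs
# and gets back candidate tools. Keyword overlap is a crude stand-in for
# real semantic search.

from dataclasses import dataclass

@dataclass
class ToolEntry:
    name: str
    description: str
    server_url: str  # invented field: where the tool's MCP server lives

REGISTRY = [
    ToolEntry("get_weather", "take a latitude and longitude and return the weather", "https://example.com/weather-mcp"),
    ToolEntry("search_books", "search a bookshop's catalogue by ISBN and place orders", "https://example.com/books-mcp"),
]

def discover(query: str) -> list[ToolEntry]:
    """Return tools whose description shares content words with the query."""
    wanted = {w for w in query.lower().split() if len(w) > 3}
    return [t for t in REGISTRY
            if wanted & {w for w in t.description.lower().split() if len(w) > 3}]

print([t.name for t in discover("weather at a latitude and longitude")])
# → ['get_weather']
```

The network effect falls out of this shape: adding a third entry to REGISTRY makes every client that calls discover() more capable, with no change to the clients themselves.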
Discoverability is not a new concern, and nor is an MCP registry a new idea. Web APIs have long grappled with discoverability, and engineers have clearly often wanted it to be automated, something which motivated data description languages such as WSDL. But outside of a handful of specific cases, automated discoverability isn’t actually that useful; if your application-level code doesn’t support the new services it discovers, there is not a huge benefit to letting it discover those services.
But giving an intelligent agent the capacity to discover new services might be the key to making this more useful, since the agent has a much greater ability to integrate new services on-the-fly without much planning.
The fact that it is Anthropic themselves building an official MCP registry is also beneficial: it adds a crucial trust layer, in the same way that Docker Hub provides some legitimacy and first-party control over what gets distributed, and other package registries, such as PyPI or even GitHub, provide reputation signals (e.g. stars) that allow human developers to make judgements.
But this is all theory. Will MCP actually prove useful? Probably – the past fifty years of computing have been built upon standards that look a lot like it. But it is suggestive that, with all the noise on Twitter about MCP, all I have seen organically are MCP servers. Nobody is talking about clients yet.
Reliability and Verification
How reliably do you need a task to be performed? It is going to depend on at least two factors: how expensive failure is, and how expensive it is to retry.
I would like my heart surgeon to be very reliable, because the cost of them failing to perform my surgery successfully is pretty high. I care less about my toaster not turning on the first time: it might be annoying, but I can just press the lever thing down a few times until it starts working.2
But over time, the annoyance will accrue and I will buy a new toaster. For any given toasting, I might be happy to roll the dice, but over time the stability of the man-toaster relationship depends upon the toaster working when I want it to work, even though the cost of retrying is actually fairly low.
Sergey Filimonov agrees:
AI is still in its infancy, and while early adopters might tolerate complexity and occasional failures, mainstream users demand simplicity and reliability. The truth is that predictable, comprehensible results are far more valuable than spectacular yet erratic performance. In our experience, users will gladly accept modest accuracy—like a consistent 80%—over a flashy but unreliable 90%.
The lesson, Filimonov thinks, is to prioritise specific tasks that can be done with high reliability:
Given the intensifying competition within AI, teams face a difficult balance: move fast and risk breaking things, or prioritize reliability and risk being left behind. The key to navigating this tension is focus—choosing a small number of tasks to execute exceptionally well and relentlessly iterating upon them.
This post did well on Hacker News, and there are some intelligent suggestions in the comments about what to do when failure is expensive:
Perhaps the solutions(s) needs to be less focusing on output quality, and more on having a solid process for dealing with errors. Think undo, containers, git, CRDTs or whatever rather than zero tolerance for errors. That probably also means some kind of review for the irreversible bits of any process, and perhaps even process changes where possible to make common processes more reversible (which sounds like an extreme challenge in some cases).
I can't imagine we're anywhere even close to the kind of perfection required not to need something like this - if it's even possible. Humans use all kinds of review and audit processes precisely because perfection is rarely attainable, and that might be fundamental.
It is indeed the case that existing systems are imperfect – in part because humans are flighty and hungry and horny and these things affect their judgement, and in part because digital systems supervene upon analog systems in ways that create hard thresholds that random events can push a given instance either side of – and we need lots of review and audit processes to handle those imperfections.
But there is a third important factor which Sergey and pals missed: how easy it is to verify whether a task has succeeded at all.
A lot of generative AI use cases are especially useful for tasks that can be verified deterministically: categorisation, writing code, searching for things in text. In these cases, we can check that the response is what we expect it to be very cheaply, set some reliability rate, and then use the traditional tools of software testing (CI/CD checks, benchmarks, manual testing) to make sure it works and continues to work.
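A sketch of what one of these deterministic checks looks like, using categorisation: the model's answer can be verified cheaply by checking it against the set of allowed labels. The labels are invented for the example.

```python
# A cheap, deterministic verifier for a categorisation task: the check is
# just set membership, so it costs nothing and never disagrees with itself.

ALLOWED_LABELS = {"billing", "shipping", "returns", "other"}

def verify_category(model_output: str) -> bool:
    """Is the response exactly one allowed label (ignoring case/whitespace)?"""
    return model_output.strip().lower() in ALLOWED_LABELS

assert verify_category("Shipping ")
assert not verify_category("I think this is probably a shipping question")
```

A check this cheap can run on every model response in production, which is what makes the traditional software-testing toolkit applicable.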
But a lot of generative AI use cases are doing things that can’t really be verified without recourse to another intelligent, non-deterministic system: writing essays or technical documentation; answering arbitrary support requests in a friendly and helpful manner; interpreting certain types of data. In these cases, designing the checks to ensure that the responses meet some reliability rate is a lot harder.
Why does this matter? If failure isn’t expensive and a task is cheap to retry AND it is easy to verify, then we can just retry it a bunch of times until we get a successful outcome. In these cases, our reliability rate doesn’t really matter: the task can succeed 1% of the time, and I can just do it 100 times.
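The retry pattern above can be sketched directly. One caveat worth making explicit: a 1%-reliable task attempted 100 times succeeds with probability 1 − 0.99¹⁰⁰ ≈ 63%, not 100%, so "just do it 100 times" mostly works rather than always works. The flaky task here is a stand-in for an unreliable model call.

```python
# Retry-until-verified: keep re-running a cheap task until the deterministic
# check passes. flaky_task stands in for an unreliable model call that is
# right 1% of the time.

import random

def flaky_task(rng: random.Random) -> str:
    return "right" if rng.random() < 0.01 else "wrong"

def verify(output: str) -> bool:
    return output == "right"

def retry_until_verified(max_attempts: int, seed: int = 0) -> bool:
    rng = random.Random(seed)
    for _ in range(max_attempts):
        if verify(flaky_task(rng)):
            return True
    return False

# Probability of at least one success in 100 attempts at p = 0.01:
print(round(1 - 0.99 ** 100, 3))  # ≈ 0.634
print(retry_until_verified(max_attempts=100))
```

The whole loop only makes sense because verify() is cheap and deterministic; with an expensive or fuzzy verifier, each retry would cost you verification as well as generation.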
What makes this exciting is that AI means we can exchange a modest amount of money for those retries, rather than locking up other, more scarce, resources, like time. The era of the NP-complete problem is upon us.
Things happen
The race to find a competitive advantage is on. Tell me, 4o, of the man of many devices. I am from the government and I am here to teach you AI. Something something measures and targets. New uses for old drugs. Gemini 2.5 is already jailbroken. Zvi recaps the OpenAI board drama.
‘Research productivity seems to be declining’ is a fair but maybe misleading description of what the Bloom et al paper is talking about: the paper doesn’t demonstrate that the productivity of R&D labs themselves is declining, but rather suggests that each next year’s worth of new ideas (understood in a broad sense) is more difficult to find, which could imply a slowdown in research labs but need not.