In 2015, CEO Lisa Su had only been the top boss at boom-and-bust chip company AMD for a few months. The business was trying to turn around its fortunes after its painful decision in 2009 to exit the manufacturing business and had embarked on an ambitious plan to re-enter the server chip market, which had been dominated by Intel for years.
But executives at AMD came to the conclusion that it didn’t have the resources to replicate Intel’s wide range of server chip designs and compete head-to-head across all those categories. It would be too expensive and difficult for the much smaller rival. And if it copied Intel, nothing about the new line of server chips would stand out either.
“We had one bullet to shoot for chip design,” AMD SVP Samuel Naffziger said about the company’s plans at the time.
So engineers at AMD looked to the past. Instead of trying to pack a larger number of features onto a single big piece of silicon, known as a “die,” they opted to break up their flagship chip into four separate parts and stitch them together.
This approach is called “chiplets,” and it’s likely to become a dominant form of chip design in the coming years.
“These small die were a huge enabler for us,” Naffziger said. “I view this as one of the greatest engineering achievements in the industry and in recent memory because it solves so many problems at once.”
AMD turned to chiplets out of necessity, but by breaking up a chip into smaller pieces, it reduced manufacturing costs by 40%. That had two consequences. First, it let AMD make a full suite of server chips, adding and removing chiplets as necessary to create several performance options and target different server chip price buckets. Second, AMD could reuse two of the server chiplets to design something less costly that also worked for desktops, the company’s most profitable segment at the time.
The plan helped save AMD — revenue grew to $16.4 billion last year from $4 billion in 2015 — and it might help save Moore’s law.
What AMD accomplished years ago is now on its way to becoming the industry norm. Intel’s plans include products with chiplets, and others in the industry are coalescing around a standard that will one day allow chipmakers to mix and match silicon from different vendors inside a single package.
The new chiplet-based designs are a nice-to-have at the moment, but they will quickly become a necessity, experts told Protocol.
The world produces and crunches data at a rapidly rising rate. Without the tech that underpins chiplets, it will become too expensive and difficult for traditional processor designs to keep delivering the jump in computing horsepower that software developers expect every year. And in the longer run, those older designs will consume too much power to be economically viable.
“We're going to be locked into a situation where you're buying the same boxes that have the same performance, same power consumption,” TechInsights' chip economist Dan Hutcheson said. “And that means to scale them you either slow down the growth of the internet and the data or you have to build more data centers and more power plants to feed them.”
Moored in the past
One of the fascinating aspects of the chiplet concept is that it dates back to the seminal paper Gordon Moore wrote in 1965 that loosely set the ground rules of the industry for the next half-century. Those observations, known as Moore’s law, predicted that chips would get faster and cheaper every two years, as the number of transistors chip designers could fit on a chip doubled at the same pace.
But in that same paper, Moore described a world in which the economics of breaking up a single die into smaller pieces would someday make sense. Mixing and matching components would give system designers more flexibility and potentially boost performance, among other benefits.
“It may prove to be more economical to build large systems out of smaller functions, which are separately packaged and interconnected,” Moore wrote. “The availability of large functions, combined with functional design and construction, should allow the manufacturer of large systems to design and construct a considerable variety of equipment both rapidly and economically.”
It makes sense that Moore would suggest that: IBM was already building systems that included the chiplet concept as early as 1964 — at the time, it was the only way to achieve the necessary amount of computing horsepower. Companies such as IBM continued down that course for decades, and have applied the loose idea of chiplets to the most complex and expensive systems, such as supercomputers and mainframes.
But the chiplets of the past were complex and expensive, which led semiconductor companies to squeeze more discrete features such as graphics or memory onto a single piece of silicon: the system-on-chip (SoC) found in smartphones, some server processors and Apple’s latest designs for its laptops and desktops.
“In other words, when we mean chiplet, we mean taking up an SoC and splitting it up into its component functions,” IBM’s hybrid cloud technologist Rama Divakaruni said. “Now that we are going back to using chiplets, we are a lot smarter — with all the innovation we had with the history of 50 years of silicon, we will bring that into the packaging space. So that’s the excitement.”
Big dies, big problems
In the past, when chip designers added more components onto a single monolithic piece of silicon called a die — the term comes from “dicing” a silicon wafer into chip-sized pieces — that meant that chips had to get larger. It’s intuitive: Larger surfaces can theoretically fit more features, especially since the features themselves shrink every time manufacturers introduce better tech.
Bigger dies therefore translated to more computing horsepower. For server chips, it’s especially noticeable, since they tend to run five times the size of a chip found in a typical PC, according to research from Jefferies.
“Now things are getting so fast, the performance is so high, that you're being forced to move [more chips] into the package,” Hutcheson said. “Several technical and economic aspects of chipmaking have conspired to push the industry toward chiplets.”
But big die sizes create big problems. One fundamental issue is that it’s currently impossible to print a chip larger than the blueprint used in the photolithography stage of chip manufacturing, called a photomask. Because of technical limits, the beam of light shining through the photomask to reproduce the blueprint onto the silicon wafer cannot print chips larger than about 850 square millimeters.
Large dies are also much more prone to defects, which in turn reduces the number of good chips that can be cut from each wafer and makes each working chip cost more. At the same time, there are concerns that transistors are getting more expensive as they shrink. Coupled with the fact that certain key features on modern chips don’t shrink well, that means it doesn’t make sense to use the most advanced process nodes for wireless communications chips, for example.
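The yield argument can be made concrete with a rough sketch. The Python below uses a simple Poisson yield model, a common textbook approximation and not necessarily the model AMD or any foundry uses, to compare the silicon that must be fabricated per working product when a design is built as one large die versus four smaller ones. The die area and defect density are hypothetical values chosen for illustration, and the sketch ignores packaging cost, assembly losses and the extra area chiplets spend on die-to-die links.

```python
import math


def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Expected fraction of defect-free dies under a simple Poisson model."""
    area_cm2 = die_area_mm2 / 100.0  # 100 mm^2 per cm^2
    return math.exp(-area_cm2 * defects_per_cm2)


def silicon_per_good_product(total_area_mm2: float, pieces: int,
                             defects_per_cm2: float) -> float:
    """Average silicon area (mm^2) fabricated per working product.

    Assumes the design is split into `pieces` equal dies and that each die
    is tested before assembly, so only good dies are packaged together.
    """
    piece_area = total_area_mm2 / pieces
    good_fraction = poisson_yield(piece_area, defects_per_cm2)
    return pieces * piece_area / good_fraction


# Hypothetical inputs, not figures from AMD or any foundry.
TOTAL_AREA_MM2 = 800.0   # close to the reticle limit described above
DEFECTS_PER_CM2 = 0.2    # assumed defect density

monolithic = silicon_per_good_product(TOTAL_AREA_MM2, 1, DEFECTS_PER_CM2)
chiplets = silicon_per_good_product(TOTAL_AREA_MM2, 4, DEFECTS_PER_CM2)

print(f"one large die: {monolithic:.0f} mm^2 of silicon per working product")
print(f"four chiplets: {chiplets:.0f} mm^2 of silicon per working product")
```

Under these assumptions the four-piece design needs far less wafer area per working product, because a single defect scraps one small die instead of the whole chip. Real savings depend on the costs the model leaves out.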
“When AMD tried to take the Naples design, and shrink it from 14 nanometer to seven, just pure lithographic scaling, they found it wasn't gonna work,” Columbia Threadneedle analyst Dave Egan told Protocol. “At the first pass design, they were only able to basically shrink about a half of it.”
No chiplets from Nvidia
Nvidia ran up against the photomask issue, also known as the reticle limit, over five years ago, according to Nvidia Vice President Ian Buck. But the company hasn’t opted for the chiplet approach as of yet.
Part of the reason is that the graphics chips Nvidia is known for operate fundamentally differently than the CPUs from Intel and AMD. Nvidia’s chips use thousands of computing cores to perform lots of relatively simple calculations at once, which makes them well-suited for graphics or for AI-accelerated computing in data centers.
“The GPU is a very different beast,” Buck said. “In the graphics space, it’s not individual cores when presented to a developer; they’re given a scene description and they have to distribute the work and render it.”
To confront the fundamental limit of the size of the photomask without adopting the chiplet approach, Nvidia has focused its efforts around building what it calls super chips. The company has developed its own interconnect technology called NVLink to attach multiple graphics chips and servers together. To Buck, the ultimate expression of that strategy up until this point is the company’s forthcoming Grace Hopper product, which fuses an Arm-based CPU to one of Nvidia’s server GPUs.
Nvidia does make smaller chips for enterprise applications such as AI inference and production. But, for the flagship chips designed for AI training, the company has found that its customers require the maximum amount of compute possible and value the largest processors the company makes.
“This growth greatly simplifies the programming model, but also, for AI, allows you to treat the CPU’s memory as an extension of the GPU’s memory,” Buck said. “They're basically two super chips put together.”
Mix and match
AMD may have been the first major chipmaker to mass-produce and sell processors based around chiplets, but other than Nvidia and a handful of others, the rest of the industry is moving in the same direction. Several of the largest chipmakers, including AMD, Intel and Samsung, along with cloud service providers, support a new standard for connecting chiplets made by different companies. Called Universal Chiplet Interconnect Express (UCIe), the approach could reshape new semiconductor designs.
“Because of the new UCI Express, the whole industry is centering around the term chiplet,” Hutcheson said. “The real significance between today versus what we did before is that before a company had to do it all themselves — it’s not like you could buy this chip and this chip, and make my own electronic device.”
In an ideal world, the UCIe standard would let chipmakers mix and match chips that use different manufacturing process technologies and are made by different companies, combining them into products built inside a single package. That means taking memory made by Micron, a CPU core produced by AMD and a wireless modem made by Qualcomm and fitting them together, which could greatly improve performance while saving an enormous amount of power.
“To allow for a heterogeneous system to be constructed on a package, you want on-package memory because of higher memory bandwidth,” Intel senior fellow Debendra Das Sharma said. “There are certain acceleration functions that can benefit by being on the same package, and also having a much lower latency and low power way of accessing all the components in the system, including memory.”
Mixing and matching chiplets would also enable AMD and Intel to create custom products for large customers that have specific needs. Accelerated computing, which is commonly deployed to tackle AI compute tasks, is low-hanging fruit, according to Das Sharma. Should one customer need a chip for a specific type of AI, Intel could swap a general-purpose accelerator for something more specialized.
Universally interconnecting chiplets isn’t a reality yet. According to several industry watchers, it’s unlikely to materialize for several years as the standard gets hammered out. The second version, which could arrive around 2025, is more likely to herald the type of hot swapping that Das Sharma discussed.
But whether the industry comes together in 2025 or 2026, chiplets are the future of processors — at least for the moment. Data centers consume a massive amount of the world’s energy, and that consumption will only increase as Mark Zuckerberg attempts to manifest his version of the metaverse, and, in the nearer term, more aspects of our lives turn digital.
“When you move these electrons down this pipe — simply going off chip, the power needed to do it is about 10,000X difference,” Hutcheson said. “To move a signal from one chip to another chip in another package, it’s like a 100,000X difference.”