Why Google’s Partha Ranganathan is doubling down on custom video chips

YouTube plans to launch two new versions of the custom video transcoding chip it designed called Argos. The first edition made its data centers far more efficient.



YouTube’s first-generation Argos video chip made its data centers way more efficient, freeing up expensive processors for demanding tasks. If one is good, two is better.

YouTube has at least two new versions of the custom video transcoding chip in the works, suggesting the company is committed to producing the piece of silicon for the foreseeable future. The video coding unit, or VCU, came into being after Google figured out that Moore’s law — the predictable doubling of chip performance at a lower cost — had become an unreliable way to plan its data center construction.

A tech lead for infrastructure and Google fellow, Partha Ranganathan was at the heart of the effort to design and deploy the Argos chip. Ranganathan co-founded the project and was the chief architect of the chip. He serves on the board of the Open Compute Project Foundation, and prior to his time at Google, Ranganathan worked for HP Labs for more than a decade.

Ranganathan recently discussed with Protocol why Google decided to make custom chips, how it elected to pursue one focused on the compute-intensive workload of video transcoding and the future of hardware-accelerated video.

This interview has been edited and condensed.

Maybe a good place to start is at the beginning. Can you tell us in as much visceral detail as possible how this idea of a chip for video — for YouTube — came into being?

So in the role that I am in, I constantly look at how our infrastructure is evolving, and, about six or seven years back, we realized that Moore's law, this notion that performance doubles for the same cost every 18 months, was dead. So every two years, we used to get double the performance for the same cost. Now it's showing up every four years, and it looks like it's going to get slower. And we said, “Well, I think we need to do something different.” We decided to embrace custom silicon hardware accelerators.

We built this accelerator for machine learning called the [tensor processing unit, or TPU], which was our very first stuff, and I was privileged to help out a little bit with that as well. And so we were doing that, and that project was really doing very well. And we were realizing that things that were not possible earlier were magically happening. We could go on Google Photos, [and] some amazing things that used to take months to train were [now] taking minutes to train, and we were creating new product categories.

So I was coming at it from the point of view of: What is the next big killer application we want to look at? And then we looked at the fleet, and we saw that transcoding was consuming a large fraction of our compute cycle. So we started off saying, “Hey, look, is there something here? This looks like an incredibly compute-intensive workload and it's fairly well defined.”

Building an accelerator is not an easy undertaking; you need a strong stomach for it. One of the big things you saw in the 13-page paper is this: It's all about co-design. So the hardware is really kind of the tip of the iceberg. It's the entire surrounding ecosystem.

I have this quote that my colleagues find very humorous. You know the philosophy quote: If a tree falls in the forest, and nobody heard it fall, did it fall? So I have an equivalent to that, which is, if an accelerator lands in the fleet, and nobody uses [it], it didn't really land. And the point really is you can build hardware: It's not that complicated. You can build amazing hardware. But if you don't build it in a way that our software colleagues can use it, and it can actually work and there is compilation and tools and debugging and deployment and so on — it's a pretty big undertaking, right?

Were there any important “eureka” moments along the way to designing the Argos chip?

The first aha moment was that we needed another accelerator, and video seems to be growing. But one of the big epiphanies we had was that accelerators are not about efficiency. I think it's kind of very contradictory, or counterintuitive. Because most people draw a pie chart of where the cycles go and say, “Hey, here's 30% of my cycles, I'm going to accelerate it with hardware.”

What we realized was that accelerators are all about capabilities. Not only are we going to make all these [tasks] faster, much like machine learning, we're going to create magical experiences that otherwise didn't exist.

I’m thankful we used Google video conference [for this interview], because it’s running on the hardware we developed, it’s running on a VCU. And so this blurring of my image behind me is running on a VCU. And so you can do some really nice stuff with image processing. You could do 8K video, you could do immersive video, you could [do] 360 degrees, you could compress video. So the bandwidth became faster, and you could get quality of service. You could do YouTube TV, you could do cloud gaming, right? So the capabilities, not the efficiencies, the new things that you could do: that’s when we realized we were sitting on something really interesting.

One of the unique things about the Argos chip is that it had, at least compared with a chip company, an unusual amount of collaboration between the hardware designers and software engineers. Can you talk a bit about how that worked?

I still remember the really exhilarating — but sometimes it was a very challenging — conversation about, “Where do we draw the line between hardware and software?” I know the paper was very dry and technical. But to me, it's super exciting because some of the trade-offs we made were the first time in the history we've done these kinds of things. This is the first time on the planet we have warehouse-scale transcoding. And we've never done large-scale distributed transcoding. So what do we put in hardware? What do we put in software? How do we do schedulers? How do we do testing? How do we do high-level synthesis?

High-level synthesis is kind of an emerging technique. But this notion of using a software-centric approach to designing hardware was something that we really pushed on very hard in Argos. There's a whole bunch of software things that are associated with it. So hardware is pretty hard. It's complicated. And it takes a long time. So I work at Google, which is a software company, and we do some of the world's largest software platforms, and things like web search or Android are huge, complex software code bases, but we still kind of make updates to them fairly frequently.

When I go to my software colleagues and I tell them, “Hey, look, I have a hardware idea — I'm going to change your business model. I’ll come back in two years and I will get you something at that time.” They look at me and say, “Two years? That's a long lead time for me to get something all right.” And they've often asked me, “Why don't we do it the same way software people do? Why don't you kind of do incremental things? Why don't you do agile development?” And I always tell them, “Hey, look, hardware is hard. It's different.”

And so one of the approaches, I think, in the post-Moore's law world, is this notion of: How can you have software-defined hardware? The idea really is, can you use high-level synthesis techniques [or HLS]? And what was very notable was we actually did some nice innovations in that space as well, which we didn't talk about too much.

What did using high-level synthesis for your designs achieve?

So what it does is: there’s a term called PPA, P for power, P for performance and A for area, and that’s how you look at hardware. So we got similar power, similar performance, similar area, with maybe a little bit of trade-offs here and there. But what it allowed us to do was iterate much faster, and so the paper actually talked about [an] example of how you could look at a much broader design space. Because we could quickly learn it and simulate it, and see what happens. So you could do a much more systematic design-space exploration.

But you could also start to be very nimble about late additions, and we actually had an example in the paper of a last-minute addition: we decided we needed to add a little bit [because] the algorithm changed. And so we said, no worries, we can go ahead and compile the hardware. That’s what the whole HLS is about.

Were there any stand-out moments once the chip launched? For example, what happened during the early days of the pandemic when many people spent time at home? Did it push the Argos chip to the limit?

When the pandemic hit, usage just went up — like a 25% increase in watch time across the world in a 30-day period. The fact that we had an accelerator lying around that could really stand up to all the demand was pretty useful as well. So that was a very memorable moment for us.

The final step in the chip design process is called a tape out, dating back to the last century when designers literally taped a design together before it went off to the fab. It’s an important moment, even today. How did your team celebrate?

If you've designed hardware — so maybe the right analogy is to think about the most exhausting project that you have done. And it was an adrenaline roller coaster, and so on. And then you finished it, what happened? My suspicion is you’re going to sleep right after that. I was so exhausted. But at Google, we have the tradition that we always celebrate with ice cream and kind of have a party, so we did have all of that. But it wasn't like that one moment where we pressed, the magic button popped up and confetti rained all over like a NASA launch. We hugged each other and did all of that stuff. I wish I could say there was that nice movie-worthy moment.

And so the person who did the tape out sent an email saying, “Hey, look, this is done,” and we have a flurry of congratulatory emails and then everybody's exhausted. Nowadays it's just a bunch of FTP files getting uploaded, and there is not that nice epochal moment where there is literally a transition of physical hardware. So you have to make do with emails being sent out and ice creams being consumed.

What does the future for accelerator chips look like? Does Google want the likes of Intel, AMD or Nvidia to start making custom video accelerator chips?

I've been working in this area for multiple decades, and this is by far the most exciting time that I've had — the number of opportunities that we've had: video, ML, network acceleration and security, data processing, there's so many things to be done. And so when the dust settles there are going to be a bunch of big accelerators. Now, to me, video is easily going to be one of those category accelerators, and we're just touching the tip of the iceberg. So video transcoding is this one small block, and things which we know for sure are very important, and we want to do this.

But if you start looking at how much video is central to our lives (and I think, for good or bad, the pandemic has made video even more central), you saw how many kids used YouTube and cloud gaming during the first few months of the pandemic, how videoconferencing is [the] default. I was at a conference last week. And it's now the default to have a hybrid, right? And then I look at the number of IoT devices like cameras that are capturing images, cameras in manufacturing that are looking for quality checking. Video is going to be teaching computers to see, and the computers are going to be everywhere. That’s why I think video will be an integral part of our lives.

So all of this is a long-winded way of saying there are plenty of opportunities. And I really see a very, very vibrant cottage industry of us all figuring out how to use and accelerate video in the future. Is it OK if Intel or AMD does that? I think part of the reason why we published the paper is we would love for the entire industry to understand the importance of this problem and kind of build on top of that, because that is how research works. That's how innovation works: people build on top of each other. And again, at the end of the day, I'm looking for us to deliver magical experiences. And if somebody else delivers hardware (we use a lot of Intel, AMD, and Arm in our fleet), and if somebody delivers something magical, I will use it.

Because we are in the business of using hardware to build even bigger, magical experiences. On top of that, if it turns out we have awareness of a problem that needs hardware, and we think we can do it well, like we did with Argos, we will continue to do that. And I think I see a very, very rich road map, and have ideas in the future, that we can continue to accelerate.

