Google’s multicloud national AI research plan could cost $500M a year. It wants first crack at the data

All three big cloud providers — Amazon, Google and Microsoft — want in on a huge national project to build an AI research hub, but Google has specific plans. Especially when it comes to processing the data.


Google's vice president and general manager for AI and industry solutions in its cloud unit, Andrew Moore, sits on the government task force guiding the project.

Photo: The National Security Commission on Artificial Intelligence

Google has big ideas for a massive federally funded AI cloud research project, and it thinks they are worth $500 million a year. All it wants is first dibs on vast amounts of raw government data.

The company wants the U.S. government to pony up at least half a billion dollars annually to fund a giant national hub for AI research, according to Google’s response to a request for stakeholders to weigh in on the project. Already in the works, the National AI Research Resource — or NAIRR — is expected to benefit all three of the largest commercial cloud services — including Google Cloud. But Google has devised a detailed plan for how it could be built and how Google should be involved. And its vice president and general manager for AI and industry solutions in its cloud unit, Andrew Moore, sits on the government task force guiding the project.

If the company has its way, the proposed funding would not just create a new contract for Google Cloud, it could benefit other divisions under the Alphabet umbrella including its urban tech unit Sidewalk Labs.

In a bold, previously unreported proposal submitted in October to the federal task force overseeing the project, Google stated, “In order to achieve significant impact, we recommend that the [U.S. government] fund the resource at $500 million/year or more.” The resource is meant to be a repository of data and AI tools that also gives researchers access to the computing power they need to develop machine learning and other AI systems. It’s in the early stages of planning through a process led by the National Science Foundation and the White House Office of Science and Technology Policy.

Because of its focus on large-scale AI, which requires huge datasets and tons of storage and computational capacity, the resource by its nature is likely to involve the top cloud providers. The other two cloud giants, Amazon Web Services and Microsoft Azure, did not propose specific dollar figures, but both companies are eager to get in on the action.

And, although the companies engage in bruising competition to attract enterprise cloud customers, AWS, Google and Microsoft each indicated some willingness to work together to support the initiative. In the end, the project could reap dividends for the AI and cloud industry when it comes to fueling data sources, educating the next generation of much-desired tech talent and spurring increased interest in cloud and AI services from the public sector.

The push for strong commercial ties

All three companies also emphasized the benefits of constructing the research resource on a foundation enabled by commercial cloud providers as opposed to something built by the federal government.

“We believe the NAIRR should be a multi-cloud hosting platform for commercial Cloud resources (as opposed to a new Cloud platform developed by government or academia),” wrote Google, which, as the underdog of the Big Cloud triad, has the most to gain from playing nice with the competition in a multi-cloud setup.

The company highlighted the “security, operational, and energy efficiency” benefits of partnering with the cloud experts rather than building in-house. “Building a new platform from the ground up would require a huge investment of dollars and expertise, and even once built would not have the advantages brought by the scale of existing Cloud providers.”

Meanwhile, Amazon used the task force’s request for information as an opportunity to pitch its AI and cloud products and services. “As a leading cloud service provider, AWS’s compute, storage, AI/ML, and data analytics services can form the backbone of NAIRR’s shared research infrastructure,” noted the company. Sometimes AWS even ventured into sales-deck territory: “The AWS Global Cloud Infrastructure [can enable] the NAIRR to deploy application workloads across the globe in a single click,” and its pre-trained AI services “can provide ready-made intelligence,” the company wrote.

Microsoft didn’t shy away from the sales opportunity either. Plus, it had flow charts: One chart featured an AI technology stack built from Microsoft services including the Azure Open Data repository and Azure Machine Learning.

Manish Parashar, director of NSF’s Office of Advanced Cyberinfrastructure and co-chair of the task force, told Protocol it’s too early to know whether or how private sector cloud providers might be involved. However, he said there is general consensus that the data and computing service infrastructure underpinning the project will combine existing and new resources.

“This approach would take advantage of campus-level, region-level and national-level resources, creating a federated platform that connects users to a diverse set of resources and facilitates their use through educational tools and user support,” he said.

Following meetings next week and into early next year, the task force will issue reports to Congress in May and November 2022.

Over time, a hybrid approach to standing up the research resource would be the fastest and most cost-efficient, according to the Stanford Institute for Human-Centered Artificial Intelligence, which has pushed hard for a cloud-based national AI research hub. In its “Blueprint for the National Research Cloud” published in October, the institute recommended a dual investment strategy: partner with commercial cloud providers for computing power at first, then pilot a publicly owned infrastructure built by commercial vendors but operated by the government, the model used for national labs such as the massive Oak Ridge National Laboratory.

The Stanford team estimated that building a standalone public infrastructure for the research hub would be less expensive in the long run than working under a vendor contracting arrangement. According to their math, if the government were to negotiate a 10% discount with AWS to use its computing services and comparable hardware under constant usage over a five-year period, it could cost 7.5 times as much as the estimated cost of running the Summit supercomputer at Oak Ridge, the world’s second-most powerful supercomputer.

“Even in a scenario where [national research cloud] usage fluctuates dramatically, commercial cloud computing could cost 2.8 times Summit’s estimated cost,” they wrote.

All three big cloud companies mentioned existing public-private partnerships they’re involved with to enable cloud services for academic and government research. For instance, they all partner with CloudBank, an NSF-affiliated service led by several universities to provide cloud access to computer science students, as well as a cloud environment for medical research overseen by the National Institutes of Health.

Microsoft even mentioned its partnerships in support of research outside the U.S. including through its government-funded, public-private “AI innovation hub” in Shanghai, China.

Mentioning a partnership with the Chinese government is notable. The task force was established by Congress in 2020 on the recommendation of the National Security Commission on AI, which has pushed for billions of dollars in non-defense funding to bolster AI research in the hopes that the U.S. keeps pace with global AI development, particularly in China. The commission has referred to China as a rival in a “race” not just to win AI tech development, but to ensure AI incorporates Western “values.”

Why Google wants to manage the data

From the looks of its own submission to the task force, Google has moved well beyond the sales pitch to the project planning phase. The company’s ideas for the research resource go into detail well beyond a proposed dollar figure.

Google wants data from the private sector as well as from state and local government sources to be fed into the system, including “some types of sensitive government data” like health, census and financial services data. And it wants researchers who don’t need computing power from the research cloud to be able to access that data. For one thing, that would ensure that researchers from commercial cloud providers like Google, AWS and Microsoft can get at the data. “Rates should be lower and subsidized by the [U.S. government] for academic and government users,” Google wrote.

Stanford’s AI Institute researchers emphasized the need to ensure that the government-funded research hub remain a resource for academic and non-profit researchers, not the private sector. Jen King, privacy and data policy fellow at the institute who helped write the paper, pointed to “the growing brain drain of AI academics into industry,” where it’s easier to access data and computing power. “My colleagues and I explored the question of whether it would make sense to open this resource to private actors, and we concluded that at least initially, doing so would pose legal and logistical issues, as well as distract from the core mission of supporting research in AI.”

Google wants to be as close as possible to the firehose of data that would flow into the research hub. “We recommend that the NAIRR co-locate an instance of Data Commons in all NAIRR clouds, which we would provide as an in-kind contribution.” Essentially what Google is proposing here is that it would manage all the data clean-up work to ensure data quality and standard formatting of countless disparate government data feeds, and it would do it for free. The process is necessary to prepare and unify data to use to train AI models. Once it’s cleaned and standardized by Google, it would sit in a common area accessible through any cloud platform connected to the research resource.

“So, for example, if a researcher wants the population, violent crime rate and unemployment rate of a county, the researcher does not have to go to three different datasets (Census, FBI and BLS), but can instead, get it from a single database, using one schema, one API,” wrote Google. “Co-locating updated versions of Data Commons with the NAIRR would therefore enable more effective use of the resource.”
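The “single database, one schema, one API” idea in Google’s example can be sketched in a few lines of Python. This is a minimal illustration only, not Data Commons’ actual interface: the field names, the `CountyRecord` type, the `build_commons` and `get_county` functions, the county ID and every number below are hypothetical placeholders, not real statistics.

```python
from dataclasses import dataclass

# Hypothetical, made-up rows standing in for three separate agency feeds.
# Each feed uses its own key name for the same county; values are placeholders.
CENSUS = [{"fips": "00001", "population": 100_000}]
FBI = [{"county_fips": "00001", "violent_crimes": 250}]
BLS = [{"area_code": "00001", "unemployment_rate_pct": 4.0}]


@dataclass
class CountyRecord:
    """One unified schema covering all three sources."""
    county_id: str
    population: int
    violent_crime_rate_per_100k: float
    unemployment_rate_pct: float


def build_commons(census, fbi, bls):
    """Join the three feeds on county ID and normalize them into one schema."""
    crimes = {row["county_fips"]: row["violent_crimes"] for row in fbi}
    jobless = {row["area_code"]: row["unemployment_rate_pct"] for row in bls}
    commons = {}
    for row in census:
        cid = row["fips"]
        if cid not in crimes or cid not in jobless:
            continue  # skip counties that are missing from any feed
        pop = row["population"]
        rate = crimes[cid] * 100_000 / pop  # crimes per 100,000 residents
        commons[cid] = CountyRecord(cid, pop, rate, jobless[cid])
    return commons


def get_county(commons, county_id):
    """The single lookup a researcher would call instead of three."""
    return commons[county_id]


commons = build_commons(CENSUS, FBI, BLS)
record = get_county(commons, "00001")
print(record)
```

The sketch also shows why the clean-up role matters: whoever writes the equivalent of `build_commons` decides which key identifies a county, which rows are dropped and how each measure is defined.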

But despite proposing to do the work at no cost, Google makes a point of highlighting how valuable that service is. “Cleaning a large dataset is no small feat; before making Google datasets publicly available for the open-source community, we spend hundreds of hours standardizing data and validating quality.”

Google’s proposal to do the job pro bono, said Eric Woods, research director at smart city technology research firm Guidehouse Insights, “raises the question of tech companies bearing gifts — what’s in it for them?”

Ultimately the project may not be just about the cloud business for Google. As a leader in extracting value from the world’s information, said Woods, Google could squeeze a lot of value from processing raw government data. “There is value in that data before it’s filtered that can be extracted,” he said. Particularly when it comes to sensitive data that the company may not have access to currently, it could provide new insights and help Google improve algorithms for various aspects of its business — for starters, its search and maps products.

The resource could become a huge dumping ground for regularly updated, raw data feeds from federal agencies, states, municipalities or even private entities across the country. As the official cleaning crew, Google could access information it has not been able to see before, in a raw form that others could no longer see once it’s cleaned and formatted for access through a data commons. Perhaps more importantly, it could give Google the power to decide how that information is organized, labeled and formatted.

Matt Tarascio, senior vice president of artificial intelligence at consulting and research firm Booz Allen, agreed that having first-hand knowledge of data flows and what information looks like before clean-up would enhance Google’s algorithmic prowess. “There’s significant value in understanding the data streams and where the data comes from,” he said.

Having that sort of data access and decision-making power could be particularly beneficial for Google’s sibling company, Alphabet-owned Sidewalk Labs, which uses municipal and other public and commercial data to build algorithmic tech for city governments, energy utilities, real estate developers and healthcare providers. “They would enhance their ability to understand and cleanse messy, public datasets,” said Woods. Sidewalk Labs itself proposed use of a data commons in conjunction with its failed “city of the future” experiment in Canada, Sidewalk Toronto.

If Google were to be chosen to process the data for the research hub, there are bound to be concerns about a commercial entity managing it, said Woods. “That’s exactly the debate that was going on around Sidewalk Toronto,” he said. When Google proposed using a data commons there, he said, “Others were saying, hang on, who’s ultimately got control over this?”

