Google’s multicloud national AI research plan could cost $500M a year. It wants first crack at the data

All three big cloud providers — Amazon, Google and Microsoft — want in on a huge national project to build an AI research hub, but Google has specific plans. Especially when it comes to processing the data.

Andrew Moore

Google's vice president and general manager for AI and industry solutions in its cloud unit, Andrew Moore, sits on the government task force guiding the project.

Photo: The National Security Commission on Artificial Intelligence

Google has big ideas for a massive federally funded AI cloud research project, and it thinks they are worth $500 million a year. All it wants is first dibs on vast amounts of raw government data.

The company wants the U.S. government to pony up at least half a billion dollars annually to fund a giant national hub for AI research, according to Google’s response to a request for stakeholders to weigh in on the project. Already in the works, the National AI Research Resource — or NAIRR — is expected to benefit all three of the largest commercial cloud services — including Google Cloud. But Google has devised a detailed plan for how it could be built and how Google should be involved. And its vice president and general manager for AI and industry solutions in its cloud unit, Andrew Moore, sits on the government task force guiding the project.

If the company has its way, the proposed funding would not just create a new contract for Google Cloud; it could also benefit other divisions under the Alphabet umbrella, including its urban tech unit Sidewalk Labs.

In a previously unreported bold proposal submitted in October to the federal task force overseeing the project, Google stated, “In order to achieve significant impact, we recommend that the [U.S. government] fund the resource at $500 million/year or more.” The resource is meant to be a repository of data, AI tools and access to computing power necessary for researchers to develop machine learning and other AI systems. It’s in the early stages of planning through a process led by the National Science Foundation and the White House Office of Science and Technology Policy.

Because of its focus on large-scale AI, which requires huge datasets and tons of storage and computational capacity, the resource by its nature is likely to involve the top cloud providers. The other two cloud giants, Amazon Web Services and Microsoft Azure, did not propose specific dollar figures, but both companies are eager to get in on the action.

And, although the companies engage in bruising competition to attract enterprise cloud customers, AWS, Google and Microsoft each indicated some willingness to work together to support the initiative. In the end, the project could reap dividends for the AI and cloud industry when it comes to fueling data sources, educating the next generation of much-desired tech talent and spurring increased interest in cloud and AI services from the public sector.

The push for strong commercial ties

All three companies also emphasized the benefits of constructing the research resource on a foundation enabled by commercial cloud providers as opposed to something built by the federal government.

“We believe the NAIRR should be a multi-cloud hosting platform for commercial Cloud resources (as opposed to a new Cloud platform developed by government or academia),” wrote Google, which, as the underdog of the Big Cloud triad, has the most to gain from playing nice with the competition in a multi-cloud setup.

The company highlighted the “security, operational, and energy efficiency” benefits of partnering with the cloud experts rather than building in-house. “Building a new platform from the ground up would require a huge investment of dollars and expertise, and even once built would not have the advantages brought by the scale of existing Cloud providers.”

Meanwhile, Amazon used the task force’s request for information as an opportunity to pitch its AI and cloud products and services. “As a leading cloud service provider, AWS’s compute, storage, AI/ML, and data analytics services can form the backbone of NAIRR’s shared research infrastructure,” noted the company. Sometimes AWS even ventured into sales-deck territory: “The AWS Global Cloud Infrastructure [can enable] the NAIRR to deploy application workloads across the globe in a single click,” and its pre-trained AI services “can provide ready-made intelligence,” the company wrote.

Microsoft didn’t shy away from the sales opportunity either. Plus, it had flow charts: One chart featured an AI technology stack built from Microsoft services including the Azure Open Data repository and Azure Machine Learning.

Manish Parashar, director of NSF’s Office of Advanced Cyberinfrastructure and co-chair of the task force, told Protocol it’s too early to know whether or how private sector cloud providers might be involved. However, he said there is general consensus that the data and computing service infrastructure underpinning the project will combine existing and new resources.

“This approach would take advantage of campus-level, region-level and national-level resources, creating a federated platform that connects users to a diverse set of resources and facilitates their use through educational tools and user support,” he said.

Following meetings next week and into early next year, the task force will issue reports to Congress in May and November 2022.

Over time, a hybrid approach to standing up the research resource would be the fastest and most cost-efficient, according to the Stanford Institute for Human-Centered Artificial Intelligence, which has pushed hard for a cloud-based national AI research hub. In its “Blueprint for the National Research Cloud” published in October, the institute recommended a dual investment strategy: partnering with commercial cloud providers for computing power at first, then piloting a publicly owned infrastructure built by commercial vendors but operated by the government — the model for national labs such as the massive Oak Ridge National Laboratory.

The Stanford team estimated that building a standalone public infrastructure for the research hub would be less expensive in the long run than working under a vendor contracting arrangement. According to their math, if the government were to negotiate a 10% discount with AWS to use its computing services and comparable hardware under constant usage over a five-year period, it could cost 7.5 times as much as the estimated cost of running the Summit supercomputer at Oak Ridge, the world’s second-most powerful supercomputer.

“Even in a scenario where [national research cloud] usage fluctuates dramatically, commercial cloud computing could cost 2.8 times Summit’s estimated cost,” they wrote.

All three big cloud companies mentioned existing public-private partnerships they’re involved with to enable cloud services for academic and government research. For instance, they all partner with CloudBank, an NSF-affiliated service led by several universities to provide cloud access to computer science students, as well as with a cloud environment for medical research overseen by the National Institutes of Health.

Microsoft even mentioned its partnerships in support of research outside the U.S. including through its government-funded, public-private “AI innovation hub” in Shanghai, China.

Mentioning a partnership with the Chinese government is notable. The task force was established by Congress in 2020 on recommendation from the National Security Commission on AI, which has pushed for billions of dollars in non-defense funding to bolster AI research in the hopes that the U.S. keeps pace with global AI development, particularly with China. The commission has referred to China as a rival in a “race” not just to win AI tech development, but to ensure AI incorporates Western “values.”

Why Google wants to manage the data

From the looks of its own submission to the task force, Google has moved well beyond the sales pitch to the project planning phase. The company’s ideas for the research resource are detailed well beyond the proposed dollar figure.

Google wants data from the private sector as well as from state and local government sources to be fed into the system — including “some types of sensitive government data” like health, census and financial services data. And it wants researchers who don’t need computing power from the research cloud to be able to access that data. For one thing, that would ensure that researchers from commercial cloud providers like Google, AWS and Microsoft can get at the data. “Rates should be lower and subsidized by the [U.S. government] for academic and government users,” Google wrote.

Stanford’s AI Institute researchers emphasized the need to ensure that the government-funded research hub remain a resource for academic and nonprofit researchers, not the private sector. Jen King, a privacy and data policy fellow at the institute who helped write the paper, pointed to “the growing brain drain of AI academics into industry,” where it’s easier to access data and computing power. “My colleagues and I explored the question of whether it would make sense to open this resource to private actors, and we concluded that at least initially, doing so would pose legal and logistical issues, as well as distract from the core mission of supporting research in AI.”

Google wants to be as close as possible to the firehose of data that would flow into the research hub. “We recommend that the NAIRR co-locate an instance of Data Commons in all NAIRR clouds, which we would provide as an in-kind contribution.” Essentially, Google is proposing to manage all the data clean-up work — ensuring data quality and standard formatting across countless disparate government data feeds — and to do it for free. That process is necessary to prepare and unify data for training AI models. Once cleaned and standardized by Google, the data would sit in a common area accessible through any cloud platform connected to the research resource.

“So, for example, if a researcher wants the population, violent crime rate and unemployment rate of a county, the researcher does not have to go to three different datasets (Census, FBI and BLS), but can instead, get it from a single database, using one schema, one API,” wrote Google. “Co-locating updated versions of Data Commons with the NAIRR would therefore enable more effective use of the resource.”
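The idea Google describes can be illustrated with a small sketch. This is not Google's actual Data Commons API; the feeds, field names and figures below are mock data invented for illustration. It shows three agency-style feeds with inconsistent schemas being standardized into one table keyed by county FIPS code, then queried through a single function — one schema, one API:

```python
# Mock feeds standing in for Census, FBI and BLS data. The field names and
# formats deliberately disagree, as raw agency feeds often do. All values
# are made up for this example.
census_feed = [{"fips": "17031", "POP": "5,275,541"}]
fbi_feed = [{"county_fips": 17031, "violent_crime_rate_per_100k": 735.2}]
bls_feed = [{"area": "17031", "unemp_pct": "7.1%"}]

def standardize(census, fbi, bls):
    """Clean and merge the feeds into one schema keyed by FIPS code:
    {fips: {population, violent_crime_rate, unemployment_rate}}."""
    table = {}
    for row in census:
        fips = str(row["fips"])
        # Strip thousands separators so population is a plain integer.
        table.setdefault(fips, {})["population"] = int(row["POP"].replace(",", ""))
    for row in fbi:
        fips = str(row["county_fips"])  # normalize int IDs to strings
        table.setdefault(fips, {})["violent_crime_rate"] = float(
            row["violent_crime_rate_per_100k"]
        )
    for row in bls:
        fips = str(row["area"])
        # Strip the "%" suffix so the rate is a plain float.
        table.setdefault(fips, {})["unemployment_rate"] = float(
            row["unemp_pct"].rstrip("%")
        )
    return table

def get_county_stats(table, fips):
    """One API call returns all three indicators for a county."""
    return table[fips]

commons = standardize(census_feed, fbi_feed, bls_feed)
print(get_county_stats(commons, "17031"))
# {'population': 5275541, 'violent_crime_rate': 735.2, 'unemployment_rate': 7.1}
```

The standardization step — reconciling identifiers, number formats and field names — is exactly the clean-up work Google is offering to do for free, and it determines what every downstream researcher sees.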

But despite proposing to do the work at no cost, Google makes a point to highlight how valuable a service it is. “Cleaning a large dataset is no small feat; before making Google datasets publicly available for the open-source community, we spend hundreds of hours standardizing data and validating quality.”

Google’s proposal to do the job pro bono, said Eric Woods, research director at smart city technology research firm Guidehouse Insights, “raises the question of tech companies bearing gifts — what’s in it for them?”

Ultimately, the project may not be just about the cloud business for Google. As a leader in extracting value from the world’s information, said Woods, Google could squeeze a lot of value from processing raw government data. “There is value in that data before it’s filtered that can be extracted,” he said. Access to sensitive data the company cannot currently see could yield new insights and help Google improve algorithms across various parts of its business — its search and maps products, for starters.

The resource could become a huge dumping ground for regularly updated, raw data feeds from federal agencies, states, municipalities or even private entities across the country. As the official cleaning crew, Google could access information it has not been able to see before, in a form others could not see once it’s cleaned and formulated for access through a data commons. Perhaps more importantly, it could give Google the power to decide how that information is organized, labeled and formatted.

Matt Tarascio, senior vice president of artificial intelligence at consulting and research firm Booz Allen, agreed that having first-hand knowledge of data flows and what information looks like before clean-up would enhance Google’s algorithmic prowess. “There’s significant value in understanding the data streams and where the data comes from,” he said.

Having that sort of data access and decision-making power could be particularly beneficial for Google’s sibling, Alphabet-owned Sidewalk Labs, a company that uses municipal and other public and commercial data to build algorithmic tech for city governments, energy utilities, real estate developers and healthcare providers. “They would enhance their ability to understand and cleanse messy, public datasets,” said Woods. Sidewalk Labs itself proposed use of a data commons in conjunction with its failed “city of the future” experiment in Canada, Sidewalk Toronto.

If Google were to be chosen to process the data for the research hub, there are bound to be concerns about a commercial entity managing it, said Woods. “That’s exactly the debate that was going on around Sidewalk Toronto,” he said. When Google proposed using a data commons there, he said, “Others were saying, hang on, who’s ultimately got control over this?”

