Researchers push to make bulky AI work in your phone and personal assistant

Chipmakers like Nvidia and researchers from Notre Dame want to make huge transformers like large natural-language-process models speedier, more nimble and more energy efficient.

Phones connected on a gray background.

"We want it smaller and smaller, and it has to be more energy efficient.”

Illustration: Christopher T. Fong/Protocol

Transformer networks, colloquially known to deep-learning practitioners and computer engineers as “transformers,” are all the rage in AI. Over the last few years, these models, known for their massive size, large amount of data inputs, big scale of parameters — and, by extension, high carbon footprint and cost — have grown in favor over other types of neural network architectures.

Some transformers, particularly some open-source, large natural-language-processing transformer models, even have names that are recognizable to people outside AI, such as GPT-3 and BERT. They’re used across audio-, video- and computer-vision-related tasks, drug discovery and more.

Now chipmakers and researchers want to make them speedier and more nimble.

“It’s interesting how fast technology for neural networks changes. Four years ago, everybody was using these recurrent neural networks for these language models and then the attention paper was introduced, and all of a sudden, everybody is using transformers,” said Bill Dally, chief scientist at Nvidia during an AI conference last week held by Stanford’s HAI. Dally was referring to an influential 2017 Google research paper presenting an innovative architecture forming the backbone of transformer networks that is reliant on “attention mechanisms” or “self-attention,” a new way to process the data inputs and outputs of models.

“The world pivoted in a matter of a few months and everything changed,” Dally said. To meet the growing interest in transformer use, in March the AI chip giant introduced its Hopper h100 transformer engine to streamline transformer model workloads.

Designing transformer tech for the edge

But some researchers are pushing for even more. There’s talk not only of making compute- and energy-hungry transformers more efficient, but of eventually upgrading their design so they can process fresh data in edge devices without having to make the round trip to process the data in the cloud.

A group of researchers from Notre Dame and China’s Zhejiang University presented a way to reduce memory-processing bottlenecks and computational and energy consumption requirements in an April paper. The “iMTransformer” approach is a transformer accelerator, which works to decrease memory transfer needs by computing in-memory, and reduces the number of operations required by caching reusable model parameters.

Right now the trend is to bulk up transformers so the models get large enough to take on increasingly complex tasks, said Ana Franchesca Laguna, a computer science and engineering PhD at Notre Dame. When it comes to large natural-language-processing models, she said, “It’s the difference between a sentence or a paragraph and a book.” But, she added, “The bigger the transformers are, your energy footprint also increases.”

Using an accelerator like the iMTransformer could help to pare down that footprint, and, in the future, create transformer models that could ingest, process and learn from new data in edge devices. “Having the model closer to you would be really helpful. You could have it in your phone, for example, so it would be more accessible for edge devices,” she said.

That means IoT devices such as Amazon’s Alexa, Google Home or factory equipment maintenance sensors could process voice or other data in the device rather than having to send it to the cloud, which takes more time and more compute power, and could expose the data to possible privacy breaches, Laguna said.

IBM also introduced an AI accelerator called RAPID last year. “Scaling the performance of AI accelerators across generations is pivotal to their success in commercial deployments,” wrote the company’s researchers in a paper. “The intrinsic error-resilient nature of AI workloads present a unique opportunity for performance/energy improvement through precision scaling.”

Farah Papaioannou, co-founder and president at Edgeworx, said she thinks of the edge as anything outside the cloud. “What we’re seeing of our customers, they’re deploying these AI models you want to train and update on a regular basis, so having the ability to manage that capability and update that on a much faster basis [is definitely important],” she said during a 2020 Protocol event about computing at the edge.

Wanted: custom chips

Laguna uses a work-from-home analogy when thinking of the benefits of processing data for AI models at the edge.

“[Instead of] commuting from your home to the office, you actually work from home. It’s all in the same place, so it saves a lot of energy,” she said. She said she hopes research like hers will enable people to build and use transformers in a more cost- and energy-efficient way. “We want it on our edge devices. We want it smaller and smaller, and it has to be more energy efficient.”

Laguna and the other researchers she worked with tested their accelerator approach using smaller chips, and then extrapolated their findings to estimate how the process would work at a larger scale. However, Laguna said that turning the small-scale project into a reality at a larger scale will require customized, larger chips.

Ultimately, she hopes it spurs investment. A goal of the project, she said, “is to convince people that this is worthy of investing in so we can create chips so we can create these types of networks.”

That investor interest might just be there. AI is spurring increases in investments in chips for specific use cases. According to data from PitchBook, global sales of AI chips rose 60% last year to $35.9 billion compared to 2020. Around half of that total came from specialized AI chips in mobile phones.

Systems designed to operate at the edge with less memory rather than in the cloud could facilitate AI-based applications that can respond to new information in real time, said Jarno Kartela, global head of AI Advisory at consultancy Thoughtworks.

“What if you can build systems that by themselves learn in real time and learn by interaction?” he said. “Those systems, you don’t need to run them on cloud environments only with massive infrastructure — you can run them virtually anywhere.”

LA is a growing tech hub. But not everyone may fit.

LA has a housing crisis similar to Silicon Valley’s. And single-family-zoning laws are mostly to blame.

As the number of tech companies in the region grows, so does the number of tech workers, whose high salaries put them at an advantage in both LA's renting and buying markets.

Photo: Nat Rubio-Licht/Protocol

LA’s tech scene is on the rise. The number of unicorn companies in Los Angeles is growing, and the city has become the third-largest startup ecosystem nationally behind the Bay Area and New York with more than 4,000 VC-backed startups in industries ranging from aerospace to creators. As the number of tech companies in the region grows, so does the number of tech workers. The city is quickly becoming more and more like Silicon Valley — a new startup and a dozen tech workers on every corner and companies like Google, Netflix, and Twitter setting up offices there.

But with growth comes growing pains. Los Angeles, especially the burgeoning Silicon Beach area — which includes Santa Monica, Venice, and Marina del Rey — shares something in common with its namesake Silicon Valley: a severe lack of housing.

Keep Reading Show less
Nat Rubio-Licht

Nat Rubio-Licht is a Los Angeles-based news writer at Protocol. They graduated from Syracuse University with a degree in newspaper and online journalism in May 2020. Prior to joining the team, they worked at the Los Angeles Business Journal as a technology and aerospace reporter.

While there remains debate among economists about whether we are officially in a full-blown recession, the signs are certainly there. Like most executives right now, the outlook concerns me.

In any case, businesses aren’t waiting for the official pronouncement. They’re already bracing for impact as U.S. inflation and interest rates soar. Inflation peaked at 9.1% in June 2022 — the highest increase since November 1981 — and the Federal Reserve is targeting an interest rate of 3% by the end of this year.

Keep Reading Show less
Nancy Sansom

Nancy Sansom is the Chief Marketing Officer for Versapay, the leader in Collaborative AR. In this role, she leads marketing, demand generation, product marketing, partner marketing, events, brand, content marketing and communications. She has more than 20 years of experience running successful product and marketing organizations in high-growth software companies focused on HCM and financial technology. Prior to joining Versapay, Nancy served on the senior leadership teams at PlanSource, Benefitfocus and PeopleMatter.


SFPD can now surveil a private camera network funded by Ripple chair

The San Francisco Board of Supervisors approved a policy that the ACLU and EFF argue will further criminalize marginalized groups.

SFPD will be able to temporarily tap into private surveillance networks in certain circumstances.

Photo: Justin Sullivan/Getty Images

Ripple chairman and co-founder Chris Larsen has been funding a network of security cameras throughout San Francisco for a decade. Now, the city has given its police department the green light to monitor the feeds from those cameras — and any other private surveillance devices in the city — in real time, whether or not a crime has been committed.

This week, San Francisco’s Board of Supervisors approved a controversial plan to allow SFPD to temporarily tap into private surveillance networks during life-threatening emergencies, large events, and in the course of criminal investigations, including investigations of misdemeanors. The decision came despite fervent opposition from groups, including the ACLU of Northern California and the Electronic Frontier Foundation, which say the police department’s new authority will be misused against protesters and marginalized groups in a city that has been a bastion for both.

Keep Reading Show less
Issie Lapowsky

Issie Lapowsky ( @issielapowsky) is Protocol's chief correspondent, covering the intersection of technology, politics, and national affairs. She also oversees Protocol's fellowship program. Previously, she was a senior writer at Wired, where she covered the 2016 election and the Facebook beat in its aftermath. Prior to that, Issie worked as a staff writer for Inc. magazine, writing about small business and entrepreneurship. She has also worked as an on-air contributor for CBS News and taught a graduate-level course at New York University's Center for Publishing on how tech giants have affected publishing.


These two AWS vets think they can finally solve enterprise blockchain

Vendia, founded by Tim Wagner and Shruthi Rao, wants to help companies build real-time, decentralized data applications. Its product allows enterprises to more easily share code and data across clouds, regions, companies, accounts, and technology stacks.

“We have this thesis here: Cloud was always the missing ingredient in blockchain, and Vendia added it in,” Wagner (right) told Protocol of his and Shruthi Rao's company.

Photo: Vendia

The promise of an enterprise blockchain was not lost on CIOs — the idea that a database or an API could keep corporate data consistent with their business partners, be it their upstream supply chains, downstream logistics, or financial partners.

But while it was one of the most anticipated and hyped technologies in recent memory, blockchain also has been one of the most failed technologies in terms of enterprise pilots and implementations, according to Vendia CEO Tim Wagner.

Keep Reading Show less
Donna Goodison

Donna Goodison (@dgoodison) is Protocol's senior reporter focusing on enterprise infrastructure technology, from the 'Big 3' cloud computing providers to data centers. She previously covered the public cloud at CRN after 15 years as a business reporter for the Boston Herald. Based in Massachusetts, she also has worked as a Boston Globe freelancer, business reporter at the Boston Business Journal and real estate reporter at Banker & Tradesman after toiling at weekly newspapers.


Kraken's CEO got tired of being in finance

Jesse Powell tells Protocol the bureaucratic obligations of running a financial services business contributed to his decision to step back from his role as CEO of one of the world’s largest crypto exchanges.

Photo: David Paul Morris/Bloomberg via Getty Images

Kraken is going through a major leadership change after what has been a tough year for the crypto powerhouse, and for departing CEO Jesse Powell.

The crypto market is still struggling to recover from a major crash, although Kraken appears to have navigated the crisis better than other rivals. Despite his exchange’s apparent success, Powell found himself in the hot seat over allegations published in The New York Times that he made insensitive comments on gender and race that sparked heated conversations within the company.

Keep Reading Show less
Benjamin Pimentel

Benjamin Pimentel ( @benpimentel) covers crypto and fintech from San Francisco. He has reported on many of the biggest tech stories over the past 20 years for the San Francisco Chronicle, Dow Jones MarketWatch and Business Insider, from the dot-com crash, the rise of cloud computing, social networking and AI to the impact of the Great Recession and the COVID crisis on Silicon Valley and beyond. He can be reached at bpimentel@protocol.com or via Google Voice at (925) 307-9342.

Latest Stories