Researchers push to make bulky AI work in your phone and personal assistant

Chipmakers like Nvidia and researchers from Notre Dame want to make huge transformers like large natural-language-process models speedier, more nimble and more energy efficient.

Phones connected on a gray background.

"We want it smaller and smaller, and it has to be more energy efficient.”

Illustration: Christopher T. Fong/Protocol

Transformer networks, colloquially known to deep-learning practitioners and computer engineers as “transformers,” are all the rage in AI. Over the last few years, these models, known for their massive size, large amount of data inputs, big scale of parameters — and, by extension, high carbon footprint and cost — have grown in favor over other types of neural network architectures.

Some transformers, particularly some open-source, large natural-language-processing transformer models, even have names that are recognizable to people outside AI, such as GPT-3 and BERT. They’re used across audio-, video- and computer-vision-related tasks, drug discovery and more.

Now chipmakers and researchers want to make them speedier and more nimble.

“It’s interesting how fast technology for neural networks changes. Four years ago, everybody was using these recurrent neural networks for these language models and then the attention paper was introduced, and all of a sudden, everybody is using transformers,” said Bill Dally, chief scientist at Nvidia during an AI conference last week held by Stanford’s HAI. Dally was referring to an influential 2017 Google research paper presenting an innovative architecture forming the backbone of transformer networks that is reliant on “attention mechanisms” or “self-attention,” a new way to process the data inputs and outputs of models.

“The world pivoted in a matter of a few months and everything changed,” Dally said. To meet the growing interest in transformer use, in March the AI chip giant introduced its Hopper h100 transformer engine to streamline transformer model workloads.

Designing transformer tech for the edge

But some researchers are pushing for even more. There’s talk not only of making compute- and energy-hungry transformers more efficient, but of eventually upgrading their design so they can process fresh data in edge devices without having to make the round trip to process the data in the cloud.

A group of researchers from Notre Dame and China’s Zhejiang University presented a way to reduce memory-processing bottlenecks and computational and energy consumption requirements in an April paper. The “iMTransformer” approach is a transformer accelerator, which works to decrease memory transfer needs by computing in-memory, and reduces the number of operations required by caching reusable model parameters.

Right now the trend is to bulk up transformers so the models get large enough to take on increasingly complex tasks, said Ana Franchesca Laguna, a computer science and engineering PhD at Notre Dame. When it comes to large natural-language-processing models, she said, “It’s the difference between a sentence or a paragraph and a book.” But, she added, “The bigger the transformers are, your energy footprint also increases.”

Using an accelerator like the iMTransformer could help to pare down that footprint, and, in the future, create transformer models that could ingest, process and learn from new data in edge devices. “Having the model closer to you would be really helpful. You could have it in your phone, for example, so it would be more accessible for edge devices,” she said.

That means IoT devices such as Amazon’s Alexa, Google Home or factory equipment maintenance sensors could process voice or other data in the device rather than having to send it to the cloud, which takes more time and more compute power, and could expose the data to possible privacy breaches, Laguna said.

IBM also introduced an AI accelerator called RAPID last year. “Scaling the performance of AI accelerators across generations is pivotal to their success in commercial deployments,” wrote the company’s researchers in a paper. “The intrinsic error-resilient nature of AI workloads present a unique opportunity for performance/energy improvement through precision scaling.”

Farah Papaioannou, co-founder and president at Edgeworx, said she thinks of the edge as anything outside the cloud. “What we’re seeing of our customers, they’re deploying these AI models you want to train and update on a regular basis, so having the ability to manage that capability and update that on a much faster basis [is definitely important],” she said during a 2020 Protocol event about computing at the edge.

Wanted: custom chips

Laguna uses a work-from-home analogy when thinking of the benefits of processing data for AI models at the edge.

“[Instead of] commuting from your home to the office, you actually work from home. It’s all in the same place, so it saves a lot of energy,” she said. She said she hopes research like hers will enable people to build and use transformers in a more cost- and energy-efficient way. “We want it on our edge devices. We want it smaller and smaller, and it has to be more energy efficient.”

Laguna and the other researchers she worked with tested their accelerator approach using smaller chips, and then extrapolated their findings to estimate how the process would work at a larger scale. However, Laguna said that turning the small-scale project into a reality at a larger scale will require customized, larger chips.

Ultimately, she hopes it spurs investment. A goal of the project, she said, “is to convince people that this is worthy of investing in so we can create chips so we can create these types of networks.”

That investor interest might just be there. AI is spurring increases in investments in chips for specific use cases. According to data from PitchBook, global sales of AI chips rose 60% last year to $35.9 billion compared to 2020. Around half of that total came from specialized AI chips in mobile phones.

Systems designed to operate at the edge with less memory rather than in the cloud could facilitate AI-based applications that can respond to new information in real time, said Jarno Kartela, global head of AI Advisory at consultancy Thoughtworks.

“What if you can build systems that by themselves learn in real time and learn by interaction?” he said. “Those systems, you don’t need to run them on cloud environments only with massive infrastructure — you can run them virtually anywhere.”


Judge Zia Faruqui is trying to teach you crypto, one ‘SNL’ reference at a time

His decisions on major cryptocurrency cases have quoted "The Big Lebowski," "SNL," and "Dr. Strangelove." That’s because he wants you — yes, you — to read them.

The ways Zia Faruqui (right) has weighed on cases that have come before him can give lawyers clues as to what legal frameworks will pass muster.

Photo: Carolyn Van Houten/The Washington Post via Getty Images

“Cryptocurrency and related software analytics tools are ‘The wave of the future, Dude. One hundred percent electronic.’”

That’s not a quote from "The Big Lebowski" — at least, not directly. It’s a quote from a Washington, D.C., district court memorandum opinion on the role cryptocurrency analytics tools can play in government investigations. The author is Magistrate Judge Zia Faruqui.

Keep ReadingShow less
Veronica Irwin

Veronica Irwin (@vronirwin) is a San Francisco-based reporter at Protocol covering fintech. Previously she was at the San Francisco Examiner, covering tech from a hyper-local angle. Before that, her byline was featured in SF Weekly, The Nation, Techworker, Ms. Magazine and The Frisc.

The financial technology transformation is driving competition, creating consumer choice, and shaping the future of finance. Hear from seven fintech leaders who are reshaping the future of finance, and join the inaugural Financial Technology Association Fintech Summit to learn more.

Keep ReadingShow less
The Financial Technology Association (FTA) represents industry leaders shaping the future of finance. We champion the power of technology-centered financial services and advocate for the modernization of financial regulation to support inclusion and responsible innovation.

AWS CEO: The cloud isn’t just about technology

As AWS preps for its annual re:Invent conference, Adam Selipsky talks product strategy, support for hybrid environments, and the value of the cloud in uncertain economic times.

Photo: Noah Berger/Getty Images for Amazon Web Services

AWS is gearing up for re:Invent, its annual cloud computing conference where announcements this year are expected to focus on its end-to-end data strategy and delivering new industry-specific services.

It will be the second re:Invent with CEO Adam Selipsky as leader of the industry’s largest cloud provider after his return last year to AWS from data visualization company Tableau Software.

Keep ReadingShow less
Donna Goodison

Donna Goodison (@dgoodison) is Protocol's senior reporter focusing on enterprise infrastructure technology, from the 'Big 3' cloud computing providers to data centers. She previously covered the public cloud at CRN after 15 years as a business reporter for the Boston Herald. Based in Massachusetts, she also has worked as a Boston Globe freelancer, business reporter at the Boston Business Journal and real estate reporter at Banker & Tradesman after toiling at weekly newspapers.

Image: Protocol

We launched Protocol in February 2020 to cover the evolving power center of tech. It is with deep sadness that just under three years later, we are winding down the publication.

As of today, we will not publish any more stories. All of our newsletters, apart from our flagship, Source Code, will no longer be sent. Source Code will be published and sent for the next few weeks, but it will also close down in December.

Keep ReadingShow less
Bennett Richardson

Bennett Richardson ( @bennettrich) is the president of Protocol. Prior to joining Protocol in 2019, Bennett was executive director of global strategic partnerships at POLITICO, where he led strategic growth efforts including POLITICO's European expansion in Brussels and POLITICO's creative agency POLITICO Focus during his six years with the company. Prior to POLITICO, Bennett was co-founder and CMO of Hinge, the mobile dating company recently acquired by Match Group. Bennett began his career in digital and social brand marketing working with major brands across tech, energy, and health care at leading marketing and communications agencies including Edelman and GMMB. Bennett is originally from Portland, Maine, and received his bachelor's degree from Colgate University.


Why large enterprises struggle to find suitable platforms for MLops

As companies expand their use of AI beyond running just a few machine learning models, and as larger enterprises go from deploying hundreds of models to thousands and even millions of models, ML practitioners say that they have yet to find what they need from prepackaged MLops systems.

As companies expand their use of AI beyond running just a few machine learning models, ML practitioners say that they have yet to find what they need from prepackaged MLops systems.

Photo: artpartner-images via Getty Images

On any given day, Lily AI runs hundreds of machine learning models using computer vision and natural language processing that are customized for its retail and ecommerce clients to make website product recommendations, forecast demand, and plan merchandising. But this spring when the company was in the market for a machine learning operations platform to manage its expanding model roster, it wasn’t easy to find a suitable off-the-shelf system that could handle such a large number of models in deployment while also meeting other criteria.

Some MLops platforms are not well-suited for maintaining even more than 10 machine learning models when it comes to keeping track of data, navigating their user interfaces, or reporting capabilities, Matthew Nokleby, machine learning manager for Lily AI’s product intelligence team, told Protocol earlier this year. “The duct tape starts to show,” he said.

Keep ReadingShow less
Kate Kaye

Kate Kaye is an award-winning multimedia reporter digging deep and telling print, digital and audio stories. She covers AI and data for Protocol. Her reporting on AI and tech ethics issues has been published in OneZero, Fast Company, MIT Technology Review, CityLab, Ad Age and Digiday and heard on NPR. Kate is the creator of RedTailMedia.org and is the author of "Campaign '08: A Turning Point for Digital Media," a book about how the 2008 presidential campaigns used digital media and data.

Latest Stories