Illustration: Christopher T. Fong/Protocol

Shrinking AI down to your phone

Protocol Enterprise

Hello and welcome to Protocol Enterprise! Today: how researchers are trying to put popular-but-large AI transformers into smaller packages, how Wells Fargo divvied up its multicloud strategy, and the latest moves in enterprise tech.

Spin up

Crime waves come and go, but banks and other financial institutions have always been, and always will be, bigger targets than most companies. According to new research from VMware, 63% of financial institutions saw an increase in cyberattacks compared with the previous year, and 74% experienced at least one ransomware attack.

More than meets the eye

Transformer networks, known to deep-learning practitioners and computer engineers simply as "transformers," are all the rage in AI. Over the last few years, these models have gained favor over other neural network architectures. They are known for their enormous parameter counts and the vast amounts of data they ingest, and by extension for their high carbon footprint and cost.

Now chipmakers and researchers want to make them speedier and more nimble.

  • “It’s interesting how fast technology for neural networks changes. Four years ago, everybody was using these recurrent neural networks for these language models and then the attention paper was introduced, and all of a sudden, everybody is using transformers,” said Bill Dally, chief scientist at Nvidia, during a conference held last week by Stanford’s Institute for Human-Centered Artificial Intelligence (HAI).
  • Dally was referring to “Attention Is All You Need,” an influential 2017 Google research paper that introduced the transformer architecture. It relies on “attention mechanisms,” or “self-attention,” which let a model weigh how relevant each part of its input is to every other part instead of processing the input one step at a time.
  • “The world pivoted in a matter of a few months and everything changed,” Dally said.
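To make "self-attention" a little more concrete, here is a minimal sketch of scaled dot-product attention, the core operation described in the 2017 paper, written in plain Python. The toy inputs and identity projection matrices are hypothetical. Real transformers use learned weights, many attention heads and optimized tensor libraries, so treat this as an illustration rather than how any production model is implemented.

```python
import math

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    # (n x m) times (m x p) -> (n x p), using plain nested lists.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a sequence of token vectors.

    x holds one row per token; w_q, w_k and w_v are projection matrices
    (hypothetical here; learned during training in a real model).
    """
    q = matmul(x, w_q)  # queries: what each token is looking for
    k = matmul(x, w_k)  # keys: what each token offers
    v = matmul(x, w_v)  # values: the content that gets mixed together
    d_k = len(k[0])
    # Score every token against every other token, scaled by sqrt(d_k).
    scores = matmul(q, transpose(k))
    # Each row of weights sums to 1: how much a token attends to each other token.
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    # Each output row is a weighted blend of all the value vectors.
    return matmul(weights, v), weights

if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy 2-D "tokens"
    identity = [[1.0, 0.0], [0.0, 1.0]]
    out, attn = self_attention(tokens, identity, identity, identity)
    print(attn)
    print(out)
```

Because every token is compared with every other token, the score matrix grows with the square of the sequence length, which is one reason large transformers are so compute- and memory-hungry.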

But some researchers are pushing for even more. There’s talk not only of making compute- and energy-hungry transformers more efficient, but of eventually upgrading their design so they can process fresh data on edge devices without a round trip to the cloud.

  • In an April paper, a group of researchers from Notre Dame and China’s Zhejiang University presented a way to ease memory-processing bottlenecks and reduce computation and energy consumption.
  • Their approach, “iMTransformer,” is a transformer accelerator that cuts memory transfers by computing in memory, and reduces the number of operations required by caching reusable model parameters.
  • Right now the trend is to bulk up transformers so the models get large enough to take on increasingly complex tasks, said Ana Franchesca Laguna, a computer science and engineering PhD at Notre Dame.
  • When it comes to large natural-language-processing models, she said, “It’s the difference between a sentence or a paragraph and a book.” But, she added, “The bigger the transformers are, your energy footprint also increases.”

Using an accelerator like the iMTransformer could help to pare down that footprint, and, in the future, create transformer models that could ingest, process and learn from new data in edge devices.

  • “Having the model closer to you would be really helpful. You could have it in your phone, for example, so it would be more accessible for edge devices,” Laguna said.
  • That means IoT devices such as Amazon’s Alexa, Google Home or factory equipment maintenance sensors could process voice or other data in the device rather than having to send it to the cloud, which takes more time and more compute power, and could expose the data to possible privacy breaches, she said.
  • IBM also introduced an AI accelerator called RAPID last year.
  • “Scaling the performance of AI accelerators across generations is pivotal to their success in commercial deployments,” wrote the company’s researchers in a paper. “The intrinsic error-resilient nature of AI workloads present a unique opportunity for performance/energy improvement through precision scaling.”
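The "precision scaling" IBM's researchers mention generally means storing and computing with fewer bits per number, trading a small, bounded error for savings in memory and energy. The sketch below shows one common, generic form of it, symmetric integer quantization with a single shared scale. It is an assumption-laden illustration, not RAPID's actual scheme, and the function names are hypothetical.

```python
def quantize_symmetric(values, bits=8):
    """Map floats to signed integers in [-qmax, qmax] using one shared scale.

    Generic illustration of reduced-precision storage; not IBM RAPID's method.
    """
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits, 7 for 4 bits
    max_abs = max(abs(v) for v in values)
    # Pick the scale so the largest-magnitude value lands exactly on qmax.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp into the representable range.
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    # Recover approximate floats; each is off by at most scale / 2.
    return [x * scale for x in q]

if __name__ == "__main__":
    weights = [0.5, -1.27, 0.003, 1.0]  # made-up example weights
    q, scale = quantize_symmetric(weights, bits=8)
    print(q, scale)
    print(dequantize(q, scale))
```

An 8-bit integer takes a quarter of the storage of a 32-bit float, and fewer bits also mean less data to move between memory and compute. That is why this kind of error-tolerant shortcut appeals to designers of accelerators for memory-constrained edge devices.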

Laguna uses a work-from-home analogy when thinking of the benefits of processing data for AI models at the edge.

  • “[Instead of] commuting from your home to the office, you actually work from home. It’s all in the same place, so it saves a lot of energy,” she said.
  • Laguna and the other researchers she worked with tested their accelerator approach using smaller chips, and then extrapolated their findings to estimate how the process would work at a larger scale.
  • However, turning the small-scale project into a reality at a larger scale will require customized, larger chips.

The money to build those chips might just be there. AI is spurring investment in chips designed for specific use cases. According to data from PitchBook, global sales of AI chips rose 60% last year compared with 2020, to $35.9 billion. Around half of that total came from specialized AI chips in mobile phones.

  • Systems designed to operate at the edge with less memory, rather than in the cloud, could enable AI-based applications that respond to new information in real time, said Jarno Kartela, global head of AI Advisory at consultancy Thoughtworks.
  • “What if you can build systems that by themselves learn in real time and learn by interaction?” he said. “Those systems, you don’t need to run them on cloud environments only with massive infrastructure — you can run them virtually anywhere.”
— Kate Kaye (email | twitter)



Wells Fargo likes Microsoft Azure, except for the data part

As multicloud strategies continue to evolve, which clouds customers pick for different workloads is becoming very interesting.

Wells Fargo plans to use Microsoft Azure for “the bulk” of the cloud part of its hybrid cloud strategy, which it hopes will save the company $1 billion over the next ten years, according to a Business Insider interview with CIO Chintan Mehta published Thursday. However, it will put its “advanced workloads” — specifically, data and AI — on Google Cloud.

While Microsoft will enjoy a decent windfall from scoring a big customer such as Wells Fargo, data and AI workloads are among the more profitable sectors of cloud computing because they are so compute-intensive. And once a company puts its mission-critical data into a particular cloud, it’s unlikely to move that data for a very long time given the effort involved.

Google Cloud has leaned on the strength of its data and AI tools, especially BigQuery, for years as it has tried to challenge AWS and Microsoft for cloud business. If a new generation of cloud converts finds that running apps across different clouds works for them, cloud vendors might have some decisions to make about how and where to differentiate themselves now that the basic ideas behind cloud computing are widely accepted.

— Tom Krazit (email | twitter)

Enterprise moves

Over the past week, leaders from the Pentagon and the U.K.’s National Health Service left to join tech companies, customer data platform company mParticle added new executives, and more.

Preston Dunlap resigned as chief architect officer at the Department of Defense to start a software company focused on satellites, data and AI.

Indra Joshi joined Palantir from her role as director of AI for Britain’s National Health Service.

Ed Ellett joined Dell as SVP of the Client Systems group. Ellett was formerly a VP in HPE’s photonic business and an SVP at Nvidia.

Bonney Pelley joined mParticle as COO. Pelley was previously SVP of Strategy and Operations at New Relic.

Barbie Brewer joined mParticle as chief people officer. Brewer previously held HR leadership roles at ClickUp, GitLab and Netflix.

Jigar Desai joined Sisu as SVP of Engineering. Desai formerly held senior engineering roles at Facebook, PayPal and eBay.

Sara Andrews joined Marvell’s board of directors. Andrews is also chief information security officer at Experian, and previously held security roles at Pepsi and Verizon.

Suresh Pandian is now SVP of Engineering at Celigo. Pandian was formerly a VP of Product Development at Informatica.

— Aisha Counts (email | twitter)

Around the enterprise

SolarWinds launched a new tool for monitoring and observability of hybrid cloud environments, with the goal of helping customers improve security.

As Kubernetes is used more widely, there’s been a corresponding increase in attacks on the container-orchestration software, according to TechRepublic.



Thanks for reading — see you tomorrow!
