Hello and welcome to Protocol Enterprise! Today: Meta’s new large language model, Western Digital considers a flash-memory split, and AMD’s earnings surge.
After years of talk, new research from Celonis found that companies are taking sustainability improvements seriously. According to survey results, 87% of companies are automating their supply chains to improve sustainability and 51% are willing to live with lower margins to achieve those goals.
The problem with Meta’s AI transparency charm offensive
Meta finally revealed an algorithm — just not one it uses to power Facebook or Instagram.
Word got out Monday among computer scientists that Meta planned to reveal a new large language model rivaling OpenAI’s GPT-3, the technology that has formed the foundation of chatbots, automated customer service tools and more. “Come on come on open the repo already,” wrote one ML engineer on Twitter, referring to the GitHub repository of code, data and documentation associated with the new model.
On Tuesday Meta did unveil the codebase, development process logbook, data, research paper and other information associated with Open Pretrained Transformer, or OPT-175B, its new 175-billion-parameter open-source large language model.
- The company called the effort an exercise in transparency that is part of its commitment to open science.
- Referring to GPT-3, Joelle Pineau, managing director of Meta AI, told Protocol, “Of course others have come before us in terms of training large language models, and in some cases have provided an API to run inference. But the code and trained parameters for these models have not been released to the broader research community.”
- “With the release of OPT-175B, we are opening up direct access to the large scale models to this community for the first time, so that scientific discourse on LLMs can be conducted on reproducible results,” she said.
As of this morning, a Facebook Research repository on GitHub was available to developers, loaded with code files and other documentation.
- In keeping with emerging approaches to AI model transparency, Meta researchers included a “model card” — a concept popularized by former Google engineer Timnit Gebru — to explain details of the datasets used to train the OPT-175B model.
- The Meta team used a combination of datasets including one featuring text from thousands of unpublished books and data gleaned from years of crawling the web.
- Pineau said no Facebook or Instagram user data was employed to train the model.
- “Meta did not use any Meta user data or proprietary data to train OPT-175B, as our goal was to be able to publicly release the models and documentation to the AI research community as part of our commitment to accessible, reproducible and transparent science,” she said.
Training large language models requires massive amounts of compute, which consumes huge amounts of energy. Meta addressed the climate costs of natural language processing AI.
- In its OPT-175B paper, the company said its model was developed with an estimated carbon emissions footprint of 75 tons. The researchers compared that to the carbon footprint created when training other large language models including GPT-3 (500 tons) and Gopher (380 tons).
- “We recognize though that recent developments in AI research have consumed an extraordinary amount of compute power,” Pineau told Protocol. “While industry labs have started to report the carbon footprint of these models, most do not include the computational cost associated with the R&D phases of experimentation, which in some cases can be an order of magnitude more resource-intensive than training the final model.”
- She added, “By sharing our models, we are aiming to reduce the collective carbon footprint of the field when pursuing research at this scale — otherwise studying these models will require repeated efforts to reproduce, amplifying the compute costs even further.”
- Hardware hiccups may have contributed to wasted energy in training the model. In the paper, the researchers wrote, “We faced a significant number of hardware failures in our compute cluster while training OPT-175B. In total, hardware failures contributed to at least 35 manual restarts and the cycling of over 100 hosts over the course of 2 months.”
But there’s an elephant in the room as big as a data-hungry large language model, despite Meta’s transparency charm offensive.
- Meta is under intense pressure to reveal details of the algorithmic systems it uses to decide what Facebook or Instagram posts are amplified or suppressed, which ads get kicked out of the platform or which posts get caught up in moderation censor nets.
- But the OPT-175B transparency initiative provides no new information about how the AI models governing two of the most influential social media platforms on the planet were built or how they operate.
- Indeed, OPT-175B is not used by the company in its social platforms. “Presently, OPT-175B is only being used internally as a tool for research purposes,” said Pineau.
- “The level of transparency that we are providing with this release, including the release of our logbook and notes, really speaks to our commitment to accessible, reproducible and transparent science,” Pineau said.
While Facebook critics and lawmakers demanding more transparency from Meta may not see Tuesday’s language model reveal as true openness, computer scientists had a different perspective.
- Awni Hannun, a scientist at Zoom AI, seemed surprised by Meta’s acknowledgement of hardware failures.
- He tweeted, “Meta's OPT 175B is a nice ‘behind the scenes’ take on training LLMs. Instability in both hardware and training is a big challenge.”
— Kate Kaye
Storage is stronger apart?
Spinning hard drives and flash storage chips are technologies that have little, if anything, to do with one another — beyond that they both help servers store bits. Still, Western Digital makes flash and hard disks under one roof, and a letter sent to the company Tuesday by Elliott Investment Management may change that.
The activist investor has asked the board to break the business into its constituent parts, a move that would effectively unwind the SanDisk acquisition WD made to get into the flash business in the first place.
The reasoning goes something like this: The promise of the SanDisk deal has not borne significant fruit. There are no benefits to operating two units that have very little to do with one another, either in terms of tech or when selling both products to potential customers. The flash and hard disk businesses would benefit from being standalone companies, Elliott said.
Western Digital said that it will carefully consider Elliott’s plan.
— Max A. Cherney
Around the enterprise
AMD’s first-quarter earnings came in well above Wall Street estimates, thanks to a rebound in the PC market and the gains it continues to make against Intel in the data center.
SAP hired a banking adviser in hopes of selling its Litmos learning software division for as much as $1 billion, according to Reuters.
Intel acquired Siru, a graphics chip design company that could help it build “emerging accelerated compute solutions, in the areas of blockchain, metaverse, high performance edge compute and hyperscale,” which, sure.
Thanks for reading — see you tomorrow!