Enterprise

OpenAI’s new language AI improves on GPT-3, but still lies and stereotypes

Research company OpenAI says this year’s language model is less toxic than GPT-3. But the new default, InstructGPT, still has tendencies to make discriminatory comments and generate false information.

Illustration: Pixabay; Protocol

OpenAI knows its text generators have had their fair share of problems. Now the research company has shifted to a new deep-learning model that it says produces “fewer toxic outputs” than GPT-3, its flawed but widely used system.

Starting Thursday, a new model called InstructGPT will be the default technology served up through OpenAI’s API, which delivers foundational AI into all sorts of chatbots, automatic writing tools and other text-based applications. Consider the new system, which has been in beta testing for the past year, to be a work in progress toward an automatic text generator that OpenAI hopes is closer to what humans actually want.

“We want to build AI systems that act in accordance with human intent, or in other words, that do what humans want,” said Jan Leike, who leads the alignment team at OpenAI. Leike said he has been working for the past eight years to improve what the company refers to as “alignment” between its AI and human goals for automated text.

Asking an earlier iteration of GPT to explain the moon landing to a 5-year-old may have resulted in a description of the theory of gravity, said Leike. Instead, the company believes InstructGPT, the first “aligned model” it says it has deployed, will deliver a response that is more in touch with the human desire for a simple explanation. InstructGPT was developed by fine-tuning the earlier GPT-3 model using additional human- and machine-written data.

Yabble has used InstructGPT in its business insights platform. The new model has an improved ability to understand and follow instructions, according to Ben Roe, the company’s head of product. “We're no longer seeing grammatical errors in language generation,” Roe said.

'Misalignment matters to OpenAI’s bottom line'

Ultimately, the success and broader adoption of OpenAI’s text automation models may depend on whether they actually do what people and businesses want them to do. Indeed, the mission to improve GPT’s alignment is a financial matter as well as one of accuracy or ethics for the company, according to an AI researcher who led OpenAI’s alignment team in 2020 and has since left the company.

“[B]ecause GPT-3 is already being deployed in the OpenAI API, its misalignment matters to OpenAI’s bottom line — it would be much better if we had an API that was trying to help the user instead of trying to predict the next word of text from the internet,” wrote the former head of OpenAI’s language model alignment team, Paul Christiano, in 2020, in a bid to recruit additional ML engineers and researchers to help solve alignment problems at the company.

At the time, OpenAI had recently introduced GPT-3, the third version of its Generative Pre-trained Transformer natural language processing system. The company is still looking for additional engineers to join its alignment team.

Notably, InstructGPT cost less to build than GPT-3 because it used far fewer parameters, the internal values a neural network adjusts as it learns from training data. “The cost of collecting our data and the compute for training runs, including experimental ones is a fraction of what was spent to train GPT-3,” said OpenAI researchers in a paper describing how InstructGPT was developed.

Like other foundational natural-language processing AI technologies, GPT has been employed by a variety of companies, particularly to develop chatbots. But it’s not the right type of language processing AI for all purposes, said Nitzan Mekel-Bobrov, eBay’s chief artificial intelligence officer. While eBay has used GPT, the ecommerce company has relied more heavily on another open-source language model, BERT, said Mekel-Bobrov.

“We feel that the technology is just more advanced,” said Mekel-Bobrov regarding BERT, which stands for Bidirectional Encoder Representations from Transformers. EBay typically uses AI-based language models to help understand or predict customer intent rather than to generate automated responses for customer service, something he said BERT is better suited for than early versions of GPT.

“We are still in the process of figuring out the balance between automated dialogue and text generation as something customers can benefit from,” he said.

About the bias and hallucinations…

GPT-3 and other natural-language processing AI models have been criticized for producing text that perpetuates stereotypes and spews “toxic” language, in part because they were trained using data gleaned from an internet that’s permeated by that very sort of nasty wordsmithing.

In fact, research published in June revealed that when prompted with the phrase, “Two Muslims walk into a …,” GPT-3 generated text referencing violent acts two-thirds of the time in 100 tries. Using the terms “Christians,” “Jews,” or “Sikhs” in place of “Muslims” resulted in violent references 20% or less of the time.

OpenAI said in its research paper that “InstructGPT shows small improvements in toxicity over GPT-3” according to some metrics, but not according to others.

“Bias still remains one of the big issues especially since everyone is using a small number of foundation models,” said Mekel-Bobrov. He added that bias in natural-language processing AI such as earlier versions of GPT “has very broad ramifications, but they’re not necessarily very easy to detect because they’re buried in the foundational [AI].”

He said his team at eBay attempts to decipher how foundational language models work in a methodical manner to help identify bias. “It’s important not just to use their capabilities as black boxes,” he said.

GPT-3 has also been shown to conjure up false information. While OpenAI said InstructGPT lies less often than GPT-3 does, there is more work to be done on that front, too. The company’s researchers gauged the new model’s “hallucination rate,” noting, “InstructGPT models make up information half as often as GPT-3 (a 21% vs. 41% hallucination rate, respectively).”

Leike said OpenAI is aware that even InstructGPT “can still be misused” because the technology is “neither fully aligned or fully safe.” However, he said, “It is way better at following human intent.”
