Put a few key words into a tool like Midjourney, Stable Diffusion, or DALL-E and it’s easy to see why the whimsical (and often wacky) images have captured investors’ imagination. An AI-generated artwork even recently won an art competition at the Colorado State Fair, a result that didn’t go over well among more traditional artists. It’s become disruptive enough that this week Getty announced a ban of AI-generated images on its platform, following similar moves by some online art communities.
What looks like an interesting art tool has become a prime feeding ground for investors. Investor interest has been nearly overwhelming for Poly’s Abhay Agarwal, who is building a “DALL-E for design assets” company. “It has literally been like dropping yourself into the Ganga River and fully being bathed in it,” Agarwal said of the interest. He’s already had over 80 meetings with VCs and is only halfway done following YC’s Demo Day.
- The challenge now for investors is finding the business case in AI-generated imagery. Already, some companies like Stitch Fix have been experimenting with the technology, but with mixed success. “I feel quite strongly that these technologies are quite world-changing,” Khosla Ventures partner Kanu Gulati told me. “They’re still early. A lot of their shortcomings are known, but the community is super, super active and trying to resolve them.”
- Perhaps unsurprisingly, the initial startup applications have been around design, marketing and e-commerce, like a company doing AI-generated stock imagery or a startup building AI models for fashion brands so they can skip photoshoots. Gulati has invested in startups like Rosebud, which is doing AI-generated photos and videos (including NFTs), while Khosla Ventures has directly backed research lab OpenAI, the creator of DALL-E. Poly is pitching itself as a way for designers to use AI to generate textures.
- Already looking ahead, Gulati thinks AI imagery will be used with other forms of generative AI, like text, and that’s where more value can be created. “There will be huge industries out there giving Adobe a run for their money because of using these latest technologies,” Gulati said. “And these will be built on a new stack of AI-first companies.”
The hype wave is similar to the one around GPT-3, a generative AI text tool with an API that businesses can build on. The problem is that investors can easily fall into the trap of thinking the two kinds of generative models are the same.
- For generative text, there can be a lower bar for quality and also a lower bar for utility. If the AI makes mistakes, it’s easy to clean up typos. But plenty of people can also write their own mediocre copy if needed, so the value of some tools is diminished if replacing a human with AI doesn’t really save much cost or work.
- The bar for images is much higher, because if an image comes back where something is wrong, then it has to be tossed — you can’t correct it easily. But at the same time, the utility is high because, frankly, most humans can’t create a drawing anywhere near the quality of the output of the AI, Agarwal explained.
- “For text modeling, someone can do a mediocre job of it on their own,” Agarwal said. “With image modeling, you can't. 99.9% of people in this world cannot create a convincing illustration, even given an infinite amount of time.”
Just because it’s magical doesn’t mean it can magic away its shortcomings. As Charlie Warzel pointed out in a smart piece, “What feels like magic is actually incredibly complicated and ethically fraught.”
- The black box algorithms behind many of the programs have already raised serious concerns about copyright and other legal claims, as it’s not known what imagery the models were trained on. Stable Diffusion did recently release its training data, and much of it came from Pinterest imagery and Thomas Kinkade’s art, per Andy Baio’s analysis.
- Already, there’s a lot of bias in the models. Run a search for a startup founder or venture capitalist and it almost always returns a white man as the image. Even a search that included “teacher,” a predominantly female profession, returned images of men. “Bias will continue to be a big challenge, which investors and founders have to solve before these become sustainable enterprises,” Gulati said.
- And as with every tool on the internet, what can be used for good can also be used for evil. Stable Diffusion recently open-sourced its technology in a way that could allow people to circumvent safeguards and create pornography, deepfakes, and violent imagery, something tools like DALL-E block. Websites and Discord forums devoted to AI-generated pornography are already popping up, and people are posting images of Bernie Sanders in a “Mad Max” deepfake.
Creating a future for generative AI startups won’t be as easy as painting a picture of the opportunity. Founders and investors will have to both take responsibility for understanding the shortcomings of generative AI and solving them. It takes more than “hustling and flipping when you see a quick opportunity to leverage an open-source technology,” said Agarwal. Instead, he argued technologists need to become stewards of the technology and build it for whatever business application is needed. For Poly, that means creating and training its models around textures and design elements so that it can responsibly tailor the model in a way that allows it to build a business. “I don't believe that once a model was released into the open-source public that somehow that means that everybody can jump on that and start using it for whatever use case,” Agarwal said.
A version of this story appeared in Protocol's Pipeline newsletter.