OpenAI inks deal to train AI on Reddit data

OpenAI has reached a deal with Reddit to use the social news site’s data for training AI models.

In a blog post on OpenAI’s press relations site, the company said that the Reddit partnership will give it access to “real-time, structured and unique content” from Reddit, such as posts and replies, allowing its tools and models to “better understand and showcase” that content. Reddit content will be incorporated into ChatGPT, OpenAI’s popular conversational AI, and the companies will work together to bring unspecified new “AI-powered features” to both Reddit users and moderators.

OpenAI will also become a Reddit advertising partner.

“Reddit will be building on OpenAI’s platform of AI models to bring its powerful vision to life,” OpenAI wrote in the post. “Using LLMs, ML, and AI enable Reddit to enhance the user experience for everyone.”

OpenAI has several similar licensing deals with content providers ranging from stock media libraries to news publishers. But the unusual angle to this one is that Sam Altman, OpenAI’s CEO, has an 8.7% stake in Reddit, making him the third-largest shareholder, and was once a member of the company’s board of directors.

In an attempt to discourage scrutiny, OpenAI says in its press release that, while Altman remains a Reddit shareholder, the partnership “was led by OpenAI’s COO [Brad Lightcap]” and “approved by [OpenAI’s] independent board of directors.” (I’ll note here that Altman is a member of OpenAI’s board; he recused himself from this decision, however, an OpenAI spokesperson tells TechCrunch.)

Reddit has made data licensing agreements an increasingly central part of its growth strategy as it navigates the market as a public company.

In its IPO prospectus, Reddit revealed that it has contractual agreements to license its data to customers including Google, worth a combined total of more than $200 million. And, in its first earnings report as a public company, Reddit reported a 450% year-over-year increase in non-ad revenue, attributable primarily to those agreements.

Reddit stock was up 11% in extended trading following the announcement of the OpenAI deal.

“The paradox I see is that, as more content on the internet is written by machines, there’s an increasing premium on content that comes from real people,” Reddit CEO Steve Huffman said during the company’s earnings call in March. “And we have nearly twenty years of authentic conversation.”

Reddit’s platform, which has over 1 billion posts and more than 16 billion comments (figures that grow daily thanks to its hundreds of millions of active users), is a gold mine for generative AI companies, whose models learn from examples of content, like text and images, to generate new, similar content.

But the company could face pushback from users concerned about how it’s monetizing their data.

It’s instructive to look at Stack Overflow, the Q&A forum for software developers, which recently inked an agreement with OpenAI to supply data for the latter’s model training. In protest, some users deleted their top-rated answers to questions on the community. But Stack Overflow restored the deleted posts and banned those users, claiming that they weren’t in compliance with its terms of service.

Reddit has already voiced its displeasure with one attempt to afford Reddit users greater control over their own data.

Vana, a startup built on the blockchain, is attempting to launch a data “DAO” (Decentralized Autonomous Organization) to let Reddit users pool their data and decide together how that combined data is used (or sold). Reddit banned Vana’s subreddit dedicated to discussion about the DAO and, in a statement to TechCrunch, accused the company of “exploiting” its data export controls.
