Reddit Sues Perplexity Over Alleged Data Scraping
I was happy to read the news Wednesday that Reddit (RDDT) sued Perplexity (private) for Web scraping.
Scraping is not legal, yet it is common and has occurred for two decades or more. I can’t think of any company that has been penalized for scraping.
The LLM companies, all of them, scrape the Web. That’s where their training data comes from. Our technology news aggregation app, T2D Pulse, does not scrape the Web, as I would never steal third-party content. Pulse ingests content via RSS feeds, which serve up article headlines, brief summaries, and thumbnail images. The analytics, search capability, trend analysis, and article-sharing features are all Pulse’s own technology. Should readers wish to read an article, that happens on the publisher’s website, not on Pulse.
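For readers curious what RSS ingestion looks like in practice, here is a minimal sketch in Python. It is not Pulse's actual code: the function name `parse_rss_items`, the sample feed, and the example.com URLs are all hypothetical, and it assumes the publisher exposes thumbnails through the common Media RSS `media:thumbnail` element. It only reads what the publisher chose to put in the feed.

```python
import xml.etree.ElementTree as ET

# Namespace used by the Media RSS extension (assumed here for thumbnails).
MEDIA_NS = "{http://search.yahoo.com/mrss/}"


def parse_rss_items(xml_text):
    """Return the headline, summary, link, and thumbnail of each feed item."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        thumb = item.find(MEDIA_NS + "thumbnail")
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "summary": (item.findtext("description") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            # Not every feed carries a thumbnail, so fall back to None.
            "thumbnail": thumb.get("url") if thumb is not None else None,
        })
    return items


# Hypothetical feed, standing in for what a publisher might serve.
SAMPLE_FEED = """<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <title>Example Tech News</title>
    <item>
      <title>Example headline</title>
      <description>A brief summary supplied by the publisher.</description>
      <link>https://example.com/articles/1</link>
      <media:thumbnail url="https://example.com/thumbs/1.jpg" />
    </item>
  </channel>
</rss>"""

if __name__ == "__main__":
    for entry in parse_rss_items(SAMPLE_FEED):
        print(entry["title"], entry["link"])
```

Note that the only path to the full article is the `link` field, which points back to the publisher's own site.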
The fact that OpenAI, Anthropic, Gemini, Grok, Mistral, Meta and so many other LLM builders leverage third-party content for their own benefit says a lot about the character of the people leading these companies.
Here’s hoping that RDDT squeezes every last dime out of Perplexity. It won’t happen, however. There is too much political will behind OpenAI and the rest of the LLM industry. The most probable outcome is that Reddit will not extract any value from Perplexity.