
Reddit to update web standard to block automated website scraping

By Jessy Falconposts

Posted on June 26, 2024

What you need to know:

Reddit (RDDT.N) announced on Tuesday that it will revise a web standard utilized by the platform to prevent automated data scraping, responding to concerns over AI startups circumventing these measures to collect content for their systems.
Robots.txt has become essential for publishers seeking to prevent tech companies from using their content without authorization for training AI algorithms and creating summaries for specific search queries.

Reddit (RDDT.N) announced on Tuesday that it will revise a web standard utilized by the platform to prevent automated data scraping, responding to concerns over AI startups circumventing these measures to collect content for their systems. This update coincides with increasing allegations against artificial intelligence companies for plagiarizing publisher content to produce AI-generated summaries without appropriate attribution or consent.

Reddit announced plans to update the Robots Exclusion Protocol, commonly known as “robots.txt,” which tells automated crawlers which sections of a website they may crawl. The company said it will also continue to use rate-limiting, which caps the number of requests accepted from any single source. In addition, Reddit intends to block unidentified bots and crawlers from scraping its platform, that is, from gathering and storing raw information from it.
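For readers unfamiliar with how robots.txt works in practice, the sketch below shows how a well-behaved crawler might consult such a file before fetching a page. It uses Python's standard `urllib.robotparser` module. The rules, the bot names (`ExampleResearchBot`, `UnknownScraper`) and the `example.com` URL are illustrative assumptions and do not reproduce Reddit's actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption for illustration,
# not Reddit's real file): one named bot is allowed, all others are not.
ROBOTS_TXT = """\
User-agent: ExampleResearchBot
Allow: /

User-agent: *
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)
url = "https://example.com/r/news/"  # placeholder URL

allowed_known = may_fetch(parser, "ExampleResearchBot", url)
allowed_unknown = may_fetch(parser, "UnknownScraper", url)

print("ExampleResearchBot allowed:", allowed_known)
print("UnknownScraper allowed:", allowed_unknown)
```

The check is voluntary on the crawler's side: robots.txt only states the site's wishes, which is why a site may pair it with server-side measures such as rate-limiting and blocking.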

In recent times, robots.txt has become essential for publishers seeking to prevent tech companies from using their content without authorization for training AI algorithms and creating summaries for specific search queries. Last week, content licensing startup TollBit notified publishers that several AI firms were circumventing this protocol to scrape content from their websites. This follows a Wired investigation that uncovered AI search startup Perplexity’s apparent ability to circumvent robots.txt restrictions on its web crawler.

In early June, business media publisher Forbes accused Perplexity of plagiarizing its investigative stories to fuel generative AI systems without proper attribution.

Reddit announced on Tuesday that researchers and organizations like the Internet Archive will retain access to its content for non-commercial purposes.
