X, formerly known as Twitter, has updated its terms of service, effective September 29, 2023, to explicitly forbid data scraping and crawling without prior written consent. This change comes shortly after an update to its Privacy Policy, in which X said it intends to collect users’ biometric data as well as their education and employment history.
Previously, crawling was allowed if it followed the rules in the robots.txt file, a guide for web crawlers on which parts of a website they can visit. However, the new terms require explicit written consent for any scraping or crawling activities.
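As a rough illustration of how a crawler consults robots.txt, the sketch below uses Python's standard `urllib.robotparser` module. The rules, the `ExampleBot` user agent, the `example.com` URLs, and the `is_allowed` helper are all hypothetical and are not taken from X's actual file. Note that under the new terms described above, passing such a check would no longer be sufficient on X without written consent.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; this is NOT X's robots.txt.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/public/page", "/private/data"):
        url = "https://example.com" + path
        print(path, is_allowed(EXAMPLE_ROBOTS_TXT, "ExampleBot", url))
```

A well-behaved crawler would run a check like this before each request and skip any URL for which it returns `False`.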
But what’s the difference between crawling and scraping? Crawling discovers URLs and follows links to collect web pages and build data indices, while scraping downloads web pages to extract specific data, like product details or pricing information.
Web scraping typically targets publicly available data and saves it locally. It is an efficient way to gather web data, though fetching the pages requires an internet connection; only the data already saved can be analyzed offline.
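The distinction can be sketched in a few lines of Python using the standard `html.parser` module. Everything here is an assumption made for illustration: the sample page, the `price` class name, and the `crawl_links` and `scrape_prices` helpers are hypothetical. The page is embedded as a string so the sketch runs without fetching anything.

```python
from html.parser import HTMLParser

# Hypothetical page content, embedded so the sketch needs no network access.
SAMPLE_HTML = """
<html><body>
  <a href="/products/1">Widget</a>
  <a href="/products/2">Gadget</a>
  <span class="price">19.99</span>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Crawling step: discover URLs that could be visited or indexed next."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


class PriceExtractor(HTMLParser):
    """Scraping step: pull one specific field out of the page."""

    def __init__(self):
        super().__init__()
        self._in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        if tag == "span" and dict(attrs).get("class") == "price":
            self._in_price = True

    def handle_data(self, data):
        if self._in_price:
            self.prices.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_price = False


def crawl_links(html: str) -> list:
    """Return every link target found in the page."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


def scrape_prices(html: str) -> list:
    """Return the text of every element marked with the 'price' class."""
    extractor = PriceExtractor()
    extractor.feed(html)
    return extractor.prices


if __name__ == "__main__":
    print("crawled links:", crawl_links(SAMPLE_HTML))
    print("scraped prices:", scrape_prices(SAMPLE_HTML))
```

The crawler's output is a list of further pages to visit, whereas the scraper's output is the data itself.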
Alongside these terms, X has updated its robots.txt file, affecting web crawlers, including Google’s. The updated file limits crawler access to specific data types, such as likes and retweets on particular posts and account-related information.
These changes follow X’s recent platform adjustments, which temporarily blocked logged-out users from viewing posts before the login requirement was removed. Elon Musk, X’s owner, said the measures were necessary because excessive data scraping was degrading the service for regular users.
Musk has previously voiced opposition to companies using Twitter/X data for training AI models, even issuing legal threats against Microsoft for its alleged misuse of platform data for AI training. In July, X took legal action against unidentified defendants accused of unauthorized data collection.
The impact of these measures on data accessibility and X’s relationship with web crawlers, including tech giants like Google, remains uncertain.