Web scraping public data sources is likely not a violation of anti-hacking law
A recent decision by the Ninth Circuit Court of Appeals in the case HiQ Labs v. LinkedIn holds that scraping a public website is likely not a violation of the Computer Fraud and Abuse Act (CFAA). The court’s decision is not a final one; it concerns a preliminary injunction. HiQ is seeking to preserve its access to LinkedIn’s public data while the lawsuit is pending, so the court is giving its preliminary prediction of the lawsuit’s likely outcome.
HiQ is a data analytics company that scrapes public LinkedIn profile data for its talent management services. LinkedIn sent a cease and desist letter to HiQ, requesting that HiQ stop collecting public profile data from its site. HiQ refused to comply with LinkedIn’s request. To ensure that LinkedIn would not take any measures to disrupt its web scraping, HiQ sued LinkedIn to prove that its scraping activities are legal. So far, the court seems to be siding with HiQ, suggesting that harvesting publicly available data does not constitute hacking.
Enacted in 1986 to ensure that computer hacking crimes did not go unpunished, the CFAA is a federal law that prohibits accessing a computer without authorization. The CFAA specifically prohibits circumventing a computer’s access permissions, but what constitutes authorization is a difficult question. Does authorization depend on a computer’s architecture or design (e.g., login requirements) or on what the computer’s owner wants (e.g., a cease and desist letter)? In the case of HiQ Labs v. LinkedIn, the lack of login requirements to access profile data trumps LinkedIn’s cease and desist letter in determining whether HiQ has authorization to access the data.
Court interpretations of the CFAA have zigzagged over time between an open internet and a closed one. A recent case, Facebook v. Power Ventures, gave more power to cease and desist letters as a form of authorization control. Power Ventures, however, accessed the data of logged-in Facebook users, so the company did not have proper authorization and needed to comply with the cease and desist letter.
Web scraping has long been a murky subject for companies and engineering teams. For teams that navigate private and public data sources, data ownership remains difficult to define. While many companies hope to protect the data they have collected on their platforms, others are looking to leverage that data to build new platforms. If this greater freedom for web scraping holds, developers will gain access to many new, legal data sources, but they may also need to take steps to protect their own public data.