Data Legality Guidance

Am I allowed to use Web-scraped data?

Warning: We don’t provide legal counselling

This site does not provide legal guidance. The information below is provided for discussion and as a suggestion only. Authors should consult with a qualified party, such as a university counsel or a lawyer, as appropriate.

How Could I Use Data Illegally?

Quite simply put, if a data provider distributes data under certain conditions specified in a license agreement, as pointed out in our dedicated section on Licensing, and an author violates those conditions, then use of the data is illegal. Therefore, in cases where an explicit license exists, checking whether data usage is legally sound is often straightforward. Most official or public data providers (for example, U. Michigan’s PSID or U. Minnesota’s IPUMS) publish data under some kind of license that explicitly allows researchers to use the data in certain ways. The researcher usually cannot access the data without agreeing to the license.

The more complicated cases arise when an entity’s primary purpose is not to provide data for research, as it is for the institutions mentioned above, but when the distribution of data is a side product of some other activity. Online marketplaces like LinkedIn, eBay or Airbnb come to mind as one example: the actual business case may be to enable transactions, but data is produced along the way, and it may be accessible on the web to researchers.

Tip: “No license” does not mean “no conditions”

The absence of an explicit license on a website does not imply that all forms of usage of collected data are allowed. In fact, copyright law usually applies, meaning “All rights reserved”.

Scraping Data from Websites

Unfortunately, the law on collecting data from websites is complex. Jurisdictions differ as to which kinds of web scraping activity, if any, are considered legal. What is more, it is not even straightforward to establish which country’s law applies to cross-border web scraping activity.

EU Law

The situation in the EU is evolving as well. Precedent in some countries allows web scraping (Denmark 2006), while in others (CNIL France 2020) ownership of data that is publicly accessible on the web still belongs to the individual who generated it. Hence, not all potential uses of this data may be allowed.

Footnotes

  1. Note that there is no blanket approval, and all such exceptions must be discussed with a journal editor, who will balance benefits to society with the risk associated with illegal usage.