Suggested Information for Data and Code Hosting

Trusted Repositories

Journals and institutions have assessed a number of trusted repositories:

List of Additional Acceptable Trusted Repositories in Economics

A list of trusted repositories that have been found to be acceptable for the purpose of archiving social and economic data can be found here:

https://social-science-data-editors.github.io/reference/TrustedRepositories.html

The list is maintained by the editors collaborating on this site. To suggest an addition, please issue a pull request, or email one of the editors.

Permanent Identifiers: Digital Object Identifiers (DOI) et al

A sufficient, but not necessary, criterion for a “trusted repository” is the assignment of permanent identifiers, such as Digital Object Identifiers (DOIs). For example:

https://doi.org/10.3886/ICPSR30261.v6

Some repositories (often university-based ones) will also assign handles:

https://hdl.handle.net/1813/45789

Others assign DOIs on request. We generally suggest requesting a DOI if possible. Examples:

However, care must be taken when using permanent identifiers: the URL in the address bar is (almost) never the same as the DOI or handle. All permanent identifiers are redirects: they constitute a permanent entry that points to wherever the most recent version of the object can be found:

Only the first entry in each of the examples above should be used for citing, not the second.
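As a minimal sketch (Python, standard library only), the function below shows one way to turn what a user has at hand (a bare DOI, a `doi:` string, or a Dataverse landing-page URL that carries the DOI in its `persistentId` query parameter) into the resolver form that should be cited. The function name and the simplified DOI pattern are illustrative assumptions, not part of any repository's API. A landing page whose URL does not contain the DOI (such as an ICPSR study page) cannot be converted this way, which is one more reason to record the DOI itself.

```python
import re
from urllib.parse import parse_qs, urlparse

# Simplified DOI pattern (an assumption): "10." + registrant code + "/" + suffix.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")


def canonical_doi_url(text: str) -> str:
    """Return the https://doi.org/ form of the DOI contained in `text`."""
    # Dataverse landing pages carry the DOI in the persistentId query parameter.
    query = parse_qs(urlparse(text).query)
    if "persistentId" in query:
        text = query["persistentId"][0]
    match = DOI_PATTERN.search(text)
    if match is None:
        raise ValueError(f"no DOI found in {text!r}")
    # Drop trailing punctuation picked up from surrounding prose.
    return "https://doi.org/" + match.group(0).rstrip(".,;")


if __name__ == "__main__":
    landing = ("https://dataverse.harvard.edu/dataset.xhtml"
               "?persistentId=doi:10.7910/DVN/DPESAK")
    print(canonical_doi_url(landing))  # prints https://doi.org/10.7910/DVN/DPESAK
```

The address-bar URL and the citable identifier differ; only the returned `https://doi.org/...` string belongs in a citation.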

NOT ACCEPTABLE

A variety of (unfortunately) commonly used web-accessible locations are not acceptable as data repositories for the purpose of an article’s supplementary materials:

  • GitHub, GitLab, etc., because a project’s owner can delete a git repository at any time (but see this page on how to leverage Zenodo to enable proper archiving of code and software) (see also questions in the FAQ);
  • Google pages, university and personal faculty web pages - they can all be deleted by the owner or by the employer (the university) without regard to the archival characteristics of their contents (but talk to your university library - they may have a way to facilitate archiving of web pages - and investigate the Wayback Machine for a similar purpose);
  • Dropbox, Box.com, and similar cloud-based data and file-sharing services - again, they can all be deleted at short notice, or when payment stops.

Some good examples

“Immigration Restrictions as Active Labor Market Policy: Evidence from the Mexican Bracero Exclusion, Replication files and raw data” (Michael Clemens)
  • Hosted on Harvard Dataverse at https://dataverse.harvard.edu/dataverse/bracero
  • Contains two datasets:
      ◦ Clemens, Michael, 2017, “Raw scanned PDFs of primary sources for workers, wages, and crops”, https://doi.org/10.7910/DVN/DJHVHB, Harvard Dataverse, V1
      ◦ Clemens, Michael, 2018, “Replication Data for: Immigration Restrictions as Active Labor Market Policy: Evidence from the Mexican Bracero Exclusion”, https://doi.org/10.7910/DVN/17M4ZP, Harvard Dataverse, V1

“United States Newspaper Panel, 1869-2004” (Gentzkow, Shapiro, Sinkinson)
  • Hosted on ICPSR at https://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/30261
  • Contains:
      ◦ Gentzkow, Matthew, Shapiro, Jesse M., and Sinkinson, Michael. United States Newspaper Panel, 1869-2004. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2014-12-10. https://doi.org/10.3886/ICPSR30261.v6

“Socioeconomic High-resolution Rural-Urban Geographic Dataset for India (SHRUG)” (Asher and Novosad)
  • Hosted on Harvard Dataverse at https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/DPESAK
  • Contains:
      ◦ Asher, Sam; Novosad, Paul, 2019, “Socioeconomic High-resolution Rural-Urban Geographic Dataset for India (SHRUG)”, https://doi.org/10.7910/DVN/DPESAK, Harvard Dataverse, V1, UNF:6:Upe25NYAZwR+6VsDt5X2lQ==
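The Dataverse citations in these examples share a common layout: authors (separated by semicolons), year, quoted title, DOI resolver URL, repository, version, and optionally a UNF (Universal Numerical Fingerprint). The Python sketch below assembles such a string from its parts, assuming that layout is the one you want; the function and its parameter names are hypothetical. Repositories generally display their own preferred citation, and that should take precedence.

```python
from typing import Optional, Sequence


def format_data_citation(
    authors: Sequence[str],
    year: int,
    title: str,
    doi: str,
    repository: str,
    version: str,
    unf: Optional[str] = None,
) -> str:
    """Build a data citation in the layout of the Dataverse examples above."""
    parts = [
        "; ".join(authors),        # multiple authors separated by semicolons
        str(year),
        f"“{title}”",              # title in curly quotes
        f"https://doi.org/{doi}",  # cite the resolver URL, not the landing page
        repository,
        version,
    ]
    # The UNF is appended only when the repository provides one.
    if unf:
        parts.append(unf)
    return ", ".join(parts)


if __name__ == "__main__":
    print(format_data_citation(
        ["Clemens, Michael"], 2017,
        "Raw scanned PDFs of primary sources for workers, wages, and crops",
        "10.7910/DVN/DJHVHB", "Harvard Dataverse", "V1"))
```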

Challenges in Hosting of Data and Code at Restricted-Access Data Centers

Users of restricted-access data centers (RADCs, such as FSRDCs or CASD) face particular challenges in handling data and code, as described in this document:

  • researchers (end-users) may not be able to provide DOIs or similar persistent identifiers for some data
  • researchers may not be able to determine the preservation policy for certain datasets
  • researchers may not be able to remove all code from the center, or such removal may be subject to restrictions
  • data citation guidance may be lacking, or may not be obvious (see Data Citation Guidance for general guidance)

A few guidelines

  • Request as much code as the RADC will allow the researcher to remove. Then handle it according to the general code guidance, but make special note (placeholders, explanatory text) of any redacted information.
  • In addition, some RADCs may provide the ability to deposit code internally and confidentially. Use such internal repositories, and note their location in the publicly deposited code or in supplementary documents.
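One way to mark redacted information in released code is sketched below in Python, assuming a hypothetical placeholder convention: every value removed during disclosure review is replaced by a string starting with `<REDACTED`, accompanied by an explanatory comment. The variable and function names are illustrative only; what may leave the center, and in what form, is decided by the RADC's review, not by this script.

```python
from pathlib import Path

# Hypothetical placeholder convention (an assumption, not an RADC standard).
REDACTED_PREFIX = "<REDACTED"

# REDACTED: the location of the confidential input inside the RADC cannot be
# released. Replace the placeholder with the path of your approved project.
CONFIDENTIAL_INPUT = Path("<REDACTED-RADC-PATH>/rawdata.dta")


def unresolved_placeholders(*values) -> list:
    """Return the values that still contain a redaction placeholder."""
    return [v for v in values if REDACTED_PREFIX in str(v)]


if __name__ == "__main__":
    missing = unresolved_placeholders(CONFIDENTIAL_INPUT)
    if missing:
        # Stop with an explicit message instead of failing on a missing file.
        raise SystemExit(f"Fill in redacted values before running: {missing}")
```

Failing early with a clear message tells a replicator exactly which values were redacted and must be supplied inside the RADC.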

Self-generated repositories (second best)

If a RADC has at least an archival or backup policy of sufficient length (e.g., 10 or more years) but does not offer a formal repository, the following procedure allows future users to find and request the code and data:

  • As before, request as much code as is feasible, and deposit it in a public repository (e.g., openICPSR, Dataverse, Zenodo). Don’t publish it yet.
  • If the repository allows it, pre-register (reserve) a DOI:
      ◦ At Zenodo: click the button to reserve a DOI, and a DOI will be assigned, e.g., 10.5281/zenodo.NNNNN.
      ◦ At openICPSR: projects are called openicpsr-NNNNN. The DOI is derived from the project number as 10.3886/ENNNNNV1.
      ◦ If you already have a DOI assigned to your manuscript or (published) paper, you can use that instead (see 10.1093/restud/rdw057 for an excellent example).
  • In the RADC, create a two-level directory named after the DOI.
  • Move both the data (following guidelines outlined here) and all code (not just the confidential part) into subdirectories. The resulting directory structure will look something like this:

/some/path/project/10.5281/zenodo.NNNNN/:
      data/original/rawdata.dta
      data/derived/analysis.dta
      programs/01_cleaning.do
      programs/02_analysis.do
  • Confirm with the RADC’s administrative staff how long project files are kept as archives or in backups (often 5-10 years).
  • Add a statement to the public README.md (and to article materials). See Sample RADC Statement 1 and Sample RADC Statement 2.
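The directory steps above can be scripted. The following is a minimal Python sketch, assuming Python is available inside the RADC (many centers provide only specific software, in which case the same steps can be done by hand or in Stata). The function names and the DOI 10.5281/zenodo.12345 are hypothetical placeholders. The openICPSR helper only applies the naming pattern described above (project openicpsr-NNNNN, DOI 10.3886/ENNNNNV1); always use the DOI the repository actually displays.

```python
import tempfile
from pathlib import Path

# Subdirectories mirroring the example layout above (adjust as needed).
SUBDIRECTORIES = ("data/original", "data/derived", "programs")


def openicpsr_doi(project_number: int) -> str:
    """Reserved DOI for openICPSR project openicpsr-<number>, version 1."""
    return f"10.3886/E{project_number}V1"


def create_project_tree(base: Path, doi: str) -> Path:
    """Create <base>/<prefix>/<suffix>/ plus the data and program subdirectories."""
    # A DOI such as 10.5281/zenodo.12345 contains one "/", so splitting on it
    # yields the two directory levels (prefix, then suffix).
    root = base.joinpath(*doi.split("/"))
    for sub in SUBDIRECTORIES:
        (root / sub).mkdir(parents=True, exist_ok=True)
    return root


if __name__ == "__main__":
    # Demonstrate in a throwaway directory; in the RADC, use the project path.
    with tempfile.TemporaryDirectory() as tmp:
        root = create_project_tree(Path(tmp) / "project", "10.5281/zenodo.12345")
        for path in sorted(root.rglob("*")):
            print(path.relative_to(tmp))
```

If a DOI suffix itself contains slashes, `split("/")` produces more than two levels, which still keeps the path identical to the DOI string.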

Some examples