Requested Information for Code

Readings

Some scientists, including economists, have put together excellent guidance and tutorials. See

README and master script

All replication archives should have a README (in PDF, text, or a simple formatting language such as Markdown, like this document). The README should describe the replication archive in enough detail for a reader to understand its structure: the directory layout, what is acquired from third parties, what is generated by scripts, and how much output to expect. It should document each file, or class of files, included in the archive.

We strongly encourage the provision of a master script. The master script should run all programs necessary to provide the outputs, in the right sequence, robustly.
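As an illustration only, a master script in Python might look like the sketch below. The step names and the `run_all` helper are hypothetical, not part of any required layout; the point is that the programs run in a fixed order and the run stops at the first failure.

```python
"""Master script sketch: runs every step of the analysis in order."""
import subprocess
import sys
from pathlib import Path

# Hypothetical step names; replace with the programs in your archive.
STEPS = ["01_clean_data.py", "02_estimate.py", "03_tables_figures.py"]


def run_all(steps, code_dir):
    """Run each script in sequence, stopping at the first failure."""
    completed = []
    for name in steps:
        script = Path(code_dir) / name
        if not script.is_file():
            raise FileNotFoundError(f"Missing program: {script}")
        # check=True raises CalledProcessError if the script exits with an error.
        subprocess.run([sys.executable, str(script)], check=True)
        completed.append(name)
    return completed

# A real master script would end with a call such as:
#     run_all(STEPS, "code")
```

Because each step runs in its own process, a replicator can also run a single program by hand when debugging, without editing the master script.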

In some cases, the master script might also serve as a README (for instance, “README.bash”, “README.py”, “README.Rmd”), as long as it satisfies all conditions of the README as well (i.e., ample comments).

Configuration

  • We strongly encourage the use of a single configuration file, containing all path names.

  • We strongly encourage the use of relative paths for including programs, data, and outputs in the main program.

  • We strongly encourage listing ALL dependencies and packages needed to run the code, either by

    • including them in the project, or
    • listing them in a “setup” program.
  • We strongly encourage setting any seeds for random number generators.

  • Sample configuration for Stata

  • Sample configuration for R

  • Sample configuration for Python
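For Python, a single configuration module could be sketched as follows. The directory names, the seed value, and the `project_paths` and `setup` helpers are assumptions made for illustration; adapt them to the actual archive.

```python
"""config.py sketch: one place for all path names and the random seed."""
import random
from pathlib import Path

# Hypothetical seed value; any fixed integer works.
SEED = 20181003


def project_paths(root="."):
    """Return all project paths, defined relative to the project root."""
    root = Path(root)
    return {
        "data_raw": root / "data" / "raw",          # acquired from third parties
        "data_derived": root / "data" / "derived",  # generated by scripts
        "output": root / "output",                  # tables and figures
    }


def setup(root="."):
    """Create the generated directories and seed the random number generator."""
    paths = project_paths(root)
    for key in ("data_derived", "output"):
        paths[key].mkdir(parents=True, exist_ok=True)
    random.seed(SEED)
    return paths
```

Every other program would then call `setup()` once and take its paths from the returned dictionary, so that moving the archive to another computer requires no edits to the analysis code.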

Things not to do

  • We strongly discourage writing comments like
Run this a first time, generating column 1 of Table 3.
Then comment out line 55, then run a second time, which should
give you column 3 of Table 3.
Then uncomment line 55, change the parameter in line 67 to "5",
and run again to get column 2 of Table 3.

(this is only slightly paraphrased from an actual example).

  • Avoid ambiguous or imprecise instructions like

Have superDynare available

or

Use the outreg55 package

(no URL or installation command provided)

[Image: xkcd comic on data pipelines]

Things to do

  • Explain briefly how to RUN your code - do not assume that everybody now or in the future knows how to run make, Matlab+Dynare, or even Stata.
  • Write code that can be run without human intervention, ideally without using a graphical interface.
  • Use functions/programs/loops/etc. to iterate through variations of an otherwise identical procedure (but ensure that the purpose of the loop is well described)
  • Identify all requirements, so that somebody who has NOT been experimenting with the software and code for the past 5 years on the same laptop can successfully run the code. This means stating
    • what packages need to be installed, from where, and in which versions

ssc install outreg55, from(https://myurl/to/o) // accurate as of 2018-10-03

or

# Known to work with dplyr 0.7.6
install.packages(c("dplyr","devtools"))
library(dplyr)
library(devtools)
install_github("myrepo/superols")

  • Write out all tables to files (do not simply display on-screen)
    • Stata: regsave, outest, texsave, etc. are sample packages
    • Matlab: writetable
    • R: any number of packages, for instance kable or kableExtra
  • Write out all figures to files (do not simply display on-screen)
    • Stata: graph export (use pdf or eps)
    • Matlab: saveas
    • R: various native commands, such as png(), pdf()
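In Python, looping over variations and writing the resulting table to a file might be sketched as below. The `estimate` function is a hypothetical stand-in for the real procedure, and the file name in the usage note is only an example; the sketch uses the standard-library `csv` module so that it has no extra dependencies.

```python
import csv
from pathlib import Path


def estimate(parameter):
    """Placeholder for the real estimation; returns one table column."""
    return {"parameter": parameter, "estimate": 2 * parameter}


def write_table(parameters, path):
    """Run the same procedure for each parameter and save all results."""
    # One loop replaces "change line 67 and run again" style instructions.
    rows = [estimate(p) for p in parameters]
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["parameter", "estimate"])
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

A call such as `write_table([1, 3, 5], "output/table3.csv")` would then produce every column of the table in one unattended run.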

Versioning of packages

You should be precise about the relevant packages. Ideally, specify the exact version to install, or provide the package itself. At a minimum, state the version of each package with which you produced your results.

  • Stata:
    • Making a distinction between Stata Journal and SSC versions can sometimes help. If the package license permits, distributing the package with the core code is also possible.
    • Redirect the adopath to a project-specific directory.
  • R:
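For projects written in Python, one possible way to make version requirements checkable is sketched below, using the standard-library `importlib.metadata` module. The package name and version in `REQUIRED` are hypothetical pins, and `check_versions` is an illustrative helper, not an established tool.

```python
from importlib import metadata

# Hypothetical pins; record the versions you actually used.
REQUIRED = {"numpy": "1.24.0"}


def check_versions(required):
    """Return a list of readable problems; an empty list means all match."""
    problems = []
    for package, wanted in required.items():
        try:
            found = metadata.version(package)
        except metadata.PackageNotFoundError:
            problems.append(f"{package} is not installed (need {wanted})")
            continue
        if found != wanted:
            problems.append(f"{package} {found} found, results used {wanted}")
    return problems
```

Running `check_versions(REQUIRED)` at the top of a master script and printing the result tells a replicator immediately which packages differ from those used to produce the results.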

Some facts

From the late 2000s through 2018, most economists used either Stata or Matlab. If you are using something else, assume that replicators have not (yet) learned your preferred software, and provide some guidance on how to use it.

Figure: Software usage in the AER, 2000-2018 (Source: Baylis and Schrimpf, 2018)