Validate the manifest (the listed contents) of the data submission against its actual contents (see the reference README). Is any information missing?
Also cross-check the manifest against the data section of the manuscript.
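A minimal sketch of the manifest check, assuming the manifest has already been transcribed from the README into a plain list of relative paths. The function name `compare_manifest` and the example file names are hypothetical, not part of any reference tool.

```python
from pathlib import Path
import tempfile


def compare_manifest(manifest, root):
    """Return (missing, unlisted): files listed but absent, and files present but not listed."""
    root = Path(root)
    listed = {Path(p).as_posix() for p in manifest}
    actual = {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}
    return sorted(listed - actual), sorted(actual - listed)


# Demonstration on a throwaway directory with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    (demo_root / "data").mkdir()
    (demo_root / "data" / "survey.csv").write_text("id,x\n1,2\n")
    (demo_root / "main.do").write_text("* analysis\n")

    demo_manifest = ["data/survey.csv", "data/census.csv"]
    missing, unlisted = compare_manifest(demo_manifest, demo_root)
    print("Listed but missing:", missing)
    print("Present but unlisted:", unlisted)
```

Both directions matter: a missing file means the README promises something the archive does not deliver, while an unlisted file may be undocumented data or leftover output.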
Create a list of all data sets referenced or provided
Create a list of all tables, figures, and in-text numbers from the manuscript (excluding the online appendix). If the author provides such a list, validate it against the manuscript.
For each listed data source:
Verify information about the data:
Check that the data set has a clear name.
Verify licensing and access information. Is the original data accessible to researchers other than the author (license)? Does the author have the right to redistribute the data (if not the copyright holder)? For data generated by the author, is a license provided?
Are the data cited in the manuscript? In the README?
Verify information in the data:
Are all variables labeled (Stata), or is information on each variable provided in a codebook?
If the data are not provided (confidential data), are summary statistics on the data provided (in the manuscript or as part of the archive)?
If the data ARE provided, does the data set contain any potentially sensitive data? (Example software for this check: an R version and a Stata version are provided by J-PAL.)
Some of the following may be OK if adequately described in the README and/or the manuscript data section
No names of people
No social security numbers, credit card numbers, etc.
No addresses or precise geo-locations (GPS coordinates)
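A first-pass screen for the three criteria above might look like the following sketch. It is not the J-PAL software mentioned earlier, and the hint words and the SSN-like pattern are illustrative assumptions; a real review still requires inspecting the data by hand.

```python
import csv
import io
import re

# Illustrative hint words only (an assumption); extend for the data at hand.
NAME_HINTS = {"name", "ssn", "address", "street", "lat", "latitude",
              "lon", "longitude", "gps", "phone", "email"}
SSN_LIKE = re.compile(r"^\d{3}-\d{2}-\d{4}$")


def flag_sensitive_columns(rows):
    """rows: iterable of dicts (e.g. csv.DictReader). Return {column: reason}."""
    flags = {}
    for row in rows:
        for col, value in row.items():
            if col is None or col in flags:
                continue
            # Split the column name into tokens so "population" is not flagged for "lat".
            tokens = set(re.split(r"[^a-z0-9]+", col.lower()))
            if tokens & NAME_HINTS:
                flags[col] = "column name suggests sensitive content"
            elif isinstance(value, str) and SSN_LIKE.match(value.strip()):
                flags[col] = "value looks like a social security number"
    return flags


# Hypothetical sample data for demonstration.
sample = "id,first_name,population,code\n1,Ann,5000,123-45-6789\n"
flags = flag_sensitive_columns(csv.DictReader(io.StringIO(sample)))
for column, reason in flags.items():
    print(f"{column}: {reason}")
```

A flagged column is a prompt to check the README and the manuscript data section, not a verdict: as noted above, some of these may be acceptable if adequately described.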
For each listed table, figure, and in-text number:
Can you identify the piece of code responsible for generating that number or figure?
Does the code produce an identifiable output that contains those numbers or figures?
Conduct a code verification, if the data are available:
Create a directory containing only the programs and data provided.
Follow the instructions in the README or the code to:
Download all additional data not provided within the replication archive.
If the existing code does not already contain one, it may be useful to create a config file (see the sample Stata config file) and call it from every program to be run.
(Stata only) Reset the PERSONAL, PLUS, and SITE locations that Stata uses to search for ado files (the sample Stata config file does this automatically).
Install all identified requirements.
Run all code as per the instructions in the README or the code.
Identify all error messages.
Identify all outputs as per the README and the list of tables, figures, and in-text numbers.
Compare the outputs to the tables, figures, and numbers in the manuscript.
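The final comparison can be partly automated once the reproduced numbers have been collected. The sketch below assumes both sets of numbers have been entered by hand into dictionaries keyed by a label. The function name, labels, and values are hypothetical, and treating a difference of up to half a unit in the last reported digit as a match is an assumption about how the manuscript rounds.

```python
def compare_results(manuscript, reproduced, decimals=2):
    """Return (label, status) pairs: 'match', 'differs', or 'not produced'."""
    # Allow half a unit in the last reported digit, plus a small float cushion.
    tolerance = 0.5 * 10 ** (-decimals) + 1e-12
    report = []
    for label, published in manuscript.items():
        if label not in reproduced:
            report.append((label, "not produced"))
        elif abs(reproduced[label] - published) <= tolerance:
            report.append((label, "match"))
        else:
            report.append((label, "differs"))
    return report


# Hypothetical numbers for demonstration.
manuscript = {"Table 1, col 1": 0.12, "Table 2, col 3": 4.56, "Figure 1, mean": 10.00}
reproduced = {"Table 1, col 1": 0.1231, "Table 2, col 3": 4.61}

for label, status in compare_results(manuscript, reproduced):
    print(f"{label}: {status}")
```

Entries marked "differs" or "not produced" are the ones to document in the report, together with any error messages identified while running the code.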