Data Citation and Reproducibility on WRDS
This content was based on a contribution by Matthew Pierson at the Wharton Research Data Services (WRDS), and adapted for our purposes. All errors are mine.
Citing WRDS data
Researchers can cite the individual dataset or query used to access data via a unique and extensible URL. Each section of the URL represents a further step in the chain of access.
Example Citation:
Standard and Poors. (year). “Compustat North American Fundamentals Annual Data”. Provided by Wharton Research Data Services (WRDS).
Available at: https://wrds-www.wharton.upenn.edu/pages/get-data/compustat-capital-iq-standard-poors/compustat/north-america-daily/fundamentals-annual/, last accessed on (date).
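The "chain of access" encoded in the URL can be illustrated with a short Python sketch. It splits the example URL above into its path segments and assembles a citation string in the format shown. The helper names `access_chain` and `format_citation`, and the year and access date used in the usage example, are hypothetical and chosen only for illustration. They are not part of any WRDS tool.

```python
from datetime import date
from urllib.parse import urlsplit

# Example URL taken from the citation above.
WRDS_URL = (
    "https://wrds-www.wharton.upenn.edu/pages/get-data/"
    "compustat-capital-iq-standard-poors/compustat/"
    "north-america-daily/fundamentals-annual/"
)


def access_chain(url: str) -> list[str]:
    """Return the non-empty path segments of a URL, in order."""
    return [segment for segment in urlsplit(url).path.split("/") if segment]


def format_citation(provider: str, year: int, title: str,
                    url: str, accessed: date) -> str:
    """Assemble a citation string in the format of the example above."""
    return (
        f'{provider}. ({year}). "{title}". '
        "Provided by Wharton Research Data Services (WRDS). "
        f"Available at: {url}, last accessed on {accessed.isoformat()}."
    )


if __name__ == "__main__":
    # Each printed segment is one step in the chain of access.
    for depth, segment in enumerate(access_chain(WRDS_URL), start=1):
        print(depth, segment)

    # The year and access date below are placeholders, not real values.
    print(format_citation(
        "Standard and Poors",
        2023,
        "Compustat North American Fundamentals Annual Data",
        WRDS_URL,
        date(2024, 1, 15),
    ))
```

Storing the access date as a `date` object and formatting it at the last moment keeps the "last accessed on" field unambiguous across locales.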
Documenting Web Queries
Researchers can document exactly which variables were accessed, along with row counts and date ranges, using the following methods:
Output Pages and Logs
Every web query generates a unique output page upon submission.

- Toggle Input Parameters: Clicking this displays the exact input parameters of the query.
- Log Files:
  - _sas.log (or _sql.log): Displays the code that generated the query.
  - _grid.log: Shows the server-side log file of the job.
- Tracking: These logs record the exact output file name, so the output file can be matched against the import step in a researcher’s code.
- Retention: Log files are only kept for 2 days. Users should save these with their output to submit as part of their Data and Code package.
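Because the logs expire quickly, it may help to archive them as soon as they are downloaded. The following Python sketch assumes the log files have already been saved by hand into a local folder. It copies every `*.log` file into a subfolder of the replication package and reports whether each log mentions the name of the output file. The function name `archive_query_logs`, the folder name `wrds_logs`, and the file names in the usage note are hypothetical. The actual contents of WRDS log files are not specified here, so the check is a plain substring search.

```python
import shutil
from pathlib import Path


def archive_query_logs(download_dir, package_dir, output_name):
    """Copy *.log files into the package and flag those naming the output.

    Returns a dict mapping each log file name to True if the log text
    contains `output_name`, and False otherwise.
    """
    download_dir = Path(download_dir)
    logs_dir = Path(package_dir) / "wrds_logs"  # hypothetical folder name
    logs_dir.mkdir(parents=True, exist_ok=True)

    mentions = {}
    for log_path in sorted(download_dir.glob("*.log")):
        # copy2 also preserves file timestamps where the platform allows it.
        shutil.copy2(log_path, logs_dir / log_path.name)
        text = log_path.read_text(encoding="utf-8", errors="replace")
        mentions[log_path.name] = output_name in text
    return mentions
```

A call such as `archive_query_logs("downloads", "replication_package", "myquery.csv")` would then show at a glance which saved logs refer to the file that the analysis code imports.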
Programmatic and Server Access
For access via server or other non-GUI methods, researchers should submit their log files as part of their replication packages.
- Library & Dataset Identification: Logs provide indications of the library and dataset accessed (e.g., CRSP Daily Securities data is identified as crsp.dsf).
- Data Dictionary: Individual dataset names can be translated using the WRDS Data Dictionary.
- Verification: Because most statistical programs record this information in their logs by default, log files show which variables were used and the date the data were accessed.
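When a script is written in a language that does not produce such a log automatically, the same facts can be recorded explicitly. The Python sketch below uses the standard `logging` module to write one timestamped line per data pull, naming the library and dataset, the variables requested, and the number of rows returned. The function names, the logger name `wrds_access`, and the message layout are assumptions made for illustration. The sketch does not query WRDS itself and expects the caller to supply the row count.

```python
import logging


def make_access_logger(log_path):
    """Return a logger that appends timestamped lines to `log_path`."""
    logger = logging.getLogger("wrds_access")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    # Remove handlers from earlier calls so lines are not duplicated.
    for old_handler in list(logger.handlers):
        logger.removeHandler(old_handler)
        old_handler.close()
    handler = logging.FileHandler(log_path, mode="a", encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
    logger.addHandler(handler)
    return logger


def log_data_access(logger, library, table, variables, n_rows):
    """Record which dataset and variables were pulled, and how many rows."""
    message = "dataset=%s.%s variables=%s rows=%d" % (
        library, table, ",".join(variables), n_rows
    )
    logger.info(message)
    return message


if __name__ == "__main__":
    access_logger = make_access_logger("data_access.log")
    # The row count here is a placeholder, not a real query result.
    log_data_access(access_logger, "crsp", "dsf", ["permno", "date", "ret"], 1000)
```

The `%(asctime)s` field supplies the access date, and the `library.table` form matches the way datasets such as crsp.dsf are identified in the logs described above.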

