
Data and Code Guidance by Data Editors

Guidance for authors wishing to create data and code supplements, and for replicators.

Requested information for data and code replication packages


This is a draft document. Please provide comments by creating a new issue in this GitHub project.

Readings

The following readings may be useful when structuring a project, its code, and its data. It is best to consult them at an early stage of the project, so that any subsequent adjustments are small and incremental rather than large and disruptive.

Some concepts

We will refer to a (simplified) data structure as described below. Real-life data structures are often more complex, and the distinctions made in the simplified example should be adapted accordingly.

graph TD
  subgraph Dataflow
    A((Input data)) ==> B[Cleaning programs]
    B ==> C((Analysis data))
    C ==> D[Analysis programs]
    D ==> E((Outputs))
  end
  B -.-> F(("Auxiliary data<br/>(created)"))
  F -.-> C
  Z((Source)) -.-> X[Data citation] -.-> A
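The dataflow above can be made concrete with a single master script that runs each step in order, so that every output can be regenerated from the input data. The following is a minimal Python sketch under stated assumptions: the folder layout (`data/raw`, `data/analysis`, `outputs`), the file names, and the functions `clean`, `analyze`, and `run_all` are all hypothetical, and the cleaning and analysis steps are toy stand-ins for real programs.

```python
"""Minimal sketch of the dataflow: input data -> cleaning -> analysis data -> analysis -> outputs.

All folder names, file names, and functions are hypothetical illustrations.
"""
import csv
import tempfile
from pathlib import Path


def clean(input_file: Path, analysis_file: Path) -> None:
    """Cleaning program (hypothetical): drop rows with missing income, write analysis data."""
    with input_file.open(newline="") as f:
        rows = [r for r in csv.DictReader(f) if r["income"].strip()]
    analysis_file.parent.mkdir(parents=True, exist_ok=True)
    with analysis_file.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "income"])
        writer.writeheader()
        writer.writerows(rows)


def analyze(analysis_file: Path, output_file: Path) -> float:
    """Analysis program (hypothetical): compute mean income and save it as an output."""
    with analysis_file.open(newline="") as f:
        incomes = [float(r["income"]) for r in csv.DictReader(f)]
    mean_income = sum(incomes) / len(incomes)
    output_file.parent.mkdir(parents=True, exist_ok=True)
    output_file.write_text(f"mean_income,{mean_income}\n")
    return mean_income


def run_all(root: Path) -> float:
    """Master script: run every step in order, starting from the input data."""
    input_file = root / "data" / "raw" / "survey.csv"                 # input data (cited)
    analysis_file = root / "data" / "analysis" / "survey_clean.csv"  # created by cleaning
    output_file = root / "outputs" / "table1.csv"                     # final output
    clean(input_file, analysis_file)
    return analyze(analysis_file, output_file)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "data" / "raw").mkdir(parents=True)
        # Toy input data standing in for a real, cited source.
        (root / "data" / "raw" / "survey.csv").write_text("id,income\n1,100\n2,\n3,200\n")
        print(run_all(root))  # prints 150.0
```

Keeping input data, analysis data, and outputs in separate locations, and never editing the input data in place, mirrors the distinctions in the diagram; real projects will typically have many more steps and may create auxiliary data along the way.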

General Rules and Guidelines

Requirements

We require that

Some journals may require a README in a specific format.

Suggestions

We strongly suggest following the best practices recommended in the literature cited above:

This document provides some practical guidance.

We strongly suggest using the template README available on this site.

Encourage

We encourage you

Data

Regarding the data, enough information should be provided

For details, see Requested information for data.

Citing Data and Code

All data should be cited, as per journal guidelines:

For a discussion with some suggestions, see our Data citation guidance.

Data and Code Availability Statements

Some of the information historically captured by “README” files is more formally captured by newer “Data (and Code) Availability Statements”. They expand on and complement data citations. Sample language should be incorporated into a README, a distinct document, or a distinct section of the manuscript.

Some examples are listed here.

Programs and Code

We strongly suggest

For details, see our discussion on Requested information for code.

Data and Code Hosting

Journals have made supplementary materials available on their websites since the early 2000s. As general-purpose and scientific web infrastructure has matured, other options have become available. We comment below on important features to consider when depositing code and data.

Principles

A code and data repository (or “archive”) should satisfy a few criteria:

Not every web-based location is a code or data repository; conversely, many archives that are not web-based are legitimate locations for data to be found (e.g., the National Archives).

For further details, see our discussion on Requested information on hosting code and data.

Licensing questions

Licensing issues are complex, and this site touches on the topic in its discussion of licensing. We encourage authors to take licensing considerations seriously.