Sample code to generate descriptive statistics

Code snippets for generating descriptive statistics of a dataset, useful especially when the data themselves cannot be shared for licensing or confidentiality reasons.
Author

Various

Generating codebooks

Stata

In Stata, the native ‘describe’ and ‘codebook’ commands can generate this information:

// Stata
use my_input_data, clear
describe
codebook

See code/01_codebook_fancy.md for a fancier example, and code/02_codebook_plaintext.md for the code and output from the simpler example.

R

In R, the dataMaid package [1], [2] can accomplish a similar task:

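# use the dataMaid package
library(dataMaid)
# load the data (path is a placeholder), then write the codebook report
my_input_data <- readRDS("path/to/my_input_data.rds")
makeCodebook(my_input_data)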

See code/03_codebook_dataMaid for an example.

SAS

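In SAS, PROC CONTENTS (variable names, types, lengths, formats and labels) and PROC MEANS (N, mean, standard deviation, minimum and maximum of the numeric variables) may well provide all that is needed:

proc contents data=my_input_data;
run;
proc means data=my_input_data;
run;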

See code/04_codebook_SAS for an example.
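When the data sit in a plain CSV file and none of the tools above is at hand, a rough analogue of these reports can be produced with a short script. The following is a minimal sketch in Python (standard library only), not part of the original examples. It assumes the file has a header row, treats empty cells as missing, and reports a column as numeric only when every non-missing value parses as a number; the function name `codebook` is hypothetical. The standard deviation uses the n-1 denominator, as PROC MEANS and Stata's summarize do by default.

```python
import csv
import io
import statistics


def codebook(f):
    """Summarize each column of a CSV file-like object (hypothetical helper).

    Assumes a header row; empty cells are treated as missing values.
    """
    reader = csv.DictReader(f, restval="")  # short rows are padded with ""
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            columns[name].append(row[name])

    summary = []
    for name, values in columns.items():
        present = [v for v in values if v.strip() != ""]
        entry = {
            "variable": name,
            "n": len(present),
            "missing": len(values) - len(present),
            "unique": len(set(present)),
        }
        try:
            numbers = [float(v) for v in present]
        except ValueError:
            numbers = []  # at least one non-numeric value: report as string
        if numbers:
            entry["type"] = "numeric"
            entry["mean"] = statistics.mean(numbers)
            # sample standard deviation (n-1 denominator); undefined for n < 2
            entry["sd"] = statistics.stdev(numbers) if len(numbers) > 1 else None
            entry["min"] = min(numbers)
            entry["max"] = max(numbers)
        else:
            # string columns, and columns with no non-missing values
            entry["type"] = "string"
        summary.append(entry)
    return summary


if __name__ == "__main__":
    # Toy input; with a real file, pass open("my_input_data.csv", newline="")
    sample = io.StringIO("id,price,origin\n1,4099,Domestic\n2,,Foreign\n3,5799,Domestic\n")
    for entry in codebook(sample):
        print(entry)
```

Each dictionary describes one variable, so the output can be written to a text file and shared in place of the data.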

Creating “zero-obs” datasets

Alternatively, you can just provide an empty file that replicates the structure (schema) of your data. Often, this can be achieved by simply setting the number of observations to zero.
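For data kept as a CSV file, keeping zero observations amounts to writing out the header row alone. A minimal Python sketch using only the standard library, with a hypothetical function name `zero_obs_csv`, is shown below; it assumes the first row holds the variable names. Bear in mind that a CSV header records only the variable names, not the storage types, formats or labels that the Stata, R and SAS files below can carry, so it conveys less of the schema.

```python
import csv
import io


def zero_obs_csv(src, dst):
    """Copy only the header row of CSV `src` to `dst` (hypothetical helper).

    Both arguments are file-like objects; returns the list of variable names.
    """
    reader = csv.reader(src)
    header = next(reader)  # first row holds the variable names
    csv.writer(dst).writerow(header)  # header only, no data rows
    return header


if __name__ == "__main__":
    # Toy input standing in for my_input_data.csv
    src = io.StringIO("id,price,origin\n1,4099,Domestic\n2,,Foreign\n")
    dst = io.StringIO()
    print(zero_obs_csv(src, dst))  # ['id', 'price', 'origin']
    print(repr(dst.getvalue()))    # 'id,price,origin\r\n' (csv default line ending)
```

To apply it to real files, open both with newline="", for example `with open("my_input_data.csv", newline="") as src, open("zero_input_data.csv", "w", newline="") as dst: zero_obs_csv(src, dst)` (file names hypothetical).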

Stata

// Stata
use my_input_data, clear
keep if 0
save zero_input_data, replace

Example:

. sysuse auto
(1978 automobile data)

. keep if 0
(74 observations deleted)

. desc

Contains data from /usr/local/stata/ado/base/a/auto.dta
 Observations:             0                  1978 automobile data
    Variables:            12                  13 Apr 2024 17:45
                                              (_dta has notes)
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
make            str18   %-18s                 Make and model
price           int     %8.0gc                Price
mpg             int     %8.0g                 Mileage (mpg)
rep78           int     %8.0g                 Repair record 1978
headroom        float   %6.1f                 Headroom (in.)
trunk           int     %8.0g                 Trunk space (cu. ft.)
weight          int     %8.0gc                Weight (lbs.)
length          int     %8.0g                 Length (in.)
turn            int     %8.0g                 Turn circle (ft.)
displacement    int     %8.0g                 Displacement (cu. in.)
gear_ratio      float   %6.2f                 Gear ratio
foreign         byte    %8.0g      origin     Car origin
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sorted by: foreign
     Note: Dataset has changed since last saved.

R

# Read the RDS file into an R object
my_data <- readRDS("path/to/my_input_data.rds")
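# Create a new, empty data frame with the same columns and column types
# (drop = FALSE keeps a data frame even when there is only one column)
zero_obs_df <- my_data[FALSE, , drop = FALSE]
# To verify, check the dimensions: 0 rows, same number of columns as my_data
dim(zero_obs_df)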
# Save the empty data frame to a new RDS file
saveRDS(zero_obs_df, "path/to/zero_input_data.rds")

Alternatively, using the dplyr package:

library(dplyr)
my_data <- readRDS("path/to/my_input_data.rds")
zero_obs_df <- slice(my_data, 0)
saveRDS(zero_obs_df, "path/to/zero_input_data.rds")

SAS

data zero_input_data;
    set my_input_data (obs=0);
run;

or, using PROC SQL:

proc sql;
  create table zero_input_data like my_input_data;
quit;