Codebook example for Stata
Simple example
The following output is perfectly acceptable content, though not especially pretty to view. The core code requires only native Stata commands. The output must go to a plain-text log file, because SMCL (Stata’s formatted log markup) is not portable.
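As a minimal sketch, the commands recorded in the log below can be collected in a do-file like the following. The log file name is taken from the log itself; `set more off`, the `clear` option, and the closing `log close` are additions for illustration and do not appear in the original log.

```stata
* Minimal sketch of a do-file that writes a plain-text codebook.
capture log close            // close any log left open; ignore the error if none is
set more off                 // assumption: never pause the output for --more--
set linesize 147             // wide lines, matching the log below
log using "01_codebook_plaintext.txt", replace text   // "text" = plain text, not SMCL

sysuse auto, clear           // example data shipped with Stata; "clear" is an addition
describe                     // file structure: variables, storage types, labels
codebook                     // per-variable summary

log close                    // addition: finalize the log file
```

Running the do-file in batch mode or interactively should leave `01_codebook_plaintext.txt` in the working directory, readable in any text editor.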
. capture log close
. set more 1
. set linesize 147
. log using "01_codebook_plaintext.txt", replace text
---------------------------------------------------------------------------------------------------------------------------------------------------
name: <unnamed>
log: /mnt/local/slow_home/vilhuber/Workspace-non-encrypted/git/AEA/aea-de-guidance/code/01_codebook_plaintext.txt
log type: text
opened on: 1 Oct 2018, 17:13:43
. di
. sysuse auto
(1978 Automobile Data)
. describe
Contains data from /usr/local/stata14/ado/base/a/auto.dta
  obs:            74                          1978 Automobile Data
 vars:            12                          13 Apr 2014 17:45
 size:         3,182                          (_dta has notes)
---------------------------------------------------------------------------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
---------------------------------------------------------------------------------------------------------------------------------------------------
make            str18   %-18s                 Make and Model
price           int     %8.0gc                Price
mpg             int     %8.0g                 Mileage (mpg)
rep78           int     %8.0g                 Repair Record 1978
headroom        float   %6.1f                 Headroom (in.)
trunk           int     %8.0g                 Trunk space (cu. ft.)
weight          int     %8.0gc                Weight (lbs.)
length          int     %8.0g                 Length (in.)
turn            int     %8.0g                 Turn Circle (ft.)
displacement    int     %8.0g                 Displacement (cu. in.)
gear_ratio      float   %6.2f                 Gear Ratio
foreign         byte    %8.0g      origin     Car type
---------------------------------------------------------------------------------------------------------------------------------------------------
Sorted by: foreign
. codebook
---------------------------------------------------------------------------------------------------------------------------------------------------
make Make and Model
---------------------------------------------------------------------------------------------------------------------------------------------------
type: string (str18), but longest is str17
unique values: 74 missing "": 0/74
examples: "Cad. Deville"
"Dodge Magnum"
"Merc. XR-7"
"Pont. Catalina"
warning: variable has embedded blanks
---------------------------------------------------------------------------------------------------------------------------------------------------
price Price
---------------------------------------------------------------------------------------------------------------------------------------------------
type: numeric (int)
range: [3291,15906] units: 1
unique values: 74 missing .: 0/74
mean: 6165.26
std. dev: 2949.5
percentiles: 10% 25% 50% 75% 90%
3895 4195 5006.5 6342 11385
---------------------------------------------------------------------------------------------------------------------------------------------------
mpg Mileage (mpg)
---------------------------------------------------------------------------------------------------------------------------------------------------
type: numeric (int)
range: [12,41] units: 1
unique values: 21 missing .: 0/74
mean: 21.2973
std. dev: 5.7855
percentiles: 10% 25% 50% 75% 90%
14 18 20 25 29
---------------------------------------------------------------------------------------------------------------------------------------------------
rep78 Repair Record 1978
---------------------------------------------------------------------------------------------------------------------------------------------------
type: numeric (int)
range: [1,5] units: 1
unique values: 5 missing .: 5/74
tabulation: Freq. Value
2 1
8 2
30 3
18 4
11 5
5 .
---------------------------------------------------------------------------------------------------------------------------------------------------
headroom Headroom (in.)
---------------------------------------------------------------------------------------------------------------------------------------------------
type: numeric (float)
range: [1.5,5] units: .1
unique values: 8 missing .: 0/74
tabulation: Freq. Value
4 1.5
13 2
14 2.5
13 3
15 3.5
10 4
4 4.5
1 5
---------------------------------------------------------------------------------------------------------------------------------------------------
trunk Trunk space (cu. ft.)
---------------------------------------------------------------------------------------------------------------------------------------------------
type: numeric (int)
range: [5,23] units: 1
unique values: 18 missing .: 0/74
mean: 13.7568
std. dev: 4.2774
percentiles: 10% 25% 50% 75% 90%
8 10 14 17 20
---------------------------------------------------------------------------------------------------------------------------------------------------
weight Weight (lbs.)
---------------------------------------------------------------------------------------------------------------------------------------------------
type: numeric (int)
range: [1760,4840] units: 10
unique values: 64 missing .: 0/74
mean: 3019.46
std. dev: 777.194
percentiles: 10% 25% 50% 75% 90%
2020 2240 3190 3600 4060
---------------------------------------------------------------------------------------------------------------------------------------------------
length Length (in.)
---------------------------------------------------------------------------------------------------------------------------------------------------
type: numeric (int)
range: [142,233] units: 1
unique values: 47 missing .: 0/74
mean: 187.932
std. dev: 22.2663
percentiles: 10% 25% 50% 75% 90%
157 170 192.5 204 218
---------------------------------------------------------------------------------------------------------------------------------------------------
turn Turn Circle (ft.)
---------------------------------------------------------------------------------------------------------------------------------------------------
type: numeric (int)
range: [31,51] units: 1
unique values: 18 missing .: 0/74
mean: 39.6486
std. dev: 4.39935
percentiles: 10% 25% 50% 75% 90%
34 36 40 43 45
---------------------------------------------------------------------------------------------------------------------------------------------------
displacement Displacement (cu. in.)
---------------------------------------------------------------------------------------------------------------------------------------------------
type: numeric (int)
range: [79,425] units: 1
unique values: 31 missing .: 0/74
mean: 197.297
std. dev: 91.8372
percentiles: 10% 25% 50% 75% 90%
97 119 196 250 350
---------------------------------------------------------------------------------------------------------------------------------------------------
gear_ratio Gear Ratio
---------------------------------------------------------------------------------------------------------------------------------------------------
type: numeric (float)
range: [2.19,3.89] units: .01
unique values: 36 missing .: 0/74
mean: 3.01486
std. dev: .456287
percentiles: 10% 25% 50% 75% 90%
2.43 2.73 2.955 3.37 3.72
---------------------------------------------------------------------------------------------------------------------------------------------------
foreign Car type
---------------------------------------------------------------------------------------------------------------------------------------------------
type: numeric (byte)
label: origin
range: [0,1] units: 1
unique values: 2 missing .: 0/74
tabulation: Freq. Numeric Label
52 0 Domestic
22 1 Foreign
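The `describe` output above reports that the dataset has notes and that `foreign` carries the value label `origin`. As a hedged sketch, a few more native commands could add that information to the same log. Whether to include them is a judgment call; the original example does not use them.

```stata
* Optional additions to a plain-text codebook (not part of the original example).
sysuse auto, clear
notes                        // list the notes attached to the dataset and its variables
label list origin            // show the value-to-label mapping behind "origin"
codebook, compact            // one line per variable: obs, unique values, mean, min, max
```

The compact form is convenient as an overview at the top of a codebook, with the full `codebook` output following it.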
Prettier example
This example is a bit more involved, but it illustrates the same core code.
Requirements
This fancier example relies on the markdoc package, as of 2018-10-01. Here we install it locally to this project.
. log using "01_codebook_fancy.smcl", replace smcl
---------------------------------------------------------------------------------------------------------------------------------------------------
name: <unnamed>
log: /mnt/local/slow_home/vilhuber/Workspace-non-encrypted/git/AEA/aea-de-guidance/code/01_codebook_fancy.smcl
log type: smcl
opened on: 1 Oct 2018, 17:13:43
. qui shell mkdir ado
. sysdir set PLUS "./ado/"
Once the markdoc package is installed, we can create marginally fancier codebooks as well (see the output).
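The log above redirects the PLUS directory but does not show the installation itself. A minimal sketch of that step follows, assuming the package is fetched from SSC together with the companion packages weaver and statax; both the source and the list of companions are assumptions, not taken from the log.

```stata
* Sketch: install markdoc into a project-local ado directory.
capture mkdir ado                 // Stata's own mkdir; capture ignores "already exists"
sysdir set PLUS "./ado/"          // user-written packages now install inside the project
ssc install markdoc, replace      // assumed source: SSC
ssc install weaver, replace       // assumed companion package
ssc install statax, replace       // assumed companion package
which markdoc                     // confirm Stata finds the locally installed copy
```

Keeping the packages under `./ado/` means the project carries the exact version it was run with, rather than depending on whatever is installed system-wide.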
Fancy introduction
For instance we could write a fancy introduction here.
File structure
We can now describe the file structure.
. describe
Contains data from /usr/local/stata14/ado/base/a/auto.dta
  obs:            74                          1978 Automobile Data
 vars:            12                          13 Apr 2014 17:45
 size:         3,182                          (_dta has notes)
---------------------------------------------------------------------------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
---------------------------------------------------------------------------------------------------------------------------------------------------
make            str18   %-18s                 Make and Model
price           int     %8.0gc                Price
mpg             int     %8.0g                 Mileage (mpg)
rep78           int     %8.0g                 Repair Record 1978
headroom        float   %6.1f                 Headroom (in.)
trunk           int     %8.0g                 Trunk space (cu. ft.)
weight          int     %8.0gc                Weight (lbs.)
length          int     %8.0g                 Length (in.)
turn            int     %8.0g                 Turn Circle (ft.)
displacement    int     %8.0g                 Displacement (cu. in.)
gear_ratio      float   %6.2f                 Gear Ratio
foreign         byte    %8.0g      origin     Car type
---------------------------------------------------------------------------------------------------------------------------------------------------
Sorted by: foreign
Summary statistics
. codebook
---------------------------------------------------------------------------------------------------------------------------------------------------
make Make and Model
---------------------------------------------------------------------------------------------------------------------------------------------------
type: string (str18), but longest is str17
unique values: 74 missing "": 0/74
examples: "Cad. Deville"
"Dodge Magnum"
"Merc. XR-7"
"Pont. Catalina"
warning: variable has embedded blanks
---------------------------------------------------------------------------------------------------------------------------------------------------
price Price
---------------------------------------------------------------------------------------------------------------------------------------------------
type: numeric (int)
range: [3291,15906] units: 1
unique values: 74 missing .: 0/74
mean: 6165.26
std. dev: 2949.5
percentiles: 10% 25% 50% 75% 90%
3895 4195 5006.5 6342 11385
---------------------------------------------------------------------------------------------------------------------------------------------------
mpg Mileage (mpg)
---------------------------------------------------------------------------------------------------------------------------------------------------
type: numeric (int)
range: [12,41] units: 1
unique values: 21 missing .: 0/74
mean: 21.2973
std. dev: 5.7855
percentiles: 10% 25% 50% 75% 90%
14 18 20 25 29
---------------------------------------------------------------------------------------------------------------------------------------------------
rep78 Repair Record 1978
---------------------------------------------------------------------------------------------------------------------------------------------------
type: numeric (int)
range: [1,5] units: 1
unique values: 5 missing .: 5/74
tabulation: Freq. Value
2 1
8 2
30 3
18 4
11 5
5 .
---------------------------------------------------------------------------------------------------------------------------------------------------
headroom Headroom (in.)
---------------------------------------------------------------------------------------------------------------------------------------------------
type: numeric (float)
range: [1.5,5] units: .1
unique values: 8 missing .: 0/74
tabulation: Freq. Value
4 1.5
13 2
14 2.5
13 3
15 3.5
10 4
4 4.5
1 5
---------------------------------------------------------------------------------------------------------------------------------------------------
trunk Trunk space (cu. ft.)
---------------------------------------------------------------------------------------------------------------------------------------------------
type: numeric (int)
range: [5,23] units: 1
unique values: 18 missing .: 0/74
mean: 13.7568
std. dev: 4.2774
percentiles: 10% 25% 50% 75% 90%
8 10 14 17 20
---------------------------------------------------------------------------------------------------------------------------------------------------
weight Weight (lbs.)
---------------------------------------------------------------------------------------------------------------------------------------------------
type: numeric (int)
range: [1760,4840] units: 10
unique values: 64 missing .: 0/74
mean: 3019.46
std. dev: 777.194
percentiles: 10% 25% 50% 75% 90%
2020 2240 3190 3600 4060
---------------------------------------------------------------------------------------------------------------------------------------------------
length Length (in.)
---------------------------------------------------------------------------------------------------------------------------------------------------
type: numeric (int)
range: [142,233] units: 1
unique values: 47 missing .: 0/74
mean: 187.932
std. dev: 22.2663
percentiles: 10% 25% 50% 75% 90%
157 170 192.5 204 218
---------------------------------------------------------------------------------------------------------------------------------------------------
turn Turn Circle (ft.)
---------------------------------------------------------------------------------------------------------------------------------------------------
type: numeric (int)
range: [31,51] units: 1
unique values: 18 missing .: 0/74
mean: 39.6486
std. dev: 4.39935
percentiles: 10% 25% 50% 75% 90%
34 36 40 43 45
---------------------------------------------------------------------------------------------------------------------------------------------------
displacement Displacement (cu. in.)
---------------------------------------------------------------------------------------------------------------------------------------------------
type: numeric (int)
range: [79,425] units: 1
unique values: 31 missing .: 0/74
mean: 197.297
std. dev: 91.8372
percentiles: 10% 25% 50% 75% 90%
97 119 196 250 350
---------------------------------------------------------------------------------------------------------------------------------------------------
gear_ratio Gear Ratio
---------------------------------------------------------------------------------------------------------------------------------------------------
type: numeric (float)
range: [2.19,3.89] units: .01
unique values: 36 missing .: 0/74
mean: 3.01486
std. dev: .456287
percentiles: 10% 25% 50% 75% 90%
2.43 2.73 2.955 3.37 3.72
---------------------------------------------------------------------------------------------------------------------------------------------------
foreign Car type
---------------------------------------------------------------------------------------------------------------------------------------------------
type: numeric (byte)
label: origin
range: [0,1] units: 1
unique values: 2 missing .: 0/74
tabulation: Freq. Numeric Label
52 0 Domestic
22 1 Foreign
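A do-file along the following lines could produce the fancier codebook above. This is a sketch under assumptions: the section text is placed in markdoc's `/*** ... ***/` comment blocks, and the final `markdoc` call with `export(md)` is illustrative rather than the exact command used for this document. Other export formats may require additional software such as pandoc.

```stata
* Sketch of a markdoc-style do-file; the final markdoc call is an assumption.
capture log close
quietly log using "01_codebook_fancy.smcl", replace smcl   // markdoc reads SMCL logs

/***
Fancy introduction
==================

For instance we could write a fancy introduction here.

File structure
--------------

We can now describe the file structure.
***/

sysuse auto, clear
describe

/***
Summary statistics
------------------
***/

codebook

quietly log close
markdoc "01_codebook_fancy.smcl", export(md) replace   // assumed call: writes a Markdown file
```

The commands that generate the content (`describe`, `codebook`) are the same as in the simple example; only the surrounding text blocks and the final conversion step differ.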