Data dictionary

Modified

September 13, 2024

Background

We use the datadictionary package to generate the data dictionary.

This is not a perfect solution. Among other challenges, the package throws many warnings, but we will use it for the time being.
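Since the package is noisy, one option is to capture its warnings rather than let them flood the rendered output. The following is a minimal base R sketch, not part of the original workflow; `collect_warnings()` and `noisy()` are hypothetical names introduced here for illustration, and `noisy()` is only a stand-in for a call that emits warnings.

```r
# Evaluate an expression, muffle any warnings it raises,
# and return the result together with the warning messages.
# NOTE: collect_warnings() is a hypothetical helper, not part of datadictionary.
collect_warnings <- function(expr) {
  msgs <- character()
  value <- withCallingHandlers(
    expr,
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))  # record the message
      invokeRestart("muffleWarning")         # keep going without printing it
    }
  )
  list(value = value, warnings = msgs)
}

# Toy stand-in for a noisy call.
noisy <- function(x) {
  warning("first issue")
  warning("second issue")
  x * 2
}

res <- collect_warnings(noisy(21))
res$value      # the computed result
res$warnings   # the captured warning messages
```

In this workflow the same wrapper could presumably be applied as `collect_warnings(datadictionary::create_dictionary(scr_df))`, so the warnings can be reviewed separately instead of being silently discarded.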

Note

This data dictionary workflow does not yet include the questions that were asked. Adding the question text, along with more descriptive information, is a high priority.
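As a rough sketch of how question text might eventually be attached, the example below matches a lookup table against the dictionary's `item` column. Everything here is an assumption for illustration: the `questions` lookup table, its placeholder wording, and the `add_questions()` helper are hypothetical, and the toy dictionary assumes that continuation rows carry an empty `item`.

```r
# Toy dictionary mimicking the item/summary layout of a generated dictionary.
# ASSUMPTION: continuation rows have an empty string in `item`.
dd <- data.frame(
  item = c("Rows in dataset", "child_sex", "", "child_age_mos"),
  summary = c("", "unique responses", "missing", "mean"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table; the question wording is a placeholder.
questions <- data.frame(
  item = c("child_sex", "child_age_mos"),
  question = c("<question text for child_sex>",
               "<question text for child_age_mos>"),
  stringsAsFactors = FALSE
)

# Add a `question` column by matching on `item`.
# match() preserves the dictionary's row order; unmatched rows get NA.
add_questions <- function(dd, questions) {
  dd$question <- questions$question[match(dd$item, questions$item)]
  dd
}

dd_q <- add_questions(dd, questions)
dd_q
```

Using `match()` rather than `merge()` keeps the dictionary rows in their original order, which matters because the summary rows for each variable are only meaningful in sequence.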

Load data and generate dictionary

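# Read the aggregated screening data.
scr_df <- readr::read_csv(
  here::here("data", "csv", "screening", "agg", "PLAY-screening-datab-latest.csv"),
  show_col_types = FALSE
)

# Summarize every column into a data dictionary.
scr_dd <- datadictionary::create_dictionary(scr_df)

# Save the dictionary alongside the screening data.
readr::write_csv(
  scr_dd,
  here::here("data", "csv", "screening", "dd", "PLAY-screening-data-dictionary.csv")
)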

Here is the dictionary the package generates:

scr_dd |>
  kableExtra::kable() |>
  kableExtra::kable_classic()

| item | label | class | summary | value |
|---|---|---|---|---|
| Rows in dataset | | | | 855 |
| Columns in dataset | | | | 68 |
| session_id | No label | numeric | mean | 66176 |
| | | | median | 67167 |
| | | | min | 38196 |
| | | | max | 74521 |
| | | | missing | 0 |
| session_name | No label | character | unique responses | 820 |
| | | | missing | 0 |
| session_date | No label | Date | mean | 2023-07-01 |
| | | | mode | 2023-08-13 |
| | | | min | 2019-06-09 |
| | | | max | 2024-10-12 |
| | | | missing | 0 |
| session_release | No label | character | unique responses | 3 |
| | | | missing | 0 |
| participant_ID | No label | character | unique responses | 93 |
| | | | missing | 0 |
| participant_birthdate | No label | Date | mean | 2022-01-04 |
| | | | mode | 2021-08-18 |
| | | | min | 2017-06-08 |
| | | | max | 2023-10-27 |
| | | | missing | 0 |
| participant_gender | No label | character | unique responses | 2 |
| | | | missing | 0 |
| participant_race | No label | character | unique responses | 10 |
| | | | missing | 0 |
| participant_ethnicity | No label | character | unique responses | 3 |
| | | | missing | 0 |
| participant_language | No label | character | unique responses | 13 |
| | | | missing | 0 |
| exclusion_reason | No label | character | unique responses | 15 |
| | | | missing | 794 |
| group_name | No label | character | unique responses | 2 |
| | | | missing | 0 |
| context_setting | No label | character | unique responses | 2 |
| | | | missing | 9 |
| context_country | No label | character | unique responses | 2 |
| | | | missing | 50 |
| context_state | No label | character | unique responses | 19 |
| | | | missing | 34 |
| vol_id | No label | numeric | mean | 1288 |
| | | | median | 1370 |
| | | | min | 899 |
| | | | max | 1705 |
| | | | missing | 0 |
| participant_disability | No label | character | unique responses | 5 |
| | | | missing | 697 |
| pilot_pilot | No label | logical | missing | 855 |
| submit_date | No label | POSIXct POSIXt | mean | 2023-06-09 |
| | | | mode | 2024-02-09 2024-04-19 |
| | | | min | 2019-10-08 |
| | | | max | 2024-09-08 |
| | | | missing | 25 |
| site_id | No label | character | unique responses | 31 |
| | | | missing | 25 |
| play_id | No label | character | unique responses | 691 |
| | | | missing | 147 |
| child_age_mos | No label | numeric | mean | 18 |
| | | | median | 18 |
| | | | min | 9.24 |
| | | | max | 42.6 |
| | | | missing | 27 |
| child_sex | No label | character | unique responses | 3 |
| | | | missing | 27 |
| child_bornonduedate | No label | character | unique responses | 3 |
| | | | missing | 33 |
| child_onterm | No label | character | unique responses | 3 |
| | | | missing | 141 |
| child_birthage | No label | numeric | mean | 5 |
| | | | median | 3 |
| | | | min | -20 |
| | | | max | 367 |
| | | | missing | 54 |
| child_weight_pounds | No label | numeric | mean | 7 |
| | | | median | 7 |
| | | | min | 4 |
| | | | max | 100 |
| | | | missing | 37 |
| child_weight_ounces | No label | numeric | mean | 7 |
| | | | median | 7 |
| | | | min | 0 |
| | | | max | 143 |
| | | | missing | 49 |
| child_birth_complications | No label | character | unique responses | 3 |
| | | | missing | 37 |
| child_birth_complications_specify | No label | character | unique responses | 67 |
| | | | missing | 789 |
| child_hearing_disabilities | No label | character | unique responses | 3 |
| | | | missing | 37 |
| child_hearing_disabilities_specify | No label | character | unique responses | 2 |
| | | | missing | 854 |
| child_vision_disabilities | No label | character | unique responses | 3 |
| | | | missing | 37 |
| child_vision_disabilities_specify | No label | character | unique responses | 6 |
| | | | missing | 850 |
| child_major_illnesses_injuries | No label | character | unique responses | 3 |
| | | | missing | 37 |
| child_illnesses_injuries_specify | No label | character | unique responses | 30 |
| | | | missing | 822 |
| child_developmentaldelays | No label | character | unique responses | 3 |
| | | | missing | 149 |
| child_developmentaldelays_specify | No label | character | unique responses | 9 |
| | | | missing | 847 |
| child_sleep_time | No label | character | unique responses | 98 |
| | | | missing | 38 |
| child_wake_time | No label | character | unique responses | 114 |
| | | | missing | 39 |
| child_nap_hours | No label | character | unique responses | 39 |
| | | | missing | 39 |
| child_sleep_location | No label | character | unique responses | 6 |
| | | | missing | 39 |
| mom_bio | No label | character | unique responses | 4 |
| | | | missing | 51 |
| mom_childbirth_age | No label | numeric | mean | 33 |
| | | | median | 33 |
| | | | min | 20.22 |
| | | | max | 121.22 |
| | | | missing | 54 |
| mom_race | No label | character | unique responses | 9 |
| | | | missing | 44 |
| mom_birth_country | No label | character | unique responses | 5 |
| | | | missing | 44 |
| mom_birth_country_specify | No label | character | unique responses | 39 |
| | | | missing | 767 |
| mom_education | No label | character | unique responses | 15 |
| | | | missing | 45 |
| mom_employment | No label | character | unique responses | 4 |
| | | | missing | 45 |
| mom_occupation | No label | character | unique responses | 485 |
| | | | missing | 241 |
| mom_jobs_number | No label | character | unique responses | 7 |
| | | | missing | 239 |
| mom_training | No label | character | unique responses | 3 |
| | | | missing | 47 |
| biodad_childbirth_age | No label | character | unique responses | 638 |
| | | | missing | 25 |
| biodad_race | No label | character | unique responses | 13 |
| | | | missing | 25 |
| language_spoken_mom | No label | character | unique responses | 6 |
| | | | missing | 33 |
| language_spoken_mom_comments | No label | character | unique responses | 17 |
| | | | missing | 839 |
| language_spoken_child | No label | character | unique responses | 6 |
| | | | missing | 27 |
| language_spoken_home_comments | No label | character | unique responses | 831 |
| | | | missing | 25 |
| language_spoken_child_comments | No label | character | unique responses | 14 |
| | | | missing | 842 |
| language_spoken_home | No label | character | unique responses | 7 |
| | | | missing | 32 |
| language_spoken_house_other | No label | character | unique responses | 15 |
| | | | missing | 841 |
| language_spoken_home_other | No label | character | unique responses | 2 |
| | | | missing | 854 |
| childcare_types | No label | character | unique responses | 14 |
| | | | missing | 232 |
| childcare_location | No label | character | unique responses | 26 |
| | | | missing | 830 |
| childcare_hours | No label | character | unique responses | 71 |
| | | | missing | 423 |
| childcare_number | No label | numeric | mean | 6 |
| | | | median | 6 |
| | | | min | 0 |
| | | | max | 25 |
| | | | missing | 426 |
| childcare_age | No label | numeric | mean | 7 |
| | | | median | 5 |
| | | | min | 0 |
| | | | max | 45 |
| | | | missing | 536 |
| childcare_language | No label | character | unique responses | 37 |
| | | | missing | 422 |

Extracting questions

An alternative approach to generating the data dictionary starts with the questionnaire files themselves.