Modified

September 13, 2024

Background

We use the datadictionary package to generate the data dictionary.

This is not a perfect solution: among other problems, the package throws many warnings. We will use it for the time being.
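Until the warnings are addressed, one option is to collect them for later review instead of letting them flood the rendered output. The following is a minimal base R sketch of that idea. The helper `collect_warnings()` and the toy function `noisy()` are hypothetical names introduced for illustration; `noisy()` merely stands in for a call such as `datadictionary::create_dictionary(scr_df)`.

```r
# Evaluate an expression, collecting warning messages instead of printing them.
# `collect_warnings` is a hypothetical helper, not part of any package.
collect_warnings <- function(expr) {
  msgs <- character(0)
  value <- withCallingHandlers(
    expr,
    warning = function(w) {
      # Record the message, then resume evaluation without emitting the warning
      msgs <<- c(msgs, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  list(value = value, warnings = msgs)
}

# Toy stand-in for a noisy call (assumption: the real call would be
# datadictionary::create_dictionary(scr_df))
noisy <- function(x) {
  warning("first warning")
  warning("second warning")
  x * 2
}

res <- collect_warnings(noisy(21))
res$value             # the result of the wrapped call
length(res$warnings)  # how many warnings were captured
```

Keeping the messages in `res$warnings` makes it possible to inspect or log them, which a blanket `suppressWarnings()` would not allow.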

Note

This data dictionary workflow does not yet include the questions that were asked. Adding them, along with more descriptive information, is a high priority.
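One possible way to add the question text, sketched here under stated assumptions, is to keep a lookup table keyed by variable name and match it against the dictionary. Everything in this example is illustrative: `dd` is a hand-built stand-in for a few rows of the dictionary, `questions` is a hypothetical lookup table, and the question strings are placeholders rather than the actual survey wording. The sketch also assumes that continuation rows carry an empty `item`.

```r
# Hand-built stand-in for a few dictionary rows (assumption: continuation
# rows have an empty `item`, as the rendered table suggests).
dd <- data.frame(
  item    = c("child_sex", "", "mom_education", ""),
  summary = c("unique responses", "missing", "unique responses", "missing"),
  value   = c("3", "80", "15", "98"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table: variable name -> question text (placeholders only)
questions <- data.frame(
  item     = c("child_sex", "mom_education"),
  question = c("<question text for child_sex>",
               "<question text for mom_education>"),
  stringsAsFactors = FALSE
)

# Attach the question to the first row of each variable; match() preserves
# the original row order, unlike merge().
dd$question <- questions$question[match(dd$item, questions$item)]
dd$question[is.na(dd$question)] <- ""

dd
```

Using `match()` rather than `merge()` keeps the rows in their original order, which matters because each variable's summary rows must stay grouped together.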

Load data and generate dictionary

Code
# Read the aggregated screening data (paths resolve from the project root).
scr_df <- readr::read_csv(
  here::here("data/csv/screening/agg/PLAY-screening-datab-latest.csv"),
  show_col_types = FALSE
)

# Generate the data dictionary and save it as CSV.
scr_dd <- datadictionary::create_dictionary(scr_df)
readr::write_csv(
  scr_dd,
  here::here("data/csv/screening/dd/PLAY-screening-data-dictionary.csv")
)

Here is the dictionary that the package generates:

Code
scr_dd |>
  kableExtra::kable() |>
  kableExtra::kable_classic()

| item | label | class | summary | value |
|---|---|---|---|---|
| | | | Rows in dataset | 909 |
| | | | Columns in dataset | 68 |
| session_id | No label | numeric | mean | 66685 |
| | | | median | 67894 |
| | | | min | 38196 |
| | | | max | 75258 |
| | | | missing | 0 |
| session_name | No label | character | unique responses | 874 |
| | | | missing | 0 |
| session_date | No label | Date | mean | 2023-07-30 |
| | | | mode | 2023-08-13 |
| | | | min | 2019-06-09 |
| | | | max | 2024-12-06 |
| | | | missing | 0 |
| session_release | No label | character | unique responses | 3 |
| | | | missing | 0 |
| participant_ID | No label | character | unique responses | 94 |
| | | | missing | 0 |
| participant_birthdate | No label | Date | mean | 2022-02-03 |
| | | | mode | 2021-08-18 |
| | | | min | 2017-06-08 |
| | | | max | 2024-01-03 |
| | | | missing | 0 |
| participant_gender | No label | character | unique responses | 2 |
| | | | missing | 0 |
| participant_race | No label | character | unique responses | 11 |
| | | | missing | 0 |
| participant_ethnicity | No label | character | unique responses | 3 |
| | | | missing | 0 |
| participant_language | No label | character | unique responses | 13 |
| | | | missing | 0 |
| exclusion_reason | No label | character | unique responses | 15 |
| | | | missing | 845 |
| group_name | No label | character | unique responses | 2 |
| | | | missing | 0 |
| context_setting | No label | character | unique responses | 2 |
| | | | missing | 8 |
| context_country | No label | character | unique responses | 2 |
| | | | missing | 49 |
| context_state | No label | character | unique responses | 19 |
| | | | missing | 38 |
| vol_id | No label | numeric | mean | 1298 |
| | | | median | 1376 |
| | | | min | 899 |
| | | | max | 1705 |
| | | | missing | 0 |
| participant_disability | No label | character | unique responses | 5 |
| | | | missing | 747 |
| pilot_pilot | No label | logical | missing | 909 |
| submit_date | No label | POSIXct POSIXt | mean | 2023-06-09 |
| | | | mode | 2024-02-09 2024-04-19 |
| | | | min | 2019-10-08 |
| | | | max | 2024-09-08 |
| | | | missing | 78 |
| site_id | No label | character | unique responses | 31 |
| | | | missing | 78 |
| play_id | No label | character | unique responses | 692 |
| | | | missing | 200 |
| child_age_mos | No label | numeric | mean | 18 |
| | | | median | 18 |
| | | | min | 9.24 |
| | | | max | 42.6 |
| | | | missing | 80 |
| child_sex | No label | character | unique responses | 3 |
| | | | missing | 80 |
| child_bornonduedate | No label | character | unique responses | 3 |
| | | | missing | 86 |
| child_onterm | No label | character | unique responses | 3 |
| | | | missing | 194 |
| child_birthage | No label | numeric | mean | 5 |
| | | | median | 3 |
| | | | min | -20 |
| | | | max | 367 |
| | | | missing | 107 |
| child_weight_pounds | No label | numeric | mean | 7 |
| | | | median | 7 |
| | | | min | 4 |
| | | | max | 100 |
| | | | missing | 90 |
| child_weight_ounces | No label | numeric | mean | 7 |
| | | | median | 7 |
| | | | min | 0 |
| | | | max | 143 |
| | | | missing | 102 |
| child_birth_complications | No label | character | unique responses | 3 |
| | | | missing | 90 |
| child_birth_complications_specify | No label | character | unique responses | 67 |
| | | | missing | 843 |
| child_hearing_disabilities | No label | character | unique responses | 3 |
| | | | missing | 90 |
| child_hearing_disabilities_specify | No label | character | unique responses | 2 |
| | | | missing | 908 |
| child_vision_disabilities | No label | character | unique responses | 3 |
| | | | missing | 90 |
| child_vision_disabilities_specify | No label | character | unique responses | 6 |
| | | | missing | 904 |
| child_major_illnesses_injuries | No label | character | unique responses | 3 |
| | | | missing | 90 |
| child_illnesses_injuries_specify | No label | character | unique responses | 30 |
| | | | missing | 876 |
| child_developmentaldelays | No label | character | unique responses | 3 |
| | | | missing | 202 |
| child_developmentaldelays_specify | No label | character | unique responses | 9 |
| | | | missing | 901 |
| child_sleep_time | No label | character | unique responses | 99 |
| | | | missing | 91 |
| child_wake_time | No label | character | unique responses | 114 |
| | | | missing | 92 |
| child_nap_hours | No label | character | unique responses | 39 |
| | | | missing | 92 |
| child_sleep_location | No label | character | unique responses | 6 |
| | | | missing | 92 |
| mom_bio | No label | character | unique responses | 4 |
| | | | missing | 104 |
| mom_childbirth_age | No label | numeric | mean | 33 |
| | | | median | 33 |
| | | | min | 20.22 |
| | | | max | 121.22 |
| | | | missing | 107 |
| mom_race | No label | character | unique responses | 9 |
| | | | missing | 97 |
| mom_birth_country | No label | character | unique responses | 5 |
| | | | missing | 97 |
| mom_birth_country_specify | No label | character | unique responses | 39 |
| | | | missing | 821 |
| mom_education | No label | character | unique responses | 15 |
| | | | missing | 98 |
| mom_employment | No label | character | unique responses | 4 |
| | | | missing | 98 |
| mom_occupation | No label | character | unique responses | 486 |
| | | | missing | 294 |
| mom_jobs_number | No label | character | unique responses | 7 |
| | | | missing | 292 |
| mom_training | No label | character | unique responses | 3 |
| | | | missing | 100 |
| biodad_childbirth_age | No label | character | unique responses | 639 |
| | | | missing | 78 |
| biodad_race | No label | character | unique responses | 13 |
| | | | missing | 78 |
| language_spoken_mom | No label | character | unique responses | 6 |
| | | | missing | 86 |
| language_spoken_mom_comments | No label | character | unique responses | 17 |
| | | | missing | 893 |
| language_spoken_child | No label | character | unique responses | 6 |
| | | | missing | 80 |
| language_spoken_home_comments | No label | character | unique responses | 832 |
| | | | missing | 78 |
| language_spoken_child_comments | No label | character | unique responses | 14 |
| | | | missing | 896 |
| language_spoken_home | No label | character | unique responses | 7 |
| | | | missing | 85 |
| language_spoken_house_other | No label | character | unique responses | 15 |
| | | | missing | 895 |
| language_spoken_home_other | No label | character | unique responses | 2 |
| | | | missing | 908 |
| childcare_types | No label | character | unique responses | 14 |
| | | | missing | 285 |
| childcare_location | No label | character | unique responses | 26 |
| | | | missing | 884 |
| childcare_hours | No label | character | unique responses | 71 |
| | | | missing | 476 |
| childcare_number | No label | numeric | mean | 6 |
| | | | median | 6 |
| | | | min | 0 |
| | | | max | 25 |
| | | | missing | 479 |
| childcare_age | No label | numeric | mean | 7 |
| | | | median | 5 |
| | | | min | 0 |
| | | | max | 45 |
| | | | missing | 589 |
| childcare_language | No label | character | unique responses | 38 |
| | | | missing | 475 |

Extracting questions

An alternative approach to generating the data dictionary starts with the questionnaire files themselves.