
Clean Data and Format for Statistical Analysis
- Individual-level data will be presented in a horizontal format, with one
row per subject (or per subject per timepoint, visit, etc., if applicable) and
one study parameter per column. The first row may contain the column names.
- Aggregate data will be presented in a horizontal format, with one row per
table cell and one study parameter per column.
- Definitions of all text and numeric codes (i.e., what each coded value
means) will be provided to the BAC.
- Each subject in the dataset will have a unique identifier (not a name), or
a combination of variables that together uniquely identifies each
subject.
- There will be no intervening rows of text in the dataset, and no blank
lines or spaces between rows.
- Only text and numeric fields are permitted. Special characters such as
"+" or "*" are not acceptable.
- Data validation and plausibility checks are performed and documented. For
example, there should be no pregnant men, no systolic blood pressure over
250 mmHg, and age and date of birth should be consistent.
- Any derived variables are documented; e.g., verify that Body Mass Index is
calculated as weight (kg) / height (m)².
- Missing data are coded consistently for each variable; e.g., do not use a
"." for some observations and a "99" for other observations of the same
variable to indicate a missing value. More generally, make sure that a true
zero can be distinguished from a missing value, and that the missing-value
code is not itself a valid value; e.g., 99 is not a good missing indicator
for systolic or diastolic blood pressure.
- For ASCII (text) files, a complete record layout (the name, position,
width, and type of every field) is provided to the BAC.
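The layout rules above (a header row, one record per row, no blank lines, no stray characters in numeric fields) can be checked mechanically before the data are sent. The following is a minimal Python sketch, assuming a comma-separated file whose first row holds the column names; the function name check_layout and the list of numeric columns are hypothetical.

```python
import csv
import re

# Plain integer or decimal, optionally negative (assumption: no exponents or thousands separators)
NUMERIC = re.compile(r"-?\d+(\.\d+)?")


def check_layout(text, numeric_columns):
    """Flag blank lines, rows with the wrong number of fields, and
    non-numeric characters (such as "+" or "*") in numeric columns."""
    problems = []
    lines = text.splitlines()
    header = next(csv.reader([lines[0]]))  # assumes the first row holds column names
    numeric_idx = [header.index(c) for c in numeric_columns]
    for n, line in enumerate(lines[1:], start=2):  # n = 1-based line number in the file
        if not line.strip():
            problems.append(f"line {n}: blank line")
            continue
        fields = next(csv.reader([line]))
        if len(fields) != len(header):
            problems.append(f"line {n}: expected {len(header)} fields, found {len(fields)}")
            continue
        for j in numeric_idx:
            if not NUMERIC.fullmatch(fields[j]):
                problems.append(
                    f"line {n}: column {header[j]!r} has non-numeric value {fields[j]!r}"
                )
    return problems


sample = "id,sbp\n001,120\n\n002,135+\n003\n"
for problem in check_layout(sample, ["sbp"]):
    print(problem)
```

Note that a text missing code such as "." in a numeric column is also reported here, which is usually what you want when the variable's declared missing code is numeric.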
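Plausibility checks like the ones listed above can be scripted so that they are both performed and documented. This sketch assumes hypothetical field names and codings (sex as "M"/"F", pregnant as 1/0, systolic blood pressure in mmHg) and treats subject ID plus visit date as the unique key; the actual rules and codes should come from the study's own documentation.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Record:
    subject_id: str
    visit: date       # visit date; (subject_id, visit) is the assumed unique key
    sex: str          # hypothetical coding: "M" or "F"
    pregnant: int     # hypothetical coding: 1 = yes, 0 = no
    sbp: float        # systolic blood pressure, mmHg
    age: int          # recorded age in completed years at the visit
    dob: date         # date of birth


def age_at(dob: date, on: date) -> int:
    """Age in completed years on a given date."""
    years = on.year - dob.year
    if (on.month, on.day) < (dob.month, dob.day):
        years -= 1  # birthday not yet reached that year
    return years


def check_records(records):
    """Return (subject_id, problem) pairs; an empty list means every check passed."""
    problems = []
    seen = set()
    for r in records:
        key = (r.subject_id, r.visit)
        if key in seen:
            problems.append((r.subject_id, "duplicate subject/visit"))
        seen.add(key)
        if r.sex == "M" and r.pregnant == 1:
            problems.append((r.subject_id, "pregnant male"))
        if r.sbp > 250:
            problems.append((r.subject_id, "systolic BP over 250 mmHg"))
        if age_at(r.dob, r.visit) != r.age:
            problems.append((r.subject_id, "age inconsistent with date of birth"))
    return problems


records = [
    Record("001", date(2024, 6, 1), "F", 1, 118, 34, date(1990, 5, 1)),
    Record("002", date(2024, 6, 1), "M", 1, 262, 40, date(1985, 12, 31)),
]
print(check_records(records))
```

Keeping the checks in a script like this also serves as the documentation the guideline asks for, since the exact rules applied are written down and can be rerun.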
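A derived variable can be verified by recomputing it from its source variables. Here is a minimal sketch for Body Mass Index, assuming weight is recorded in kilograms and height in meters; the tolerance of 0.05 is an arbitrary choice meant to absorb rounding in the recorded value, and verify_bmi is a hypothetical helper name.

```python
import math


def bmi(weight_kg: float, height_m: float) -> float:
    """Body Mass Index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2


def verify_bmi(rows, tol=0.05):
    """Return indices of (weight_kg, height_m, recorded_bmi) rows whose
    recorded BMI differs from the recomputed value by more than tol."""
    bad = []
    for i, (weight_kg, height_m, recorded) in enumerate(rows):
        if not math.isclose(bmi(weight_kg, height_m), recorded, abs_tol=tol):
            bad.append(i)
    return bad


rows = [(70, 1.75, 22.86), (70, 1.75, 40.0)]
print(verify_bmi(rows))
```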
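Consistent missing-value coding can also be checked one variable at a time. The sketch below assumes each variable has a single declared missing code and a plausible range (the 60 to 250 mmHg range for systolic blood pressure in the example is illustrative only). It flags a missing code that falls inside the valid range, any value that is neither the declared code nor a number (e.g., a stray "."), and numeric values outside the range; check_missing_coding is a hypothetical name.

```python
def check_missing_coding(values, missing_code, valid_min, valid_max):
    """Check one variable's raw string values against its declared missing code."""
    problems = []
    # The missing code must not be a value the variable could legitimately take.
    try:
        if valid_min <= float(missing_code) <= valid_max:
            problems.append(f"missing code {missing_code!r} is within the valid range")
    except ValueError:
        pass  # a non-numeric code cannot collide with a numeric value
    for i, raw in enumerate(values):
        v = raw.strip()
        if v == missing_code:
            continue  # declared missing value
        try:
            x = float(v)
        except ValueError:
            problems.append(f"row {i}: unrecognized value {v!r} (inconsistent missing code?)")
            continue
        if not valid_min <= x <= valid_max:
            problems.append(f"row {i}: value {v} outside {valid_min}-{valid_max}")
    return problems


# 99 is a plausible systolic reading, and "." is a second, inconsistent missing code
print(check_missing_coding(["120", ".", "99", "135"], "99", 60, 250))
```

The same check also catches the true-zero problem: for a count variable with a valid range starting at 0, declaring "0" as the missing code is flagged because it collides with a real value.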