Title: | An interactive interface to homogenize messy panel data into a long format |
---|---|
Description: | Panel data structure can sometimes change over the course of data collection, which can be a real nuisance when cleaning data. This package aims to document the state of all waves of data in a comprehensive manner and then homogenize the data into a long dataset ready for data transformations. |
Authors: | Patrick Anker [aut, cre] , Global TIES for Children [cph] (https://research.steinhardt.nyu.edu/ihdsc/global-ties) |
Maintainer: | Patrick Anker <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.0.5 |
Built: | 2024-11-11 05:15:30 UTC |
Source: | https://github.com/nyuglobalties/panelcleaner |
Add a variable mapping to a panel
add_mapping(panel, mapping, ...) ## Default S3 method: add_mapping(panel, mapping, ...) ## S3 method for class 'unhomogenized_panel' add_mapping(panel, mapping, ...)
add_mapping(panel, mapping, ...) ## Default S3 method: add_mapping(panel, mapping, ...) ## S3 method for class 'unhomogenized_panel' add_mapping(panel, mapping, ...)
panel |
A panel object |
mapping |
A |
... |
Parameters passed onto other methods |
The input panel with a new mapping
add_mapping(default)
: The default method
add_mapping(unhomogenized_panel)
: Method for unhomogenized panels
Adds a wave of data to a panel, assigned to wave wave
add_wave(panel, data, wave, ...)
add_wave(panel, data, wave, ...)
panel |
A panel object |
data |
A |
wave |
The wave tag associated with this dataset |
... |
Other arguments passed to methods |
Add multiple waves of data to a panel
add_waves(panel, data, ...)
add_waves(panel, data, ...)
panel |
A panel object |
data |
A list of |
... |
Other arguments passed to methods |
Modify a wave's dataset
amend_wave(x, wave_id, wave_db, ...)
amend_wave(x, wave_id, wave_db, ...)
x |
A panel, currently an unhomogenized panel |
wave_id |
The desired wave tag |
wave_db |
The amended dataset |
... |
Other parameters passed to methods |
This schema outlines the column names used by default during panel mapping
creation, when not overridden with the .schema
parameter in
panel_mapping()
.
default_panel_mapping_schema()
default_panel_mapping_schema()
Collect multiple waves of data into a panel
enpanel(name, ..., .id_col = "id", .waves_col = "wave")
enpanel(name, ..., .id_col = "id", .waves_col = "wave")
name |
The name of the panel data, e.g. an assessment name |
... |
The waves of the panel. If unnamed arguments are used, the parameter order will be used as the wave tags. If named parameters are used, the names of the parameters will be used as the wave tags. |
.id_col |
The name of the column that has the observed participant IDs. Multiple names can be used if multiple columns uniquely identify participants. |
.waves_col |
The column name for the wave tags once the panel is
homogenized and bound together into a single |
An unhomogenized_panel
object
Once all waves are collected into a single unhomogenized_panel
object, this
will homogenize variable names and, where applicable, categorical codings
according to a panel mapping.
homogenize_panel(panel, mapping = NULL, ...) bind_waves(panel, allow_issues = FALSE, ...)
homogenize_panel(panel, mapping = NULL, ...) bind_waves(panel, allow_issues = FALSE, ...)
panel |
An unhomogenized panel |
mapping |
A panel mapping. If NULL, a panel mapping must be attached
to the |
... |
Parameters to be used for context, usually for defining a panel schema |
allow_issues |
If |
An unhomogenized_panel
that is ready to be homogenized using
bind_waves()
bind_waves()
: Bind waves into a homogenized panel after
successful homogenization
The first part of the homogenization process is to harmonize wave variable names to the homogenized name. If either there are missing wave variable names and a provided homogenized name or provided variable names and a missing homogenized name, an error will be thrown. The original version of panelcleaner included a notion of "issues" that would allow harmonization with errors, but after continued practice of using panelcleaner, this behavior is deprecated in favor of halting the harmonization process altogether.
The next step is to harmonize the codings for categorical data. As panelcleaner was
intended to be used in a data processing pipeline before analysis was conducted in
Stata, the desired behavior of panelcleaner is to separate values from labels, unlike
R's factor
class. Codings are written using rcoder::coding()
. The harmonization process
is similar to names: errors will be thrown if wave codings and homogenized codings
aren't both present or missing, and the codings in all waves will be recoded to the
homogenized coding.
The last step is to harmonize variable descriptions. This part is optional. It will
only happen if the homogenized_description
(or custom name specified with a custom
panelcleaner schema) is present. The same types will occur for descriptions. The only
thing different about harmonizing descriptions is that it doesn't affect the data:
it operates by assigning the bpr.description
attribute for variables. This feature
is really only useful if you intend you data to be used in a
blueprintr project.
In some cases the default behavior of panelcleaner is too restrictive, especially during the beginning of data collection. Often, APIs or general data exports don't include variables that don't have any submissions yet, but you still want to keep those variables in your input data. These parameters lift some restrictions on panelcleaner's behavior:
drop_na_homogenized
: If TRUE
, any NA entries in the homogenized_name column will be
ignored, as if the row in the panel mapping doesn't exist.
ignored_missing_codings
: If TRUE
, waves with NA codings but with non-NA homogenized
codings will not have their values homogenized.
ignored_missing_homogenized_codings
: If TRUE
, any variables that have defined wave
codings but no homogenized coding will not have their codings homogenized.
error_missing_raw_variables
: If FALSE
, raw variables that should be present in the
data, given the panel mapping, but aren't will not throw an error. Instead, they'll be
added to the list of issues.
replace_missing_with_na
: If TRUE
, raw_variables that should be present in the data,
given the panel mapping, but are not will be created and filled with NA values. A message
will be displayed of all the variables where this action was applied. This value supersedes
error_missing_raw_variables
.
Some issues may be uncovered over the course of homogenization. Rather than halting execution immediately upon encountering these problems, these issues are stored within the panel object. Use this function to view the issues with a panel.
issues(x)
issues(x)
x |
A panel object |
Create a panel mapping
panel_mapping(df, waves, .schema = list())
panel_mapping(df, waves, .schema = list())
df |
A |
waves |
A vector of all known waves recorded in the mapping file |
.schema |
A list of column names that outline the structure of the
mapping file. |
Gets the data.frame
assigned to wave_id
from a panel.
wave(x, wave_id, ...)
wave(x, wave_id, ...)
x |
A panel, currently an unhomogenized panel |
wave_id |
The wave tag to which the dataset is assigned |
... |
Other parameters passed to other methods |