Title: | Lightweight Data Structure for Recoding Categorical Data without Factors |
---|---|
Description: | A data structure and toolkit for documenting and recoding categorical data that can be shared in other statistical software. |
Authors: | Patrick Anker [aut, cre] |
Maintainer: | Patrick Anker <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.3.0 |
Built: | 2025-02-17 03:29:03 UTC |
Source: | https://github.com/nyuglobalties/rcoder |
Evaluate a collection of codings from a character vector
as_coding_list(x)
x | A character vector |
A list of codings
char_vec <- c("coding(code('Yes', 1), code('No', 0))", "")
as_coding_list(char_vec)
Stores a coding at the "rcoder.coding" attribute of a vector
assign_coding(vec, .coding, .bpr = TRUE)
vec | A vector |
.coding | A 'coding' object |
.bpr | Also overwrite the "bpr.coding" attribute with the character representation of '.coding'. Used for interop with blueprintr variable decorations. |
The vector with its "rcoder.coding" attribute set to '.coding'
See also: recode_vec()
cdng <- coding(code("Yes", 3), code("Maybe", 2), code("No", 1))
vec <- sample(1:3, 50, replace = TRUE)
assign_coding(vec, cdng)
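To see what assign_coding() leaves on the vector, the attributes can be read back with base R's attr(). This is a minimal sketch that assumes the 'rcoder' package is installed; the fixed input vector is an illustrative choice, not part of the package documentation.

```r
library(rcoder)

cdng <- coding(code("Yes", 3), code("Maybe", 2), code("No", 1))

# A fixed vector (illustrative values) instead of a random sample
vec <- assign_coding(c(3, 1, 2, 3), cdng)

# The coding object is stored on the vector itself
stored <- attr(vec, "rcoder.coding")

# With the default `.bpr = TRUE`, a character representation is stored too
bpr <- attr(vec, "bpr.coding")

# With `.bpr = FALSE`, this call does not write the "bpr.coding" attribute
vec_plain <- assign_coding(c(3, 1, 2, 3), cdng, .bpr = FALSE)
attr(vec_plain, "bpr.coding")
```

Because the coding travels as an attribute, operations that drop attributes (for example as.vector()) will also drop the coding.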
The most fundamental components of a 'code' object are the 'label' and 'value' elements. A 'code' object is essentially a key-value tuple that has some extra metadata.
code(label, value, description = label, links_from = label, missing = FALSE, ...)
label | A short label for a value in a vector |
value | The raw value found in the respective vector |
description | A longer-form label for the value, if extra context for that label is needed |
links_from | A reference to another 'code' in another 'coding' object for recoding purposes. |
missing | Whether this 'code' represents a missing response |
... | Any extra metadata |
A 'code' object that contains the key-value map of label to value
code("Yes", 1)
code("No", 0)
code(
  "No response", -88,
  description = "Participant ignored question when prompted",
  missing = TRUE
)
code("Missing", NA, links_from = c("Refused", "Absent"))
A 'coding' object holds a list of 'code's that map vector values to human-readable labels. An abstraction of factors, this data structure is designed to be portable and not directly attached to the underlying data. Moreover, 'coding' objects can be "linked" for recoding and data lineage purposes. An "empty coding" is used to represent data that has no categorical interpretation.
coding(..., .label = NULL)
empty_coding()
... | A collection of 'code' objects |
.label | A label for this coding, available for interoperability |
A 'coding' object that contains each 'code' input
coding(code("Yes", 1), code("No", 0), code("Not applicable", NA))
empty_coding()
Converts a 'coding' object into a named vector to be used in the 'labels' parameter for 'haven::labelled()'.
coding_to_haven_labels(coding)
coding | A coding object |
A named vector representation of the coding
cdng <- coding(code("Yes", 1), code("No", 0))
coding_to_haven_labels(cdng)
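The named vector is meant to be handed to the 'labels' parameter of 'haven::labelled()'. A minimal sketch of that hand-off follows; it assumes both 'rcoder' and 'haven' are installed, and the response vector is made up for illustration.

```r
library(rcoder)

cdng <- coding(code("Yes", 1), code("No", 0))
labels <- coding_to_haven_labels(cdng)

# Illustrative raw responses that use the coding's values
responses <- c(1, 0, 0, 1)

# haven::labelled() attaches the value labels to the raw vector
labelled_responses <- haven::labelled(responses, labels = labels)
labelled_responses
```

The raw vector should have the same type as the coding's values (numeric here), since 'haven::labelled()' expects the data and its labels to be compatible.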
ODK XLSForms link the categorical codings to a variable type name in the 'survey' sheet. The codings are specified in the 'choices' sheet which has a 'list_name' column that holds the variable type names. Each row that has that name will be associated with that categorical type coding. This function converts 'coding' objects into tables that can be inserted into that 'choices' sheet. The categorical type is specified with the coding '.label'.
coding_to_odk(coding)
coding | A coding object |
A data.frame or tibble that can be included in an XLSForm 'choices' sheet
See also: odk_to_coding()
cdng <- coding(code("Yes", 1), code("No", 0), .label = "yesno")
coding_to_odk(cdng)
So that the 'rcoder' package does not need to be attached, this function takes an unevaluated expression, assumed to be a 'coding()' call, and evaluates it with _only_ 'coding' and 'code' available, to guard against rogue code.
eval_coding(expr)
expr | An expression |
An evaluated 'coding' object
eval_coding('coding(code("Yes", 1), code("No", 0))')
Is an object the empty coding?
is_empty_coding(x)
x | An object |
TRUE if the object is identical to 'empty_coding()', FALSE otherwise
is_empty_coding(empty_coding())
is_empty_coding(coding())
is_empty_coding(coding(code("Yes", 1), code("No", 0)))
Coding objects can be linked together to create mappings from one or more codings to another. This creates a 'data.frame' that outlines how the codings are linked, to be used in 'make_recode_query()'.
link_codings(to, ..., .to_suffix = "to", .drop_unused = FALSE)
to | A coding to be linked to |
... | Codings to be linked from |
.to_suffix | A suffix signifying which columns in the output 'data.frame' came from 'to' |
.drop_unused | Logical flag to drop any codes in '...' that have no counterparts in 'to' |
A 'linked_coding_df' with all necessary information for a recoding query
wave1 <- coding(
  code("Yes", 1),
  code("No", 2),
  code("Refused", -88, missing = TRUE)
)
wave2 <- coding(
  code("Yes", "y"),
  code("No", "n"),
  code("Missing", ".", missing = TRUE)
)
link_codings(
  to = coding(
    code("Yes", 1),
    code("No", 0),
    code("Missing", NA, links_from = c("Refused", "Missing"))
  ),
  wave1,
  wave2
)
This creates a function that accepts a vector and recodes it from the information provided in a 'linked_coding_df'. Usually this is intended for package authors who want to operate at the recoding relational table level (e.g. mapping multiple codings to one). Most end users should use [recode_vec()] instead.
make_recode_query(linked_codings, from = 1, to_suffix = "to", ...)
linked_codings | A 'linked_coding_df' |
from | A character or integer that selects the correct original coding. Defaults to 1, the first linked coding. |
to_suffix | The suffix used to signify which columns refer to values to which the vector will be recoded |
... | Any other parameters passed onto the recoding function selector |
A function of a single argument that, when applied to an input vector, recodes the vector appropriately
cdng_old <- coding(code("Yes", 1), code("No", 2))
cdng_new <- coding(code("Yes", 1), code("No", 0))
recode_func <- make_recode_query(link_codings(cdng_new, cdng_old))
vec <- sample(1:2, 20, replace = TRUE)
recode_func(vec)
Checks whether the set of vector values is equal to, or a subset of, a coding's values.
matches_coding(vec, coding, ignore_empty = TRUE)
verify_matches_coding(vec, coding, ignore_empty = TRUE)
vec | A vector |
coding | A 'coding' object |
ignore_empty | Logical flag to skip check if coding is empty |
TRUE/FALSE
verify_matches_coding(): Rather than returning TRUE/FALSE, this function halts execution if 'matches_coding()' returns FALSE.
vec1 <- sample(1:2, 10, replace = TRUE)
vec2 <- sample(0:1, 10, replace = TRUE)
cdng <- coding(code("Yes", 1), code("No", 0))
matches_coding(vec1, cdng)
matches_coding(vec2, cdng)
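The difference between the two functions is easiest to see with fixed inputs. The sketch below assumes 'rcoder' is installed and that the halt raised by verify_matches_coding() is an ordinary R error that tryCatch() can intercept; the vectors are illustrative.

```r
library(rcoder)

cdng <- coding(code("Yes", 1), code("No", 0))

# Fixed vectors (illustrative) so the outcome does not depend on sampling
in_coding <- c(1, 0, 1, 1)
out_of_coding <- c(1, 2, 1, 2) # 2 is not a value in `cdng`

matches_coding(in_coding, cdng)
matches_coding(out_of_coding, cdng)

# verify_matches_coding() halts instead of returning FALSE,
# so the error is caught here to keep the script running
outcome <- tryCatch(
  {
    verify_matches_coding(out_of_coding, cdng)
    "passed"
  },
  error = function(e) "halted"
)
outcome
```

In a data pipeline, the halting variant suits validation steps that should stop on unexpected values, while matches_coding() suits conditional logic.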
Takes a coding and returns a new coding containing only the codes that represent a missing value.
missing_codes(coding)
coding | A coding |
A coding that contains all missing codes. If no missing codes are found, returns 'empty_coding()'
missing_codes(coding(code("Yes", 1), code("No", 0), code("Missing", NA)))
missing_codes(coding(code("Yes", 1), code("No", 0)))
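A code can also be marked as missing explicitly with 'missing = TRUE', even when its value is not NA. The following sketch, which assumes 'rcoder' is installed, combines missing_codes() with is_empty_coding() to branch on whether a coding declares any missing codes; the "Refused" code is an illustrative choice.

```r
library(rcoder)

cdng <- coding(
  code("Yes", 1),
  code("No", 0),
  code("Refused", -88, missing = TRUE)
)

# Keeps only the codes that represent a missing value
only_missing <- missing_codes(cdng)

# Branch on whether any missing codes were found
if (is_empty_coding(only_missing)) {
  message("No missing codes declared")
} else {
  print(only_missing)
}
```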
ODK XLSForms link the categorical codings to a variable type name in the 'survey' sheet. The codings are specified in the 'choices' sheet which has a 'list_name' column that holds the variable type names. Each row that has that name will be associated with that categorical type coding. This function converts subsets of the choices sheet into individual 'coding' objects.
odk_to_coding(choice_table)
choice_table | A data.frame slice of the "choices" table from an XLSForm |
A 'coding' object that corresponds to the slice of the choices table
See also: coding_to_odk()
choice_excerpt <- data.frame(
  list_name = rep("yesno", 2),
  name = c("Yes", "No"),
  label = c(1, 0)
)
odk_to_coding(choice_excerpt)
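Since odk_to_coding() works on one slice at a time, a full 'choices' sheet with several 'list_name' values can be split first. This is a minimal sketch using base R's split() and lapply(); it assumes 'rcoder' is installed, and the 'choices' table is hypothetical data laid out like the example above.

```r
library(rcoder)

# Illustrative choices table with two list names (hypothetical data)
choices <- data.frame(
  list_name = c("yesno", "yesno", "agree", "agree"),
  name = c("Yes", "No", "Agree", "Disagree"),
  label = c(1, 0, 1, 0)
)

# One data.frame slice per list_name, then one coding per slice
slices <- split(choices, choices$list_name)
codings <- lapply(slices, odk_to_coding)

# The result is a list of codings named by list_name
names(codings)
```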
A simple interface to recoding a vector based on the coding linking mechanism. If the vector has the "rcoder.coding" attribute, then the coding object stored in that attribute will be used by default.
recode_vec(vec, to, from = NULL, .embed = TRUE, .bpr = TRUE)
vec | A vector |
to | A coding object to which the vector will be recoded |
from | A coding object that describes the current coding of the vector. Defaults to the "rcoder.coding" attribute value, if it exists, _or_ the "bpr.coding" value (from blueprintr). If neither is found, 'from' stays 'NULL' and the function errors. |
.embed | If 'TRUE', 'from' will be stored in the "rcoder.coding" attribute |
.bpr | If 'TRUE', adds the _character_ representation of the coding to the "bpr.coding" attribute. Used for interop with blueprintr variable decorations |
The recoded vector
See also: assign_coding()
# Using an explicit `from`
vec <- sample(1:3, 50, replace = TRUE)
cdng_old <- coding(code("Yes", 3), code("Maybe", 2), code("No", 1))
cdng_new <- coding(code("Yes", 2), code("Maybe", 1), code("No", 0))
recode_vec(vec, to = cdng_new, from = cdng_old)

# Using an implicit `from` with assign_coding()
vec <- sample(1:3, 50, replace = TRUE)
vec <- assign_coding(vec, cdng_old)
recode_vec(vec, cdng_new)
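With a fixed input, each value can be traced through the label it shares between the two codings. The sketch below assumes 'rcoder' is installed and that codes are linked by their labels (the default for 'links_from'); the input vector is illustrative.

```r
library(rcoder)

cdng_old <- coding(code("Yes", 3), code("Maybe", 2), code("No", 1))
cdng_new <- coding(code("Yes", 2), code("Maybe", 1), code("No", 0))

# Under `cdng_old`: 3 is "Yes", 2 is "Maybe", 1 is "No"
vec <- c(3, 2, 1, 3)

# Each value is replaced by the value carrying the same label in `cdng_new`
recoded <- recode_vec(vec, to = cdng_new, from = cdng_old)

# Strip attributes to look at the bare recoded values
as.vector(recoded)
```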