---
title: "Introduction to the rddi package"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to the rddi package}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup, eval = FALSE}
library(rddi)
```

## Introduction

The `rddi` package is a developer-focused package that provides R representations of DDI Codebook 2.5 elements, so that you can construct fully validated XML while staying flexible. The package covers all elements in the DDI Codebook schema, including the attributes and constraints of each element.

## Installation

Install the latest stable version from CRAN:

```{r eval = FALSE}
install.packages("rddi")
```

Install the development version from this repository with:

```{r eval = FALSE}
# install.packages("devtools")
devtools::install_github("Global-TIES-for-Children/rddi")
```

## What is the Data Documentation Initiative (DDI)?

The Data Documentation Initiative (DDI) is an international standard for describing data from surveys in the social and health sciences. There are currently two versions in use: DDI-Codebook (v 2.5), which is used for single datasets, and DDI Lifecycle (3.0), which tracks a project through its lifecycle. This package only covers the Codebook. For more information go to .

A DDI-Codebook is structured as follows:

    |- codeBook
    |----docDscr
    |----stdyDscr
    |----fileDscr
    |----dataDscr

`codeBook` serves as the root node, `stdyDscr` holds study-level metadata, `dataDscr` holds variable-level metadata, and `fileDscr` and `docDscr` hold administrative metadata about the dataset. The only required element is `titl` in `stdyDscr/citation/titlStmt`.

## Designed uses of rddi

`rddi` was designed to work with [blueprintr](https://github.com/nyuglobalties/blueprintr), a plugin to the drake and targets packages. It is meant to describe the data produced by these kinds of pipelines.
There is a `ddi_` function for each DDI-Codebook element. Each function accepts a dots argument (`...`) consisting of other `ddi_` functions (the child nodes) and attributes. The root node is always `ddi_codeBook()`, followed by one or more of `ddi_docDscr()`, `ddi_stdyDscr()`, `ddi_fileDscr()`, or `ddi_dataDscr()` and their children. Element attributes are passed as named arguments within each function, alongside the child nodes. Each function also checks the element's constraints (allowed children, allowed attributes, and the cardinality of each child and attribute where the DDI-Codebook schema designates one). Each function's documentation links to the DDI documentation for the constraints and attributes of that element.

Depending on the number of repeating elements, you may need functions that iterate over your metadata, such as the `apply()` family in base R or the `map()` functions from the `purrr` package. These return a list of nodes, so you first need a `splat()` function that passes the elements of that list as dots (`...`) arguments. A sample splat function is below.

```{r eval = FALSE}
splat <- function(x, f) {
  do.call(f, x)
}
```

Below is an example of splatting one type of child node into the parent node.

```{r example1, eval = FALSE}
library(purrr)
library(readr) # provides read_csv()

# to convert to dots (...)
splat <- function(x, f) {
  do.call(f, x)
}

# read in metadata held in a csv
ds <- read_csv("insert path to file here")

# create a list of var nodes to add to the data description
descr_var <- pmap(
  .l = list(name = ds$name, description = ds$description),
  .f = function(name, description) ddi_var(varname = name, ddi_labl(description))
)

# Build the codebook
codebook <- ddi_codeBook(
  ddi_stdyDscr(
    ddi_citation(
      ddi_titlStmt(
        ddi_titl("test")
      )
    )
  ),
  splat(descr_var, ddi_dataDscr)
)
```

In cases where a parent node has multiple types of child nodes, there are two options: 1) splat the combined lists of child nodes into the parent node, or 2) create the parent node and append directly to its content.
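Both options can be tried without rddi installed by using a toy stand-in for a `ddi_` constructor. In this minimal sketch, `make_node()`, `toy_var()`, and `toy_dataDscr()` are hypothetical helpers invented for illustration; they only mimic the shape assumed here (a list with a `content` element) and are not part of rddi.

```r
# Hypothetical stand-in for a ddi_ constructor (NOT part of rddi):
# it simply stores whatever is passed through `...` as the node's content.
make_node <- function(tag) {
  function(...) list(tag = tag, content = list(...))
}
toy_var <- make_node("var")
toy_dataDscr <- make_node("dataDscr")

# Same splat helper as in the vignette
splat <- function(x, f) {
  do.call(f, x)
}

# A list of child nodes, as lapply() or purrr::map() would return
vars <- lapply(c("age", "sex", "income"), toy_var)

# Option 1: splat the list so each element becomes one `...` argument
parent <- splat(vars, toy_dataDscr)
same <- identical(parent, toy_dataDscr(vars[[1]], vars[[2]], vars[[3]]))

# Option 2: append to the content of an existing parent node.
# append() concatenates lists, so a single node is wrapped in list().
parent$content <- append(parent$content, list(toy_var("weight")))
n_children <- length(parent$content)

print(same)
print(n_children)
```

The `list()` wrapper in option 2 matters only when appending a single node; a helper that already returns a list of nodes can be passed to `append()` directly.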
```{r example2, eval = FALSE}
# splatting two or more node types into the parent node
# (descr_varGrp is assumed to be a list of nodes built the same way as descr_var)
codebook <- ddi_codeBook(
  ddi_stdyDscr(
    ddi_citation(
      ddi_titlStmt(
        ddi_titl("test")
      )
    )
  ),
  splat(c(descr_varGrp, descr_var), ddi_dataDscr)
)
```

```{r example3, eval = FALSE}
# Creating a parent node in a function and then appending other child nodes to the content.
# The below works best with a YAML or JSON data source.
# The descr_*() helpers are user-defined (not shown here); each is assumed to
# return a list of ddi_ nodes built from `dat`.
generate_stdyInfo <- function(dat) {
  stdyInfo <- ddi_stdyInfo(
    splat(
      c(
        descr_dataKind(dat),
        descr_anlyUnit(dat),
        descr_universe(dat$stdyDscr$stdyInfo$sumDscr$universe),
        descr_nation(dat),
        descr_geogCover(dat),
        descr_geogUnit(dat),
        descr_timePrd(dat),
        descr_collDate(dat)
      ),
      ddi_sumDscr
    ),
    splat(descr_keyword(dat), ddi_subject)
  )
  stdyInfo$content <- append(stdyInfo$content, descr_abstract(dat))
  stdyInfo
}

# adding the generated stdyInfo node to the codebook
codebook <- ddi_codeBook(
  ddi_stdyDscr(
    ddi_citation(
      ddi_titlStmt(
        ddi_titl("test")
      )
    ),
    generate_stdyInfo(dat)
  )
)
```

## Mixed Content DDI elements

The following DDI nodes are mixed content, which means that they can contain attributes, elements, and text: `ddi_universe`, `ddi_srcOrig`, `ddi_sampProc`, `ddi_timeMeth`, `ddi_collMode`, `ddi_resInstru`, `ddi_relMat`, `ddi_relPubl`, `ddi_relStdy`, `ddi_othRefs`, `ddi_anlyUnit`, `ddi_dataKind`, `ddi_geogCover`, `ddi_geogUnit`, `ddi_nation`, `ddi_anlysUnit`, and `ddi_qstn`. You can treat mixed content nodes like regular branch nodes, except that they take a character value along with their child nodes.
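To make "mixed content" concrete at the XML level, the sketch below uses xml2 (the package this vignette uses for export) to parse a small fragment in which a `relMat` element holds both text and a child element. The fragment is a hand-written string for illustration, not output produced by rddi.

```r
library(xml2)

# Hand-written fragment for illustration: text and a child element side by side
frag <- read_xml(paste0(
  "<relMat>description of related material",
  "<citation><titlStmt><titl>Title of Related Material</titl></titlStmt></citation>",
  "</relMat>"
))

# xml_children() returns element children only
child_names <- xml_name(xml_children(frag))

# xml_contents() also returns text nodes, so it sees both parts
n_parts <- length(xml_contents(frag))

# The element's own text is its first text node
own_text <- xml_text(xml_find_first(frag, "./text()"))

print(child_names)
print(n_parts)
print(own_text)
```

The character value passed to a mixed content `ddi_` function plays the role of the text node here, and the nested `ddi_` calls play the role of the element children.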
```{r example4, eval = FALSE}
mixed_content_cb <- ddi_codeBook(
  ddi_stdyDscr(
    ddi_citation(
      ddi_titlStmt(
        ddi_titl("Study title")
      )
    ),
    ddi_othrStdyMat(
      ddi_relMat(
        "description of related material",
        ddi_citation(
          ddi_titlStmt(
            ddi_titl("Title of Related Material")
          )
        )
      )
    )
  )
)
```

## Validate against the DDI-Codebook schema

```{r validate, eval = FALSE}
codebook %>% validate_codebook()

#> [1] TRUE
#> attr(,"errors")
#> character(0)
```

## Export as XML

Use `as_xml()` and `xml2::write_xml()` to convert the codebook to XML and export it. If you use mixed content nodes, you have to replace the escaped HTML entities `&lt;` and `&gt;` with "<" and ">" before calling `xml2::write_xml()`.

```{r export, eval = FALSE}
library(xml2)

write_xml(as_xml(codebook), "codebook.xml")

# for codebooks with mixed content elements
xml <- as_xml(mixed_content_cb)
xml <- gsub("&lt;", "<", xml)
xml <- gsub("&gt;", ">", xml)
# gsub converts xml variable to string so we have to change it back to xml
xml <- xml2::read_xml(xml)
write_xml(xml, "codebook.xml")
```
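The replacement step can be seen in isolation with base R alone. The sketch below assumes, as the text above describes, that child markup inside a mixed content node arrives escaped as `&lt;` and `&gt;`; the `escaped` string is a hand-written illustration, not actual `as_xml()` output.

```r
# Hand-written illustration of escaped child markup inside a mixed content node
escaped <- "<relMat>description &lt;citation&gt;&lt;titl&gt;Title&lt;/titl&gt;&lt;/citation&gt;</relMat>"

# fixed = TRUE treats the patterns as literal strings rather than regular expressions
unescaped <- gsub("&lt;", "<", escaped, fixed = TRUE)
unescaped <- gsub("&gt;", ">", unescaped, fixed = TRUE)

print(unescaped)
```

Because the replacement runs over the whole document, any literal "<" or ">" that was meant to stay as text is unescaped too. Re-parsing the result with `xml2::read_xml()`, as the export chunk does, will surface a malformed document before it is written to disk.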