step_opls_denoise creates a 'specification' of a recipe step that removes the first orthogonal component of the OPLS transformation from the selected columns.

step_opls_denoise(
  recipe,
  ...,
  role = NA,
  trained = FALSE,
  outcome = NULL,
  Wortho = NULL,
  Portho = NULL,
  skip = FALSE,
  id = rand_id("opls_denoise")
)

# S3 method for step_opls_denoise
tidy(x, ...)

Arguments

recipe

A recipe object. The step will be added to the sequence of operations for this recipe.

...

One or more selector functions to choose which variables are affected by the step. See selections() for more details. For the tidy method, these are not currently used.

role

Not used by this step since no new variables are created.

trained

A logical to indicate if the quantities for preprocessing have been estimated.

outcome

When a single outcome is available, a character string or a call to dplyr::vars() can be used to specify it.

Wortho

A vector of weights for the first orthogonal component. This is NULL until computed by prep.recipe().

Portho

A vector of loadings for the first orthogonal component. This is NULL until computed by prep.recipe().

skip

A logical. Should the step be skipped when the recipe is baked by bake.recipe()? While all operations are baked when prep.recipe() is run, some operations may not be able to be conducted on new data (e.g. processing the outcome variable(s)). Care should be taken when using skip = TRUE, as it may affect the computations for subsequent operations.

id

A character string that is unique to this step to identify it.

x

A step_opls_denoise object.

Value

An updated version of recipe with the new step added to the sequence of existing steps (if any). For the tidy method, a tibble with columns terms (the selectors or variables selected), value (the standard deviations and means), and statistic for the type of value.

Details

Orthogonal Projections to Latent Structures (OPLS) separates the variation in the predictors into a part that is correlated with the response and a part that is orthogonal to it. This makes it possible to remove systematic variation that is not correlated with the response.
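
To make the roles of Wortho and Portho concrete, here is a minimal base R sketch of a one-component OPLS filter for a single response, following the steps described by Trygg and Wold (2002). It is an illustration under stated assumptions, not the code used by this step or by ropls: the function names opls_filter_fit and opls_filter_apply are hypothetical, the predictors are assumed to be centered beforehand, and the outcome is assumed to be coded numerically.

```r
# Sketch only: hypothetical helpers, not part of tidySpectR or ropls.
# Fit the first orthogonal component for predictors X and a numeric response y.
opls_filter_fit <- function(X, y) {
  X <- as.matrix(X)
  y <- as.numeric(y)

  # Predictive weights: direction in X that covaries with y, scaled to unit length.
  w <- crossprod(X, y) / sum(y^2)
  w <- w / sqrt(sum(w^2))

  # Predictive scores and the corresponding loadings.
  t_pred <- X %*% w
  p <- crossprod(X, t_pred) / sum(t_pred^2)

  # Orthogonal weights: the part of the loadings not aligned with w.
  w_ortho <- p - as.numeric(crossprod(w, p)) * w
  w_ortho <- w_ortho / sqrt(sum(w_ortho^2))

  # Orthogonal scores and loadings.
  t_ortho <- X %*% w_ortho
  p_ortho <- crossprod(X, t_ortho) / sum(t_ortho^2)

  list(Wortho = drop(w_ortho), Portho = drop(p_ortho))
}

# Remove the fitted orthogonal component from (possibly new) data.
opls_filter_apply <- function(X, fit) {
  X <- as.matrix(X)
  t_ortho <- X %*% fit$Wortho
  X - t_ortho %*% t(fit$Portho)
}

# Simulated data: 40 samples, 6 centered predictors, binary outcome coded -1/1.
set.seed(1)
X <- scale(matrix(rnorm(40 * 6), nrow = 40))
y <- rep(c(-1, 1), each = 20)

fit <- opls_filter_fit(X, y)
X_denoised <- opls_filter_apply(X, fit)
```

In this sketch the weights and loadings are estimated once on the training data and then reused unchanged on any new data, which mirrors how a recipe step stores quantities at prep() time and applies them at bake() time.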

The OPLS algorithm is implemented only for binary outcomes!
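
Because only binary outcomes are supported, it can be useful to check the outcome before adding the step. The helper below is a hypothetical base R sketch (is_binary_outcome is not part of tidySpectR) that simply counts the distinct non-missing values.

```r
# Hypothetical helper: TRUE when `y` has exactly two distinct non-missing values.
is_binary_outcome <- function(y) {
  length(unique(y[!is.na(y)])) == 2L
}

# Illustrative data, not taken from the sacurine dataset.
gender <- factor(c("M", "F", "F", "M"))
age_group <- factor(c("young", "adult", "senior", "adult"))

is_binary_outcome(gender)
is_binary_outcome(age_group)
```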

The OPLS calculation uses the implementation from the ropls R package: https://bioconductor.org/packages/release/bioc/html/ropls.html

References

Trygg, J., & Wold, S. (2002). Orthogonal projections to latent structures (O-PLS). Journal of Chemometrics, 16(3), 119–128. doi:10.1002/cem.695 https://onlinelibrary.wiley.com/doi/abs/10.1002/cem.695

Thévenot, E. A., Roux, A., Xu, Y., Ezan, E., & Junot, C. (2015). Analysis of the Human Adult Urinary Metabolome Variations with Age, Body Mass Index, and Gender by Implementing a Comprehensive Workflow for Univariate and OPLS Statistical Analyses. Journal of Proteome Research, 14(8), 3322–3335. doi:10.1021/acs.jproteome.5b00354  https://pubs.acs.org/doi/10.1021/acs.jproteome.5b00354

Examples

library(tidymodels)
library(tidySpectR)

# Load the sacurine example dataset and expose its components
data(sacurine)
attach(sacurine)

# Combine the predictor matrix with the gender outcome
genderFc <- sampleMetadata[, "gender"]
urinedata <- dataMatrix %>%
  cbind(genderFc) %>%
  as_tibble() %>%
  add_column(id = rownames(dataMatrix), .before = 1) %>%
  select(-id)

# Normalize the predictors, then remove the first orthogonal component
rec <- recipe(urinedata, genderFc ~ .) %>%
  step_normalize(all_predictors()) %>%
  step_opls_denoise(all_predictors(), outcome = "genderFc")

tidy(rec)
#> # A tibble: 2 x 6
#>   number operation type         trained skip  id
#>    <int> <chr>     <chr>        <lgl>   <lgl> <chr>
#> 1      1 step      normalize    FALSE   FALSE normalize_1OPbL
#> 2      2 step      opls_denoise FALSE   FALSE opls_denoise_547042
rec %>% prep() %>% bake(NULL)
#> # A tibble: 183 x 110
#>    genderFc `(2-methoxyethox… `(gamma)Glu-Leu… `1-Methyluric a… `1-Methylxanthi…
#>       <dbl>             <dbl>            <dbl>            <dbl>            <dbl>
#>  1        1           -0.770           -1.20            -0.102           -0.378
#>  2        2            0.160            0.0410          -0.795           -0.762
#>  3        1           -0.180           -0.104            0.273            0.780
#>  4        1           -1.37             0.147            1.05             0.723
#>  5        1            0.0986           1.27            -0.210           -0.263
#>  6        1            0.555            0.328            0.687            0.774
#>  7        1           -0.572           -0.224            0.438            0.635
#>  8        1           -0.384            1.38            -0.478           -0.584
#>  9        2           -0.521           -1.89             0.272           -0.244
#> 10        1            0.140           -1.51             1.02             1.14
#> # … with 173 more rows, and 105 more variables: 1,3-Dimethyluric acid <dbl>,
#> #   1,7-Dimethyluric acid <dbl>, 2-acetamido-4-methylphenyl acetate <dbl>,
#> #   2-Aminoadipic acid <dbl>, 2-Hydroxybenzyl alcohol <dbl>,
#> #   2-Isopropylmalic acid <dbl>, 2-Methylhippuric acid <dbl>,
#> #   2,2-Dimethylglutaric acid <dbl>, 3-Hydroxybenzyl alcohol <dbl>,
#> #   3-Hydroxyphenylacetic acid <dbl>,
#> #   3-Indole carboxylic acid glucuronide <dbl>,
#> #   3-Methyl-2-oxovaleric acid <dbl>, 3-Methylcrotonylglycine <dbl>,
#> #   3,3-Dimethylglutaric acid <dbl>, 3,4-Dihydroxybenzeneacetic acid <dbl>,
#> #   3,5-dihydroxybenzoic acid/3,4-dihydroxybenzoic acid <dbl>,
#> #   3,7-Dimethyluric acid <dbl>, 4-Acetamidobutanoic acid isomer 2 <dbl>,
#> #   4-Acetamidobutanoic acid isomer 3 <dbl>, 4-Hydroxybenzoic acid <dbl>,
#> #   4-Methylhippuric acid/3-Methylhippuric acid <dbl>,
#> #   5-Hydroxyindoleacetic acid <dbl>, 5-Sulfosalicylic acid <dbl>,
#> #   6-(2-hydroxyethoxy)-6-oxohexanoic acid <dbl>,
#> #   6-(carboxymethoxy)-hexanoic acid <dbl>, 9-Methylxanthine <dbl>,
#> #   Acetaminophen glucuronide <dbl>, Acetylphenylalanine <dbl>,
#> #   alpha-N-Phenylacetyl-glutamine <dbl>, Aminosalicyluric acid <dbl>,
#> #   Asp-Leu/Ile isomer 1 <dbl>, Asp-Leu/Ile isomer 2 <dbl>,
#> #   Aspartic acid <dbl>, Azelaic acid <dbl>, Benzoic acid isomer <dbl>,
#> #   Chenodeoxycholic acid isomer <dbl>, Cinnamoylglycine <dbl>,
#> #   Citric acid <dbl>, Dehydroepiandrosterone 3-glucuronide <dbl>,
#> #   Dehydroepiandrosterone sulfate <dbl>, Deoxyhexose <dbl>,
#> #   Dimethylguanosine <dbl>, FMNH2 <dbl>, Fumaric acid <dbl>,
#> #   Gentisic acid <dbl>, Glu-Val <dbl>, Gluconic acid and/or isomers <dbl>,
#> #   Glucuronic acid and/or isomers <dbl>, Glyceric acid <dbl>,
#> #   Glycocholic acid isomer 2 <dbl>, Glycocholic acid isomer 3 <dbl>,
#> #   Heptylmalonic acid <dbl>, Hexanoylglycine <dbl>, Hippuric acid <dbl>,
#> #   Hydroxybenzyl alcohol isomer <dbl>, Hydroxyphenyllactic acid <dbl>,
#> #   Hydroxysuberic acid isomer 1 <dbl>, Hydroxysuberic acid isomer 2 <dbl>,
#> #   Isovalerylalanine isomer <dbl>, Ketoleucine <dbl>, Kynurenic acid <dbl>,
#> #   Malic acid <dbl>, Methoxysalicylic acid isomer <dbl>,
#> #   Methyl (hydroxymethyl)pyrrolidine-carboxylate/Methyl (hydroxy)piperidine-carboxylate <dbl>,
#> #   Methylinosine <dbl>, Mevalonic acid isomer 1 <dbl>,
#> #   Monoethyl phthalate <dbl>, N-Acetyl-aspartic acid <dbl>,
#> #   N-Acetylisoleucine <dbl>, N-Acetylleucine <dbl>, N-Acetyltryptophan <dbl>,
#> #   N-Acetyltryptophan isomer 3 <dbl>, N2-Acetylaminoadipic acid <dbl>,
#> #   N4-Acetylcytidine <dbl>, Nicotinuric acid isomer <dbl>,
#> #   Ortho-Hydroxyphenylacetic acid <dbl>, Oxoglutaric acid <dbl>,
#> #   p-Anisic acid <dbl>, p-Hydroxyhippuric acid <dbl>,
#> #   p-Hydroxymandelic acid <dbl>, p-Hydroxyphenylacetic acid <dbl>,
#> #   Pantothenic acid <dbl>, Pentose <dbl>, Phe-Tyr-Asp (and isomers) <dbl>,
#> #   Porphobilinogen <dbl>, Pyridoxic acid isomer 1 <dbl>,
#> #   Pyridylacetylglycine <dbl>, Pyrocatechol sulfate <dbl>,
#> #   Pyroglutamic acid <dbl>, Pyrroledicarboxylic acid <dbl>,
#> #   Pyruvic acid <dbl>, Quinic acid <dbl>, Salicylic acid <dbl>,
#> #   Sebacic acid <dbl>, Suberic acid <dbl>, Sulfosalicylic acid isomer <dbl>,
#> #   Taurine <dbl>, Testosterone glucuronide <dbl>,
#> #   Tetrahydrohippuric acid <dbl>, Threo-3-Phenylserine <dbl>, …