
Partial least squares (PLS) can be viewed as a dimension reduction method that reduces the dimension of the predictor space by constructing a sequence of linear combinations of the original predictor variables. If we denote the predictors as X1, X2, . . ., Xp, then PLS dimension reduction reduces the p-dimensional predictor space to a lower K-dimensional component space. Depending on the area of application, the constructed linear combinations are called components, factors, or latent variables. Each PLS component is constructed by maximizing the sample covariance between the response vector **y** and the linear combination **Xc**, where **X** is the *n*-by-*p* predictor matrix for *n* observations and **c** is a *p*-dimensional weight vector. Thus each PLS component is a linear combination

T = w1X1 + w2X2 + . . . + wpXp

and each component is orthogonal to every other component.
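For centered data with a single response, the weight vector that maximizes the sample covariance above is proportional to t(X) %*% y. The following minimal R sketch (with simulated data; the variable names are illustrative, not part of the PLS function described below) constructs the first PLS component this way:

```r
# Sketch: first PLS component for column-centered X and y.
# The first weight vector w is proportional to t(X) %*% y.
set.seed(0)
n <- 20; p <- 5
X <- scale(matrix(rnorm(n * p), n, p), scale = FALSE)  # center columns
y <- scale(rnorm(n), scale = FALSE)                    # center response
w <- t(X) %*% y
w <- w / sqrt(sum(w^2))    # normalize the weight vector
T1 <- X %*% w              # first component: w1*X1 + ... + wp*Xp
```

Subsequent components are built the same way after removing (deflating) the part of X explained by the earlier components, which is what makes the components mutually orthogonal.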

The PLS fit below constructs 3 PLS components:

> library(multtest)  # to get Golub et al. (1999) leukemia data
> data(golub)
> X <- t(golub); dim(X)
[1]   38 3051
> y <- matrix(golub.cl)  # sample labels: 27 ALL (0) and 11 AML (1)
> pls.fit <- PLS(X, X, y, 3)  # fit PLS

The objects returned from fitting PLS are:

> names(pls.fit)
[1] "W"    "PVEX" "PVEY" "T1"   "T2"   "B"    "V"

T1 consists of the three fitted PLS components. The PLS header function describes the function inputs and outputs in detail. Below are plots of the PLS components with markers for each type of sample (ALL or AML):

The above PLS function provides the same results as the SAS implementation in PROC PLS[LINK to SAS reference]. Attached is a SAS macro for running a sample PLS: samplepls.txt.

PLS regression can also be implemented in R with the *pls* package [LINK - http://cran.r-project.org/]; see the section on PLS regression below [LINK].

*Other examples*. PLS is often used in the field of chemometrics, with spectrometric calibration as a classic example. Here, spectrographic readings are taken at *p* frequencies or wavelengths on *n* samples of known concentrations. In clinical epidemiology, PLS dimension reduction has been applied to clinical variables such as dietary/nutrient intakes. Additional details can be found in the guide to SAS PROC PLS.

*PCA Implementation in SAS and R*. A standard R function to implement PCA is *princomp* when *n* > *p*. For cases where *p* > *n*, like genomics expression data, other functions can be used like *eigen* and *svd*. PCA can also be implemented in SAS using PROC PRINCOMP.
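As a brief sketch of the *p* > *n* case mentioned above (simulated data; variable names are illustrative), PCA can be computed with *svd* on the column-centered predictor matrix:

```r
# Sketch: PCA via the singular value decomposition when p > n
set.seed(1)
n <- 10; p <- 50
X <- matrix(rnorm(n * p), n, p)

Xc <- scale(X, center = TRUE, scale = FALSE)  # column-center X
sv <- svd(Xc)                                  # Xc = U D V'
scores   <- sv$u %*% diag(sv$d)                # principal component scores
loadings <- sv$v                               # PC loadings (eigenvectors)

# Proportion of variance explained by each component
var.explained <- sv$d^2 / sum(sv$d^2)
```

*princomp* would fail here because it requires more observations than variables; the SVD route has no such restriction.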

In principal components regression, the above description is applicable with the PLS components replaced by the principal components in the regression model.
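A minimal sketch of principal components regression along these lines (simulated data; K and the variable names are illustrative) retains the first K principal component scores and uses them as regressors:

```r
# Sketch: principal components regression (PCR) with K components
set.seed(1)
n <- 30; p <- 10; K <- 3
X <- matrix(rnorm(n * p), n, p)
y <- rnorm(n)

Xc <- scale(X, center = TRUE, scale = FALSE)  # center predictors
sv <- svd(Xc)
Z  <- sv$u[, 1:K] %*% diag(sv$d[1:K])         # first K PC scores
pcr.fit <- lm(y ~ Z)                          # regress y on the components
```

The difference from PLS regression is only in how the components are built: PCR components maximize the variance of **Xc** without reference to **y**, whereas PLS components maximize the covariance with **y**.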

Similar to PLSR, PLS classification involves using the constructed PLS components (possibly along with other clinical or demographic variables) in a classification procedure (second stage). There are many classification methods that can be used for this purpose, including linear or quadratic discriminant analysis, centroid methods, nearest neighbors, neural networks, etc. The book by Hastie et al. (2001)[LINK http://www-stat.stanford.edu/~tibs/ElemStatLearn/] may be useful to learn more about classification methods. See Nguyen and Rocke (2002)[LINK to reference] for examples of PLS cancer classification with genomic expression data.
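As an illustrative sketch of the second stage (linear discriminant analysis via *lda* from the MASS package; here T1 is a random placeholder standing in for the fitted training components such as pls.fit$T1):

```r
# Sketch: second-stage classification on fitted components
library(MASS)                              # for lda()
set.seed(2)
T1 <- matrix(rnorm(38 * 3), 38, 3)         # placeholder component scores
y  <- c(rep(0, 27), rep(1, 11))            # class labels (e.g., ALL/AML)

lda.fit <- lda(T1, grouping = factor(y))   # linear discriminant analysis
pred <- predict(lda.fit)$class             # predicted class for each sample
```

Any of the other classifiers listed above could be substituted for *lda* in the same way, since they all take the component scores as input features.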

- Standardization: In the PLS dimension reduction step, centering and/or scaling may be needed.
- Generalization and over-fitting: With sufficient data, a training (learning) data set and a test data set should be created to test how well the fitted model performs on new (test) data. For instance, the original data can be divided into 2/3 for training and 1/3 for testing. If this is not feasible, cross-validation (leave-one-out or M-fold CV) is an alternative.
- In fitting PLS using the PLS.r [LINK] function with training and test data:

> pls.fit <- PLS(X.train, X.test, y, 3)  # fit 3-component PLS

the outputs T1 and T2 are the training and test components, respectively. The PLS() function can also take more than one outcome variable (continuous or categorical).
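The 2/3 training / 1/3 test split mentioned above can be sketched as follows (simulated data with the same dimensions as the Golub example; variable names are illustrative):

```r
# Sketch: random 2/3 training / 1/3 test split
set.seed(3)
n <- 38
X <- matrix(rnorm(n * 3051), n, 3051)   # placeholder expression matrix
y <- c(rep(0, 27), rep(1, 11))          # placeholder class labels

train.idx <- sample(n, size = round(2 * n / 3))  # 25 training samples
X.train <- X[train.idx, ];  y.train <- y[train.idx]
X.test  <- X[-train.idx, ]; y.test  <- y[-train.idx]
```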

- Danh V. Nguyen, UC Davis – CTSpedia article and R functions. (February 14, 2009)
- Golub et al. (1999). Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, Vol. 286: 531-537. http://www-genome.wi.mit.edu/MPR
- Hastie, T., Tibshirani, R., Friedman, J. (2001) The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York.
- Nguyen, D.V. and Rocke, D.M. (2002). Tumor classification by partial least squares using microarray gene expression data. Bioinformatics, 18, 39-50.
- SAS PROC PLS. SAS User's Guide. Cary, N.C.


Topic revision: r7 - 31 May 2011 - 17:14:14 - MaryBanach

Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.

Ideas, requests, problems regarding CTSPedia? Send feedback
