Distribution based truncation for variable selection in subspace methods for multivariate regression

Published 2013


Publication details

Journal: Chemometrics and Intelligent Laboratory Systems, vol. 122, pp. 103–111, 2013

Publisher: Elsevier

International Standard Serial Numbers (ISSN):
Print: 0169-7439
Electronic: 1873-3239

Publication type: Scientific article

Contributors: Liland, Kristian Hovde; Høy, Martin; Martens, Harald; Sæbø, Solve

Links:
Archive: http://hdl.handle.net/11250/24...
DOI: doi.org/10.1016/j.chemolab.201...

If you have any questions about this publication, please contact Nofima's head librarian.

Kjetil Aune
Head Librarian
kjetil.aune@nofima.no

Abstract

Analysis of data containing a vast number of features, but only a limited number of informative ones, requires methods that can separate true signal from noise variables. One class of methods that attempts this is sparse partial least squares regression (sparse PLS). This paper aims to improve the theoretical foundation, speed and robustness of such methods. A general justification for truncating PLS loading weights is obtained through distribution theory and the central limit theorem. We also introduce a quick plug-in truncation procedure built on a novel application of theory originally intended for analysis of variance in experiments without replicates. The result is a versatile and intuitive method that performs component-wise variable selection very efficiently and in a less ad hoc manner than existing methods. Prediction performance is on par with existing methods, while robustness is ensured through a better theoretical foundation.
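The abstract does not spell out the plug-in procedure, so the following is only a rough illustration of component-wise truncation. It assumes that the "theory for experiments without replicates" is something like Lenth's pseudo standard error (PSE), a standard tool for unreplicated factorial designs, and applies it to the loading weights of a single PLS1 component: weights whose distance from the median lies inside a ±margin band are treated as noise and set to zero. The helper names (`lenth_pse`, `truncate_loading_weights`, `pls1_first_component`), the median centring, and the use of a normal quantile in place of Lenth's t quantile are all hypothetical simplifications, not the paper's exact algorithm.

```python
import numpy as np
from statistics import NormalDist


def lenth_pse(c: np.ndarray) -> float:
    """Lenth's pseudo standard error of a vector of effects (robust noise scale)."""
    a = np.abs(c)
    s0 = 1.5 * np.median(a)
    if s0 == 0:
        return 0.0
    # Re-estimate the scale using only effects that do not look "active".
    return float(1.5 * np.median(a[a < 2.5 * s0]))


def truncate_loading_weights(w: np.ndarray, alpha: float = 0.05) -> np.ndarray:
    """Hypothetical helper: zero out loading weights indistinguishable from noise."""
    # Assumption: normal quantile instead of Lenth's t quantile with m/3 df.
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    center = np.median(w)
    margin = crit * lenth_pse(w - center)
    wt = np.where(np.abs(w - center) > margin, w, 0.0)
    norm = np.linalg.norm(wt)
    # Renormalise so the truncated weight vector still has unit length.
    return wt / norm if norm > 0 else wt


def pls1_first_component(X: np.ndarray, y: np.ndarray, alpha: float = 0.05):
    """Hypothetical helper: first PLS1 component with truncated loading weights."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    w = Xc.T @ yc                 # ordinary (untruncated) PLS1 loading weights
    w = w / np.linalg.norm(w)
    wt = truncate_loading_weights(w, alpha)
    t = Xc @ wt                   # scores computed from the sparse weights only
    return wt, t
```

On simulated data where only a handful of variables drive y, the truncated weight vector should keep those variables and zero out most of the rest; roughly a fraction alpha of the pure-noise variables will still pass the threshold, so alpha trades sparsity against the risk of dropping weak signals.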