Conference contribution and academic presentation » Scientific lecture
(Towards) a tool-box of methods for multi-matrix modelling
PLS'07; Matforsk, Ås, 2007-09-05–2007-09-07
Sæbø, Solve; Kohler, Achim; Næs, Tormod; Martens, Harald
New measurement technology is steadily increasing our access to multivariate data sources. A prominent example is the bioinformatics industry, which generates high-dimensional data blocks in genomics, transcriptomics, proteomics, metabolomics and phenomics. In some cases these techniques are applied to the same biological samples, which makes it natural to relate the data blocks to one another. It may also be desirable to relate the measured variables (genes, proteins, metabolites, etc.) to matrices expressing background knowledge about these variables, e.g. knowledge of gene-gene dependencies in the regulation of biochemical processes. Several methods exist for exploring and analyzing single blocks and the relation between pairs of blocks, and methods have also been developed for multi-block analyses (Serial-PLS, Multiblock-PLS, L-PLS, Domino-PLS).
The large number of data sources increases the possibilities for discovering consistent patterns of covariation in the system, but it also increases the number of modelling choices. Several aspects need careful consideration in multi-matrix analyses. One is how the latent variables should be extracted. Typically, the methods are based on iterative algorithms involving eigen-structure analyses and deflation of the data blocks with respect to previously identified latent variables. The way the latent variables are used in the deflation step reflects the focus and objective of the study. In PLSR it is customary to use the score vector of the predictor matrix to deflate both the response matrix and the predictor matrix itself. This reflects the intention of using the variation in the predictors to describe the variation in the responses. A more exploratory analysis of the common patterns of covariation between the two matrices, however, might instead deflate each matrix by its own score vector, a kind of «local deflation» (e.g. as in ).
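The two deflation conventions described above can be contrasted in a minimal sketch. It assumes column-centered data and extracts a single latent component, with weights taken as the dominant singular vectors of X^T Y (approximated by power iteration), and then deflates the blocks either the PLSR way or by «local deflation». All function names and the `mode` values are hypothetical illustrations, not the authors' implementation; plain Python lists are used so the sketch has no dependencies.

```python
import math

Matrix = list[list[float]]
Vector = list[float]


def center(m: Matrix) -> Matrix:
    """Column-center a block (subtract each column's mean)."""
    n = len(m)
    means = [sum(row[j] for row in m) / n for j in range(len(m[0]))]
    return [[x - mu for x, mu in zip(row, means)] for row in m]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Vector) -> Vector:
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def dominant_singular_pair(x: Matrix, y: Matrix, iters: int = 500) -> tuple[Vector, Vector]:
    """Alternating power iteration approximating the dominant singular vectors
    (w, c) of X^T Y, i.e. unit weights maximizing the covariance between the
    scores t = X w and u = Y c (the eigen-structure step)."""
    xt, yt = transpose(x), transpose(y)
    c = normalize([1.0] * len(y[0]))  # deterministic start vector
    for _ in range(iters):
        w = normalize(matvec(xt, matvec(y, c)))  # w proportional to X^T Y c
        c = normalize(matvec(yt, matvec(x, w)))  # c proportional to Y^T X w
    return w, c


def deflate(m: Matrix, t: Vector) -> Matrix:
    """Return M - t p^T with loading p = M^T t / (t^T t)."""
    tt = dot(t, t)
    p = [v / tt for v in matvec(transpose(m), t)]
    return [[mij - ti * pj for mij, pj in zip(row, p)] for row, ti in zip(m, t)]


def one_component(x: Matrix, y: Matrix, mode: str) -> tuple[Matrix, Matrix]:
    """Extract one latent component, then deflate both blocks.
    mode="pls":   PLSR convention, the X-score t deflates both X and Y.
    mode="local": each block is deflated by its own score (t for X, u for Y)."""
    w, c = dominant_singular_pair(x, y)
    t, u = matvec(x, w), matvec(y, c)
    if mode == "pls":
        return deflate(x, t), deflate(y, t)
    if mode == "local":
        return deflate(x, t), deflate(y, u)
    raise ValueError(f"unknown mode: {mode!r}")


# Usage on small made-up data (illustrative numbers only).
X = center([[1.0, 2.0, 0.5], [2.0, 1.0, 1.5], [3.0, 4.0, 2.0],
            [4.0, 3.0, 3.5], [5.0, 6.0, 4.0]])
Y = center([[1.1, 0.3], [1.9, 0.8], [3.2, 0.9], [3.9, 1.6], [5.1, 1.9]])
w, c = dominant_singular_pair(X, Y)
t, u = matvec(X, w), matvec(Y, c)
X_pls, Y_pls = one_component(X, Y, "pls")
X_loc, Y_loc = one_component(X, Y, "local")
# Each deflated block is orthogonal to the score used to deflate it,
# so both printed values are zero up to rounding error.
print(max(abs(v) for v in matvec(transpose(Y_pls), t)))
print(max(abs(v) for v in matvec(transpose(Y_loc), u)))
```

After PLSR-style deflation both blocks are orthogonal to the X-score t, which keeps the predictor side in charge of later components; after local deflation each block is orthogonal to its own score (t or u), which treats the two blocks symmetrically.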
The number of possible block-deflation strategies grows with the number of blocks in the system, but given a clearly defined study objective it is possible to find natural strategies for deflating the data blocks. In this process, choices must also be made regarding aspects such as orthogonality of scores, variable centering and weighting. In this paper we work towards a general «language» for multi-matrix data modelling by defining a set of rules for inter-block relations. The intention is to provide the necessary toolbox of methods for multi-matrix and multi-directional data modelling. The toolbox includes traditional elements such as bilinear modelling of individual matrices (PCA), relevance coupling between pairs of matrices (PLSR), two-directional «corner» modelling (L-PLS), multi-way modelling (Tucker-3, N-way PLS and N-way L-PLS) and multivariate low-rank dynamic modelling (AR-PLS), leading up to multi-matrix generalizations (DAG-based Domino-PLS and cyclic Domino-PLS).
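The centering and weighting choices mentioned above can be illustrated with a small preprocessing helper. This is a sketch under assumptions: the function name and its options are hypothetical, autoscaling uses the sample standard deviation, and block weighting is shown as scaling each block to unit Frobenius norm, which is one common way (among several) to keep blocks with many variables from dominating a multi-block model.

```python
import math

Matrix = list[list[float]]


def preprocess_block(m: Matrix, center: bool = True, autoscale: bool = False,
                     block_weight: bool = True) -> Matrix:
    """Hypothetical preprocessing options for one data block.
    center:       subtract column means
    autoscale:    divide each column by its sample standard deviation
                  (constant columns are set to zero)
    block_weight: scale the whole block to unit Frobenius norm"""
    n, p = len(m), len(m[0])
    out = [row[:] for row in m]  # copy, leave the input untouched
    if center:
        means = [sum(row[j] for row in out) / n for j in range(p)]
        out = [[x - mu for x, mu in zip(row, means)] for row in out]
    if autoscale:
        sds = []
        for j in range(p):
            mu = sum(row[j] for row in out) / n
            sds.append(math.sqrt(sum((row[j] - mu) ** 2 for row in out) / (n - 1)))
        out = [[x / s if s > 0 else 0.0 for x, s in zip(row, sds)] for row in out]
    if block_weight:
        norm = math.sqrt(sum(x * x for row in out for x in row))
        if norm > 0:
            out = [[x / norm for x in row] for row in out]
    return out


# Two variables on very different scales become equally weighted after
# autoscaling, and the block as a whole gets unit total sum of squares.
Z = preprocess_block([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]], autoscale=True)
print(Z)  # → [[-0.5, -0.5], [0.0, 0.0], [0.5, 0.5]]
```

Whether to autoscale, block-weight, or do neither depends on the study objective, just as the choice of deflation does; the defaults here are arbitrary.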