Journal: Chemometrics and Intelligent Laboratory Systems, vol. 104, p. 140–153 (14 pages), 2010
Open Access: none
As systems biology develops, various types of high-throughput omics data are rapidly becoming available. An increasing challenge is to analyze such massive data, interpret the results and validate the findings. Data analysis for most omics techniques is still at a fledgling stage. The dimensionality of the data tables alone calls for new ways to reveal structure in the data without cognitive overload and an excessive false discovery rate. Multi-block methods have been developed and adapted to find common variation patterns in data and to depict these findings in graphical displays, while providing tools that aid interpretation of the outcomes. In particular, multi-block methods based on latent variables are powerful tools for studying block and global variation patterns, e.g. by inspecting block and global score plots. These methods give an efficient graphical overview of sample and variable variation patterns. However, visual detection of patterns may be subjective, so validation tools are needed. In this paper, tools for validating visually identified patterns in multi-block results are presented. Cross-validated estimates of the Root Mean Square Error (RMSE) for block results are introduced for estimating the number of relevant principal components (PCs) in Consensus Principal Component Analysis (CPCA) models. Furthermore, important variables are identified by approximate t-tests based on Procrustes-corrected jackknifing. For assessing the stability of score patterns, block stability plots are introduced. Outliers can be revealed graphically at the block and global levels by stability plots. © 2010 Elsevier B.V. All rights reserved.
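The idea of a cross-validated, block-wise RMSE can be illustrated with a minimal sketch. It is not the procedure of the paper, whose details the abstract does not give. It assumes that the global model can be approximated by an ordinary PCA of the concatenated blocks, each centred and scaled to unit Frobenius norm, and it uses a simple row-wise K-fold scheme. The function name `block_rmse_cv`, its parameters and the simulated data are all hypothetical. Because held-out rows are projected using their own measurements, the total residual cannot increase as components are added, so the curves should be read for an elbow rather than a minimum.

```python
import numpy as np

def block_rmse_cv(blocks, max_components, n_folds=5, seed=0):
    """Row-wise K-fold reconstruction RMSE per block and per number of PCs.

    Assumption: PCA on the concatenated, block-scaled data stands in for the
    global CPCA model. Returns an array of shape (n_blocks, max_components).
    """
    n = blocks[0].shape[0]
    # Column boundaries of each block inside the concatenated matrix.
    edges = np.cumsum([0] + [b.shape[1] for b in blocks])
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    sq_err = np.zeros((len(blocks), max_components))

    for test in folds:
        train = np.setdiff1d(np.arange(n), test)
        parts_train, parts_test = [], []
        for b in blocks:
            # Centre and scale with training statistics only.
            mean = b[train].mean(axis=0)
            norm = np.linalg.norm(b[train] - mean)
            parts_train.append((b[train] - mean) / norm)
            parts_test.append((b[test] - mean) / norm)
        x_train = np.hstack(parts_train)
        x_test = np.hstack(parts_test)

        # Rows of vt are the PCA loadings of the training data.
        _, _, vt = np.linalg.svd(x_train, full_matrices=False)
        for a in range(1, max_components + 1):
            p = vt[:a].T
            resid = x_test - x_test @ p @ p.T
            for k in range(len(blocks)):
                sq_err[k, a - 1] += np.sum(resid[:, edges[k]:edges[k + 1]] ** 2)

    n_cells = np.array([n * b.shape[1] for b in blocks])[:, None]
    return np.sqrt(sq_err / n_cells)

# Simulated example: two blocks driven by two shared latent components.
rng = np.random.default_rng(1)
latent = rng.normal(size=(40, 2))
block_a = latent @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(40, 6))
block_b = latent @ rng.normal(size=(2, 10)) + 0.1 * rng.normal(size=(40, 10))
rmse = block_rmse_cv([block_a, block_b], max_components=5)
print(np.round(rmse, 4))  # one row per block, one column per number of PCs
```

In this simulated setting, each row of `rmse` would be plotted against the number of components, and the point where a block's curve flattens suggests how many components carry structure for that block.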