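Python High-Dimensional Data Analysis (Python高维数据分析)
List price: ¥43.00
Author: Zhao Yuhui (赵煜辉)
Publication date: 2020-08
Publisher: Xidian University Press (西安电子科技大学出版社)
- ISBN 9787560655772
- 1st edition
- 349548
- Paperback
- 16开 (16mo) format
- 408
- 276
- TP311.561
- Mathematical sciences and chemistry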
Summary
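Starting from matrix computations such as eigenvalue decomposition and singular value decomposition, this book discusses the least-squares model based on the normal equations, which leads to the problem of solving rank-deficient systems of linear equations. It then introduces two lossy dimensionality-reduction methods, principal component analysis (with principal component regression) and partial least squares regression, covering their models, algorithms, and several examples. The treatment extends to regularization methods for linear regression, presenting the principles, algorithms, and examples of ridge regression and the Lasso. Finally, a calibration transfer example on infrared spectra extends the linear model to the field of transfer learning.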
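As a rough illustration of the rank-deficient least-squares problem mentioned above (this sketch is not taken from the book), the following code computes a minimum-norm least-squares solution through the SVD when there are more variables than samples, so that the normal equations cannot be solved directly. The function name `min_norm_least_squares`, the tolerance `rtol`, and the random data are assumptions made purely for illustration.

```python
import numpy as np


def min_norm_least_squares(X, y, rtol=1e-10):
    """Minimum-norm least-squares solution of X b ~ y via the SVD.

    Hypothetical helper for illustration; rtol is an assumed cut-off
    below which singular values are treated as zero.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    keep = s > rtol * s.max()  # drop numerically zero singular values
    # b = V_r diag(1 / s_r) U_r^T y, i.e. the pseudo-inverse applied to y
    return Vt[keep].T @ ((U[:, keep].T @ y) / s[keep])


# Synthetic data: 20 samples, 100 variables, so X^T X (100 x 100) is singular.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 100))
y = rng.normal(size=20)

b = min_norm_least_squares(X, y)
b_pinv = np.linalg.pinv(X) @ y  # NumPy's pseudo-inverse, for comparison
```

Here `b` and `b_pinv` should agree up to floating-point error, since both apply the Moore-Penrose pseudo-inverse of `X` to `y`.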
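Every chapter includes worked examples that analyze infrared spectral datasets using Python and the Sklearn machine learning library. An infrared spectral dataset consists purely of numerical absorbance data for a substance and can be regressed directly against the substance concentrations given by its labels, so readers can concentrate as fully as possible on modeling high-dimensional data, implementing the algorithms, and the analysis process.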
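The book can serve as a textbook for programs in information management and information systems, computer-related fields, and big data, and as a reference for engineers working in spectral and chemical analysis and for chemometrics researchers. It is also suitable for other Python engineers interested in data analysis and research. The original literature and data cited in the book are very helpful to these readers.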
Table of Contents
Chapter 1 Basis of Matrix Calculation
1.1 Fundamental Concepts
1.1.1 Notation
1.1.2 “Bigger-Block” Interpretations of Matrix Multiplication
1.1.3 Fundamental Linear Algebra
1.1.4 Four Fundamental Subspaces of a Matrix
1.1.5 Vector Norms
1.1.6 Determinants
1.1.7 Properties of Determinants
1.2 The Most Basic Matrix Decomposition
1.2.1 Gaussian Elimination
1.2.2 The LU Decomposition
1.2.3 The LDM Factorization
1.2.4 The LDL Decomposition for Symmetric Matrices
1.2.5 Cholesky Decomposition
1.2.6 Applications and Examples of the Cholesky Decomposition
1.2.7 Eigendecomposition
1.2.8 Matrix Norms
1.2.9 Covariance Matrices
1.3 Singular Value Decomposition (SVD)
1.3.1 Orthogonalization
1.3.2 Existence Proof of the SVD
1.3.3 Partitioning the SVD
1.3.4 Properties and Interpretations of the SVD
1.3.5 Relationship between SVD and ED
1.3.6 Ellipsoidal Interpretation of the SVD
1.3.7 An Interesting Theorem
1.4 The Quadratic Form
1.4.1 Quadratic Form Theory
1.4.2 The Gaussian Multivariate Probability Density Function
1.4.3 The Rayleigh Quotient
Chapter 2 The Solution of Least Squares Problems
2.1 Linear Least Squares Estimation
2.1.1 Example: Autoregressive Modelling
2.1.2 The Least-Squares Solution
2.1.3 Interpretation of the Normal Equations
2.1.4 Properties of the LS Estimate
2.1.5 Linear Least-Squares Estimation and the Cramér-Rao Lower Bound
2.2 A Generalized “Pseudo-Inverse” Approach to Solving the Least-Squares Problem
2.2.1 Least Squares Solution Using the SVD
2.2.2 Interpretation of the Pseudo-Inverse
Chapter 3 Principal Component Analysis
3.1 Introductory Example
3.2 Theory
3.2.1 Taking Linear Combinations
3.2.2 Explained Variation
3.2.3 PCA as a Model
3.2.4 Taking More Components
3.3 History of PCA
3.4 Practical Aspects
3.4.1 Preprocessing
3.4.2 Choosing the Number of Components
3.4.3 When Using PCA for Other Purposes
3.4.4 Detecting Outliers
References
3.5 Sklearn PCA
3.5.1 Source Code
3.5.2 Examples
3.6 Principal Component Regression
3.6.1 Source Code
3.6.2 K-Fold Cross-Validation
3.6.3 Examples
3.7 Subspace Methods for Dynamic Model Estimation in PAT Applications
3.7.1 Introduction
3.7.2 Theory
3.7.3 State Space Models in Chemometrics
3.7.4 Milk Coagulation Monitoring
3.7.5 State Space Based Monitoring
3.7.6 Results
3.7.7 Concluding Remarks
3.7.8 Appendix
References
Chapter 4 Partial Least Squares Analysis
4.1 Basic Concept
4.1.1 Partial Least Squares
4.1.2 Form of Partial Least Squares
4.1.3 PLS Regression
4.1.4 Statistic
Reference
4.2 NIPALS and SIMPLS Algorithm
4.2.1 NIPALS
4.2.2 SIMPLS
References
4.3 Programming Method of Standard Partial Least Squares
4.3.1 Cross-Validation
4.3.2 Procedure of NIPALS
4.4 Example Application
4.4.1 Demo of PLS
4.4.2 Corn Dataset
4.4.3 Wheat Dataset
4.4.4 Pharmaceutical Tablet Dataset
4.5 Stack Partial Least Squares
4.5.1 Introduction
4.5.2 Theory of Stack Partial Least Squares
4.5.3 Demo of SPLS
4.5.4 Experiments
References
Chapter 5 Regularization
5.1 Regularization
5.1.1 Classification
5.1.2 Tikhonov Regularization
5.1.3 Regularizers for Sparsity
5.1.4 Other Uses of Regularization in Statistics and Machine Learning
5.2 Ridge Regression: Biased Estimation for Nonorthogonal Problems
5.2.1 Properties of Best Linear Unbiased Estimation
5.2.2 Ridge Regression
5.2.3 The Ridge Trace
5.2.4 Mean Square Error Properties of Ridge Regression
5.2.5 A General Form of Ridge Regression
5.2.6 Relation to Other Work in Regression
5.2.7 Selecting a Better Estimate of β
References
5.3 Lasso
5.3.1 Introduction
5.3.2 Theory of the Lasso
References
5.4 The Example of Ridge Regression and Lasso Regression
5.4.1 Example
5.4.2 Practical Example
5.5 Sparse PCA
5.5.1 Introduction
5.5.2 Motivation and Method Details
5.5.3 SPCA for p ≥ n and Gene Expression Arrays
5.5.4 Demo of SPCA
References
Chapter 6 Transfer Method
6.1 Calibration Transfer of Spectral Models[1]
6.1.1 Introduction
6.1.2 Calibration Transfer Setting
6.1.3 Related Work
6.1.4 New or Adapted Methods
6.1.5 Standard-Free Alternatives to Methods Requiring Transfer Standards
References
6.2 PLS Subspace Based Calibration Transfer for NIR Quantitative Analysis
6.2.1 Calibration Transfer Method
6.2.2 Experimental
6.2.3 Results and Discussion
6.2.4 Conclusion
References
6.3 Calibration Transfer Based on Affine Invariance for NIR without Standard Samples
6.3.1 Theory
6.3.2 Experimental
6.3.3 Results and Discussion
6.3.4 Conclusions