
Principal Component Analysis of a Set of Air-Pollution Data
[Note] The multivariate statistics exercises below are taken from Applied Multivariate Statistical Analysis (5th Ed.) by R.A. Johnson et al. The original book is: Richard A. Johnson and Dean W. Wichern. Applied Multivariate Statistical Analysis (5th Ed). Pearson Education, Inc. 2003. The copy I used is the reprint published by China Statistics Press in 2003. The first exercise is Exercise 1.6 of the original book (problem 6 of Chapter 1), and the second is Exercise 8.12 (problem 12 of Chapter 8). The second exercise uses the data from the first.

1 Exercises

1.6. The data in Table 1.5 are 42 measurements on air-pollution variables recorded at 12:00 noon in the Los Angeles area on different days.
(a) Plot the marginal dot diagrams for all the variables.
(b) Construct the x̄, Sn, and R arrays, and interpret the entries in R.

TABLE 1.5 AIR-POLLUTION DATA
Variables: Wind (x1), Solar radiation (x2), CO (x3), NO (x4), NO2 (x5), O3 (x6), HC (x7)

 x1    x2    x3    x4    x5    x6    x7
  8    98     7     2    12     8     2
  7   107     4     3     9     5     3
  7   103     4     3     5     6     3
 10    88     5     2     8    15     4
  6    91     4     2     8    10     3
  8    90     5     2    12    12     4
  9    84     7     4    12    15     5
  5    72     6     4    21    14     4
  7    82     5     1    11    11     3
  8    64     5     2    13     9     4
  6    71     5     4    10     3     3
  6    91     4     2    12     7     3
  7    72     7     4    18    10     3
 10    70     4     2    11     7     3
 10    72     4     1     8    10     3
  9    77     4     1     9    10     3
  8    76     4     1     7     7     3
  8    71     5     3    16     4     4
  9    67     4     2    13     2     3
  9    69     3     3     9     5     3
 10    62     5     3    14     4     4
  9    88     4     2     7     6     3
  8    80     4     2    13    11     4
  5    30     3     3     5     2     3
  6    83     5     1    10    23     4
  8    84     3     2     7     6     3
  6    78     4     2    11    11     3
  8    79     2     1     7    10     3
  6    62     4     3     9     8     3
 10    37     3     1     7     2     3
  8    71     4     1    10     7     3
  7    52     4     1    12     8     4
  5    48     6     5     8     4     3
  6    75     4     1    10    24     3
 10    35     4     1     6     9     2
  8    85     4     1     9    10     2
  5    86     3     1     6    12     2
  5    86     7     2    13    18     2
  7    79     7     4     9    25     3
  7    79     5     2     8     6     2
  6    68     6     2    11    14     3
  8    40     4     3     6     5     2
Source: Data courtesy of Professor G.C. Tiao.

8.12. Consider the air-pollution data listed in Table 1.5. Your job is to summarize these data in fewer than p = 7 dimensions if possible. Conduct a principal component analysis of the data using both the covariance matrix S and the correlation matrix R. What have you learned? Does it make any difference which matrix is chosen for analysis? Can the data be summarized in three or fewer dimensions?
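Can you interpret the principal components?

2 Partial Solutions

2.1 Some Statistics

Means (x̄) and standard deviations computed with Excel (the standard deviations use the divisor n - 1):

Variable          Mean        Std. dev.
Wind              7.5         1.5811388
Solar radiation   73.857143   17.335388
CO                4.547619    1.2337209
NO                2.1904762   1.0873574
NO2               10.047619   3.3709837
O3                9.4047619   5.5658345
HC                3.0952381   0.6917466

Covariance matrix given by Excel (lower triangle). It uses the divisor n = 42, so it is the Sn of Exercise 1.6(b) rather than S: each diagonal entry equals 41/42 times the square of the corresponding standard deviation above.

                 Wind       Solar      CO         NO         NO2        O3         HC
Wind             2.4404762
Solar radiation  -2.714286  293.36054
CO               -0.369048  3.8163265  1.4858277
NO               -0.452381  -1.353741  0.6575964  1.154195
NO2              -0.571429  6.6020408  2.2596372  1.0623583  11.092971
O3               -2.178571  30.057823  2.7545351  -0.791383  3.0521542  30.24093
HC               0.1666667  0.6088435  0.138322   0.1723356  1.0192744  0.5804989  0.4671202

Correlation matrix R given by Excel (lower triangle):

                 Wind       Solar      CO         NO         NO2        O3         HC
Wind             1
Solar radiation  -0.101442  1
CO               -0.193803  0.1827934  1
NO               -0.269543  -0.073569  0.5021525  1
NO2              -0.109825  0.115732   0.5565838  0.2968981  1
O3               -0.253593  0.3191237  0.4109288  -0.133952  0.1666422  1
HC               0.1560979  0.0520104  0.1660323  0.2347043  0.4477678  0.1544506  1

The correlation matrix shows that CO is clearly correlated with NO and NO2 (r = 0.502 and 0.557), and that O3 is clearly correlated with solar radiation and CO (r = 0.319 and 0.411).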
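The arrays asked for in Exercise 1.6(b) can also be computed directly from Table 1.5 instead of in Excel. The following is a minimal sketch in Python using only the standard library; it assumes the rows are typed in the order of Table 1.5, and the helper names (load_rows, mean_vector, covariance, correlation) are illustrative, not taken from the original. It computes x̄, Sn (divisor n, which is what the Excel covariance matrix above contains), S (divisor n - 1, whose diagonal gives the squared Excel standard deviations) and R.

```python
import math

# Table 1.5 re-typed in row order; 7 values per row:
# Wind, Solar radiation, CO, NO, NO2, O3, HC (42 rows in total).
RAW = """
8 98 7 2 12 8 2    7 107 4 3 9 5 3   7 103 4 3 5 6 3   10 88 5 2 8 15 4  6 91 4 2 8 10 3   8 90 5 2 12 12 4
9 84 7 4 12 15 5   5 72 6 4 21 14 4  7 82 5 1 11 11 3  8 64 5 2 13 9 4   6 71 5 4 10 3 3   6 91 4 2 12 7 3
7 72 7 4 18 10 3   10 70 4 2 11 7 3  10 72 4 1 8 10 3  9 77 4 1 9 10 3   8 76 4 1 7 7 3    8 71 5 3 16 4 4
9 67 4 2 13 2 3    9 69 3 3 9 5 3    10 62 5 3 14 4 4  9 88 4 2 7 6 3    8 80 4 2 13 11 4  5 30 3 3 5 2 3
6 83 5 1 10 23 4   8 84 3 2 7 6 3    6 78 4 2 11 11 3  8 79 2 1 7 10 3   6 62 4 3 9 8 3    10 37 3 1 7 2 3
8 71 4 1 10 7 3    7 52 4 1 12 8 4   5 48 6 5 8 4 3    6 75 4 1 10 24 3  10 35 4 1 6 9 2   8 85 4 1 9 10 2
5 86 3 1 6 12 2    5 86 7 2 13 18 2  7 79 7 4 9 25 3   7 79 5 2 8 6 2    6 68 6 2 11 14 3  8 40 4 3 6 5 2
"""


def load_rows(raw, p=7):
    """Split the whitespace-separated numbers into rows of p values."""
    vals = [float(t) for t in raw.split()]
    return [vals[i:i + p] for i in range(0, len(vals), p)]


def mean_vector(X):
    n = len(X)
    return [sum(row[j] for row in X) / n for j in range(len(X[0]))]


def covariance(X, divisor):
    """Covariance matrix with a chosen divisor (n gives Sn, n - 1 gives S)."""
    xbar = mean_vector(X)
    p = len(xbar)
    return [[sum((r[i] - xbar[i]) * (r[k] - xbar[k]) for r in X) / divisor
             for k in range(p)] for i in range(p)]


def correlation(C):
    """r_ik = c_ik / sqrt(c_ii * c_kk); the divisor used for C cancels out."""
    p = len(C)
    return [[C[i][k] / math.sqrt(C[i][i] * C[k][k]) for k in range(p)] for i in range(p)]


X = load_rows(RAW)
n = len(X)
xbar = mean_vector(X)
Sn = covariance(X, n)      # divisor n (matches the Excel covariance matrix)
S = covariance(X, n - 1)   # divisor n - 1 (diagonal = squared Excel STDEV values)
R = correlation(Sn)

if __name__ == "__main__":
    names = ["Wind", "Solar radiation", "CO", "NO", "NO2", "O3", "HC"]
    for j, name in enumerate(names):
        print(f"{name:<16} mean {xbar[j]:10.6f}  sd(n-1) {math.sqrt(S[j][j]):10.6f}")
    for i, name in enumerate(names):  # lower triangle of R
        print(f"{name:<16}" + "".join(f"{R[i][k]:9.4f}" for k in range(i + 1)))
```

Running the script prints the means, the n - 1 standard deviations and the lower triangle of R. R comes out the same whether it is built from Sn or from S, since the divisor cancels.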
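The principal component analysis below groups CO, NO and NO2 into one principal component, O3 and solar radiation into a second, and HC and wind into a third. The correlation between HC and wind (0.156) is not high, but it is the only positive correlation wind has with any other variable, so among wind's correlations it is the largest. After varimax (orthogonal) rotation, HC is grouped into one factor with CO, NO and NO2, because HC's correlation with NO2 (0.448) is relatively high and its correlations with CO and NO are higher than those with the other variables.

2.2 Principal Component Analysis, Part 1: Data Not Standardized

The results below were produced by SPSS starting from the correlation matrix R; the raw data were not standardized beforehand. "Starting from R" means selecting Correlation Matrix under Analysis in the Extraction options of SPSS Factor Analysis. The correlation matrix reported by SPSS (Correlation Matrix) is the same as the one computed with Excel:

Correlation Matrix
                 WIND    Solar   CO      NO      NO2     O3      HC
WIND             1.000   -.101   -.194   -.270   -.110   -.254   .156
Solar radiation  -.101   1.000   .183    -.074   .116    .319    .052
CO               -.194   .183    1.000   .502    .557    .411    .166
NO               -.270   -.074   .502    1.000   .297    -.134   .235
NO2              -.110   .116    .557    .297    1.000   .167    .448
O3               -.254   .319    .411    -.134   .167    1.000   .154
HC               .156    .052    .166    .235    .448    .154    1.000

The communalities (Communalities table) are as follows.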
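The communalities range from 0.544 to 0.795, so they do not differ much. However, none of them reaches 0.8, which means that for every variable the three principal components retain less than 80% of its variance.

Communalities
Variable          Initial   Extraction
WIND              1.000     .737
Solar radiation   1.000     .544
CO                1.000     .725
NO                1.000     .795
NO2               1.000     .681
O3                1.000     .722
HC                1.000     .722
Extraction Method: Principal Component Analysis.

The eigenvalues and variance contributions (Total Variance Explained) are shown below. The three components with eigenvalues greater than 1 together explain 70.384% of the total variance of the original 7 variables.

Total Variance Explained
            Initial Eigenvalues                    Extraction Sums of Squared Loadings
Component   Total   % of Variance  Cumulative %    Total   % of Variance  Cumulative %
1           2.337   33.383         33.383          2.337   33.383         33.383
2           1.386   19.800         53.183          1.386   19.800         53.183
3           1.204   17.201         70.384          1.204   17.201         70.384
4           .727    10.387         80.771
5           .653    9.335          90.106
6           .537    7.667          97.773
7           .156    2.227          100.000
Extraction Method: Principal Component Analysis.

[Figure: Scree Plot (x-axis: Component Number; y-axis: Eigenvalue)]

The component loading matrix (Component Matrix) is shown in the table below.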
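As a cross-check on the SPSS output above, the eigenvalues, loadings, communalities and component scores can be recomputed from the data. The sketch below is a minimal Python version using only the standard library, under these assumptions: the rows of Table 1.5 are typed in the listed order; the eigen-decomposition uses the cyclic Jacobi method, which is not necessarily the routine SPSS uses; and each eigenvector's sign is fixed so that its loadings sum to a positive number, a convention chosen here that happens to give the same signs as the SPSS loading table for this data. Because R is the covariance matrix of the z-scores, the code builds it as Z'Z/(n - 1). Loadings are eigenvector entries times the square root of the eigenvalue, and scores are computed as z-scores times loadings divided by the eigenvalue; for this data that reproduces the SPSS scores listed later (for example 2.57309 for sample 8 on the first component). All function names are illustrative.

```python
import math

# Table 1.5 re-typed in row order; 7 values per row:
# Wind, Solar radiation, CO, NO, NO2, O3, HC (42 rows in total).
RAW = """
8 98 7 2 12 8 2    7 107 4 3 9 5 3   7 103 4 3 5 6 3   10 88 5 2 8 15 4  6 91 4 2 8 10 3   8 90 5 2 12 12 4
9 84 7 4 12 15 5   5 72 6 4 21 14 4  7 82 5 1 11 11 3  8 64 5 2 13 9 4   6 71 5 4 10 3 3   6 91 4 2 12 7 3
7 72 7 4 18 10 3   10 70 4 2 11 7 3  10 72 4 1 8 10 3  9 77 4 1 9 10 3   8 76 4 1 7 7 3    8 71 5 3 16 4 4
9 67 4 2 13 2 3    9 69 3 3 9 5 3    10 62 5 3 14 4 4  9 88 4 2 7 6 3    8 80 4 2 13 11 4  5 30 3 3 5 2 3
6 83 5 1 10 23 4   8 84 3 2 7 6 3    6 78 4 2 11 11 3  8 79 2 1 7 10 3   6 62 4 3 9 8 3    10 37 3 1 7 2 3
8 71 4 1 10 7 3    7 52 4 1 12 8 4   5 48 6 5 8 4 3    6 75 4 1 10 24 3  10 35 4 1 6 9 2   8 85 4 1 9 10 2
5 86 3 1 6 12 2    5 86 7 2 13 18 2  7 79 7 4 9 25 3   7 79 5 2 8 6 2    6 68 6 2 11 14 3  8 40 4 3 6 5 2
"""


def load_rows(raw, p=7):
    vals = [float(t) for t in raw.split()]
    return [vals[i:i + p] for i in range(0, len(vals), p)]


def standardize(X):
    """z-scores using the n - 1 standard deviation."""
    n, p = len(X), len(X[0])
    mu = [sum(r[j] for r in X) / n for j in range(p)]
    sd = [math.sqrt(sum((r[j] - mu[j]) ** 2 for r in X) / (n - 1)) for j in range(p)]
    return [[(r[j] - mu[j]) / sd[j] for j in range(p)] for r in X]


def jacobi_eigen(A, tol=1e-20, max_sweeps=100):
    """Cyclic Jacobi method for a symmetric matrix.
    Returns (eigenvalues, V); column k of V is the eigenvector of eigenvalue k."""
    p = len(A)
    A = [row[:] for row in A]
    V = [[float(i == j) for j in range(p)] for i in range(p)]
    for _ in range(max_sweeps):
        if sum(A[i][j] ** 2 for i in range(p) for j in range(p) if i != j) < tol:
            break
        for k in range(p - 1):
            for l in range(k + 1, p):
                if A[k][l] == 0.0:
                    continue
                # Rotation angle chosen so that the new A[k][l] becomes zero.
                theta = (A[l][l] - A[k][k]) / (2.0 * A[k][l])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for r in range(p):  # A <- A J and V <- V J (columns k and l)
                    a_k, a_l = A[r][k], A[r][l]
                    A[r][k], A[r][l] = c * a_k - s * a_l, s * a_k + c * a_l
                    v_k, v_l = V[r][k], V[r][l]
                    V[r][k], V[r][l] = c * v_k - s * v_l, s * v_k + c * v_l
                for r in range(p):  # A <- J^T A (rows k and l)
                    a_k, a_l = A[k][r], A[l][r]
                    A[k][r], A[l][r] = c * a_k - s * a_l, s * a_k + c * a_l
    return [A[i][i] for i in range(p)], V


def pca_from_correlation(X, m=3):
    Z = standardize(X)
    n, p = len(Z), len(Z[0])
    R = [[sum(z[i] * z[k] for z in Z) / (n - 1) for k in range(p)] for i in range(p)]
    vals, V = jacobi_eigen(R)
    order = sorted(range(p), key=lambda k: vals[k], reverse=True)
    lam = [vals[k] for k in order]       # eigenvalues, largest first
    pct = [100.0 * v / p for v in lam]   # trace(R) = p
    cols = []
    for rank, k in enumerate(order[:m]):
        col = [V[j][k] * math.sqrt(lam[rank]) for j in range(p)]  # loadings
        if sum(col) < 0:  # assumed sign convention: positive loading sum
            col = [-x for x in col]
        cols.append(col)
    L = [[cols[k][j] for k in range(m)] for j in range(p)]  # p x m loading matrix
    h = [sum(x * x for x in row) for row in L]  # communalities
    # Standardized component scores: f_ik = sum_j z_ij * l_jk / lambda_k.
    F = [[sum(z[j] * L[j][k] for j in range(p)) / lam[k] for k in range(m)] for z in Z]
    return lam, pct, L, h, F


X = load_rows(RAW)
lam, pct, L, h, F = pca_from_correlation(X)

if __name__ == "__main__":
    names = ["WIND", "Solar radiation", "CO", "NO", "NO2", "O3", "HC"]
    cum = 0.0
    for k, (ev, pc) in enumerate(zip(lam, pct), start=1):
        cum += pc
        print(f"PC{k}: eigenvalue {ev:.3f}  {pc:.3f}%  cumulative {cum:.3f}%")
    for name, row, hj in zip(names, L, h):
        print(f"{name:<16}" + "".join(f"{x:9.3f}" for x in row) + f"   h2 = {hj:.3f}")
```

The printed eigenvalues, loadings and communalities can be compared line by line with the SPSS tables; only the signs of whole columns depend on the convention assumed above.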
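Component Matrix(a)
                  Component
                  1        2        3
WIND              -.362    .328     .706
Solar radiation   .314     -.620    .246
CO                .842     -.008    -.125
NO                .577     .512     -.447
NO2               .761     .235     .216
O3                .496     -.667    .175
HC                .488     .362     .594
Extraction Method: Principal Component Analysis.
a. 3 components extracted.

The table above was copied from SPSS into Excel and color-coded to classify the variables. The result is shown below, with each variable's largest absolute loading (which decides its component) marked by *:

                  Component 1   Component 2   Component 3
WIND              -0.36202      0.327809      0.706084*
Solar radiation   0.31424       -0.61997*     0.24631
CO                0.842417*     -0.00803      -0.12466
NO                0.577243*     0.511736      -0.44671
NO2               0.761294*     0.235183      0.215682
O3                0.496126      -0.66749*     0.175399
HC                0.488257      0.362466      0.593692*

The components are classified as follows:
Main variables of the first principal component: CO, NO, NO2.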
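Main variables of the second principal component: Solar radiation, O3.
Main variables of the third principal component: Wind, HC.
In the component loading plot (Component Plot), the three groups of variables fall into the regions represented by three different principal components.

The principal component scores are listed below. The last column gives a brief interpretation of a few typical samples (days). When interpreting the scores, check the signs of the loadings in the component loading matrix.

Case  f1        f2        f3        Interpretation of typical samples
S1    0.61591   -0.8186   -0.38418
S2    0.03194   -0.36015  -0.26343
S3    -0.34752  -0.54481  -0.49701
S4    0.2425    -0.30293  1.80367   On the day of sample 4, wind and HC are both high (large positive f3).
S5    -0.12729  -0.91941  -0.4042
S6    0.72612   -0.19278  1.21954
S7    2.03686   0.89982   1.4607
S8    2.57309   0.77732   -0.34124  Samples 7 and 8 are clearly associated with CO, NO and NO2 pollution (large positive f1).
S9    0.09802   -0.81736  0.3033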












