Cross-Validation - Development of a Statistical Model

6.3 Development of a Statistical Model

6.3.1 Cross-Validation

The question is whether the linear model can only reproduce the data it was constructed from, or whether it really captures the underlying physics affecting V_oc, J_sc and FF. Cross-validation, a testing practice in statistics, is therefore used to check the quality of the model. During cross-validation the original data is partitioned, and the data analysis and model development are carried out on one subsample. The remaining subsamples are withheld from this analysis and used later to test the model based on the initial analysis.

The model given by equations 6.14a, 6.14b and 6.14c is tested using so-called leave-one-out cross-validation. As the name suggests, one observation vector is left out when doing the PCA for the model, and the model's predictions are then tested on the left-out observation vector. This procedure is repeated for every single observation vector in the data. The data used for the construction of the model consists of 36 observation vectors, i.e. the data of 36 substrates. Thus the cross-validation is carried out 36 times.
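The leave-one-out procedure can be sketched as follows. This is only an illustration under stated assumptions, not the implementation used in this work: equations 6.14a to 6.14c are not reproduced in this section, so the sketch assumes that each observation vector holds 14 production parameters followed by the three targets (V_oc, J_sc, FF), that the data is standardised before the PCA, and that the three lowest-variance PCs are read as three linear equations that are solved for the targets. The data is synthetic, and all function names are hypothetical.

```python
import numpy as np


def fit_low_variance_pcs(data, n_last=3):
    """PCA on standardised data; return mean, std and the n_last lowest-variance PCs."""
    mean = data.mean(axis=0)
    std = data.std(axis=0, ddof=1)
    z = (data - mean) / std
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return mean, std, vt[-n_last:]


def predict_targets(x, mean, std, w):
    """Solve w @ [zx, zy] = 0 for the targets, given the production parameters x."""
    n_inputs = x.shape[0]
    zx = (x - mean[:n_inputs]) / std[:n_inputs]
    w_x, w_y = w[:, :n_inputs], w[:, n_inputs:]
    zy = np.linalg.solve(w_y, -w_x @ zx)
    # Undo the standardisation of the targets.
    return zy * std[n_inputs:] + mean[n_inputs:]


def leave_one_out(data, n_targets=3):
    """Refit on all rows but one and predict the targets of the held-out row."""
    n_obs, n_vars = data.shape
    n_inputs = n_vars - n_targets
    predictions = np.empty((n_obs, n_targets))
    for i in range(n_obs):
        train = np.delete(data, i, axis=0)
        mean, std, w = fit_low_variance_pcs(train, n_last=n_targets)
        predictions[i] = predict_targets(data[i, :n_inputs], mean, std, w)
    return predictions


# Synthetic stand-in for 36 substrates (assumption: 14 inputs, 3 nearly linear targets).
rng = np.random.default_rng(0)
inputs = rng.normal(size=(36, 14))
targets = inputs @ rng.normal(size=(14, 3)) + 1e-3 * rng.normal(size=(36, 3))
data = np.hstack([inputs, targets])

predictions = leave_one_out(data)
print(np.abs(predictions - targets).max())  # largest absolute prediction error
```

Each of the 36 predictions comes from a PCA that has never seen the substrate being predicted, which is what separates this test from simply refitting the full data set.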

The summary of the results of the cross-validation is shown in tables 6.14 and 6.15. The full tables can be found in the appendix on pages 198 and 199.

The predictions in the tables are all based on a model with PCs 15 to 17, except for the predictions for substrate S 439. When S 439 was left out during cross-validation, PC 17 of the PCA on the remaining data contributed too little (<10⁻¹⁵) to V_oc, J_sc and FF, which resulted in a large error in the prediction. The values for S 439 were therefore validated with a model based on PCs 14 to 16.
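A check of this kind could be automated along the following lines. The sketch is hypothetical: it assumes that the "contribution" of a PC to V_oc, J_sc and FF is measured by the absolute value of its loadings on those variables, that the targets occupy the last columns, and that the window of PCs is shifted up by one whenever the last PC fails the check. The function name and the small demonstration matrix are illustrative only.

```python
import numpy as np


def choose_pc_window(vt, n_targets=3, tol=1e-15):
    """Return (start index, rows) of n_targets consecutive low-variance PCs.

    Trailing PCs whose loadings on the target variables (assumed to be the
    last n_targets columns) are all below tol are skipped.
    """
    stop = vt.shape[0]
    while stop > n_targets and np.abs(vt[stop - 1, -n_targets:]).max() < tol:
        stop -= 1
    return stop - n_targets, vt[stop - n_targets:stop]


# Toy example with 5 variables, the last 2 being targets.
# The last PC loads only on the first variable, so it is skipped.
vt_demo = np.eye(5)[[1, 2, 3, 4, 0]]
start, window = choose_pc_window(vt_demo, n_targets=2)
print(start)  # 0-based index of the first PC in the chosen window
```

With 17 PCs and three targets, skipping the last PC moves the window from PCs 15 to 17 to PCs 14 to 16, which matches the choice made for S 439.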

Table 6.14: This table summarises the results from the cross-validation by showing the quality of predictions in terms of a percentage deviation from the measured result per substrate. The total number of substrates is 36. n gives the number of substrates for which the property was predicted to within the indicated percentage; the last row gives the number of remaining substrates. The values for the power conversion efficiency η are calculated from the predicted values of open circuit voltage V_oc, short circuit current density J_sc and fill factor FF. The predictions of V_oc are very good with all values lying within 10% of the measured values. The same data is shown in table 6.15 with deviations with respect to the standard deviations σ per substrate.

            V_oc          J_sc          FF            η
Percent     n    Cum%     n    Cum%     n    Cum%     n    Cum%
<2.5%       24   67%      4    11%      13   36%      4    11%
<5%         32   89%      11   31%      14   39%      6    17%
<10%        36   100%     22   61%      22   61%      15   42%
<15%        36   100%     29   81%      25   69%      18   50%
<20%        36   100%     30   83%      27   75%      23   64%
<30%        36   100%     33   92%      30   83%      27   75%
≥30%        0    100%     3    100%     6    100%     9    100%

Table 6.15: This table summarises the results from the cross-validation by showing the quality of predictions in terms of the standard deviations of the measured values per substrate. The total number of substrates is 36. n gives the number of substrates for which the property was predicted to within i times the standard deviation σ; the last row gives the number of remaining substrates. The values for η are again calculated from the predicted values of V_oc, J_sc and FF. For 21 substrates, i.e. 58% of all, the predicted values for FF were within 2×σ. The same data is shown in table 6.14 with deviations given in percent.

            V_oc          J_sc          FF            η
i           n    Cum%     n    Cum%     n    Cum%     n    Cum%
<1          20   56%      7    19%      14   39%      7    19%
<1.5        24   67%      9    25%      17   47%      11   31%
<2          26   72%      16   44%      21   58%      15   42%
<2.5        30   83%      19   53%      22   61%      17   47%
<3          32   89%      22   61%      23   64%      24   67%
<4          34   94%      27   75%      27   75%      25   69%
≥4          2    100%     9    100%     9    100%     11   100%

The best predictions are obtained for V_oc, both when considering the deviations as percentage (table 6.14) and with respect to the standard deviation per substrate (table 6.15). All predicted values are within 10% of the measured values. However, because the standard deviation of V_oc is always smaller than 5% (see table 6.5), the quality of predictions in terms of the standard deviation is lower. Still, predictions of V_oc that are within 10% of the measured value for all OSCs, and within 1σ for 56% of them, are very good given the spread in the production data. This also indicates that the main parameters responsible for the variations in V_oc are included in the PCA.
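Tables 6.14 and 6.15 bin the same prediction errors in two ways: as a percentage of the measured value and as a multiple of the standard deviation per substrate. A minimal sketch of how such a summary could be produced is given below. The function names and the four example values are hypothetical and are not data from this work; only the Cum% check at the end uses numbers taken from table 6.14.

```python
import numpy as np


def percent_deviation(predicted, measured):
    """Absolute deviation of the prediction in percent of the measured value."""
    predicted, measured = np.asarray(predicted), np.asarray(measured)
    return 100 * np.abs(predicted - measured) / np.abs(measured)


def sigma_deviation(predicted, measured, sigma):
    """Absolute deviation in units of the standard deviation per substrate."""
    predicted, measured = np.asarray(predicted), np.asarray(measured)
    return np.abs(predicted - measured) / np.asarray(sigma)


def cumulative_summary(deviations, thresholds):
    """Rows of (label, n, Cum%): counts strictly below each threshold,
    plus a final row for everything at or above the largest threshold."""
    deviations = np.asarray(deviations, dtype=float)
    total = deviations.size
    rows = []
    for t in thresholds:
        n = int((deviations < t).sum())
        rows.append((f"<{t:g}", n, round(100 * n / total)))
    rest = int((deviations >= thresholds[-1]).sum())
    rows.append((f">={thresholds[-1]:g}", rest, 100))
    return rows


# Hypothetical V_oc values for four substrates (illustration only).
measured = [0.60, 0.58, 0.61, 0.59]
predicted = [0.61, 0.55, 0.61, 0.50]
summary = cumulative_summary(percent_deviation(predicted, measured),
                             thresholds=(2.5, 5, 10, 15, 20, 30))

# Cum% for V_oc in table 6.14 follows from n and the 36 substrates.
cum_voc = [round(100 * n / 36) for n in (24, 32, 36)]
```

The same `cumulative_summary` call with `sigma_deviation` and the thresholds 1, 1.5, 2, 2.5, 3 and 4 would give the layout of table 6.15.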

The predictions for J_sc and FF are not as good as those for V_oc when only the percentage difference between predicted and measured value is considered. However, the standard deviations per substrate are larger than for V_oc, for the reasons described in section 6.1.3. Assuming that the model reflects the underlying physics, there are most likely parameters that were not included in the analysis but have an effect on J_sc and FF. The necessity of including the parameter OP in the analysis shows that the variations in humidity during PEDOT:PSS spin-coating, which had not been monitored, might have an influence. Another parameter known to influence J_sc is the morphology of the absorber layer. It should largely depend on the production parameters, but is difficult to assess directly for every OSC. It is also currently not possible to parameterise the morphology, which would be necessary to include it in the analysis.

Not surprisingly, the predictions for the power conversion efficiency η, which are calculated from V_oc, J_sc and FF, deviate more from the actual values, because the differences add up (see tables 6.14 and 6.15). Still, the results are encouraging, particularly considering that the model is based on just 36 substrates. Although 36 substrates is a large number compared to what is typically published in the literature, the experiments will need to continue in order to create a large data base.
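Why the differences add up can be seen from the standard definition of the power conversion efficiency as a product of the three predicted quantities. In the sketch below, P_in (the incident light power density) and the relative deviations ε are symbols introduced here for illustration; they do not appear in this section.

```latex
\eta = \frac{V_\mathrm{oc}\, J_\mathrm{sc}\, FF}{P_\mathrm{in}}
\quad\Longrightarrow\quad
\frac{\eta_\mathrm{pred}}{\eta_\mathrm{meas}}
  = (1+\varepsilon_{V})(1+\varepsilon_{J})(1+\varepsilon_{FF})
  \approx 1 + \varepsilon_{V} + \varepsilon_{J} + \varepsilon_{FF}
% Example: three deviations of +10% each give
% 1.1^3 - 1 = 0.331, i.e. a deviation of about 33% in \eta.
```

To first order the relative deviations of V_oc, J_sc and FF therefore sum in η. Deviations of opposite sign partly cancel, so the sum is an upper bound rather than the typical case.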

Summary

• A linear model for V_oc, J_sc and FF was developed on the basis of the PCA. It uses the fact that the last principal components have near-zero variance, indicating a linear relationship between the variables.

• The cross-validation shows that the developed statistical model can reproduce the OSC properties well, given the small data set. Especially the predicted values for V_oc look promising, as all are within 10% of the actual values. This suggests that the main production parameters with an influence on V_oc are included in the model.

• The larger discrepancies between predicted and actual values for J_sc and FF (21 of 36 substrates predicted <10% from actual values) suggest that one or more underlying parameters, which lead to variations, have not yet been included in the PCA. Possible candidates are the humidity during PEDOT:PSS spin-coating, which will be monitored in future, and the morphology of the absorber layer, which is difficult to parameterise and capture in the PCA.

• Removing substrates from the data base used for the model reduces the quality of predictions, suggesting that the number of substrates (36) is at the lower limit of substrates necessary to construct the statistical model.

• The quality of the model has to be verified with new experiments, which are currently being planned.

From the document: Identification and Analysis of Key Parameters in Organic Solar Cells (pages 171-174)