
4.2 Unsupervised Exploration: Discovery Without Understanding?

Now consider the conditions under which remarkable discoveries are quite often made: exploratory phases, which are not strongly guided by theory. Exploratory experimentation has been recognized as special in many ways; most importantly in that “[t]he typical context of exploratory experimentation is the formation of [...] conceptual frameworks.” (Steinle, 1997, p. S71; emph. added)

As an example, Steinle (1997, p. S72) discusses Faraday’s introduction of the concept of magnetic force-lines, which ultimately led to Maxwell’s electrodynamics.24 As we now know, this was an important step towards major scientific progress. But it took “two [...] decades” until even Faraday’s “concept was fully developed” (ibid.).

It now comes in handy that we have chosen particle physics as a major case study, for it is about to enter an exploratory phase: Despite its predictive success, the SM neither includes gravity nor dark matter, cannot explain neutrino oscillations, and has suspicious ‘fine-tuning’ properties. At the same time, many of the SM’s favoured extensions have been ruled out by evidence, and future theoretical developments are far from obvious. Accordingly, the preamble of the 2020 update of the European strategy for particle physics emphasizes “the exploration of a new realm of energies” (European Strategy Group, 2020, p. 5).

22 ... and regression; cf., however, Fn. 4.

23 Cf. Farina et al. (2020, p. 6) for similar problems with more data-driven approaches.

24 See, however, Steinle (2016, Chapt. 7), for a wealth of further examples.

Now recall, however, how physicists struggled greatly to make sense of the ‘particle zoo’ discovered in the 20th century. The discovery of muons, for instance, was famously greeted by Isidor Isaac Rabi with the query “Who ordered that?”. It is hence perfectly conceivable that the upcoming exploratory phase will reveal subtle traces of further unexpected particles. But now the following question arises: if such a discovery were powered by an unsupervised DNN, would physicists be able to make sense of it?

Given everything that was said so far, I believe that this is far from clear, and that highly similar problems may arise in other data-heavy sciences. For instance, consider the following verdicts from the earth science community:

Unsupervised learning may aid the discovery of novel relationships[...] across the different dimensions of climate modelling [...]. A subsequent challenge for the Earth System community would be where an unsupervised approach reveals new system connections, requiring mechanistic understanding. (Huntingford et al., 2019, p. 5)

[D]eep learning will soon be the leading method for classifying and predicting space-time structures in the geosciences. More challenging is to gain understanding in addition to optimal prediction, and to achieve models that have maximally learned from data, while still taking into account physical and biological knowledge. (Reichstein et al., 2019, p. 200)

To see where precisely the problems originate, first recall the difference between unsupervised and (semi-)supervised learning discussed in the previous section. In terms of underlying mechanisms, this difference plays out as follows: In a discovery based on supervised DNNs, the labels stem from a conceptualization of the target. Ideally, this will allow researchers to bypass w-opacity, because steps (II) and (III) will be fixed by some explanatory models’ suggesting those labels.

Weakly supervised DNNs rely only on ‘negative’ labels, unsupervised ones on no labels at all. Thus, in both cases, the connection to underlying mechanisms becomes severed. In the unsupervised case, however, these observations extend even to models explaining the non-anomalous data. Unsupervised models thus not only yield the greatest discovery potential, but at the same time also the greatest disconnect from prior knowledge of data-generating mechanisms.
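To make this contrast concrete, consider a minimal sketch of unsupervised anomaly detection with an autoencoder, in the spirit of the approaches discussed by Farina et al. (2020). The architecture, the data, and the flagging threshold below are illustrative assumptions, not any actual analysis pipeline:

```python
import torch
import torch.nn as nn

# Minimal autoencoder: compress each event's features into a small latent
# space and reconstruct them. Training uses unlabeled data only -- no
# signal/background labels (and hence no conceptualization of a target)
# enter at any point.
class AutoEncoder(nn.Module):
    def __init__(self, n_features: int = 20, n_latent: int = 4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_features, 32), nn.ReLU(), nn.Linear(32, n_latent))
        self.decoder = nn.Sequential(
            nn.Linear(n_latent, 32), nn.ReLU(), nn.Linear(32, n_features))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = AutoEncoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

events = torch.randn(1024, 20)  # placeholder for real event features

for _ in range(10):  # learn to reconstruct the bulk of the data
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(events), events)
    loss.backward()
    optimizer.step()

# Events the model reconstructs poorly count as anomalous: they are
# flagged merely as 'unlike the rest', without any concept of what they are.
with torch.no_grad():
    errors = ((model(events) - events) ** 2).mean(dim=1)
anomalies = events[errors > errors.mean() + 3 * errors.std()]
```

The crucial point is visible in the last lines: events are flagged solely for being poorly reconstructed, and no label, concept, or mechanism enters anywhere.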

Now, if an unsupervised DNN (weakly)25 predicts the presence of a novel phenomenon, physicists would certainly have several aces up their sleeves. They might try to match it to existing proposals for new physics by adjusting certain free parameters in the corresponding physics models. Or they might reassess their understanding of the background physics in the domain where a significant anomaly is being indicated, with the goal of seeing whether the prediction was spurious.

25 Of course, it cannot be excluded that a DNN analyzing, say, the overall pattern of how known particles distribute across the existing data will be able to predict further particles at higher energies, i.e., to predict particles in the strong sense. However, at present, this seems to be mere speculation.
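Very schematically, the first of these strategies, matching the anomaly to existing proposals by adjusting free parameters, might look as follows; the background shape, the resonance model, and all numbers are purely hypothetical stand-ins:

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)

# Toy invariant-mass spectrum (in GeV) in the region flagged by the DNN:
# a smoothly falling 'known physics' background plus a hidden bump.
mass = np.linspace(200.0, 800.0, 60)
background = 5e4 * np.exp(-mass / 150.0)
counts = rng.poisson(background + 300.0 * np.exp(-((mass - 520.0) / 25.0) ** 2))

# Candidate 'new physics' model: the known background plus a resonance
# whose strength, location, and width are the free parameters physicists
# would adjust to match the anomaly to an existing proposal.
def candidate(m, strength, peak, width):
    return 5e4 * np.exp(-m / 150.0) + strength * np.exp(-((m - peak) / width) ** 2)

params, _ = curve_fit(candidate, mass, counts, p0=[100.0, 500.0, 30.0])
strength, peak, width = params
print(f"best-fit resonance: mass = {peak:.1f} GeV, width = {width:.1f} GeV")
```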

However, absent any plausible physics model or reason for doubting the DNN’s prediction, a real problem for explanation and understanding would arise from such an event. To see this clearly, consider also the importance of background theories in exploration (e.g. Franklin, 2005). The distinction Franklin makes between ‘background’ and ‘local’ theories in biology parallels similar distinctions made by Wallace (2020) and Karaca (2013) in physics. In detail, background theories determine the general structure of mechanisms in biology, or the structure of state spaces in physics, whereas local theories determine only the concrete ingredients to a particular mechanism or the state space for the problem at hand. In exploration, moreover, background theories “direct inquirers to the kinds of properties that could possibly have a [...] role in their local investigations[...]” (Franklin, 2005, p. 891).

Now at the very inception of present-day particle physics stand Rutherford’s scattering experiments (Duncan & Janssen, 2019, p. 154), and famously, the first to make sense of several experimental findings here was Bohr. His explanation was based upon the assumption of quantized orbits for electrons (Duncan & Janssen, 2019, pp. 11–12), an assumption which contributed greatly to the later development of his atom model. Ultimately, this led to modern-day quantum theory: a then-new background theory, which even introduced mathematics unfamiliar to physicists at the time.

The conceptual shift between classical and quantum theories is thus as radical as any (just think of ‘superposition’ and ‘entanglement’), which underscores Steinle’s observations. Against the manifold successes of what is now called ‘classical physics’, such a leap must clearly have seemed inconceivable to many at the time (just recall Kelvin’s infamous ‘two clouds’). Yet it happened, stimulated by empirical findings that could not be properly conceptualized within the classical framework.

The point, then, is this: While the role Franklin ascribes to background theories in exploratory research may be correct in principle, exploration can even induce the need for new background theories. Given also the profound surprises particle physicists have faced in the past, we cannot exclude that big conceptual shifts are lurking at unexplored energies.

A little more precisely, particle physics’ present background theory is quantum field theory (QFT), so most candidate physics explanations would presumably be presented in terms of some QFT-Lagrangian. However, given, e.g., the well-known difficulties of integrating Einstein’s general theory of relativity into QFT, it is by no means certain that QFT has the resources for providing the desired new model. And it is impossible to estimate how big a conceptual shift will be required in finding that new background theory.
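For concreteness, such a candidate QFT-Lagrangian would typically take the form of the SM Lagrangian extended by terms for new fields; purely for illustration, consider a hypothetical heavy scalar φ with mass m_φ and a Yukawa-type coupling g to some SM fermion ψ:

```latex
\mathcal{L} \;=\; \mathcal{L}_{\mathrm{SM}}
  \;+\; \tfrac{1}{2}\,\partial_\mu \phi\,\partial^\mu \phi
  \;-\; \tfrac{1}{2}\,m_\phi^2\,\phi^2
  \;-\; g\,\phi\,\bar{\psi}\psi
```

The worry raised above is precisely that the required new model might not be expressible by any such addition of terms within the QFT framework.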

The outlook on the quest for scientific understanding in an exploratory context where an unsupervised DNN powers new discoveries now comes out as follows: Being c-instrumental, we could not expect an explanation directly from the DNN. However, in pursuing steps (I)–(III), scientists would have to rely on previously established concepts, and hence be prone to assigning the wrong meaning to the DL model. This would definitely hinder an extended understanding in terms of a new background theory, for:

Research questions can be posed only with particular concepts. In the context of another conceptual scheme they may well fail to make sense, in which case they elude attention. (Steinle, 2016, p. 333)

And again, this is not mere philosophical speculation in the void, but an actual scientific problem recognized (if somewhat vaguely) by active researchers:

Even complex problems in computer vision have been solved by hand-crafted features that reflect the assumptions and expectations that arise from common world knowledge. In geoscience and climate science, such global, general knowledge is still partly missing, and indeed, is exactly what we are seeking in research (hence, it cannot be an assumption). (Reichstein et al., 2019, p. 200; emph. added)

Note the crucial role of w-opacity here: It is exactly the ability of DNNs to ‘automatically discover’ important, complex features that further prediction and, at the same time, provide a basis for understanding underlying mechanisms (such as the development of secondary protein structure, or the intermediate decay of a certain massive particle). But if these features are so far not understood by humans, it is far from clear how to abstract them from a successful DNN by means of standard interpretability methods; for “the subsequent interpretation of the final state of the trained network depends on human categories expressed in natural language by the human evaluators” (López-Rubio, 2020, p. 12).
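To illustrate, here is a minimal sketch of one standard interpretability method, gradient-based saliency; the network and data are placeholders. Its output is merely a ranking of input features by influence, so that any reading of what the highlighted features mean must still be supplied in human categories:

```python
import torch
import torch.nn as nn

# Placeholder for a trained classifier over 20 event features.
net = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))

x = torch.randn(1, 20, requires_grad=True)  # a single event

# Gradient-based saliency: how sensitive is the network's score for
# class 1 to each input feature?
score = net(x)[0, 1]
score.backward()
saliency = x.grad.abs().squeeze()

# The method yields a ranking of features by influence -- but what the
# highlighted combination of features *means* physically is not part of
# the output; that translation into concepts falls to the human evaluator.
top_features = torch.topk(saliency, k=5).indices
print("most influential input features:", top_features.tolist())
```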

Hence, to put it in the words suggested to me by an anonymous referee, the DNN finds significant features, but the translation of these into scientific concepts is up to scientists’ ability and knowledge. This is because concepts are expressed linguistically, but current state-of-the-art DNNs do not have the ability to generate linguistic descriptions of the concepts that underlie these automatically discovered significant features.

5 Conclusion

I have argued that DNNs are c-instrumental models that harvest their success in a w-opaque way. Even though they excel as predictive tools, they thus do not deliver explanations themselves and may conceal information relevant for new-concept-formation. As I have shown, this creates the possibility of unprecedented gaps between discovery and understanding in the near future; in particular, when the following four factors are jointly present:

1. In an exploratory experimental context,
2. an unsupervised model
3. predicts an unexpected discovery, whose understanding
4. requires a radical conceptual shift.

Emphatically, I am not claiming that only under conditions 1.–4. will scientists face problems in gathering understanding of underlying mechanisms from DL, nor that this is bound to happen in case 1.–4. occur: Certainly, executing step (III) can be hard under far less drastic circumstances, and maybe the right set of geniuses, with the necessary ‘exotic’ ideas, is around the corner if and when 1.–4. happen (as in the quantum revolution).

However, hoping for geniuses to be around is certainly not a satisfying response to this problem, and given the current state of several big-data sciences (as well as the astonishing DL successes witnessed therein), I submit that we might plausibly face a scenario like the above in the near future.

What this means for science as a whole remains to be seen: Will future scientists value prediction over explanation? Or will they develop new skills for constructing explanatory models from sparse information? There are certainly already some steps that seem to point in the latter direction: Reichstein et al. (2019), for instance, suggest a hybrid approach in which physics information is incorporated into the training, and Alvarez Melis and Jaakkola (2018) propose a framework for DNNs that are, in a sense, ‘self-explaining’.
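As a rough indication of what the former proposal amounts to in practice, the following sketch adds a penalty for violating a known physical constraint to an ordinary data-fit loss; the toy ‘conservation law’ and all parameters are assumptions for illustration, not Reichstein et al.’s actual setup:

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3))

x = torch.randn(256, 10)  # placeholder inputs
y = torch.randn(256, 3)   # placeholder targets

prediction = net(x)
data_loss = nn.functional.mse_loss(prediction, y)

# Hybrid step: add a penalty for violating a known physical constraint.
# As a stand-in, we demand that the three predicted components sum to
# zero (a toy 'conservation law'); the weight 0.1 is a tunable assumption.
physics_penalty = (prediction.sum(dim=1) ** 2).mean()

loss = data_loss + 0.1 * physics_penalty
loss.backward()  # gradients now reflect both the data and prior physics
```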

However, neither of these proposals seems to make contact with the conceptual challenges I have raised in this paper, and I am not convinced that any approach to ‘self-explanation’ could deliver something like the complex, involved, and often quite ingenious models that arise from advanced physical (or, more generally: scientific) theorizing.

In any case, I hope to have shown convincingly that, in a precise sense, the w-opacity and c-instrumentality of DL models indeed have the potential to profoundly ‘change the face of science’.

Acknowledgements The research for this paper was funded by the German Research Foundation, as part of the research unit The Epistemology of the Large Hadron Collider (DFG; Grant FOR 2063). I thank audiences at the 2018 conference The Science and Art of Simulation at the HLRS in Stuttgart, Germany, and at the 2020 Workshop Machine Learning: Prediction Without Explanation? at the Karlsruhe Institute of Technology (KIT), Germany, for helpful discussions on the subject matter. I also owe thanks to Paul Grünke, Rafaela Hillerbrand, Marianne van Panhuys, Gregor Schiemann, and Christian Zeitnitz, all of whom are members of the Project B1: The impact of computer simulations and machine learning on the epistemic status of LHC Data, part of the DFG/FWF-funded research unit The Epistemology of the LHC.

I have also profited from various comments by regular visitors of the research unit’s internal seminar, as well as from two anonymous referees.

Funding Open Access funding enabled and organized by Projekt DEAL. The research for this paper was funded by the German Research Foundation, as part of the research unit The Epistemology of the Large Hadron Collider (DFG; Grant FOR 2063).

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

Albertsson, K., Altoe, P., Anderson, D., Andrews, M., Espinosa, J. P. A., Aurisano, A., Basara, L., Bevan, A., Bhimji, W., Bonacorsi, D., Calafiura, P., Campanelli, M., Capps, L., Carminati, F., Carrazza, S., Childers, T., Coniavitis, E., Cranmer, K., David, C., … Zapata, O. (2018). Machine learning in high energy physics community white paper. Journal of Physics: Conference Series, 1085(2), 022008.

Alvarez Melis, D., & Jaakkola, T. (2018). Towards robust interpretability with self-explaining neural networks. Advances in Neural Information Processing Systems, 31, 7775–7784.

Baldi, P., Sadowski, P., & Whiteson, D. (2014). Searching for exotic particles in high-energy physics with deep learning. Nature Communications, 5, 4308.

Basso, A., Lisciandra, C., & Marchionni, C. (2017). Hypothetical models in social science. In L. Magnani & T. Bertolotti (Eds.), Springer handbook of model-based science (pp. 413–433). Springer.

Beisbart, C. (2012). How can computer simulations produce new knowledge? European Journal for Philosophy of Science, 2(3), 395–434.

Boge, F. J. (2019a). How to infer explanations from computer simulations. Studies in History and Philosophy of Science Part A. https://doi.org/10.1016/j.shpsa.2019.12.003

Boge, F. J. (2019b). Why computer simulations are not inferences, and in what sense they are experiments. European Journal for Philosophy of Science, 9(1), 13.

Boge, F. J., & Zeitnitz, C. (2020). Polycratic hierarchies and networks: What simulation-modeling at the LHC can teach us about the epistemology of simulation. Synthese. https://doi.org/10.1007/s11229-020-02667-3

Bokulich, A. (2011). How scientific models can explain. Synthese, 180(1), 33–45.

Bokulich, A. (2014). How the tiger bush got its stripes: ‘How possibly’ vs. ‘how actually’ model explanations. The Monist, 97(3), 321–338.

Buckner, C. (2018). Empiricism without magic: Transformational abstraction in deep convolutional neural networks. Synthese, 195(12), 5339–5372.

Buckner, C. (2019). Deep learning: A philosophical introduction. Philosophy Compass, 14(10), e12625.

Buckner, C. (2020). Understanding adversarial examples requires a theory of artefacts for deep learning. Nature Machine Intelligence, 2(12), 731–736.

Buckner, C. J. (2021). Black boxes, or unflattering mirrors? Comparative bias in the science of machine behavior. The British Journal for the Philosophy of Science. https://doi.org/10.1086/714960

Burrell, J. (2016). How the machine ‘thinks’: Understanding opacity in machine learning algorithms. Big Data & Society, 3(1), 1–12.

Bzdok, D., Altman, N., & Krzywinski, M. (2018). Statistics versus machine learning. Nature Methods, 15(4), 233–234.

Card, D., Mas, A., & Rothstein, J. (2008). Tipping and the dynamics of segregation. The Quarterly Journal of Economics, 123(1), 177–218.

Chang, S., Cohen, T., & Ostdiek, B. (2018). What is the machine learning? Physical Review D, 97(5), 6.

Chattopadhyay, A., Hassanzadeh, P., & Pasha, S. (2020). Predicting clustered weather patterns: A test case for applications of convolutional neural networks to spatio-temporal climate data. Scientific Reports, 10(1), 1–13.

Chirimuuta, M. (2020). Prediction versus understanding in computationally enhanced neuroscience. Synthese. https://doi.org/10.1007/s11229-020-02713-0

Cho, A. (2017). AI’s early proving ground: The hunt for new particles. Science, 357(6346), 20.

Clark, W. A. (1991). Residential preferences and neighborhood racial segregation: A test of the Schelling segregation model. Demography, 28(1), 1–19.

Craver, C. F. (2006). When mechanistic models explain. Synthese, 153(3), 355–376.

Creel, K. A. (2020). Transparency in complex computational systems. Philosophy of Science, 87(4), 568–589.

Davies, P. L. (2014). Data analysis and approximate models. CRC Press.

de Regt, H. (2017). Understanding scientific understanding. Oxford University Press.

Douglas, H. E. (2009). Reintroducing prediction to explanation. Philosophy of Science, 76(4), 444–463.

Duncan, A., & Janssen, M. (2019). Constructing quantum mechanics (Vol. 1). Oxford University Press.

Durán, J. M. (2017). Varying the explanatory span: Scientific explanation for computer simulations. International Studies in the Philosophy of Science, 31(1), 27–45.

Durán, J. M. (2018). Computer simulations in science and engineering. Springer Nature.

Durán, J. M. (2020). What is a simulation model? Minds and Machines, 30(3), 301–323.

European Strategy Group. (2020). 2020 update of the European strategy for particle physics. http://europeanstrategyupdate.web.cern.ch/sites/europeanstrategyupdate.web.cern.ch/files/CERN-ESU-015-2020

Farina, M., Nakai, Y., & Shih, D. (2020). Searching for new physics with deep autoencoders. Physical Review D, 101(7), 075021.

Franklin, L. R. (2005). Exploratory experiments. Philosophy of Science, 72(5), 888–899.

Frigg, R., & Reiss, J. (2009). The philosophy of simulation: Hot new issues or same old stew? Synthese, 169(3), 593–613.

Giere, R. (2006). Scientific perspectivism. University of Chicago Press.

Goodfellow, I. J., Shlens, J., & Szegedy, C. (2014). Explaining and harnessing adversarial examples. http://arxiv.org/abs/1412.6572

Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. The MIT Press.

Grimm, S. R. (2010). The goal of explanation. Studies in History and Philosophy of Science Part A, 41(4), 337–344.

Grüne-Yanoff, T. (2013). Appraising models nonrepresentationally. Philosophy of Science, 80(5), 850–861.

Guidotti, R., Monreale, A., Ruggieri, S., Turini, F., Giannotti, F., & Pedreschi, D. (2018). A survey of methods for explaining black box models. ACM Computing Surveys, 51(5), 1–42.

Hacking, I. (1992). The self-vindication of the laboratory sciences. In A. Pickering (Ed.), Science as practice and culture (pp. 29–64). The University of Chicago Press.

Harris, T. (2003). Data models and the acquisition and manipulation of data. Philosophy of Science, 70(5), 1508–1517.

Hasse, H., & Lenhard, J. (2017). Boon and bane: On the role of adjustable parameters in simulation models. In J. Lenhard & M. Carrier (Eds.), Mathematics as a tool (pp. 93–116). Springer.

Hastie, T., Tibshirani, R., & Friedman, J. (2013). The elements of statistical learning: Data mining, inference, and prediction. Springer.

Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2(5), 359–366.

Humphreys, P. (2004). Extending ourselves: Computational science, empiricism, and scientific method. Oxford University Press.

Humphreys, P. (2009). The philosophical novelty of computer simulation methods. Synthese, 169(3), 615–626.

Humphreys, P. (2013). Data analysis: Models or techniques? Foundations of Science, 18(3), 579–581.

Huntingford, C., Jeffers, E. S., Bonsall, M. B., Christensen, H. M., Lees, T., & Yang, H. (2019). Machine learning and artificial intelligence to aid climate change research and preparedness. Environmental Research Letters, 14(12), 124007.

Jordan, M. I., & Mitchell, T. M. (2015). Machine learning: Trends, perspectives, and prospects. Science, 349(6245), 255–260.

Karaca, K. (2013). The strong and weak senses of theory-ladenness of experimentation. Science in Context, 26(01), 93–136.

Karaca, K. (2018). Lessons from the large hadron collider for model-based experimentation. Synthese, 195(12), 5431–5452.

Kasabov, N. (2019). Time-space, spiking neural networks and brain-inspired artificial intelligence. Springer.

Khalifa, K. (2017). Understanding, explanation, and scientific knowledge. Cambridge University Press.

Lakatos, I. (1970). Falsification and the methodology of scientific research programmes. In I. Lakatos & A. Musgrave (Eds.), Criticism and the growth of knowledge: Proceedings of the International Colloquium in the Philosophy of Science, London, 1965 (pp. 91–196). Cambridge University Press.

LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521, 436–444.

Lenhard, J. (2006). Surprised by a nanowire: Simulation, control, and understanding. Philosophy of Science, 73(5), 605–616.

Lenhard, J. (2007). Computer simulation: The cooperation between experimenting and modeling. Philosophy of Science, 74(2), 176–194.

Lenhard, J. (2019). Calculated surprises. Oxford University Press.

Lenhard, J., & Winsberg, E. (2010). Holism, entrenchment, and the future of climate model pluralism. Studies in History and Philosophy of Modern Physics, 41(3), 253–262.

Leonelli, S. (2016). Data-centric biology. University of Chicago Press.

Leonelli, S. (2019). What distinguishes data from models? European Journal for Philosophy of Science, 9(2), 22.

López-Rubio, E. (2020). Throwing light on black boxes: Emergence of visual categories from deep learning. Synthese. https://doi.org/10.1007/s11229-020-02700-5

Lyre, H. (2008). Does the Higgs mechanism exist? International Studies in the Philosophy of Science, 22(2), 119–133.

Lyre, H. (2020). The state space of artificial intelligence. Minds and Machines. https://doi.org/10.1007/s11023-020-09538-3

Maher, P. (1988). Prediction, accommodation, and the logic of discovery. PSA: Proceedings of the Biennial Meeting of the Philosophy of Science Association, 1988(1), 273–285.

Massimi, M. (2005). Pauli’s exclusion principle: The origin and validation of a scientific principle. Cambridge University Press.

McCrea, R. (2009). Explaining sociospatial patterns in South East Queensland, Australia. Environment and Planning A: Economy and Space, 41(9), 2201–2214.

Minsky, M., & Papert, S. A. (1969). Perceptrons: An introduction to computational geometry. MIT Press.

Morello, V., Barr, E., Bailes, M., Flynn, C., Keane, E., & van Straten, W. (2014). SPINN: A straightforward machine learning solution to the pulsar candidate selection problem. Monthly Notices of the Royal Astronomical Society, 443(2), 1651–1662.

Morrison, M. (1999). Models as autonomous agents. In M. Morrison & M. S. Morgan (Eds.), Models as mediators (pp. 38–65). Cambridge University Press.

Morrison, M. (2015). Reconstructing reality: Models, mathematics, and simulations. Oxford University Press.