On the automation of Science - an incomplete account

Introduction

In this essay, I investigate the question: can science be automated by artificial intelligence?

To investigate arguments for and against the possibility of science-AI, I will aim to answer the following question: is AI, given its own functional logic, able to implement the functional logic that underlies science?  

What is a functional logic, and how does it help us to answer whether it is possible to automate science by means of AI? The idea is to characterize and then contrast what one might consider the “two sides of an equation”, i.e., the respective functional logics of science-making and AI. On one end, we need to understand the functioning of science-making. Insofar as the scientific endeavor is successful in a way that, say, mere doing or mere thinking is not, what is it about science-making that can account for that? On the other end, we need to characterize the functioning of (machine learning (ML)-based) AI. How does it work and what performances can it (not) achieve, in principle and under specific conditions? Then, we can contrast these two sets of “functional logics” and evaluate whether the functional logic of ML is able to implement the functional logic of science-making, or whether there are (fundamental or practical) limitations which would preclude this. The idea is that, if the functional logic of science is not expressible by the functional logic of ML, we can conclude that, at least within the ML paradigm, full-scale AI-automated science (what I henceforth simply refer to as “science-AI”) is not feasible. 

I proceed as follows: I have just introduced the high-level approach I am taking. Next, before I can properly dive into the discussion, I briefly discuss what might motivate us to ask this question, and then make a few clarifying remarks concerning the terminology used. With this out of the way, I can start evaluating the arguments. In Part 1, I lay out a picture which aims to reject the science-AI conjecture. The argument is, in short, that science requires a) “strong generalization”, i.e., the ability to come up with sensible abstractions (i.e., inductive reasoning), and b) “deductive reasoning”, i.e., the ability to use those abstractions sensibly and reliably. In both cases, there are reasons to doubt whether the functional logic of ML systems allows them to do this, or to do it sufficiently well. Parts 2 and 3 explore two plausible ways to rebut this skeptical view. The first explores whether the fact that ML systems are (in the limit) universal function approximators can recover the possibility of science-AI. The second observes that humans, too, show cognitive limitations (arguably not dissimilar from the ones I will discuss in the context of ML systems), while still arguably being capable of science-making. I will conclude that, based on the objections raised in Parts 2 and 3, the argument against the possibility of science-AI does not succeed.

Motivations

Why might one be interested in investigating the possibility of science-AI? 

First, we may hope to learn things not only about the current and prospective capability level (and limits thereof) of AI systems, but, quite plausibly, also about the nature of science-making itself. As such, this question appears to have intellectual merit in its own right. 

Furthermore, the question is of interest on more practical grounds, too. The scientific revolution brought about a massive acceleration in humans’ ability to produce knowledge and innovation, which in turn has led to astonishing improvements in the quality of human life. As such, the prospect of automating scientific progress appears promising. At the same time, AI capable of automating science may also be dangerous. After all, scientific progress has also produced technologies that pose immense risks to the wellbeing and survival of humanity, including nuclear weapons, the ability to engineer pathogens, and technologies that facilitate mass surveillance and oppression by states or other powerful actors (Bostrom, 2019). As such, if we knew that science-AI was possible, this ought to motivate us to adopt caution and start working on improved societal and governance protocols to help use these capabilities safely and justly.  

Finally, there are serious worries that the growing adoption of AI applications contributes to an “epistemic crisis”, which poses a threat (in particular) to decision-making in democratic societies (e.g., Seger et al., 2020). Among other things, these systems can be used to generate text, images, video, and voice recordings which do not necessarily represent reality truthfully and which people might interpret as real even if fake. As such, if we were capable of building AI systems that systematically skew towards truth (as opposed to, say, being riddled with the sort of confabulations that we can see in state-of-the-art language models (Ji et al., 2023)), this may help decrease such epistemic risks. 

Some clarifications

As mentioned, a few brief points of clarification are in order before I can properly dive into discussing the (im)possibility of science-AI; in particular, with respect to how I will be using the terms “science”, “AI”, and “automation”. 

First, there does not exist broad consensus among the relevant epistemic community as to what the functional logic (or logics) of science-making is, nor can it be taken as a given that there exists a singular such functional logic. For example, philosophers like Karl Popper have questioned the validity of any use of inductive reasoning in science and instead put front and center the deductive process of falsification (Popper, 1934; Popper, 1962). In contrast, other accounts of science-making do very much rely on inductive processes, such as Bayesian statistics, in order to evaluate how much a given hypothesis is supported by the available evidence (e.g., Sprenger, Hartmann, 2019). In the context of this essay, however, I am not trying to settle the question of the correct account of the scientific method; I instead adopt a degree of methodological pluralism and evaluate the conceptual plausibility of end-to-end science-AI in accordance with several of the most prominent accounts of the scientific method.  

Second, I need to clarify what I do and don’t mean by artificial intelligence. The term in principle refers to a putative technology that implements intelligent behavior through artificial means (e.g., on silicon); it does not, on its own, specify how it does so. To clarify matters in the context of this essay, I (unless otherwise specified) always refer to implementations of AI that reside within the paradigm of ML. I believe this is a justified assumption to make because ML is the currently dominant paradigm in the field of AI, it is the most successful paradigm to date, and there is no particular reason to expect this trend to halt in the near future. Having fixed a technical paradigm for AI, we can reason more substantively about the possibilities and limitations of ML-based AI systems when it comes to science-making by drawing on fields such as ML theory and optimization theory. 

Third, when talking about the automation of science, one might have in mind partial automation (i.e., the automation of specific tasks that are part of, but don’t comprise the whole of, the scientific enterprise), or a full, end-to-end automation of the scientific process by means of AI. In the context of this essay, I primarily focus on the conceptual plausibility of the latter: end-to-end science-AI. The line of demarcation I wish to draw is not about whether the automation involves a single AI application or multiple ones (e.g., an assembly), but rather whether human scientists are still a required part of the process (as is the case for what I call partial automation) or not (as is the case for what I call end-to-end automation or end-to-end science-AI). 

With this out of the way, it is now time to dive into the discussion.

Part 1: Contra science-AI 

In this section, I lay out the case against the possibility of science-AI. In short, I argue that autonomous scientific reasoning requires i) the ability to form sensible abstractions which function as bases for generalizing knowledge from past experience to novel environments, and ii) the ability to use such abstractions reliably in one’s process of reasoning, thereby accessing the power of deductive or compositional reasoning. However, or so the argument goes, ML systems are not appropriately capable of forming such abstractions and of reasoning with them. 

First, let me clarify the claim that abstractions and deductive reasoning play central roles in science-making. Generalization refers to the ability to apply insights to a situation that is different from what has previously been encountered. Typically, this form of generalization is made possible by forming the “right” abstractions, i.e., ones that are able to capture those informational structures that are relevant to a given purpose across different environments (Chollet, 2019). When I invoke the concept of a dog, for example, I don’t have a specific dog in mind, although I could probably name specific dogs I have encountered in the past, and I could also name a number of features that dogs typically (but not always) possess (four legs, fur, dog ears, a tail, etc.). The “dog” case could be understood as an example of relatively narrow abstraction. Think now, instead, of the use of concepts like “energy”, “mass”, or “photon” in physics, or of a “set”, “integration”, or “equation” in mathematics. Those concepts are further removed from any specific instances of things which I can access directly via sensory data. Nevertheless, these abstractions are extremely useful in that they allow me to do things I couldn’t have done otherwise (e.g., predict the trajectory of a ball hit at a certain angle with a certain force).  

Scientific theories critically rely on abstraction because theories are expressed in terms of abstractions and their functional relationships to each other. (For example, the principle of mass–energy equivalence describes the relationship between two abstractions—“energy” and “mass”; in particular, this relationship can be expressed as E = mc².) The use of abstractions is what endows a theory with explanatory power beyond the merely specific, contingent example that has been studied empirically. At the same time, the usefulness of a theory depends on the validity of the abstractions it makes use of. A theory that involves abstractions which do not carve reality sufficiently at its joints will very likely fail to make reliable predictions or produce useful explanations. 
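
To make concrete how such an abstraction licenses predictions far beyond any particular observation, here is a minimal worked instance of the relation (the choice of m = 1 kg is, of course, purely illustrative):

```latex
E = mc^2, \quad m = 1\,\text{kg}, \; c \approx 3\times 10^{8}\,\text{m/s}
\;\;\Rightarrow\;\; E \approx 1 \times \left(3\times 10^{8}\right)^{2} \approx 9\times 10^{16}\,\text{J}.
```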

Furthermore, the ability to form valid abstractions constitutes the basis for a second critical aspect of scientific cognition, namely, deductive and compositional reasoning. By deductive reasoning, I am referring to such things as deductive logic, arithmetic, sorting a list, and other tasks that involve “discrete” representations and compositionality (Chollet, 2020). In the case of science-making in particular, falsification and disconfirmation play a central role and are established by means of deductive reasoning, such as in the hypothetico-deductive account (e.g., Sprenger, 2011; Hempel, 1945). The ability to use, or reason over, abstractions allows for so-called “combinatorial generalization”. It is this compositionality of thought that, it has been argued, is a critical aspect of human-level intelligence, giving the reasoner access to a schema of “infinite use of finite means” (Humboldt, 1836; Chomsky, 1965). 

Having made the case for why science-making relies on the ability to i) form and ii) reason with abstractions, I can now investigate the arguments at hand for believing ML systems are not appropriately capable of i) and ii).

Reasons for skepticism come from empirical observation (i.e., using state-of-the-art models and seeing how they “break”), theoretical arguments, and expert judgment. Regarding the last of these, Cremer (2021) surveys “expert disagreement over the potential and limitations of deep learning”. With expert opinions diverging, Cremer identifies a set of plausible origins of these disagreements, central among them questions concerning the ability of artificial neural networks to “form abstraction representations effectively” and the extent of their ability to generalize (p. 7). 

To elaborate on the theoretical arguments for ML skepticism, it is worth exploring the ways in which ML methods face challenges in their ability to generalize (e.g., Chollet, 2017; Battaglia et al., 2018; Cartuyvels et al., 2021; Shanahan, Mitchell, 2022). ML uses statistical techniques to extract (“learn”) patterns from large swaths of data. It can be understood as aiming to approximate the underlying function which generated the data it is trained on. However, this interpolative learning leads to brittleness if the systems get deployed outside of the distribution of the training data. This phenomenon is well known in the ML literature and usually discussed under terms such as out-of-distribution (OOD) generalization failure. Under distributional shift (i.e., cases where the deployment data follow a different distribution than the training data), the approximation the model learned during training is no longer guaranteed to be accurate, leading to a generalization failure. The risk of failures to generalize, so the argument goes, limits the potential to use ML for end-to-end science automation because we cannot sufficiently trust the soundness of the process. 
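
To make the failure mode concrete, here is a minimal, self-contained sketch using a deliberately simple model (a least-squares line fit via numpy) in place of a deep network; the data-generating function, intervals, and noise level are illustrative assumptions, not taken from any of the cited works:

```python
import numpy as np

rng = np.random.default_rng(0)

# Training data: inputs drawn from a narrow interval, labels generated by y = x^2 (plus noise).
x_train = rng.uniform(0.0, 1.0, size=200)
y_train = x_train**2 + rng.normal(0.0, 0.01, size=200)

# "Learn" a simple approximation of the data-generating function (here: a straight line).
slope, intercept = np.polyfit(x_train, y_train, deg=1)

def predict(x):
    return slope * x + intercept

# In-distribution test: inputs from the training interval -> small error.
x_iid = rng.uniform(0.0, 1.0, size=200)
err_iid = np.mean(np.abs(predict(x_iid) - x_iid**2))

# Out-of-distribution test: inputs far outside the training interval -> error blows up.
x_ood = rng.uniform(5.0, 6.0, size=200)
err_ood = np.mean(np.abs(predict(x_ood) - x_ood**2))

print(f"mean absolute error, in-distribution:     {err_iid:.3f}")  # typically around 0.03
print(f"mean absolute error, out-of-distribution: {err_ood:.3f}")  # typically around 25
```

The point carries over to more expressive models: more capacity shrinks the in-distribution error, but by itself provides no guarantee about inputs drawn from a shifted distribution.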

Furthermore, ML systems are notoriously bad at discrete tasks (see, e.g., Marcus, 2018; Cartuyvels et al., 2021). While state-of-the-art ML systems are not incapable of (and are getting better at), say, simple arithmetic (e.g., adding up two-digit numbers), it is noteworthy that tasks which take only a few lines of code to automate reliably in the paradigm of classical programming have remained outside the reach of today’s several-billion-parameter ML models. To quote the AI researcher François Chollet, deliberately misquoting Geoffrey Hinton, a pioneer of deep learning: “Deep learning is going to be able to do everything perception and intuition, but not discrete reasoning” (Chollet, 2020). This unreliability in deductive reasoning exhibited by ML systems is another reason for skepticism towards the possibility of end-to-end science-AI. 
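
To illustrate the contrast being drawn here, the following is essentially all that classical programming needs in order to add integers exactly, for operands of arbitrary size; the example is mine, not Chollet’s or Marcus’s:

```python
def add(a: int, b: int) -> int:
    # Exact integer addition: no training data, no approximation error,
    # no failure modes tied to how "familiar" the operands are.
    return a + b

assert add(2, 3) == 5
assert add(10**100, 1) == 10**100 + 1  # exact even for operands far larger than any "training" example
```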

To summarize the argument, current ML-based AI systems appear to face limitations with respect to their ability to achieve “broad” generalization, to form sensible abstractions, and to use those abstractions reliably. Given these limitations, society would be ill-advised to rely on theories, predictions, and explanations proposed by science-AI. Of course, and this is worth noting, end-to-end science-AI is a high bar. The view presented above is entirely compatible with predicting that AI systems will be used to automate or augment many aspects of science-making, perhaps with only a few places left where humans “patch” the process.

Having elaborated on the case against the possibility of science-AI, I now move to investigating two plausible lines of reasoning aiming to defeat the suggested conclusion.

Part 2: Universal function approximation

The first argument that I will discuss against the purported limitations of ML builds on the claim that ML systems are best understood as universal function approximators (UFAs). From this follows the conjecture that there must exist a certain level of computational power at which ML systems are able to sufficiently approximate the science-making function. 

In short, universal function approximation refers to the property of neural networks that, for a broad class of functions f(x) (e.g., continuous functions on a compact domain), there exists a neural network that can approximate the function to any desired accuracy. There exist mathematical theorems proving versions of this property for different cases, e.g., for neural networks of arbitrary width (i.e., an arbitrary number of neurons) or arbitrary depth (i.e., an arbitrary number of layers), as well as in bounded cases (e.g., Hornik, Stinchcombe, White, 1989; Gripenberg, 2003). 
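
For concreteness, the arbitrary-width result can be stated informally as follows (a paraphrase of Hornik, Stinchcombe, White, 1989, not an exact quotation): for any continuous target function on a compact domain and any error tolerance ε > 0, some one-hidden-layer network with a suitable “squashing” activation σ lies within ε of the target everywhere on that domain:

```latex
\forall f \in C(K),\ K \subset \mathbb{R}^{n}\ \text{compact},\ \forall \varepsilon > 0:\quad
\exists N,\ \alpha_i, b_i \in \mathbb{R},\ w_i \in \mathbb{R}^{n}\ \text{such that}\quad
\sup_{x \in K}\ \Bigl|\, f(x) - \sum_{i=1}^{N} \alpha_i\, \sigma\!\left(w_i^{\top} x + b_i\right) \Bigr| < \varepsilon .
```

Note what the theorem does and does not say: it guarantees the existence of a sufficiently wide network, but it says nothing about whether training will find it, nor about how large N must grow for a given accuracy.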

Let’s say we accept that ML systems are accurately understood as UFAs, and, on that basis, that ML systems are able, in principle, to implement the functional logic of science-making. However, this picture raises an important question: (when) is approximation enough?

There is, after all, a difference between “the thing, precisely” and “the thing, approximately”. Or is there? Imagine you found a model M1 which approximates function F with an error of ε1. And imagine that the approximation is insufficient—that ε1 is too large for M1 to properly fulfill the function of F. Well, in that case, on the grounds of the universal approximation theorem, there exists another model M2 with ε2 < ε1. If ε2 is still too big, one can try M3, and so on. As such, you can, in principle, get arbitrarily close to “the thing”; in other words, the difference between “the thing” and its approximation gets arbitrarily small in the limit. 
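
The sequence M1, M2, M3, … can be made tangible with a toy experiment: freeze random hidden-layer weights, fit only the output weights by least squares, and watch how well a target function is approximated as the width grows. The target function and widths below are arbitrary illustrative choices; the error typically (though not strictly monotonically in every run) shrinks with width:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-np.pi, np.pi, 500)
target = np.sin(3 * x)  # the "true" function to be approximated

def max_approximation_error(width: int) -> float:
    # One hidden layer with random, frozen weights and a tanh activation;
    # only the linear output weights are fitted (by least squares).
    w = rng.normal(0.0, 2.0, size=width)
    b = rng.uniform(-np.pi, np.pi, size=width)
    hidden = np.tanh(np.outer(x, w) + b)                     # shape: (500, width)
    out_w, *_ = np.linalg.lstsq(hidden, target, rcond=None)  # fit output layer
    return float(np.max(np.abs(hidden @ out_w - target)))

for width in (3, 10, 30, 100, 300):
    print(f"width {width:4d}: max error ~ {max_approximation_error(width):.4f}")
```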

One might still object to this conceptual argument with a practical worry. It may be prohibitively expensive (in terms of energy, model size/chips, or time) to get arbitrarily close to the “true” function of science-making. However, I suggest we have pragmatic reasons not to be too worried by this concern. After all, we can hardly expect that human scientists always pick out the right abstractions when constructing their theories. Moreover, most feats of engineering rely on theories that we know use abstractions that aren’t completely true, and yet have been shown to be “sufficiently” true (in a pragmatist sense) in that they produce useful epistemic products (including bridges that don’t collapse and airplanes that stay in the air). For example, the framework of classical physics was, in some sense, proven wrong by Einstein’s theories of relativity. And yet, most engineering programs are entirely happy to work within the classical framework. As such, even if ML systems “only” approximate the function of science-making, we have every reason to expect that they are capable of finding sufficient approximations such that, for all practical purposes, they will be capable of science-making. 

Finally, science-AI need not be a monolithic structure consisting of a single ML model and its learned behavior policy. Instead, we can imagine a science-AI assembly which, for example, trains “abstraction forming” and “deductive reasoning” circuits separately and later combines them so that they interface with each other autonomously. This idea of a compositional science-AI resembles the vision of the Society of Mind sketched by Marvin Minsky in 1986, in which he argues that human intelligence emerges from the interactions of many simple “agents” with narrow skills or functions. Moreover, we can even use ML to discover which forms of compositionality (i.e., “task division”) might be best suited for a science-AI assembly, insofar as my earlier, admittedly vague suggestion of integrating an “abstraction forming” and a “deductive reasoning” circuit might not be the ideal solution. There already exist examples of current-day ML systems trained on similar ideas (e.g., Gururangan et al., 2023); a rough sketch of what such an assembly might look like follows below. 
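
Purely as an illustration of what “assembly” means here, the interfaces of such a system might look roughly like the sketch below; every name and method signature in it is hypothetical, and it describes no existing system (including the one in Gururangan et al., 2023):

```python
from typing import List, Protocol


class AbstractionModule(Protocol):
    """Hypothetical component: proposes candidate abstractions from raw observations."""
    def propose_abstractions(self, observations: List[dict]) -> List[str]: ...


class DeductionModule(Protocol):
    """Hypothetical component: derives testable predictions from a set of abstractions."""
    def derive_predictions(self, abstractions: List[str]) -> List[str]: ...


def assembly_step(observations: List[dict],
                  inducer: AbstractionModule,
                  deducer: DeductionModule) -> List[str]:
    # One iteration of the imagined loop: induce abstractions, then deduce predictions
    # from them. A fuller assembly would add experiment selection, falsification,
    # and revision of the abstractions based on the outcomes.
    abstractions = inducer.propose_abstractions(observations)
    return deducer.derive_predictions(abstractions)
```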

To summarize, I have argued that UFA theorems prove that AI systems—contra the skeptical picture laid out in Part 1—are in principle able to implement science-making. I further provided arguments for why we can expect this technology to not only be conceptually feasible but also practically plausible. 

Part 3: The possibility of science-making despite limitations 

Let us now turn to the second argument against the skeptical picture proposed in Part 1. This argument starts by conceding that ML systems face relevant limitations in their ability to form and reliably use abstractions. However, the argument continues, so do humans (and human scientists), and still, they are (arguably) capable of doing science. Thus, the argument about the inductive limits of ML systems cannot, on its own, defeat the possibility of science-AI. 

To unravel this argument, let us first discuss the claim that both ML and human “reasoners” are limited, and limited in relevantly similar ways. I have already laid out the case for limitations in ML which arise from its fundamentally continuous and inferential nature. According to our current best theories of human cognition—such as the Bayesian Brain Hypothesis (e.g., Deneve, 2004; Doya et al., 2007; Knill, Pouget, 2004), Predictive Processing (e.g., Clark, 2013; Clark, 2015; Kanai et al., 2015), and, most recently, Active Inference (Parr, Pezzulo, Friston, 2022)—the brain can essentially be understood as a “large inference machine”. As such, the low-level implementation of human reasoning is understood to be similarly continuous and inferential. 
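
The shared formal core of these accounts is (approximate) Bayesian updating: a hypothesis h about the causes of sensory evidence e is re-weighted according to

```latex
P(h \mid e) \;=\; \frac{P(e \mid h)\, P(h)}{P(e)}, \qquad
P(e) \;=\; \sum_{h'} P(e \mid h')\, P(h'),
```

with the predictive-processing and active-inference variants holding, roughly, that the brain does not compute this posterior exactly but instead minimizes a tractable upper bound on surprise (the free energy), thereby approximating it.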

This is, of course, not to deny that humans exhibit higher-level cognitive skills, such as verbal reasoning or metacognition, which are correctly understood to exceed “mere statistics”. Rather, the point I am trying to make is that these higher-level capabilities emerge from the low-level (continuous and inferential) implementation of the neural make-up of the brain. This serves as an existence proof that this sort of low-level implementation can, under certain circumstances, give rise to the capacities more typically associated with “Type 2” reasoning (Kahneman, 2017). As such, the argument presented in Part 1—that, given the functional logic of modern-day ML, AI will not be able to implement all necessary aspects of scientific reasoning (such as generalization or deductive reasoning)—does not prove what it was meant to prove (the impossibility of science-AI). 

Furthermore, this also shows that a cognitive process need not be flawless in order to implement science-making. Human reasoning is, of course, not without flaws. For example, human scientists regularly pick “wrong” abstractions (e.g., “phlogiston” or “ether”—to name two famous cases from the history of science). Nor are human scientists immune to motivated reasoning and cognitive biases such as confirmation bias or hypothesis myopia (Nuzzo, 2015). The point is that these flaws in human reasoning—whether they stem from structural limitations or mere computational boundedness—have not prevented humans from developing and conducting science successfully. 

This last point raises an interesting question about the nature of science-making. Given the plentiful sources of bounded, flawed, and motivated reasoning displayed by human scientists, how are they still capable of producing scientific progress? One way to make sense of this (plausibly surprising) observation is to understand science as an essentially collective endeavor. In other words, individual scientists don’t do science; scientific communities do. The idea is that science-making—a process that systematically skews towards the truth—emerges from implementing a collective “protocol” which, so to speak, “washes out” the biased reasoning present at the level of individual scientists. Bringing this back to science-AI, this raises the question of whether we should think of science-AI as a single system approximating ideal scientific reasoning, or as a system assembly in which each individual system can have flaws in its epistemic processes, but the way they interact produces behavior equivalent to science-making—just as is the case for human scientists interacting today. 
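
A toy simulation can make the “washing out” idea precise, under the loud assumption that individual biases are idiosyncratic (each scientist errs in their own direction); a bias shared by the whole community would, of course, not be removed by aggregation:

```python
import numpy as np

rng = np.random.default_rng(0)
true_value = 1.0      # the quantity the community is collectively trying to estimate
n_scientists = 200

# Assumption: each scientist has their own (idiosyncratic) bias plus independent noise.
individual_bias = rng.normal(0.0, 0.5, size=n_scientists)
noise = rng.normal(0.0, 0.2, size=n_scientists)
estimates = true_value + individual_bias + noise

print(f"typical individual error:    {np.mean(np.abs(estimates - true_value)):.3f}")  # roughly 0.4
print(f"error of the pooled average: {abs(np.mean(estimates) - true_value):.3f}")     # roughly 0.04
```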

To summarize, the argument presented here is two-fold. On one hand, the human reasoning ability is implemented by a continuous and inferential low-level process, serving as an existence proof that such processes (which we also find in machine learning) are in principle able to implement discrete tasks with adequate levels of robustness. On the other hand, science-making is implemented by fallible human reasoners who make mistakes similar in type to the ones discussed in Part 1 (e.g., picking leaky abstractions or misgeneralizing them), serving as an existence proof that processes which are fallible in this way can still implement science-making. 

Conclusion

In this essay, I explored the conceptual possibility of end-to-end science-AI, i.e., an AI system or assembly of systems which is able to functionally implement science-making with no help from humans (post-training). In Part 1, I first made the case that end-to-end science-AI is not possible, pointing to limitations of ML systems when it comes to their ability to form useful abstractions and to use these abstractions reliably. I argued that ML, given that it is based on interpolative learning from a given set of (training) data, faces important challenges in terms of its ability to generalize outside of its training data in the case of known or unknown distributional shifts upon deployment. Furthermore, I invoked the fact that ML systems are currently unreliable (or, at the very least, inefficient) at “discrete” types of reasoning. After developing this skeptical picture, I explored two sets of arguments which seek to recover the possibility of science-AI. 

First, I argued that ML systems are universal function approximators and that, in that capacity, there must exist a computational threshold at which they are able to implement the functional logic of science. Furthermore, I argued that there are pragmatic reasons to accept that this is not only conceptually possible but practically feasible insofar as approximation is enough, as evidenced by the fact that successful scientific and engineering feats, as a norm, rely “merely” on approximate truths. 

Second, I compared ML systems to human scientists, claiming that, on one hand, the neurological implementation of human reasoning is structurally similar to ML, thus suggesting that ML methods can be expected to successfully scale to “higher-level” reasoning capabilities (including ones that appear particularly critical to science-making). On the other hand, the comparison also reveals how humans are capable of doing science despite the fact that the reasoning of individual humans is flawed in important ways. As such, some amount of brittleness in ML systems does not mean that they cannot successfully implement the scientific process. Taken together, the arguments discussed in Parts 2 and 3 succeed in defending the possibility of science-AI against the skeptical view laid out in Part 1. Beyond defending the conceptual possibility claim, the arguments also provide some support for the concrete, practical plausibility of science-AI. 

Let us conclude with one more evocative thought based on the analogy between ML and scientific reasoning explored over the course of this essay. Concerns about the generalization limits of ML systems pose an important problem: we need to be able to trust the systems we’re using, or—rather—we want to be able to know when and how much we are justified in trusting these systems. Epistemic justification—which I am taking, for current purposes, to be a function of the reliability of a given epistemic process—is always defined relative to a given domain of application. This suggests that we want AI systems (among other things) to contain meta-data about their domain of applicability (i.e., the domain within which their generalization guarantees hold). What I want to suggest here is that the same insight also applies to scientific theories: we should more consistently strive to develop scientific theories which are—as an integral part of what it is to be a scientific theory—transparent about their domain of applicability, i.e., about the domain relative to which the theory does or does not claim that its predictions will generalize.  

References 

Battaglia, P. W., et al. (2018). Relational inductive biases, deep learning, and graph networks. arXiv preprint arXiv:1806.01261.

Bender, E. M., et al. (2021). On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? Association for Computing Machinery. In Proceedings of the 2021 ACM conference on fairness, accountability, and transparency (pp. 610-623).

Millidge, B., et al. (2021). Predictive coding: a theoretical and experimental review. arXiv preprint arXiv:2107.12979.

Bostrom, N. (2019). The vulnerable world hypothesis. Global Policy, 10(4), 455-476.

Chang, H. (2022). Realism for Realistic People: A New Pragmatist Philosophy of Science. Cambridge University Press.

Chollet, F. (2017). The limitations of deep learning. Deep learning with Python. Retrieved from: https://blog.keras.io/the-limitations-of-deep-learning.html 

Chollet, F. (2019). On the measure of intelligence. arXiv preprint arXiv:1911.01547.

Chollet, F. (2020). Why abstraction is the key to intelligence, and what we’re still missing. Talk at NeurIPS 2020. Retrieved from: https://slideslive.com/38935790/abstraction-reasoning-in-ai-systems-modern-perspectives 

Chomsky, N. (1965). Aspects of the Theory of Syntax. MIT Press.

Clark, A. (2013). Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behavioral and Brain Sciences, 36(3), 181-204. doi:10.1017/S0140525X12000477.

Clark, A. (2015). Surfing Uncertainty: Prediction, Action, and the Embodied Mind. Oxford Academic.

Cremer, C. (2021). Deep limitations? Examining expert disagreement over deep learning. Progress in Artificial Intelligence, 10. doi:10.1007/s13748-021-00239-1.

Cartuyvels, R., Spinks, G., & Moens, M. F. (2021). Discrete and continuous representations and processing in deep learning: Looking forward. AI Open, 2, 143-159.

Deneve, S. (2004). Bayesian inference in spiking neurons. Advances in neural information processing systems, 17.

De Regt, H. W. (2017). Understanding Scientific Understanding. New York: Oxford University Press.

Doya, K., Ishii, S., Pouget, A., & Rao, R. P. (Eds.). (2007). Bayesian brain: Probabilistic approaches to neural coding. MIT press.

Gripenberg, G. (2003). Approximation by neural networks with a bounded number of nodes at each level. Journal of Approximation Theory. 122 (2): 260–266. 

Gururangan, S., et al. (2023). Scaling Expert Language Models with Unsupervised Domain Discovery. arXiv preprint arXiv:2303.14177.

Hempel, C. G. (1945). Studies in the Logic of Confirmation (II.). Mind, 54(214), 97–121. 

Hendrycks, D., et al. (2020). The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 8340-8349).

Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer feedforward networks are universal approximators. Neural networks, 2(5), 359-366.

Humboldt, W. (1999/1836). On Language: On the diversity of human language construction and its influence on the mental development of the human species. Cambridge University Press.

Ji, Z., et al. (2023). Survey of hallucination in natural language generation. ACM Computing Surveys, 55(12), 1-38.

Kahneman, D. (2017). Thinking, Fast and Slow.

Kanai, R., et al. (2015). Cerebral hierarchies: predictive processing, precision and the pulvinar. Philosophical Transactions of the Royal Society B, 370, 20140169.

Knill, D. C., & Pouget, A. (2004). The Bayesian Brain: The Role of Uncertainty in Neural Coding and Computation. TRENDS in Neurosciences, 27(12), 712–719.

Marcus, G. (2018). Deep learning: A critical appraisal. arXiv preprint arXiv:1801.00631.

Mitchell, M. (2021). Abstraction and analogy‐making in artificial intelligence. Annals of the New York Academy of Sciences, 1505(1), 79-101.

Nuzzo, R. (2015). How Scientists Fool Themselves — and How They Can Stop. Nature. 526, 182. https://doi.org/10.1038/526182a. 

Parr, T., Pezzulo, G., & Friston, K. J. (2022). Active Inference: The Free Energy Principle in Mind, Brain, and Behavior. MIT Press.

Peters, U., et al. (2022). Generalization Bias in Science. Cognitive Science, 46: e13188. 

Popper, K. (1934). The Logic of Scientific Discovery. London, England: Routledge.

Popper, K. (1962). Conjectures and Refutations: The Growth of Scientific Knowledge. London, England: Routledge.

Seger, E., et al. (2020). Tackling threats to informed decision-making in democratic societies: promoting epistemic security in a technologically-advanced world. The Alan Turing Institute. 

Shanahan, M., & Mitchell, M. (2022). Abstraction for deep reinforcement learning. arXiv preprint arXiv:2202.05839.

Sprenger, J. (2011). Hypothetico-Deductive Confirmation. Philosophy Compass, 6: 497-508. 

Sprenger, J., & Hartmann, S. (2019). Bayesian Philosophy of Science: Variations on a Theme by the Reverend Thomas Bayes. Oxford and New York: Oxford University Press.

Trask, A., et al. (2018). Neural Arithmetic Logic Units. Advances in neural information processing systems, 31.

Valiant, L. G. (1984). A theory of the learnable. Communications of the ACM, 27(11), 1134-1142.

Epistemic justification in (Hu)man and Machine

What does it take for a belief to be epistemically justified? In the hope of providing a novel angle to this long-standing discussion, I will investigate the question of epistemic justification by means of considering not only (what one might call) ‘classical’ cases, but also ‘machine’ cases. Concretely, I will discuss whether—and, if so, on what basis—artificial systems instantiating intelligent behaviour can be said to form epistemically justified ‘beliefs’. This will serve as a sort of thought experiment or case study used to test plausible answers to the problem of epistemic justification and, potentially, derive inspirations for novel ones.

Why do I choose to adopt this methodological approach? Consider, by comparison, the classic question in biology: what is life? Fields such as astrobiology or artificial life allow us to think about this question in a more (and more appropriately) open-minded way—by helping us to uproot unjustified assumptions about what life can and cannot look like based on sampling from Earth-based forms of life alone. The field of artificial intelligence can serve a similar function vis-à-vis philosophical inquiry. Insofar as we aspire for our theories—including our theories of knowledge and epistemic justification—to be valid beyond the contingencies of human intelligence, insights from the study of AI stand in a fruitful intellectual symbiosis with philosophical thought. 

I will start our investigation into epistemic justification with a thought experiment. 

Rome: Consider Alice. While she is having dinner with her friends, the topic of her upcoming trip to Italy comes up. Alice explains that she will be taking a plane to Rome, Italy’s capital city, from where she will start her journey. 

It seems uncontroversial to say that Alice is epistemically justified in her belief that Rome is in fact the capital of Italy. The question I want to raise here is: in virtue of what is this the case? Before I delve into examining plausible answers to this question, however, let us compare the former story to a slightly different one. 

Rome’: In this case, Bob is playing around with the latest large language model trained and made available by one of the leading AI labs—let’s call it ChatAI. Bob plays with the model in order to get a handle on what ChatAI is and isn’t able to do. At one point, he submits the following query to the model: “What is the capital of Italy?”, and the model replies: “The capital city of Italy is Rome.” 

By analogy to the first case, should we conclude that the model is epistemically justified in its claim that Rome is the capital of Italy? And if not, how are these two cases different? In what follows, I will investigate these questions in more detail, considering various approaches attempting to clarify what amounts to epistemic justification. To do so, I will toggle between considering the traditional (or human) case and the machine case of epistemic justification and study whether this dialogue can provide insight into the question of epistemic justification. 

Correctness (alone) is not enough—process reliabilism for minds and machines

Thus, let us return to a question raised earlier: in virtue of what can we say Alice is justified in claiming that Rome is the capital of Italy? A first observation that appears pertinent is that Alice is correct in her statement. Rome is in fact the capital of Italy. While this appears relevant, it does not represent a sufficient condition for epistemic justification. To see why, we need only think of cases where someone is correct due to mere chance or accident, or even against their better judgement. You may ask me a question about a topic I have never heard of, and yet I might get the answer right by mere luck. Or, in an even more extreme case, we may play a game where the goal is to not give a correct answer. It is quite conceivable, in virtue of my utter ignorance of the topic, that I end up giving an answer that turns out to be factually correct, despite trying to pick an answer that I believe to be wrong. In the first case, I got lucky, and in the second case, I uttered the correct answer against my better judgement. In neither of these cases would my factually correct answer represent an epistemically justified correct answer. 

As such, I have shown that the truth condition (alone) is an insufficient account of epistemic justification. Furthermore, I have identified a particular concern: that epistemic justification is not given in cases where a claim is correct for arbitrary or ‘lucky’ reasons. This conclusion seems to be supported when considering the machine case. If, say, we designed a program that, when queried, iterated through a predefined set of answers and picked one of them at random, then, even if this program happened to pick the correct answers, we wouldn’t feel compelled to consider this a case of epistemic justification. Insofar as we are here taking issue with the arbitrariness of the answer-producing process when considering its status of epistemic justification, we may come to wonder what it would look like for a claim to be correct on a non-arbitrary or non-lucky basis. 

To that effect, let us consider the proposal of process reliabilism (Goldman, 1979, 1986). At its core, this theory claims that a belief is epistemically justified if it is the product of a belief-formation process that is systematically truth-conducive. In other words, while it is insufficient to observe that a process produces the correct answer on a single and isolated instance, if a process tends to produce the correct answer with a certain reliability, said process acts as a basis for epistemic justification according to the reliabilist thesis. Applied to our Rome case from earlier, the question is thus which processes (e.g., of information gathering and processing) led Alice to claim that Rome is the Italian capital, and whether these same processes have shown sufficient epistemic reliability in other cases. Let’s say that, in Alice’s case, she inferred her belief that Rome is the capital of Italy as follows. First, her uncle told her that he was about to emigrate to live in the capital city of Italy. A few weeks later, Alice received a letter from said uncle which was sent from, as she can tell by the postmark, Rome. From this, Alice infers that Rome must be the capital of Italy. As such, Alice’s belief is justified insofar as it involved the application of perception, rational reflection, or logical reasoning, rather than, say, guessing, wishful thinking, or superstitious reasoning. 

Furthermore, we don’t have to understand reliability here merely in terms of the frequency at which a process produces true answers. Instead, we can interpret it in terms of the propensity with which it does so. In the latter case, we capture a notion of truth-conduciveness that pertains not only to the actual world as observed, but is also cognizant of other possible worlds. As such, it aims to be sensitive to the notion that a suitable causal link is required between the given process and its epistemic domain, i.e., what the process is forming beliefs over. This renders the thesis more robust against unlikely but statistically possible cases where an arbitrary process gets an answer repeatedly correct, cases which would otherwise undermine the extent to which process reliabilism can serve as a suitable basis for epistemic justification. To illustrate this, consider the case of the scientific method, where we rely on empiricism to test hypotheses. This process is epistemically reliable not in virtue of getting true answers at a certain frequency, but in virtue of its procedural properties, which guarantee that the process will, sooner or later, falsify wrong hypotheses. 

To summarise, according to process reliabilism, a belief-formation process is reliable as a function of its propensity to produce true beliefs. Furthermore, the reliability (as defined just now) of a belief-formation process serves as the basis of epistemic justification for the resulting belief. How does this apply or not to the machine case from earlier (Rome’)? 

To answer this question, let us imagine that Bob continues to play with the model by asking it more questions about the capital cities of other countries. Assuming capabilities representative of the current state of the art in machine learning, and in large language models in particular, let us say that ChatAI’s responses to Bob’s questions are very often correct. We understand enough about how machine learning works that, beyond knowing that the model is merely frequently correct, we can deny that ChatAI (and comparable AI systems) produces correct answers by mere coincidence. In particular, machine learning exploits insights from statistics and optimization theory to implement a form of inference on its training data. To test and compare the performance of different models, the machine learning community regularly develops so-called ‘benchmarks’ that measure performance-relevant features of the model being evaluated, such as accuracy, speed, or (learning) efficiency. As such, AI systems can, given appropriate design and training, produce correct outputs with high reliability and for non-arbitrary reasons. This suggests that, according to process reliabilism, outputs from ChatAI (and comparable AI systems) can qualify as being epistemically justified. 
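
The kind of benchmark evaluation gestured at here can be sketched as follows; `query_model` stands in for whatever interface a system like ChatAI would expose, and both the function and the three test items are hypothetical:

```python
from typing import Callable

# A tiny, invented "capital cities" benchmark; real benchmarks contain thousands of items.
benchmark = [
    ("What is the capital of Italy?", "Rome"),
    ("What is the capital of France?", "Paris"),
    ("What is the capital of Japan?", "Tokyo"),
]

def accuracy(query_model: Callable[[str], str]) -> float:
    # Fraction of benchmark questions answered correctly: a frequency-based estimate
    # of how reliable the model's answer-producing process is within this domain.
    correct = sum(expected.lower() in query_model(question).lower()
                  for question, expected in benchmark)
    return correct / len(benchmark)
```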

Challenge 1: “You get out only what you put in”

However, the reliabilist picture as painted so far does not hold up to scrutiny. The first problem I want to discuss concerns the fact that, even if procedurally truth-conducive, a process can produce systematically incorrect outputs if it operates on wrong initial beliefs or assumptions. If, for example, Alice’s uncle was himself mistaken about what the capital of Italy is, thus moving to a city that he mistakenly thought was the capital, and if he had thereby, through his words and actions, passed on this mistaken belief to Alice, then the same reasoning process she used earlier to arrive at a (seemingly) epistemically justified belief would now have produced an incorrect belief. Differently put, someone’s reasoning might be flawless, but if it proceeds from false premises, its conclusions must be regarded as null in terms of their epistemic justification. 

A similar story can be told in the machine case. A machine learning algorithm seeking to identify the underlying statistical patterns of a given data set can only ever be as epistemically valid as the data set it is trained on. As a matter of fact, this is a widely discussed concern in the AI ethics literature, where ML models have been shown to reproduce biases present in their training sets. For example, language models have been shown (before corrective interventions were implemented) to associate certain professions (e.g., ‘CEO’ or ‘nurse’) predominantly with certain genders. Similarly, in the legal context, ML systems used to predict recidivism risk have been criticised for reproducing racial bias.  
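
The mechanism is easy to see in a deliberately minimal, co-occurrence-based sketch; the toy ‘corpus’ below is invented for illustration, and real training corpora exhibit analogous (if subtler) skews:

```python
from collections import Counter

# Invented toy corpus in which the profession 'nurse' co-occurs with gendered
# pronouns in a skewed way.
corpus = [
    "the nurse said she would help",
    "the nurse said she was busy",
    "the nurse said she had left",
    "the nurse said he would help",
]

# A purely statistical "model" of P(pronoun | 'nurse') simply inherits the skew.
pronoun_counts = Counter(sentence.split()[3] for sentence in corpus)
total = sum(pronoun_counts.values())
for pronoun, count in pronoun_counts.items():
    print(f"P({pronoun!r} | 'nurse') = {count / total:.2f}")
```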

What this discussion highlights is that the reliabilist thesis as I stated it earlier is insufficient. Let us therefore attempt to refine the thesis before discussing a second source of criticism that can be raised against it. The refined reliabilist thesis can be formulated as follows: for a belief to be epistemically justified, it needs to a) be the product of a truth-conducive process, and b) the premises on which said process operates to produce the belief in question must themselves be justified. 

As some might notice, this approach may run into a problem of regress. If justified belief requires that the premises on which the epistemic process operates must themselves be justified, how do those premises gain their justification other than by reference to a reliable process operating on justified premises? Without providing, in the context of this essay, a comprehensive account of how one may deal with this regress problem, I will offer a handful of pointers to attempts that have been made. 

A pragmatist, for example, may emphasise their interest in a process that can reliably produce useful beliefs. Since the usefulness of a belief is determined by its use, this does not fall prey to the regress challenge as stated above: a belief can be tested for its usefulness without making reference to another belief. Klein (1999), on the other hand, denies that the type of regress at hand is vicious in the first place, appealing to a view called infinitism. According to infinitism, justification requires an appropriate chain of reasons, and in the case of infinitism specifically, such chains take the form of non-repeating infinite ones. Finally, Goldman himself (2008) tackles the regress problem by differentiating between basic and non-basic beliefs, where the former are justified without reference to another belief, in virtue of being the product of an unconditionally reliable process. Such basic beliefs then represent a plausible stopping point for the regress. Perception has been proposed as a candidate for such an unconditionally reliable process, although one may object to this account by denying that it is possible, or common, for perceptual or empirical data to be entirely atheoretical. In any case, the essence of Goldman’s proposal, and of the proposals of externalist reliabilists in general, is that a belief is justified not with reference to reflectively accessible reasons (which is what internalists propose), but in virtue of the causal process that produced the belief, whether or not that process makes reference to other beliefs. As such, externalists are commonly understood to be able to dodge the regress bullet. 

For now, this shall suffice as a treatment of the problem of regress. I will now discuss another challenge to process reliabilism (including its refined version as stated above). It concerns questions regarding the domain in which the reliability of a process is being evaluated. 

Challenge 2: Generalization and its limits

To understand the issue at hand better, let’s consider the “new evil demon problem”, first raised by Cohen (1984) as a critique of reliabilism. The problem arises from the following thought experiment: Imagine a world WD in which there exists an epistemic counterpart of yours, let’s call her Anna, who is identical to you in every regard except one. She experiences precisely what you experience and believes precisely what you believe. According to process reliabilism, you are epistemically justified in your beliefs about your own world—let’s call it WO—on the basis of those beliefs being the product of truth-conducive processes such as perception or rational reasoning. In virtue of the same reasoning, Anna ought to be epistemically justified in her beliefs about her world. However, and this is where the problem arises, the one way in which Anna differs from you is that her experiences and beliefs about WD have been carefully curated by an evil demon with the aim of deceiving her. Anna’s world does not in fact exist in the way she experiences it. On a reliabilist account, or so some would argue, we would have to say that Anna’s beliefs are not justified, since her belief-formation processes do not reliably lead to correct beliefs. However, how can your counterpart, who is identical to you in every regard relevant to the reliabilist thesis, not be justified in her beliefs while you are in yours? The dilemma arises because many would intuitively say that Anna is just as justified in believing what she believes as we are, despite the fact that the process that produced Anna’s beliefs is unreliable. 

One way to cast the above problem—which also reveals a way to defuse it—is by indexing and then separately evaluating the reliability of the belief-formation processes for the different worlds, WO and WD. From here, as developed by Comesaña (2002), we can make the case that while the belief-formation processes are reliable in the case of WO, they are not in the case of WD. As such, the reliability of a process, and thus epistemic justification, must always be assessed relative to a specific domain of application. 

Another, similar approach to the same problem has been discussed, for example, by Jarrett Leplin (2007, 2009) by invoking the notion of ‘normal conditions’, a term originally introduced by Ruth Millikan in 1984. The idea is that the reliability of a process is evaluated with respect to the normal conditions of its functioning. Leplin defines normal conditions as “conditions typical or characteristic of situations in which the method is applicable” and explains that “[a] reliable method could yield a preponderance of false beliefs, if used predominantly under abnormal conditions” (Leplin, 2007, p. 33). As such, the new evil demon case can be understood as a case where the epistemic processes which are reliable in a demon-less world cease to be reliable in the demon world, since that world no longer complies with the ‘normal conditions’ that guarantee the functionality of said processes. While promising as an approach to addressing a range of challenges raised against reliabilism, there is, one must note, still work to do in terms of clearly formalising the notion of normality.

What both of these approaches have in common is that they seek to defend reliabilism against the new evil demon problem by specifying the domain or conditions in which the reliability of a process is evaluated. Instead of suggesting that, for a process to be reliable—and thus to serve as a basis for epistemic justification—it has to be universally reliable, these refinements to reliabilism seek to formalise a way of putting boundaries on the application space of a given process. As such, we can understand the new evil demon problem as an instance of a more general phenomenon: generalization and its limits. This way of describing the problem serves to clarify how the new evil demon problem relates to issues frequently discussed in the context of machine learning.

The problem of generalization in machine learning concerns the fact that machine learning, generally speaking, works by exploiting underlying patterns to approximate functions that efficiently describe the data encountered. While this approach (and others) has enabled impressive AI applications to date, it faces important limitations. In particular, this learning method is based on an assumption, commonly called the IID assumption (i.e., independent and identically distributed sampling), which says that the data set used in training must be representative of the data encountered upon deployment for there to be a guarantee of the effectiveness or accuracy of the learned model. In other words, while we have guarantees about a model’s performance (i.e., accuracy/loss) under the IID assumption, these guarantees no longer hold when the nature of the distribution changes, i.e., when we encounter what is called a distributional shift. Under distributional shift, whatever approximation function a model has learnt is no longer guaranteed to be effective in the new (deployment) environment. This is called a failure to generalise.
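
In slightly more formal terms (a standard statistical-learning formulation, not tied to any particular model class): writing D for the training distribution, ℓ for a loss function, and h for the learned model, the quantities that IID-based guarantees relate are

```latex
R_{D}(h) \;=\; \mathbb{E}_{(x,y)\sim D}\bigl[\ell\bigl(h(x),y\bigr)\bigr],
\qquad
\hat{R}_{n}(h) \;=\; \frac{1}{n}\sum_{i=1}^{n} \ell\bigl(h(x_i),y_i\bigr),
\qquad (x_i,y_i)\overset{\text{iid}}{\sim} D .
```

Typical generalization bounds control the gap between the true risk and the empirical risk with high probability over the sample, but they say nothing about the risk under a shifted deployment distribution D′ ≠ D; that unconstrained quantity is precisely where failures to generalise live.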

Let us reiterate the suggested analogy between the new evil demon problem and the problem of out-of-distribution generalization failures in machine learning. I claim that the demon world WD represents an ‘out-of-distribution’ case for the epistemic processes that are reliable in our world WO. Though Anna nominally uses the same processes, she uses them in an importantly different environment, which makes it unsurprising that they turn out to be unreliable in WD. After all, the reality of WD differs in fundamental ways from WO (namely, in the existence of the evil demon). Insofar as the thought experiment is intended to suggest that the demon itself may be subject to completely different fundamental laws than the ones that govern WO, the same processes that can approximate the fundamental laws of WO are not guaranteed to approximate the fundamental laws that govern WD. As such, I have vindicated process reliabilism from the evil demon problem by squaring what earlier appeared counterintuitive: the same processes that are reliable—and thus the basis for epistemic justification—in our world (WO) can turn out to be unreliable in an environment sufficiently foreign to ours, such as the demon world WD. 

Conclusion 

In this essay, I have set out to evaluate the question of epistemic justification. Most centrally, I discussed whether the proposal of process reliabilism may serve as a basis for justification. To this effect, I raised several challenges to process reliabilism. For example, I observed that a reliable process operating on false premises (or corrupted data) may cease to systematically produce correct beliefs. I then discussed ways to refine reliabilism to accommodate this concern, and how such refinements may or may not fall prey to a problem of regress. More practically speaking, I linked this discussion to the machine case by explaining how AI systems, even if they operate on reliable processes, may be corrupted in their ability to produce epistemically justified outputs by algorithmic bias resulting from having been trained on non-representative data samples. 

The second challenge to reliabilism I discussed concerns the details of how the reliability of a process should be evaluated. In particular, I identified a need to specify and bound a ‘domain of application’ in reference to which a process’s reliability is established. The goal of such a demarcation—which may come in the form of indexing as suggested by Comesaña, in the form of defining normal conditions as proposed by Leplin, or in some other way—is to be sensitive to (the limits of) a process’s ability to generalise. As such, over the course of this discussion, I developed a novel perspective on the new evil demon problem by casting it as an instance of a cluster of issues concerning generalisation and its limits. While the new evil demon problem is commonly raised as an objection to process reliabilism—claiming that the reliabilist verdict on the case is counterintuitive—I was able to vindicate reliabilism against these allegations. Anna’s epistemic processes—despite being nominally the same as ours—do fail to be reliable; however, this failure need not be surprising to us, because the demon world represents an application domain that is sufficiently and relevantly different from our world. 

Throughout the essay, I have attempted to straddle both the classical domain of epistemological inquiry and a more novel domain, which one may call ‘machine’ epistemology. I believe this dialogue can be methodologically fruitful, and I hope to have provided evidence towards that conviction by means of the preceding discussion. It may serve as a source of inspiration; it may, as discussed at the start of this essay, help us appropriately de-condition ourselves from unjustified assumptions such as forms of anthropocentrism; and it may serve as a practical testing ground and source of empirical evidence for assessing the plausibility of different epistemological theories. Unlike humans or mental processes, machines provide us with a larger possibility space and more nimbleness in implementing and testing our theoretical proposals. This is not to say that there aren’t dis-analogies between artificially intelligent machines and humans, and as such, any work that seeks to reap said benefits is also required to adopt the relevant levels of care and philosophical rigor. 

As a last, brief, and evocative thought, let us return to a question raised at the very beginning of this essay. When comparing the two cases Rome and Rome’, we asked ourselves whether we should conclude, by analogy between these two cases, that insofar as Alice is deemed justified in believing that the capital of Italy is Rome, so must ChatAI be. First, we must recognise that the only way to take this analogy seriously is to adopt an externalist perspective on the issue—that is, at least unless we are happy to get sucked into discussions of the possibility of machine mentality and reflective awareness of machines’ own reasons. While some may take issue with this on the basis of favouring internalism over externalism, others—including me—may endorse this direction of travel for metaphysical reasons (see, e.g., Ladyman & Ross, 2007). After all—and most scientific realists would agree on this—whatever processes give rise to human life and cognition, they must in some fundamental sense be mechanistic and materialistic (i.e., non-magical) in just the way machine processes are. As the field of AI continues to uncover ever more complex processes, it would not be reasonable to exclude the possibility that they will, at some point—and in isolated cases perhaps already today—resemble human epistemic processes sufficiently that any basis of epistemic justification must either stand or fall for both types of processes simultaneously. This perspective can be seen as revealing further depth in the analogy between classical and machine epistemology, and as such, as providing support for the validity of said comparison for philosophical and scientific thought.  

Resources

  • Cohen, Stewart (1984). “Justification and Truth”, Philosophical Studies, 46(3): 279–295. doi:10.1007/BF00372907

  • Comesaña, Juan (2002). “The Diagonal and the Demon”, Philosophical Studies, 110(3): 249–266. doi:10.1023/A:1020656411534

  • Conee, Earl and Richard Feldman (1998). “The Generality Problem for Reliabilism”, Philosophical Studies, 89(1): 1–29. doi:10.1023/A:1004243308503

  • Feldman, Richard (1985). “Reliability and Justification”, The Monist, 68(2): 159–174. doi:10.5840/monist198568226

  • Goldman, Alvin (1979). “What is Justified Belief?” In George Pappas (ed.), Justification and Knowledge. Boston: D. Reidel. pp. 1-25.

  • Goldman, Alvin (1986). Epistemology and Cognition, Cambridge, MA: Harvard University Press.

  • Goldman, Alvin (2008). “Immediate Justification and Process Reliabilism”, in Quentin Smith (ed.), Epistemology: New Essays, New York: Oxford University Press, pp. 63–82.

  • Goldman, Alvin (2009). “Internalism, Externalism, and the Architecture of Justification”, Journal of Philosophy, 106(6): 309–338. doi:10.5840/jphil2009106611

  • Goldman, Alvin (2011). “Toward a Synthesis of Reliabilism and Evidentialism”, in Trent Dougherty (ed.), Evidentialism and Its Discontents, New York: Oxford University Press, pp. 254–290.

  • Janiesch, C., Zschech, P., & Heinrich, K. (2021). “Machine learning and deep learning”, Electronic Markets, 31(3), 685-695.

  • Klein, P. (1999). “Human Knowledge and the Infinite Regress of Reasons,” in J. Tomberlin, ed. Philosophical Perspectives 13, 297-325. 

  • Ladyman, James & Ross, Don (2007). Every Thing Must Go: Metaphysics Naturalized. Oxford University Press.

  • Leplin, Jarrett (2007). “In Defense of Reliabilism”, Philosophical Studies, 134(1): 31–42. doi:10.1007/s11098-006-9018-3

  • Leplin, Jarrett (2009). A Theory of Epistemic Justification, (Philosophical Studies Series 112), Dordrecht: Springer Netherlands. doi:10.1007/978-1-4020-9567-2