On the automation of Science - an incomplete account

Introduction

In the context of this essay, I investigate the question:  Can science be automated by artificial intelligence?

To investigate arguments for and against the possibility of science-AI, I will aim to answer the following question: is AI, given its own functional logic, able to implement the functional logic that underlies science?  

What is a functional logic, and how does it help us to answer whether it is possible to automate science by means of AI? The idea is to characterize and then contrast what one might consider the “two sides of an equation”, i.e., the respective functional logics of science-making and AI. On one end, we need to understand the functioning of science-making. Insofar as the scientific endeavor is successful in a way that, say, mere doing or mere thinking is not, what is it about science-making that can account for that? On the other end, we need to characterize the functioning of (machine learning (ML)-based) AI. How does it work and what performances can it (not) achieve, in principle and under specific conditions? Then, we can contrast these two sets of “functional logics” and evaluate whether the functional logic of ML is able to implement the functional logic of science-making, or whether there are (fundamental or practical) limitations which would preclude this. The idea is that, if the functional logic of science is not expressible by the functional logic of ML, we can conclude that, at least within the ML paradigm, full-scale AI-automated science (what I henceforth simply refer to as “science-AI”) is not feasible. 

I proceed as follows: I have just introduced the high-level approach I am taking. Next, and before I can properly dive into the discussion, I briefly discuss what might motivate us to ask this question, and then make a few clarifying remarks concerning the terminology used. With this out of the way, I can start evaluating the arguments. In Part 1, I start by laying out a picture which aims to reject the science-AI conjecture. The argument is, in short, that science requires a) “strong generalization”, i.e., the ability to come up with sensible abstractions (i.e., inductive reasoning), and b) “deductive reasoning”, i.e., the ability to use those abstractions sensibly and reliably. In both cases, there are reasons to doubt whether the functional logic of ML systems allows them to do this, or to do this sufficiently well. Parts 2 and 3 explore two plausible ways to rebut this skeptical view. The first one explores whether the fact that ML systems are (in the limit) universal function approximators can recover the possibility of science-AI. The second one observes that humans, too, show cognitive limitations (arguably not dissimilar from the ones I will discuss in the context of ML systems), while still being capable of science-making. I will conclude that, based on the objections raised in Parts 2 and 3, the argument against the possibility of science-AI does not succeed.

Motivations

Why might one be interested in investigating the possibility of science-AI? 

First, we may hope to learn things not only about the current and prospective capability level (and limits thereof) of AI systems, but, quite plausibly, also about the nature of science-making itself. As such, this question appears to have intellectual merit in its own right. 

Furthermore, the question is of interest on more practical grounds, too. The scientific revolution brought about a massive acceleration in humans’ ability to produce knowledge and innovation, which in turn has led to astonishing improvements in the quality of human life. As such, the prospect of automating scientific progress appears promising. At the same time, AI capable of automating science may also be dangerous. After all, scientific progress has also produced technologies that pose immense risks to the wellbeing and survival of humanity, including nuclear weapons, the ability to engineer pathogens, and technologies that facilitate mass surveillance and oppression by states or other powerful actors (Bostrom, 2019). As such, if we knew that science-AI was possible, this ought to motivate us to adopt caution and start working on improved societal and governance protocols to help use these capabilities safely and justly.  

Finally, there are serious worries that the growing adoption of AI applications contributes to an “epistemic crisis”, which poses a threat (in particular) to decision-making in democratic societies (e.g., Seger et al., 2020). Among other things, these systems can be used to generate text, images, video, and voice recordings which do not necessarily represent reality truthfully and which people might interpret as real even when fake. As such, if we were capable of building AI systems that systematically skew towards truth (as opposed to, say, being riddled with the sort of confabulations that we can see in state-of-the-art language models (Ji et al., 2023)), this may help decrease such epistemic risks. 

Some clarifications

As mentioned, a few brief points of clarification are in order before I can properly dive into discussing the (im)possibility of science-AI; in particular, with respect to how I will be using the terms “science”, “AI”, and “automation”. 

First, there does not exist broad consensus among the relevant epistemic community as to what the functional logic (or logics) of science-making is, nor can it be taken as a given that there exists a singular such functional logic. For example, philosophers like Karl Popper have questioned the validity of any use of inductive reasoning in science and instead put front and center the deductive process of falsification (Popper, 1934; Popper, 1962). In contrast, other accounts of science-making do very much rely on inductive processes, such as Bayesian statistics, in order to evaluate how much a given hypothesis is supported by the available evidence (e.g., Sprenger, Hartmann, 2019). In the context of this essay, however, I am not trying to settle the question of the correct account of the scientific method; I instead adopt a degree of methodological pluralism and evaluate the conceptual plausibility of end-to-end science-AI in accordance with several of the most prominent accounts of the scientific method.  

Second, I need to clarify what I do and don’t mean by artificial intelligence. The term in principle refers to a putative technology that implements intelligent behavior through artificial means (e.g., on silicon); it does not, on its own, specify how it does so. To clarify matters in the context of this essay, I (unless otherwise specified) always refer to implementations of AI that reside within the paradigm of ML. I believe this is a justified assumption to make because ML is the currently dominant paradigm in the field of AI, it is the most successful paradigm to date, and there is no particular reason to expect this trend to halt in the near future. Now, having fixed a technical paradigm for AI, we can reason more substantively about the possibilities and limitations of ML-based AI systems when it comes to science-making by drawing on fields such as ML theory, optimization theory, etc. 

Third, when talking about the automation of science, one might have in mind partial automation (i.e., the automation of specific tasks that are part of but don’t comprise the whole of the scientific enterprise), or a full, end-to-end automation of the scientific process by means of AI. In the context of this essay, I primarily focus on the conceptual plausibility of the latter: end-to-end science-AI. The line of demarcation I wish to draw is not about whether the automation involves a single or multiple (e.g., an assembly of) AI applications, but rather whether human scientists are still a required part of the process (as is the case for what I call partial automation) or not (as is the case for what I call end-to-end automation or end-to-end science-AI). 

With this out of the way, it is now time to dive into the discussion.

Part 1: Contra science-AI 

In this section, I lay out the case against the possibility of science-AI. In short, I argue that autonomous scientific reasoning requires i) the ability to form sensible abstractions which function as bases for generalizing knowledge from past experience to novel environments, and ii) the ability to use such abstractions reliably in one’s process of reasoning, thereby accessing the power of deductive or compositional reasoning. However, or so the argument goes, ML systems are not appropriately capable of forming such abstractions and of reasoning with them. 

First, let me clarify the claim that abstractions and deductive reasoning play central roles in science-making. Generalization refers to the ability to apply insights to a situation that is different from what has previously been encountered. Typically, this form of generalization is made possible by means of forming the “right” abstractions, i.e., ones that are able to capture those informational structures that are relevant to a given purpose across different environments (Chollet, 2019). When I invoke the concept of a dog, for example, I don’t have a specific dog in mind, although I could probably name specific dogs I have encountered in the past, and I could also name a number of features that dogs typically (but not always) possess (four legs, fur, dog ears, a tail, etc.). The “dog” case could be understood as an example of relatively narrow abstraction. Think now, instead, of the use of concepts like “energy”, “mass”, or “photon” in physics, or of a “set” or “integration” or “equation” in mathematics. Those concepts are even further removed from any specific instances of things which I can access directly via sensory data. Nevertheless, these abstractions are extremely useful in that they allow me to do things I couldn’t have done otherwise (e.g., predict the trajectory of a ball hit at a certain angle with a certain force, etc.).  

Scientific theories critically rely on abstraction because theories are expressed in terms of abstractions and their functional relationships to each other. (For example, the principle of mass-energy equivalence describes the relationship between two abstractions, “energy” and “mass”; in particular, this relationship can be expressed as E = mc².) The use of abstractions is what endows a theory with explanatory power beyond the merely specific, contingent example that has been studied empirically. At the same time, the usefulness of a theory is dependent on the validity of the abstractions it makes use of. A theory that involves abstractions that do not carve reality sufficiently at its joints will very likely fail to make reliable predictions or produce useful explanations. 

Furthermore, the ability to form valid abstractions constitutes the basis for a second critical aspect of scientific cognition, namely, deductive and compositional reasoning. By deductive reasoning, I am referring to such things as deductive logic, arithmetic, sorting a list, and other tasks that involve “discrete” representations and compositionality (Chollet, 2020). In the case of science-making in particular, falsification and disconfirmation play a central role and are established by means of deductive reasoning, such as in the hypothetico-deductive account (e.g., Sprenger, 2011; Hempel, 1945). The ability to use, or reason over, abstractions allows for so-called “combinatorial generalization”. It is this compositionality of thought that, it has been argued, is a critical aspect of human-level intelligence, giving the reasoner access to a schema of “infinite use of finite means” (Humboldt, 1836; Chomsky, 1965). 

Having made the case for why science-making relies on the ability to i) form and ii) reason with abstractions, I can now investigate the arguments at hand for believing ML systems are not appropriately capable of i) and ii).

Reasons for skepticism come from empirical observation (i.e., using state-of-the-art models and seeing how they “break”), theoretical arguments, and expert judgment. In terms of the latter, Cremer (2021) surveys “expert disagreement over the potential and limitations of deep learning”. With expert opinions diverging, Cremer identifies a set of plausible origins of said disagreements, centrally featuring questions concerning the ability of artificial neural networks to “form abstraction representations effectively” and concerning the extent of their ability to generalize (p. 7). 

To elaborate more on the theoretical arguments for ML skepticism, it is worth exploring the ways in which ML methods face challenges in their ability to generalize (e.g., Chollet, 2017; Battaglia, 2018; Cartuyvels et al., 2021; Shanahan, Mitchell, 2022). ML uses statistical techniques to extract (“learn”) patterns from large swaths of data. It can be understood as aiming to approximate the underlying function which generated the data it is trained on. However, this interpolative learning leads to brittleness when the systems get deployed outside of the distribution of the training data. This phenomenon is well known in the ML literature and is usually discussed under terms such as out-of-distribution (OOD) generalization failure. Under distributional shift (i.e., cases where the deployment data follow a different distribution than the training data), the approximating function the model learned during training is no longer guaranteed to hold, leading to a generalization failure. The risk of failures to generalize, so the argument goes, limits the potential to use ML for end-to-end science automation because we cannot sufficiently trust the soundness of the process. 
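
To make this failure mode concrete, consider the following toy illustration, which is my own and not drawn from the cited works; a simple polynomial regressor stands in for a learned model. It fits the region it was trained on well but degrades sharply once evaluated on shifted inputs.

```python
# Minimal OOD illustration (toy example): a model fit on one region of the
# input space can fail badly under distributional shift.
import numpy as np

rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 3.0, 200)          # training distribution
y_train = np.sin(x_train)                     # underlying data-generating function
coeffs = np.polyfit(x_train, y_train, 3)      # "learn" a cubic approximation

x_test_in = rng.uniform(0.0, 3.0, 200)        # in-distribution test inputs
x_test_out = rng.uniform(6.0, 9.0, 200)       # shifted (out-of-distribution) inputs

def mse(x):
    return np.mean((np.polyval(coeffs, x) - np.sin(x)) ** 2)

print("in-distribution MSE: ", mse(x_test_in))    # small
print("out-of-distribution MSE:", mse(x_test_out))  # typically orders of magnitude larger
```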

Furthermore, ML systems are notoriously bad at discrete tasks (see, e.g., Marcus, 2018; Cartuyvels et al., 2021). While state-of-the-art ML systems are not incapable of (and are getting better at), say, simple forms of arithmetic (e.g., adding up two-digit numbers), it is noteworthy that tasks that take only a few lines of code to automate reliably in the paradigm of classical programming have remained outside the reach of today’s several-billion-parameter ML models. To quote François Chollet, a prominent AI researcher, deliberately misquoting Hinton, expert and pioneer of deep learning: “Deep learning is going to be able to do everything perception and intuition, but not discrete reasoning” (Chollet, 2020). This unreliability in deductive reasoning exhibited by ML systems is another reason for skepticism towards the possibility of end-to-end science-AI. 
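
As a reminder of the asymmetry at stake, here is what the classical-programming counterpart of two such “discrete” tasks looks like; the snippet is purely illustrative and makes no claim about any particular ML system.

```python
# Exact, deterministic implementations of two "discrete" tasks that remain
# error-prone for large statistical models: multi-digit addition and sorting.
def add(a: int, b: int) -> int:
    return a + b          # exact for arbitrarily large integers in Python

def sort_list(xs: list) -> list:
    return sorted(xs)     # correct for every input, every time

assert add(67, 58) == 125
assert sort_list([3, 1, 2]) == [1, 2, 3]
```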

To summarize the argument: current ML-based AI systems appear to face limitations with respect to their ability to achieve “broad” generalization, to form sensible abstractions, and to use those abstractions reliably. Given these limitations, society would be ill-advised to rely on theories, predictions, and explanations proposed by science-AI. Of course, and this is worth noting, end-to-end science-AI is a high bar. The view presented above is entirely compatible with predicting that AI systems will be used to automate or augment many aspects of science-making, perhaps leaving only a few places where humans still need to “patch” the process.

Having elaborated on the case against the possibility of science-AI, I now move to investigating two plausible lines of reasoning aiming to defeat the suggested conclusion.

Part 2: Universal function approximation

The first argument that I will discuss against the purported limitations of ML builds on the claim that ML systems are best understood as universal function approximators (UFA). From this follows the conjecture that there must exist a certain level of computational power at which ML systems are able to sufficiently approximate the science-making function. 

In short, UFA refers to the property of neural networks that, for essentially any target function f(x) (more precisely, any continuous function on a compact domain) and any desired accuracy, there exists a neural network that approximates said function to within that accuracy. There exist mathematical theorems proving versions of this property for different cases, e.g., for neural networks of arbitrary width (i.e., an arbitrary number of neurons) or arbitrary depth (i.e., an arbitrary number of layers), as well as in bounded cases (e.g., Hornik, Stinchcombe, White, 1989; Gripenberg, 2003). 
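
For concreteness, one classical arbitrary-width formulation, abridged and simplified from the Hornik, Stinchcombe and White line of results, can be stated roughly as follows; the exact regularity conditions vary across versions of the theorem.

```latex
% Simplified arbitrary-width universal approximation statement:
% any continuous f on a compact set K can be matched to accuracy epsilon
% by some finite one-hidden-layer network with a suitable activation sigma.
\[
\forall\, f \in C(K),\ \forall\, \varepsilon > 0,\ \exists\, n \in \mathbb{N},\
\{a_i, b_i \in \mathbb{R},\ w_i \in \mathbb{R}^{d}\}_{i=1}^{n} :\quad
\sup_{x \in K} \left| f(x) - \sum_{i=1}^{n} a_i\, \sigma\!\left(w_i^{\top} x + b_i\right) \right| < \varepsilon
\]
```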

Let’s say we accept that ML systems are accurately understood as UFAs, and that, on this basis, they are able, in principle, to implement the functional logic of science-making. However, this picture raises an important question: (when) is approximation enough?

There is, after all, a difference between “the thing, precisely” and “the thing, approximately”. Or is there? Imagine you found a model M1 which approximates a function F with an error of ε1. And imagine that the approximation is insufficient, i.e., that ε1 is too large for M1 to properly fulfill the function of F. Well, in that case, on the grounds of the universal approximation theorem, there exists another model M2 with ε2 < ε1. If ε2 is still too big, one can try M3, and so on. As such, you can, in principle, get arbitrarily close to “the thing”; in other words, the difference between “the thing” and its approximation gets arbitrarily small in the limit. 
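
The sequence M1, M2, M3, … can also be illustrated empirically. The sketch below, a toy example of my own using scikit-learn rather than anything from the essay’s sources, fits progressively wider one-hidden-layer networks to a fixed target function and reports the worst-case error on the training domain; in practice the error need not shrink perfectly monotonically, since optimization is imperfect, but the trend tracks the ε1 > ε2 > … picture described above.

```python
# Wider networks can drive the approximation error of a fixed target function
# down, in the spirit of the universal approximation theorems. The widths and
# the target function are arbitrary illustrative choices.
import numpy as np
from sklearn.neural_network import MLPRegressor

X = np.linspace(-np.pi, np.pi, 400).reshape(-1, 1)
y = np.sin(3 * X).ravel()  # the "true" function we try to approximate

for width in (2, 8, 32, 128):
    model = MLPRegressor(hidden_layer_sizes=(width,), activation="tanh",
                         solver="lbfgs", max_iter=20000, random_state=0)
    model.fit(X, y)
    err = np.max(np.abs(model.predict(X) - y))  # sup-norm error on the domain
    print(f"width={width:4d}  max |f - f_hat| = {err:.3f}")
```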

One might still object to this conceptual argument with a practical worry. It may be prohibitively expensive (in terms of energy, model size/chips, or time) to get arbitrarily close to the “true” function of science-making. However, I suggest we have pragmatic reasons not to be too worried by this concern. After all, we can hardly expect that human scientists always pick out the right abstractions when constructing their theories. What is more, most feats of engineering rely on theories that we know use abstractions that aren’t completely true, and yet have been shown to be “sufficiently” true (in a pragmatist sense) in that they produce useful epistemic products (including bridges that don’t collapse and airplanes that stay in the air). For example, the framework of classical physics was, in some sense, proven wrong by Einstein’s theories of relativity. And yet, most engineering projects are entirely happy to work within the classical framework. As such, even if ML systems “only” approximate the function of science-making, we have good reason to expect that they are capable of finding approximations that are sufficient such that, for all practical purposes, they will be capable of science-making. 

Finally, science-AI need not be a monolithic structure consisting of a single ML model and its learned behavior policy. Instead, we can imagine a science-AI assembly system which, for example, trains "abstraction forming" and "deductive reasoning" circuits separately and later combines them such that they interface with each other autonomously. This idea of a compositional science-AI resembles the vision of a Society of Mind sketched by Marvin Minsky in 1986, where he argues that human intelligence emerges from the interactions of many simple “agents” with narrow skills or functions. Moreover, we can even use ML to discover which forms of compositionality (i.e., “task division”) might be best suited for a science-AI assembly, insofar as my earlier vague suggestion of integrating an "abstraction forming" and a "deductive reasoning" circuit might not be the ideal solution. There already exist examples of current-day ML systems trained based on similar ideas (e.g., Gururangan et al., 2023). 
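
To make the “assembly” picture slightly more tangible, here is a deliberately simplistic sketch of how two separately built modules might be routed between; every name and interface in it is a hypothetical placeholder of my own, not a description of Gururangan et al.’s system or of any existing architecture.

```python
# Hypothetical two-module "science-AI assembly" sketch. The module names,
# the routing predicate, and the placeholder behaviors are all illustrative
# assumptions rather than an established design.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Module:
    name: str
    handles: Callable[[str], bool]   # crude routing predicate
    run: Callable[[str], str]        # the module's learned or programmed behavior

def abstraction_former(observation: str) -> str:
    # placeholder: would compress raw observations into candidate concepts
    return f"candidate abstraction extracted from: {observation}"

def deductive_reasoner(hypothesis: str) -> str:
    # placeholder: would derive testable predictions from a hypothesis
    return f"prediction deduced from: {hypothesis}"

ASSEMBLY: List[Module] = [
    Module("abstraction-forming", lambda t: t.startswith("observe:"), abstraction_former),
    Module("deductive-reasoning", lambda t: t.startswith("hypothesis:"), deductive_reasoner),
]

def route(task: str) -> str:
    for module in ASSEMBLY:
        if module.handles(task):
            return module.run(task)
    raise ValueError("no module claims this task")

print(route("observe: pendulum period grows with length"))
print(route("hypothesis: period is proportional to sqrt(length)"))
```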

To summarize, I have argued that UFA theorems prove that AI systems—contra the skeptical picture laid out in Part 1—are in principle able to implement science-making. I further provided arguments for why we can expect this technology to not only be conceptually feasible but also practically plausible. 

Part 3: The possibility of science-making despite limitations 

Let us now turn to the second argument against the skeptical picture proposed in Part 1. This argument starts by conceding that ML systems face relevant limitations in their ability to form and reliably use abstractions. However, the argument continues, so do humans (and human scientists), and still, they are capable of doing science (arguably). Thus, the argument about the inductive limits of ML systems cannot, on its own, defeat the possibility of science-AI. 

To unravel this argument, let us first discuss the claim that both ML and human “reasoners” are limited, and limited in relevantly similar ways. I have already laid out the case for limitations in ML which arise from the fundamentally continuous and inferential nature of ML. According to our current best theories of human cognition—such as the Bayesian Brain Hypothesis (e.g., Deneve, 2004; Doya et al., 2007; Knill, Pouget, 2004), Predictive Processing (e.g., Clark, 2013; Clark, 2015; Kanai et al., 2015), and most recently, Active Inference (Parr, Pezzulo, Friston, 2022)—the brain can essentially be understood as a “large inference machine”. As such, the low-level implementation of human reasoning is understood to be similarly continuous and inferential. 

This is, of course, not to deny that humans exhibit higher-level cognitive skills, such as verbal reasoning or metacognition, which are correctly understood to exceed “mere statistics”. Rather, the point I am trying to make is that these higher-level capabilities emerge from the low-level (continuous and inferential) implementation of the neural make-up of the brain. This serves as an existence proof that this sort of low-level implementation can, under certain circumstances, give rise to the capabilities more typically associated with “Type 2” reasoning (Kahneman, 2017). As such, we have shown that the argument presented in Part 1—that, given the functional logic of modern-day ML, AI will not be able to implement all necessary aspects of scientific reasoning (such as generalization or deductive reasoning)—does not prove what it was meant to prove (the impossibility of science-AI). 

Furthermore, it also shows that a cognitive process need not be flawless in order to be able to implement science-making. Human reasoning is, of course, not without flaws. For example, human scientists regularly pick “wrong” abstractions (e.g., “phlogiston” or “ether”, to name only a few famous cases from the history of science). Nor are human scientists immune to motivated reasoning and cognitive biases such as confirmation bias or hypothesis myopia (Nuzzo, 2015). The point is that these flaws in human reasoning—be they due to structural limitations or mere computational boundedness—have not prevented humans from developing and conducting science successfully. 

This last point raises an interesting question about the nature of science-making. Given the plentiful sources of bounded, flawed, and motivated reasoning exhibited by human scientists, how are they still capable of producing scientific progress? One way to make sense of this (plausibly surprising) observation is to understand science as an essentially collective endeavor. In other words, individual scientists don’t do science; scientific communities do. The idea is that science-making—a process that systematically skews towards the truth—emerges from implementing a collective “protocol” which, so to speak, “washes out” the biased reasoning present at the level of individual scientists. Bringing this back to science-AI, this raises the question of whether we should best think of science-AI as a single system approximating ideal scientific reasoning, or as a system assembly where each individual system can have flaws in its epistemic processes, but where the way they all interact produces behavior equivalent to science-making—just as is the case for human scientists interacting today. 

To summarize, the argument presented here is two-fold: on one hand, the human reasoning ability is implemented by a continuous and inferential low-level process, serving as an existence proof that such processes (which we also find in machine learning) are in principle able to implement discrete tasks with adequate levels of robustness. On the other hand, science-making is implemented by fallible human reasoners who make mistakes similar in type to the ones discussed in Part 1 (e.g., picking leaky abstractions or misgeneralizing them), serving as an existence proof that processes which are fallible in this way can still implement science-making. 

Conclusion

In this essay, I explored the conceptual possibility of end-to-end science-AI, i.e., an AI system or assembly of systems which is able to functionally implement science-making with no help from humans (post-training). In Part 1, I first made the case that end-to-end science-AI is not possible, on the basis of noticing limitations of ML systems when it comes to their ability to form useful abstractions and to use these abstractions reliably. I argued that ML, given that it is based on interpolative learning from a given set of (training) data, faces important challenges in terms of its ability to generalize outside of its training data in the case of known or unknown distributional shifts upon deployment. Furthermore, I invoked the fact that ML systems are currently unreliable (or at the very least inefficient) at “discrete” types of reasoning. After developing this skeptical picture, I explored two sets of arguments which seek to recover the possibility of science-AI. 

First, I argued that ML systems are universal function approximators, and that in that capacity, there must exist a computational threshold at which they are able to implement the functional logic of science. Furthermore, I argued that there are pragmatic reasons to accept that this is not only conceptually possible but practically feasible insofar as approximation is enough, as evidenced by the fact that successful scientific and engineering feats, as a norm, rely “merely” on approximate truths. 

Second, I compared ML systems to human scientists, claiming that, on one hand, the neurological implementation of human reasoning is structurally similar to ML, thus suggesting that ML methods can be expected to successfully scale to “higher-level” reasoning capabilities (including ones that appear particularly critical in science-making). On the other hand, the comparison also reveals how humans are capable of doing science despite the fact that the reasoning of individual humans is flawed in important ways. As such, some amount of brittleness in ML systems does not mean that they cannot successfully implement the scientific process. Taken together, the arguments discussed in Parts 2 and 3 succeed at defending the possibility of science-AI against the skeptical view laid out in Part 1. Beyond defending the conceptual possibility claim, the arguments also provide some support for the concrete, practical plausibility of science-AI. 

Let us conclude with one more evocative thought based on the analogy between ML and scientific reasoning explored over the course of this essay. Concerns about the generalization limits of ML systems pose an important problem: we need to be able to trust the systems we’re using, or—rather—we want to be able to know when and how much we are justified in trusting these systems. Epistemic justification—which I am taking, for the current purposes, to be a function of the reliability of a given epistemic process—is always defined relative to a given domain of application. This suggests that we want AI systems (among other things) to contain meta-data about their domain of applicability (i.e., the domain within which their generalization guarantees hold). What I want to suggest here is that the same insight also applies to scientific theories: we should more consistently strive to develop scientific theories which are—as an integral part of what it is to be a scientific theory—transparent about their domain of applicability, relative to which the theory does or does not claim that its predictions will generalize.  
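
As a final illustration of this suggestion, here is a small sketch of what “meta-data about the domain of applicability” could look like in code; the wrapper, its fields, and the toy physics example are hypothetical choices of mine rather than an established interface.

```python
# Hypothetical sketch: a model that carries explicit meta-data about the domain
# within which its outputs are claimed to generalize, and refuses to answer
# outside it. All names and the toy example are illustrative assumptions.
import math
from dataclasses import dataclass
from typing import Callable, Tuple

@dataclass
class DomainTaggedModel:
    predict: Callable[[float], float]
    input_range: Tuple[float, float]   # the domain the model was validated on
    description: str

    def query(self, x: float) -> float:
        lo, hi = self.input_range
        if not (lo <= x <= hi):
            raise ValueError(
                f"input {x} lies outside the validated domain [{lo}, {hi}]; "
                f"no generalization claim is made here ({self.description})"
            )
        return self.predict(x)

# Toy example: a classical projectile-range formula, declared valid only for
# launch speeds where relativistic corrections are negligible.
projectile_range = DomainTaggedModel(
    predict=lambda v: v ** 2 * math.sin(math.radians(90)) / 9.81,
    input_range=(0.0, 1000.0),
    description="classical projectile range at a 45-degree launch angle, no air resistance",
)

print(projectile_range.query(30.0))   # inside the declared domain: ~91.7 m
# projectile_range.query(3.0e7)       # outside the domain: raises instead of silently extrapolating
```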

References 

Battaglia, P. W., et al. (2018). Relational inductive biases, deep learning, and graph networks. arXiv preprint arXiv:1806.01261.

Bender, E. M., et al. (2021). On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (pp. 610-623). Association for Computing Machinery.

Millidge, B., et al. (2021). Predictive coding: a theoretical and experimental review. arXiv preprint arXiv:2107.12979.

Bostrom, N. (2019). The vulnerable world hypothesis. Global Policy, 10(4), 455-476.

Chang, H. (2022). Realism for Realistic People: A New Pragmatist Philosophy of Science. Cambridge University Press.

Chollet, F. (2017). The limitations of deep learning. Deep learning with Python. Retrieved from: https://blog.keras.io/the-limitations-of-deep-learning.html 

Chollet, F. (2019). On the measure of intelligence. arXiv preprint arXiv:1911.01547.

Chollet, F. (2020). Why abstraction is the key to intelligence, and what we’re still missing. Talk at NeurIPS 2020. Retrieved from: https://slideslive.com/38935790/abstraction-reasoning-in-ai-systems-modern-perspectives 

Chomsky, N. (1965). Aspects of the Theory of Syntax. MIT Press.

Clark, A. (2013). Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behavioral and Brain Sciences, 36(3), 181-204. doi:10.1017/S0140525X12000477.

Clark, A. (2015). Surfing Uncertainty: Prediction, Action, and the Embodied Mind. Oxford Academic.

Cremer, C. (2021). Deep limitations? Examining expert disagreement over deep learning. Progress in Artificial Intelligence, 10. https://doi.org/10.1007/s13748-021-00239-1

Cartuyvels, R., Spinks, G., & Moens, M. F. (2021). Discrete and continuous representations and processing in deep learning: Looking forward. AI Open, 2, 143-159.

Deneve, S. (2004). Bayesian inference in spiking neurons. Advances in neural information processing systems, 17.

De Regt, H. W. (2017). Understanding Scientific Understanding. New York: Oxford University Press.

Doya, K., Ishii, S., Pouget, A., & Rao, R. P. (Eds.). (2007). Bayesian brain: Probabilistic approaches to neural coding. MIT press.

Gripenberg, G. (2003). Approximation by neural networks with a bounded number of nodes at each level. Journal of Approximation Theory, 122(2), 260-266.

Gururangan, S., et al. (2023). Scaling Expert Language Models with Unsupervised Domain Discovery. arXiv preprint arXiv:2303.14177.

Hempel, C. G. (1945). Studies in the Logic of Confirmation (II.). Mind, 54(214), 97–121. 

Hendrycks, D., et al. (2020). The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 8340-8349).

Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer feedforward networks are universal approximators. Neural networks, 2(5), 359-366.

Humboldt, W. (1999/1836). On Language: On the diversity of human language construction and its influence on the mental development of the human species. Cambridge University Press.

Ji, Z., et al. (2023). Survey of hallucination in natural language generation. ACM Computing Surveys, 55(12), 1-38.

Kahneman, D. (2017). Thinking, Fast and Slow. Farrar, Straus and Giroux.

Kanai, R., et al. (2015). Cerebral hierarchies: predictive processing, precision and the pulvinar. Philosophical Transactions of the Royal Society B, 370, 20140169.

Knill, D. C., & Pouget, A. (2004). The Bayesian Brain: The Role of Uncertainty in Neural Coding and Computation. TRENDS in Neurosciences, 27(12), 712–719.

Marcus, G. (2018). Deep learning: A critical appraisal. arXiv preprint arXiv:1801.00631.

Mitchell, M. (2021). Abstraction and analogy‐making in artificial intelligence. Annals of the New York Academy of Sciences, 1505(1), 79-101.

Nuzzo, R. (2015). How Scientists Fool Themselves — and How They Can Stop. Nature. 526, 182. https://doi.org/10.1038/526182a. 

Parr, T., Pezzulo, G., & Friston, K. J. (2022). Active Inference: The Free Energy Principle in Mind, Brain, and Behavior. MIT Press.

Peters, U., et al. (2022). Generalization Bias in Science. Cognitive Science, 46: e13188. 

Popper, K. (1934). The Logic of Scientific Discovery. London, England: Routledge.

Popper, K. (1962). Conjectures and Refutations: The Growth of Scientific Knowledge. London, England: Routledge.

Seger, E., et al. (2020). Tackling threats to informed decision-making in democratic societies: promoting epistemic security in a technologically-advanced world. The Alan Turing Institute. 

Shanahan, M., & Mitchell, M. (2022). Abstraction for deep reinforcement learning. arXiv preprint arXiv:2202.05839.

Sprenger, J. (2011). Hypothetico-Deductive Confirmation. Philosophy Compass, 6: 497-508. 

Sprenger, J., & Hartmann, S. (2019). Bayesian Philosophy of Science: Variations on a Theme by the Reverend Thomas Bayes. Oxford and New York: Oxford University Press.

Trask, A., et al. (2018). Neural Arithmetic Logic Units. Advances in neural information processing systems, 31.

Valiant, L. G. (1984). A theory of the learnable. Communications of the ACM, 27(11), 1134-1142.