On the automation of Science - an incomplete account

Introduction

In the context of this essay, I investigate the question:  Can science be automated by artificial intelligence?

To investigate arguments for and against the possibility of science-AI, I will aim to answer the following question: is AI, given its own functional logic, able to implement the functional logic that underlies science?  

What is a functional logic, and how does it help us answer whether science can be automated by means of AI? The idea is to characterize and then contrast what one might consider the “two sides of an equation”, i.e., the respective functional logics of science-making and of AI. On one side, we need to understand the functioning of science-making. Insofar as the scientific endeavor is successful in a way that, say, mere doing or mere thinking is not, what is it about science-making that can account for that? On the other side, we need to characterize the functioning of (machine learning (ML)-based) AI. How does it work, and what can it (not) achieve, in principle and under specific conditions? We can then contrast these two functional logics and evaluate whether the functional logic of ML is able to implement the functional logic of science-making, or whether there are (fundamental or practical) limitations which would preclude this. The idea is that, if the functional logic of science is not expressible by the functional logic of ML, we can conclude that, at least within the ML paradigm, full-scale AI-automated science (what I henceforth simply refer to as “science-AI”) is not feasible. 

I proceed as follows: I have just introduced the high-level approach I am taking. Next, and before I can properly dive into the discussion, I briefly discuss what might motivate us to ask this question, and then make a few clarifying remarks concerning the terminology used. With this out of the way, I can start evaluating the arguments. In Part 1, I start by laying out a picture which aims to reject the science-AI conjecture. The argument is, in short, that science requires a) “strong generalization”, i.e., the ability to come up with sensible abstractions (i.e., inductive reasoning), and b) “deductive reasoning”, i.e., the ability to use those abstractions sensibly and reliably. In both cases, there are reasons to doubt whether the functional logic of ML systems allows them to do this, or do this sufficiently well. Parts 2 and 3 explore two plausible ways to rebut this skeptical view. The first one explores whether the fact that ML systems are (in the limit) universal function approximators can recover the possibility of science-AI. The second one observes that humans too show cognitive limitations (arguably not dissimilar from the ones I will discuss in the context of ML systems), while still being capable of science-making. I will conclude that, based on the objections raised in Parts 2 and 3, the argument against the possibility of science-AI does not succeed.

Motivations

Why might one be interested in investigating the possibility of science-AI? 

First, we may hope to learn things not only about the current and prospective capability level (and the limits thereof) of AI systems, but, quite plausibly, also about the nature of science-making itself. As such, this question appears to have intellectual merit in its own right. 

Furthermore, the question is of interest on more practical grounds, too. The scientific revolution brought about a massive acceleration in humans’ ability to produce knowledge and innovation, which in turn has led to astonishing improvements in the quality of human life. As such, the prospect of automating scientific progress appears promising. At the same time, AI capable of automating science may also be dangerous. After all, scientific progress has also produced technologies that pose immense risks to the wellbeing and survival of humanity, including nuclear weapons, the ability to engineer pathogens, and technologies that facilitate mass surveillance and oppression by states or other powerful actors (Bostrom, 2019). As such, if we knew that science-AI was possible, this ought to motivate us to adopt caution and start working on improved societal and governance protocols to help use these capabilities safely and justly.  

Finally, there are serious worries that the growing adoption of AI applications contributes to an “epistemic crisis”, which poses a threat (in particular) to decision-making in democratic societies (e.g., Seger et al., 2020). Among other things, these systems can be used to generate text, images, video, and voice recordings which do not necessarily represent reality truthfully and which people might interpret as real even if fake. As such, if we were capable of building AI systems that systematically skew towards truth (as opposed to, say, being riddled with the sort of confabulations that we can see in state-of-the-art language models (Ji et al., 2023)), this may help decrease such epistemic risks. 

Some clarifications

As mentioned, a few brief points of clarification are in order before I can properly dive into discussing the (im)possibility of science-AI; in particular, with respect to how I will be using the terms “science”, “AI”, and “automation”. 

First, there does not exist broad consensus among the relevant epistemic community as to what the functional logic (or logics) of science-making is, nor can it be taken as a given that there exists a singular such functional logic. For example, philosophers like Karl Popper have questioned the validity of any use of inductive reasoning in science and instead put front and center the deductive process of falsification (Popper, 1934; Popper, 1962). In contrast, other accounts of science-making do very much rely on inductive processes, such as Bayesian statistics, in order to evaluate how much a given hypothesis is supported by the available evidence (e.g., Sprenger, Hartmann, 2019). In the context of this essay, however, I am not trying to settle the question of the correct account of the scientific method; I instead adopt a degree of methodological pluralism and evaluate the conceptual plausibility of end-to-end science-AI in accordance with several of the most prominent accounts of the scientific method.  

Second, I need to clarify what I do and don’t mean by artificial intelligence. The term in principle refers to a putative technology that implements intelligent behavior through artificial means (e.g., on silicon); it does not, on its own, specify how it does so. To clarify matters in the context of this essay, I (unless otherwise specified) always refer to implementations of AI that reside within the paradigm of ML. I believe this is a justified assumption to make because ML is the currently dominant paradigm in the field of AI, it is the most successful paradigm to date, and there is no particular reason to expect this trend to halt in the near future. Having thus fixed the technical paradigm under consideration, we can reason more substantively about the possibilities and limitations of ML-based AI systems when it comes to science-making by drawing on fields such as ML theory, optimization theory, etc. 

Third, when talking about the automation of science, one might have in mind partial automation (i.e., the automation of specific tasks that are part of but don’t comprise the whole of the scientific enterprise), or a full, end-to-end automation of the scientific process by means of AI. In the context of this essay, I primarily focus on the conceptual plausibility of the latter: end-to-end science-AI. The line of demarcation I wish to draw is not about whether the automation involves a single or multiple (e.g., an assembly of) AI applications, but rather whether human scientists are still a required part of the process (such as is the case for what I call partial automation) or not (such as is the case for what I call end-to-end automation or end-to-end science-AI). 

With this out of the way, it is now time to dive into the discussion.

Part 1: Contra science-AI 

In this section, I lay out the case against the possibility of science-AI. In short, I argue that autonomous scientific reasoning requires i) the ability to form sensible abstractions which function as bases for generalizing knowledge from past experience to novel environments, and ii) the ability to use such abstractions reliably in one’s process of reasoning, thereby accessing the power of deductive or compositional reasoning. However, or so the argument goes, ML systems are not appropriately capable of forming such abstractions and of reasoning with them. 

First, let me clarify the claim that abstractions and deductive reasoning play central roles in science-making. Generalization refers to the ability to apply insights to a situation that is different from what has previously been encountered. Typically, this form of generalization is made possible by forming the “right” abstractions, i.e., ones that are able to capture those informational structures that are relevant to a given purpose across different environments (Chollet, 2019). When I invoke the concept of a dog, for example, I don’t have a specific dog in mind, although I could probably name specific dogs I have encountered in the past, and I could also name a number of features that dogs typically (but not always) possess (four legs, fur, dog ears, a tail, etc.). The “dog” case could be understood as an example of relatively narrow abstraction. Think now, instead, of the use of concepts like “energy”, “mass”, or “photon” in physics, or of a “set” or “integration” or “equation” in mathematics. Those concepts are still further removed from any specific instances of things which I can access directly via sensory data. Nevertheless, these abstractions are extremely useful in that they allow me to do things I couldn’t have done otherwise (e.g., predict the trajectory of a ball hit at a certain angle with a certain force, etc.).  

Scientific theories critically rely on abstraction because theories are expressed in terms of abstractions and their functional relationships to each other. (For example, the equivalence of mass and energy describes the relationship between two abstractions, “energy” and “mass”; in particular, this relationship can be expressed as E = mc².) The use of abstractions is what endows a theory with explanatory power beyond the merely specific, contingent example that has been studied empirically. At the same time, the usefulness of a theory depends on the validity of the abstractions it makes use of. A theory that involves abstractions that do not carve reality sufficiently at its joints will very likely fail to make reliable predictions or produce useful explanations. 

Furthermore, the ability to form valid abstractions constitutes the basis for a second critical aspect of scientific cognition, namely, deductive and compositional reasoning. By deductive reasoning, I am referring to such things as deductive logic, arithmetic, sorting a list, and other tasks that involve “discrete” representations and compositionality (Chollet, 2020). In the case of science-making in particular, falsification and disconfirmation play a central role and are established by means of deductive reasoning, such as in the hypothetico-deductive account (e.g., Sprenger, 2011; Hempel, 1945). The ability to use, or reason over, abstractions allows for so-called “combinatorial generalization”. It is this compositionality of thought that, it has been argued, is a critical aspect of human-level intelligence, giving the reasoner access to a schema of “infinite use of finite means” (Humboldt, 1836; Chomsky, 1965). 

Having made the case for why science-making relies on the ability to i) form and ii) reason with abstractions, I can now investigate the arguments at hand for believing ML systems are not appropriately capable of i) and ii).

Reasons for skepticism come from empirical observation (i.e., using state-of-the-art models and seeing how they “break”), theoretical arguments, and expert judgment. In terms of the latter, Cremer (2021) surveys “expert disagreement over the potential and limitations of deep learning”. With expert opinions diverging, Cremer identifies a set of plausible origins of these disagreements, centrally featuring questions concerning the ability of artificial neural networks to “form abstraction representations effectively” and the extent of their ability to generalize (p. 7). 

To elaborate on the theoretical arguments for ML skepticism, it is worth exploring the ways in which ML methods face challenges in their ability to generalize (e.g., Chollet, 2017; Battaglia et al., 2018; Cartuyvels et al., 2021; Shanahan, Mitchell, 2022). ML uses statistical techniques to extract (“learn”) patterns from large swaths of data. It can be understood as aiming to approximate the underlying function which generated the data it is trained on. However, this interpolative learning leads to brittleness if the systems get deployed outside of the distribution of the training data. This phenomenon is well known in the ML literature and usually discussed under terms such as out-of-distribution (OOD) generalization failure. Under distributional shift (i.e., cases where the deployment data exhibit a different distribution compared to the training environment), the approximating function the model learned during training is no longer guaranteed to hold, leading to a generalization failure. The risk of failures to generalize, so the argument goes, limits the potential to use ML for end-to-end science automation because we cannot sufficiently trust the soundness of the process. 
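
To make this worry concrete, here is a minimal, self-contained sketch (my own toy illustration in Python, not an experiment from the literature cited above) of what an out-of-distribution generalization failure looks like: a model fitted on one input range approximates the target well there, yet diverges badly once the inputs shift outside that range.

```python
# Toy illustration of out-of-distribution (OOD) generalization failure:
# an interpolative model is accurate on its training range but breaks under shift.
import numpy as np

rng = np.random.default_rng(0)

# Training data: noisy samples of sin(x) drawn from [0, pi] (the "training distribution").
x_train = rng.uniform(0.0, np.pi, 200)
y_train = np.sin(x_train) + rng.normal(0.0, 0.05, x_train.shape)

# "Learn" an approximating function by interpolative curve fitting.
model = np.poly1d(np.polyfit(x_train, y_train, deg=9))

# In-distribution evaluation: inputs from the same range as training.
x_in = np.linspace(0.0, np.pi, 100)
err_in = np.max(np.abs(model(x_in) - np.sin(x_in)))

# Out-of-distribution evaluation: inputs shifted beyond the training range.
x_out = np.linspace(2 * np.pi, 3 * np.pi, 100)
err_out = np.max(np.abs(model(x_out) - np.sin(x_out)))

print(f"max error in-distribution:     {err_in:.3f}")   # small
print(f"max error out-of-distribution: {err_out:.3f}")  # typically orders of magnitude larger
```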

Furthermore, ML systems are notoriously bad at discrete tasks (see, e.g., Marcus, 2018; Cartuyvels et al., 2021). While state-of-the-art ML systems are not incapable of (and are getting better at), say, simple forms of arithmetic (e.g., adding up two-digit numbers), it is noteworthy that tasks that take only a few lines of code to automate reliably in the paradigm of classical programming have remained outside the reach of today’s several-billion-parameter ML models. To quote the prominent AI researcher François Chollet, deliberately misquoting Geoffrey Hinton, a pioneer of deep learning: “Deep learning is going to be able to do everything: perception and intuition, but not discrete reasoning” (Chollet, 2020). This unreliability in deductive reasoning exhibited by ML systems is another reason for skepticism towards the possibility of end-to-end science-AI. 
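
The asymmetry can be made concrete with a trivial sketch of my own (not an example from the cited works): the discrete task is solved exactly, for all inputs, by a few lines of classical code, whereas a learned model only ever approximates the same input-output mapping over the region covered by its training data.

```python
# Classical programming: integer addition is exact for arbitrary inputs by construction.
def add(a: int, b: int) -> int:
    return a + b  # no training data, no approximation error, no distribution to fall out of

assert add(17, 25) == 42
assert add(10**18, 1) == 10**18 + 1  # reliability does not degrade with scale

# A learned model, by contrast, approximates the addition function from examples and
# offers no comparable guarantee for inputs outside its training range; neural
# arithmetic remains an active research problem (see, e.g., Trask et al., 2018).
```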

To summarize the argument, current ML-based AI systems appear to face limitations with respect to their ability to achieve “broad” generalization, to form sensible abstractions, and to use those abstractions reliably. Given these limitations, society would be ill-advised to rely on theories, predictions, and explanations proposed by science-AI. Of course, and this is worth noting, end-to-end science-AI is a high bar. The view presented above is entirely compatible with predicting that AI systems will be used to automate or augment many aspects of science-making, with humans needing to “patch” the process in only a few places.

Having elaborated on the case against the possibility of science-AI, I now move to investigating two plausible lines of reasoning aiming to defeat the suggested conclusion.

Part 2: Universal function approximation

The first argument that I will discuss against the purported limitations of ML builds on the claim that ML systems are best understood as universal function approximators (UFA). From this follows the conjecture that there must exist a certain level of computational power at which ML systems are able to sufficiently approximate the science-making function. 

In short, UFA refers to the property of neural networks that, for any (continuous) target function f(x), there exists a neural network that can approximate said function to arbitrary accuracy. Mathematical theorems prove versions of this property for different cases, e.g., for neural networks of arbitrary width (i.e., an arbitrary number of neurons) or arbitrary depth (i.e., an arbitrary number of layers), as well as in bounded cases (e.g., Hornik, Stinchcombe, White, 1989; Gripenberg, 2003). 
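
Stated slightly more precisely (this is an informal paraphrase of the arbitrary-width case in the spirit of Hornik, Stinchcombe, and White, 1989, not a quotation of any particular theorem):

```latex
% Universal approximation, arbitrary-width case (informal paraphrase).
% For any continuous target function f on a compact domain K \subset \mathbb{R}^n,
% a suitable activation function \sigma (e.g., a sigmoidal "squashing" function),
% and any tolerance \varepsilon > 0, there exists a single-hidden-layer network
%   g(x) = \sum_{i=1}^{N} \alpha_i \, \sigma(w_i^{\top} x + b_i)
% with a finite number of units N such that
\[
  \sup_{x \in K} \bigl| f(x) - g(x) \bigr| < \varepsilon .
\]
```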

Let’s say we accept that ML systems are accurately understood as UFAs, and that, on those grounds, ML systems are able, in principle, to implement the functional logic of science-making. This picture, however, raises an important question: (when) is approximation enough?

There is, after all, a difference between “the thing, precisely” and “the thing, approximately”. Or is there? Imagine you found a model M1 which approximates function F with an error of ε1. And imagine that the approximation is insufficient—that ε1 is too large for M1 to properly fulfill the function of F. Well, in that case, on grounds of the universal approximation theorem, there exists another model M2 with ε2 < ε1. If ε2 is still too big, one can try M3, and so on. As such, you can, in principle, get arbitrarily close to “the thing”; in other words, the difference between “the thing” and its approximation gets arbitrarily small in the limit. 
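
As a toy illustration of this M1, M2, M3, ... picture (a minimal sketch assuming scikit-learn is available; the exact numbers depend on training details and are not meant as evidence), one can fit progressively wider networks to the same target and watch the achievable error shrink:

```python
# Minimal sketch: successively wider networks approximating sin(x) on [-pi, pi].
# The point is only that the approximation error epsilon can, in principle, be driven
# down by moving to a larger model (M1 -> M2 -> M3 ...), as the UFA theorems suggest.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
x = rng.uniform(-np.pi, np.pi, 2000).reshape(-1, 1)
y = np.sin(x).ravel()

x_test = np.linspace(-np.pi, np.pi, 500).reshape(-1, 1)
y_test = np.sin(x_test).ravel()

for width in (2, 8, 32, 128):  # successively larger models M1, M2, M3, M4
    model = MLPRegressor(hidden_layer_sizes=(width,), activation="tanh",
                         max_iter=5000, random_state=0)
    model.fit(x, y)
    eps = np.max(np.abs(model.predict(x_test) - y_test))  # sup-norm error on the domain
    print(f"width {width:4d}: max approximation error ~ {eps:.3f}")
```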

One might still object to this conceptual argument with a practical worry. It may be prohibitively expensive (in terms of energy, model size/chips, or time) to get arbitrarily close to the “true” function of science-making. However, I suggest we have pragmatic reasons not to be too worried by this concern. After all, we can hardly expect that human scientists always pick out the right abstractions when constructing their theories. What is more, most feats of engineering rely on theories that we know use abstractions that aren’t completely true, and yet have been shown to be “sufficiently” true (in a pragmatist sense) in that they produce useful epistemic products (including bridges that don’t collapse and airplanes that stay in the air). For example, the framework of classical physics was, in some sense, proven wrong by Einstein’s theories of relativity. And yet, most engineering programs are entirely happy to work within the classical framework. As such, even if ML systems “only” approximate the function of science-making, we have every reason to expect that they are capable of finding sufficient approximations such that, for all practical purposes, they will be capable of science-making. 

Finally, science-AI need not take the form of a monolithic structure consisting of a single ML model and its learned behavior policy. Instead, we can imagine a science-AI assembly system which, for example, trains "abstraction forming" and "deductive reasoning" circuits separately, which are later combined to interface with each other autonomously. This idea of a compositional science-AI resembles the vision of a Society of Mind sketched by Marvin Minsky in 1986, where he argues that human intelligence emerges from the interactions of many simple “agents” with narrow skills or functions. Moreover, we can even use ML to discover which forms of compositionality (i.e., “task division”) might be best suited for a science-AI assembly, insofar as my earlier, admittedly vague suggestion of integrating an "abstraction forming" and a "deductive reasoning" circuit might not be the ideal solution. There already exist examples of current-day ML systems trained on similar ideas (e.g., Gururangan et al., 2023). 
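
To make the idea of such an assembly slightly more tangible, here is a deliberately toy sketch (entirely my own illustration; the module names and the division of labour are hypothetical, not a description of any existing system): a statistical "abstraction forming" module proposes candidate laws from data, and a separate "deductive reasoning" module admits only those candidates that pass an exact, non-statistical check (here, dimensional consistency).

```python
# Toy sketch of a compositional "science-AI assembly" (hypothetical module names):
# a statistical proposer suggests candidate laws y = a * x**k from data, and a
# deductive checker admits only candidates satisfying an exact constraint.
import numpy as np


def proposer(x, y, exponents=(1, 2, 3)):
    """'Abstraction forming' module: fit y ~ a * x**k for several k by least squares
    and rank the candidates by fit quality (a purely statistical step)."""
    candidates = []
    for k in exponents:
        a = float(np.sum(y * x**k) / np.sum(x ** (2 * k)))  # closed-form least squares
        mse = float(np.mean((y - a * x**k) ** 2))
        candidates.append({"a": a, "k": k, "mse": mse})
    return sorted(candidates, key=lambda c: c["mse"])


def deductive_checker(candidate, dims_y=(1, 0), dims_x=(0, 1), dims_a=(1, -2)):
    """'Deductive reasoning' module: check dimensional consistency of y = a * x**k
    exactly. Dimensions are (length exponent, time exponent); the candidate is
    admissible only if dims_y == dims_a + k * dims_x holds componentwise -- a
    discrete, non-statistical verification step."""
    k = candidate["k"]
    return all(dy == da + k * dx for dy, da, dx in zip(dims_y, dims_a, dims_x))


# Observations generated by a law unknown to the modules (here, distance fallen
# over time: y = 4.9 * x**2, with measurement noise).
x = np.linspace(0.1, 5.0, 50)
y = 4.9 * x**2 + np.random.default_rng(0).normal(0.0, 0.5, x.shape)

for candidate in proposer(x, y):
    if deductive_checker(candidate):
        print(f"accepted: y ≈ {candidate['a']:.2f} * x^{candidate['k']} "
              f"(mse = {candidate['mse']:.3f})")
        break
```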

To summarize, I have argued that UFA theorems prove that AI systems—contra the skeptical picture laid out in Part 1—are in principle able to implement science-making. I further provided arguments for why we can expect this technology to not only be conceptually feasible but also practically plausible. 

Part 3: The possibility of science-making despite limitations 

Let us now turn to the second argument against the skeptical picture proposed in Part 1. This argument starts by conceding that ML systems face relevant limitations in their ability to form and reliably use abstractions. However, the argument continues, so do humans (and human scientists), and still they are capable of doing science (arguably). Thus, the argument about the inductive limits of ML systems cannot, on its own, defeat the possibility of science-AI. 

To unravel this argument, let us first discuss the claim that both ML and human “reasoners” are limited, and limited in relevantly similar ways. I have already laid out the case for limitations in ML which arise from the fundamentally continuous and inferential nature of ML. According to our current best theories of human cognition—such as the Bayesian Brain Hypothesis (e.g., Deneve, 2004; Doya et al., 2007; Knill, Pouget, 2004), Predictive Processing (e.g., Clark, 2013; Clark, 2015; Kanai et al., 2015), and, most recently, Active Inference (Parr, Pezzulo, Friston, 2022)—the brain can essentially be understood as a “large inference machine”. As such, the low-level implementation of human reasoning is understood to be similarly continuous and inferential. 

This is, of course, not to deny that humans exhibit higher-level cognitive skills, such as verbal reasoning or metacognition, which are correctly understood to exceed “mere statistics”. Rather, the point I am trying to make is that these higher-level capabilities emerge from the low-level (continuous and inferential) implementation of the neural make-up of the brain. This serves as an existence proof that this sort of low-level implementation can, under certain circumstances, give rise to the capabilities more typically associated with “System 2”-type reasoning (Kahneman, 2017). As such, we have shown that the argument presented in Part 1—that, given the functional logic of modern-day ML, AI will not be able to implement all necessary aspects of scientific reasoning (such as generalization or deductive reasoning)—does not prove what it was meant to prove (the impossibility of science-AI). 

Furthermore, it also shows that a cognitive process need not be flawless in order to be able to implement science-making. Human reasoning is, of course, not without flaws. For example, human scientists regularly pick “wrong” abstractions (e.g., “phlogiston”, “ether”—to name only a few famous cases from the history of science). And human scientists are not immune to motivated reasoning and cognitive biases such as confirmation bias or hypothesis myopia (Nuzzo, 2015). The point is, despite these flaws in human reasoning—be they the result of structural limitations or merely of computational boundedness—they have not prevented humans from developing and conducting science successfully. 

This last point raises an interesting question about the nature of science-making. Given the plentiful sources of bounded, flawed, and motivated reasoning displayed by human scientists, how are they still capable of producing scientific progress? One way to make sense of this (plausibly surprising) observation is to understand science as an essentially collective endeavor. In other words, individual scientists don’t do science; scientific communities do. The idea is that science-making—a process that systematically skews towards the truth—emerges from implementing a collective “protocol” which, so to speak, “washes out” the biased reasoning present at the level of individual scientists. Bringing this back to science-AI, we can ask whether we should think of science-AI as a single system approximating ideal scientific reasoning, or as a system assembly in which each individual system can have flaws in its epistemic processes, but where the way they all interact produces behavior equivalent to science-making—just as is the case for human scientists interacting today. 

To summarize, the argument presented here is two-fold: on one hand, the human reasoning ability is implemented by a continuous and inferential low-level process, serving as an existence proof that such processes (which we also find in machine learning) are in principle able to implement discrete tasks with adequate levels of robustness. On the other hand, science-making is implemented by fallible human reasoners who make mistakes similar in type to the ones discussed in Part 1 (e.g., picking leaky abstractions or misgeneralizing them), serving as an existence proof that processes which are fallible in this way can still implement science-making. 

Conclusion

In this essay, I explored the conceptual possibility of end-to-end science-AI, i.e., an AI system or assembly of systems which is able to functionally implement science-making with no help from humans (post-training). In Part 1, I first made the case that end-to-end science-AI is not possible, on the basis of limitations of ML systems when it comes to their ability to form useful abstractions and to use these abstractions reliably. I argued that ML, given that it is based on interpolative learning from a given set of (training) data, faces important challenges in terms of its ability to generalize outside of its training data in the case of known or unknown distributional shifts upon deployment. Furthermore, I invoked the fact that ML systems are currently unreliable (or at the very least inefficient) at “discrete” types of reasoning. After developing this skeptical picture, I explored two sets of arguments which seek to recover the possibility of science-AI. 

First, I argued that ML systems are universal function approximators, and that, in that capacity, there must exist a computational threshold at which they are able to implement the functional logic of science. Furthermore, I argued that there are pragmatic reasons to accept that this is not only conceptually possible but practically feasible insofar as approximation is enough, as evidenced by the fact that successful scientific and engineering feats, as a norm, rely “merely” on approximate truths. 

Second, I compared ML systems to human scientists, claiming that, on one hand, the neurological implementation of human reasoning is structurally similar to ML, thus suggesting that ML methods can be expected to successfully scale to “higher-level” reasoning capabilities (including ones that appear particularly critical in science-making). On the other hand, the comparison also reveals how humans are capable of doing science despite the fact that the reasoning of individual humans is flawed in important ways. As such, some amount of brittleness in ML systems does not mean that they cannot successfully implement the scientific process. Taken together, the arguments discussed in Parts 2 and 3 succeed at defending the possibility of science-AI against the skeptical view laid out in Part 1. Beyond defending the conceptual possibility claim, the arguments also provide some support for the concrete, practical plausibility of science-AI. 

Let us conclude with one more evocative thought based on the analogy between ML and scientific reasoning explored over the course of this essay. Concerns about the generalization limits of ML systems pose an important problem: we need to be able to trust the systems we’re using, or, rather, we want to be able to know when and how much we are justified in trusting these systems. Epistemic justification—which I am taking, for the current purposes, to be a function of the reliability of a given epistemic process—is always defined relative to a given domain of application. This suggests that we want AI systems (among other things) to contain meta-data about their domain of applicability (i.e., the domain within which their generalization guarantees hold). What I want to suggest here is that the same insight also applies to scientific theories: we should more consistently strive to develop scientific theories which are—as an integral part of what it is to be a scientific theory—transparent about their domain of applicability, relative to which the theory does or does not claim that its predictions will generalize.  
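
As a gesture at what such meta-data could look like in practice (a hypothetical sketch of my own; the class and field names are made up and do not refer to any existing standard), a predictor could carry a machine-readable statement of the domain on which its reliability was established, and flag any query that falls outside it:

```python
# Hypothetical sketch: a predictor that carries metadata about its domain of
# applicability and flags queries outside the region where it was validated.
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class DomainOfApplicability:
    """Machine-readable statement of where the model's reliability claims hold."""
    input_range: Tuple[float, float]  # validated input interval
    max_expected_error: float         # error bound established on that interval


@dataclass
class QualifiedModel:
    predict_fn: Callable[[float], float]
    domain: DomainOfApplicability

    def predict(self, x: float) -> dict:
        lo, hi = self.domain.input_range
        in_domain = lo <= x <= hi
        return {
            "prediction": self.predict_fn(x),
            "in_domain": in_domain,
            "claimed_error_bound": self.domain.max_expected_error if in_domain else None,
        }


# Usage: a toy model validated only on [0, 10]; outside that range, no guarantee is claimed.
model = QualifiedModel(predict_fn=lambda x: 3.0 * x + 1.0,
                       domain=DomainOfApplicability((0.0, 10.0), 0.05))
print(model.predict(4.0))   # in-domain: prediction comes with an explicit error claim
print(model.predict(42.0))  # out-of-domain: prediction flagged, no reliability claim attached
```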

References 

Battaglia, P. W., et al. (2018). Relational inductive biases, deep learning, and graph networks. arXiv preprint arXiv:1806.01261.

Bender, E. M., et al. (2021). On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (pp. 610-623). Association for Computing Machinery.

Millidge, B., et al. (2021). Predictive coding: a theoretical and experimental review. arXiv preprint arXiv:2107.12979.

Bostrom, N. (2019). The vulnerable world hypothesis. Global Policy, 10(4), 455-476.

Chang, H. (2022). Realism for Realistic People: A New Pragmatist Philosophy of Science. Cambridge University Press.

Chollet, F. (2017). The limitations of deep learning. Deep learning with Python. Retrieved from: https://blog.keras.io/the-limitations-of-deep-learning.html 

Chollet, F. (2019). On the measure of intelligence. arXiv preprint arXiv:1911.01547.

Chollet, F. (2020). Why abstraction is the key to intelligence, and what we’re still missing. Talk at NeurIPS 2020. Retrieved from: https://slideslive.com/38935790/abstraction-reasoning-in-ai-systems-modern-perspectives 

Chomsky, N. (1965). Aspects of the Theory of Syntax. MIT Press.

Clark, A. (2013). Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behavioral and Brain Sciences, 36(3), 181-204. doi:10.1017/S0140525X12000477.

Clark, A. (2015). Surfing Uncertainty: Prediction, Action, and the Embodied Mind. Oxford Academic.

Cremer, C. (2021). Deep limitations? Examining expert disagreement over deep learning. Progress in Artificial Intelligence, 10. https://doi.org/10.1007/s13748-021-00239-1.

Cartuyvels, R., Spinks, G., & Moens, M. F. (2021). Discrete and continuous representations and processing in deep learning: Looking forward. AI Open, 2, 143-159.

Deneve, S. (2004). Bayesian inference in spiking neurons. Advances in neural information processing systems, 17.

De Regt, H. W. (2017). Understanding Scientific Understanding. New York: Oxford University Press.

Doya, K., Ishii, S., Pouget, A., & Rao, R. P. (Eds.). (2007). Bayesian brain: Probabilistic approaches to neural coding. MIT press.

Gripenberg, G. (2003). Approximation by neural networks with a bounded number of nodes at each level. Journal of Approximation Theory. 122 (2): 260–266. 

Gururangan, S., et al. (2023). Scaling Expert Language Models with Unsupervised Domain Discovery. arXiv preprint arXiv:2303.14177.

Hempel, C. G. (1945). Studies in the Logic of Confirmation (II.). Mind, 54(214), 97–121. 

Hendrycks, D., et al. (2020). The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 8340-8349).

Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer feedforward networks are universal approximators. Neural networks, 2(5), 359-366.

Humboldt, W. (1999/1836). On Language: On the diversity of human language construction and its influence on the mental development of the human species. Cambridge University Press.

Ji, Z., et al. (2023). Survey of hallucination in natural language generation. ACM Computing Surveys, 55(12), 1-38.

Kahneman, D. (2017). Thinking, Fast and Slow.

Kanai, R., et al. (2015). Cerebral hierarchies: predictive processing, precision and the pulvinar. Philosophical Transactions of the Royal Society B, 370, 20140169.

Knill, D. C., & Pouget, A. (2004). The Bayesian Brain: The Role of Uncertainty in Neural Coding and Computation. TRENDS in Neurosciences, 27(12), 712–719.

Marcus, G. (2018). Deep learning: A critical appraisal. arXiv preprint arXiv:1801.00631.

Mitchell, M. (2021). Abstraction and analogy‐making in artificial intelligence. Annals of the New York Academy of Sciences, 1505(1), 79-101.

Nuzzo, R. (2015). How Scientists Fool Themselves — and How They Can Stop. Nature. 526, 182. https://doi.org/10.1038/526182a. 

Parr, T., Pezzulo, G., & Friston, K. J. (2022). Active Inference: The Free Energy Principle in Mind, Brain, and Behavior. MIT Press.

Peters, U., et al. (2022). Generalization Bias in Science. Cognitive Science, 46: e13188. 

Popper, K. (1934). The Logic of Scientific Discovery. London, England: Routledge.

Popper, K. (1962). Conjectures and Refutations: The Growth of Scientific Knowledge. London, England: Routledge.

Seger, E., et al. (2020). Tackling threats to informed decision-making in democratic societies: promoting epistemic security in a technologically-advanced world. The Alan Turing Institute. 

Shanahan, M., & Mitchell, M. (2022). Abstraction for deep reinforcement learning. arXiv preprint arXiv:2202.05839.

Sprenger, J. (2011). Hypothetico-Deductive Confirmation. Philosophy Compass, 6: 497-508. 

Sprenger, J., & Hartmann, S. (2019). Bayesian Philosophy of Science: Variations on a Theme by the Reverend Thomas Bayes. Oxford and New York: Oxford University Press.

Trask, A., et al. (2018). Neural Arithmetic Logic Units. Advances in neural information processing systems, 31.

Valiant, L. G. (1984). A theory of the learnable. Communications of the ACM, 27(11), 1134-1142.

Towards a Philosophy of Science of the Artificial

One common way to start an essay on the philosophy of science is to ask: “Should we be scientific realists?” While this isn’t precisely the question I’m interested in here, it is the entry point of this essay. So bear with me.

Scientific realism, in short, is the view that scientific theories are (approximately) true. Different philosophers have proposed different interpretations of “approximately true”, e.g., as meaning that scientific theories “aim to give us literally true stories about what the world is like” (Van Fraassen, 1980, p. 9), or that “what makes them true or false is something external—that is to say, it is not (in general) our sense data, actual or potential, or the structure of our minds, or our language, etc.” (Putnam, 1975, p. 69f), or that their terms refer (e.g., Boyd, 1983), or that they correspond to reality (e.g., Fine, 1986). 

Much has been written about whether or not we should be scientific realists. A lot of this discussion has focused on the history of science as a source of empirical support for or against the conjecture of scientific realism. For example, one of the most commonly raised arguments in support of scientific realism is the so-called no-miracles argument (Boyd, 1989): what can better explain the striking success of the scientific enterprise than that scientific theories are (approximately) true—that they are “latching on” to reality in some way or form? Conversely, an influential argument against scientific realism is the argument from pessimistic meta-induction (e.g., Laudan, 1981), which suggests that, given that most past scientific theories have turned out to be false, we should expect our current theories to suffer the same fate (as opposed to being approximately true). 

In this essay, I consider a different angle on the discussion. Instead of discussing whether scientific realism can adequately explain what we empirically understand about the history of science, I ask whether scientific realism provides us with a satisfying account of the nature and functioning of the future-oriented scientific enterprise—or, what Herbert Simon (1996) called the sciences of the artificial. What I mean by this, in short, is that an epistemology of science needs to be able to account for the fact that the scientist's theorising affects what will come into existence, as well as for how this happens. 

I proceed as follows. In part 1, I explain what I mean by the sciences of the artificial, and motivate the premise of this essay—namely, that we should aspire for our epistemology of science to provide an adequate account not only of the inquiry into the “natural”, but also into the nature and coming-into-being of the “artificial”. In part 2, I analyse whether or not scientific realism provides such an account. I conclude that applying it to the sciences of the artificial exposes scientific realism not as false, but as insufficiently expressive. In part 3, I briefly sketch how an alternative to scientific realism—pragmatism—might present a more satisfactory account. I conclude with a summary of the arguments raised and the key takeaways.

Part 1: The need for a philosophy of science of the artificial

The Sciences of the Artificial—a term coined by Herbert Simon in the eponymous book—refer to domains of scientific enterprise that deal not only with what is, but also with what might be. Examples include the many domains of engineering, medicine, or architecture, but also fields like psychology, economics, or administration. What characterises all of these domains is their descriptive-normative dual nature. The study of what is is both informed and mediated by some normative ideal(s). Medicine wants to understand the functioning of the body in order to bring about, and relative to, a body’s healthy functioning; civil engineering studies materials and applied mechanics in order to build functional and safe infrastructure. In each case, at the end of the day, the central subject of their study is not a matter of what is—it does not exist (yet); instead, it is the goal of their study to bring it into existence. In Simon’s words, the sciences of the artificial are “concerned not with the necessary but with the contingent, not with how things are but with how they might be, in short, with design” (p. xii).

Some might doubt that a veritable science of the artificial exists. After all, is science not quintessentially concerned with understanding what is—the laws and regularity of the natural world? However, Simon provides what I think is a convincing case that not only is there a valid and coherent notion of a “science of the artificial”, but also that one of its most interesting dimensions is precisely its epistemology. In the preface to the second edition of the book, he writes: “The contingency of artificial phenomena has always created doubts as to whether they fall properly within the compass of science. Sometimes these doubts refer to the goal-directed character of artificial systems and the consequent difficulty of disentangling prescription from description. This seems to me not to be the real difficulty. The genuine problem is to show how empirical propositions can be made at all about systems that, given different circumstances, might be quite other than they are.” (p. xi).

As such, one of the things we want from our epistemology of science—a theory of the nature of scientific knowledge and the functioning of scientific inquiry—is to provide an adequate treatment of the sciences of the artificial. It ought, for example, to allow us to think not only about what is true but also about what might be true, and about how our own theorising affects what comes into being (i.e., what comes to be true). By means of metaphor, insofar as the natural sciences are typically concerned with the making of “maps”, the sciences of the artificial are interested in what goes into the making of “blueprints”. In the philosophy of science, we then get to ask: What is the relationship between maps and blueprints? For example, on one hand, the quality of our maps (i.e., scientific understanding) shapes what blueprints we are able to draw (i.e., the things we are able to build). At the same time, our blueprints also end up affecting our maps. As Simon puts it: “The world we live in today is much more a man-made, or artificial, world than it is a natural world. Almost every element in our environment shows evidence of human artifice.” (p. 2).

One important aspect of the domain of the artificial is that there is usually more than one low-level implementation (and respective theoretical-technological paradigm) through which a desired function can be achieved. For example, we have found several ways to travel long distances (e.g., by bicycle, by car, by train, by plane). What is more, there exist several different types of trains (e.g., steam-powered, diesel-powered, or electric trains; high-speed trains using specialised rolling stock to reduce friction, or so-called maglev trains which use magnetic levitation). Most of the time, we need not concern ourselves with the low-level implementation because of this functional equivalence. Precisely because they are designed artefacts, we can expect them to display largely similar high-level functional properties. If I want to travel from London to Paris, I generally don't have much reason to care which specific type of train I end up finding myself on. 

However, differences in their respective low-level implementation can start to matter under the ‘right’ circumstances, i.e., given relevant variation in external environments. Simon provides us with useful language to talk about this. He writes: “An artifact can be thought of as a meeting point—an ‘interface’ in today's terms—between an ‘inner’ environment, [i.e.,] the substance and organization of the artifact itself, and an ‘outer’ environment, [i.e.,] the surroundings in which it operates. If the inner environment is appropriate to the outer environment, or vice versa, the artifact will serve its intended purpose.” (p. 6). As such, the (proper) functioning of the artefact is cast in terms of the relationship between the inner environment (i.e., the artefact’s structure or character, its implementation details) and the outer environment (i.e., the conditions in which the artefact operates). He further provides a simple example to clarify the point: “A bridge, under its usual conditions of service, behaves simply as a relatively smooth level surface on which vehicles can move. Only when it has been overloaded do we learn the physical properties of the materials from which it is built.” (p. 13). 

The point is that two artefacts that have been designed with the same purpose in mind, and which in a wide range of environments behave equivalently, will start to show different behaviours if we enter environments to which their design hasn’t been fully adapted. Now their low-level implementation (or ‘inner environments’, in Simon’s terminology) starts to matter. 

To further illustrate the relevance of this point to the current discussion, let us consider the field of artificial intelligence (AI)—surely a prime example of a science of the artificial. The aspiration of the field of AI is to find ways to instantiate advanced intelligent behaviour in artificial substrates. As such, it can be understood in terms of its dual nature: it aims both to (descriptively) understand what intelligent behaviour is and how it functions, and to (normatively) implement it. The dominant technological paradigm for building artificial general intelligence (AGI) at the moment is machine learning. However, nothing in principle precludes that other paradigms could (or will) be used for implementing AGI (e.g., some variation on symbolic AI, some cybernetic paradigm, soft robotics, or any number of paradigms that haven’t been discovered yet). 

Furthermore, different implementation paradigms for AGI imply different safety- or governance-relevant properties. Imagine, for example, an AGI built in the form of a “singleton” (i.e., a single, monolithic system) compared to one built as a multi-agent, distributed system assembly. A singleton AGI, for example, seems more likely to lack interpretability (i.e., to behave in ways and for reasons that, by default, remain largely obscure to humans), while a distributed system might be more likely to fall prey to game-theoretic pitfalls such as collusion (e.g., Christiano, 2015). It is not at this point properly understood what the different implications of the different paradigms are, but the specifics need not matter for the argument I am trying to make here. The point is that, if the goal is to make sure that future AI systems will be safe and used to the benefit of humanity, it may matter a great deal which of these paradigms is adopted, and to understand what different paradigms imply for considerations of safety and governability. 

As such, the problem of paradigm choice—choices over which implementation roadmap to adopt and which theories to use to inform said roadmap—comes into focus. As philosophers of science, we must ask: What determines paradigm choice? As well as: How, if at all, can a scientist or scientific community navigate questions of paradigm choice “from within” the history of science?

This is where our discussion of the appropriate epistemology for the sciences of the artificial properly begins. Next, let us evaluate whether we can find satisfying answers in scientific realism.  

Part 2: Scientific realism of the artificial

Faced with the question of paradigm choice, one answer that a scientific realist might give is that what determines the right paradigm choice comes down entirely to how the world is. In other words, what the AI researcher does when trying to figure out how to build AGI is equivalent to uncovering the truth about what AGI, fundamentally, is. We can, of course, at a given point in time be uncertain about the ‘true nature’ of AGI, and thus be exploring different paradigms; but eventually, we will discover which of those paradigms turns out to be the correct one. In other words, the notion of paradigm choice is replaced with the notion of paradigm change. In essence, the aspiration of building AGI is collapsed into the question of what AGI, fundamentally, is. 

As I will argue in what follows, I consider this answer to be unsatisfying in that it denies the very premise of the sciences of the artificial discussed in the earlier section. Consider the following arguments. 

First, the answer given by the scientific realist seems to be fundamentally confused about the type-signature of the concept of “AGI”. AGI, in the sense I have proposed here, is best understood as a functional description—a design requirement or aspiration. As discussed earlier, it is entirely plausible that there exist several micro-level implementations which are functionally equivalent in that they display generally intelligent behaviour. As such, by treating “the aspiration of building AGI [as equivalent] to the question of what AGI is”, the scientific realist has implicitly moved away from—and thus failed to properly engage with—the premise of the question. 

Second, note that there are different vantage points from where we could be asking the question. We could take the vantage point of a “forecaster” and ask what artefacts we should expect to exist 100 years from now. Or, we could take the vantage point of a “designer” and ask which artefacts we want to create (or ought to create, given some set of moral, political, aesthetic, or other commitments). While it naturally assumes the vantage point of the forecaster, scientific realism appears inadequate for taking seriously the vantage point of the designer. 

Third, let’s start with the assumption that what will come into existence is a matter of fact. While plausible-sounding at first, this claim reveals problems upon closer inspection. To show how this is the case, let us consider the following tri-partite characterisation proposed by Eric Drexler (2018). We want to distinguish between three notions of “possible”, namely: (physically) realistic, (techno-economically) plausible, and (socio-politically) credible. This is to say, beyond such facts as the fundamental laws of physics (i.e., the physically realistic), there are other factors—less totalising and yet endowed with some degree of causal force—which shape what comes to be in the future (e.g., economic and sociological pressures).

Importantly, the physically realistic does not on its own determine what sorts of artefacts come into existence. For example, paradigm A (by mere chance, or for reasons of historical contingency) receives differential economic investment compared to paradigm B, resulting in its faster maturation; or, conversely, it might get restrained or banned through political means, resulting in it being blocked, and maybe even eventually forgotten. Examples of political decisions (e.g., regulation, subsidies, taxation, etc.) affecting technological trajectories abound. To name just one, consider how the ban on human cloning has, in fact, stopped human cloning activities, as well as any innovations aimed at making human cloning ‘better’ (cheaper, more convenient, etc.) in some way.

The scientific realist might react to this by arguing that, while the physically realistic is not the only factor that determines what sorts of artefacts come into existence, there is still a matter of fact to the nature and force of the economic, political, and social factors affecting technological trajectories, all of which could, at least in principle, be understood scientifically. While I am happy to grant this view, I argue that the problem lies elsewhere. As we have seen, the interactions between the physically realistic, the techno-economically plausible, and the socio-politically credible are highly complex and, importantly, self-referential. It is exactly this self-referentiality that makes this a case of paradigm choice, rather than paradigm change, when viewed from the position of the scientist. In other words, an adequate answer to the problem of paradigm choice must necessarily consider a “view from inside of the history of science”, as opposed to a “view from nowhere”. After all, the paradigm is being chosen by the scientific community (and the researchers making up that community), and they are making said choice from their own situated perspective.

In summary, it is less that the answers provided by the scientific realist are outright wrong. It rather appears as if the view provided by scientific realism is not expressive enough to deal with the realities of the sciences of the artificial. It cannot usefully guide the scientific enterprise when it comes to the considerations brought to light by the sciences of the artificial. Philosophy of science needs to do better if it wants to avoid confirming the accusation raised by Richard Feynman—that philosophers of science are to scientists what ornithologists are to birds; namely, irrelevant. 

Next, we will consider whether a different epistemological framework, one that holds onto as much realism as possible, appears more adequate to the needs of the sciences of the artificial. 

Part 3: A pragmatic account of the artificial

So far, we have introduced the notion of the science of the artificial, discussed what it demands from the philosophy of science, and observed how scientific realism fails to appropriately respond to those demands. The question is then: Can we do better? 

An alternative account to scientific realism—and the one we will consider in this last section—is pragmatic realism, chiefly originating from the American pragmatists William James, Charles Sanders Peirce, and John Dewey. For the present discussion, I will largely draw on contemporary work trying to revive a pragmatic philosophy of science that is truly able to guide and support scientific inquiry, such as that of Roberto Torretti, Hasok Chang, and Rein Vihalemm. 

Such a pragmatist philosophy of science emphasises scientific research as a practical activity, and the role of an epistemology of science as helping to successfully conduct this activity. While sharing with the scientific realist a commitment to an external reality, pragmatism suggests that our ways of getting to know the world are necessarily mediated by the ways knowledge is created and used, i.e., by our epistemic aims and means of “perception”—both the mind and scientific tools, as well as our scientific paradigms.

Note that pragmatism, as I have presented it here, does at no point do away with the notion of an external reality. As Giere (2006) clarifies, not all types of realism must subscribe to a “full-blown objective realism” (or what Putnam called “metaphysical realism”)—roughly speaking, the view that “[t]here is exactly one true and complete description of ‘the way the world is.’” (Putnam, 1981, p. 49). As such, pragmatic realism, while rejecting objective or metaphysical realism, remains squarely committed to realism, and understands scientific inquiry as an activity directed at better understanding reality (Chang, 2022, p. 5; p. 208).  

Let us now consider whether pragmatism is better able to deal with the epistemological demands of the sciences of the artificial than scientific realism. Rather than providing a full-fledged account of how the sciences of the artificial can be theorised within the framework of pragmatic realism, what I set out to do here is more humble in its ambition. Namely, I aim to support my claim that scientific realism is insufficiently expressive as an epistemology of the sciences of the artificial by showing that there exist alternative frameworks—in this case pragmatic realism—which do not face the same limitations. In other words, I aim to show that, indeed, we can do better. 

First, as we have seen, scientific realism fails to adopt the “viewpoint of the scientist”. As a result, it collapses the question of paradigm choice into a question of paradigm change. This makes scientific realism incapable of addressing the (very real) challenge faced by the scientist; after all, as I have argued, different paradigms might come with different properties we care about (such as when they concern questions of safety or governance). In contrast to scientific realism, pragmatism explicitly rejects the idea that scientific inquiry can ever adopt a “view from nowhere” (or a “God’s eye view”, as Putnam (1981, p. 49) puts it). Chang (2019) emphasises the “humanistic impulse” in pragmatism: “Humanism in relation to science is a commitment to understand and promote science as something that human agents do, not as a body of knowledge that comes from accessing information about nature that exists completely apart from ourselves and our investigations.” (p. 10). This aligns well with the need of the sciences of the artificial to be able to reason from the point of view of the scientist.

Second, pragmatism, in virtue of focusing on the means through which scientific knowledge is created, recognises the historicity of scientific activity (see also, e.g., Vihalemm, p. 3; Chang, 2019). This allows pragmatic realism to reflect the historicity that is also present in the sciences of the artificial. Recall that, as we discussed earlier, one central epistemological question of the sciences of the artificial concerns how our theorising affects what comes into existence. As such, our prior beliefs, scientific frameworks, and tools affect, by means of ‘differential investment’ in designing artefacts under a given paradigm, what sort of reality comes to be. Moreover, the nature of technological progress itself affects what we become able to understand, discover, and build in the future. Pragmatism suggests that, rather than there already being a predetermined answer as to which will be the most successful paradigm, the scientist must understand their own scientific activity as part of an iterative and path-dependent epistemic process.

Lastly, consider how the sciences of the artificial entail a ‘strange inversion’ of ‘functional’ and ‘mechanistic’ explanations. In the domain of the natural, the ‘function’ of a system is understood as a viable post-hoc description of the system, resulting from its continuous adaptation to the environment by external pressures. In contrast, in design, the ‘function’ of an artefact becomes that which is antecedent, while the internal environment of the artefact, its low-level implementation, becomes post-hoc. It appears difficult, through the eyes of a scientific realist, to fully accept this inversion. At the same time, accepting it appears to be useful, if not required, in order to engage with the epistemology of the sciences of the artificial on its own terms. Pragmatic realism, on the other hand, does not face the same trouble. To exemplify this, let us take Chang’s notion of operational coherence, a deeply pragmatist yardstick of scientific inquiry, which he describes as “a harmonious fitting-together of actions that is conducive to a successful achievement of one’s aims” (Chang, 2019, p. 14). As such, insofar as we are able to argue that a given practice in the sciences of the artificial possesses such operational coherence, it is compatible with pragmatic realism. What I have tried to show hereby is that the sciences of the artificial, including the ‘strange inversion’ of the role of ‘functions’ which they entail, are fully theorisable inside the framework of pragmatic realism. As such, unlike scientific realism, the latter does not fail to engage with the sciences of the artificial on their own terms.

To summarise this section, I have argued, by means of three examples, that pragmatic realism is a promising candidate for a philosophy of science within which the sciences of the artificial can be theorised. In this, pragmatic realism differs from scientific realism. In particular, I have invoked the fact that the sciences of the artificial require us to take the “point of view of the scientist”, to acknowledge the iterative, path-dependent and self-referential nature of scientific inquiry (i.e., its historicity), and, finally, to accept the central role of ‘function’ in understanding designed artefacts.

Conclusion

In section 1, I have laid out the case for why we need a philosophy of science that can encompass questions arising from the sciences of the artificial. One central such question is the problem of paradigm choice, which requires the scientific practitioner to understand the ways in which their own theorising affects what will come into existence.

In section 2, I have considered whether scientific realism provides a sufficient account, and concluded that it doesn’t. I have listed three examples of ways in which scientific realism seems to be insufficiently expressive as an epistemology of the sciences of the artificial. Finally, in section 3, I explored whether we can do better, and have provided three examples of epistemic puzzles, arising from the sciences of the artificial, that pragmatic realism, in contrast with scientific realism, is able to account for. 

While scientific realism seems attractive on the basis that it explains the success of science (of the natural), it does not in fact offer a good explanation of the success of the sciences of the artificial. How, before things like planes, computers, or democratic institutions existed, could we have learnt to build them if all that the scientific enterprise involved was uncovering that which (already) is? As such, I claim that the sciences of the artificial provide an important reason why we should not be satisfied with the epistemological framework provided by scientific realism when it comes to understanding and - importantly - guiding scientific inquiry.


References

Boyd, R. N. (1983). On the current status of the issue of scientific realism. Methodology, Epistemology, and Philosophy of Science: Essays in Honour of Wolfgang Stegmüller on the Occasion of His 60th Birthday, June 3rd, 1983, 45-90.

Chang, H. (2019). Pragmatism, perspectivism, and the historicity of science. In Understanding Perspectivism (pp. 10-27). Routledge.

Chang, H. (2022). Realism for Realistic People. Cambridge University Press.

Christiano, P. (2015). On heterogeneous objectives. AI Alignment (medium.com). Retrieved from https://ai-alignment.com/on-heterogeneous-objectives-b38d0e003399.

Drexler, E. (2018). Paretotopian goal alignment, Talk at EA Global: London 2018. 

Drexler, E. (2019). Reframing Superintelligence. Future of Humanity Institute.

Fine, A. (1986). Unnatural Attitudes: Realist and Antirealist Attachments to Science. Mind, 95(378): 149–177. 

Fu, W., & Qian, Q. (2023). Artificial Intelligence and Dual Contract. arXiv preprint arXiv:2303.12350.

Laudan, L. (1981). A confutation of convergent realism. Philosophy of Science, 48(1), 19-49.

Normile, D. (2018). CRISPR bombshell: Chinese researcher claims to have created gene-edited twins. Science. doi: 10.1126/science.aaw1839.

Putnam, H. (1975). Mathematics, Matter and Method. Cambridge: Cambridge University Press.

Putnam, H. (1981). Reason, Truth and History. Cambridge: Cambridge University Press.

Simon, H. (1996). The Sciences of the Artificial (3rd ed.). The MIT Press.

Soares, N., Fallenstein, B. (2017). Agent foundations for aligning machine intelligence with human interests: a technical research agenda. The technological singularity: Managing the journey, 103-125.

Torretti, R. (2000). ‘Scientific Realism’ and Scientific Practice. In Evandro Agazzi and Massimo Pauri (eds.), The Reality of the Unobservable. Dordrecht: Kluwer.

Van Fraassen, B. (1980). The scientific image. Oxford University Press.

Vihalemm, R. (2012). Practical Realism: Against Standard Scientific Realism and Anti-realism. Studia Philosophica Estonica. 5/2: 7–22.

Compilation of thoughts on impact-oriented interdisciplinary research

The below doesn’t come in the format of a traditional post; instead of a coherent start-to-end narration, it’s a compilation of topically related thoughts. I am posting them here as one single post because I want to have these thoughts accessible in one place.

[Cross-posted to the EA forum in shortform: 1 2 3.]

(1) Motivation

Below, I briefly discuss some motivating reasons, as I see them, to foster more interdisciplinary thought in EA. This includes ways EA's current set of research topics might have emerged for suboptimal reasons. 

The ocean of knowledge is vast. But the knowledge commonly referenced within EA and longtermism represents only a tiny fraction of this ocean. 

I argue that EA's knowledge tradition is skewed by factors including, but not limited to, the epistemic merit of those bodies of knowledge. There are good reasons for EA to focus on certain areas:

  • Direct relevance (e.g. if you're trying to do good, it seems clearly relevant to look into philosophy a bunch; if you're trying to do good effectively, it seems clearly relevant to look into economics (among others) a bunch; if you came to think that existential risks are a big deal, it is clearly relevant to look into bioengineering, international relations, etc. a bunch; etc.)

  • Evidence of epistemic merit (e.g. physics has more evidence for epistemic merit than psychology, which in turn has more evidence for epistemic merit than astrology; in other words, beliefs gathered from different fields are likely to pay more/less rent, or are likely to be more/less explanatorily virtuous)

However, some of the reasons we’ve ended up with our current foci may not be as good:

  • Founder effects

  • The partly arbitrary way academic disciplines have been carved up

  • Inferential distances between knowledge traditions that hamper the free diffusion of knowledge between disciplines and schools of thought

Having a skewed knowledge base is problematic. There is a significant likelihood that we are missing out on insights or perspectives that might critically advance our undertaking. We don’t know what we don’t know, and we have every reason to expect that we have blindspots.

***

I am interested in the potential value and challenges of interdisciplinary research. 

Neglectedness

(Academic) incentives make it harder for transdisciplinary thought to flourish, resulting in what I expect to be an undersupply thereof. One way of thinking about why we would see an undersupply of interdisciplinary thought is in terms of "market inefficiencies". For one, individual actors are incentivised (because it’s less risky) to work on topics that are already recognised as interesting by the community (“exploitation”), as opposed to venturing into new bodies of knowledge that might or might not prove insightful (“exploration”); a toy illustration of this tension follows below. What is “already recognised as valuable by the community”, however, is only partly determined by epistemic considerations, and partly shaped by path-dependencies.
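
To make the exploration/exploitation framing slightly more concrete, here is a toy two-armed bandit sketch (the setup, names and payoff numbers are purely illustrative assumptions of mine, not anything claimed above): an agent that never explores can lock in on the familiar topic even when an unexplored one would pay off better.

```python
# Toy sketch of the exploration/exploitation tension: an epsilon-greedy agent
# choosing between a familiar "established topic" and an unexplored one whose
# (initially unknown) payoff is higher. Numbers are arbitrary and illustrative.

import random

random.seed(0)

TRUE_PAYOFF = {"established_topic": 0.5, "unexplored_topic": 0.8}


def run(epsilon: float, steps: int = 2000) -> float:
    estimates = {k: 0.0 for k in TRUE_PAYOFF}
    counts = {k: 0 for k in TRUE_PAYOFF}
    total = 0.0
    for _ in range(steps):
        if random.random() < epsilon:
            arm = random.choice(list(TRUE_PAYOFF))       # explore: try a random topic
        else:
            arm = max(estimates, key=estimates.get)      # exploit: pick the best-looking topic
        reward = 1.0 if random.random() < TRUE_PAYOFF[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return total / steps


for eps in (0.0, 0.05, 0.3):
    print(f"epsilon={eps}: average payoff ~ {run(eps):.2f}")
```

With epsilon = 0 the agent never samples the unexplored topic and settles for the lower payoff; a small amount of exploration is enough to discover the better option.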

For two, “markets” tend to fail where we cannot easily specify what we want, and it is hard to specify in advance what good transdisciplinary work looks like. This is true of intellectual work in general, but likely even more so for transdisciplinary work, because the relatively siloed structure of academia adds “transaction costs” to any attempt to communicate across disciplinary boundaries.

One way to reduce these inefficiencies is to improve the interfaces between disciplines. "Domain scanning" and "epistemic translation" are precisely about creating such interfaces. Their purpose is to identify knowledge that is concretely relevant to a given target domain and to make that knowledge accessible to thinkers entrenched in the "vocabulary" of that target domain. A useful interface between political philosophy and computer science, for example, might require a mathematical formalization of central ideas such as justice.
 

Challenges

At the same time, doing interdisciplinary work well is challenging. For example, interdisciplinary research can only be as valuable as a researcher's ability to identify knowledge relevant to their target domain, or as a research community's quality assurance and error correction mechanisms. Phenomena like citogenesis or motivatiogenesis are manifestations of these difficulties.

There have been various attempts at overcoming these incentive barriers: for example the Santa Fe Institute, whose organizational structure deliberately disregards scientific disciplines (the various -ARPAs have a similar flavour); the field of cybernetics, which proposed an inherently transdisciplinary view on regulatory systems; or the recent surge in the literature on “mental models” (e.g. here or here).

A closer inspection of such examples - how far they were successful and how they went about it - might yield some interesting insights. I don't have the capacity to properly pursue such case studies in the near future, but it's definitely on my list of potentially promising (side) projects.

If readers are aware of other examples of innovative approaches trying to solve this problem that might make for insightful case studies, I’d love to hear them.

(2) A model: “domain scanning” and “epistemic translation”

The below provides definitions and explanations of "domain scanning" and "epistemic translation", in an attempt to add further gears to how interdisciplinary research works.

I suggest understanding domain scanning and epistemic translation as a specific type of research that both plays (or ought to play) an important role as part of a larger research process and can be usefully pursued as “its own thing”.

Domain Scanning

By domain scanning, I mean the activity of searching through diverse bodies and traditions of knowledge with the goal of identifying insights, ontologies or methods relevant to another body of knowledge or to a research question (e.g. AI alignment, Longtermism, EA). 

I call source domains those bodies of knowledge where insights are being drawn from. The body of knowledge that we are trying to inform through this approach is called the target domain. A target domain can be as broad as an entire field or subfield, or as narrow as a specific research problem (in which case I often use the term target problem instead of target domain).

Domain scanning isn’t about comprehensively surveying the entire ocean of knowledge, but instead about selectively scouting for “bright spots” - domains that might importantly inform the target domain or problem. 

An important rationale for domain scanning is the belief that model selection is a critical part of the research process. By model selection, I mean the way we choose to conceptualize a problem at a high-level of abstraction (as opposed to, say, working out the details given a certain model choice). In practice, however, this step often doesn’t happen at all because most research happens within a paradigm that is already “in the water”. 

As an example, say an economist wants to think about a research question related to economic growth. They will think about how to model economic growth and will make choices according to the shape of their research problem. They might, for example, decide between using an endogenous growth model or an exogenous growth model, and make other modelling choices at a similar level of abstraction. However, those choices happen within an already comparatively limited space of assumptions - in this case, neoclassical economics. It's at this higher level of abstraction that I think we're often not sufficiently looking beyond a given paradigm. Like fish in the water.
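
For concreteness, here is a minimal sketch of what such a modelling choice looks like, using standard textbook forms (the notation is mine and purely illustrative): in a Solow-style exogenous growth model, output is $Y = A K^{\alpha} L^{1-\alpha}$ and the technology term $A$ grows at a rate that is simply assumed, $\dot{A}/A = g$; in a Romer-style endogenous growth model, that growth rate is instead generated inside the model, e.g. $\dot{A} = \theta L_A A$, so it depends on how much labour $L_A$ the economy allocates to research. Both options, however, sit within the same broader neoclassical space of assumptions - which is the higher level of model selection this paragraph is pointing at.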

Neoclassical economics, as an example, is based on assumptions such as agents being rational and homogeneous, and the economy being an equilibrium system. Those are, in fact, not straightforward assumptions to make, as heterodox economists have in recent years slowly been bringing to the attention of the field. Complexity economics, for example, drops the above-mentioned assumptions, which helps broaden our understanding of economics in ways I think are really important. Notably, complexity economics is inspired by the study of non-equilibrium systems in physics, and its conception of heterogeneous and boundedly rational agents comes from fields such as psychology and organizational studies.

Research within established paradigms is extremely useful a lot of the time, and I am not suggesting that an economist who tackles their research question from a neoclassical angle is necessarily doing something wrong. However, this type of research can only ever make incremental progress. As a research community, I think we have a strong interest in fostering, at a structural level, the quality of interdisciplinary transfer.

The role of model selection is particularly important in pre-paradigmatic fields (examples include AI Safety or Complexity Science). Here, a willingness to test different frameworks for conceiving of a given problem seems particularly valuable in expectation; converging on one specific way of framing the problem risks locking in the burgeoning field prematurely. Pre-paradigmatic fields can often appear fairly chaotic, unorganized and unprincipled (“high entropy”). While this is sometimes evidence against the epistemic merit of a research community, I tend not to hold it against emerging fields: since the variance of outcomes is higher, the potential upsides are higher too. (Of course, one’s overall judgement of the promise of an emerging paradigm will depend on more than just this factor.)

 

Epistemic Translation

By epistemic translation, I mean the activity of rendering knowledge commensurable between different domains. In other words, epistemic translation refers to the intellectual work necessary to i) understand a body of knowledge, ii) identify its relevance for your target domain/problem, and iii) render the relevant conceptual insights accessible to (the research community of) the target domain, often by integrating them.

Epistemic translation isn’t just about translating one vocabulary into another or merely sharing factual information. It’s about expanding the concept space of the target domain by integrating new conceptual insights and perspectives. 

The world is complicated, and we are at any one time working with fairly simple models of reality. By analogy: when I look at a three-dimensional cube, I can only see part of the cube at any one time. By taking different perspectives on the same cube and putting them together - an exercise one might call “triangulating reality” - I can develop an increasingly accurate understanding of the cube. The box inversion hypothesis by Jan Kulveit is another, AI-alignment-specific example of what I have in mind.

I think something like this is true for understanding reality at large, though it is orders of magnitude more difficult than the cube example suggests. Domain scanning is about seeking new perspectives on your object of inquiry, and epistemic translation is required for integrating these numerous perspectives with one another in an epistemically faithful manner.

In the case of translation between technical and non-technical fields - say, translating central notions of political philosophy into game-theoretic or CS language - the major obstacle to epistemic translation is formalization. A computer scientist might well be aware of, say, the depth of discourse on topics like justice or democracy. But that doesn’t yet mean that they can integrate this knowledge into their own research or engineering. Formalization is central to creating useful disciplinary interfaces, and close to no resources are spent on systematically speeding up this process.
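
As a toy illustration of what such a formalization step can look like (a minimal sketch under assumptions of my own choosing, not a canonical treatment of justice): once a philosophical criterion is written down as a welfare function, an engineer can evaluate allocations produced by a system against it.

```python
# Minimal, illustrative sketch: two notions from political philosophy rendered as
# welfare functions over an allocation of utilities, so that they become something
# a computer scientist can optimize against or test with. Purely a toy example.

from typing import Sequence


def utilitarian_welfare(utilities: Sequence[float]) -> float:
    """Classical utilitarian criterion: average utility across individuals."""
    return sum(utilities) / len(utilities)


def rawlsian_welfare(utilities: Sequence[float]) -> float:
    """Rawlsian 'maximin' criterion: judge an allocation by its worst-off member."""
    return min(utilities)


# Hypothetical allocations an automated system might produce.
allocation_a = [10.0, 10.0, 1.0]   # high average, but one person is left badly off
allocation_b = [6.0, 6.0, 6.0]     # lower average, but egalitarian

for name, welfare in [("utilitarian", utilitarian_welfare), ("rawlsian", rawlsian_welfare)]:
    preferred = "A" if welfare(allocation_a) > welfare(allocation_b) else "B"
    print(f"{name} criterion prefers allocation {preferred}")
```

The point is not that either function captures "justice"; it is that the act of writing one down at all is exactly the kind of interface work that epistemic translation consists of.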

Somewhere in between domain scanning and epistemic translation, we could talk about “prospecting”: the activity of providing epistemic updates on how valuable a certain source domain is likely to be. This involves some scanning and some translation work (hence “in between the two”), and would serve as a mechanism for coordinating what a community might want to pay attention to.

(3) A list of fields/questions for interdisciplinary AI alignment research

The following list of fields and leading questions could be interesting for interdisciplinary AI alignment research. I started to compile this list to provide some anchorage for evaluating the value of interdisciplinary research for EA causes, specifically AI alignment.

Some comments on the list: 

  • Some of these domains are likely already very much on the radar of some people; others are more speculative.

  • In some cases I have a decent idea of concrete lines of questioning that might be interesting; in other cases, all I do is gesture broadly that “something here might be of interest”.

  • I don’t mean this list to be comprehensive or authoritative. On the contrary, this list is definitely skewed by domains I happened to have come across and found myself interested in.

  • While this list is specific to AI alignment (/safety/governance), I think the same rationale applies to other EA-relevant domains and I'd be excited for other people to compile similar lists relevant to their area of interest/expertise.

 

Very interested in hearing thoughts on the below!

 

Target domain: AI alignment/safety/governance 

  1. Evolutionary biology

    1. Evolutionary biology seems to have a lot of potentially interesting things to say about AI alignment. Just a few examples include:

      1. The relationship between environment, agent, and evolutionary paths (which relates, e.g., to the role of training environments)

      2. Niche construction as an angle on embedded agency

      3. The nature of intelligence

  2. Linguistics and Philosophy of language

    1. Lots of things that are relevant to understanding the nature and origin of (general) intelligence better.

    2. Sub-domains such as semiotics could, for example, offer relevant insights on topics like delegation and interpretability.

  3. Cognitive science and neuroscience

    1. Examples include Minsky’s Society of Mind (“The power of intelligence stems from our vast diversity, not from any single, perfect principle”), Hawkins’s A Thousand Brains (the role of reference frames for general intelligence), Friston et al.’s Predictive Coding/Predictive Processing (in its most ambitious versions, a near-universal theory of all things cognition, perception, comprehension and agency), and many more.

  4. Information theory

    1. Information theory is hardly news to the AI alignment idea space. However, there might still be value on the table from deeper dives or more out-of-the-ordinary applications of its insights. One example of this might be this paper on The Information Theory of Individuality.

  5. Cybernetics/Control Systems

    1. Cybernetics seems straightforwardly relevant to AI alignment. Personally, I’d love to have a piece of writing synthesising the most exciting intellectual developments in cybernetics, written by someone with awareness of where the AI alignment field currently stands.

  6. Complex systems studies

    1. What does the study of complex systems have to say about robustness, interoperability, or emergent alignment? It also offers insights into, and methodology for approaching, self-organization and collective intelligence, which are of particular interest in multi-multi scenarios.

  7. Heterodox schools of economic thinking

    1. These schools of thought try to reimagine the economy/capitalism and (political) organization, e.g. through decentralization and self-organization, by working on antitrust, or by trying to understand the potentially radical implications of digitalization for the fabric of the economy. Complexity economics, for example, can help us understand the out-of-equilibrium dynamics that shape much of our economy and lives.

  8. Political economy

    1. An interesting framework for thinking about AI alignment as a socio-technical challenge. Particularly relevant from a multi-multi perspective, or for thinking along the lines of cooperative AI. Pointer: Mapping the Political Economy of Reinforcement Learning Systems: The Case of Autonomous Vehicles

  9. Political theory

    1. The richness of the history of political thought is astonishing; the most obvious connections might be ideas related to social choice or principles of governance. (A dense but high-quality overview is offered by the podcast series History of Ideas.) The crux in making the depth of political thought available and relevant to AI alignment is formalization, which seems extremely undersupplied in current academia, for very similar reasons to the ones I’ve argued above.

  10. Management and organizational theory, Institutional economics and Institutional design

    1. Has things to say about, e.g., interfaces (read this to get a gist of why I think interfaces are interesting for AI alignment); delegation (e.g. Organizations and Markets by Herbert Simon); and (potentially) the ontology of forms and (the relevant) agent boundaries (e.g. The secret to social forms has been in institutional economics all along?).

    2. Talks, for example, about desiderata for institutions, such as robustness (e.g. here), or about how to understand and deal with institutional path-dependencies (e.g. here).

Early 2021

Types of generalism

I am interested in the nature of inter- and transdisciplinary research, which often involves some notion of “generalism”. There are different ways to further conceptualize generalism in this context.

First, a bit of terminology that I will rely on throughout this post: I call the bodies of knowledge from which insights are drawn “source domains”. The body of knowledge that is being informed by this approach is called the “target domain”.

Directionality of generalism

We can distinguish SFI-style generalism from FHI-style generalism (h/t particlemania for first formulating this idea):

  • In the case of SFI-style generalism, the source domain is fixed and one builds a portfolio of target domains that may gain value from “export”.

  • In the case of FHI-style generalism, the target domain is fixed and the approach is to build a portfolio of diverse source expertise. 

In the case of SFI, their source domain is the study of complex systems, which they apply to topics as varied as life and intelligence, cities, economics and institutions, opinion formation, etc.

In the case of FHI, the target domain is fixed - albeit somewhat vaguely, via the problem of civilization-scale consequentialism - and the source domains include philosophy, international relations, machine learning and more.

Full vs partial generalism

Partial generalism: Any one actor should focus on one (or a similarly small number of) source domains to draw from. 

Arguments: 

  • Ability: Any one actor can only be well-positioned to work with a small number of source domains because doing this work well requires expertise with the source domain. Expertise takes time to develop, so naturally, the number of source domains a single person will be able to draw upon (with adequate epistemic rigor) is limited. 

  • Increasing returns to depth: The deeper an actor’s expertise in the two fields they are translating between, the higher the expected value of their work. This can apply to individual researchers as well as to a team/organization doing generalist research.

Full generalism: As long as you fix your target domain, an actor can and should venture into many source domains. 

Arguments: 

  • Ability: An actor can do high-quality research while drawing from a (relatively) large number of source domains, some of which they only learn about along the way. This “ability” could come from several sources:

    • The researchers’ inherent cognitive abilities

    • The structure (i.e. lack of depth) of the field (sometimes a field is sufficiently shallow that it is reasonable to assume someone can get adequately oriented within it)

    • Error correction mechanisms within the intellectual community being sufficiently strong (meaning that, even if an individual starts out getting some important things wrong, these mistakes will be readily discovered and corrected).

  • Increasing returns to scope: The richer (in intellectual diversity) an actor’s expertise, the juicier the insights. Again, this argument could apply to an individual or groups of individuals working closely together.

Note that you can achieve full generalism at an organizational level while having a team of individuals that all engage in partial generalism.