On the automation of Science - an incomplete account

Introduction

In the context of this essay, I investigate the question: Can science be automated by artificial intelligence?

To investigate arguments for and against the possibility of science-AI, I will aim to answer the following question: is AI, given its own functional logic, able to implement the functional logic that underlies science?  

What is a functional logic, and how does it help us to answer whether it is possible to automate science by means of AI? The idea is to characterize and then contrast what one might consider the “two sides of an equation”, i.e., the respective functional logics of science-making and AI. On one end, we need to understand the functioning of science-making. Insofar as the scientific endeavor is successful in a way that, say, mere doing or mere thinking is not, what is it about science-making that can account for that? On the other end, we need to characterize the functioning of (machine learning (ML)-based) AI. How does it work and what performances can it (not) achieve, in principle and under specific conditions? Then, we can contrast these two sets of “functional logics” and evaluate whether the functional logic of ML is able to implement the functional logic of science-making, or whether there are (fundamental or practical) limitations which would preclude this. The idea is that, if the functional logic of science is not expressible by the functional logic of ML, we can conclude that, at least within the ML paradigm, full-scale AI-automated science (what I henceforth simply refer to as “science-AI”) is not feasible. 

I proceed as follows: I have just introduced the high-level approach I am taking. Next, and before I can properly dive into the discussion, I briefly discuss what might motivate us to ask this question, and then make a few clarifying remarks concerning the terminology used. With this out of the way, I can start evaluating the arguments. In Part 1, I start by laying out a picture which aims to reject the science-AI conjecture. The argument is, in short, that science requires a) “strong generalization”, i.e., the ability to come up with sensible abstractions (i.e., inductive reasoning), and b) “deductive reasoning”, i.e., the ability to use those abstractions sensibly and reliably. In both cases, there are reasons to doubt whether the functional logic of ML systems allows them to do this, or do this sufficiently well. Parts 2 and 3 explore two plausible ways to rebut this skeptical view. The first one explores whether the fact that ML systems are (at the limits) universal function approximators can recover the possibility of science-AI. The second one observes that humans too show cognitive limitations (arguably not dissimilar from the ones I will discuss in the context of ML systems), while still being capable of science-making (arguably). I will conclude that, based on the objections raised in Parts 2 and 3, the argument against the possibility of science-AI does not succeed.

Motivations

Why might one be interested in investigating the possibility of science-AI? 

First, we may hope to learn things not only about the current and prospective capability level (and limits thereof) of AI systems, but, quite plausibly, also about the nature of science-making itself. As such, this question appears to have intellectual merits on its own grounds. 

Furthermore, the question is of interest on more practical grounds, too. The scientific revolution brought about a massive acceleration in humans’ ability to produce knowledge and innovation, which in turn has led to astonishing improvements in the quality of human life. As such, the prospect of automating scientific progress appears promising. At the same time, AI capable of automating science may also be dangerous. After all, scientific progress has also produced technologies that pose immense risks to the wellbeing and survival of humanity, including nuclear weapons, the ability to engineer pathogens, and technologies that facilitate mass surveillance and oppression by states or other powerful actors (Bostrom, 2019). Accordingly, if we knew that science-AI was possible, this ought to motivate us to adopt caution and start working on improved societal and governance protocols to help use these capabilities safely and justly.  

Finally, there are serious worries that the growing adoption of AI applications contributes to an “epistemic crisis”, which poses a threat (in particular) to decision-making in democratic societies (e.g., Seger et al., 2020). Among other things, these systems can be used to generate text, images, video, and voice recordings which do not necessarily represent reality truthfully and which people might interpret as real even when fake. As such, if we were capable of building AI systems that systematically skew towards truth (as opposed to, say, being riddled with the sort of confabulations that we can see in state-of-the-art language models (Ji et al., 2023)), this may help decrease such epistemic risks. 

Some clarifications

As mentioned, a few brief points of clarification are in order before I can properly dive into discussing the (im)possibility of science-AI; in particular, with respect to how I will be using the terms “science”, “AI”, and “automation”. 

First, there does not exist broad consensus among the relevant epistemic community as to what the functional logic (or logics) of science-making is, nor can it be taken as a given that there exists a singular such functional logic. For example, philosophers like Karl Popper have questioned the validity of any use of inductive reasoning in science and instead put front and center the deductive process of falsification (Popper, 1934; Popper, 1962). In contrast, other accounts of science-making do very much rely on inductive processes, such as Bayesian statistics, in order to evaluate how much a given hypothesis is supported by the available evidence (e.g., Sprenger & Hartmann, 2019). In the context of this essay, however, I am not trying to settle the question of the correct account of the scientific method; I instead adopt a degree of methodological pluralism and evaluate the conceptual plausibility of end-to-end science-AI in accordance with several of the most prominent accounts of the scientific method.  

Second, I need to clarify what I do and don’t mean by artificial intelligence. The term in principle refers to a putative technology that implements intelligent behavior through artificial means (e.g., on silicon); it does not, on its own, specify how it does so. To clarify matters in the context of this essay, I (unless otherwise specified) always refer to implementations of AI that reside within the paradigm of ML. I believe this is a justified assumption to make because ML is the currently dominant paradigm in the field of AI, it is the most successful paradigm to date, and there is no particular reason to expect this trend to halt in the near future. Having thus fixed a technical paradigm for AI, we can reason more substantively about the possibilities and limitations of ML-based AI systems when it comes to science-making by drawing on fields such as ML theory and optimization theory. 

Third, when talking about the automation of science, one might have in mind partial automation (i.e., the automation of specific tasks that are part of but don’t comprise the whole of the scientific enterprise), or a full, end-to-end automation of the scientific process by means of AI. In the context of this essay, I primarily focus on the conceptual plausibility of the latter: end-to-end science-AI. The line of demarcation I wish to draw is not about whether the automation involves a single AI application or multiple (e.g., an assembly of) AI applications, but rather whether human scientists are still a required part of the process (as is the case for what I call partial automation) or not (as is the case for what I call end-to-end automation or end-to-end science-AI). 

With this out of the way, it is now time to dive into the discussion.

Part 1: Contra science-AI 

In this section, I lay out the case against the possibility of science-AI. In short, I argue that autonomous scientific reasoning requires i) the ability to form sensible abstractions which function as bases for generalizing knowledge from past experience to novel environments, and ii) the ability to use such abstractions reliably in one’s process of reasoning, thereby accessing the power of deductive or compositional reasoning. However, or so the argument goes, ML systems are not appropriately capable of forming such abstractions and of reasoning with them. 

First, let me clarify the claim that abstractions and deductive reasoning play central roles in science-making. Generalization refers to the ability to apply insights to a situation that is different to what has previously been encountered. Typically, this form of generalization is made possible by means of forming the “right” abstractions, i.e., ones that are able to capture those informational structures that are relevant to a given purpose across different environments (Chollet, 2019). When I invoke the concept of a dog, for example, I don’t have a specific dog in mind, although I could probably name specific dogs I have encountered in the past, and I could also name a number of features that dogs typically (but not always) possess (four legs, fur, dog ears, a tail, etc.). The “dog” case could be understood as an example of relatively narrow abstraction. Think now, instead, of the use of concepts like “energy”, “mass”, or “photon” in physics, or of a “set” or “integration” or “equation” in mathematics. Those concepts are yet farther removed from any specific instances of things which I can access directly via sensory data. Nevertheless, these abstractions are extremely useful in that they allow me to do things I couldn’t have done otherwise (e.g., predict the trajectory of a ball hit at a certain angle with a certain force, etc.).  

Scientific theories critically rely on abstraction because theories are expressed in terms of abstractions and their functional relationships to each other. (For example, the principle of mass–energy equivalence describes the relationship between two abstractions—“energy” and “mass”; in particular, this relationship can be expressed as E = mc².) The use of abstractions is what endows a theory with explanatory power beyond the merely specific, contingent example that has been studied empirically. At the same time, the usefulness of a theory is dependent on the validity of the abstractions it makes use of. A theory that involves abstractions that do not carve reality sufficiently at its joints will very likely fail to make reliable predictions or produce useful explanations. 

Furthermore, the ability to form valid abstractions constitutes the basis for a second critical aspect of scientific cognition, namely, deductive and compositional reasoning. By deductive reasoning, I am referring to such things as deductive logic, arithmetic, sorting a list, and other tasks that involve “discrete” representations and compositionality (Chollet, 2020). In the case of science-making in particular, falsification and disconfirmation play a central role and are established by means of deductive reasoning, such as in the hypothetico-deductive account (e.g., Sprenger, 2011; Hempel, 1945). The ability to use, or reason over, abstractions allows for so-called “combinatorial generalization”. It is this compositionality of thought that has been argued to be a critical aspect of human-level intelligence, giving the reasoner access to a schema of “infinite use of finite means” (Humboldt, 1836; Chomsky, 1965). 

Having made the case for why science-making relies on the ability to i) form and ii) reason with abstractions, I can now investigate the arguments at hand for believing ML systems are not appropriately capable of i) and ii).

Reasons for skepticism come from empirical observation (i.e., probing state-of-the-art models and seeing how they “break”), theoretical arguments, and expert judgment. In terms of the latter, Cremer (2021) surveys “expert disagreement over the potential and limitations of deep learning”. With expert opinions diverging, Cremer identifies a set of plausible origins of these disagreements, centrally among them questions concerning the ability of artificial neural networks to “form abstraction representations effectively” and the extent of their ability to generalize (p. 7). 

To elaborate on the theoretical arguments for ML skepticism, it is worth exploring the ways in which ML methods face challenges in their ability to generalize (e.g., Chollet, 2017; Battaglia et al., 2018; Cartuyvels et al., 2021; Shanahan & Mitchell, 2022). ML uses statistical techniques to extract (“learn”) patterns from large swaths of data. It can be understood as aiming to approximate the underlying function which generated the data it is trained on. However, this interpolative learning leads to brittleness when systems are deployed outside of the distribution of the training data. This phenomenon is well known in the ML literature and usually discussed under terms such as out-of-distribution (OOD) generalization failure. Under distributional shift (i.e., in cases where the deployment data follow a different distribution than the training data), the approximation the model learned during training is no longer guaranteed to hold, leading to a generalization failure. The risk of failures to generalize, so the argument goes, limits the potential to use ML for end-to-end science automation because we cannot sufficiently trust the soundness of the process. 
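To make the worry concrete, here is a minimal, self-contained sketch (my own illustration, not drawn from the literature cited above) of an out-of-distribution generalization failure: a simple model fit on one input range approximates the target function well there, but degrades sharply outside that range.

```python
# Toy illustration of OOD failure: a flexible interpolator fit on [0, 2*pi]
# approximates sin(x) well in-distribution but fails badly under shift.
import numpy as np

rng = np.random.default_rng(0)

# "Training distribution": noisy samples of sin(x) on [0, 2*pi].
x_train = rng.uniform(0, 2 * np.pi, 200)
y_train = np.sin(x_train) + rng.normal(0, 0.05, x_train.shape)

# Fit a degree-9 polynomial as a stand-in for any learned function approximator.
model = np.poly1d(np.polyfit(x_train, y_train, deg=9))

def mse(x: np.ndarray) -> float:
    return float(np.mean((model(x) - np.sin(x)) ** 2))

x_in = np.linspace(0, 2 * np.pi, 100)           # in-distribution inputs
x_out = np.linspace(3 * np.pi, 4 * np.pi, 100)  # inputs under distributional shift

print(f"in-distribution MSE:     {mse(x_in):.4f}")   # small
print(f"out-of-distribution MSE: {mse(x_out):.2e}")  # many orders of magnitude larger
```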

Furthermore, ML systems are notoriously bad at discrete tasks (see, e.g., Marcus, 2018; Cartuyvels et al., 2021). While state-of-the-art ML systems are not incapable of (and are getting better at), say, simple forms of arithmetic (e.g., adding two-digit numbers), it is noteworthy that tasks that take only a few lines of code to automate reliably in the paradigm of classical programming have remained outside the reach of today’s several-billion-parameter ML models. To quote the AI researcher François Chollet, deliberately misquoting the deep learning pioneer Geoffrey Hinton: “Deep learning is going to be able to do everything perception and intuition, but not discrete reasoning” (Chollet, 2020). This unreliability in deductive reasoning exhibited by ML systems is another reason for skepticism towards the possibility of end-to-end science-AI. 
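For contrast, the following few lines (a trivial sketch of my own) show how the “discrete” task of two-digit addition is solved exactly and exhaustively in classical programming, with no training data and no approximation error:

```python
# Exact two-digit addition implemented digit-by-digit with a carry,
# then exhaustively verified against Python's built-in addition.
def add_two_digit(a: int, b: int) -> int:
    ones = (a % 10) + (b % 10)
    carry, ones = divmod(ones, 10)
    tens = (a // 10) + (b // 10) + carry
    return tens * 10 + ones

# Correct for every possible pair of two-digit inputs, by construction.
assert all(add_two_digit(a, b) == a + b for a in range(10, 100) for b in range(10, 100))
```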

To summarize the argument, current ML-based AI systems appear to face limitations with respect to their ability to achieve “broad” generalization, to form sensible abstractions, and to use those abstractions reliably. Given these limitations, society would be ill-advised to rely on theories, predictions, and explanations proposed by science-AI. Of course, and this is worth noting, end-to-end science-AI is a high bar. The view presented above is entirely compatible with predicting that AI systems will be used to automate or augment many aspects of science-making, even if only relatively few places remain where humans need to “patch” the process.

Having elaborated on the case against the possibility of science-AI, I now move to investigating two plausible lines of reasoning aiming to defeat the suggested conclusion.

Part 2: Universal function approximation

The first argument that I will discuss against the purported limitations of ML builds on the claim that ML systems are best understood as universal function approximators (UFAs). From this follows the conjecture that there must exist a certain level of computational power at which ML systems are able to sufficiently approximate the science-making function. 

In short, universal function approximation refers to the property of neural networks that, for essentially any function f(x) (more precisely, any continuous function on a compact domain), there exists a neural network that can approximate said function to arbitrary precision. There exist mathematical theorems proving versions of this property for different cases, e.g., for neural networks of arbitrary width (i.e., an arbitrary number of neurons) or arbitrary depth (i.e., an arbitrary number of layers), as well as in bounded cases (e.g., Hornik, Stinchcombe, & White, 1989; Gripenberg, 2003). 
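For reference, the classical width-based version of the theorem can be sketched as follows (notation mine, following Hornik, Stinchcombe, & White, 1989):

```latex
% Sketch of the universal approximation theorem (arbitrary-width version);
% notation is mine. For every continuous f on a compact domain K and every
% tolerance eps > 0, some finite width N and parameters alpha_i, w_i, b_i suffice:
\[
\forall \varepsilon > 0 \;\; \exists N,\ \{\alpha_i, w_i, b_i\}_{i=1}^{N}:
\quad \sup_{x \in K} \Bigl|\, f(x) - \sum_{i=1}^{N} \alpha_i\,
\sigma\!\bigl(w_i^{\top} x + b_i\bigr) \Bigr| \;<\; \varepsilon,
\]
% where sigma is a fixed non-polynomial activation function.
```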

Let us say we accept that ML systems are accurately understood as UFAs and that, on this basis, they are able, in principle, to implement the functional logic of science-making. However, this picture raises an important question: (when) is approximation enough?

There is, after all, a difference between “the thing, precisely” and “the thing, approximately”. Or is there? Imagine you found a model M1 which approximates function F with an error of ε1. And imagine that the approximation is insufficient—that ε1 is too large for M1 to properly fulfill the function of F. Well, in that case, on grounds of the universal approximation theorem, there exists another model M2 with ε2 < ε1. If ε2 is still too big, one can try M3, and so on. As such, you can, in principle, get arbitrarily close to “the thing”; in other words, the difference between “the thing” and its approximation gets arbitrarily small in the limit. 
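A small numerical illustration of this successive-approximation picture (my own sketch, using random-feature networks of growing width as stand-ins for the models M1, M2, ...) might look as follows; the achievable approximation error typically shrinks as the width grows:

```python
# Approximating f(x) = sin(3x) with one-hidden-layer "random feature" networks
# of increasing width; wider networks typically achieve smaller sup-norm error.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 400)[:, None]
target = np.sin(3 * x).ravel()

def sup_error(width: int) -> float:
    # Fixed random hidden layer: tanh(w*x + b).
    w = rng.normal(0, 3, (1, width))
    b = rng.normal(0, 3, width)
    hidden = np.tanh(x @ w + b)
    # Fit the output weights by least squares (the "training" step).
    alpha, *_ = np.linalg.lstsq(hidden, target, rcond=None)
    return float(np.max(np.abs(hidden @ alpha - target)))

for width in (2, 8, 32, 128):
    print(f"width {width:4d}: max approximation error {sup_error(width):.4f}")
```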

One might still object to this conceptual argument with a practical worry. It may be prohibitively expensive (in terms of energy, model size and chips, or time) to get arbitrarily close to the “true” function of science-making. However, I suggest we have pragmatic reasons not to be too worried by this concern. After all, we can hardly expect that human scientists always pick out the right abstractions when constructing their theories. What is more, most feats of engineering rely on theories whose abstractions we know are not completely true, and yet have been shown to be “sufficiently” true (in a pragmatist sense) in that they produce useful epistemic products (including bridges that don’t collapse and airplanes that stay in the air). For example, the framework of classical physics was, in some sense, proven wrong by Einstein’s theories of relativity. And yet, most engineering programs are entirely happy to work within the classical framework. As such, even if ML systems “only” approximate the function of science-making, we have every reason to expect that they are capable of finding sufficient approximations such that, for all practical purposes, they will be capable of science-making. 

Finally, science-AI need not take the form of a monolithic structure consisting of a single ML model and its learned behavior policy. Instead, we can imagine a science-AI assembly system which, for example, trains “abstraction forming” and “deductive reasoning” circuits separately, which are later combined to interface with each other autonomously. This idea of a compositional science-AI bears a resemblance to the vision of a Society of Mind sketched by Marvin Minsky in 1986, where he argues that human intelligence emerges from interactions of many simple “agents” with narrow skills or functions. Moreover, we can even use ML to discover which forms of compositionality (i.e., “task division”) might be best suited for a science-AI assembly, insofar as my earlier, admittedly vague suggestion of integrating an “abstraction forming” and a “deductive reasoning” circuit might not be the ideal solution. There already exist examples of current-day ML systems trained on similar ideas (e.g., Gururangan et al., 2023). 

To summarize, I have argued that UFA theorems prove that AI systems—contra the skeptical picture laid out in Part 1—are in principle able to implement science-making. I further provided arguments for why we can expect this technology to not only be conceptually feasible but also practically plausible. 

Part 3: The possibility of science-making despite limitations 

Let us now turn to the second argument against the skeptical picture proposed in Part 1. This argument starts by conceding that ML systems face relevant limitations in their ability to form and reliably use abstractions. However, the argument continues, so do humans (and human scientists), and still, they are capable of doing science (arguably). Thus, the argument about the inductive limits of ML systems cannot, on its own, defeat the possibility of science-AI. 

To unravel this argument, let us first discuss the claim that both ML and human “reasoners” are limited, and limited in relevantly similar ways. I have already laid out the case for limitations in ML which arise from the fundamentally continuous and inferential nature of ML. According to our current best theories of human cognition—such as the Bayesian Brain Hypothesis (e.g., Deneve, 2004; Doya et al., 2007; Knill & Pouget, 2004), Predictive Processing (e.g., Clark, 2013; Clark, 2015; Kanai et al., 2015), and, most recently, Active Inference (Parr, Pezzulo, & Friston, 2022)—the brain can essentially be understood as a “large inference machine”. As such, the low-level implementation of human reasoning is understood to be similarly continuous and inferential. 

This is, of course, not to deny that humans exhibit higher-level cognitive skills, such as verbal reasoning or metacognition, which are correctly understood to exceed “mere statistics”. Rather, the point I am trying to make is that these higher-level capabilities emerge from the low-level (continuous and inferential) implementation of the neural make-up of the brain. This serves as an existence proof that this sort of low-level implementation can, under certain circumstances, give rise to what is more typically associated with “type-2” reasoning (Kahneman, 2017). As such, we have shown that the argument presented in Part 1—that, given the functional logic of modern-day ML, AI will not be able to implement all necessary aspects of scientific reasoning (such as generalization or deductive reasoning)—does not prove what it was meant to prove (the impossibility of science-AI). 

Furthermore, it also shows that a cognitive process need not be flawless in order to implement science-making. Human reasoning is, of course, not without flaws. For example, human scientists regularly pick “wrong” abstractions (e.g., “phlogiston” or “ether”, to name only two famous cases from the history of science). And human scientists are not immune to motivated reasoning and cognitive biases such as confirmation bias or hypothesis myopia (Nuzzo, 2015). The point is that these flaws in human reasoning—whether they stem from structural limitations or mere computational boundedness—have not prevented humans from developing and conducting science successfully. 

This last point raises an interesting question about the nature of science-making. Given the plentiful sources of bounded, flawed, and motivated reasoning displayed by human scientists, how are they still capable of producing scientific progress? One way to make sense of this (perhaps surprising) observation is to understand science as an essentially collective endeavor. In other words, individual scientists don’t do science; scientific communities do. The idea is that science-making—a process that systematically skews towards the truth—emerges from implementing a collective “protocol” which, so to speak, “washes out” the biased reasoning present at the level of individual scientists. Bringing this back to science-AI, it raises the question of whether we should think of science-AI as a single system approximating ideal scientific reasoning, or as a system assembly in which each individual system can have flaws in its epistemic processes, but whose interactions produce behavior equivalent to science-making—just as is the case for human scientists interacting today. 

To summarize, the argument presented here is two-fold: on one hand, human reasoning ability is implemented by a continuous and inferential low-level process, serving as an existence proof that such processes (which we also find in machine learning) are in principle able to implement discrete tasks with adequate levels of robustness. On the other hand, science-making is implemented by fallible human reasoners who make mistakes similar in type to the ones discussed in Part 1 (e.g., picking leaky abstractions or misgeneralizing them), serving as an existence proof that processes which are fallible in this way can still implement science-making. 

Conclusion

In this essay, I explored the conceptual possibility of end-to-end science-AI, i.e., an AI system or assembly of systems which is able to functionally implement science-making with no help from humans (post-training). In Part 1, I first made the case that end-to-end science-AI is not possible, on the basis of limitations of ML systems when it comes to their ability to form useful abstractions and to use these abstractions reliably. I argued that ML, given that it is based on interpolative learning from a given set of (training) data, faces important challenges in terms of its ability to generalize outside of its training data in the case of known or unknown distributional shifts upon deployment. Furthermore, I invoked the fact that ML systems are currently unreliable (or at the very least inefficient) at “discrete” types of reasoning. After developing this skeptical picture, I explored two sets of arguments which seek to recover the possibility of science-AI. 

First, I argued that ML systems are universal function approximators, and that in that capacity, there must exist a computational threshold at which they are able to implement the functional logic of science. Furthermore, I argued that there are pragmatic reasons to accept that this is not only conceptually possible but practically feasible insofar as approximation is enough, as evidenced by the fact that successful scientific and engineering feats, as a rule, rely “merely” on approximate truths. 

Second, I compared ML systems to human scientists, claiming that, on one hand, the neurological implementation of human reasoning is structurally similar to ML, thus suggesting that ML methods can be expected to successfully scale to “higher-level” reasoning capabilities (including ones that appear particularly critical to science-making). On the other hand, the comparison also reveals how humans are capable of doing science despite the fact that the reasoning of individual humans is flawed in important ways. Some amount of brittleness in ML systems therefore does not mean that they cannot successfully implement the scientific process. As such, the arguments discussed in Parts 2 and 3 succeed at defending the possibility of science-AI against the skeptical view laid out in Part 1. Beyond defending the conceptual possibility claim, the arguments also provide some support for the concrete, practical plausibility of science-AI. 

Let us conclude with one more evocative thought based on the analogy between ML and scientific reasoning explored over the course of this essay. Concerns about the generalization limits of ML systems pose an important problem: we need to be able to trust the systems we’re using, or, rather, we want to be able to know when and how much we are justified in trusting these systems. Epistemic justification—which I am taking, for the current purposes, to be a function of the reliability of a given epistemic process—is always defined relative to a given domain of application. This suggests that we want AI systems (among other things) to contain meta-data about their domain of applicability (i.e., the domain within which their generalization guarantees hold). What I want to suggest here is that the same insight also applies to scientific theories: we should more consistently strive to develop scientific theories which are—as an integral part of what it is to be a scientific theory—transparent about their domain of applicability, relative to which the theory does or does not claim its predictions will generalize.  

References 

Battaglia, P. W., et al. (2018). Relational inductive biases, deep learning, and graph networks. arXiv preprint arXiv:1806.01261.

Bender, E. M., et al. (2021). On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (pp. 610-623). Association for Computing Machinery.

Millidge, B., Seth, A., & Buckley, C. L. (2021). Predictive coding: a theoretical and experimental review. arXiv preprint arXiv:2107.12979.

Bostrom, N. (2019). The vulnerable world hypothesis. Global Policy, 10(4), 455-476.

Chang, H. (2022). Realism for Realistic People: A New Pragmatist Philosophy of Science. Cambridge University Press.

Chollet, F. (2017). The limitations of deep learning. Deep learning with Python. Retrieved from: https://blog.keras.io/the-limitations-of-deep-learning.html 

Chollet, F. (2019). On the measure of intelligence. arXiv preprint arXiv:1911.01547.

Chollet, F. (2020). Why abstraction is the key to intelligence, and what we’re still missing. Talk at NeurIPS 2020. Retrieved from: https://slideslive.com/38935790/abstraction-reasoning-in-ai-systems-modern-perspectives 

Chomsky, N. (1965). Aspects of the Theory of Syntax. MIT Press.

Clark, A. (2013). Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behavioral and Brain Sciences, 36(3), 181-204. doi:10.1017/S0140525X12000477.

Clark, A. (2015). Surfing Uncertainty: Prediction, Action, and the Embodied Mind. Oxford Academic.

Cremer, C. (2021). Deep limitations? Examining expert disagreement over deep learning. Progress in Artificial Intelligence, 10. doi:10.1007/s13748-021-00239-1. 

Cartuyvels, R., Spinks, G., & Moens, M. F. (2021). Discrete and continuous representations and processing in deep learning: Looking forward. AI Open, 2, 143-159.

Deneve, S. (2004). Bayesian inference in spiking neurons. Advances in neural information processing systems, 17.

De Regt, H. W. (2017). Understanding Scientific Understanding. New York: Oxford University Press.

Doya, K., Ishii, S., Pouget, A., & Rao, R. P. (Eds.). (2007). Bayesian brain: Probabilistic approaches to neural coding. MIT press.

Gripenberg, G. (2003). Approximation by neural networks with a bounded number of nodes at each level. Journal of Approximation Theory. 122 (2): 260–266. 

Gururangan, S., et al. (2023). Scaling Expert Language Models with Unsupervised Domain Discovery. arXiv preprint arXiv:2303.14177.

Hempel, C. G. (1945). Studies in the Logic of Confirmation (II.). Mind, 54(214), 97–121. 

Hendrycks, D., et al. (2020). The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 8340-8349).

Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer feedforward networks are universal approximators. Neural networks, 2(5), 359-366.

Humboldt, W. (1999/1836). On Language: On the diversity of human language construction and its influence on the mental development of the human species. Cambridge University Press.

Ji, Z., et al. (2023). Survey of hallucination in natural language generation. ACM Computing Surveys, 55(12), 1-38.

Kahneman, D. (2017). Thinking, Fast and Slow.

Kanai, R., et al. (2015). Cerebral hierarchies: predictive processing, precision and the pulvinar. Philosophical Transactions of the Royal Society B, 370, 20140169.

Knill, D. C., & Pouget, A. (2004). The Bayesian Brain: The Role of Uncertainty in Neural Coding and Computation. TRENDS in Neurosciences, 27(12), 712–719.

Marcus, G. (2018). Deep learning: A critical appraisal. arXiv preprint arXiv:1801.00631.

Mitchell, M. (2021). Abstraction and analogy‐making in artificial intelligence. Annals of the New York Academy of Sciences, 1505(1), 79-101.

Nuzzo, R. (2015). How Scientists Fool Themselves — and How They Can Stop. Nature. 526, 182. https://doi.org/10.1038/526182a. 

Parr, T., Pezzulo, G., & Friston, K. J. (2022). Active Inference: The Free Energy Principle in Mind, Brain, and Behavior. MIT Press.

Peters, U., et al. (2022). Generalization Bias in Science. Cognitive Science, 46: e13188. 

Popper, K. (1934). The Logic of Scientific Discovery. London, England: Routledge.

Popper, K. (1962). Conjectures and Refutations: The Growth of Scientific Knowledge. London, England: Routledge.

Seger, E., et al. (2020). Tackling threats to informed decision-making in democratic societies: promoting epistemic security in a technologically-advanced world. The Alan Turing Institute. 

Shanahan, M., & Mitchell, M. (2022). Abstraction for deep reinforcement learning. arXiv preprint arXiv:2202.05839.

Sprenger, J. (2011). Hypothetico-Deductive Confirmation. Philosophy Compass, 6: 497-508. 

Sprenger, J., & Hartmann, S. (2019). Bayesian Philosophy of Science: Variations on a Theme by the Reverend Thomas Bayes. Oxford and New York: Oxford University Press.

Trask, A., et al. (2018). Neural Arithmetic Logic Units. Advances in neural information processing systems, 31.

Valiant, L. G. (1984). A theory of the learnable. Communications of the ACM, 27(11), 1134-1142.

Epistemic justification in (Hu)man and Machine

What does it take for a belief to be epistemically justified? In the hope of providing a novel angle to this long-standing discussion, I will investigate the question of epistemic justification by means of considering not only (what one might call) ‘classical’ cases, but also ‘machine’ cases. Concretely, I will discuss whether—and, if so, on what basis—artificial systems instantiating intelligent behaviour can be said to form epistemically justified ‘beliefs’. This will serve as a sort of thought experiment or case study used to test plausible answers to the problem of epistemic justification and, potentially, derive inspirations for novel ones.

Why do I choose to adopt this methodological approach? Consider, by comparison, the classic question in biology: what is life? Fields such as astrobiology or artificial life allow us to think about this question in a more (and more appropriately) open-minded way—by helping us to uproot unjustified assumptions about what life can and cannot look like based on sampling from Earth-based forms of life alone. The field of artificial intelligence can serve a similar function vis-à-vis philosophical inquiry. Insofar as we aspire for our theories—including our theories of knowledge and epistemic justification—to be valid beyond the contingencies of human intelligence, insights from the study of AI stand in a fruitful intellectual symbiosis with philosophical thought. 

I will start our investigation into epistemic justification with a thought experiment. 

Rome: Consider Alice; when having dinner with her friends, the topic of her upcoming trip to Italy comes up. Alice explains that she will be taking a plane to Rome, Italy’s capital city, from where she will start her journey. 

It seems uncontroversial to say that Alice is epistemically justified in her belief that Rome is in fact the capital of Italy. The question I want to raise here is: in virtue of what is this the case? Before I delve into examining plausible answers to this question, however, let us compare the former story to a slightly different one. 

Rome’: In this case, Bob is playing around with the latest large language model trained and made available by one of the leading AI labs—let’s call it ChatAI. Bob plays with the model in order to get a handle on what ChatAI is and isn’t able to do. At one point, he submits the following query to the model: “What is the capital of Italy?”, and the model replies: “The capital city of Italy is Rome.” 

By analogy to the first case, should we conclude that the model is epistemically justified in its claim that Rome is the capital of Italy? And if not, how are these two cases different? In what follows, I will investigate these questions in more detail, considering various approaches attempting to clarify what amounts to epistemic justification. To do so, I will toggle between considering the traditional (or human) case and the machine case of epistemic justification and study whether this dialogue can provide insight into the question of epistemic justification. 

Correctness (alone) is not enough—process reliabilism for minds and machines

Thus, let us return to a question raised earlier: in virtue of what can we say Alice is justified in claiming that Rome is the capital of Italy? A first observation that appears pertinent is that Alice is correct in her statement: Rome is in fact the capital of Italy. While this appears relevant, it does not represent a sufficient condition for epistemic justification. To see why, we need only think of cases where someone is correct due to mere chance or accident, or even against their better judgement. You may ask me a question about a topic I have never heard of, and yet I might get the answer right by mere luck. Or, in an even more extreme case, we may play a game where the goal is to not give a correct answer. It is quite easily conceivable, in virtue of my utter ignorance of the topic, that I end up giving an answer that turns out to be factually correct, despite trying to pick an answer that I believe to be wrong. In the first case, I got lucky, and in the second case, I uttered the correct answer against my better judgement. In neither of these cases would my factually correct answer represent an epistemically justified answer. 

As such, I have shown that the truth condition (alone) is an insufficient account of epistemic justification. Furthermore, I have identified a particular concern: that epistemic justification is not given in cases where a claim is correct for arbitrary or ‘lucky’ reasons. This conclusion seems to be supported when considering the machine case. If, say, we designed a program that, when queried, iterated through a predefined set of answers and picked one of them at random, then, even if this program happened to pick the correct answers, we wouldn’t feel compelled to consider this a case of epistemic justification. Insofar as we are here taking issue with the arbitrariness of the answer-producing process when considering its status of epistemic justification, we may come to wonder what it would look like for a claim to be correct on a non-arbitrary or non-lucky basis. 

To that effect, let us consider the proposal of process reliabilism (Goldman, 1979, 1986). At its core, this theory claims that a belief is epistemically justified if it is the product of a belief-formation process that is systematically truth-conducive. In other words, while it is insufficient to observe that a process produces the correct answer on a single and isolated instance, if a process tends to produce the correct answer with a certain reliability, said process acts as a basis for epistemic justification according to the reliabilist thesis. Applied to our Rome case from earlier, the question is thus which processes (e.g., of information gathering and processing) led Alice to claim that Rome is the Italian capital, and whether these same processes have shown sufficient epistemic reliability in other cases. Let’s say that, in Alice’s case, she inferred her belief that Rome is the capital of Italy as follows. First, her uncle told her that he was about to emigrate to live in the capital city of Italy. A few weeks later, Alice receives a letter from said uncle which was sent, as she can tell by the postmark, from Rome. From this, Alice infers that Rome must be the capital of Italy. As such, Alice’s belief is justified insofar as it involved the application of perception, rational reflection, and logical reasoning, rather than, say, guessing, wishful thinking, or superstitious reasoning. 

Furthermore, we don’t have to understand reliability here merely in terms of the frequency with which a process produces true answers. Instead, we can interpret it in terms of the propensity with which it does so. In the latter case, we capture a notion of truth-conduciveness that pertains not only to the actual world as observed, but is also cognizant of other possible worlds. As such, it aims to be sensitive to the notion that a suitable causal link is required between the given process and its epistemic domain, i.e., what the process is forming beliefs over. This renders the thesis more robust against unlikely but statistically possible cases where an arbitrary process repeatedly happens to get an answer correct, cases which would otherwise undermine the extent to which process reliabilism can serve as a suitable basis for epistemic justification. To illustrate this, consider the case of the scientific method, where we rely on empiricism to test hypotheses. This process is epistemically reliable not in virtue of getting true answers at a certain frequency, but in virtue of its procedural properties, which guarantee that the process will, sooner or later, falsify wrong hypotheses. 

To summarise, according to process reliabilism, a belief-formation process is reliable as a function of its propensity to produce true beliefs. Furthermore, the reliability (as defined just now) of a belief-formation process serves as the basis of epistemic justification for the resulting belief. How does this apply or not to the machine case from earlier (Rome’)? 

To answer this question, let us imagine that Bob continues to play with the model by asking it more questions about the capital cities of other countries. Assuming capabilities representative of the current state of the art in machine learning, and large language models in particular, let us say that ChatAI’s responses to Bob’s questions are very often correct. We understand enough about how machine learning works to say more than that the model is merely frequently correct: we can deny that ChatAI (and comparable AI systems) produces correct answers by mere coincidence. In particular, machine learning exploits insights from statistics and optimization theory to implement a form of inference on its training data. To test and compare the performance of different models, the machine learning community regularly develops so-called ‘benchmarks’ based on various performance-relevant features of the model being evaluated, such as accuracy as well as speed or (learning) efficiency. As such, AI systems can, given appropriate design and training, produce correct outputs with high reliability and for non-arbitrary reasons. This suggests that, according to process reliabilism, outputs from ChatAI (and comparable AI systems) can qualify as being epistemically justified. 
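As a concrete (and deliberately simplified) sketch of what such a reliability check looks like in practice, the snippet below measures a model’s accuracy over a small set of question–answer pairs; `query_model` here stands in for any language-model interface and is an assumption of mine, not an existing API.

```python
# A toy "benchmark": estimate how reliably a model answers factual questions.
benchmark = [
    ("What is the capital of Italy?", "Rome"),
    ("What is the capital of France?", "Paris"),
    ("What is the capital of Japan?", "Tokyo"),
]

def accuracy(query_model) -> float:
    """Fraction of benchmark questions whose expected answer appears in the model's reply."""
    correct = sum(expected.lower() in query_model(question).lower()
                  for question, expected in benchmark)
    return correct / len(benchmark)

# Example usage with a stand-in "model" that only knows about Italy:
print(accuracy(lambda q: "The capital city of Italy is Rome."))  # -> 0.33...
```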

Challenge 1: “You get out only what you put in”

However, the reliabilist picture as painted so far does not in fact hold up to scrutiny. The first problem I want to discuss concerns the fact that, even if procedurally truth-conducive, a process can produce systematically incorrect outputs if said process operates on wrong initial beliefs or assumptions. If, for example, Alice’s uncle was himself mistaken about what the capital of Italy is, thus moving to a city that he mistakenly thought was the capital, and if he had thereby, through his words and actions, passed on this mistaken belief to Alice, the same reasoning process she used earlier to arrive at a (seemingly) epistemically justified belief would now have produced an incorrect belief. Differently put, someone’s reasoning might be flawless, but if it is based on false premises, its conclusions must be regarded as null in terms of their epistemic justification. 

A similar story can be told in the machine case. A machine learning algorithm seeking to identify the underlying statistical patterns of a given data set can only ever be as epistemically valid as the data set it is trained on. As a matter of fact, this is a widely discussed concern in the AI ethics literature, where ML models have been shown to reproduce biases present in their training sets. For example, language models have been shown (before corrective interventions were implemented) to associate certain professions (e.g., ‘CEO’ or ‘nurse’) predominantly with certain genders. Similarly, in the legal context, ML systems used to predict recidivism risk have been criticised for reproducing racial bias.  

What this discussion highlights is that the reliabilist thesis as I stated it earlier is insufficient. Let us therefore attempt to refine the thesis before I discuss a second source of criticism that can be raised against it. We can reformulate a refined reliabilist thesis as follows: for a belief to be epistemically justified, it needs to a) be the product of a truth-conducive process, and b) the premises on which said process operates to produce the (resulting) belief in question must themselves be justified. 

As some might notice, this approach, however, may be at risk of running into a problem of regress. If justified belief requires that the premises on which the epistemic process operates must be justified, how do those premises gain their justification other than by reference to a reliable process operating on justified premises? Without attempting, in the context of this essay, a comprehensive account of how one may deal with this regress problem, I will provide a handful of pointers to attempts that have been made. 

A pragmatist, for example, may emphasise their interest in a process that can reliably produce useful beliefs. Since the usefulness of a belief is determined by its use, this does not fall prey to the regress challenge as stated above: a belief can be tested for its usefulness without making reference to another belief. Klein (1999), on the other hand, denies that the type of regress at hand is vicious in the first place, making reference to a view called infinitism. According to infinitism, justification requires an appropriate chain of reasons, and in the case of infinitism specifically, such chains take the form of non-repeating infinite ones. Finally, Goldman himself (2008) tackles the regress problem by differentiating between basic and non-basic beliefs, where the former are justified without reference to another belief, in virtue of being the product of an unconditionally reliable process. Such basic beliefs, then, represent a plausible stopping point for the regress dynamic. Perception has been proposed as a candidate for such an unconditionally reliable process, although one may object to this account by denying that it is possible, or common, for perceptual or empirical data to be entirely atheoretical. In any case, the essence of Goldman’s proposal, and of the proposals of externalist reliabilists in general, is that a belief is justified not with reference to reflectively accessible reasons (which is what internalists propose), but in virtue of the causal process that produced the belief, whether or not that process makes reference to other beliefs. As such, externalists are commonly understood to be able to dodge the regress bullet. 

For now, this shall suffice as a treatment of the problem of regress. I will now discuss another challenge to process reliabilism (including its refined version as stated above). It concerns questions regarding the domain in which the reliability of a process is being evaluated. 

Challenge 2: Generalization and its limits

To understand the issue at hand better, let’s consider the “new evil demon problem”, first raised by Cohen (1984) as a critique of reliabilism. The problem arises from the following thought experiment: Imagine a world WD in which there exists an epistemic counterpart of yours, let’s call her Anna, who is identical to you in every regard except one. She experiences precisely what you experience and believes precisely what you believe. According to process reliabilism, you are epistemically justified in your beliefs about your world—let’s call it WO—on the basis of those beliefs being the product of truth-conducive processes such as perception or rational reasoning. In virtue of the same reasoning, Anna ought to be epistemically justified in her beliefs about her world. However, and this is where the problem arises, the one way in which Anna differs from you is that her experiences and beliefs about WD have been carefully curated by an evil demon with the aim of deceiving her. Anna’s world does not in fact exist in the way she experiences it. On a reliabilist account, or so some would argue, we would have to say that Anna’s beliefs are not justified, since her belief-formation processes do not reliably lead to correct beliefs. However, how can your counterpart, who in every regard relevant to the reliabilist thesis is identical to you, not be justified in her beliefs while you are? The dilemma arises in that many would intuitively say that Anna is just as justified in believing what she believes as we are, despite the fact that the processes that produced Anna’s beliefs are unreliable. 

One way to cast the above problem–which also reveals a way to defuse it–is by indexing and then separately evaluating the reliability of the belief-formation processes for the different worlds, WO and WD. From here, as developed by Comesaña (2002), we can make the case that while the belief-formation processes are reliable in the case of WO, they are not in the case of WD. As such, the reliability of a process, and thus epistemic justification, must always be assessed relative to a specific domain of application. 

Another, similar approach to the same problem has been discussed, for example, by Jarrett Leplin (2007, 2009) by invoking the notion of ‘normal conditions’, a term originally introduced by Ruth Millikan in 1984. The idea is that the reliability of a process is evaluated with respect to the normal conditions of its functioning. Leplin defines normal conditions as “conditions typical or characteristic of situations in which the method is applicable” and explains that “[a] reliable method could yield a preponderance of false beliefs, if used predominantly under abnormal conditions” (Leplin, 2007, p. 33). As such, the new evil demon case can be understood as a case where the epistemic processes which are reliable in a demon-less world cease to be reliable in the demon world, since that world no longer complies with the ‘normal conditions’ that guarantee the functionality of said processes. While promising as an approach to address a range of challenges raised against reliabilism, there is, one must note, still work to do in terms of clearly formalising the notion of normality.

What both of these approaches share is that they seek to defend reliabilism against the new evil demon problem by specifying the domain or conditions in which the reliability of a process is evaluated. Instead of suggesting that, for a process to be reliable—and thus to serve as a basis for epistemic justification—it has to be universally reliable, these refinements of reliabilism seek to formalise a way of putting boundaries on the application space of a given process. As such, we can understand the new evil demon problem as an instance of a more general phenomenon: generalization and its limits. This way of describing the problem serves to clarify how the new evil demon problem relates to issues frequently discussed in the context of machine learning.

The problem of generalization in machine learning concerns the fact that ML, generally speaking, works by exploiting underlying patterns to approximate functions that efficiently describe the data encountered. While this approach (and others) has enabled impressive AI applications to date, it faces important limitations. In particular, this learning method is based on an assumption, commonly called the IID assumption (i.e., independent and identically distributed sampling), which says that the data set used in training must be representative of the data encountered upon deployment for there to be a guarantee of the effectiveness or accuracy of the learned model. In other words, while we have guarantees about a model’s performance (i.e., accuracy or loss) under the IID assumption, these guarantees no longer hold when the nature of the distribution changes, i.e., when we encounter what is called a distributional shift. Under distributional shift, whatever approximation function a model has learnt is no longer guaranteed to be effective in the new (deployment) environment. This would be called a case of failure to generalise.
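To make the role of the IID assumption slightly more concrete, it can be stated in terms of risk (the notation below is mine and simplified):

```latex
% The learner minimizes the empirical risk over samples drawn i.i.d. from the
% training distribution P_train:
\[
\hat{R}(h) \;=\; \frac{1}{n} \sum_{i=1}^{n} \ell\bigl(h(x_i), y_i\bigr),
\qquad (x_i, y_i) \sim P_{\mathrm{train}} .
\]
% Standard generalization guarantees bound the deployment risk
\[
R(h) \;=\; \mathbb{E}_{(x, y) \sim P_{\mathrm{test}}}\bigl[\ell\bigl(h(x), y\bigr)\bigr]
\]
% only under the assumption P_test = P_train; when P_test differs from P_train
% (distributional shift), these guarantees no longer apply.
```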

Let us reiterate the suggested analogy between the new evil demon problem and the problem of out-of-distribution generalization failures in machine learning. I claim that the demon world WD represents an ‘out-of-distribution case’ for the epistemic processes that are reliable in our world WO. Though Anna nominally uses the same processes, she uses them in an importantly different environment, which makes it unsurprising that they turn out to be unreliable in WD. After all, the reality of WD differs in fundamental ways from WO (namely, in the existence of the evil demon). Insofar as the thought experiment is intended to suggest that the demon itself may be subject to completely different fundamental laws than the ones that govern WO, the same processes that can approximate the fundamental laws of WO are not guaranteed to approximate the fundamental laws that govern WD. As such, I have vindicated process reliabilism from the evil demon problem by squaring what earlier appeared counterintuitive: the same processes that are reliable—and thus the basis for epistemic justification—in our world (WO) can turn out to be unreliable in an environment sufficiently foreign to ours, such as the demon world WD. 

Conclusion 

In this essay, I have set out to evaluate the question of epistemic justification. Most centrally, I discussed whether the proposal of process reliabilism may serve as a basis for justification. To this effect, I raised several challenges to process reliabilism. For example, I observed that a reliable process operating on false premises (or corrupted data) may cease to systematically produce correct beliefs. We then discussed ways to refine reliabilism to accommodate said concern, and how such refinements may or may not fall prey to a problem of regress. More practically speaking, I linked this discussion to the machine case by explaining how AI systems, even if they operate on reliable processes, may become corrupted in their ability to produce epistemically justified outputs through algorithmic bias resulting from training on non-representative data samples. 

The second challenge to reliabilism I discussed concerns the details of how the reliability of a process should be evaluated. In particular, I identified a need to specify and bound a ‘domain of application’ in reference to which a process’s reliability is established. The goal of such a demarcation—which may come in the form of indexing as suggested by Comesaña, in the form of defining normal conditions as proposed by Leplin, or in some other way—is to be sensitive to (the limits of) a process’s ability to generalise. As such, over the course of this discussion, I developed a novel perspective on the new evil demon problem by casting it as an instance of a cluster of issues concerning generalisation and its limits. While the new evil demon problem is commonly raised as an objection to process reliabilism—claiming that the reliabilist solution to the case is counterintuitive—I was able to vindicate reliabilism from these allegations. Anna’s epistemic processes—despite being nominally the same as ours—do fail to be reliable; however, this failure need not surprise us, because the demon world represents an application domain that is sufficiently and relevantly different from our world. 

Throughout the essay, I have attempted to straddle both the classical domain of epistemological inquiry and a more novel domain, which one may call ‘machine’ epistemology. I believe this dialogue can be methodologically fruitful, and I hope to have provided evidence for that conviction by means of the preceding discussion. It may serve as a source of inspiration; it may, as discussed at the start of this essay, help us appropriately de-condition ourselves from unjustified assumptions such as forms of anthropocentrism; and it may serve as a practical testing ground and source of empirical evidence for assessing the plausibility of different epistemological theories. Unlike with humans or mental processes, machines provide us with a larger possibility space and more nimbleness in implementing and testing our theoretical proposals. This is not to say that there aren’t dis-analogies between artificially intelligent machines and humans, and as such, any work that seeks to reap said benefits is also required to adopt the relevant levels of care and philosophical rigor. 

As a last, brief and evocative thought before concluding, let us return to a question raised at the very beginning of this essay. When comparing the two cases Rome and Rome’, we asked ourselves whether we should conclude that, by analogy between these two cases, insofar as Alice is deemed justified in believing the capital of Italy is Rome, so must be ChatAI. First, we must recognise that the only way to take this analogy seriously is to adopt an externalist perspective on the issue—that is, at least unless we are happy to get sucked into discussions of the possibility of machine mentality and reflective awareness of their own reasons. While some may take issue with this on the basis of favouring internalism over externalism, others—including me—may endorse this direction of travel for metaphysical reasons (see, e.g., Ladyman & Ross, 2007). After all—and most scientific realists would agree on this—whatever processes give rise to human life and cognition, they must in some fundamental sense be mechanistic and materialistic (i.e., non-magical) in just the way machine processes are. As the field of AI continues to produce ever more complex processes, it would not be reasonable to exclude the possibility that they will, at some point—and in isolated cases already today—resemble human epistemic processes sufficiently that any basis of epistemic justification must either stand or fall for both types of processes simultaneously. This perspective can be seen as unraveling further depth in the analogy between classical and machine epistemology, and as such, providing support towards the validity of said comparison for philosophical and scientific thought.  

Resources

  • Cohen, Stewart (1984). “Justification and Truth”, Philosophical Studies, 46(3): 279–295. doi:10.1007/BF00372907

  • Comesaña, Juan (2002). “The Diagonal and the Demon”, Philosophical Studies, 110(3): 249–266. doi:10.1023/A:1020656411534

  • Conee, Earl and Richard Feldman (1998). “The Generality Problem for Reliabilism”, Philosophical Studies, 89(1): 1–29. doi:10.1023/A:1004243308503

  • Feldman, Richard (1985). “Reliability and Justification”, The Monist, 68(2): 159–174. doi:10.5840/monist198568226

  • Goldman, Alvin (1979). “What is Justified Belief?” In George Pappas (ed.), Justification and Knowledge. Boston: D. Reidel. pp. 1-25.

  • Goldman, Alvin (1986). Epistemology and Cognition, Cambridge, MA: Harvard University Press.

  • Goldman, Alvin (2008). “Immediate Justification and Process Reliabilism”, in Quentin Smith (ed.), Epistemology: New Essays, New York: Oxford University Press, pp. 63–82.

  • Goldman, Alvin (2009). “Internalism, Externalism, and the Architecture of Justification”, Journal of Philosophy, 106(6): 309–338. doi:10.5840/jphil2009106611

  • Goldman, Alvin (2011). “Toward a Synthesis of Reliabilism and Evidentialism”, in Trent Dougherty (ed.), Evidentialism and Its Discontents, New York: Oxford University Press, pp. 254–290.

  • Janiesch, C., Zschech, P., & Heinrich, K. (2021). “Machine learning and deep learning”, Electronic Markets, 31(3), 685-695.

  • Klein, P. (1999). “Human Knowledge and the Infinite Regress of Reasons,” in J. Tomberlin, ed. Philosophical Perspectives 13, 297-325. 

  • Ladyman, James & Ross, Don (2007). Every Thing Must Go: Metaphysics Naturalized. Oxford University Press.

  • Leplin, Jarrett (2007). “In Defense of Reliabilism”, Philosophical Studies, 134(1): 31–42. doi:10.1007/s11098-006-9018-3

  • Leplin, Jarrett (2009). A Theory of Epistemic Justification, (Philosophical Studies Series 112), Dordrecht: Springer Netherlands. doi:10.1007/978-1-4020-9567-2

Why powerful instrumental reasoning is by default misaligned, and what we can do about it

Meta: There are some ways I intend to update and expand this line of thinking. Due to uncertainty about when I will be able to do so, I decided to go ahead with posting this version of the essay already.

The AI alignment problem poses an unusual, and maybe unusually difficult, engineering challenge. Instrumental reasoning, while being plausibly a core feature of effective and general intelligent behaviour, is tightly linked to the emergence of behavioural dynamics that lead to potentially existential risks. In this essay, I will argue that one promising line of thinking towards solving AI alignment seeks to find expressions of instrumental reasoning that avoid such risk scenarios and are compatible with or constitutive of aligned AI. After taking a closer look at the nature of instrumental rationality, I will formulate a vision for what such an aligned expression of instrumental reasoning may look like.

I will proceed as follows. In the first section, I introduce the problem of AI alignment and argue that instrumental reasoning plays a central role in many AI risk scenarios. I will do so by describing a number of failure scenarios - grouped into failures related to instrumental convergence and failures related to Goodhart’s law - and by illustrating the role of instrumental reasoning in each of these cases. In the second section, I will argue that we need to envision what alignable expressions of instrumental rationality may look like. To do so, I will first analyse the nature of instrumental rationality, thereby identifying two design features—cognitive horizons and reasoning about means—that allow us to mediate which expressions of instrumental reasoning will be adopted. Building on this analysis, I will sketch a positive proposal for alignable instrumental reasoning, centrally building on the concept of embedded agency. 

AI risk and instrumental rationality; or: why AI alignment is hard

Recent years have seen rapid progress in the design and training of ever more impressive-looking applications of AI. In their most general form, state-of-the-art AI applications can be understood as specific instantiations of a class of systems characterised by having goals which they pursue by making, evaluating, and implementing plans. Their goal-directedness can be an explicit feature of their design or an emergent feature of training. Putting things this way naturally brings up the question of what goals we want these systems to pursue. Gabriel (2020) refers to this as the “normative” question of AI alignment. However, at this stage of intellectual progress in AI, we don’t know how to give AI systems any goal in such a way that they end up (i) pursuing the goal we intended to give them and (ii) continuing to pursue the intended goal reliably, across, e.g., distributional shifts. We can refer to these two technical aspects of the problem as the problem of goal specification (Krakovna et al., 2020; Hadfield-Menell et al., 2016) and the problem of goal (mis)generalisation (Shah et al., 2022; Di Langosco et al., 2022; Hubinger et al., 2019), respectively. 

At first sight, one might think that finding solutions to these problems is ‘just another engineering problem’. Given human ingenuity, it should be only a matter of time before satisfactory solutions are presented. However, there are reasons to believe the problem is harder than it may initially appear. In particular, I will argue that a lot of what makes solving alignment hard stems from the behavioural dynamics that emerge from powerful instrumental reasoning. 

By instrumental rationality, I refer to means–end reasoning which selects those means which are optimal in view of the end that is being pursued. It is itself a core feature of general intelligence as we understand it. ‘Intelligence’ is a broad and elusive term, and many definitions have been proposed (Chollet, 2019). In the context of this essay, we will take ‘intelligence’ to refer to a (more or less) general problem-solving capacity, where its generality is a function of how well it travels (i.e., generalises) across different problems or environments. As such, AI is intelligent in virtue of searching over a space of action–outcome pairs (or behavioural policies) and selecting whichever actions appear optimal in view of the goal it is pursuing. 

The intelligence in AI—i.e., its abilities for learning, adaptive planning, means–end reasoning, etc.—renders safety concerns in AI importantly disanalogous to safety concerns in more ‘classical’ domains of engineering such as aerospace or civil engineering. (We will substantiate this claim in the subsequent discussion.) Furthermore, it implies that we can reasonably expect the challenges of aligning AI to increase as those systems become more capable. That said, observing the difficulty of solving AI alignment is interesting not only because it can motivate more work on the problem; asking why AI alignment is hard may itself be a productive exercise towards progress. It can shine some light on the ‘shape’ of the problem, identify which obstacles need to be overcome, clarify necessary and/or sufficient features of a solution, and illuminate potential avenues for progress. 

Both the philosophical and the technical literature have discussed several distinct ways in which such means–end optimisation can lead to unintended, undesirable, and potentially even catastrophic outcomes. In what follows, we will discuss two main classes of such failure modes – instrumental convergence and Goodhart’s law – and clarify the role of instrumental reasoning in their emergence. 

Instrumental convergence describes a phenomenon where rational agents pursue instrumental goals, even if those are distinct from their terminal goals (Omohundro, 2008; Bostrom, 2012). Instrumental goals are things that appear useful towards achieving a wide range of other objectives; common examples include the accumulation of resources or power, as well as self-preservation. The more resources or power are available to an agent, the better positioned it will be to pursue its terminal goals. Importantly, this also applies if the agent in question is highly uncertain about what the future will look like. Similarly, an agent cannot hope to achieve its objectives unless it is around to pursue them. This gives rise to the instrumental drive of self-preservation. Due to their near-universal usefulness, instrumental goals act as a sort of strategic attractor which the behaviour of intelligent agents will tend to converge towards. 
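
The convergence claim can be made vivid with a deliberately stylised sketch (the goals, the numbers and the payoff formula below are all invented for illustration): once extra resources are stipulated to help with any goal, terminal goals that have nothing to do with each other end up sharing the same optimal first step.

```python
# A stylised sketch of instrumental convergence. The payoff structure is made up;
# the point is only that, when resources improve the expected payoff of *any* goal,
# very different terminal goals share the same instrumental first move.
import numpy as np

rng = np.random.default_rng(0)

GOALS = ["cure_disease", "win_at_chess", "write_a_novel", "map_the_ocean", "build_a_bridge"]

def expected_payoff(difficulty: float, gather_resources_first: bool) -> float:
    base = 1.0 / (1.0 + difficulty)   # payoff when acting on the goal directly
    if gather_resources_first:
        return 3.0 * base - 0.1       # resources triple the payoff, minus a small detour cost
    return base

for goal in GOALS:
    difficulty = rng.uniform(0.5, 5.0)   # the goals differ widely in difficulty
    better = expected_payoff(difficulty, True) > expected_payoff(difficulty, False)
    print(f"{goal:15s} -> optimal plan starts with resource acquisition: {better}")
```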

How can this lead to risks? Carlsmith (2022) discusses at length whether power-seeking AI may pose an existential risk. He argues that misaligned agents “would plausibly have instrumental incentives to seek power over humans”, which could lead to “the full disempowerment of humanity”, which in turn would “constitute an existential catastrophe”. Similarly, the instrumental drive of self-preservation gives rise to what is discussed in the literature under the term of safe interruptibility (e.g., Orseau and Armstrong, 2016; Hadfield-Menell et al., 2017): AI systems may resist attempts to be interrupted or modified by humans, as this would jeopardize the successful pursuit of the AI’s goals.

The second class of failure modes can be summarised under the banner of Goodhart’s law. Goodhart’s law describes how, when metrics are used to improve the performance of a system and sufficient optimisation pressure is applied, optimising for these metrics ends up undermining progress towards the actual goal. In other words, as discussed by Manheim and Garrabrant (2018), the “Goodhart effect [occurs] when optimisation causes a collapse of the statistical relationship between a goal which the optimiser intends and the proxy used for that goal.” This problem is well known far beyond the field of AI and applies in domains such as management or policy-making, where proxies are commonly used to gauge progress towards a different, and often more complex, terminal goal. 

How does this manifest in the case of AI? It is relevant to this discussion to understand that, in modern-day AI training, the objective is not directly specified. Instead, what is specified is a reward (or loss) function, based on which the AI learns behavioural policies that are effective at achieving the goal (in the training distribution). Goodhart effects can manifest in AI applications in different forms. For example, the phenomenon of specification gaming occurs when AI systems come up with creative ways to meet the literal specification of the goal without bringing about the intended outcomes (e.g., Krakovna et al., 2020a). The optimisation of behaviour via the reward signal can be understood as exerting pressure on the agents to come up with new ways to achieve the reward effectively. However, achieving the reward may not reliably equate to achieving the intended objective. 
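
The statistical core of one simple Goodhart variant can be shown in a few lines of code (a toy simulation of my own, not a model of any particular AI system): a proxy that correlates well with the true objective across the whole population becomes a much poorer guide among exactly those candidates that were selected for scoring highest on the proxy.

```python
# A toy illustration of a Goodhart effect: strong selection on a noisy proxy
# weakens the proxy-goal relationship among the selected candidates and makes
# the proxy systematically overstate their true value.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

true_value = rng.normal(size=n)                       # what we actually care about
proxy = true_value + rng.normal(scale=1.0, size=n)    # the measurable reward signal

print("correlation, whole population:", round(np.corrcoef(true_value, proxy)[0, 1], 2))

# Apply strong optimisation pressure: keep only the top 0.1% by proxy score.
selected = np.argsort(proxy)[-n // 1000:]
print("correlation, selected top 0.1%:",
      round(np.corrcoef(true_value[selected], proxy[selected])[0, 1], 2))

# The selected candidates also over-promise: the proxy overstates their true value.
print("mean proxy score (selected):", round(proxy[selected].mean(), 2))
print("mean true value  (selected):", round(true_value[selected].mean(), 2))
```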

Sketching a vision of alignable instrumental reason

In the previous section, I argued that instrumental rationality is linked to behavioural dynamics that can lead to a range of risk scenarios. From here, two high-level avenues towards aligned AI present themselves. 

First, one may attempt to avoid instrumental reasoning in AI applications altogether, thereby eliminating (at least one) important driver of risk. However, I argue that this avenue appears implausible upon closer inspection. At least to a large extent, what makes AI dangerous is also what makes it effective in the first place. Therefore, we might not be able to get around some form of instrumental rationality in AI, insofar as these systems are meant to be effective at solving problems for us. Solutions to AI alignment need not only be technically valid; they also need to be economically competitive enough such that they will be adopted by AI developers in view of commercial or military–strategic incentives. Given this tight link between instrumental rationality and the sort of effectiveness and generality we are looking to instantiate in AI, I deem it implausible to try to build AI systems that avoid means–end reasoning entirely.  

This leads us to the second avenue. How can we have AI that will be effective and useful, while also being reliably safe and beneficial? The idea is to find expressions of instrumental reasoning that do not act as risk drivers and, instead, are compatible with, or constitutive of, aligned AI. Given what we discussed in the first section, this approach may seem counterintuitive. However, I will argue that instrumental reasoning can be implemented in ways that lead to different behavioural dynamics than the ones we discussed earlier. In other words, I suggest recasting the problem of AI alignment (at least in part) as the search for alignable expressions of instrumental rationality. 

Before we can properly envision what alignable instrumental rationality may look like, we first need to analyse instrumental reasons more closely. In particular, we will consider two aspects: cognitive horizons and means. The first is about restricting the domain of instrumental reason along temporal, spatial, material, or functional dimensions; the second investigates how, if at all, instrumental rationality can substantially reason about means. 

Thinking about modifications to instrumental rationality that lead to improvements in safety is not an entirely novel idea. One way to modify the expression of instrumental reasoning is by defining, and specifically limiting, the ‘cognitive horizon’ of the reasoner, which can be done in different ways. One approach, usually discussed under the term myopia (Hubinger, 2020), is to restrict the time horizon that the AI takes into account when evaluating its plans. Other approaches of a similar type include limiting the AI’s resource budget (e.g., Drexler (2019), chapter 8) or restricting its domain of action. For example, we may design a coffee-making AI to act only within and reason only about the kitchen, thereby preventing it from generating plans that involve other parts of the world (e.g., the global supply chain). The hope is that, by restricting the cognitive horizon of AI, we can undercut concerning behavioural dynamics like excessive power-seeking. While each of these approaches comes with its own limitations and no fully satisfying solution to AI alignment has been proposed, they nevertheless serve as a case in point for how modifying the expression of instrumental reasoning is possible and may contribute towards AI alignment solutions. 
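
To illustrate the time-horizon knob, here is a minimal sketch of my own (not a faithful rendering of any proposal in the literature; the tiny decision problem and its payoffs are invented): a planner whose evaluation of actions is cut off after a fixed lookahead. Shrinking that lookahead changes which plans the agent can even consider.

```python
# A toy "cognitive horizon" planner: action values are computed only `horizon`
# steps ahead. The states, actions and rewards below are made up for illustration.
from typing import Dict, Tuple

# state -> action -> (next_state, reward)
MDP: Dict[str, Dict[str, Tuple[str, float]]] = {
    "start":     {"accumulate": ("resources", 0.0), "act_now": ("done", 1.0)},
    "resources": {"leverage":   ("done", 10.0)},
    "done":      {},
}

def value(state: str, horizon: int) -> float:
    """Best achievable return from `state` when planning at most `horizon` steps ahead."""
    if horizon == 0 or not MDP[state]:
        return 0.0
    return max(r + value(nxt, horizon - 1) for nxt, r in MDP[state].values())

def best_action(state: str, horizon: int) -> str:
    return max(MDP[state], key=lambda a: MDP[state][a][1] + value(MDP[state][a][0], horizon - 1))

# A myopic planner cannot "see" the payoff of accumulating resources first;
# a longer horizon makes the resource-gathering plan dominant.
print(best_action("start", horizon=1))  # -> "act_now"
print(best_action("start", horizon=2))  # -> "accumulate"
```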

A different avenue for modifying the expression of instrumental reasoning is to look more closely at the role of means in means–end reasoning. At first glance, it appears instrumental reasoners do not have substantive preferences over the sorts of means they adopt, beyond wanting them to be effective at promoting the desired ends. However, there are some considerations that let us re-examine this picture. Consider, for example, that, given the immense complexity of the consequentialist calculus combined with the bounded computational power of real agents, sufficiently ‘sophisticated’ consequentialist reasoners will fare better by sometimes choosing actions on the basis of rule- or virtue-based reasoning as opposed to explicit calculation of consequences. Another argument takes into account the nature of multi-agent dynamics, arguing that positive-sum dynamics emerge when agents are themselves transparent and prosocial. Oftentimes, the best way to credibly appear transparent or prosocial is to be transparent and prosocial. Finally, functional decision theory suggests that, instead of asking what action should be taken in a given situation, we should ask what decision algorithm we should adopt for this, and any future or past decision problems with structural equivalence (Yudkowsky and Soares, 2017). Generally speaking, all of these examples look a lot like rule- or virtue-based reasoning, instead of what we typically imagined means–end reasoning to look like. Furthermore, they refute at least a strong version of the claim that instrumental reasoners cannot substantially reason about means. 

With all of these pieces in place, we can now proceed to sketch out a positive proposal of aligned instrumental rationality. In doing so, I will centrally draw on work on embedded agency as discussed by Demski and Garrabrant (2019). 

Embeddedness refers to the fact that an agent is itself part of the environment in which it acts. In contrast to the way agents are typically modelled—as being cleanly separated from their environment and able to reason about it completely—embedded agents are understood as living ‘inside’ of, or embedded in, their environment. Adopting the assumption of embedded agency into our thinking on instrumental rationality has the following implications. For a Cartesian agent, the optimality of a means M is determined by M’s effects on the world W. Effective means move the agent closer to its desired world state, where the desirability of a world state is determined by the agent’s ends or preferences. In the case of an embedded agent, however, the evaluation of the optimality of means involves not only M’s effects on the world (as it does in the case of the Cartesian agent), but also M’s effects on the future instantiation of the agent itself.

For example, imagine Anton meets a generous agent, Beta, who makes the following offer: Beta will give Anton a new laptop, conditional on Anton having sufficient need for it. Beta will consider Anton legitimately needy of a new laptop if his current one is at least 5 years old. However, Anton’s current laptop is 4 years old. Should Anton lie to Beta in order to receive a new laptop nevertheless? If Anton is a ‘Cartesian’ instrumental reasoner, he will approach this question by weighing up the positive and negative effects that lying would have compared to telling the truth. If Anton is an embedded instrumental reasoner, he will additionally consider how taking one action over the other will affect his future self, e.g., whether it will make him more likely to lie or be honest in the future. What is more, beyond taking a predictive stance towards future instantiations of himself, Anton can use the fact that his actions today will affect his future self as an avenue to self-modification (within some constraints). For example, Anton might actively wish he was a more honest person. Given embeddedness, acting honestly today contributes towards Anton being a more honest person in the future (i.e., would increase the likelihood that he will act honestly in the future). As such, embedded agency entails that instrumental reasoning becomes entangled with self-predictability and self-modification. 
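
The contrast can be made explicit in a small sketch (not Demski and Garrabrant’s formalism, just a toy scoring rule; the numbers attached to the Anton case are invented): a Cartesian evaluator scores a candidate means only by its effect on the world, whereas an embedded evaluator adds a term for how taking that action reshapes the agent’s own future dispositions.

```python
# Cartesian vs. embedded evaluation of means, with made-up numbers for the Anton case.
from dataclasses import dataclass

@dataclass
class Means:
    name: str
    world_effect: float        # immediate payoff in the world (e.g., getting the laptop)
    self_modification: float   # shift in the agent's future disposition towards honesty

OPTIONS = [
    Means("lie",       world_effect=1.0, self_modification=-0.3),  # makes future lying more likely
    Means("be_honest", world_effect=0.0, self_modification=+0.3),  # reinforces future honesty
]

def cartesian_value(m: Means) -> float:
    return m.world_effect

def embedded_value(m: Means, weight_on_future_self: float = 5.0) -> float:
    # The weight is arbitrary; it encodes how much the agent cares about what kind
    # of agent it is becoming, and is chosen here so that the ranking flips.
    return m.world_effect + weight_on_future_self * m.self_modification

print(max(OPTIONS, key=cartesian_value).name)  # -> "lie"
print(max(OPTIONS, key=embedded_value).name)   # -> "be_honest"
```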

Bringing things back to AI risk and alignment: insofar as instrumental reasoning in Cartesian agents tends to converge towards power-seeking behaviours, what will instrumental reasoning in embedded agents converge towards? This is, largely, an open question. However, I want to provide one last example to gesture at what the shape of an answer might look like. Let’s assume agent P’s goal is to bring about global peace. Under ‘classical’ instrumental convergence, we may worry that an ‘attractive’ strategy for an instrumental reasoner is to first seek enough power to dominate the world and then, as P’s final ruling so to speak, instantiate global peace. Were a powerful future AI system to pursue such a strategy, that would surely be problematic. 

Let us now consider the same setup but this time with P’ as an embedded instrumental reasoner. P’ would likely not adopt a strategy of ruthless power-seeking because doing so would make future versions of itself more greedy or violent and less peace-loving. Instead, P’ may be attracted to a strategy that could be described as ‘pursuing peace peacefully’. Importantly, a world of global peace is a world where only peaceful means are being adopted by the actors. Taking actions today that make it more likely that future versions of P’ will act peacefully thus starts to look like a compelling, potentially even convergent, course of action. 

As such, instrumental reasoning in embedded agents, via such mechanisms as self-prediction and self-modification, is able to reason substantially about its means beyond their first-order effects on the world. I believe this constitutes fruitful ground for further alignment research. Key open questions include what sorts of means–ends combinations are both stable (i.e., convergent) and have desirable alignment properties (e.g., truthfulness, corrigibility, etc.). 

In summary, I have argued, first, that the AI alignment problem is an unusually difficult one: the fact that alignment solutions need to be robust in the face of intelligent behaviour (e.g., adaptive planning, instrumental reasoning) sets it apart from more ‘typical’ safety problems in engineering. Second, with the help of a number of examples and conceptual arguments mainly centred around the notion of embedded agency, I contended that it is possible to find expressions of instrumental reasoning that avoid the failure modes discussed earlier. As such, while I cannot conclude whether AI alignment is ultimately solvable, or will de facto be solved in time, I wish to express reason for hope and a concrete recommendation for further investigation. 


Resources

  • Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J., & Mané, D. (2016). Concrete problems in AI safety. arXiv preprint arXiv:1606.06565.

  • Brundage, M., Avin, S., Clark, J., Toner, H., Eckersley, P., Garfinkel, B., ... & Amodei, D. (2018). The malicious use of artificial intelligence: Forecasting, prevention, and mitigation. arXiv preprint arXiv:1802.07228.

  • Bostrom, N. (2012). The superintelligent will: Motivation and instrumental rationality in advanced artificial agents. Minds and Machines, 22(2), 71-85.

  • Carlsmith, J. (2022). Is Power-Seeking AI an Existential Risk?. arXiv preprint arXiv:2206.13353.

  • Christiano, P. (2019). Current work in AI alignment. [talk audio recording]. https://youtu.be/-vsYtevJ2bc 

  • Chollet, F. (2019). On the measure of intelligence. arXiv preprint arXiv:1911.01547.

  • Demski, A., & Garrabrant, S. (2019). Embedded agency. arXiv preprint arXiv:1902.09469.

  • Di Langosco, L. L., Koch, J., Sharkey, L. D., Pfau, J., & Krueger, D. (2022). Goal misgeneralisation in deep reinforcement learning. In International Conference on Machine Learning (pp. 12004-12019). PMLR.

  • Drexler, K. E. (2019). Reframing superintelligence. The Future of Humanity Institute, The University of Oxford, Oxford, UK.

  • Evans, O., Cotton-Barratt, O., Finnveden, L., Bales, A., Balwit, A., Wills, P., ... & Saunders, W. (2021). Truthful AI: Developing and governing AI that does not lie. arXiv preprint arXiv:2110.06674.

  • Gabriel, I. (2020). Artificial intelligence, values, and alignment. Minds and machines. 30(3), 411-437.

  • Hadfield-Menell, D., Russell, S. J., Abbeel, P., and Dragan, A. (2016). Cooperative inverse reinforcement learning. Advances in Neural Information Processing Systems, 29:3909–3917.

  • Hadfield-Menell, D., Dragan, A., Abbeel, P., & Russell, S. (2017). The off-switch game. In Workshops at the Thirty-First AAAI Conference on Artificial Intelligence.

  • Hubinger, E., van Merwijk, C., Mikulik, V., Skalse, J., and Garrabrant, S. (2019). Risks from learned optimisation in advanced machine learning systems. arXiv preprint arXiv:1906.01820.

  • Hubinger, E. (2020). An overview of 11 proposals for building safe advanced AI. arXiv preprint arXiv:2012.07532.

  • Krakovna, V., Uesato, J., Mikulik, V., Rahtz, M., Everitt, T., Kumar, R., Kenton, Z., Leike, J., and Legg, S. (2020). Specification gaming: the flip side of AI ingenuity. DeepMind Blog.

  • Krakovna, V., Orseau, L., Ngo, R., Martic, M., & Legg, S. (2020). Avoiding side effects by considering future tasks. Advances in Neural Information Processing Systems, 33, 19064-19074.

  • Manheim, D., & Garrabrant, S. (2018). Categorizing variants of Goodhart's Law. arXiv preprint arXiv:1803.04585.

  • Omohundro, S. M. (2008). The basic AI drives. In AGI (Vol. 171, pp. 483-492).

  • Orseau, L., & Armstrong, M. S. (2016). Safely interruptible agents.

  • Prunkl, C., & Whittlestone, J. (2020). Beyond near-and long-term: Towards a clearer account of research priorities in AI ethics and society. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (pp. 138-143).

  • Shah, R., Varma, V., Kumar, R., Phuong, M., Krakovna, V., Uesato, J., & Kenton, Z. (2022). Goal Misgeneralisation: Why Correct Specifications Aren't Enough For Correct Goals. arXiv preprint arXiv:2210.01790.

  • Soares, N., Fallenstein, B., Armstrong, S., & Yudkowsky, E. (2015). Corrigibility. In Workshops at the Twenty-Ninth AAAI Conference on Artificial Intelligence.

  • Wedgwood, R. (2011). Instrumental rationality. Oxford studies in metaethics, 6, 280-309.

What do we care about in caring about freedom?

Typically, when freedom is discussed in political philosophy, the discussion revolves around the question concerning the nature of freedom. In this essay, I want to ask a related but different question: why do we care about freedom? Or, what is it that we care about when we care about freedom?

Why do I propose this is a worthwhile question to ask? In discussions of the nature of freedom, it is usually assumed that freedom - or whichever specific notion of it is being defended - is something worthy of protection. However, for this conversation to remain calibrated around what matters most, it is critical to have a solid understanding of why we deem freedom worthy of protection. Clarifying the why-question can help add clarity and precision to various other questions concerning freedom, some of practical importance. For example, it can illuminate discussions about what strategies are more or less apt for protecting and fostering freedom in society.

I will proceed as follows. I will start with a broad discussion of why it is that we care about freedom. Of course, the reasons for caring about freedom are importantly intertwined with the question of the nature of freedom: the answer to the former will depend on which notion of freedom is being adopted. In the first section, I thus aim to explore the spectrum of reasons underlying and evoked by the different notions of freedom, thereby situating this essay in the centuries-long discussion on the topic. In doing so, I identify three core reasons why we care about freedom, which I will briefly expand on one at a time. Next, I will dive into one of those reasons in more depth, namely freedom as a protector of the unfolding of humanity’s collective potential. I chose to focus on this reason because it remains the most scarcely discussed among them, all the while being of great moral importance. In the second section, I will discuss what it is I mean by humanity's collective potential and why I think it morally matters. In the third section, I will discuss how freedom enables the processes that drive cultural and moral progress, thereby protecting us from scenarios of cultural and moral lock-in.

Section 1: Why do we care about freedom?

It can hardly be said that the discussion of freedom among political philosophers is a shallow one. Given the amount of ink spilt on the topic, the avid reader can safely assume that thinkers at least agree on one thing: the fact that freedom, whatever exact definition one chooses to assume, is important. From here, a further question ensues: Why is freedom important? Why do we care about it? What is it that we care about in caring about freedom?

I propose to carve the space of possible reasons into three: freedom as protection from oppression and authoritarianism, freedom as an enabler of the unfolding of individual human potential, and, lastly, freedom as an enabler of the unfolding of humanity’s collective potential. A given theory of the nature of freedom may agree with all three of these reasons while another may only buy into a subset of them, and different theories may put different emphasis on the importance of these reasons. We will discuss each of these reasons in turn.

Reason 1: By protecting individual freedom, we protect ourselves and our society from oppression and authoritarianism.

This raison d’être of freedom figures most centrally in the so-called notion of negative freedom. Negative freedom - a term first introduced by Isaiah Berlin in his seminal essay titled “Two concepts of liberty” - can be summarized as freedom from something.

Freedom from what, one may ask? For Hobbes, freedom was centrally about the absence of physical coercion of the body. Locke would later expand on this arguably rather thin notion of freedom by adding that a person can also be rendered unfree by means of coercion of the will. Freedom, as it is conceived here, carries the central purpose of protecting the individual from oppression and coercion by an external, more powerful source. Whether it be coercion of the body or of the will, freedom is limited in virtue of the fact that alternative choices, or alternative paths of action, are rendered ineligible by external agents. This view can be summarized as understanding freedom as non-interference. Out of this tradition of thought grew political liberalism and the view that the central concern is to protect individual freedom from the unchecked power of the state.

Some, however, thought the notion of freedom as non-interference too shallow. Consider the example of a master and their slave. A master may not interfere with his slave’s will to eat their meagre lunch. Nevertheless, the slave is unfree in the sense that the master could, at any point, for no good reason but their own whims, decide to interfere with the slave’s plan and take away the slave’s meal. Thus, the slave is unfree in the sense of being unfree from the arbitrary application of power - or: domination. This is the idea of freedom as non-domination, sometimes also called ‘republican freedom’ (Pettit, 2001). This notion of freedom embraces the importance of the rule of law - the absence of arbitrariness in the functioning of a just state. Under this view, it is thus not power as such that necessarily limits freedom, but whether or not the wielding of power is subject to a legitimate - as opposed to an absent or arbitrary - set of rules (Skinner, 2013).

While the protection from oppression and authoritarianism is a central concern of thinkers in the negative tradition, it is also one of the raisons d’être - although not the central one - of positive freedom. Here, freedom tends to be understood as residing in the collective governance over our lives as a means to individual and/or collective self-determination. Proponents of the positive tradition include thinkers like Rousseau and Marx [1], among many others. Under this view, it is not required to limit state activity to its bare minimum. State activity can - in certain areas of life and under the right choices of institutional design - be desirable or even necessary for achieving certain goods critical to the goal of self-realization. That said, this view still places certain constraints on what legitimate structures of governance look like. For example, legitimacy can come from citizens' ability to participate freely in the democratic process, thereby co-determining its outcome. As such, oppression and authoritarianism remain outcomes that defenders of positive freedom aim to protect against, for example by entrenching or making more secure our negative freedoms.

Moving on to the second reason we may care about freedom.

Reason 2: By protecting individual freedom, we enable the unfolding of individual human potential.

We find this raison d’être for freedom most prominently reflected in positive notions of freedom.

Insofar as the notion of positive freedom points at the idea of freedom as the presence of something, this ‘something’ (e.g. self-determination or self-governance) can often be understood as being in the service of a safe or virtuous unfolding of human nature.

When looking at the concept of freedom from a genealogical standpoint, we can find early traces of freedom as a means to the unfolding of human potentiality in thinkers who are typically classified as representatives of the negative tradition. Early defenders of the negative tradition of liberty, as we have seen above with Hobbes and Locke, viewed obstacles to freedom as always originating from external sources. John Stuart Mill, author of the seminal book On Liberty, however, recognised that one’s freedom can also be curtailed by internal forces. [2] According to Mill, if I act from my passions - anger, fear, greed, lust, etc. - without appropriate consultation of reason, I am in fact acting unfreely. Similarly, if I am acting unauthentically - in other words, if my choices are determined in the absence of proper introspection and reasoning, and instead by habit, unreflectively adopted cultural norms and the like - I am acting unfreely. Or, if I am acting based on a ‘false consciousness’ - a misguided understanding of my real interests - I too am acting unfreely.

A similar idea was picked up and expanded on by more modern thinkers. Charles Taylor (1985), for example - who stands more firmly within the positive tradition - defends a similar position by arguing that certain emotions - such as spite, irrational fear, or vanity - are to be understood as alien to us. We would be, according to Taylor, better off without them, and getting rid of them wouldn’t cause us to lose anything important. Thus, he argues, there is a clear sense in which having such emotions can make us unfree with respect to our authentic or genuine goals, desires or purposes.

In order to help us think about freedom as an enabler of self-realization, let us, on top of the negative-positive axis, add another axis that can help chart out the space of notions of freedom: exercise- vs opportunity-based notions. In the former, freedom is concerned with “the extent that one has effectively determined oneself and the shape of one’s life.” (Taylor, 1985, p. 213) In other words, freedom is determined not merely by my possibilities, but by what I actually do. In the case of opportunity-based notions, “freedom is a matter of what one can do, of what it is open to us to do, whether or not we do anything to exercise these options”. (p. 213)

Self-realization - the idea of unfolding one’s potential - maps onto the exercise concept. (In particular, it does so more cleanly than onto either positive or negative notions of freedom.) It is something that necessarily has to be acted out and realized for it to be of value. If someone is free to realize their potential in the sense of not being impeded from doing so, but this person has not realized that potential because, say, they are unaware of their potential, paralysed by doubt or because the societally accepted notion of the ‘good life’ conflicts with their own sense of where their potential lies, they may still lack freedom insofar as we care about freedom as a means to self-realization.

Somewhere in between a pure exercise-based and a pure opportunity-based conception of freedom, we can situate the so-called capabilities approach. It, too, recognises that opportunity alone may not amount to substantive freedom. Instead, it is only capabilities that amount to substantive freedoms, as it is capabilities that allow a person to acquire or achieve the things they desire. Resources, means, opportunities and even formal rights alone are empty if the person lacks the capabilities to realize them. Amartya Sen and Martha Nussbaum are perhaps the most prominent names associated with this view. [3]

So far, we have discussed the idea of freedom as a means of unfolding individual human potential, of realizing some higher or true ‘self’, or of actualizing some authentic ‘way of life’. This raises the question: what is this ‘self’ that is being realized? In other words, the notion of a higher (as opposed to lower) or true (as opposed to false) self appears to assume some view of the true essence of human nature. Different thinkers provide different takes on this question. As we have just seen, thinkers like Mill or Taylor find their notion of the ’self’ that is meant to be realized by differentiating between lower and higher-order motivations, wherein some desires or drives may be considered foreign to us, while others represent what we really want, our ‘true will’. Hannah Arendt holds a different notion of human nature. According to Arendt, freedom needs to be understood as freedom of action, in particular, action within the political arena. Accordingly, we are only free insofar as we are free to exercise our power within the polis, or the political sphere. [4] As such, human nature is understood to be essentially political, and the unfolding of human potential is inevitably intertwined with full and equal participation in collective political life. (Arendt, 1961)

We have seen how these different notions of freedom all stand in the service of the unfolding of individual human potential. Such self-knowledge, authenticity and self-realization may be viewed as a moral good in and of itself, or it may be viewed as the right thing to aim for in virtue of tending to produce morally virtuous or happy people. The role of freedom in the unfolding of human potential can come in the form of protecting said unfolding from external or internal factors that may interfere with it; or in the form of supporting the realization of said potential by such means as material resources, capabilities and institutional or cultural structures.

Let us now turn to the third reason for caring about freedom.

Reason 3: By protecting individual freedom, we protect and enable the unfolding of humanity’s collective potential

Less has been written about the relationship between freedom and humanity’s long-run trajectory, so we will explore it in more detail here. To do so, we will first discuss what it is I mean by ‘humanity’s collective potential’ and why one might care about it. Then, I will explore the arguments for whether and how freedom may help with its unfolding.

Section 2: Humanity’s trajectory, past and future

In the same way we just explored what we might understand by ‘individual human potential’ and its unfolding, we should also clarify the notion of ‘humanity’s collective potential’. To do so, we will first turn our metaphorical gaze toward the past, at human history. In doing so, we may conclude that humanity has come a long way. From hunter-gatherer tribes to the agricultural revolution, the discovery of the New World, the industrial revolution, the abolition of slavery, women’s suffrage, the introduction of the welfare state, the establishment of an international order aimed at limiting the inhumanity of wars [5], the discovery of penicillin, the eradication of smallpox, the dawn of the internet - to name only a few milestones in human history. Zooming out to timescales of hundreds and thousands of years, we can generally witness a drastic increase in human welfare. This increase stems, in large part, from economic growth, technological advances, better and more comprehensive protection of fundamental rights, and respect for a growing set of varied ways of life and means of human self-expression. [6]

This is far from arguing that human history has not also witnessed an unspeakable amount and severity of moral tragedy, and far from claiming that, today, the face of the earth is free of injustice, violence or other forms of needless suffering. Still, the average human being is drastically better off living today than 100, 1’000 or 10’000 years ago. Accordingly, we may harbour the hope that this trend of the general betterment of the fate of sentient life may continue.

In his 1981 book The Expanding Circle, Peter Singer introduces the idea of the moral circle as a way to conceptualize the nature of moral progress. Humanity, over the course of its own history, has expanded its circle of moral concern from the individual, the family and the tribe to include forms of life increasingly more dissimilar to oneself. At the time of writing, Singer was at the forefront of the up-and-coming animal liberation movement. Said movement based its moral mission on the idea that non-human animals, in virtue of their capacity to experience suffering and pain, ought to be considered moral patients and treated with the respective respect and care. As time passes and progress continues, the moral circle may expand yet further.

The point of this essay is not to make a substantive claim about where the moral circle ought to extend next. Instead, its point is to argue that there are immense moral stakes in ensuring that the processes driving cultural and moral progress continue. As argued by Williams (2015), we may not know where, but chances are that there are moral atrocities unfolding at this very time that we cannot even yet identify as such. Looking back at the relevant periods in human history, it is not that the ill-treatment of slaves, women and non-human animals was recognized as morally wrong yet tolerated - the first step to the emancipation of these groups resided in the mere acknowledgement of their moral worth. One thing we can learn from history in this regard is just how blind we may be to injustices that unfold in front of our eyes. The specific lesson I am pointing at here is not to advocate for one or the other minority group to gain protection and a voice - although these may well be worthwhile missions to support. The specific point I am trying to make concerns a different puzzle: what ought one to do when one is ignorant of where one’s mistake lies, but has reason to believe that one may well be committing some mistake(s)?

The answer, or so I claim, lies at least in part in investing in the processes that tend to drive moral and cultural progress. From the point of view of the ignorant or morally uncertain (MacAskill et al., 2020), investing in those processes is the best guess for discovering that which you don’t yet know you need to discover. We will explore what role freedom plays in the functioning of these processes in the next section.

Before that, however, let us consider one more thought. The argument for the importance of protecting the processes driving moral and cultural progress does not merely rely on an argument from ignorance, as discussed above. What is more, we live in a physical world that is ever-changing. This can be easily forgotten when reasoning from within history and inside a set of cultural narratives that conceive of humanity as the pinnacle of evolution. However, once spelt out, there is little question that the world, as well as the human species itself, keeps on changing. The answer to the question of what it means to be human - be that from a cultural, sociological or biological perspective - is evolving every second of every day. Biological evolution is shaping the human gene pool; niche construction is moulding the environment we live in; cultural evolution is forming and re-forming the institutions that govern our lives; scientific and technological progress is shaping our socio-economic realities. What is more, technological progress may in the not-too-far future start to affect life itself. Be it in the form of transhumanism, artificial intelligence, artificial consciousness, or new hybrid or fully artificial forms of life: technological progress will force upon us new cultural and moral questions. [7] These aren’t questions we can choose to ignore. As such, moral progress is not just about arriving at better and more nuanced views on questions that pose themselves against some static background. It is also about finding answers to entirely new questions that arise from changes to the background conditions of life itself. Moral progress, in other words, is about adapting to new and constantly changing realities. The world will look different in 100, 1’000 and 10’000 years. In the same way that I do not want to live my life today based on the cultural preconceptions of the 16th century, so do I not want future generations to be forced to live based on the cultural and moral conceptions of today.

So far, I have claimed that one of the reasons that we care about freedom is as a means for the unfolding of humanity’s collective potential. We discussed what we might mean by collective human potential, and why it appears of great importance for the processes that drive cultural and moral progress to continue. However, we have not yet discussed in much detail how it is that freedom is meant to protect and advance such progress. This is what we will tackle next.

Section 3: The role of freedom in the unfolding of humanity’s collective potential

So, what does freedom have to do with it? To explore this question, I will first introduce the idea that freedom acts as the engine of exploration, and then discuss how an engine that is based on individual freedom differentially selects its direction of progress on the basis of reason more so than power. We will see that this unfolding happens in an open-ended fashion, with no need to know its own destination before arrival, thereby providing some robustness against moral imperialism and related concerns. In making these arguments, I will help myself to insights from evolutionary theory, Austrian economic thought, and complexity science.

In Generalizing Darwinism to Social Evolution: Some Early Attempts, Geoffrey M. Hodgson (2005) writes: “Richard Dawkins coined the term universal Darwinism (1983). It suggests that the core Darwinian principles of variation, replication and selection may apply not only to biological phenomena, but also to other open and evolving systems, including human cultural or social evolution.” Borrowing from this idea of Universal Darwinism, freedom can be understood as a socio-political institution that protects the production of ‘variation’ (in the form of ‘ways of life’, perspectives and ideas). In the analogy with Darwinian evolution, negative freedom can be understood as serving the purpose of preventing any small number of powerful actors (akin to, say, some invasive species) from taking over the entirety of the ecosystem, thereby locking in their singular ‘way of life’ (i.e. cultural norms, ideologies, moral perspective, etc.). Conversely, positive freedom and the fostering of capabilities allow individual actors or groups to properly actualize, explore and develop their own ideas, perspectives and ways of life. In doing so, these ideas get a chance to mature, be tested, be improved upon and - if they prove unsatisfactory or non-adaptive - get dropped.

The insight that exploration is crucial to progress has long been understood in economics and the philosophy of science. In his 1942 book Capitalism, Socialism, and Democracy, Joseph Schumpeter introduced the concept of entrepreneurship into modern economic thought and argued for its critical role in innovation. It is individual people who notice problems, apply their creativity to come up with solutions and take risks who become the ‘engine’ of capitalism. Friedrich Hayek (1945), also a scholar of the Austrian School of Economics, added a further piece to this view by illuminating the epistemic function of the market. As opposed to a system of centralized economic activity, a decentralized market is able to exploit the local information of individual actors. These insights stand at the origin of the libertarian view that aims to protect individual (economic) liberty against state interventionism, with the goal of protecting innovation and the proper functioning of the ‘free market’. [8] Turning our gaze to the philosophy of science, we find a similar logic at play there. In science, where the goal is for the truth to prevail, freedom of thought allows arguments to meet each other, and the better ones to prevail (Popper, 1935). In society, individual freedom allows individuals to pursue their own way of life, thereby allowing humanity at a collective level to explore and compare a wider range of possibilities. Insofar as we consider the current set and configuration of ideologies, cultural norms, and political economies unsatisfactory, it matters that humanity continues to explore new configurations.

The study of complex systems provides us with useful vocabulary for talking about this general pattern. What we see is how the emergence of macro-level dynamics (such as economic growth, scientific progress, cultural trends or moral progress) is driven by micro-level processes (such as the actions and ideas of individual human beings). We can ask, within this framework, what happens to the macro-level dynamics if we protect (or fail to protect) individual freedom at the micro-level. Freedom, on this view, can be understood as a constitutive factor in the mechanism of exploration. In its absence, we risk not exploring widely enough, resulting in a premature convergence to some local optimum that represents a suboptimal set of views.
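
For readers who like the complexity-science vocabulary made tangible, the following toy simulation (entirely illustrative; the ‘landscape of ideas’ is an arbitrary function of my own choosing) shows the premature-convergence point: a search process that produces very little variation settles on a nearby local optimum, while one that protects more variation is far likelier to find better configurations.

```python
# A toy illustration of exploration vs. premature convergence on a rugged landscape.
import numpy as np

rng = np.random.default_rng(1)

def quality(x: float) -> float:
    # Arbitrary multimodal landscape: a modest local peak near x ~ 0.7,
    # a substantially higher peak near x ~ 6.6.
    return np.sin(x) + 0.5 * np.sin(3 * x) + 2.0 * np.exp(-(x - 6.0) ** 2)

def greedy_search(variation: float, steps: int = 5000, start: float = 0.5) -> float:
    x = start
    for _ in range(steps):
        candidate = x + rng.normal(scale=variation)  # how much variation is produced
        if quality(candidate) > quality(x):          # keep only strict improvements
            x = candidate
    return quality(x)

print("little variation:", round(greedy_search(variation=0.05), 2))  # stuck at the local peak
print("ample variation: ", round(greedy_search(variation=2.0), 2))   # typically finds the higher peak
```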

At this point, we start to glimpse the connection between the raison d’être of freedom as a means to the unfolding of individual potential and its raison d’être as a means to the unfolding of collective potential. If not for the first - if freedom weren’t able to provide the individual with a space in which they can authentically and thoughtfully explore their ideas and ways of life - then freedom could never fulfil its purpose of driving collective progress.

However, exploration alone is not enough. The question arises: according to what criteria do the sociological processes driving cultural and moral progress select among new options? This question assumes - correctly so - that these processes we have been talking about are not merely random walks. The process is guided by something - but that something emerges from a bottom-up, decentralized logic rather than being imposed on the system in some top-down fashion. There are some mechanisms that can be studied and understood, and maybe even influenced, that drive the direction of progress. The hope is - after all, this is a philosophical essay - that reason may be the overwhelming factor in selecting between new forms and configurations of cultural norms and moral beliefs. But is that hope justified?

I claim it is, and freedom itself is the key to why. In short, this is precisely what freedom does - it lets reason come to the surface and prevail. We can find support for this view in the writings of Mill, whom we encountered earlier, and of John Dewey. Insofar as Mill recognises that freedom can be limited by internal coercive forces, he also believed that to be free from those forces is for reason to prevail over passion, inauthenticity or false consciousness. John Dewey provided a more detailed account of a similar idea, discussing how exactly it is that reason - on top of mere habitual reaction - interacts with human action and decision making. In What is Freedom?, Dewey explains how, according to him, “the only freedom that is of enduring importance is freedom of intelligence, that is to say, freedom of observation and of judgment exercised on behalf of purposes that are intrinsically worthwhile” (p. 39) and continues in saying that “impulses and desires that are not ordered by intelligence are under the control of accidental circumstances. [...] A person whose conduct is controlled in this way has at most only the illusion of freedom. Actually, he is directed by forces over which he has no command.” (p. 42)

If it is the case that freedom is about the triumph of reason, what we do when we protect freedom is to make sure that reason will trump power in the evaluation of new cultural or moral ideas. Furthermore, at the collective level, the protection of individual freedom, and in particular of values such as tolerance, respect and free speech, creates a public arena in which different views and ‘ways of life’ can interact in a way that allows for the potential of mutual enrichment. This adds another element to the engine of moral and cultural progress.

One may object to this whole notion of moral progress that it risks being merely a cover for some form of moral imperialism, or that it implies (without making the case for) some notion of value monism. However, there is in fact nothing inherent in the logic of these mechanisms that requires convergence to some singular view of the ‘right’ cultural norms, way of life or moral outlook. This is not what we see in biology, economics or science either. In fact, it is precisely freedom - acting as a sort of buffer - that constitutes a critical ingredient in the machinery of moral progress and makes it compatible with value pluralism as discussed, among others, by Berlin (1958) and Sen (1981).

As such, the unfolding of humanity’s collective potential should be understood as an open-ended process. We need not know where the journey takes us as long as the journey is guided by a process with the right properties. In fact, given the epistemic position we find ourselves in by virtue of reasoning from the point of view of an evolved human brain, we ought not attempt to form firm views as to where the process should end up, nor aim to direct it in a fully top-down manner.

Max Stirner (1972) critiques the idea of freedom as self-realisation, as the latter suggests some implicit and normative notion of ‘true’ human nature at which the process of self-realization is aimed. Stirner critiqued this notion as one that is always and necessarily culturally contingent and can act as a coercive force that is itself inherently at odds with genuine freedom. Freedom is being reduced “to [some] kind of spectral ideal that always concealed deeper forms of domination”. (Newmann, 2017, p. 156) A related argument applies to the notion of potential at the collective level. Whenever in human history a person or group was convinced that they knew where the journey is meant to take us, bad things ensued. On the one hand, such individuals or groups tended to believe themselves in possession of some sort of moral licence that would permit them to disrespect individual rights and freedom. What is more, the attempt to steer this process in a top-down and centralized fashion, based on the assumption of having access to some sort of absolute moral truth, interferes with the functioning of the epistemic and social process as we have described it at length before. If power dominates the equation, reason cannot prevail.

Importantly, the proper functioning of these processes does not require that nobody ever forms and defends their own views as to what makes for a good life. On the contrary: the engine of collective progress lies exactly in individuals exploring different views and attempting to find the strongest version of, and argument for, why their perspective is a valuable one to be included in the overall picture. The argument comes down to the following. Given our lack of moral omniscience, we should not try to steer the processes of moral and cultural development in a top-down manner. Furthermore, the efficient and robust functioning of the processes of decentralized information gathering and processing - be that in economics, biology, science or society - is contingent on the integrity of their conditions of operation. In the case of cultural and moral progress, freedom captures a large part of what is required for the process to run effectively. It is as such that freedom is a critical ingredient in the unfolding of humanity’s collective potential.

Conclusion

In this essay, I set out to investigate the question: why do we care about freedom? I identified three possible answers to this question: freedom as a protective factor against oppression and authoritarianism, freedom as an enabler of individual self-realisation, and freedom as the enabler of moral and cultural progress and the unfolding of humanity’s collective potential. I situated the discussion of these reasons in the larger discourse on the nature of freedom and within the history of ideas. I then zoomed in on one reason for caring about freedom specifically - the unfolding of humanity’s collective potential. To do so, I clarified what I mean by humanity’s collective potential, before exploring the mechanisms via which freedom is key to enabling its unfolding. I argued that freedom allows for the production of variation among moral and cultural views, and that, in strengthening freedom - in particular freedom as conceived of by Mill and Dewey - we strengthen the extent to which reason is the primary factor guiding the differential selection between different cultural and moral views. To make these points, I helped myself to insights from evolutionary theory, economic theory and complexity science.

Last but not least, let me reiterate the reasons why we should care about understanding the processes that guide the human trajectory. We can look at the past, notice the pattern of an expanding moral circle, and infer that we are likely ignorant of yet many more issues of moral concern. Or we can look into the future, noticing its potential vastness and anticipating an ever-changing reality that will keep throwing new moral and social questions at us. As humanity expands its range of capabilities through technological progress, we are increasingly able to affect the fate of things across ever greater spatial and temporal horizons. It may thus be high time for us to start a collective conversation about how we choose to wield these capabilities. The moral stakes are high. In that, I see reflected our third - and perhaps strongest - reason for wanting to live in a society that protects and fosters individual freedom, for the sake of us all, and for the sake of future forms of life.

Nora Ammann, Spring 2022, New College of the Humanities

Footnotes

[1] See Rousseau’s The Social Contract (1762) and Marx’s “On the Jewish Question” (1843), among others.

[2] This is a reading suggested by Quentin Skinner in his 2013 lecture “A Genealogy of Freedom.” at Queen Mary College, University of London.

[3] See, among others, Sen’s “Equality of What?” (1979) and Nussbaum’s Creating Capabilities (2011).

[4] In fact, many ancient Greek thinkers, including Aristotle, shared a similar view of human nature, even if they may not have theorized human nature in relationship to freedom specifically.

[5] For example, the four Geneva Conventions (1864, 1907, 1929, 1949), the establishment of the Red Cross, conventions for the limitation of biological, chemical and other weapons of mass destruction, international laws and courts for the prosecution of war crimes, etc.

[6] For examples of such a bird’s eye view on human history and development, see for example Harari, 2014 or Pinker, 2011.

[7] For examples of cutting-edge research in the life sciences supporting such a view of future possibilities, see:
Levin, M., “Technological Approach to Mind Everywhere (TAME): an experimentally-grounded framework for understanding diverse bodies and minds”, Frontiers in Systems Neuroscience (in press), 2022.
Levin, M., "Synthetic Living Organisms: Heralds of a Revolution in Technology & Ethics", Presentation at UCF Center for Ethics, 2021.

[8] Whether or not the libertarian argument provides sufficient evidence to justify the convictions and political demands put forward by right-wing libertarianism is a separate question.

References

  1. Arendt, Hannah. “What is freedom?” in Between Past and Future. New York: Viking Press. 1961.

  2. Beckstead, N., Greaves, H., Pummer, T.. "A brief argument for the overwhelming importance of shaping the far future." in Effective Altruism: Philosophical Issues. 2019. 80–98.

  3. Berlin, Isaiah. “Two Concepts of Liberty.” in Four Essays On Liberty. Oxford: Oxford University Press. 1969. 118–172.

  4. Dawkins, Richard. "Universal Darwinism." in Evolution from molecules to men. edited by D. S. Bendall. Cambridge: Cambridge University Press. 1983. 403–25.

  5. Harari, Yuval Noah. Sapiens: A Brief History of Humankind. Random House. 2014.

  6. Hay, William H.. “John Dewey on Freedom and Choice.” The Monist. 48:3. 1964. 346–355.

  7. Hayek, Friedrich. “The Use of Knowledge in Society.” American Economic Review. 35:4. 1945. 519–30.

  8. Hobbes, Thomas. Leviathan. edited by Ian Shapiro. New Haven: Yale University Press. 2010.

  9. Hodgson, Geoffrey M.. “Generalizing Darwinism to Social Evolution: Some Early Attempts.” Journal of Economic Issues. 39:4. 2005. 899–914.

  10. Levin, Michael. "Synthetic Living Organisms: Heralds of a Revolution in Technology & Ethics." UCF College of Arts & Humanities. November 2, 2021. Talk at the speaker series “Ethically Speaking” run by the UCF Center for Ethics.

  11. Levin, Michael. “Technological Approach to Mind Everywhere (TAME): an experimentally-grounded framework for understanding diverse bodies and minds.” Frontiers in Systems Neuroscience. 2022. (in press)

  12. Locke, John. An Essay Concerning Human Understanding. edited by Peter H. Nidditch. Oxford: Oxford University Press. 1975.

  13. MacAskill, W., Bykvist, K., Ord, T.. Moral Uncertainty. Oxford: Oxford University Press. 2020.

  14. Marx, Karl. “On the Jewish Question.” in O'Malley, J. (Ed.). Marx: Early Political Writings. Cambridge: Cambridge University Press. 1994. 28–56.

  15. Newman, Saul. “‘Ownness created a new freedom’ - Max Stirner’s alternative conception of liberty.” Critical Review of International Social and Political Philosophy. 22:2. 2019. 155–175.

  16. Nussbaum, Martha. Creating Capabilities. Cambridge, MA: Harvard University Press. 2011.

  17. Pettit, Philip. A Theory of Freedom. Cambridge: Polity Press. 2001.

  18. Pinker, Steven. The Better Angels of Our Nature: The Decline of Violence in History and Its Causes. New York: Viking Books. 2011.

  19. Popper, Karl R.. Logik der Forschung - Zur Erkenntnistheorie der modernen Naturwissenschaft. Vienna: Verlag Julius Springer. 1935.

  20. Rousseau, Jean-Jacques. “The Social Contract.” in Gourevitch, Victor (Ed.). The Social Contract and Other Later Political Writings. Cambridge: Cambridge University Press. 1997.

  21. Sen, Amartya. “Equality of What?” in McMurrin (ed.). Tanner Lectures on Human Values. Cambridge: Cambridge University Press. 1979. 197–220.

  22. Singer, Peter. The Expanding Circle. Oxford: Clarendon Press. 1981.

  23. Skinner, Quentin. "A third concept of liberty." Proceedings of the British Academy. 117. 2002. 237–268.

  24. Skinner, Quentin. “A Genealogy of Freedom.” Queen Mary College, University of London. 2013. Lecture. Last accessed: 04/2022. https://www.youtube.com/watch?v=yfNkA2Clfr8&ab_channel=NorthwesternU

  25. Stirner, Max. Der Einzige und sein Eigentum. Stuttgart: Philipp Reclam. 1972.

  26. Taylor, Charles. “What’s wrong with negative liberty.” in Philosophy and the Human Sciences: Philosophical Papers. Vol. 2. Cambridge: Cambridge University Press. 1985. 211–29.

  27. Williams, Evan G.. "The possibility of an ongoing moral catastrophe." Ethical Theory and Moral Practice. 18:5. 2015. 971–982.



On the nature of purpose

[cross-posted to LessWrong]

Introduction

Is the concept of purposes, and more generally teleological accounts of behaviour, to be banished from the field of biology? 

For many years - essentially since the idea of Darwinian natural selection started to be properly understood and integrated into the intellectual fabric of the field - “yes” was the consensus answer among scholars. Much more recently, however, interest in this question has been rekindled - notably driven by voices that contradict that former consensus.

This is the context in which this letter exchange between the philosophers Alex Rosenberg and Daniel Dennett is taking place. What is the nature of "purposes"? Are they real? But mostly, what would it even mean for them to be?

In the following, I will provide a summary and discussion of what I consider the key points and lines of disagreements between the two. Quotes, if not specified otherwise, are taken from the letter exchange.

Rosenberg’s crux

Rosenberg and Dennett agree on large parts of their respective worldviews. They both share a "disenchanted" naturalist's view - they believe that reality is (nothing but) causal and (in principle) explainable. They subscribe to the narrative of reductionism, which celebrates how scientific progress emancipated, first, the world of physics, and later the chemical and biological one, from metaphysical beliefs. Through Darwin, we have come to understand the fundamental drivers of life as we know it - variation and natural selection.

But despite their shared epistemic foundation, Rosenberg suspects a fundamental difference in their views concerning the nature of purpose. Rosenberg - contrary to Dennett - sees a necessity for science (and scientists) to disabuse themselves, entirely, of any anthropocentric speech of purpose and meaning. Anyone who considers the use of the “intentional stance” justified, so Rosenberg argues, would have to reconcile the following:

        What is the mechanism by which Darwinian natural selection turns reasons (tracked by the individual as purpose, meaning, beliefs and intentions) into causes (affecting the material world)? 

Rosenberg, of course, doesn't deny that humans - what he refers to as Gregorian creatures shaped by biological as well as cultural evolution - experience higher-level properties like emotions, intentions and meaning. Wilfrid Sellars calls this the "manifest image": the framework in terms of which we ordinarily perceive and make sense of ourselves and the world. [1] But Rosenberg sees a tension between the scientific and the manifest image - one that is, to his eyes, irreconcilable.

"Darwinism is the only game in town", so Rosenberg. Everything can, and ought to be, explained in terms of it. These higher-level properties - sweetness, cuteness, sexiness, funniness, colour, solidity, weight (not mass!) - are radically illusionary. Darwin's account of natural selection doesn't explain purpose, it explains it away. Just like physics and biology, so do cognitive sciences and psychology now have to become disabused from the “intentional stance”. 

In other words, it's the recalcitrance of meaning that bothers Rosenberg - the fact that we appear to need it in how we make sense of the world, while also being unable to properly integrate it in our scientific understanding. 

As Quine put it: "One may accept the Brentano thesis [about the nature of intentionality] as either showing the indispensability of intentional idioms and the importance of an autonomous science of intention, or as showing the baselessness of intentional idioms and the emptiness of a science of intention." [2]

Rosenberg is compelled by the latter path. In his view, the recalcitrance of meaning is "the last bastion of resistance to the scientific world view. Science can do without them, in fact, it must do without them in its description of reality." He doesn't claim that notions of meaning have never been useful, but that they have "outlived their usefulness", replaced, today, with better tools of scientific inquiry.

As I understand it, Rosenberg argues that purposes aren't real because they aren’t tied up with reality, unable to affect the physical world. Acting as if they were real (by relying on the concept to explain observations) is contributing to confusion and convoluted thinking. We ought, instead, to resort to the classical Darwinian explanations, where all behaviour boils down to evolutionary advantages and procreation (in a way that explains purpose away).

Rosenberg’s crux (or rather, my interpretation thereof) is that, if you want to claim that purposes are real - if you want to maintain purpose as a scientifically justified concept, one that is reconcilable with science - you need to be able to account for how reasons turn into causes.

***

Perfectly real illusions

While Dennett recognizes the challenges presented by Rosenberg, he refuses to be troubled by them. Dennett paints a possible "third path" through Quine’s puzzle by suggesting that we understand the manifest image (i.e. mental properties, qualia) neither as "as real as physics" (thereby making it incomprehensible to science) nor as "radically illusionary" (thereby troubling our self-understanding as Gregorian creatures). Instead, Dennett suggests, we can understand it as a user-illusion: "ways of being informed about things that matter to us in the world (our affordances) because of the way we and the environment we live in (microphysically [3]) are." 

I suggest that this is, in essence, a deeply pragmatic account. (What account other than pragmatism, really, could utter, with the same ethos, a sentence like: "These are perfectly real illusions!") 

While he does not explicitly say so, we can interpret Dennett as invoking the bounded nature of human minds and their perceptual capacity. Mental representations, while not representing reality fully truthfully (e.g. there is no microphysical account of colours, just photons), aren't arbitrary either. They derive (in part) from reality, and, through the compression inherent to the mind’s cognitive processes, these representations get distorted so as to form false, yet in all likelihood useful, illusions. 

These representations are useful because they have evolved to be such: after all, it is through the interaction with the causal world that the Darwinian fitness of an agent is determined; whether we live or die, procreate or fail to do so. Our ability to perceive has been shaped by evolution to track reality (i.e. to be truthful), but only exactly to the extent that this gives us a fitness advantage (i.e. is useful). Our perceptions are neither completely unrestrained nor completely constrained by reality, and therefore they are neither entirely arbitrary nor entirely accurate. 

Let’s talk about the nature of patterns for a moment. Patterns are critical to how intelligent creatures make sense of and navigate the world. They allow (what would otherwise be far too much) data to be compressed, while still granting predictive power. But are patterns real? Patterns directly stem from reality - they are to be found in reality - and, in this very sense, they are real. But, if there wasn’t anyone or anything to perceive and make use of this structural property of the real world, it wouldn’t be meaningful to talk of patterns. Reality doesn’t care about patterns. Observers/agents do. 

This same reasoning can be applied to intentions. Intentions are meaningful patterns in the world. An observer with limited resources who wants to make sense of the world (i.e. an agent that wants to reduce sample complexity) can abstract along the dimension of "intentionality" to reliably get good predictions about the world. (Except that "abstracting along the dimension of intentionality" isn't so much an active choice of the observer as something that emerges because intentions are a meaningful pattern.) The "intentionality-based" prediction does well at ignoring variables that aren't sufficiently predictive and capturing the ones that are, which is critical in the context of a bounded agent.
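To make the compression point slightly more concrete, here is a minimal toy sketch of my own (not anything from the letter exchange): a 1-D "agent" that simply steps toward a goal, an observer that predicts its moves by assuming "it wants to reach the goal", and a baseline that merely memorizes the few (position, move) pairs it has observed. All the names, positions and the inferred goal are hypothetical choices made purely for illustration.

```python
# Toy sketch (illustrative only): the intentional stance as a compressed model
# that generalizes from few observations, versus a memorize-only baseline.

import random

GOAL = 7  # the agent's goal position on a 1-D line (unknown to the baseline)

def agent_move(position: int) -> int:
    """The true behaviour: step +1 or -1 toward the goal, 0 once there."""
    if position < GOAL:
        return 1
    if position > GOAL:
        return -1
    return 0

def intentional_stance_predictor(position: int, inferred_goal: int) -> int:
    """Predict by assuming the agent 'wants' to reach `inferred_goal`."""
    if position < inferred_goal:
        return 1
    if position > inferred_goal:
        return -1
    return 0

if __name__ == "__main__":
    random.seed(0)

    # The bounded observer only sees the agent act in three positions...
    observations = {p: agent_move(p) for p in (2, 4, 10)}

    # ...from which any goal between 5 and 9 is consistent (the agent moves
    # right at 4 and left at 10). Assume, hypothetically, it guesses 7.
    inferred_goal = 7

    # Baseline: memorize the observed pairs, guess at random elsewhere.
    def memorizer(position: int) -> int:
        return observations.get(position, random.choice([-1, 0, 1]))

    test_positions = range(0, 15)
    stance_correct = sum(
        intentional_stance_predictor(p, inferred_goal) == agent_move(p)
        for p in test_positions
    )
    memo_correct = sum(memorizer(p) == agent_move(p) for p in test_positions)

    print(f"intentional stance: {stance_correct}/15 correct")
    print(f"memorization only:  {memo_correct}/15 correct")
```

On the three positions it has actually seen, the memorizer does fine; everywhere else, only the goal-based abstraction keeps predicting well. That, in miniature, is the sense in which abstracting along the dimension of intentionality reduces the data a bounded observer needs.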

Another case in point: affordances. In the preface to his book Surfing Uncertainty, Andy Clark writes: “[...] different (but densely interanimated) neural populations learn to predict various organism-salient regularities pertaining at many spatial and temporal scales. [...] The world is thus revealed as a world tailored to human needs, tasks and actions. It is a world built of affordances - opportunities for action and intervention.“ Just as with patterns, the world isn’t literally made up of affordances. And yet, they are real in the sense of what Dennett calls user-illusions. 

***

The cryptographer’s constraint

Dennett goes on to endow these illusionary reasons with further “realness” by invoking the cryptographer's constraint: 

        It is extremely hard - practically infeasible - to design an even minimally complex system for the code of which there exists more than one reasonable decryption/translation. 

Dennett uses a simple crossword puzzle to illustrate the idea: “Consider a crossword puzzle that has two different solutions, and in which there is no fact that settles which is the “correct” solution. The composer of the crossword went to great pains to devise a puzzle with two solutions. [...] If making a simple crossword puzzle with two solutions is difficult, imagine how difficult it would be to take the whole corpus of human utterances in all languages and come up with a pair of equally good versions of Google Translate that disagreed!” [slight edits to improve readability]

The practical consequence of the constraint is that, “if you can find one reasonable decryption of a cipher-text, you’ve found the decryption.” Furthermore, this constraint is a general property of all forms of encryption/decryption.  

Let’s look at the sentence: “Give me a peppermint candy!”

Given the cryptographer’s constraint, there are, practically speaking, very (read: astronomically) few plausible interpretations of the words “peppermint”, “candy”, etc. This is at the heart of what makes meaning non-arbitrary and language reliable. 

To add a bit of nuance: the fact that the concept "peppermint" reliably translates to the same meaning across minds requires iterated interactions. In other words, Dennett doesn’t claim that, if I just now came up with an entirely new concept (say "klup"), its meaning would immediately be unambiguously clear. But its meaning (across minds) would become increasingly precise and robust after using it for some time, and - on evolutionary time horizons - we can be preeetty sure we mean (to all practical relevance) the same things by the words we use.
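To see the constraint at work in a toy setting, here is a minimal sketch of my own (not Dennett's example): brute-force every shift of a toy Caesar cipher and count how many candidate decryptions still look like plausible English. The word list, the plausibility threshold and the messages are arbitrary choices for illustration; the point is only that the number of "reasonable decryptions" collapses as the message gets longer.

```python
# Toy illustration of the cryptographer's constraint: brute-force all shifts of
# a Caesar cipher and count decryptions that still look like plausible English.

import string

# A tiny, arbitrary stand-in for "knowledge of English": a list of common words.
COMMON_WORDS = {
    "give", "me", "a", "the", "candy", "mint", "please", "now",
    "peppermint", "sweet", "i", "want", "some", "and", "to", "of",
}

def caesar_shift(text: str, shift: int) -> str:
    """Shift each letter by `shift` positions, leaving other characters alone."""
    result = []
    for ch in text:
        if ch.isalpha():
            base = ord("a")
            result.append(chr((ord(ch.lower()) - base + shift) % 26 + base))
        else:
            result.append(ch)
    return "".join(result)

def looks_like_english(text: str) -> bool:
    """Crude plausibility test: at least half of the words are common English words."""
    words = [w.strip(string.punctuation) for w in text.split()]
    hits = sum(1 for w in words if w in COMMON_WORDS)
    return bool(words) and hits / len(words) >= 0.5

def plausible_decryptions(ciphertext: str) -> list[str]:
    """Try all 26 shifts and keep the ones that look like English."""
    candidates = (caesar_shift(ciphertext, s) for s in range(26))
    return [c for c in candidates if looks_like_english(c)]

if __name__ == "__main__":
    short_msg = "a"                                  # too little text to constrain the reading
    long_msg = "give me a peppermint candy please"   # enough text to pin the reading down
    for msg in (short_msg, long_msg):
        encrypted = caesar_shift(msg, 13)
        found = plausible_decryptions(encrypted)
        print(f"{msg!r}: {len(found)} plausible decryption(s)")
```

With the one-letter message, more than one shift yields something that could pass as English; with the full sentence, only the true decryption survives - a toy analogue of "if you can find one reasonable decryption, you've found the decryption."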

But what does all this have to do with the question of whether purpose is real? Here we go: 

The cryptographer's constraint - which I will henceforth refer to as the principle of pragmatic reliability [4] - is an essential puzzle piece in understanding what allows representations of reasons (e.g. a sentence making a claim) to turn into causes (e.g. a human taking a certain action because of that claim). 

We are thus starting to get closer to Rosenberg’s crux as stated above: a scientific account for how reasons become causes. There is one more leap to take.

***

Reasons-turning-causes

Having invoked the role of pragmatic reliability, let’s examine another pillar of Dennett's view - one that will eventually get us all the way to addressing Rosenberg’s crux. 

Rosenberg says: "I see how we represent in public language, turning inscriptions and noises into symbols. I don’t see how, prior to us and our language, mother nature (a.k.a Darwinian natural selection) did it." 

What Rosenberg conceives to be an insurmountable challenge to Dennett’s view, the latter prefers to walk around rather than over, figuratively speaking. As developed at length in his book From Bacteria to Bach and Back, Dennett suggests that "mother nature didn’t represent reasons at all", nor did it need to. 

First, the mechanism of natural selection uncovers what Dennett calls "free-floating rationales” - reasons that existed billions of years before, and independent of, reasoners. Only when the tree of life grew a particular (and so far unique) branch - humans, together with their use of language - did these reasons start to get represented. 

"We humans are the first reason representers on the planet and maybe in the universe. Free-floating rationales are not represented anywhere, not in the mind of God, or Mother Nature, and not in the organisms who benefit from all the arrangements of their bodies for which there are good reasons. Reasons don’t have to be representations; representations of reasons do."

This is to say: it isn't exactly the reasons, so much as their representations, that become causes.

Reasons turning into causes is, so Dennett argues, unique to humans, because only humans represent reasons. I would add the nuance that the capacity to represent lies on a spectrum rather than being binary. Some other animals seem to be able to do something like representation, too. [5] That said, humans remain unchallenged in the degree to which they have developed the capacity to represent (among the forms of life we are currently aware of).  

"Bears have a good reason to hibernate, but they don’t represent that reason, any more than trees represent their reason for growing tall: to get more sunlight than the competition." While there are rational explanations for the bear’s or the tree’s behaviour, they don’t understand, think about or represent these reasons. The rationale has been discovered by natural selection, but the bear/tree doesn’t know - nor does it need to - why it wants to stay in their dens during winter.  

Language plays a critical role in this entire representation-jazz. Language is instrumental to our ability to represent; whether as necessary precursor, mediator or (ex-post) manifestation of that ability remains a controversial question among philosophers of language. Less controversial, however, is the role of language in allowing us to externalize representations of reasons, thereby “creating causes” not only for ourselves but also for people around us. Wilfrid Sellars suggested that language bore what he calls “the space of reasons” - the space of argumentation, explanation, query and persuasion. [6] In other words, language bore the space in which reasons can become causes. 

We can even go a step further: while acknowledging the role of natural selection in shaping what we are - the fact that the purposes of our genes are determined by natural selection -, we are still free to make our own choices. To put it differently: "Humans create the purposes they are subject to; we are not subject to purposes created by something external to us.” [7] 

In From Darwin to Derrida: Selfish Genes, Social Selves, and the Meanings of Life, David Haig argues for this point of view by suggesting that there need not be full concordance, nor congruity, between our psychological motivations (e.g. wanting to engage in sexual activity because it is pleasurable, wanting to eat a certain food because it is tasty) and the reasons why we have those motivations (e.g. in order to pass on our genetic material).

There is a piece of folk wisdom that goes: “the meaning of life is the meaning we give it”. Based on what has been discussed in this essay, we can see this saying in a different, more scientific light: as a testimony to the fact that we humans are creatures that represent meaning, and by doing so we turn “free-floating rationales” into causes that govern our own lives. 

Thanks to particlemania, Kyle Scott and Romeo Stevens for useful discussions and comments on earlier drafts of this post. 
 

***

[1] Sellars, Wilfrid. "Philosophy and the scientific image of man." Science, perception and reality 2 (1963).
Also see: deVries, Willem. "Wilfrid Sellars", The Stanford Encyclopedia of Philosophy (Fall 2020 Edition). Retrieved from: https://plato.stanford.edu/archives/fall2020/entries/sellars/

[2]  Quine, Willard Van Orman. "Word and Object. New Edition." MIT Press (1960).

[3]  I.e. the physical world at the level of atoms 

[4]  AI safety relevant side note: The idea that translations of meaning need only be sufficiently reliable in order to be reliably useful might provide an interesting avenue for AI safety research. 

Language works, as evidenced by the striking success of human civilisations, made possible through advanced coordination, which in turn requires advanced communication. (Sure, humans miscommunicate what feels like a whole lot, but in the bigger scheme of things, we still appear to be pretty damn good at this communication thing.)  

Notably, language works without there being theoretically air-tight proofs that map meanings on words. 

Right there, we have an empirical case study of a symbolic system that functions on a (merely) pragmatically reliable regime. We can use it to inform our priors on how well this regime might work in other systems, such as AI, and how and why it tends to fail.

One might argue that a pragmatically reliable alignment isn’t enough - not given the sheer optimization power of the systems we are talking about. Maybe that is true; maybe we do need more certainty than pragmatism can provide. Nevertheless, I believe that there are sufficient reasons for why this is an avenue worth exploring further. 

Personally, I am most interested in this line of thinking from an AI ecosystems/CAIS point of view, and as a way of addressing (what I consider a major challenge) the problem of the transient and contextual nature of preferences. 

[5] People wanting to think about this more might be interested in looking into vocal (production) learning - the ability to “modify acoustic and syntactic sounds, acquire new sounds via imitation, and produce vocalizations”. This conversation might be a good starting point.

[6] Sellars, Wilfrid. In the space of reasons: Selected essays of Wilfrid Sellars. Harvard University Press (2007).

[7] Quoted from:  https://twitter.com/ironick/status/1324778875763773448 

January 2021: Chandoling, donkeys and utilities
