On the automation of Science - an incomplete account

Introduction

In the context of this essay, I investigate the question: Can science be automated by artificial intelligence?

To investigate arguments for and against the possibility of science-AI, I will aim to answer the following question: is AI, given its own functional logic, able to implement the functional logic that underlies science?  

What is a functional logic, and how does it help us answer whether science can be automated by means of AI? The idea is to characterize and then contrast what one might consider the “two sides of an equation”, i.e., the respective functional logics of science-making and AI. On one side, we need to understand the functioning of science-making. Insofar as the scientific endeavor is successful in a way that, say, mere doing or mere thinking is not, what is it about science-making that can account for that? On the other side, we need to characterize the functioning of (machine learning (ML)-based) AI. How does it work, and what can it (not) achieve, in principle and under specific conditions? Then we can contrast these two functional logics and evaluate whether the functional logic of ML is able to implement the functional logic of science-making, or whether there are (fundamental or practical) limitations which would preclude this. If the functional logic of science is not expressible by the functional logic of ML, we can conclude that, at least within the ML paradigm, full-scale AI-automated science (what I henceforth simply refer to as “science-AI”) is not feasible. 

I proceed as follows: having just introduced the high-level approach, I briefly discuss what might motivate us to ask this question, and then make a few clarifying remarks concerning the terminology used. With this out of the way, I start evaluating the arguments. In Part 1, I lay out a picture which aims to reject the science-AI conjecture. The argument is, in short, that science requires a) “strong generalization”, i.e., the ability to come up with sensible abstractions (i.e., inductive reasoning), and b) “deductive reasoning”, i.e., the ability to use those abstractions sensibly and reliably. In both cases, there are reasons to doubt whether the functional logic of ML systems allows them to do this, or do this sufficiently well. Parts 2 and 3 explore two plausible ways to rebut this skeptical view. The first explores whether the fact that ML systems are (in the limit) universal function approximators can recover the possibility of science-AI. The second observes that humans, too, show cognitive limitations (arguably not dissimilar from the ones I will discuss in the context of ML systems), while still being capable of science-making (arguably). I conclude that, based on the objections raised in Parts 2 and 3, the argument against the possibility of science-AI does not succeed.

Motivations

Why might one be interested in investigating the possibility of science-AI? 

First, we may hope to learn things not only about the current and prospective capability level (and limits thereof) of AI systems, but, quite plausibly, also about the nature of science-making itself. As such, this question appears to have intellectual merit in its own right. 

Furthermore, the question is of interest on more practical grounds, too. The scientific revolution brought about a massive acceleration in humans’ ability to produce knowledge and innovation, which in turn has led to astonishing improvements in the quality of human life. As such, the prospect of automating scientific progress appears promising. At the same time, AI capable of automating science may also be dangerous. After all, scientific progress has also produced technologies that pose immense risks to the wellbeing and survival of humanity, including nuclear weapons, the ability to engineer pathogens, and technologies that facilitate mass surveillance and oppression by states or other powerful actors (Bostrom, 2019). As such, if we knew that science-AI was possible, this ought to motivate us to adopt caution and start working on improved societal and governance protocols to help use these capabilities safely and justly.  

Finally, there are serious worries that the growing adoption of AI applications contributes to an “epistemic crisis”, which poses a threat (in particular) to decision-making in democratic societies (e.g., Seger et al., 2020). Among other things, these systems can be used to generate text, images, video, and voice recordings which do not necessarily represent reality truthfully and which people might take to be real even when they are fake. As such, if we were capable of building AI systems that systematically skew towards truth (as opposed to, say, being riddled with the sort of confabulations that we can see in state-of-the-art language models (Ji et al., 2023)), this may help decrease such epistemic risks. 

Some clarifications

As mentioned, a few brief points of clarification are in order before I can properly dive into discussing the (im)possibility of science-AI; in particular, with respect to how I will be using the terms “science”, “AI”, and “automation”. 

First, there does not exist broad consensus among the relevant epistemic community as to what the functional logic (or logics) of science-making is, nor can it be taken as a given that there exists a singular such functional logic. For example, philosophers like Karl Popper have questioned the validity of any use of inductive reasoning in science and instead put front and center the deductive process of falsification (Popper, 1934; Popper, 1962). In contrast, other accounts of science-making do very much rely on inductive processes, such as Bayesian statistics, in order to evaluate how much a given hypothesis is supported by the available evidence (e.g., Sprenger, Hartmann, 2019). In the context of this essay, however, I am not trying to settle the question of the correct account of the scientific method; I instead adopt a degree of methodological pluralism and evaluate the conceptual plausibility of end-to-end science-AI in accordance with several of the most prominent accounts of the scientific method.  

Second, I need to clarify what I do and don’t mean by artificial intelligence. The term in principle refers to a putative technology that implements intelligent behavior through artificial means (e.g., on silicon); it does not, on its own, specify how this is done. To clarify matters in the context of this essay, I (unless otherwise specified) always refer to implementations of AI that reside within the paradigm of ML. I believe this is a justified assumption to make because ML is the currently dominant paradigm in the field of AI, it is the most successful paradigm to date, and there is no particular reason to expect this trend to halt in the near future. Now, having fixed a technical paradigm for AI, we can reason more substantively about the possibilities and limitations of ML-based AI systems when it comes to science-making by drawing on fields such as ML theory, optimization theory, etc. 

Third, when talking about the automation of science, one might have in mind partial automation (i.e., the automation of specific tasks that are part of but don’t comprise the whole of the scientific enterprise), or a full, end-to-end automation of the scientific process by means of AI. In the context of this essay, I primarily focus on the conceptual plausibility of the latter: end-to-end science-AI. The line of demarcation I wish to draw is not about whether the automation involves a single or multiple (e.g., an assembly of) AI applications, but rather whether human scientists are still a required part of the process (as is the case for what I call partial automation) or not (as is the case for what I call end-to-end automation or end-to-end science-AI). 

With this out of the way, it is now time to dive into the discussion.

Part 1: Contra science-AI 

In this section, I lay out the case against the possibility of science-AI. In short, I argue that autonomous scientific reasoning requires i) the ability to form sensible abstractions which function as bases for generalizing knowledge from past experience to novel environments, and ii) the ability to use such abstractions reliably in one’s process of reasoning, thereby accessing the power of deductive or compositional reasoning. However, or so the argument goes, ML systems are not appropriately capable of forming such abstractions and of reasoning with them. 

First, let me clarify the claim that abstractions and deductive reasoning play central roles in science-making. Generalization refers to the ability to apply insights to a situation that is different from what has previously been encountered. Typically, this form of generalization is made possible by forming the “right” abstractions, i.e., ones that are able to capture those informational structures that are relevant to a given purpose across different environments (Chollet, 2019). When I invoke the concept of a dog, for example, I don’t have a specific dog in mind, although I could probably name specific dogs I have encountered in the past, and I could also name a number of features that dogs typically (but not always) possess (four legs, fur, dog ears, a tail, etc.). The “dog” case can be understood as an example of relatively narrow abstraction. Think now, instead, of the use of concepts like “energy”, “mass”, or “photon” in physics, or of a “set” or “integration” or “equation” in mathematics. Those concepts are even further removed from any specific instances of things which I can access directly via sensory data. Nevertheless, these abstractions are extremely useful in that they allow me to do things I couldn’t have done otherwise (e.g., predict the trajectory of a ball hit at a certain angle with a certain force, etc.).  

Scientific theories critically rely on abstraction because theories are expressed in terms of abstractions and their functional relationships to each other. (For example, mass–energy equivalence describes the relationship between two abstractions—“energy” and “mass”; in particular, this relationship can be expressed as E = mc².) The use of abstractions is what endows a theory with explanatory power beyond the merely specific, contingent example that has been studied empirically. At the same time, the usefulness of a theory is dependent on the validity of the abstractions it makes use of. A theory that involves abstractions that do not carve reality sufficiently at its joints will very likely fail to make reliable predictions or produce useful explanations. 

Furthermore, the ability to form valid abstractions constitutes the basis for a second critical aspect of scientific cognition, namely, deductive and compositional reasoning. By deductive reasoning, I am referring to such things as deductive logic, arithmetic, sorting a list, and other tasks that involve “discrete” representations and compositionality (Chollet, 2020). In the case of science-making in particular, falsification and disconfirmation play a central role and are established by means of deductive reasoning, such as in the hypothetico-deductive account (e.g., Sprenger, 2011; Hempel, 1945). The ability to use, or reason over, abstractions allows for so-called “combinatorial generalization”. It is this compositionality of thought that, it has been argued, is a critical aspect of human-level intelligence, giving the reasoner access to a schema of “infinite use of finite means” (Humboldt, 1836; Chomsky, 1965). 

Having made the case for why science-making relies on the ability to i) form and ii) reason with abstractions, I can now investigate the arguments at hand for believing ML systems are not appropriately capable of i) and ii).

Reasons for skepticism come from empirical observation (i.e., probing state-of-the-art models and seeing how they “break”), theoretical arguments, and expert judgment. In terms of the latter, Cremer (2021) surveys “expert disagreement over the potential and limitations of deep learning”. Cremer identifies a set of plausible origins of said disagreements, centrally featuring questions about the ability of artificial neural networks to “form abstraction representations effectively” and about the extent of their ability to generalize (p. 7). 

To elaborate on the theoretical arguments for ML skepticism, it is worth exploring the way in which ML methods face challenges in their ability to generalize (e.g., Chollet, 2017; Battaglia, 2018; Cartuyvels et al., 2021; Shanahan, Mitchell, 2022). ML uses statistical techniques to extract (“learn”) patterns from large swaths of data. It can be understood as aiming to approximate the underlying function which generated the data it is trained on. However, this interpolative learning leads to brittleness when the systems are deployed outside of the distribution of the training data. This phenomenon is well known in the ML literature and usually discussed under terms such as out-of-distribution (OOD) generalization failure. Under distributional shift (i.e., cases where the deployment data follow a different distribution than the training data), the approximating function the model learned during training is no longer guaranteed to hold, leading to a generalization failure. The risk of failures to generalize, so the argument goes, limits the potential to use ML for end-to-end science automation because we cannot sufficiently trust the soundness of the process. 
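
To make the worry concrete, here is a minimal, purely illustrative sketch (my own toy example, not drawn from the cited literature): a flexible model fit on one region of the input space performs well in-distribution but fails badly under a distributional shift. A polynomial fit stands in for a learned model.

```python
# Toy illustration of out-of-distribution (OOD) generalization failure.
# A flexible interpolator fit on [0, pi] approximates sin(x) well there,
# but its extrapolation to [pi, 2*pi] (a shifted "deployment" distribution)
# typically fails by orders of magnitude.
import numpy as np

rng = np.random.default_rng(0)

# "Training distribution": inputs drawn from [0, pi]
x_train = rng.uniform(0, np.pi, 200)
y_train = np.sin(x_train) + rng.normal(0, 0.05, x_train.shape)

# A degree-7 polynomial stands in for a learned model
model = np.poly1d(np.polyfit(x_train, y_train, deg=7))

x_in = np.linspace(0, np.pi, 100)          # in-distribution test inputs
x_out = np.linspace(np.pi, 2 * np.pi, 100)  # out-of-distribution test inputs

err_in = float(np.mean((model(x_in) - np.sin(x_in)) ** 2))
err_out = float(np.mean((model(x_out) - np.sin(x_out)) ** 2))
print(f"in-distribution MSE:     {err_in:.4f}")   # small
print(f"out-of-distribution MSE: {err_out:.4f}")  # typically much larger
```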

Furthermore, ML systems are notoriously bad at discrete tasks (see, e.g., Marcus, 2018; Cartuyvels et al., 2021). While state-of-the-art ML systems are not incapable of (and are getting better at), say, simple forms of arithmetic (e.g., adding up two-digit numbers), it is noteworthy that tasks that take only a few lines of code to automate reliably in the paradigm of classical programming have remained outside the reach of today’s several-billion-parameter ML models. To quote François Chollet, a prominent AI researcher, deliberately misquoting Geoffrey Hinton, expert and pioneer of deep learning: “Deep learning is going to be able to do everything perception and intuition, but not discrete reasoning” (Chollet, 2020). This unreliability in deductive reasoning exhibited by ML systems is another reason for skepticism towards the possibility of end-to-end science-AI. 
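
For contrast, here is the kind of “discrete” task the passage refers to, solved exactly in a few lines of classical code. The snippet is my own illustration of the point, not taken from the cited sources.

```python
# Classical programming solves these "discrete" tasks exactly, for any input,
# whereas large ML models only approximate them statistically.
def add(a: int, b: int) -> int:
    # Exact for all integers, not just those resembling training data.
    return a + b

def sort_list(xs: list) -> list:
    # Deterministic and provably correct for any input list.
    return sorted(xs)

assert add(67, 58) == 125
assert sort_list([3, 1, 2]) == [1, 2, 3]
```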

To summarize the argument, current ML-based AI systems appear to face limitations with respect to their ability to achieve “broad” generalization, to form sensible abstractions, and to use those abstractions reliably. Given these limitations, society would be ill-advised to rely on theories, predictions, and explanations proposed by science-AI. Of course, and this is worth noting, end-to-end science-AI is a high bar. The view presented above is entirely compatible with predicting that AI systems will be used to automate or augment many aspects of science-making, even if relatively few places remain where humans need to “patch” the process.

Having elaborated on the case against the possibility of science-AI, I now move to investigating two plausible lines of reasoning aiming to defeat the suggested conclusion.

Part 2: Universal function approximation

The first argument that I will discuss against the purported limitations of ML builds on the claim that ML systems are best understood as universal function approximators (UFA). From this follows the conjecture that there must exist a certain level of computational power at which ML systems are able to sufficiently approximate the science-making function. 

In short, UFA refers to the property of neural networks that, for a very broad class of functions f(x) (e.g., continuous functions on compact domains), there exists a neural network that can approximate said function to arbitrary precision. There exist mathematical theorems proving versions of this property for different cases, e.g., for neural networks of arbitrary width (i.e., an arbitrary number of neurons) or arbitrary depth (i.e., an arbitrary number of layers), as well as in bounded cases (e.g., Hornik, Stinchcombe, White, 1989; Gripenberg, 2003). 
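
To make the claim concrete, here is one classical formulation of the arbitrary-width case, stated informally and roughly in the spirit of Hornik, Stinchcombe and White (1989); the exact hypotheses vary across the different theorems mentioned above.

```latex
% Informal statement of an arbitrary-width universal approximation theorem
% (in the spirit of Hornik, Stinchcombe & White, 1989); hypotheses vary by version.
Let $\sigma$ be a non-constant, bounded, continuous activation function and let
$K \subset \mathbb{R}^{n}$ be compact. Then for every continuous $f : K \to \mathbb{R}$
and every $\varepsilon > 0$ there exist $N \in \mathbb{N}$ and parameters
$\alpha_i, b_i \in \mathbb{R}$, $w_i \in \mathbb{R}^{n}$ such that
\[
  \sup_{x \in K}\; \Bigl|\, f(x) - \sum_{i=1}^{N} \alpha_i\, \sigma\bigl(w_i^{\top} x + b_i\bigr) \Bigr| < \varepsilon .
\]
```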

Let’s say we accept that ML systems are accurately understood as UFAs, and that, on those grounds, they are able, in principle, to implement the functional logic of science-making. However, this picture raises an important question: (when) is approximation enough?

There is, after all, a difference between “the thing, precisely” and “the thing, approximately”. Or is there? Imagine you found a model M1 which approximates function F with an error of ε1. And imagine that the approximation is insufficient—that ε1 is too large for M1 to properly fulfill the function of F. In that case, on the grounds of the universal approximation theorem, there exists another model M2 with ε2 < ε1. If ε2 is still too big, one can try M3, and so on. As such, you can, in principle, get arbitrarily close to “the thing”; in other words, the difference between “the thing” and its approximation gets arbitrarily small in the limit. 

One might still object to this conceptual argument with a practical worry. It may be prohibitively expensive (in terms of energy, model size/chips, or time) to get arbitrarily close to the “true” function of science-making. However, I suggest we have pragmatic reasons not to be too worried by this concern. After all, we can hardly expect that human scientists always pick out the right abstractions when constructing their theories. What is more, most feats of engineering rely on theories that we know use abstractions that aren’t completely true, and yet have been shown to be “sufficiently” true (in a pragmatist sense) in that they produce useful epistemic products (including bridges that don’t collapse and airplanes that stay in the air). For example, the framework of classical physics was, in some sense, proven wrong by Einstein’s theories of relativity. And yet, most engineering programs are entirely happy to work within the classical framework. As such, even if ML systems “only” approximate the function of science-making, we have every reason to expect that they are capable of finding approximations sufficient, for all practical purposes, for science-making. 

Finally, science-AI need not take the form of a monolithic structure consisting of a single ML model and its learned behavior policy. Instead, we can imagine a science-AI assembly system which, for example, trains “abstraction forming” and “deductive reasoning” circuits separately, which are later combined to interface with each other autonomously. This idea of a compositional science-AI resembles the vision of a Society of Mind sketched by Marvin Minsky (1986), who argues that human intelligence emerges from the interactions of many simple “agents” with narrow skills or functions. Moreover, we can even use ML to discover which forms of compositionality (i.e., “task division”) might be best suited for a science-AI assembly, insofar as my earlier, admittedly vague suggestion of integrating an “abstraction forming” and a “deductive reasoning” circuit might not be the ideal solution. There already exist examples of current-day ML systems trained on similar ideas (e.g., Gururangan et al., 2023); a toy sketch of the general idea follows below. 
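
The following is a purely hypothetical sketch of the assembly idea; the class names and the division into an abstraction-forming and a deduction-checking module are my own illustration, not an existing system or the architecture of the cited work.

```python
# Hypothetical toy sketch of a "science-AI assembly": one component proposes a
# candidate abstraction (here, a simple law fit from data), another deductively
# derives its consequences and checks them against held-out observations.
import numpy as np

class AbstractionFormer:
    """Proposes a candidate abstraction: here, a linear law y = a*x + b."""
    def propose(self, x, y):
        a, b = np.polyfit(x, y, deg=1)
        return {"law": "y = a*x + b", "a": a, "b": b}

class DeductiveChecker:
    """Derives predictions from the proposed law and tests them on new data."""
    def falsified(self, hypothesis, x_new, y_new, tol=0.1):
        y_pred = hypothesis["a"] * x_new + hypothesis["b"]
        return bool(np.max(np.abs(y_pred - y_new)) > tol)

# Toy "experiment": data generated by y = 2x + 1 with small noise
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 50)
y = 2 * x + 1 + rng.normal(0, 0.01, 50)

hypothesis = AbstractionFormer().propose(x[:40], y[:40])
rejected = DeductiveChecker().falsified(hypothesis, x[40:], y[40:])
print(hypothesis, "rejected:", rejected)
```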

To summarize, I have argued that UFA theorems prove that AI systems—contra the skeptical picture laid out in Part 1—are in principle able to implement science-making. I further provided arguments for why we can expect this technology to not only be conceptually feasible but also practically plausible. 

Part 3: The possibility of science-making despite limitations 

Let us now turn to the second argument against the skeptical picture proposed in Part 1. This argument starts by conceding that ML systems face relevant limitations in their ability to form and reliably use abstractions. However, the argument continues, so do humans (including human scientists), and still, they are capable of doing science (arguably). Thus, the argument about the inductive limits of ML systems cannot, on its own, defeat the possibility of science-AI. 

To unravel this argument, let us first discuss the claim that both ML and human “reasoners” are limited, and limited in relevantly similar ways. I have already laid out the case for limitations in ML which arise from the fundamentally continuous and inferential nature of ML. According to our current best theories of human cognition—such as the Bayesian Brain Hypothesis (e.g., Deneve, 2004; Doya et al., 2007; Knill, Pouget, 2004), Predictive Processing (e.g., Clark, 2013; Clark, 2015; Kanai et al., 2015), and, most recently, Active Inference (Parr, Pezzulo, Friston, 2022)—the brain can essentially be understood as a “large inference machine”. As such, the low-level implementation of human reasoning is understood to be similarly continuous and inferential. 

This is, of course, not to deny that humans exhibit higher-level cognitive skills, such as verbal reasoning or metacognition, which are correctly understood to exceed “mere statistics”. Rather, the point I am trying to make is that these higher-level capabilities emerge from the low-level (continuous and inferential) implementation of the neural make-up of the brain. This serves as an existence proof that this sort of low-level implementation can, under certain circumstances, give rise to capabilities more typically associated with “Type 2” reasoning (Kahneman, 2017). As such, the argument presented in Part 1—that, given the functional logic of modern-day ML, AI will not be able to implement all necessary aspects of scientific reasoning (such as generalization or deductive reasoning)—does not prove what it was meant to prove (the impossibility of science-AI). 

Furthermore, this also shows that a cognitive process need not be flawless in order to implement science-making. Human reasoning is, of course, not without flaws. For example, human scientists regularly pick “wrong” abstractions (e.g., “phlogiston” or “ether”—to name only two famous cases from the history of science). And human scientists are not immune to motivated reasoning and cognitive biases such as confirmation bias or hypothesis myopia (Nuzzo, 2015). The point is that these flaws in human reasoning—whether they stem from structural limitations or mere computational boundedness—have not prevented humans from developing and conducting science successfully. 

This last point raises an interesting question about the nature of science-making. Given the plentiful sources of bounded, flawed, and motivated reasoning displayed by human scientists, how are they still capable of producing scientific progress? One way to make sense of this (plausibly surprising) observation is to understand science as an essentially collective endeavor. In other words, individual scientists don’t do science; scientific communities do. The idea is that science-making—a process that systematically skews towards the truth—emerges from implementing a collective “protocol” which, so to speak, “washes out” the biased reasoning present at the level of individual scientists. Bringing this back to the question of science-AI, this raises the question of whether we should think of science-AI as a single system approximating ideal scientific reasoning, or as a system assembly where each individual system can have flaws in its epistemic processes, but where the way they interact produces behavior equivalent to science-making—just as is the case for human scientists interacting today. 
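
The “washing out” idea can be illustrated with a toy simulation of my own, under one strong and explicitly labeled assumption: that individual biases are idiosyncratic, i.e., roughly zero-mean across the community, rather than shared. If biases were shared, aggregation alone would not remove them.

```python
# Toy model of "washing out": aggregating many biased, noisy individual
# estimates converges towards the truth *if* the biases are idiosyncratic
# (zero-mean across the community) -- an assumption of this illustration.
import numpy as np

rng = np.random.default_rng(42)
true_value = 1.0

def community_estimate(n_scientists: int) -> float:
    biases = rng.normal(0, 0.5, n_scientists)  # idiosyncratic systematic biases (assumption)
    noise = rng.normal(0, 0.2, n_scientists)   # measurement noise
    individual_estimates = true_value + biases + noise
    return float(np.mean(individual_estimates))

for n in (1, 10, 100, 1000):
    err = abs(community_estimate(n) - true_value)
    print(f"{n:5d} scientists -> absolute error {err:.3f}")
```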

To summarize, the argument presented here is two-fold: on one hand, the human reasoning ability is implemented by a continuous and inferential low-level process, serving as an existence proof that such processes (which we also find in machine learning) are in principle able to implement discrete tasks with adequate levels of robustness. On the other hand, science-making is implemented by fallible human reasoners who make mistakes similar in type to the ones discussed in Part 1 (e.g., picking leaky abstractions or misgeneralizing them), serving as an existence proof that processes which are fallible in this way can still implement science-making. 

Conclusion

In this essay, I explored the conceptual possibility of end-to-end science-AI, i.e., an AI system or assembly of systems which is able to functionally implement science-making with no help from humans (post-training). In Part 1, I made the case that end-to-end science-AI is not possible, on the basis of limitations of ML systems when it comes to their ability to form useful abstractions and to use these abstractions reliably. I argued that ML, given that it is based on interpolative learning from a given set of (training) data, faces important challenges in terms of its ability to generalize outside of its training data in the case of known or unknown distributional shifts upon deployment. Furthermore, I invoked the fact that ML systems are currently unreliable (or at the very least inefficient) at “discrete” types of reasoning. After developing this skeptical picture, I explored two sets of arguments which seek to recover the possibility of science-AI. 

First, I argued that ML systems are universal function approximators, and that, in that capacity, there must exist a computational threshold at which they are able to implement the functional logic of science. Furthermore, I argued that there are pragmatic reasons to accept that this is not only conceptually possible but practically feasible insofar as approximation is enough, as evidenced by the fact that successful scientific and engineering feats, as a rule, rely “merely” on approximate truths. 

Second, I compared ML systems to human scientists, claiming that, on one hand, the neurological implementation of human reasoning is structurally similar to ML, which suggests that ML methods can be expected to scale to “higher-level” reasoning capabilities (including ones that appear particularly critical to science-making). On the other hand, the comparison also reveals how humans are capable of doing science despite the fact that the reasoning of individual humans is flawed in important ways. As such, some amount of brittleness in ML systems does not mean that they cannot successfully implement the scientific process. Taken together, the arguments discussed in Parts 2 and 3 succeed in defending the possibility of science-AI against the skeptical view laid out in Part 1. Beyond defending the conceptual possibility claim, the arguments also provide some support for the concrete, practical plausibility of science-AI. 

Let us conclude with one more evocative thought based on the analogy between ML and scientific reasoning explored over the course of this essay. Concerns about the generalization limits of ML systems pose an important problem: we need to be able to trust the systems we’re using, or—rather—we want to know when and how much we are justified in trusting these systems. Epistemic justification—which I am taking, for current purposes, to be a function of the reliability of a given epistemic process—is always defined relative to a given domain of application. This suggests that we want AI systems (among other things) to carry meta-data about their domain of applicability (i.e., the domain within which their generalization guarantees hold). What I want to suggest here is that the same insight also applies to scientific theories: we should more consistently strive to develop scientific theories which are—as an integral part of what it is to be a scientific theory—transparent about their domain of applicability, i.e., the domain relative to which the theory does or does not claim its predictions will generalize.  
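
As a purely hypothetical illustration of what such meta-data could look like in practice (the names and structure below are my own, not an existing API), one might package a model, or analogously a theory, together with an explicit declaration of its domain of applicability and refuse to answer silently outside it.

```python
# Hypothetical sketch: a predictive model (or theory) bundled with explicit
# metadata about its domain of applicability, outside of which no
# generalization guarantee is claimed.
from dataclasses import dataclass
from typing import Callable

@dataclass
class DomainOfApplicability:
    description: str
    lower: float
    upper: float

    def contains(self, x: float) -> bool:
        return self.lower <= x <= self.upper

@dataclass
class ScopedModel:
    predict_fn: Callable[[float], float]
    domain: DomainOfApplicability

    def predict(self, x: float) -> float:
        if not self.domain.contains(x):
            raise ValueError(
                f"Input {x} lies outside the declared domain "
                f"({self.domain.description}); no generalization guarantee."
            )
        return self.predict_fn(x)

# Example: classical (non-relativistic) kinetic energy of a 1 kg mass,
# scoped to speeds far below the speed of light.
kinetic_energy = ScopedModel(
    predict_fn=lambda v: 0.5 * 1.0 * v**2,
    domain=DomainOfApplicability("speeds far below c (m/s)", 0.0, 3e6),
)
print(kinetic_energy.predict(100.0))   # within the declared domain
# kinetic_energy.predict(2.9e8)        # would raise: outside domain of applicability
```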

References 

Battaglia, P. W., et al. (2018). Relational inductive biases, deep learning, and graph networks. arXiv preprint arXiv:1806.01261.

Bender, E. M., et al. (2021). On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (pp. 610-623).

Millidge, B., et al. (2021). Predictive coding: a theoretical and experimental review. arXiv preprint arXiv:2107.12979.

Bostrom, N. (2019). The vulnerable world hypothesis. Global Policy, 10(4), 455-476.

Chang, H. (2022). Realism for Realistic People: A New Pragmatist Philosophy of Science. Cambridge University Press.

Chollet, F. (2017). The limitations of deep learning. Deep learning with Python. Retrieved from: https://blog.keras.io/the-limitations-of-deep-learning.html 

Chollet, F. (2019). On the measure of intelligence. arXiv preprint arXiv:1911.01547.

Chollet, F. (2020). Why abstraction is the key to intelligence, and what we’re still missing. Talk at NeurIPS 2020. Retrieved from: https://slideslive.com/38935790/abstraction-reasoning-in-ai-systems-modern-perspectives 

Chomsky, N. (1965). Aspects of the Theory of Syntax. MIT Press.

Clark, A. (2013). Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behavioral and Brain Sciences, 36(3), 181-204. doi:10.1017/S0140525X12000477.

Clark, A. (2015). Surfing Uncertainty: Prediction, Action, and the Embodied Mind. Oxford Academic.

Cremer, C. (2021). Deep limitations? Examining expert disagreement over deep learning. Progress in Artificial Intelligence, 10. https://doi.org/10.1007/s13748-021-00239-1. 

Cartuyvels, R., Spinks, G., & Moens, M. F. (2021). Discrete and continuous representations and processing in deep learning: Looking forward. AI Open, 2, 143-159.

Deneve, S. (2004). Bayesian inference in spiking neurons. Advances in neural information processing systems, 17.

De Regt, H. W. (2017). Understanding Scientific Understanding. New York: Oxford University Press.

Doya, K., Ishii, S., Pouget, A., & Rao, R. P. (Eds.). (2007). Bayesian brain: Probabilistic approaches to neural coding. MIT press.

Gripenberg, G. (2003). Approximation by neural networks with a bounded number of nodes at each level. Journal of Approximation Theory. 122 (2): 260–266. 

Gururangan, S., et al. (2023). Scaling Expert Language Models with Unsupervised Domain Discovery. arXiv preprint arXiv:2303.14177.

Hempel, C. G. (1945). Studies in the Logic of Confirmation (II.). Mind, 54(214), 97–121. 

Hendrycks, D., et al. (2020). The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 8340-8349).

Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer feedforward networks are universal approximators. Neural networks, 2(5), 359-366.

Humboldt, W. (1999/1836). On Language: On the diversity of human language construction and its influence on the mental development of the human species. Cambridge University Press.

Ji, Z., et al. (2023). Survey of hallucination in natural language generation. ACM Computing Surveys, 55(12), 1-38.

Kahneman, D. (2017). Thinking, Fast and Slow.

Kanai, R., et al. (2015). Cerebral hierarchies: predictive processing, precision and the pulvinar. Philosophical Transactions of the Royal Society B, 370, 20140169.

Knill, D. C., & Pouget, A. (2004). The Bayesian Brain: The Role of Uncertainty in Neural Coding and Computation. TRENDS in Neurosciences, 27(12), 712–719.

Marcus, G. (2018). Deep learning: A critical appraisal. arXiv preprint arXiv:1801.00631.

Mitchell, M. (2021). Abstraction and analogy‐making in artificial intelligence. Annals of the New York Academy of Sciences, 1505(1), 79-101.

Nuzzo, R. (2015). How Scientists Fool Themselves — and How They Can Stop. Nature. 526, 182. https://doi.org/10.1038/526182a. 

Parr, T., Pezzulo, G., & Friston, K. J. (2022). Active Inference: The Free Energy Principle in Mind, Brain, and Behavior. MIT Press.

Peters, U., et al. (2022). Generalization Bias in Science. Cognitive Science, 46: e13188. 

Popper, K. (1934). The Logic of Scientific Discovery. London, England: Routledge.

Popper, K. (1962). Conjectures and Refutations: The Growth of Scientific Knowledge. London, England: Routledge.

Seger, E., et al. (2020). Tackling threats to informed decision-making in democratic societies: promoting epistemic security in a technologically-advanced world. The Alan Turing Institute. 

Shanahan, M., & Mitchell, M. (2022). Abstraction for deep reinforcement learning. arXiv preprint arXiv:2202.05839.

Sprenger, J. (2011). Hypothetico-Deductive Confirmation. Philosophy Compass, 6: 497-508. 

Sprenger, J., & Hartmann, S. (2019). Bayesian Philosophy of Science: Variations on a Theme by the Reverend Thomas Bayes. Oxford and New York: Oxford University Press.

Trask, A., et al. (2018). Neural Arithmetic Logic Units. Advances in neural information processing systems, 31.

Valiant, L. G. (1984). A theory of the learnable. Communications of the ACM, 27(11), 1134-1142.

Towards a Philosophy of Science of the Artificial

One common way to start an essay on the philosophy of science is to ask: “Should we be scientific realists?” While this isn’t precisely the question I’m interested in here, it is the entry point of this essay. So bear with me.

Scientific realism, in short, is the view that scientific theories are (approximately) true. Different philosophers have proposed different interpretations of “approximately true”, e.g., as meaning that scientific theories “aim to give us literally true stories about what the world is like” (Van Fraassen, 1980, p. 9), or that “what makes them true or false is something external—that is to say, it is not (in general) our sense data, actual or potential, or the structure of our minds, or our language, etc.” (Hilary Putnam, 1975, p. 69f), or that their terms refer (e.g. Boyd, 1983), or that they correspond to reality (e.g. Fine, 1986). 

Much has been written about whether or not we should be scientific realists. A lot of this discussion has focused on the history of science as a source of empirical support for or against the conjecture of scientific realism. For example, one of the most commonly raised arguments in support of scientific realism is the so-called no miracles argument (Boyd, 1989): what can better explain the striking success of the scientific enterprise than that scientific theories are (approximately) true—that they are “latching on” to reality in some way or form. Conversely, an influential argument against scientific realism is the argument from pessimistic meta-induction (e.g., Laudan, 1981), which suggests that, given that most past scientific theories have turned out to be false, we should expect our current theories to eventually be proven false as well (as opposed to approximately true). 

In this essay, I consider a different angle on the discussion. Instead of discussing whether scientific realism can adequately explain what we empirically understand about the history of science, I ask whether scientific realism provides us with a satisfying account of the nature and functioning of the future-oriented scientific enterprise—or, what Herbert Simon (1996) called the sciences of the artificial. What I mean by this, in short, is that an epistemology of science needs to be able to account for the fact that the scientist's theorising affects what will come into existence, as well as for how this happens. 

I proceed as follows. In part 1, I explain what I mean by the sciences of the artificial, and motivate the premise of this essay—namely that we should aspire for our epistemology of science to provide an adequate account not only of the inquiry into the “natural”, but also into the nature and coming-into-being of the “artificial”. In part 2, I analyse whether or not scientific realism provides such an account. I conclude that its application to the sciences of the artificial exposes scientific realism not as false, but as insufficiently expressive. In part 3, I briefly sketch how an alternative to scientific realism—pragmatism—might present a more satisfactory account. I conclude with a summary of the arguments raised and the key takeaways.

Part 1: The need for a philosophy of science of the artificial

The Sciences of the Artificial—a term coined by Herbert Simon in the eponymous book—refer to domains of scientific enterprise that deal not only with what is, but also with what might be. Examples include the many domains of engineering, medicine, or architecture, but also fields like psychology, economics, or administration. What characterises all of these domains is their descriptive-normative dual nature. The study of what is is both informed and mediated by some normative ideal(s). Medicine wants to understand the functioning of the body in order to bring about, and relative to, a body’s healthy functioning; civil engineering studies materials and applied mechanics in order to build functional and safe infrastructure. In either case, at the end of the day, the central subject of their study is not a matter of what is—it does not exist (yet); instead, it is the goal of their study to bring it into existence. In Simon’s words, the sciences of the artificial are “concerned not with the necessary but with the contingent—not with how things are but with how they might be—in short, with design” (p. xii).

Some might doubt that a veritable science of the artificial exists. After all, is science not quintessentially concerned with understanding what is—the laws and regularities of the natural world? However, Simon provides what I think is a convincing case that not only is there a valid and coherent notion of a “science of the artificial”, but also that one of its most interesting dimensions is precisely its epistemology. In the preface to the second edition of the book, he writes: “The contingency of artificial phenomena has always created doubts as to whether they fall properly within the compass of science. Sometimes these doubts refer to the goal-directed character of artificial systems and the consequent difficulty of disentangling prescription from description. This seems to me not to be the real difficulty. The genuine problem is to show how empirical propositions can be made at all about systems that, given different circumstances, might be quite other than they are.” (p. xi).

As such, one of the things we want from our epistemology of science—a theory of the nature of scientific knowledge and the functioning of scientific inquiry—is an adequate treatment of the sciences of the artificial. It ought, for example, to allow us to think not only about what is true but also about what might be true, and about how our own theorising affects what comes into being (i.e., what comes to be true). By means of metaphor: insofar as the natural sciences are typically concerned with the making of “maps”, the sciences of the artificial are interested in what goes into the making of “blueprints”. In the philosophy of science, we then get to ask: What is the relationship between maps and blueprints? On one hand, the quality of our maps (i.e., our scientific understanding) shapes what blueprints we are able to draw (i.e., the things we are able to build). At the same time, our blueprints also end up affecting our maps. As Simon puts it: “The world we live in today is much more a man-made, or artificial, world than it is a natural world. Almost every element in our environment shows evidence of human artifice.” (p. 2).

One important aspect of the domain of the artificial is that there is usually more than one low-level implementation (and respective theoretical-technological paradigm) through which a desired function can be achieved. For example, we have found several ways to travel long distances (e.g., by bicycle, by car, by train, by plane). What is more, there exist several different types of trains (e.g., coal-powered, steam-powered, or electric trains; high-speed trains using specialised rolling stock to reduce friction, or so-called maglev trains which use magnetic levitation). Most of the time, we need not concern ourselves with the low-level implementation because of this functional equivalence. Precisely because they are designed artefacts, we can expect them to display largely similar high-level functional properties. If I want to travel from London to Paris, I generally don't have much reason to care what specific type of train I end up finding myself in. 

However, differences in their respective low-level implementations can start to matter under the ‘right’ circumstances, i.e., given relevant variation in external environments. Simon provides us with useful language to talk about this. He writes: “An artifact can be thought of as a meeting point[—]an ‘interface’ in today's terms[—]between an ‘inner’ environment, [i.e.,] the substance and organization of the artifact itself, and an ‘outer’ environment, [i.e.,] the surroundings in which it operates. If the inner environment is appropriate to the outer environment, or vice versa, the artifact will serve its intended purpose.” (p. 6). As such, the (proper) functioning of the artefact is cast in terms of the relationship between the inner environment (i.e., the artefact’s structure or character, its implementation details) and the outer environment (i.e., the conditions in which the artefact operates). He further provides a simple example to clarify the point: “A bridge, under its usual conditions of service, behaves simply as a relatively smooth level surface on which vehicles can move. Only when it has been overloaded do we learn the physical properties of the materials from which it is built.” (p. 13). 

The point is that two artefacts that have been designed with the same purpose in mind, and which in a wide range of environments behave equivalently, will start to show different behaviours if we enter environments to which their design hasn’t been fully adapted. Now their low-level implementation (or ‘inner environment’, in Simon’s terminology) starts to matter. 

To further illustrate the relevance of this point to the current discussion, let us consider the field of artificial intelligence (AI)—surely a prime example of a science of the artificial. The aspiration of the field of AI is to find ways to instantiate advanced intelligent behaviour in artificial substrates. As such, it can be understood in terms of its dual nature: it aims both to (descriptively) understand what intelligent behaviour is and how it functions, and to (normatively) implement it. The dominant technological paradigm for building artificial general intelligence (AGI) at the moment is machine learning. However, nothing in principle precludes that other paradigms could (or will) be used for implementing AGI (e.g., some variation on symbolic AI, some cybernetic paradigm, soft robotics, or any number of paradigms that haven’t been discovered yet). 

Furthermore, different implementation paradigms for AGI imply different safety- or governance-relevant properties. Imagine, for example, an AGI built in the form of a “singleton” (i.e., a single, monolithic system) compared to one built as a multi-agent, distributed system assembly. A singleton AGI, for example, seems more likely to lack interpretability (i.e., to behave in ways and for reasons that, by default, remain largely obscure to humans), while a distributed system might be more likely to fall prey to game-theoretic pitfalls such as collusion (e.g., Christiano, 2015). It is not at this point properly understood what the different implications of the different paradigms are, but the specifics need not matter for the argument I am trying to make here. The point is that, if the goal is to make sure that future AI systems will be safe and used to the benefit of humanity, it may matter a great deal which of these paradigms is adopted, and we need to understand what different paradigms imply for considerations of safety and governability. 

As such, the problem of paradigm choice—choices over which implementation roadmap to adopt and which theories to use to inform said roadmap—comes into focus. As philosophers of science, we must ask: What determines paradigm choice? As well as: How, if at all, can a scientist or scientific community navigate questions of paradigm choice “from within” the history of science?

This is where our discussion of the appropriate epistemology for the sciences of the artificial properly begins. Next, let us evaluate whether we can find satisfying answers in scientific realism.  

Part 2: Scientific realism of the artificial

Faced with the question of paradigm choice, one answer that a scientific realist might give is that what determines the right paradigm choice comes down entirely to how the world is. In other words, what the AI researcher does when trying to figure out how to build AGI is equivalent to uncovering the truth about what AGI, fundamentally, is. We can, of course, at a given point in time be uncertain about the ‘true nature’ of AGI, and thus be exploring different paradigms; but eventually, we will discover which of those paradigms turns out to be the correct one. In other words, the notion of paradigm choice is replaced with the notion of paradigm change: the aspiration of building AGI is rendered equivalent to the question of what AGI, fundamentally, is. 

As I will argue in what follows, I consider this answer to be unsatisfying in that it denies the very premise of the sciences of the artificial discussed in the earlier section. Consider the following arguments. 

First, the answer given by the scientific realist seems to be fundamentally confused about the type-signature of the concept of “AGI”. AGI, in the sense I have proposed here, is best understood as a functional description—a design requirement or aspiration. As discussed earlier, it is entirely plausible that there exist several micro-level implementations which are functionally equivalent in that they display generally intelligent behaviour. As such, by treating “the aspiration of building AGI [as equivalent] to the question of what AGI is”, the scientific realist has implicitly moved away from—and thus failed to properly engage with—the premise of the question. 

Second, note that there are different vantage points from where we could be asking the question. We could take the vantage point of a “forecaster” and ask what artefacts we should expect to exist 100 years from now. Or, we could take the vantage point of a “designer” and ask which artefacts we want to create (or ought to create, given some set of moral, political, aesthetic, or other commitments). While it naturally assumes the vantage point of the forecaster, scientific realism appears inadequate for taking seriously the vantage point of the designer. 

Third, let’s start with the assumption that what will come into existence is a matter of fact. While plausible-sounding at first, further inspection reveals problems with this claim. To show how this is the case, let us consider the following tri-partite characterisation proposed by Eric Drexler (2018). We want to distinguish between three notions of “possible”, namely: (physically) realistic, (techno-economically) plausible, and (socio-politically) credible. This is to say, beyond such facts as the fundamental laws of physics (i.e., the physically realistic), there are other factors—less totalising and yet endowed with some degree of causal force—which shape what comes to be in the future (e.g., economic and sociological pressures). 

Importantly, the physically realistic does not on its own determine what sorts of artefacts come into existence. For example, paradigm A (by mere chance, or for reasons of historical contingency) receives differential economic investment compared to paradigm B, resulting in its faster maturation; or inversely, it might get restrained or banned through political means, resulting in it being blocked, and maybe even eventually forgotten. Examples of political decisions (e.g. regulation, subvention, taxation, etc.) affecting technological trajectories abound. To name just one, consider how the ban on human cloning has, in fact, stopped human cloning activities, as well as any innovations related to making human cloning ‘better’ (cheaper, more convenient, etc.) in some way.

The scientific realist might react to this by arguing that, while the physically realistic is not the only factor that determines what sorts of artefacts come into existence, there is still a matter of fact to the nature and force of the economic, political, and social factors affecting technological trajectories, all of which could, at least in principle, be understood scientifically. While I am happy to grant this view, I argue that the problem lies elsewhere. As we have seen, the interactions between the physically realistic, the techno-economically plausible, and the socio-politically credible are highly complex and, importantly, self-referential. It is exactly this self-referentiality that makes this a case of paradigm choice, rather than paradigm change, when viewed from the position of the scientist. In other words, an adequate answer to the problem of paradigm choice must necessarily consider a “view from inside of the history of science”, as opposed to a “view from nowhere”. After all, the paradigm is being chosen by the scientific community (and the researchers making up that community), and they are making said choice from their own situated perspective.

In summary, it is not so much that the answers provided by the scientific realist are outright wrong. Rather, the view provided by scientific realism is not expressive enough to deal with the realities of the sciences of the artificial. It cannot usefully guide the scientific enterprise when it comes to the considerations brought to light by the sciences of the artificial. Philosophy of science needs to do better if it wants to avoid confirming the accusation raised by Richard Feynman—that philosophers of science are to scientists what ornithologists are to birds; namely, irrelevant. 

Next, we will consider whether a different epistemological framework, one that holds onto as much realism as possible, appears more adequate to the needs of the sciences of the artificial. 

Part 3: A pragmatic account of the artificial

So far, we have introduced the notion of the science of the artificial, discussed what it demands from the philosophy of science, and observed how scientific realism fails to appropriately respond to those demands. The question is then: Can we do better? 

An alternative account to scientific realism—and the one we will consider in this last section—is pragmatic realism, chiefly originating with the American pragmatists William James, Charles Sanders Peirce, and John Dewey. For the present discussion, I will largely draw on contemporary work trying to revive a pragmatic philosophy of science that is truly able to guide and support scientific inquiry, such as that of Roberto Torretti, Hasok Chang, and Rein Vihalemm. 

Such a pragmatist philosophy of science emphasises scientific research as a practical activity, and the role of an epistemology of science as helping to successfully conduct this activity. While sharing with the scientific realist a commitment to an external reality, pragmatism suggests that our ways of getting to know the world are necessarily mediated by the ways knowledge is created and used, i.e., by our epistemic aims and means of “perception”—both the mind and scientific tools, as well as our scientific paradigms.

Note that pragmatism, as I have presented it here, does at no point do away with the notion of an external reality. As Giere (2006) clarifies, not all types of realism must subscribe to a “full-blown objective realism” (or what Putnam called “metaphysical realism”)—roughly speaking, the view that “[t]here is exactly one true and complete description of ‘the way the world is.’” (Putnam, 1981, p. 49). As such, pragmatic realism, while rejecting objective or metaphysical realism, remains squarely committed to realism, and understands scientific inquiry as an activity directed at better understanding reality (Chang, 2022, p. 5; p. 208).  

Let us now consider whether pragmatism is better able than scientific realism to deal with the epistemological demands of the sciences of the artificial. Rather than providing a full-fledged account of how the sciences of the artificial can be theorised within the framework of pragmatic realism, what I set out to do here is more humble in its ambition. Namely, I aim to support my claim that scientific realism is insufficiently expressive as an epistemology of the sciences of the artificial by showcasing that there exist alternative frameworks—in this case pragmatic realism—that do not face the same limitations. In other words, I aim to show that, indeed, we can do better. 

First, as we have seen, scientific realism fails to adopt the “viewpoint of the scientist”. As a result, it collapses the question of paradigm choice into a question of paradigm change. This makes scientific realism incapable of addressing the (very real) challenge faced by the scientist; after all, as I have argued, different paradigms might come with different properties we care about (such as when they concern questions of safety or governance). In contrast to scientific realism, pragmatism explicitly rejects the idea that scientific inquiry can ever adopt a “view from nowhere” (or a “God’s eye view”, as Putnam (1981, p. 49) puts it). Chang (2019) emphasises the “humanistic impulse” in pragmatism: “Humanism in relation to science is a commitment to understand and promote science as something that human agents do, not as a body of knowledge that comes from accessing information about nature that exists completely apart from ourselves and our investigations.” (p. 10). This aligns well with the need of the sciences of the artificial to be able to reason from the point of view of the scientist.

Second, pragmatism, in virtue of focusing on the means through which scientific knowledge is created, recognises the historicity of scientific activity (see also, e.g., Vihalemm, p. 3; Chang, 2019). This allows pragmatic realism to reflect the historicity that is also present in the sciences of the artificial. Recall that, as we discussed earlier, one central epistemological question of the sciences of the artificial concerns how our theorising affects what comes into existence. Our prior beliefs, scientific frameworks, and tools affect, by means of ‘differential investment’ in designing artefacts under a given paradigm, what sort of reality comes to be. Moreover, the nature of technological progress itself affects what we become able to understand, discover, and build in the future. Pragmatism suggests that, rather than there being a predetermined answer as to which paradigm will prove most successful, the scientist must understand their own scientific activity as part of an iterative and path-dependent epistemic process.

Lastly, consider how the sciences of the artificial entail a ‘strange inversion’ of ‘functional’ and ‘mechanistic’ explanations. In the domain of the natural, the ‘function’ of a system is understood as a viable post-hoc description of the system, resulting from its continuous adaptation to the environment by external pressures. In design, by contrast, the ‘function’ of an artefact becomes that which is antecedent, while the internal environment of the artefact, its low-level implementation, becomes post-hoc. It appears difficult, through the eyes of a scientific realist, to fully accept this inversion. At the same time, accepting it appears to be useful, if not required, in order to engage with the epistemology of the sciences of the artificial on its own terms. Pragmatic realism, on the other hand, does not face the same trouble. To exemplify this, let us take Chang’s notion of operational coherence, a deeply pragmatist yardstick of scientific inquiry, which he describes as “a harmonious fitting-together of actions that is conducive to a successful achievement of one’s aims” (Chang, 2019, p. 14). Insofar as we are able to argue that a given practice in the sciences of the artificial possesses such operational coherence, it is compatible with pragmatic realism. What I have tried to show hereby is that the sciences of the artificial, including the ‘strange inversion’ of the role of ‘functions’ which they entail, are fully theorisable inside the framework of pragmatic realism. As such, unlike scientific realism, the latter does not fail to engage with the sciences of the artificial on their own terms.

To summarise this section, I have argued, by means of three examples, that pragmatic realism is a promising candidate for a philosophy of science within which it is possible to theorise the sciences of the artificial. In this, pragmatic realism differs from scientific realism. In particular, I have invoked the fact that the sciences of the artificial require us to take the “point of view of the scientist”, to acknowledge the iterative, path-dependent and self-referential nature of scientific inquiry (i.e., its historicity), and, finally, to accept the central role of ‘function’ in understanding designed artefacts.

Conclusion

In section 1, I have laid out the case for why we need a philosophy of science that can encompass questions arising from the sciences of the artificial. One central such question is the problem of paradigm choice, which requires the scientific practitioner to understand the ways in which their own theorising affects what will come into existence.

In section 2, I have considered whether scientific realism provides a sufficient account, and concluded that it doesn’t. I have listed three examples of ways in which scientific realism seems to be insufficiently expressive as an epistemology of the sciences of the artificial. Finally, in section 3, I explored whether we can do better, and have provided three examples of epistemic puzzles, arising from the sciences of the artificial, that pragmatic realism, in contrast with scientific realism, is able to account for. 

While scientific realism seems attractive on the basis of its explaining the success of science (of the natural), it does not in fact offer a good explanation of the success of the sciences of the artificial. How, before things like planes, computers, or democratic institutions existed, could we have learnt to build them if all that was involved in the scientific enterprise was uncovering that which (already) is? As such, I claim that the sciences of the artificial provide an important reason why we should not be satisfied with the epistemological framework provided by scientific realism when it comes to understanding and, importantly, guiding scientific inquiry.


References

Boyd, R. N. (1983). On the current status of the issue of scientific realism. Methodology, Epistemology, and Philosophy of Science: Essays in Honour of Wolfgang Stegmüller on the Occasion of His 60th Birthday, June 3rd, 1983, 45-90.

Chang, H. (2019). Pragmatism, perspectivism, and the historicity of science. In Understanding perspectivism. pp. 10-27. Routledge.

Chang, H. (2022). Realism for Realistic People. Cambridge University Press.

Christiano, P. (2015). On heterogeneous objectives. AI Alignment (medium.com). Retrieved from. https://ai-alignment.com/on-heterogeneous-objectives-b38d0e003399.

Drexler, E. (2018). Paretotopian goal alignment, Talk at EA Global: London 2018. 

Drexler, E. (2019). Reframing Superintelligence. Future of Humanity Institute.

Fine, A. (1986). Unnatural Attitudes: Realist and Antirealist Attachments to Science. Mind, 95(378): 149–177. 

Fu, W., Qian Q. (2023). Artificial Intelligence and Dual Contract. arXiv preprint arXiv:2303.12350.

Laudan, L. (1981). A confutation of convergent realism. Philosophy of science. 48(1), 19-49.

Normile, D. (2018). CRISPR bombshell: Chinese researcher claims to have created gene-edited twins. Science. doi: 10.1126/science.aaw1839.

Putnam, H. (1975). Mathematics, Matter and Method. Cambridge: Cambridge University Press.

Putnam, H. (1981). Reason, Truth and History. Cambridge: Cambridge University Press.

Simon, H. (1996). The Sciences of the Artificial - 3rd ed, The MIT Press.

Soares, N., Fallenstein, B. (2017). Agent foundations for aligning machine intelligence with human interests: a technical research agenda. The technological singularity: Managing the journey, 103-125.

Torretti, R. (2000). ‘Scientific Realism’ and Scientific Practice. In Evandro Agazzi and Massimo Pauri (eds.), The Reality of the Unobservable. Dordrecht: Kluwer.

Van Fraassen, B. (1980). The scientific image. Oxford University Press.

Vihalemm, R. (2012). Practical Realism: Against Standard Scientific Realism and Anti-realism. Studia Philosophica Estonica. 5/2: 7–22.

Some thoughts on “rationality”

Meta: I recently stumbled upon this post, which I wrote in Summer 2021 and apparently never posted (for reasons I can’t recall). My ideas have definitely evolved since then, but it still felt good enough to post; in particular, the two bolded bits towards the end of the post still seem of great relevance today.

***

A vast number of related bodies of literature are interested in rationality and decision-making. The term “rationality” is used in subtly different ways in different fields and contexts, which is relevant to keep in mind when exploring these bodies of literature. 

The “classical” variant of rationality is what I will refer to as axiomatic rationality: a way of making decisions that conforms to a set of abstract axioms (also see here). Departure from any of these axioms makes the decision-making agent “exploitable” (e.g., via a money pump) and is thus considered “irrational”.
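To make the “exploitable” point concrete, here is a minimal sketch (my own toy illustration, not from the original post) of a money pump: an agent with cyclic preferences A ≻ B ≻ C ≻ A that pays a small fee for every trade to an item it prefers can be cycled back to where it started, only poorer.

```python
# Minimal money-pump sketch (illustrative assumption): an agent with cyclic
# preferences A > B > C > A pays a small fee for every trade to an item it
# prefers, and so can be drained of money without ever ending up better off.

FEE = 1.0
PREFERS = {("A", "B"): True, ("B", "C"): True, ("C", "A"): True}  # cyclic preferences

def accepts_trade(current, offered):
    """The agent accepts any trade to an item it strictly prefers."""
    return PREFERS.get((offered, current), False)

def run_money_pump(start_item="A", rounds=9):
    item, money = start_item, 0.0
    offers = ["C", "B", "A"]  # each offer is preferred to the item the agent currently holds
    for i in range(rounds):
        offer = offers[i % 3]
        if accepts_trade(item, offer):
            item, money = offer, money - FEE  # pays the fee for each 'improvement'
    return item, money

if __name__ == "__main__":
    final_item, balance = run_money_pump()
    print(f"Agent ends with item {final_item} and balance {balance}")  # back at A, 9.0 poorer
```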

In this post I do not want to focus on axiomatic rationality, however. Instead, I will introduce two related conceptions of rationality and explore some possible implications for how we might understand the world. 

Bounded rationality  

Bounded rationality is a term that has been thrown around a lot recently, so I expect most readers to have come across it. 

Instead of categorizing humans as either rational or irrational, this school of thought takes seriously the fact that human decision-makers face resource constraints which affect what “a good decision strategy” looks like. Resources relevant to decision-making include time, attention, cognitive capacity, memory, etc. Because humans are bounded, they fall back on so-called “fast and frugal heuristics”. As opposed to how the “heuristics and biases” literature interprets this behaviour, relying on heuristics doesn’t necessarily represent human irrationality; given the agent’s constraints, relying on such heuristics is often the more robust and ultimately more successful strategy. (To better understand this landscape of thought, I recommend reading “Bounded rationality: the two cultures” and “Mind, rationality, and cognition: An interdisciplinary debate.”)

A subtlety within this framework which I believe is often glossed over is that the agent’s boundedness is not inherent to the agent but relative to their decision problem and environment. To illustrate, the same agent can be described as rational when playing tic-tac-toe, while being described as boundedly rational when making decisions within the economy: the complexity of the economy vastly exceeds the cognitive capacities of humans.

If an agent cannot process all relevant information - and in complex systems all relevant information can be a lot of information - the agent tends to be better off relying on heuristics. Biases and heuristics thus lose their common negative connotation; they become a necessity. Instead of asking how to make agents rational and unbiased, we should be asking how we can differentiate better from worse heuristics. 

Ecological rationality  

This is where ecological rationality has things to say. 

Ecological rationality asks: “Given a problem and decision environment, which strategies should an agent rely on when optimization is not feasible?”. It is thus the normative study of which heuristic processes succeed in a given environment, where the environment is such that optimization is not feasible. According to ecological rationality, the yardstick determining the success of a decision is to be found in the external world, as opposed to in its internal consistency with the principles of rational choice theory, as purported by most classical theories of rationality.

This conception is rooted in the realization, first popularized by Herbert A. Simon, that most decision environments do not in fact fulfil the assumptions necessary to apply as-if models of optimal decision-making, meaning that there is no guarantee that using optimization processes will lead to optimal outcomes.

Instead, for many problems, optimization is not feasible because the decision situation exhibits one or several of the following three characteristics: it is (1) ill-defined, meaning that not all alternatives, consequences, and probabilities are or can be known; (2) underpowered, meaning that parameter values must be estimated from limited samples, which can lead to greater error than the “bias” of a simple heuristic would; or (3) intractable, meaning that while the problem is well-specified, it is computationally intractable and thus no optimal strategy can be found within the imposed time and resource constraints. In fact, most decision problems, including the decision problems we are concerned with in our research (GCR preparedness), fall into the category of decision problems which cannot be optimized (see below).
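To illustrate the “underpowered” case, here is a minimal sketch (my own toy setup, not from the original post): with only a handful of noisy observations, an equal-weights “tallying” heuristic can match or beat a regression whose weights are estimated from those same scarce data, because the heuristic trades a small, fixed bias for a large reduction in estimation error.

```python
# Minimal sketch (illustrative setup of my own): with very small samples, a
# simple equal-weights "tallying" heuristic can predict out-of-sample roughly
# as well as, or better than, a regression fitted to the same scarce, noisy data.

import numpy as np

rng = np.random.default_rng(0)
n_cues, true_w = 5, np.array([1.0, 0.8, 0.6, 0.4, 0.2])

def make_data(n):
    X = rng.normal(size=(n, n_cues))
    y = X @ true_w + rng.normal(scale=2.0, size=n)  # noisy criterion
    return X, y

def mse(pred, y):
    return float(np.mean((pred - y) ** 2))

train_X, train_y = make_data(10)       # "underpowered": only 10 observations
test_X, test_y = make_data(10_000)

# Fitted model: ordinary least squares on the tiny sample.
w_ols, *_ = np.linalg.lstsq(train_X, train_y, rcond=None)

# Heuristic: ignore the sample entirely and weight every cue equally.
w_tally = np.ones(n_cues)

print("OLS test MSE:     ", mse(test_X @ w_ols, test_y))
print("Tallying test MSE:", mse(test_X @ w_tally, test_y))
```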

The conditions for the infeasibility of optimization lead us to recognise the central role of the information structure of the environment in which the agent makes a decision. Decision-making mechanisms, according to ecological rationality, exploit this information structure of the environment. The goal of the study of ecological rationality is thus to map individual and collective decision-making heuristics (“adaptive toolbox”) onto a set of environmental structures. 

In other words, ecologically rational decision-making is a measure of fit between the mental structures of the decision-maker and the informational structures of the environment. This allows for comparative statements such as: given a decision problem and environment, which of a set of decision-making strategies will perform best? Similarly, it follows that, when aiming to improve decision-making in real-world scenarios, it is central to match the strategy and information to the mental structures of the decision maker(s).

Even if an agent is able to identify the “smartest” decision strategy, the additional cost of finding that strategy might outweigh the added benefit of having identified the marginally better decision. What constitutes the “best strategy” therefore depends inherently on the environment, including on the strategies adopted by other agents. Furthermore, an agent might make a sequence of decisions in an evolving environment. The strategy that is best in one decision situation might not be the strategy that is most robust over the course of many decisions.

Some implications for thinking about the behaviour of possible (artificial) intelligences

I am interested in the space of possible intelligences, and in how we should expect them to act and make decisions in the world. The discussion here will thus explore some ways in which the above might shed light on how we should expect strong AI (“superintelligent” / ”general” / ”transformative” / ...) to behave.

Economists sometimes differentiate between “normal” uncertainty and inherent/Knightian uncertainty (the former is also sometimes referred to as “risk”) (see here). “Normal” uncertainty, or risk, refers to a decision under conditions of known probabilities. (For example, a fair coin flip: you know that there is a 50/50 chance the coin will land heads-up.) Inherent or Knightian uncertainty refers to a decision under conditions of unknown probabilities; you don’t even know what your probability distribution should look like, potentially because you are not even sure what the space of possible future states is.

If you, like me, believe in a deterministic universe, you might struggle with the concept of Knightian uncertainty. What does it mean, you might ask, to be “inherently unpredictable”? For me, this confusion persists until we define inherent uncertainty as relative to an agent’s epistemic situation (i.e., as subjective).

There is an obvious link between (subjective) inherent uncertainty and bounded rationality. The boundedness of a decision-maker imposes additional constraints on decision-making, beyond the axioms of rational choice, turning decisions under risk into decisions under inherent (subjective) uncertainty.

One might ask: will a superintelligent AI experience inherent uncertainty, or will it be able to treat any decisions as a decision under risk? It seems to be quite common to expect that strong AI will be rational in the sense of being able to treat all decisions as decisions under risk. However, that doesn’t seem right to me. 

Conceptualizing bounded rationality as relative to a decision environment informs how to think about the decision-making of intelligent agents. We want to understand how AI makes decisions. Instead of asking whether or not some AI system makes decisions conforming with the axioms of rational choice, a better starting point might be to ask: relative to what problem and environment is the AI making its decisions? 

This can nuance how we think about the promise of strong AI solving all our problems for us, and doing so in a way that is, unlike with human decision makers, free of irrationalities and biases. 

Many of the decision environments with the most real-world consequences (e.g. the economy, policy-making and governance) are complex social systems. These systems exhibit non-linearity, meaning that the slightest differences in initial conditions can result in large differences later on. What sort of epistemic position would an intelligent system need to be in to be able to treat decisions within such complex systems as decisions under risk? This reflection makes me sceptical of some of the more simplistic scenarios of how strong AI will navigate the real world.

One, albeit speculative, caveat seems worth mentioning: this inherent epistemic challenge could compel a powerful AI to try to reduce the complexity of the real world, thereby making it more controllable. (This is, in part, what we can already see with things like the YouTube recommender algorithm.) Social complexity is a primary mark of modern society, so the prospect of a strong system trying to radically reduce complexity in the world seems worrisome.


[Image: Creux du Vent]

The 'Imperative of Feedback' - a cautionary tale

When I was younger, I had a near-insatiable hunger for feedback from other people.

I suspect it provided me with a sense of safety. The hope that, even if I was unworthy and ‘not enough’ in how I was there and then, I was improving. So - eventually - I would be worthy and enough. As long as my direction of travel was pointing upwards, as long as I was on a journey of constant and relentless improvement, all would be okay. I would be okay. Feedback was fuel, the direction-giver on that journey. The only salient signal that could indicate to me that I was still broadly travelling in the right direction, thereby giving me an (albeit perverse) sense of okayness and legitimacy ‘on-credit’.

Today, I feel differently about feedback. Most of all, I notice that I no longer have this relentless craving for feedback from others. I no longer need others to tell me how they think I am doing and how I could become better.

It’s not that I no longer think improvement and error correction are valuable. I do. It is two other things that have changed in how I orient to feedback.

First, I no longer need external feedback to give me that feeling of safety. Caring about growth is no longer an existential matter. Caring about growth is one - only one - of many things I care about.

In retrospect, feedback was a bit like a drug for me. I was reward-hacking myself. Feedback gave me a sense of safety - emphasis on sense - but said sense was importantly mistaken about something. Receiving people’s feedback is not the same as actually improving. People might think I am doing really well, or they might think I am doing poorly. But they can be wrong – and in fact often are. People’s beliefs about reality are not the same as reality itself. The typical map-territory error. I craved feedback because I wanted to be okay, but all it got me was a momentary, dissipative and not appropriately grounded feeling of okayness. The feeling of okayness other people’s feedback could buy me was always a thin substitute for the real thing.

The second change of orientation I observe is that, while I still care about feedback (even if in a less existential manner), I am today significantly less interested in feedback from people, and significantly more trusting that reality will provide me with most of the feedback I need.

I still think people-feedback can be useful, but I consider it useful in a much narrower and more specific set of cases. If I am writing a piece, I might ask some specific people for feedback because I respect their takes on the topic. If I run a project, I might be interested in checking whether the people I work with, or other people I respect, have noticed things I haven’t, so as to expand and deepen my models of how to do this sort of thing well. In other words, the way I orient to people-feedback today tends to be more focused on moving me ahead on specific things I am trying to do, or on drawing on specific areas of expertise that other people might hold - rather than on, say, ‘my overall performance’.

What I am basically entirely suspicious of now – as opposed to back then – is other people giving feedback on what my goals should be in the first place. It’s not like I used to go around asking people to give me feedback on what I should care about. But, as a matter of fact, I did pay insufficient attention to keeping my own goals separate from other people’s goals. And feedback played an important role in this confusion.

I was overly eager to receive feedback and to act on that feedback to change my behaviour. After all, just receiving feedback isn’t enough to make you improve; you also have to integrate the feedback, let the feedback trickle through your entire being, let it change you! So, in all that over-eagerness, my ability to ask one important question started to fade away: whether the feedback was relevant to what I personally cared about.

The ‘imperative of feedback’ developed from an imperative to elicit feedback and take it seriously into an imperative to consume all of it fully, and any omission to incorporate and act on the feedback was considered a failure of character. Disagreeing with feedback, or – even worse – agreeing with feedback but still not acting on it became equivalent to a moral shortcoming; a vice, a sign of laziness, a sign of pride, a lack of dedication to the imperative of eternal betterment.

What all of this missed, however – and importantly so! – is that what feedback is relevant to you depends on what you’re trying to achieve and what you care about. This is also why there are cases where, even if I agree with the feedback, it might not be right for me to incorporate it. Let’s say I could have presented myself more professionally at that event, or given a sleeker presentation. That may very well be true. But I might just not care. Maybe, given my goals, I don’t need to hone my professionalism or presentation skills. Maybe, given my goals and other things I care about more, it’s just not worth it for me to focus on this. Maybe – Sacré Coeur! – the feedback is accurate and incorporating it would help me move closer to my goals, and also, I choose not to. Turns out you can do that. Turns out your choices can have many, many reasons – and not all of them are readily captured by narratives of improvement and performance.

It is through subtle narratives and dynamics like these that you can come to lose touch with what you care about, what you value – with yourself. What is more, in that process, you risk becoming the extension of someone else’s goals – like a little string puppet. I don’t think it has to be a bad thing to help with the furthering of someone else’s goals. But I do in fact believe it tends to be a bad thing to confuse other people’s goals for one’s own goals. There are few more fundamental ways for one’s agency and freedom to erode away.

***

As such, I want to value feedback for what it is – an occasionally useful tool to move me closer towards what I value. But I also want to be firm in my knowledge of what feedback is not. It is not a substitute for the process of exploration and self-dialectic through which I come into contact with what is important to me. Feedback does not have to, but can, become a means of alienation from one’s own values. Don’t let that happen. Nor do I want to confuse feedback from others with feedback from reality.

Lastly, in my desire to grow and evolve in who I am, I don’t want to forget that there are many other things beyond improvement that constitute deep and rich sources of value – including the savouring of the journey itself.

[Image: On a journey with Oscar, who was yet to make us famous]

Epistemic justification in (Hu)man and Machine

What does it take for a belief to be epistemically justified? In the hope of providing a novel angle to this long-standing discussion, I will investigate the question of epistemic justification by means of considering not only (what one might call) ‘classical’ cases, but also ‘machine’ cases. Concretely, I will discuss whether—and, if so, on what basis—artificial systems instantiating intelligent behaviour can be said to form epistemically justified ‘beliefs’. This will serve as a sort of thought experiment or case study used to test plausible answers to the problem of epistemic justification and, potentially, derive inspirations for novel ones.

Why do I choose to adopt this methodological approach? Consider, by comparison, the classic question in biology: what is life? Fields such as astrobiology or artificial life allow us to think about this question in a more (and more appropriately) open-minded way—by helping us to uproot unjustified assumptions about what life can and cannot look like based on sampling from Earth-based forms of life alone. The field of artificial intelligence can serve a similar function vis-à-vis philosophical inquiry. Insofar as we aspire for our theories—including our theories of knowledge and epistemic justification—to be valid beyond the contingencies of human intelligence, insights from the study of AI stand in a fruitful intellectual symbiosis with philosophical thought. 

I will start our investigation into epistemic justification with a thought experiment. 

Rome: Consider Alice; when having dinner with her friends, the topic of her upcoming trip to Italy comes up. Alice explains that she will be taking a plane to Rome, Italy’s capital city, from where she will start her journey. 

It seems uncontroversial to say that Alice is epistemically justified in her belief that Rome is in fact the capital of Italy. The question I want to raise here is: in virtue of what is this the case? Before I delve into examining plausible answers to this question, however, let us compare the former story to a slightly different one. 

Rome’: In this case, Bob is playing around with the latest large language model trained and made available by one of the leading AI labs—let’s call it ChatAI. Bob plays with the model in order to get a handle on what ChatAI is and isn’t able to do. At one point, he submits the following query to the model: “What is the capital of Italy?”, and the model replies: “The capital city of Italy is Rome.” 

By analogy to the first case, should we conclude that the model is epistemically justified in its claim that Rome is the capital of Italy? And if not, how are these two cases different? In what follows, I will investigate these questions in more detail, considering various approaches attempting to clarify what amounts to epistemic justification. To do so, I will toggle between considering the traditional (or human) case and the machine case of epistemic justification and study whether this dialogue can provide insight into the question of epistemic justification. 

Correctness (alone) is not enough—process reliabilism for minds and machines

Thus, let us return to a question raised earlier: in virtue of what can we say Alice is justified in claiming that Rome is the capital of Italy? A first observation that appears pertinent is that Alice is correct in her statement. Rome is in fact the capital of Italy. While this appears relevant, it doesn’t represent a sufficient condition for epistemic justification. To see why, we need only think of cases where someone is correct due to mere chance or accident, or even against their better judgement. You may ask me a question about a topic I have never heard of, and yet I might get the answer right by mere luck. Or, in an even more extreme case, we may play a game where the goal is to not give a correct answer. It is quite conceivable, in virtue of my utter ignorance of the topic, that I end up giving an answer that turns out to be factually correct, despite trying to pick an answer that I believe to be wrong. In the first case, I got lucky, and in the second case, I uttered the correct answer against my better judgement. In neither of these cases would my factually correct answer represent an epistemically justified one.

As such, I have shown that the truth condition (alone) is an insufficient account of epistemic justification. Furthermore, I have identified a particular concern: that epistemic justification is not given in cases where a claim is correct for arbitrary or ‘lucky’ reasons. This conclusion seems to be supported when considering the machine case. If, say, we designed a program that, when queried, iterated through a predefined set of answers and picked one of them at random, then, even if this program happened to pick the correct answers, we wouldn’t feel compelled to consider this a case of epistemic justification. Insofar as we are here taking issue with the arbitrariness of the answer-producing process when considering its status of epistemic justification, we may come to wonder what it would look like for a claim to be correct on a non-arbitrary or non-lucky basis.

To that effect, let us consider the proposal of process reliabilism (Goldman, 1979, 1986). At its core, this theory claims that a belief is epistemically justified if it is the product of a belief-formation process that is systematically truth-conducive. In other words, while it is insufficient to observe that a process produces the correct answer on a single and isolated instance, if a process tends to produce the correct answer with a certain reliability, said process acts as a basis for epistemic justification according to the reliabilist thesis. Applied to our Rome case from earlier, the question is thus which processes (e.g., of information gathering and processing) led Alice to claim that Rome is the Italian capital, and whether these same processes have shown sufficient epistemic reliability in other cases. Let’s say that, in Alice’s case, she inferred her belief that Rome is the capital of Italy as follows. First, her uncle told her that he was about to emigrate to live in the capital city of Italy. A few weeks later, Alice receives a letter from said uncle which was sent, as she can tell from the postmark, from Rome. From this, Alice infers that Rome must be the capital of Italy. As such, Alice’s belief is justified insofar as it involved the application of perception, rational reflection, or logical reasoning, rather than, say, guessing, wishful thinking, or superstitious reasoning.

Furthermore, we don’t have to understand reliability here merely in terms of the frequency with which a process produces true answers. Instead, we can interpret it in terms of its propensity to do so. In the latter case, we capture a notion of truth-conduciveness that pertains not only to the actually observed world, but is also cognizant of other possible worlds. As such, it aims to be sensitive to the notion that a suitable causal link is required between the given process and its epistemic domain, i.e., what the process is forming beliefs over. This renders the thesis more robust against unlikely but statistically possible cases where an arbitrary process gets an answer repeatedly correct; such cases would otherwise undermine the extent to which process reliabilism can serve as a suitable basis for epistemic justification. To illustrate this, consider the case of the scientific method, where we rely on empiricism to test hypotheses. This process is epistemically reliable not in virtue of getting true answers at a certain frequency, but in virtue of its procedural properties, which ensure that the process will, sooner or later, falsify wrong hypotheses.

To summarise, according to process reliabilism, a belief-formation process is reliable as a function of its propensity to produce true beliefs. Furthermore, the reliability (as defined just now) of a belief-formation process serves as the basis of epistemic justification for the resulting belief. How does this apply or not to the machine case from earlier (Rome’)? 

To answer this question, let us imagine that Bob continues to play with the model by asking it more questions about the capital cities of other countries. Assuming capabilities representative of the current state of the art in machine learning, and in large language models in particular, let us say that ChatAI’s responses to Bob’s questions are very often correct. We understand enough about how machine learning works that, beyond knowing that it is merely frequently correct, we can deny that ChatAI (and comparable AI systems) produces correct answers by mere coincidence. In particular, machine learning exploits insights from statistics and optimization theory to implement a form of inference on its training data. To substantiate this and to compare the performance of different models, the machine learning community regularly develops so-called ‘benchmarks’ that evaluate performance-relevant features of a model, such as accuracy as well as speed or (learning) efficiency. As such, AI systems can, given appropriate design and training, produce correct outputs with high reliability and for non-arbitrary reasons. This suggests that, according to process reliabilism, outputs from ChatAI (and comparable AI systems) can qualify as being epistemically justified.
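As a concrete (and deliberately toy) illustration of what such a benchmark-style reliability check might look like, consider the following sketch; `ask_model` is a hypothetical stand-in for querying a system like ChatAI, not a real API.

```python
# Minimal sketch of a benchmark-style reliability check (illustrative only).
# `ask_model` is a hypothetical placeholder for querying an actual language model.

def ask_model(question: str) -> str:
    # Placeholder: in practice this would call a real model or API.
    canned = {"What is the capital of Italy?": "Rome",
              "What is the capital of France?": "Paris"}
    return canned.get(question, "I don't know")

def benchmark_accuracy(eval_set):
    """Fraction of held-out questions the model answers correctly."""
    correct = sum(ask_model(q).strip().lower() == a.lower() for q, a in eval_set)
    return correct / len(eval_set)

eval_set = [("What is the capital of Italy?", "Rome"),
            ("What is the capital of France?", "Paris"),
            ("What is the capital of Japan?", "Tokyo")]

print(f"Benchmark accuracy: {benchmark_accuracy(eval_set):.2f}")  # 0.67 with the placeholder
```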

Challenge 1: “You get out only what you put in”

However, the reliabilist picture as painted so far does not in fact hold up to scrutiny. The first problem I want to discuss concerns the fact that, even if procedurally truth-conducive, a process can produce systematically incorrect outputs if said process operates on wrong initial beliefs or assumptions. If, for example, Alice’s uncle was himself mistaken about what the capital of Italy is, thus moving to a city that he mistakenly thought was the capital, and if he had thus through his words and action passed on this mistaken belief to Alice, the same reasoning process she used earlier to arrive at a (seemingly) epistemically justified belief would now have produced an incorrect belief. Differently put, someone’s reasoning might be flawless, but if based on wrong premises, its conclusions must be regarded as null in terms of their epistemic justification. 

A similar story can be told in the machine case. A machine learning algorithm seeking to identify the underlying statistical patterns of a given data set can only ever be as epistemically valid as the data set it is trained on. As a matter of fact, this is a widely discussed concern in the AI ethics literature, where ML models have been shown to reproduce biases present in their training sets. For example, language models have been shown (before corrective interventions were implemented) to associate certain professions (e.g., ‘CEO’ or ‘nurse’) predominantly with certain genders. Similarly, in the legal context, ML systems used to predict recidivism risk have been criticised for reproducing racial bias.
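The point can be made concrete with a minimal sketch (an illustrative toy, not a description of any deployed system): a model that fits its biased training data perfectly reliably will simply reproduce the skew in that data at prediction time.

```python
# Minimal "garbage in, garbage out" sketch (illustrative toy, not a real system):
# a model that learns label frequencies from a biased data set reproduces that
# bias at prediction time, however reliably it fits the data it was given.

from collections import Counter

# Hypothetical biased training data: profession paired with a gendered pronoun.
training_data = [("nurse", "she")] * 90 + [("nurse", "he")] * 10 \
              + [("CEO", "he")] * 85 + [("CEO", "she")] * 15

def fit_majority(data):
    """'Learn' the most frequent label per input, a perfectly reliable fit to the data."""
    counts = {}
    for x, y in data:
        counts.setdefault(x, Counter())[y] += 1
    return {x: c.most_common(1)[0][0] for x, c in counts.items()}

model = fit_majority(training_data)
print(model)  # {'nurse': 'she', 'CEO': 'he'}: the skew in the data becomes the model's output
```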

What this discussion highlights is that the reliabilist thesis as I stated it earlier is insufficient. Thus, let us attempt to vindicate the thesis before I discuss a second source of criticism that can be raised against it. We can reformulate a refined reliabilist thesis as follows: for a belief to be epistemically justified, it needs to a) be the product of a truth-conducive process, and b) the premises on which said process operates to produce the belief in question must themselves be justified.

As some might notice, this approach, however, may be at risk of running into a problem of regress. If justified belief requires that the premises on which the epistemic process operates must be justified, how do those premises gain their justification other than by reference to a reliable process operating on justified premises? Without providing, in the context of this essay, a comprehensive account of how one may deal with this regress problem, I will provide a handful of pointers to such attempts that have been made. 

A pragmatist, for example, may emphasise their interest in a process that can reliably produce useful beliefs. Since the usefulness of a belief is determined by its use, this does not fall prey to the regress challenge as stated above: a belief can be tested for its usefulness without making reference to another belief. Klein (1999), on the other hand, denies that the type of regress at hand is vicious in the first place, making reference to a view called infinitism. According to infinitism, justification requires an appropriate chain of reasons, and in the case of infinitism specifically, such chains take the form of non-repeating infinite ones. Finally, Goldman himself (2008) tackles the regress problem by differentiating between basic and non-basic beliefs, where the former are justified without reference to another belief, in virtue of being the product of an unconditionally reliable process. Such basic beliefs then represent a plausible stopping point for the regress. Perception has been proposed as a candidate for such an unconditionally reliable process, although one may object to this account by denying that it is possible, or common, for perceptual or empirical data to be entirely atheoretical. In any case, the essence of Goldman’s proposal, and of the proposals of externalist reliabilists in general, is that a belief is justified not with reference to reflectively accessible reasons (which is what internalists propose), but in virtue of the causal process that produced the belief, whether or not that process makes reference to other beliefs. As such, externalists are commonly understood to be able to dodge the regress bullet.

For now, this shall suffice as a treatment of the problem of regress. I will now discuss another challenge to process reliabilism (including its refined version as stated above). It concerns questions regarding the domain in which the reliability of a process is being evaluated. 

Challenge 2: Generalization and its limits

To understand the issue at hand better, let’s consider the “new evil demon problem”, first raised by Cohen (1984) as a critique of reliabilism. The problem arises from the following thought experiment: Imagine a world WD in which there exists an epistemic counterpart of yours, let’s call her Anna, who is identical to you in every regard except one. She experiences precisely what you experience and believes precisely what you believe. According to process reliabilism, you are epistemically justified in your beliefs about your world—let’s call it WO—on the basis of those beliefs being the product of truth-conducive processes such as perception or rational reasoning. In virtue of the same reasoning, Anna ought to be epistemically justified in her beliefs about her world. However, and this is where the problem arises, the one way in which Anna differs from you is that her experiences and beliefs about WD have been carefully curated by an evil demon with the aim of deceiving her. Anna’s world does not in fact exist in the way she experiences it. On a reliabilist account, or so some would argue, we would have to say that Anna’s beliefs are not justified, since her belief-formation processes do not reliably lead to correct beliefs. However, how can your counterpart, who in every regard relevant to the reliabilist thesis is identical to you, not be justified in her beliefs while you are? The dilemma arises because many would intuitively say that Anna is just as justified in believing what she believes as we are, despite the fact that the processes that produced her beliefs are unreliable.

One way to cast the above problem – which also reveals a way to defuse it – is by indexing and then separately evaluating the reliability of the belief-formation processes for the different worlds, WO and WD. From here, as developed by Comesaña (2002), we can make the case that while the belief-formation processes are reliable in the case of WO, they are not in the case of WD. As such, the reliability of a process, and thus epistemic justification, must always be assessed relative to a specific domain of application.

Another, similar approach to the same problem has been discussed, for example, by Jarrett Leplin (2007, 2009) by invoking the notion of ‘normal conditions’, a term originally introduced by Ruth Millikan in 1984. The idea is that the reliability of a process is evaluated with respect to the normal conditions of its functioning. Leplin defines normal conditions as “conditions typical or characteristic of situations in which the method is applicable” and explains that “[a] reliable method could yield a preponderance of false beliefs, if used predominantly under abnormal conditions” (Leplin, 2007, p. 33). As such, the new evil demon case can be understood as a case where the epistemic processes which are reliable in a demon-less world cease to be reliable in the demon world, since that world no longer complies with the ‘normal conditions’ that underwrite the functionality of said processes. While promising as an approach to addressing a range of challenges raised against reliabilism, there is, one must note, still work to do in terms of clearly formalising the notion of normality.

What both of these approaches share is that they seek to defend reliabilism against the new evil demon problem by specifying the domain or conditions in which the reliability of a process is evaluated. Instead of suggesting that, for a process to be reliable—and thus to serve as a basis for epistemic justification—it has to be universally reliable, these refinements to reliabilism seek to formalise a way of putting boundaries on the application space of a given process. As such, we can understand the new evil demon problem as an instance of a more general phenomenon: generalization and its limits. This way of describing the problem serves to clarify how the new evil demon problem relates to issues frequently discussed in the context of machine learning.

The problem of generalization in machine learning concerns the fact that machine learning, generally speaking, works by exploiting underlying patterns to approximate functions that efficiently describe the data encountered. While this approach has enabled impressive AI applications to date, it faces important limitations. In particular, this learning method rests on an assumption, commonly called IID (independent and identically distributed sampling), which says that the data used in training must be representative of the data encountered upon deployment for there to be any guarantee of the effectiveness or accuracy of the learned model. In other words, while we have guarantees about a model’s performance (i.e., accuracy/loss) under the IID assumption, these guarantees no longer hold when the nature of the distribution changes, i.e., when we encounter what is called a distributional shift. Under distributional shift, whatever approximation a model has learnt may no longer be effective in the new (deployment) environment. This would be called a failure to generalise.
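A minimal sketch of this failure mode (my own toy setup: a nonlinear ground truth and a linear fit standing in for any learned approximation) shows how a model that performs well on data drawn from its training distribution can break down once the inputs shift outside of it.

```python
# Minimal sketch of an out-of-distribution failure (illustrative assumptions:
# the "true" relationship is nonlinear, the learned model is a simple linear fit).

import numpy as np

rng = np.random.default_rng(1)

def true_fn(x):
    return np.sin(x)  # the underlying pattern the model tries to approximate

# Training data drawn from a narrow region where sin(x) is roughly linear.
x_train = rng.uniform(-0.5, 0.5, size=200)
y_train = true_fn(x_train) + rng.normal(scale=0.05, size=x_train.size)

# Fit a linear model (a stand-in for any learned approximation).
slope, intercept = np.polyfit(x_train, y_train, deg=1)

def mse(x):
    pred = slope * x + intercept
    return float(np.mean((pred - true_fn(x)) ** 2))

x_iid = rng.uniform(-0.5, 0.5, size=1000)     # same distribution as training
x_shifted = rng.uniform(2.0, 3.0, size=1000)  # distributional shift

print("In-distribution MSE:     ", mse(x_iid))      # small
print("Out-of-distribution MSE: ", mse(x_shifted))  # much larger: the fit fails to generalise
```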

Let us reiterate the suggested analogy between the new evil demon problem and the problem of out-of-distribution generalization failures in machine learning. I claim that the demon world WD represents an ‘out-of-distribution case’ for the epistemic processes that are reliable in our world WO. Though Anna nominally uses the same processes, she uses them in an importantly different environment, so it is unsurprising that they turn out to be unreliable in WD. After all, the reality of WD differs in fundamental ways from WO (namely, in the existence of the evil demon). Insofar as the thought experiment is intended to suggest that the demon itself may be subject to completely different fundamental laws than the ones that govern WO, the same processes that can approximate the fundamental laws of WO are not guaranteed to approximate the fundamental laws that govern WD. As such, I have vindicated process reliabilism from the evil demon problem by squaring what earlier appeared counterintuitive: the same processes that are reliable—and thus the basis for epistemic justification—in our world (WO) can turn out to be unreliable in an environment sufficiently foreign to ours, such as the demon world WD.

Conclusion 

In this essay, I have set out to evaluate the question of epistemic justification. Most centrally, I discussed whether the proposal of process reliabilism may serve as a basis for justification. To this effect, I raised several challenges to process reliabilism. For example, I observed that a reliable process operating on false premises (or corrupted data) may cease to systematically produce correct beliefs. We then discussed ways to refine reliabilism to accommodate this concern, and how such refinements may or may not fall prey to a problem of regress. More practically speaking, I linked this discussion to the machine case by explaining how AI systems, even if they operate on reliable processes, may become corrupted in their ability to produce epistemically justified outputs through algorithmic bias arising from training on non-representative data samples.

The second challenge to reliabilism I discussed concerns the details of how the reliability of a process should be evaluated. In particular, I identified a need to specify and bound a ‘domain of application’ in reference to which a process’s reliability is established. The goal of such a demarcation—which may come in the form of indexing as suggested by Comesaña, in the form of defining normal conditions as proposed by Leplin, or in some other way—is to be sensitive to (the limits of) a process’s ability to generalise. As such, over the course of this discussion, I developed a novel perspective on the new evil demon problem by casting it as an instance of a cluster of issues concerning generalisation and its limits. While the new evil demon problem is commonly raised as an objection to process reliabilism—the claim being that the reliabilist verdict on the case is counterintuitive—I was able to vindicate reliabilism from these allegations. Anna’s epistemic processes—despite being nominally the same as ours—do fail to be reliable; however, this failure need not surprise us, because the demon world represents an application domain that is sufficiently and relevantly different from our world.

Throughout the essay, I have attempted to straddle both the classical domain of epistemological inquiry and a more novel domain, which one may call ‘machine’ epistemology. I believe this dialogue can be methodologically fruitful, and I hope to have provided evidence for that conviction by means of the preceding discussion. It may serve as a source of inspiration; it may, as discussed at the start of this essay, help us de-condition ourselves from unjustified assumptions such as forms of anthropocentrism; and it may serve as a practical testing ground and source of empirical evidence for assessing the plausibility of different epistemological theories. Unlike with humans or mental processes, machines provide us with a larger possibility space and more nimbleness in implementing and testing our theoretical proposals. This is not to say that there aren’t dis-analogies between artificially intelligent machines and humans, and as such, any work that seeks to reap said benefits is also required to adopt the relevant levels of care and philosophical rigor.

As a last, brief and evocative thought before closing, let us return to a question raised at the very beginning of this essay. When comparing the two cases Rome and Rome’, we asked ourselves whether we should conclude, by analogy between these two cases, that insofar as Alice is deemed justified in believing the capital of Italy is Rome, so must be ChatAI. First, we must recognise that the only way to take this analogy seriously is to adopt an externalist perspective on the issue—at least unless we are happy to get sucked into discussions of the possibility of machine mentality and of machines’ reflective awareness of their own reasons. While some may take issue with this on the basis of favouring internalism over externalism, others—including me—may endorse this direction of travel for metaphysical reasons (see, e.g., Ladyman & Ross, 2007). After all—and most scientific realists would agree on this—whatever processes give rise to human life and cognition, they must in some fundamental sense be mechanistic and materialistic (i.e., non-magical) in just the way machine processes are. As the field of AI continues to develop ever more complex processes, it would not be reasonable to exclude the possibility that they will, at some point—and in isolated cases already today—resemble human epistemic processes sufficiently that any basis of epistemic justification must either stand or fall for both types of processes simultaneously. This perspective can be seen as unraveling further depth in the analogy between classical and machine epistemology, and as such provides support for the validity of said comparison for philosophical and scientific thought.

Resources

  • Cohen, Stewart (1984). “Justification and Truth”, Philosophical Studies, 46(3): 279–295. doi:10.1007/BF00372907

  • Comesaña, Juan (2002). “The Diagonal and the Demon”, Philosophical Studies, 110(3): 249–266. doi:10.1023/A:1020656411534

  • Conee, Earl and Richard Feldman (1998). “The Generality Problem for Reliabilism”, Philosophical Studies, 89(1): 1–29. doi:10.1023/A:1004243308503

  • Feldman, Richard (1985). “Reliability and Justification”, The Monist, 68(2): 159–174. doi:10.5840/monist198568226

  • Goldman, Alvin (1979). “What is Justified Belief?” In George Pappas (ed.), Justification and Knowledge. Boston: D. Reidel. pp. 1-25.

  • Goldman, Alvin (1986). Epistemology and Cognition, Cambridge, MA: Harvard University Press.

  • Goldman, Alvin (2008). “Immediate Justification and Process Reliabilism”, in Quentin Smith (ed.), Epistemology: New Essays, New York: Oxford University Press, pp. 63–82.

  • Goldman, Alvin (2009). “Internalism, Externalism, and the Architecture of Justification”, Journal of Philosophy, 106(6): 309–338. doi:10.5840/jphil2009106611

  • Goldman, Alvin (2011). “Toward a Synthesis of Reliabilism and Evidentialism”, in Trent Dougherty (ed.), Evidentialism and Its Discontents, New York: Oxford University Press, pp. 254–290.

  • Janiesch, C., Zschech, P., & Heinrich, K. (2021). “Machine learning and deep learning”, Electronic Markets, 31(3), 685-695.

  • Klein, P. (1999). “Human Knowledge and the Infinite Regress of Reasons,” in J. Tomberlin, ed. Philosophical Perspectives 13, 297-325. 

  • Ladyman, James & Ross, Don (2007). Every Thing Must Go: Metaphysics Naturalized. Oxford University Press.

  • Leplin, Jarrett (2007). “In Defense of Reliabilism”, Philosophical Studies, 134(1): 31–42. doi:10.1007/s11098-006-9018-3

  • Leplin, Jarrett (2009). A Theory of Epistemic Justification, (Philosophical Studies Series 112), Dordrecht: Springer Netherlands. doi:10.1007/978-1-4020-9567-2

Why powerful instrumental reasoning is by default misaligned, and what we can do about it

Meta: There are some ways I intend to update and expand this line of thinking. Due to uncertainty about when I will be able to do so, I decided to go ahead with posting this version of the essay already.

The AI alignment problem poses an unusual, and maybe unusually difficult, engineering challenge. Instrumental reasoning, while being plausibly a core feature of effective and general intelligent behaviour, is tightly linked to the emergence of behavioural dynamics that lead to potentially existential risks. In this essay, I will argue that one promising line of thinking towards solving AI alignment seeks to find expressions of instrumental reasoning that avoid such risk scenarios and are compatible with or constitutive of aligned AI. After taking a closer look at the nature of instrumental rationality, I will formulate a vision for what such an aligned expression of instrumental reasoning may look like.

I will proceed as follows. In the first section, I introduce the problem of AI alignment and argue that instrumental reasoning plays a central role in many AI risk scenarios. I will do so by describing a number of failure scenarios - grouped into failures related to instrumental convergence and failures related to Goodhart’s law - and illustrate the role of instrumental reasoning in each of these cases. In the second section, I will argue that we need to envision what alignable expressions of instrumental rationality may look like. To do so, I will first analyse the nature of instrumental rationality, thereby identifying two design features—cognitive horizons and reasoning about means—that allow us to mediate which expressions of instrumental reasoning will be adopted. Building from this analysis, I will sketch a positive proposal for alignable instrumental reasons, centrally building on the concept of embedded agency. 

AI risk and instrumental rationality; or: why AI alignment is hard

Recent years have seen rapid progress in the design and training of ever more impressive-looking applications of AI. In their most general form, state-of-the-art AI applications can be understood as specific instantiations of a class of systems characterised by having goals which they pursue by making, evaluating, and implementing plans. Their goal-directedness can be an explicit feature of their design or an emergent feature of training. Putting things this way naturally brings up the question of what goals we want these systems to pursue. Gabriel (2020) refers to this as the “normative” question of AI alignment. However, at this stage of intellectual progress in AI, we don’t know how to give AI systems any goal in such a way that the AI systems end up (i) pursuing the goal we intended to give them and (ii) continuing to pursue the intended goal reliably, across, e.g. distributional shifts. We can refer to these two technical aspects of the problem as the problem of goal specification (Krakovna, 2020; Hadfield-Menell, 2016) and the problem of goal (mis)generalisation (Shah et al. 2022; Di Langosco et al., 2022; Hubinger et al., 2019), respectively. 

At first sight, one might think that finding solutions to these problems is ‘just another engineering problem’. Given human ingenuity, it should be only a matter of time before satisfactory solutions are presented. However, there are reasons to believe the problem is harder than it may initially appear. In particular, I will argue that a lot of what makes alignment hard to solve are the behavioural dynamics that emerge from powerful instrumental reasoning.

By instrumental rationality, I refer to means–end reasoning which selects those means that are optimal in view of the end being pursued. It is itself a core feature of general intelligence as we understand it. ‘Intelligence’ is a broad and elusive term, and many definitions have been proposed (Chollet, 2019). In the context of this essay, we will take ‘intelligence’ to refer to a (more or less) general problem-solving capacity, where its generality is a function of how well it travels (i.e., generalises) across different problems or environments. As such, an AI is intelligent in virtue of searching over a space of action–outcome pairs (or behavioural policies) and selecting whichever actions appear optimal in view of the goal it is pursuing.
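To fix ideas, here is a minimal sketch of such means–end selection (the ‘world model’, the actions, and the scores are toy assumptions of mine, not a description of any real system): the agent simply ranks the available means by how much expected progress they make towards its end.

```python
# Minimal sketch of means-end selection (illustrative; the "world model" and the
# goal-progress scores are toy assumptions, not a description of any real system).

# The agent's assumed model of which outcome each available action leads to,
# and how much expected progress that outcome makes towards its goal.
world_model = {
    "boil_water":  {"outcome": "hot water",    "goal_progress": 0.3},
    "grind_beans": {"outcome": "ground beans", "goal_progress": 0.3},
    "brew_coffee": {"outcome": "coffee",       "goal_progress": 0.9},
    "do_nothing":  {"outcome": "no change",    "goal_progress": 0.0},
}

def select_action(model):
    """Instrumental step: pick whichever means scores best in view of the end."""
    return max(model, key=lambda action: model[action]["goal_progress"])

print(select_action(world_model))  # "brew_coffee": the means judged optimal for the goal
```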

The intelligence in AI—i.e., its abilities for learning, adaptive planning, means–end reasoning, etc.—renders safety concerns in AI importantly disanalogous to safety concerns in more ‘classical’ domains of engineering such as aerospace or civil engineering. (We will substantiate this claim in the subsequent discussion.) Furthermore, it implies that we can reasonably expect the challenges of aligning AI to increase as those systems become more capable. That said, observing the difficulty of solving AI alignment is interesting not only because it can motivate more work on the problem; asking why AI alignment is hard may itself be a productive exercise towards progress. It can shine some light on the ‘shape’ of the problem, identify which obstacles need to be overcome, clarify necessary and/or sufficient features of a solution, and illuminate potential avenues for progress. 

Both the philosophical and technical literature has discussed several distinct ways in which such means–end optimisation can lead to unintended, undesirable, and potentially even catastrophic outcomes. In what follows, we will discuss two main classes of such failure modes – instrumental convergence and Goodhart’s law – and clarify the role of instrumental reasoning in their emergence. 

Instrumental convergence describes a phenomenon whereby rational agents pursue instrumental goals, even if those are distinct from their terminal goals (Omohundro, 2008; Bostrom, 2012). Instrumental goals are things that appear useful towards achieving a wide range of other objectives; common examples include the accumulation of resources or power, as well as self-preservation. The more resources or power are available to an agent, the better positioned it will be to pursue its terminal goals. Importantly, this also applies if the agent in question is highly uncertain about what the future will look like. Similarly, an agent cannot hope to achieve its objectives unless it is around to pursue them. This gives rise to the instrumental drive of self-preservation. Due to their near-universal instrumental usefulness, instrumental goals act as a sort of strategic attractor which the behaviour of intelligent agents will tend to converge towards.

How can this lead to risks? Carlsmith (2022) discusses at length whether power-seeking AI may pose an existential risk. He argues that misaligned agents “would plausibly have instrumental incentives to seek power over humans”, which could lead to “the full disempowerment of humanity”, which in turn would “constitute an existential catastrophe”. Similarly, the instrumental drive of self-preservation gives rise to what in the literature is discussed under the term of safe interruptibility (e.g., Orseau and Armstrong, 2016; Hadfield-Menell et al., 2017): AI systems may resist attempts to be interrupted or modified by humans, as this would jeopardize the successful pursuit of their goals.

The second class of failure modes can be summarised under the banner of Goodhart’s law. Goodhart’s law describes how, when metrics are used to improve the performance of a system, and when sufficient optimisation pressure is applied, optimising for these metrics ends up undermining progress towards the actual goal. In other words, as discussed by Garrabrant and Manheim (2019), the “Goodhart effect [occurs] when optimisation causes a collapse of the statistical relationship between a goal which the optimiser intends and the proxy used for that goal.” This problem is well known far beyond the field of AI and arises in domains such as management or policy-making, where proxies are commonly used to gauge progress towards a different, and often more complex, terminal goal.

How does this manifest in the case of AI? It is relevant to this discussion to understand that, in modern-day AI training, the objective is not directly specified. Instead, what is specified is a reward (or loss) function, based on which the AI learns behavioural policies that are effective at achieving the goal (in the training distribution). Goodhart effects can manifest in AI applications in different forms. For example, the phenomenon of specification gaming occurs when AI systems come up with creative ways to meet the literal specification of the goal without bringing about the intended outcomes (e.g., Krakovna et al., 2020a). The optimisation of behaviour via the reward signal can be understood as exerting pressure on the agent to come up with new ways to achieve the reward effectively. However, achieving the reward may not reliably equate to achieving the intended objective.
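The core of the Goodhart effect can be illustrated with a short simulation (a toy, ‘regressional’ variant of my own construction: the proxy captures only part of what the true goal tracks). Under heavy selection on the proxy, the statistical relationship between proxy and goal largely collapses, and the proxy systematically overstates how well the true goal is being achieved.

```python
# Minimal sketch of a (regressional) Goodhart effect under selection pressure
# (toy setup of my own: the proxy captures only part of what the true goal tracks).

import numpy as np

rng = np.random.default_rng(42)
n = 100_000

skill = rng.normal(size=n)       # contributes to both proxy and true goal
unmeasured = rng.normal(size=n)  # contributes only to the true goal
noise = rng.normal(size=n)       # contributes only to the proxy

true_goal = skill + unmeasured
proxy = skill + noise

def corr(a, b):
    return float(np.corrcoef(a, b)[0, 1])

# Apply optimisation pressure: keep only the top 0.1% of candidates by proxy score.
top = np.argsort(proxy)[-(n // 1000):]

print("Correlation, all candidates:        ", corr(proxy, true_goal))            # ~0.5
print("Correlation, top 0.1% by proxy:     ", corr(proxy[top], true_goal[top]))  # much weaker
print("Mean true goal in proxy-optimised set:", float(true_goal[top].mean()))
print("Mean proxy in proxy-optimised set:    ", float(proxy[top].mean()))  # proxy overstates the goal
```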

Sketching a vision of alignable instrumental reason

In the previous section, I argued that instrumental rationality is linked to behavioural dynamics that can lead to a range of risk scenarios. From here, two high-level avenues towards aligned AI present themselves. 

First, one may attempt to avoid instrumental reasoning in AI applications altogether, thereby eliminating (at least one) important driver of risk. However, I argue that this avenue appears implausible upon closer inspection. At least to a large extent, what makes AI dangerous is also what makes it effective in the first place. Therefore, we might not be able to get around some form of instrumental rationality in AI, insofar as these systems are meant to be effective at solving problems for us. Solutions to AI alignment need to be not only technically valid; they also need to be economically competitive enough that AI developers, facing commercial or military–strategic incentives, will actually adopt them. Given this tight link between instrumental rationality and the sort of effectiveness and generality we are looking to instantiate in AI, I deem it implausible to try to build AI systems that avoid means–end reasoning entirely.

This leads us to the second avenue. How can we have AI that is effective and useful, while also being reliably safe and beneficial? The idea is to find expressions of instrumental reasoning that do not act as risk drivers and, instead, are compatible with, or even constitutive of, aligned AI. Given what we discussed in the first section, this approach may seem counterintuitive. However, I will argue that instrumental reasoning can be implemented in ways that lead to different behavioural dynamics than the ones we discussed earlier. In other words, I suggest recasting the problem of AI alignment (at least in part) as the search for alignable expressions of instrumental rationality.

Before we can properly envision what alignable instrumental rationality may look like, we first need to analyse instrumental reason more closely. In particular, we will consider two aspects: cognitive horizons and means. The first aspect is about restricting the domain of instrumental reason along temporal, spatial, material, or functional dimensions; the second investigates whether, and how, instrumental reasoners can reason substantively about means.

Thinking about modifications to instrumental rationality that lead to improvements in safety is not an entirely novel idea. One way to modify the expression of instrumental reasoning is by defining, and specifically limiting, the ‘cognitive horizon’ of the reasoner, which can be done in different ways. One approach, usually discussed under the term myopia (Hubinger, 2020), is to restrict the time horizon that the AI takes into account when evaluating its plans. Other approaches of a similar type include limiting the AI’s resource budget (e.g., Drexler, 2019, chapter 8) or restricting its domain of action. For example, we may design a coffee-making AI to act only within and reason only about the kitchen, thereby preventing it from generating plans that involve other parts of the world (e.g., the global supply chain). The hope is that, by restricting the cognitive horizon of the AI, we can undercut concerning behavioural dynamics like excessive power-seeking. While each of these approaches comes with its own limitations and no fully satisfying solution to AI alignment has been proposed, they nevertheless serve as a case in point for how modifying the expression of instrumental reasoning is possible and may contribute towards AI alignment solutions.
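To illustrate the basic idea of such horizon restrictions, here is a deliberately simplistic sketch (my own toy illustration, not a proposal from Hubinger or Drexler): a plan evaluator that never looks past a fixed horizon, so that the far-reaching step never even enters the calculation. The plan steps and their values are invented for the example.

```python
from typing import Callable, List

def evaluate_plan(plan: List[str],
                  step_value: Callable[[str], float],
                  horizon: int) -> float:
    """Score a plan, counting only its first `horizon` steps.
    A 'myopic' agent uses a small horizon; an unrestricted agent would
    set horizon = len(plan)."""
    return sum(step_value(step) for step in plan[:horizon])

# Hypothetical coffee-making example: the step reaching beyond the kitchen
# is never evaluated by the myopic agent, so it cannot be selected for.
plan = ["boil water", "grind beans", "brew coffee", "take over the supply chain"]
values = {"boil water": 1.0, "grind beans": 1.0, "brew coffee": 2.0,
          "take over the supply chain": 100.0}

print(evaluate_plan(plan, values.get, horizon=3))          # myopic: 4.0
print(evaluate_plan(plan, values.get, horizon=len(plan)))  # unrestricted: 104.0
```

Real proposals are of course far more subtle than truncating a list; the sketch only makes vivid how a restricted horizon changes which plans look attractive.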

A different avenue for modifying the expression of instrumental reasoning is to look more closely at the role of means in means–end reasoning. At first glance, instrumental reasoners do not appear to have substantive preferences over the sorts of means they adopt, beyond wanting them to be effective at promoting the desired ends. However, there are some considerations that let us re-examine this picture. Consider, for example, that, given the immense complexity of the consequentialist calculus combined with the bounded computational power of real agents, sufficiently ‘sophisticated’ consequentialist reasoners will fare better by sometimes choosing actions on the basis of rule- or virtue-based reasoning as opposed to explicit calculation of consequences. Another argument takes into account the nature of multi-agent dynamics, observing that positive-sum dynamics emerge when agents are themselves transparent and prosocial; oftentimes, the best way to credibly appear transparent or prosocial is to be transparent and prosocial. Finally, functional decision theory suggests that, instead of asking what action should be taken in a given situation, we should ask what decision algorithm we should adopt for this and any structurally equivalent future or past decision problems (Yudkowsky and Soares, 2017). Generally speaking, all of these examples look a lot more like rule- or virtue-based reasoning than like what we typically imagine means–end reasoning to look like. Furthermore, they refute at least a strong version of the claim that instrumental reasoners cannot reason substantively about means.

With all of these pieces in place, we can now proceed to sketch out a positive proposal of aligned instrumental rationality. In doing so, I will draw centrally on the notion of embedded agency as discussed by Demski and Garrabrant (2019).

Embeddedness refers to the fact that an agent is itself part of the environment in which it acts. In contrast to the way agents are typically modelled—as being cleanly separated from their environment and able to reason about it completely—embedded agents are understood as living ‘inside’ of, or embedded in, their environment. Adopting the assumption of embedded agency into our thinking on instrumental rationality has the following implications. For a Cartesian agent, the optimality of a means M is determined by M’s effects on the world W. Effective means move the agent closer to its desired world state, where the desirability of a world state is determined by the agent’s ends or preferences. In the case of an embedded agent, however, the evaluation of the optimality of means involves not only M’s effects on the world (as it does in the case of the Cartesian agent), but also M’s effects on the future instantiation of the agent itself.

For example, imagine Anton meets a generous agent, Beta, who makes the following offer: Beta will give Anton a new laptop, conditional on Anton having sufficient need for it. Beta will consider Anton legitimately needy of a new laptop if his current one is at least 5 years old. However, Anton’s current laptop is 4 years old. Should Anton lie to Beta in order to receive a new laptop nevertheless? If Anton is a ‘Cartesian’ instrumental reasoner, he will approach this question by weighing up the positive and negative effects that lying would have compared to telling the truth. If Anton is an embedded instrumental reasoner, he will additionally consider how taking one action over the other will affect his future self, e.g., whether it will make him more likely to lie or be honest in the future. What is more, beyond taking a predictive stance towards future instantiations of himself, Anton can use the fact that his actions today will affect his future self as an avenue to self-modification (within some constraints). For example, Anton might actively wish he were a more honest person. Given embeddedness, acting honestly today contributes towards Anton being a more honest person in the future (i.e., it increases the likelihood that he will act honestly in the future). As such, embedded agency entails that instrumental reasoning becomes entangled with self-predictability and self-modification.
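The contrast can be put in a deliberately stylised sketch (the weighting and all numbers are my own illustrative assumptions, not part of Demski and Garrabrant’s framework): the Cartesian evaluation scores a means only by its first-order effect on the world, while the embedded evaluation adds a term for the effect on the agent’s own future dispositions.

```python
def cartesian_value(world_effect: float) -> float:
    """A Cartesian reasoner scores a means only by its effect on the world."""
    return world_effect

def embedded_value(world_effect: float, self_effect: float,
                   weight_on_future_self: float = 0.8) -> float:
    """An embedded reasoner additionally scores a means by how it reshapes
    the agent's own future instantiation (here: an 'honesty disposition')."""
    return world_effect + weight_on_future_self * self_effect

# Anton's choice, with purely illustrative numbers:
#   lying gets the laptop (+1 world) but erodes his honesty disposition (-1 self);
#   telling the truth gets nothing (+0 world) but reinforces honesty (+1 self).
lie = {"world_effect": 1.0, "self_effect": -1.0}
truth = {"world_effect": 0.0, "self_effect": 1.0}

print(cartesian_value(lie["world_effect"]), cartesian_value(truth["world_effect"]))  # 1.0 vs 0.0: lying wins
print(embedded_value(**lie), embedded_value(**truth))                                # ~0.2 vs 0.8: honesty wins
```

Whether the ranking actually flips depends, of course, on how heavily effects on the future self weigh against first-order effects on the world; the point is only that they enter the evaluation at all.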

Let us bring things back to AI risk and alignment. Insofar as instrumental reasoning in Cartesian agents tends to converge towards power-seeking behaviours, what will instrumental reasoning in embedded agents converge towards? This is, largely, an open question. However, I want to provide one last example to gesture at what the shape of an answer might look like. Let’s assume agent P’s goal is to bring about global peace. Under ‘classical’ instrumental convergence, we may worry that an ‘attractive’ strategy for an instrumental reasoner is to first seek enough power to dominate the world and then, as its final ruling so to speak, instantiate global peace. Were a powerful future AI system to pursue such a strategy, that would surely seem problematic.

Let us now consider the same setup but this time with P’ as an embedded instrumental reasoner. P’ would likely not adopt a strategy of ruthless power-seeking, because doing so would make future versions of itself more greedy or violent and less peace-loving. Instead, P’ may be attracted to a strategy that could be described as ‘pursuing peace peacefully’. Importantly, a world of global peace is a world where only peaceful means are being adopted by the actors. Taking actions today that make it more likely that future versions of P’ will act peacefully thus starts to look like a compelling, potentially even convergent, course of action.

As such, instrumental reasoning in embedded agents, via such mechanisms as self-prediction and self-modification, is able to reason substantively about its means beyond their first-order effects on the world. I believe this constitutes fruitful ground for further alignment research. Key open questions include which sorts of means–end combinations are both stable (i.e., convergent) and have desirable alignment properties (e.g., truthfulness, corrigibility, etc.).

In summary, I have argued that, first, the AI alignment problem is a big deal. The fact that alignment solutions need to be robust in the face of intelligent behaviour (e.g., adaptive planning, instrumental reasoning) sets it apart from more ‘typical’ safety problems in engineering. Second, with the help of a number of examples and conceptual arguments mainly centred around the notion of embedded agency, I contended that it is possible to find expressions of instrumental reasoning that avoid the failure modes discussed earlier. As such, while I cannot conclude whether AI alignment is ultimately solvable, or will de facto be solved in time, I wish to express reason for hope and a concrete recommendation for further investigation.


Resources

  • Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J., & Mané, D. (2016). Concrete problems in AI safety. arXiv preprint arXiv:1606.06565.

  • Brundage, M., Avin, S., Clark, J., Toner, H., Eckersley, P., Garfinkel, B., ... & Amodei, D. (2018). The malicious use of artificial intelligence: Forecasting, prevention, and mitigation. arXiv preprint arXiv:1802.07228.

  • Bostrom, N. (2012). The superintelligent will: Motivation and instrumental rationality in advanced artificial agents. Minds and Machines, 22(2), 71-85.

  • Carlsmith, J. (2022). Is Power-Seeking AI an Existential Risk?. arXiv preprint arXiv:2206.13353.

  • Christiano, P. (2019). Current work in AI alignment. [talk audio recording]. https://youtu.be/-vsYtevJ2bc 

  • Chollet, François. (2019). On the measure of intelligence. arXiv preprint arXiv:1911.01547.

  • Demski, A., & Garrabrant, S. (2019). Embedded agency. arXiv preprint arXiv:1902.09469.

  • Di Langosco, L. L., Koch, J., Sharkey, L. D., Pfau, J., & Krueger, D. (2022). Goal misgeneralisation in deep reinforcement learning. In International Conference on Machine Learning (pp. 12004-12019). PMLR.

  • Drexler, K. E. (2019). Reframing superintelligence. The Future of Humanity Institute, The University of Oxford, Oxford, UK.

  • Evans, O., Cotton-Barratt, O., Finnveden, L., Bales, A., Balwit, A., Wills, P., ... & Saunders, W. (2021). Truthful AI: Developing and governing AI that does not lie. arXiv preprint arXiv:2110.06674.

  • Gabriel, I. (2020). Artificial intelligence, values, and alignment. Minds and machines. 30(3), 411-437.

  • Hadfield-Menell, D., Russell, S. J., Abbeel, P., & Dragan, A. (2016). Cooperative inverse reinforcement learning. Advances in Neural Information Processing Systems, 29, 3909–3917.

  • Hadfield-Menell, D., Dragan, A., Abbeel, P., & Russell, S. (2017). The off-switch game. In Workshops at the Thirty-First AAAI Conference on Artificial Intelligence.

  • Hubinger, E., van Merwijk, C., Mikulik, V., Skalse, J., and Garrabrant, S. (2019). Risks from learned optimisation in advanced machine learning systems. arXiv preprint arXiv:1906.01820.

  • Hubinger, E. (2020). An overview of 11 proposals for building safe advanced AI. arXiv preprint arXiv:2012.07532.

  • Krakovna, V., Uesato, J., Mikulik, V., Rahtz, M., Everitt, T., Kumar, R., Kenton, Z., Leike, J., and Legg, S. (2020). Specification gaming: the flip side of AI ingenuity. DeepMind Blog.

  • Krakovna, V., Orseau, L., Ngo, R., Martic, M., & Legg, S. (2020). Avoiding side effects by considering future tasks. Advances in Neural Information Processing Systems, 33, 19064-19074.

  • Manheim, D., & Garrabrant, S. (2018). Categorizing variants of Goodhart's Law. arXiv preprint arXiv:1803.04585.

  • Omohundro, S. M. (2008). The basic AI drives. In AGI (Vol. 171, pp. 483-492).

  • Orseau, L., & Armstrong, M. S. (2016). Safely interruptible agents.

  • Prunkl, C., & Whittlestone, J. (2020). Beyond near-and long-term: Towards a clearer account of research priorities in AI ethics and society. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (pp. 138-143).

  • Shah, R., Varma, V., Kumar, R., Phuong, M., Krakovna, V., Uesato, J., & Kenton, Z. (2022). Goal Misgeneralisation: Why Correct Specifications Aren't Enough For Correct Goals. arXiv preprint arXiv:2210.01790.

  • Soares, N., Fallenstein, B., Armstrong, S., & Yudkowsky, E. (2015). Corrigibility. In Workshops at the Twenty-Ninth AAAI Conference on Artificial Intelligence.

  • Wedgwood, R. (2011). Instrumental rationality. Oxford studies in metaethics, 6, 280-309.

Principles

Principles that I seek to live by, given my epistemic position today. As a meta principle, I hope I will be able to update these principles to manifest more and more wisdom as time goes on.


1. Work hard to promote freedom and autonomy in the world. Continuously seek to identify and eliminate sources of harm and suffering.

2. When taking action, be cognizant of the big in the big (scope sensitivity) and the big in the small (UDT).

3. You are like fish in the water. Continuously seek to transcend your current ideological context. Be cognizant of power structures. Consider important issues from different plausible worldviews and listen carefully to all plausible sides. Be cognizant of the extent of your own ignorance and blind spots.

4. Actually care about what is true. Know how to distinguish the truth from being right, being defensible, or agreeing with people you respect.

5. Think for yourself. Ask your own questions. Be wary of failing to distinguish others' questions from your own. The first and last person before whom you have to defend your choices is yourself.

6. Develop your character. Try to never tell lies, even small ones. Acknowledge, understand and apologize for your mistakes. Take responsibility for your own beliefs and actions. Look at that which you're afraid of. Aim to never avoid something valuable just because it makes you feel awkward or afraid. Try to avoid all self-deception. Live truthfully.

7. Engage with the world you live in. Have opinions on many topics, but hold your beliefs probabilistically. Be quick to update on evidence. Notice when you're confused. Acknowledge when you were wrong. Be cognizant of complex truths. Don't force coherence prematurely.

8. Transcend the self-other divide. Remember that you exist outside of people's judgement.

9. Have hope. Rejoice in existence. Don't be ashamed of feeling deeply and feeling much. Don't be ashamed of struggling. Don't be afraid of where the currents of your being may carry you.

10. Try to understand your intrinsic values and avoid confusing them with instrumental values. Don't confuse other people's goals with your own. Develop, cultivate and deepen your own capacity to value.

11. Constantly seek to transcend your own sense of self. Learn to love your own boundaries. Cultivate clear seeing. Take care of your physical and mental instantiation. Always have a self-experiment running to find new ways to better yourself and your life.

12. Try to always be kind. Treat everyone with respect. Seek to understand before asking to be understood.

13. Look at yourself and the world from a place of truth and love.

What do we care about in caring about freedom?

Typically, when freedom is discussed in political philosophy, the discussion revolves around the question concerning the nature of freedom. In this essay, I want to ask a related but different question: why do we care about freedom? Or, what is it that we care about when we care about freedom?

Why do I propose this is a worthwhile question to ask? In discussions of the nature of freedom, it is usually assumed that freedom - or whichever specific notion of it is being defended - is something worthy of protection. However, for this conversation to remain calibrated around what matters most, it is critical to have a solid understanding of why we deem freedom to be worthy of protection. Answering the why-question can add clarity and precision to various other questions concerning freedom, some of them of practical importance. For example, it can illuminate discussions about what strategies are more or less apt for protecting and fostering freedom in society.

I will proceed as follows. I will start with a broad discussion of why it is that we care about freedom. Of course, the reasons for caring about freedom are importantly intertwined with the question of the nature of freedom: the answer to the former will depend on which notion of freedom is being adopted. In the first section, I thus aim to explore the spectrum of reasons underlying and evoked by the different notions of freedom, thereby situating this essay in the centuries-long discussion on the topic. In doing so, I identify three core reasons why we care about freedom, which I will briefly expand on one at a time. Next, I will dive into one of those reasons in more depth, namely freedom as a protector of the unfolding of humanity’s collective potential. I chose to focus on this reason because it is the least discussed of the three, while being of great moral importance. In the second section, I will discuss what it is I mean by humanity's potential and why I think it morally matters. In the third section, I will discuss how freedom enables the processes that drive cultural and moral progress, thereby protecting us from scenarios of cultural and moral lock-in.

Section 1: Why do we care about freedom?

It can hardly be said that the discussion of freedom among political philosophers is a shallow one. Given the amount of ink spilt on the topic, the avid reader can safely assume that thinkers at least agree on one thing: the fact that freedom, whatever exact definition one chooses to assume, is important. From here, a further question ensues: Why is freedom important? Why do we care about it? What is it that we care about in caring about freedom?

I propose to divide the space of possible reasons into three: freedom as protection from oppression and authoritarianism, freedom as an enabler of the unfolding of individual human potential, and, lastly, freedom as an enabler of the unfolding of humanity’s collective potential. A given theory of the nature of freedom may agree with all three of these reasons while another may only buy into a subset of them, and different theories may put different emphasis on the importance of these reasons. We will discuss each of these reasons in turn.

Reason 1: By protecting individual freedom, we protect ourselves and our society from oppression and authoritarianism.

This raison d’être of freedom figures most centrally in the so-called notion of negative freedom. Negative freedom - a term first introduced by Isaiah Berlin in his seminal essay titled “Two Concepts of Liberty” - can be summarized as freedom from something.

Freedom from what, one may ask? For Hobbes, freedom was centrally about the absence of physical coercion of the body. Locke would later expand on this arguably rather thin notion of freedom by adding that a person can also be rendered unfree by means of coercion of the will. Freedom, as it is conceived here, carries the central purpose of protecting the individual from oppression and coercion by an external, more powerful source. Whether it be coercion of the body or of the will, freedom is limited in virtue of the fact that alternative choices, or alternative paths of action, are rendered ineligible by external agents. This view can be summarized as understanding freedom as non-interference. Out of this tradition of thought grew political liberalism and the view that the central concern is to protect individual freedom from the unchecked power of the state.

Some, however, thought the notion of freedom as non-interference too shallow. Consider the example of a master and their slave. A master may not interfere with his slave’s will to eat their meagre lunch. Nevertheless, the slave is unfree in the sense that the master could, at any point, for no good reason but their own whims, decide to interfere with the slave’s plan and take away the slave’s meal. Thus, the slave is unfree in the sense of being unfree from the arbitrary application of power - or: domination. This is the idea of freedom as non-domination, sometimes also called ‘republican freedom’ (Pettit, 2001). This notion of freedom embraces the importance of the rule of law - the absence of arbitrariness in the functioning of a just state. Under this view, it is thus not power as such that necessarily limits freedom, but whether or not the wielding of power is subject to a legitimate - as opposed to an absent or arbitrary - set of rules (Skinner, 2013).

While the protection from oppression and authoritarianism is a central concern of thinkers in the negative tradition, it is also one of the raisons d’être - although not the central one - of positive freedom. Here, freedom tends to be understood as residing in the collective governance over our lives as a means to individual and/or collective self-determination. Proponents of the positive tradition include thinkers like Rousseau and Marx [1], among many others. Under this view, it is not required to limit state activity to its bare minimum. State activity can - in certain areas of life and under the right choices of institutional design - be desirable or even necessary for achieving certain goods critical to the goal of self-realization. That said, this view still places certain constraints on what legitimate structures of governance look like. For example, legitimacy can come from citizens' ability to participate freely in the democratic process, thereby co-determining its outcome. As such, oppression and authoritarianism remain outcomes that defenders of positive freedom aim to protect against, by entrenching or making more secure our negative freedoms.

Moving on to the second reason we may care about freedom.

Reason 2: By protecting individual freedom, we enable the unfolding of individual human potential.

We find this raison d’être for freedom most prominently reflected in positive notions of freedom.

Insofar as the notion of positive freedom points at the idea of freedom as the presence of something, this ‘something’ (e.g. self-determination or self-governance) can often be understood as being at the service of a safe or virtuous unfolding of human nature.

When looking at the concept of freedom from a genealogical standpoint, we can find early traces of freedom as a means to the unfolding of human potentiality in thinkers who are typically classified as representatives of the negative tradition. Early defenders of the negative tradition of liberty, as we have seen above with Hobbes and Locke, viewed obstacles to freedom as always originating from external sources. John Stuart Mill, author of the seminal book On Liberty, however, recognised that one’s freedom can also be curtailed by internal forces. [2] According to Mill, if I act from my passions - anger, fear, greed, lust, etc. - without appropriate consultation of reason, I am in fact acting unfreely. Similarly, if I am acting unauthentically, in other words, if my choices are determined in the absence of proper introspection and reasoning - but instead by habit, unreflectively adopted cultural norms and the like -, I am acting unfreely. Or, if I am acting based on a ‘false consciousness’ - a misguided understanding of my real interests -, I too am acting unfreely.

A similar idea was picked up and expanded on by more modern thinkers. Charles Taylor (1985), for example - who, however, stands more firmly within the positive tradition - defends a similar position by arguing that certain emotions - such as spite, irrational fear, vanity - are to be understood as alien to us. We would be, according to Taylor, better off without them, and getting rid of them wouldn’t cause us to lose anything important. Thus, he argues, there is a clear sense in which having such emotions can make us unfree with respect to our authentic or genuine goals, desires or purpose.

In order to help us think about freedom as an enabler of self-realization, let us, on top of the negative-positive axis, add another axis that can help chart out the space of notions of freedom: exercise- vs opportunity-based notions. In the former, freedom is concerned with “the extent that one has effectively determined oneself and the shape of one’s life.” (Taylor, 1985, p. 213) In other words, freedom is determined not merely by my possibilities, but by what I actually do. In the case of opportunity-based notions, “freedom is a matter of what one can do, of what it is open to us to do, whether or not we do anything to exercise these options”. (p. 213)

Self-realization - the idea of unfolding one’s potential - maps onto the exercise concept. (In particular, it does so more cleanly than onto either positive or negative notions of freedom.) It is something that necessarily has to be acted out and realized for it to be of value. If someone is free to realize their potential in the sense of not being impeded from doing so, but this person has not realized that potential because, say, they are unaware of their potential, paralysed by doubt or because the societally accepted notion of the ‘good life’ conflicts with their own sense of where their potential lies, they may still lack freedom insofar as we care about freedom as a means to self-realization.

Somewhere in between a pure exercise-based and a pure opportunity-based conception of freedom, we can situate the so-called capabilities approach. It, too, recognises that opportunity alone may not amount to substantive freedom. Instead, it is capabilities that amount to substantive freedoms, since it is capabilities that allow a person to acquire or achieve the things they desire. Resources, means, opportunities and even formal rights alone are empty if the person lacks the capabilities to realize them. Amartya Sen and Martha Nussbaum are perhaps the most prominent names associated with this view. [3]

So far, we have discussed the idea of freedom as a means of unfolding individual human potential, of realizing some higher or true ‘self’, or of actualizing some authentic ‘way of life’. This raises the question: what is this ‘self’ that is being realized? In other words, the notion of a higher (as opposed to lower) or true (as opposed to false) self appears to assume some view of the true essence of human nature. Different thinkers provide different takes on this question. As we have just seen, thinkers like Mill or Taylor find their notion of the ’self’ that is meant to be realized by differentiating between lower and higher-order motivations, wherein some desires or drives may be considered foreign to us, while others represent what we really want, our ‘true will’. Hannah Arendt holds a different notion of human nature. According to Arendt, freedom needs to be understood as freedom of action, in particular, action within the political arena. Accordingly, we are only free insofar as we are free to exercise our power within the polis, or the political sphere. [4] As such, human nature is understood to be essentially political, and the unfolding of human potential is inevitably intertwined with full and equal participation in collective political life. (Arendt, 1961)

We have seen how these different notions of freedom all stand in the service of the unfolding of individual human potential. Such self-knowledge, authenticity and self-realization may be viewed as morally good in and of themselves, or as the right thing to aim for in virtue of tending to produce morally virtuous or happy people. The role of freedom in the unfolding of human potential can come in the form of protecting said unfolding from external or internal factors that may interfere with it; or in the form of supporting the realization of said potential by such means as material resources, capabilities and institutional or cultural structures.

Let us now turn to the third reason for caring about freedom.

Reason 3: By protecting individual freedom, we protect and enable the unfolding of humanity’s collective potential

Less has been written about the relationship between freedom and humanity’s long-run trajectory, so we will explore it in more detail here. To do so, we will first discuss what it is I mean by ‘humanity’s collective potential’ and why one might care about it. Then, I will explore the arguments for whether and how freedom may help with its unfolding.

Section 2: Humanity’s trajectory, past and future

In the same way we just explored what we might understand by ‘individual human potential’ and its unfolding, we should also clarify the notion of ‘humanity’s collective potential’. To do so, we will first turn our metaphorical gaze toward the past, at human history. In doing so, we may conclude that humanity has come a long way. From hunter-gatherer tribes to the agricultural revolution, the discovery of the New World, the industrial revolution, the abolition of slavery, women’s suffrage, the introduction of the welfare state, the establishment of an international order aimed at limiting the inhumanity of wars [5], the discovery of penicillin, the eradication of smallpox, the dawn of the internet - to name only a few milestones in human history. Zooming out to timescales of hundreds, thousands, and millions of years, we can generally witness a drastic increase in human welfare. This increase stems, in large part, from economic growth, technological advances, better and more comprehensive protection of fundamental rights, and respect for a growing set of varied ways of life and means of human self-expression. [6]

This is far from arguing that human history has not also witnessed moral tragedy of unspeakable scale and severity, and far from claiming that the face of the earth today is free of injustice, violence or other forms of needless suffering. Still, the average human being is drastically better off living today rather than 100, 1000 or 10’000 years ago. Accordingly, we may harbour the hope that this trend of the general betterment of the fate of sentient life may continue.

In his 1981 book The Expanding Circle, Peter Singer introduces the idea of the moral circle as a way to conceptualize the nature of moral progress. Humanity, over the course of its own history, has expanded its circle of moral concern from the individual, the family and the tribe to include forms of life increasingly more dissimilar to oneself. At the time of writing, Singer was at the forefront of the up-and-coming animal liberation movement. Said movement based its moral mission on the idea that non-human animals, in virtue of their capacity to experience suffering and pain, ought to be considered moral patients and treated with corresponding respect and care. As time passes and progress continues, the moral circle may expand yet further.

The point of this essay is not to make a substantive claim about where the moral circle ought to extend next. Instead, its point is to argue that there are immense moral stakes in ensuring that the processes driving cultural and moral progress continue. As argued by Williams (2015), we may not know where, but chances are that there are moral atrocities unfolding at this very time that we cannot even yet identify as such. Looking back at the relevant periods in human history, it is not that the ill-treatment of slaves, women and non-human animals was recognized as morally wrong yet tolerated - the first step to the emancipation of these groups resided in the mere acknowledgement of their moral worth. One thing we can learn from history in this regard is just how blind we may be to injustices that unfold in front of our eyes. The specific lesson I am pointing at here is not to advocate for one or another minority group to gain protection and a voice - although these may well be worthwhile missions to support. The specific point I am trying to make concerns a different puzzle: what ought one do when one is ignorant of where one’s mistake lies, but has reason to believe that one may well be committing some mistake(s)?

The answer, or so I claim, lies at least in part in investing in the processes that tend to drive moral and cultural progress. From the point of view of the ignorant or morally uncertain (MacAskill et al. 2020), investing in those processes is the best guess for discovering that which you don’t yet know you need to discover. We will explore what role freedom plays in the functioning of these processes in the next section.

Before that, however, let us consider one more thought. The argument for the importance of protecting the processes driving moral and cultural progress does not merely rely on an argument from ignorance, as discussed above. What is more, we live in a physical world that is ever-changing. This can be easily forgotten when reasoning from within history and inside of a set of cultural narratives that conceive of humanity as the pinnacle of evolution. However, once spelt out, there is little question that the world, as well as the human species itself, keeps on changing. The answer to the question of what it means to be human - be that from a cultural, sociological or biological perspective - is evolving every second of every day. Biological evolution is shaping the human gene pool; niche construction is moulding the environment we live in; cultural evolution is forming and re-forming the institutions that govern our lives; scientific and technological progress is shaping our socio-economic realities. What is more, technological progress may in the not-too-far future start to affect life itself. Be it in the form of transhumanism, artificial intelligence, artificial consciousness, or new hybrid or fully artificial forms of life: technological progress will force upon us new cultural and moral questions. [7] These aren’t questions we can choose to ignore. As such, moral progress is not just about arriving at better and more nuanced views on questions that pose themselves against some static background. It is also about finding answers to entirely new questions that arise from changes to the background conditions of life itself. As such, moral progress is about adapting to new and constantly changing realities. The world will look different in 100, 1’000 and 10’000 years. In the same way that I do not want to live my life today based on the cultural preconceptions of the 16th century, so do I not want future generations to be forced to live based on the cultural and moral conceptions of today.

So far, I have claimed that one of the reasons that we care about freedom is as a means for the unfolding of humanity’s collective potential. We discussed what we might mean by collective human potential, and why it appears of great importance for the processes that drive cultural and moral progress to continue. However, we have not yet discussed in much detail how it is that freedom is meant to protect and advance such progress. This is what we will tackle next.

Section 3: The role of freedom in the unfolding of humanity’s collective potential

So, what does freedom have to do with it? To explore this question, I will first introduce the idea that freedom acts as the engine of exploration, and then discuss how an engine that is based on individual freedom differentially selects its direction of progress on the basis of reason more so than power. We will see that this unfolding happens in an open-ended fashion, with no need to know its own destination before arrival, thereby providing some robustness against moral imperialism and related concerns. In making these arguments, I will help myself to insights from evolutionary theory, Austrian economic thought, and complexity science.

In Generalizing Darwinism to Social Evolution: Some Early Attempts, Geoffrey M. Hodgson (2005) writes: “Richard Dawkins coined the term universal Darwinism (1983). It suggests that the core Darwinian principles of variation, replication and selection may apply not only to biological phenomena, but also to other open and evolving systems, including human cultural or social evolution.” Borrowing from this idea of universal Darwinism, freedom can be understood as a socio-political institution that protects the production of ‘variation’ (in the form of ‘ways of life’, perspectives and ideas). In the analogy with Darwinian evolution, negative freedom can be understood as serving the purpose of preventing any small number of powerful actors (akin to, say, some invasive species) from taking over the entirety of the ecosystem, thereby locking in their singular ‘way of life’ (i.e. cultural norms, ideologies, moral perspective, etc.). Conversely, positive freedom and the fostering of capabilities allow individual actors or groups to properly actualize, explore and develop their own ideas, perspectives and ways of life. In doing so, these ideas get a chance to mature, be tested, be improved upon and - if they prove unsatisfactory or non-adaptive - be dropped.

The insight that exploration is crucial to progress has long been understood in economics and the philosophy of science. In his 1942 book Capitalism, Socialism, and Democracy, Joseph Schumpeter introduced the concept of entrepreneurship into modern economic thought and argued for its critical role in innovation. It is individual people who notice problems, apply their creativity to come up with solutions and take risks who become the ‘engine’ of capitalism. Friedrich Hayek (1945), also a scholar of the Austrian School of Economics, added a further piece to this picture by illuminating the epistemic function of the market. As opposed to a system of centralized economic activity, a decentralized market is able to exploit the local information of individual actors. These insights stand at the origin of the libertarian view that aims to protect individual (economic) liberty against state interventionism, with the goal of protecting innovation and the proper functioning of the ‘free market’. [8] Turning our gaze to the philosophy of science, we find a similar logic at play there. In science, where the goal is for the truth to prevail, freedom of thought allows for arguments to meet each other, and for the better ones to prevail (Popper, 1935). In society, individual freedom allows individuals to pursue their own way of life, thereby allowing humanity at a collective level to explore and compare a wider range of possibilities. Insofar as we consider the current set and configuration of ideologies, cultural norms, and political economies as unsatisfactory, it matters that humanity continues to explore new configurations.

The study of complex systems provides us with useful vocabulary for talking about this general pattern. What we see is how the emergence of macro-level dynamics (such as economic growth, scientific progress, cultural trends or moral progress) is driven by micro-level processes (such as the actions and ideas of individual human beings). We can ask, according to this view, what happens to the macro-level dynamics if we protect (or fail to protect) individual freedom at the micro-level. In this framework, freedom can be understood as a constitutive factor in the mechanism of exploration. In its absence, we risk not exploring widely enough, resulting in a premature convergence to some local optimum that represents a suboptimal set of views.

At this point, we start to glimpse the connection between the raison d’être of freedom as a means to the unfolding of individual potential and its raison d’être as a means to the unfolding of collective potential. Were it not for the first - if freedom were not able to provide the individual with a space in which they can authentically and thoughtfully explore their ideas and ways of life - then freedom could never fulfil its purpose of driving collective progress.

However, exploration alone is not enough. The question arises: according to what criteria do the sociological processes driving cultural and moral progress select among new options? This question assumes - correctly so - that these processes we have been talking about are not merely random walks. The process is guided by something - but that something emerges from a bottom-up, decentralized logic rather than being imposed on the system in some top-down fashion. There are some mechanisms that can be studied and understood, and maybe even influenced, that drive the direction of progress. The hope is - after all, this is a philosophical essay - that reason may be the overwhelming factor in selecting between new forms and configurations of cultural norms and moral beliefs. But is that hope justified?

I claim it is, and freedom itself is the key to why. In short, it is precisely this that freedom does: it lets reason come to the surface and prevail. We can find support for this view in the writings of Mill, whom we encountered earlier, as well as in those of John Dewey. Insofar as Mill recognises that freedom can be limited by internal coercive forces, he also believed that what it means to be free from those forces is for reason to prevail over passion, inauthenticity or false consciousness. Dewey provided a more detailed account of a similar idea, discussing how exactly it is that reason - on top of mere habitual reaction - interacts with human action and decision making. In What is Freedom?, Dewey explains how, according to him, “the only freedom that is of enduring importance is freedom of intelligence, that is to say, freedom of observation and of judgment exercised on behalf of purposes that are intrinsically worthwhile” (p. 39) and continues in saying that “impulses and desires that are not ordered by intelligence are under the control of accidental circumstances. [...] A person whose conduct is controlled in this way has at most only the illusion of freedom. Actually, he is directed by forces over which he has no command.” (p. 42)

If it is the case that freedom is about the triumph of reason, then what we do when we protect freedom is to make sure that reason will trump power in the evaluation of new cultural or moral ideas. Furthermore, at the collective level, the protection of individual freedom, and in particular of values such as tolerance, respect and free speech, creates a public arena in which different views and ‘ways of life’ can interact in a way that allows for the potential of mutual enrichment. This adds another element to the engine of moral and cultural progress.

One may object to this whole notion of moral progress that it risks being merely a cover for some form of moral imperialism, or that it implies (without making the case for) some notion of value monism. However, there is in fact nothing inherent in the logic of these mechanisms that requires them to converge on some singular view of the ‘right’ cultural norms, way of life or moral outlook. This is not what we see in biology, economics or science either. In fact, it is precisely freedom - acting as a sort of buffer - that constitutes a critical ingredient in the machinery of moral progress and that makes it compatible with value pluralism as discussed, among others, by Berlin (1958) and Sen (1981).

As such, the unfolding of humanity’s collective potential should be understood as an open-ended process. We need not know where the journey takes us as long as the journey is guided by a process with the right properties. In fact, given the epistemic position we find ourselves in by virtue of reasoning from the point of view of an evolved human brain, we ought not to attempt to form firm views as to where the process should end up, nor aim to direct it in a fully top-down manner.

Max Stirner (1972) critiques the idea of freedom as self-realisation, as the latter suggests some implicit and normative notion of ‘true’ human nature at which the process of self-realization is aimed. Stirner critiqued this notion as one that is always and necessarily culturally contingent and can act as a coercive force that is itself inherently at odds with genuine freedom. Freedom is reduced “to [some] kind of spectral ideal that always concealed deeper forms of domination”. (Newman, 2019, p. 156) A related argument applies to the notion of potential at the collective level. Whenever in human history a person or group was under the conviction that they knew where the journey is meant to take us, bad things ensued. On the one hand, such individuals or groups tended to believe themselves in possession of some sort of moral licence that would permit them to disrespect individual rights and freedoms. What is more, the attempt to steer this process in a top-down and centralized fashion, based on the assumption of having access to some sort of absolute moral truth, interferes with the functioning of the epistemic and social processes described at length above. If power dominates the equation, reason cannot prevail.

Importantly, the proper functioning of this process does not require that nobody ever forms and defends their own views as to what makes for a good life. On the contrary: the engine of collective progress lies exactly in individuals exploring different views and attempting to find the strongest version of, and argument for, why their perspective is a valuable one to be included in the overall picture. The argument comes down to the following. Given our lack of moral omniscience, we should not try to steer the processes of moral and cultural development in a top-down manner. Furthermore, the efficient and robust functioning of the processes of decentralized information gathering and processing - be that in economics, biology, science or society - is contingent on the integrity of the conditions of operation. In the case of cultural and moral progress, freedom captures a large part of what is required for the process to run effectively. It is as such that freedom is a critical ingredient in the unfolding of humanity’s collective potential.

Conclusion

In this essay, I set out to investigate the question: why do we care about freedom? I identified three possible answers to this question: freedom as a protective factor against oppression and authoritarianism, freedom as an enabler of individual self-realization, and freedom as the enabler of moral and cultural progress and the unfolding of humanity’s collective potential. I have situated the discussion on these reasons in the larger discourse on the nature of freedom and within the history of ideas. I then zoomed in on one reason for caring about freedom specifically - the unfolding of humanity’s collective potential. To do so, I clarified what I mean by humanity’s collective potential, before exploring the mechanisms via which freedom is key to enabling its unfolding. I argued that freedom allows for the production of variation among moral and cultural views, and that, in strengthening freedom - in particular freedom as conceived of by Mill and Dewey -, we strengthen the extent to which reason is the primary factor guiding the differential selection between different cultural and moral views. To make these points, I helped myself to insights from evolutionary theory, economic theory and complexity science.

Last but not least, let me reiterate the reasons why we should care about understanding the processes that guide the human trajectory. We can look at the past, notice the pattern of an expanding moral circle and infer that we are likely ignorant about yet many more issues of moral concern. Or we can look into the future, noticing its potential vastness and anticipating an ever-changing reality that will keep throwing new moral and social questions at us. As humanity expands its range of capabilities through technological progress, we are increasingly able to affect the fate of things across ever larger spatial and temporal horizons. It may thus be high time for us to start a collective conversation about how it is we choose to wield these capabilities. The moral stakes are high. In that, I see reflected our third - if not the strongest - reason for wanting to live in a society that protects and fosters individual freedom, for the sake of us all, and for the sake of future forms of life.

Nora Ammann, Spring 2022, New College of the Humanities

Footnotes

[1] See Rousseau’s The Social Contract (1762) and Marx’s “On the Jewish question” (1843), among others

[2] This is a reading suggested by Quentin Skinner in his 2013 lecture “A Genealogy of Freedom.” at Queen Mary College, University of London.

[3] See, among others, Sen’s “Equality of What?” (1979) and Nussbaum’s Creating Capabilities (2011).

[4] In fact, many ancient Greek thinkers, including Aristotle, shared a similar view of human nature, even if they may not have theorized human nature in relationship to freedom specifically.

[5] For example, the four Geneva Conventions (1864, 1907, 1929, 1949), the establishment of the Red Cross, conventions for the limitation of biological, chemical and other weapons of mass destruction, international laws and courts for the prosecution of war crimes, etc.

[6] For examples of such a bird’s eye view on human history and development, see for example Harari, 2014 or Pinker, 2011.

[7] For examples of cutting-edge research in the life sciences supporting such a view of future possibilities, see:
Levin, M., “Technological Approach to Mind Everywhere (TAME): an experimentally-grounded framework for understanding diverse bodies and minds”, Frontiers in Systems Neuroscience (in press), 2022.
Levin, M., "Synthetic Living Organisms: Heralds of a Revolution in Technology & Ethics", Presentation at UCF Center for Ethics, 2021

[8] Whether or not the libertarian argument provides sufficient grounds to justify the convictions and political demands of right-wing libertarianism is a separate question, which I do not take up here.

References

  1. Arendt, Hannah. “What is freedom?” in Between Past and Future. New York: Viking Press. 1961.

  2. Beckstead, N., Greaves, H., Pummer, T. “A brief argument for the overwhelming importance of shaping the far future.” in Effective Altruism: Philosophical Issues. 2019. 80–98.

  3. Berlin, Isaiah. “Two Concepts of Liberty.” in Four Essays On Liberty. Oxford: Oxford University Press. 1969. 118–172.

  4. Dawkins, Richard. “Universal Darwinism.” in Evolution from Molecules to Men, edited by D. S. Bendall. Cambridge: Cambridge University Press. 1983. 403–25.

  5. Harari, Yuval Noah. Sapiens: A Brief History of Humankind. Random House. 2014.

  6. Hay, William H. “John Dewey on Freedom and Choice.” The Monist. 48:3. 1964. 346–355.

  7. Hayek, Friedrich. “The Use of Knowledge in Society.” American Economic Review. 35:4. 1945. 519–30.

  8. Hobbes, Thomas. Leviathan, edited by Ian Shapiro. New Haven: Yale University Press. 2010.

  9. Hodgson, Geoffrey M. “Generalizing Darwinism to Social Evolution: Some Early Attempts.” Journal of Economic Issues. 39:4. 2005. 899–914.

  10. Levin, Michael. “Synthetic Living Organisms: Heralds of a Revolution in Technology & Ethics.” UCF College of Arts & Humanities. November 2, 2021. Talk at a speaker series (“Ethically Speaking”) run by the UCF Center for Ethics.

  11. Levin, Michael. “Technological Approach to Mind Everywhere (TAME): an experimentally-grounded framework for understanding diverse bodies and minds.” Frontiers in Systems Neuroscience. 2022. (in press)

  12. Locke, John. An Essay Concerning Human Understanding, edited by Peter H. Nidditch. Oxford: Oxford University Press. 1975.

  13. MacAskill, W., Bykvist, K., Ord, T. Moral Uncertainty. Oxford: Oxford University Press. 2020.

  14. Marx, Karl. “On the Jewish Question.” in O'Malley, J. (Ed.). Marx: Early Political Writings. Cambridge: Cambridge University Press. 1994. 28–56.

  15. Newman, Saul. “‘Ownness created a new freedom’ - Max Stirner’s alternative conception of liberty.” Critical Review of International Social and Political Philosophy. 22:2. 2019. 155–175.

  16. Nussbaum, Martha. Creating Capabilities. Cambridge, MA: Harvard University Press. 2011.

  17. Pettit, Philip. A Theory of Freedom. Cambridge: Polity Press. 2001.

  18. Pinker, Steven. The Better Angels of Our Nature: The Decline of Violence in History and Its Causes. New York: Viking Books. 2011.

  19. Popper, Karl R. Logik der Forschung - Zur Erkenntnistheorie der modernen Naturwissenschaft. Vienna: Verlag Julius Springer. 1935.

  20. Rousseau, Jean-Jacques. “The Social Contract.” in Gourevitch, Victor (Ed.). The Social Contract and Other Later Political Writings. Cambridge: Cambridge University Press. 1997.

  21. Sen, Amartya. “Equality of What?” in McMurrin (Ed.). Tanner Lectures on Human Values. Cambridge: Cambridge University Press. 1979. 197–220.

  22. Singer, Peter. The Expanding Circle. Oxford: Clarendon Press. 1981.

  23. Skinner, Quentin. “A third concept of liberty.” Proceedings of the British Academy. 117. 2002. 237–268.

  24. Skinner, Quentin. “A Genealogy of Freedom.” Queen Mary College, University of London. 2013. Lecture. Last accessed: 04/2022. https://www.youtube.com/watch?v=yfNkA2Clfr8&ab_channel=NorthwesternU

  25. Stirner, Max. Der Einzige und sein Eigentum. Stuttgart: Philipp Reclam. 1972.

  26. Taylor, Charles. “What’s wrong with negative liberty.” in Philosophy and the Human Sciences: Philosophical Papers. Vol. 2. Cambridge: Cambridge University Press. 1985. 211–29.

  27. Williams, Evan G. “The possibility of an ongoing moral catastrophe.” Ethical Theory and Moral Practice. 18:5. 2015. 971–982.

Spring 2022


(partially retracted) The role of tribes in achieving lasting impact and how to create them

Co-authored by Konrad Seifert and Nora Ammann

Cross-posted on the EA Forum

Edit (July 2022): I (Nora) no longer fully and unambiguously endorse all of the framings and contents of this post. While I do not regret writing it and still consider many of the ideas presented useful, in some respects my thinking has moved on.

TL;DR

To bring about grand futures, we humans have to figure out how to reconcile our current needs with our lofty ambitions. Tight-knit support communities - what we call tribes in this post - seem to be a good way to preserve our well-being and values while achieving more impact. Yet, building effective tribes seems like a relatively neglected puzzle in the life plans of many people who wish to improve the world - or at least one that would benefit from more collective model-building and coordinated experimentation.

In this post, we outline our current models for modern-day tribe building. We hope to initiate an exchange on the topic, motivate others to look into this, too, and achieve more together.

Introduction

About this post

Coordinating with other humans is key to achieving lasting impact. Coordination helps us grow our well of common knowledge, build things, become better humans and create more value for the world than we could on our own. 

Humans have developed myriad forms of coordination. This post focuses on one specific form: tribes. 

As we are developing our own modern-day tribe, we have received many questions with respect to how we got to this point and how we’re moving forward. To get feedback and inspire others, this post outlines our current models of how to find like-minded individuals, build trust, establish norms, get stuff done, commit long-term and adapt to changing circumstances. We will also discuss some common challenges. Some of the discussed ideas are generally useful for all types of relationships, e.g. getting more out of your friendships, organizing a community, or building an organization.

Our models are largely based on our experience with community building within Effective Altruism. We have also invested a lot of thought and resources into achieving new levels of positive-sum dynamics among our close friend group. As you will see, many of the ideas are inspired by - or blatantly copied from - others. We link to resources throughout the post and end with a section listing those that we have found particularly useful.

What is a tribe?

By tribes, we refer to what is essentially a tight-knit support community. Members of a tribe have shared goals, values and interests. But that doesn’t yet capture all of why we are interested in tribes over other types of communities. Beyond the shared interests (which is something we also find in firms, unions or clubs, for example), a tribe is characterized by the gifting of one's resources to the community. Resource sharing, paired with close personal bonds and autonomy, seem like a key combination of features for strong and sustainable coordination. 

To clarify further what we do and don’t mean by tribes, let’s consider two axes along which we can categorize different types of social groups: size and logic of reciprocity. 

Tribes are small enough that each individual is able to maintain a meaningful personal relationship with every other member. This is opposed to larger groups where personal relationships between all members can no longer be maintained. Literature in sociology and political science refers to the latter as an “imagined community”. Nations, firms, or communities such as EA or the global scout movement are examples of this.

Members of tribes share their resources quasi-unconditionally with the rest of the tribe. This is in contrast to the direct and formalized tit-for-tat reciprocity inherent to, say, club membership or simple economic contracts. Marriage is likely the most widely understood example of such unconditional support made possible through hard-to-fake signals of complete buy-in. We think there are benefits of similarly intense mutual support beyond the nuclear family. 

At the far end of this spectrum we would find pure, self-less altruism. Just short of that are examples of coordination where the idea is to achieve mutual benefit, but the "exchange of value" happens in a diffuse and timeless manner. Members of a tribe expect to benefit from investing into the tribe - and they might decide to leave if those benefits never manifest - but it matters little to them when and how exactly they receive that benefit. 

[Figure: What is a tribe]


Why tribes matter

Humans have been coordinating for millions of years. From hunter-gatherer societies to nation-states, from soccer teams to modern supply chains, from marriages to organizations like the Scout Movement with an estimated 50 million members worldwide. 

It is not a coincidence that humans have developed ever more complex forms of coordination: it unlocks positive-sum dynamics that realize more and more of our potential. From science to engineering to entrepreneurship: cooperating, predictably and repeatedly, allows for specialization and increases synergies.

That said, historically speaking, tribes do not seem to be key drivers of progress. The sustainability of hunter-gatherer tribes who have survived until today, like the Tanzanian Hadzabe, seems in part due to never having experienced explosive population growth. Absent grand narratives of an afterlife or their future potential, they seem to live simple, content lives despite what could easily look like hardship from the outside.

Furthermore, despite economic growth making just about everything better, it also comes with its own costs. Leaving aside, for the purpose of this post, the climate crisis, humans have not fully adapted to modern society. The value of living as a part of modern society is indisputable, yet some aspects of it make it harder to live healthily because of the lack of supportive, tribe-like structures.

Humanity has documented a lot of the knowledge relevant for adapting to a changing world. But knowledge is only useful if it can be acted upon. It seems plausible that we struggle more with mental health issues nowadays because our lifestyles and environment have changed drastically. We have forgotten - or are unable to enact - important lessons about parenting because that knowledge was implicitly embedded in now-disrupted customs. Our rapid progress has its costs but, luckily, those seem reparable and the benefits lasting.

Thus, it seems valuable to invest more effort into consciously engineering our social structures to help one another thrive and improve the world. Building modern-day tribes by combining the best of all worlds - from ancient social structures to modern tech - seems like a strong bet.

Highly functional coordination does not come for free. It usually requires a lot of upfront investment, with uncertain payoff and continuous maintenance costs. What are the benefits of coordination for individuals like you or me? (When) is it worth it?

Benefits from building a tribe, as we have experienced them so far, are:

  • A sense of community, belonging, and emotional support

  • Access to more knowledge and skills (you can develop expertise in only so many areas, only so quickly)

  • More data and different perspectives (think: collaborative truth seeking, e.g. more productive “double cruxing”, or gaining access to “different worlds”)

  • Access to feedback and sanity-checks (those tend to increase in value over time, as the others gain more context on you)

  • Access to more resources (a lot of things become cheaper if shared, e.g. co-living, and you more easily surpass relevant thresholds of resources, e.g. to start a company)

  • Ability to take more risks individually because a collective can provide a better safety net

All these translate into an increased ability to achieve our goals.

The balance between what you invest and what you get out of being in a tribe depends on the specific case. One thing worth keeping in mind, however, is that the pay-off compounds over time. As you build knowledge about each other, your ability to trust, support, and provide growth-inducing feedback increases. A lot of the investments in bridging gaps, building a shared map of the world, and developing a shared vocabulary really only start to pay off after some time. Chances are we underestimate just how good the best case scenarios are in the long run.


Why do we post this on the EA Forum?

The role of tight-knit personal networks beyond the nuclear family is potentially very relevant for people who dedicate their lives to doing good effectively. The contributions of personal support networks to happiness, risk perception, and personal and intellectual development are important impact factors in EA life planning. But they seem relatively underemphasized compared to, for example, considerations regarding career capital.

The EA network cannot and should not be anybody’s main support network. Given its ambitions, it luckily has well surpassed the size where meaningful personal relationships with each member can be maintained. It has become an “imagined community” (see above) - tied together by shared ideas rather than shared relationships - even at the local level. This is not a critique of EA, nor do we suggest that EA plays no role in creating an environment conducive to people's wellbeing, mental or otherwise. This is just to say that even if you’re well embedded in the EA network, your family and friends still matter a lot. Of course, (some of) your friends may in fact be part of the EA network, but it is they who form your support network, not the network itself.

EAs actively build their professional networks and discuss their mental health. But effective networks and life satisfaction also benefit tremendously from a stabilizing, local anchorage. This is in part the reason why the SF Bay Area and London-Oxford-Cambridge are strong community hubs: the number of aspiring EAs is high enough for people’s personal support networks to largely overlap with the EA network. Elsewhere, you’re less likely to find people who focus on ambitious world improvement and whom you can imagine starting a tribe with. Thus, building EA-minded support structures outside of such hubs requires more work and has a higher chance of failure.

Given this context, we argue that it seems valuable to invest more effort into understanding the intentional development of modern tribes. They can provide stability through anchorage, long-term planning horizons, and personalized support. When it comes to big projects - from kids to companies - high coordination capacity (e.g. trust, reliable information exchange, long-term planning horizon) seems critical. And that capacity doesn’t come out of nowhere - it’s the product of years of interaction and conscious cultivation. 

Having shared a quick overview of what is motivating this post, let’s move on to the “how-to” of tribe building. 

This post is structured along “stages of development” - five steps on how to build a tribe. 

  1. Identification: find and be found

  2. Communication: increase bandwidth

  3. Cooperation: create value

  4. Reification: make your group “a thing”

  5. Adaptation: fend off the dark forces to preserve the essence of your thing


Even though the process is not necessarily linear, this 5-step abstraction can be useful for understanding where your group is at and where it’s headed. 

We’ll discuss each of these stages in turn and close with key challenges groups might face throughout the process. 

“Stages of development” (not actually linear)


I. Identification

Find and be found

Before you can coordinate, you need to find people to coordinate with. This is hard but there are ways  to significantly raise the odds of encountering great humans. It’s obvious that proactively shaping your social circle brings many benefits. 

One strategy to make it easier is to make yourself discoverable by the kind of people you want to be with. 

  1. Put yourself out there. Stand for what you care about and are interested in. For example, when meeting new people, speak about your interests and what you are passionate about. Or, be the one to organize the types of gatherings you want to attend. The more you do so, the more likely it is that people with similar values and interests will gravitate towards you. Once others know your interests, they can refer like-minded people to you, too.

  2. Be enjoyable to be around. More specifically, be enjoyable to be around for the type of people you want to have in your life. Some people enjoy intense conversations about vague theories, others prefer getting to know people while building things, exploring nature, or making music. Figure out what you like (or want to like), then act accordingly.

  3. Be in, or create, the places where you’re likely to meet interesting people. If you’re interested in bouldering, find out where the local bouldering scene hangs out and go there. If it turns out such a place (whether physical or virtual) doesn’t exist yet, create it.

By taking these steps repeatedly, you will eventually stumble into like-minded people.

You might be worried about coming across as self-promoting. It can be hard to strike a balance between talking about your interests and actively listening to others. But over time, with trial and error, you'll develop a sense of how and when to talk about your interests.

There is a fundamental tension at this stage. On the one hand, you want to set a high bar for people you spend a lot of time with. On the other hand, getting to know people can take time. It is hard to tell early on whether a person is someone you’d want to hang out with more. Here are a few heuristics you may find helpful: 

  • Don’t expect to meet the perfect match; if you see potential, be willing to embark on a joint journey of helping each other become the best versions of yourselves.

  • A priori, your instinct/gut-level impression is probably pretty well-calibrated on who you will get along with.

  • If you’re not sure, the value of information in getting to know somebody better is likely pretty high.

  • Someone’s willingness and ability to critically self-reflect, to learn from feedback, and their propensity to communicate transparently tend to be particularly good predictors.

II. Communication

Increasing bandwidth between brains

Communication plays a fundamental role in any relationship, and it will play a fundamental role in any tribe. Good communication can allow you to avoid collective action failures. You want to set good communication norms from the get-go. 

Communication norms

Communication norms are a pervasive factor influencing the functioning of your group. They are an important part of a culture and a means of shaping it. The culture influences people’s expectations and priorities and determines which behaviors are acceptable and which ones aren’t. For example, communication strengthens or weakens shared norms around the good faith principle (/Hanlon’s razor), around proactively or reactively saying the truth, around what it means to be a robust agent, what it means to commit, how to break a promise, the importance of self-care or the value of exploration. 

Communication norms are also critical to a tribe's ability to reason in a way that systematically improves its understanding of the world. Collective sensemaking relies in large part on language as the interface between people. This is why communication norms and epistemic norms are closely linked. Norms such as indicating one’s confidence in a belief, being specific, stating one’s cruxes, sharing raw impressions as well as “all-things-considered” views are important pillars to a tribe’s epistemic hygiene.

Based on our experience, we recommend the following communication norms. They might not be the right norms for every type of tribe, but they have proven valuable for us:

  1. Paraphrase each other, ask for clarifications and steel-man other people’s takes

  2. Make your predictions explicit (e.g. by remembering to make your beliefs pay rent and by making bets) and work on belief calibration

  3. Make your assumptions explicit, as much as possible

    1. For example, you might want more spontaneity from your friend (because to you, being up for doing things together spontaneously is a sign that they value you highly), while your friend wants to plan events upfront and put them in a shared calendar (because to them, putting in the effort to plan doing nice things together is a sign of mutual appreciation).

  4. Don’t agree to disagree, learn to have constructive disagreements

    1. To some people, disagreements feel like conflict; they don’t have to be, and avoiding disagreements comes with a high price. It doesn’t have to feel bad to figure out where you might be wrong if you learn to disentangle your self-worth from your belief system.

  5. Encourage pre-mortems and red teaming - it doesn’t have to be nay-saying, it can just be about increasing your chances of success

  6. Reward transparency and integrity, usually by leading by example

Two books that provide further tips for a constructive communication culture are Messages: The Communication Skills Book and Nonviolent Communication: A Language of Life.

Communication bandwidth

The notion of “bandwidth” captures a number of useful intuitions for what is important with regards to communication. The higher your communication bandwidth, the more information you can exchange per unit of effort/time. If you care about your tribe’s ability to make sense of a complex world, a high or constantly growing communication bandwidth is crucial. 

Shared vocabularies and common knowledge play an important role in increasing communication bandwidth. So does trust (more on this later on). Here’s an example of how a lack of communication bandwidth can often lead to coordination failures: 

Let’s look at the stag hunt scenario, where you are part of a group of hunters. You have the choice to either hunt a rabbit or a stag. 

  • The rabbit promises a small but certain reward, because you are not dependent on the other hunters in hunting the rabbit.

  • The stag promises a much larger reward, but you will only be successful in your mission if everyone else in the group also chooses to hunt the stag.

[Figure: Stag hunt]

What can you do to foster cooperation? And when it fails, how can you mitigate the negative effects of the failure for the group’s ability to coordinate in the future? 

The stag hunt scenario helps us see that, sometimes, uncooperativeness looks like defection when, in fact, the person “defecting” acted rationally according to their understanding and incentives. I.e. choosing rabbit over stag does not have to imply the conscious abandonment of an allegiance or duty for the sake of personal gain. 
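To make this concrete, here is a minimal numerical sketch (the payoff values and the simple expected-value rule are illustrative assumptions, not part of the original scenario): a hunter who is unsure whether the others will commit to the stag can rationally prefer the rabbit, without any intent to defect.

```python
# Illustrative stag hunt sketch; the payoff numbers are hypothetical.
# A hunter compares the certain rabbit payoff with the expected stag payoff,
# given their belief that all other hunters also commit to the stag.

RABBIT_PAYOFF = 1.0  # small but certain
STAG_PAYOFF = 4.0    # large, but only realized if everyone cooperates

def best_choice(p_others_cooperate: float) -> str:
    expected_stag = p_others_cooperate * STAG_PAYOFF  # the hunt fails otherwise
    return "stag" if expected_stag > RABBIT_PAYOFF else "rabbit"

for p in (0.1, 0.25, 0.5, 0.9):
    print(f"trust = {p:.2f} -> hunt {best_choice(p)}")
# With these numbers, the stag only becomes the rational choice once trust in
# the others exceeds RABBIT_PAYOFF / STAG_PAYOFF = 0.25.
```

The exact numbers don’t matter; the point is that below some level of trust in the group, “rabbit” is simply the reasonable choice.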

There is an important difference between “being compelled to act in a certain way because, according to one’s honest assessment, it seemed to be the correct choice” and “consciously choosing to ignore potential shared gains or willingly inflicting harm on others by choosing the option that seemed narrowly more profitable to oneself”. 

The stag hunt example can teach us about the importance of being able to understand, account for, and communicate each others’ perspectives on a collective action problem. Figure out why someone did what they did before you assume they defected. If your friend “defects” on your suggestion, it can be that you have not yet earned their trust. If you haven’t bothered understanding whether your idea was a viable option for them, you also have work to do. 

The stag hunt dynamic is also described in this post (see “2. Defection and discontent”). Also check out “Some Ways Coordination is Hard” for more thoughts on how to navigate stag hunt-type situations.

III. Collaboration

Working together to become better at coordinating beyond work; weave the social fabric 

As much as thinking about how to improve your group is valuable, never forget to also get things done that you are all excited about. 

Consider: in order to become a world-class rowing team, the single most important thing you need to do is practice rowing together. 

In the same way, a group can become excellent at coordinating through… coordination. Collectively working on a concrete project trains the “coordination muscle” of your group.

Working together can also be an extremely enriching and sobering experience. It is enriching because people learn, gain traction, motivation and a sense of meaning. It’s sobering because it provides a real-world test to the quality and strength of the coordination fabric your group has been building, stripping away the relative comfort of theoretical discussion. Shared experiences also contribute to common knowledge, which can implicitly improve your ability to communicate with each other (see section 2.).

Trust

The purpose of collaborating is twofold. For one, it is to build trust in each other. 

Trust is essential for coordination. Importantly, trust is built over time, incrementally, rather than gained at a single moment. It’s a sequence of updates, rather than an immediate understanding.

The second purpose of collaboration is to build trust in your coordination mechanisms.

Coordination mechanisms include:

  • formal processes (e.g. weekly meetings, shared task management systems and calendars, group decision-making procedures)

  • informal processes (e.g. norms about checking in at the start of a meeting, about expected response times to messages, or about whether and when to inform others about not making a project deadline).

These mechanisms affect communication, decision-making, work and resource allocation, collective memory and more - virtually all processes a functioning tribe is interested in. 

Why establish trust in these mechanisms? What is important about them?

Let’s take communication norms as an example. Say, our tribe came to think that a norm of transparency is desirable. All else equal, you will have an easier time being transparent with the tribe members if you trust that they, too, are transparent with you. In other words, the more you trust that the norm of transparency is well-established, and the more this trust is common knowledge among the tribe’s members (meaning: others trust that others trust that…), the stronger your incentive to live up to the norm yourself. 


This echoes insights from the stag hunt mentioned earlier. Within that setup, the more you trust that the other hunters will stick to their commitment of hunting the stag, the less subjectively risky it is for you to stick with your commitment to hunting the stag. From the perspective of the tribe, your ability/willingness to take subjective risk translates to a tribe’s ability to “take leaps” - to aim for ambitious goals despite the possibility of failing. The stronger the fabric of trust within a tribe, the better it becomes at coordinating on “stag” over “rabbit”, even if the risk differential increases. 

Deciding to “hunt stag” when doing so is a risky move from my individual perspective is a leap of faith. Whether this is a smart move (or a really dumb one) depends on whether the other people also decide to “hunt stag”. But we shouldn’t just expect people to put their faith in others, or expect it of ourselves. Trust is built and earned. At the same time, as a tribe, you want to clearly expect from one another that people choose “cooperate” over “defect”. This expectation in itself, if sensibly applied, is a coordination mechanism. 

A distinct, but related, way of thinking about developing strong coordination is as the result of an ever-escalating dance of increasing asks and rewards from and for everyone involved. It’s a process where people put in increasing amounts of effort and get out increasing amounts of value. We encourage readers interested in this alternative framing to check out section 1 (“Buy-in and retention”) of this post.

To sum up, a group becomes more like a tribe by actively practicing collaboration. Seek collaboration to forge the trust that permits coordination to arise. 

IV. Reification

Make your group “a thing”

Until this point, you have been building a group of people who are coordinating, building trust in each other, strengthening coordination mechanisms, norms for communication, interaction and collective epistemics. 

If everything is going well, it can be valuable to reify your achievements — start explicitly thinking of your group as a unit.  

This can help you continue the streak.

What does reification involve? The answer will vary depending on your circumstances. The examples below can help you get the gist. As a part of reification you can:

  • Define, track and periodically evaluate shared goals, whether they relate to building a product, training skills, or simply discussing the amount of time you want to invest in interacting with each other.

  • Plan weekly meetings, weekly (rotating) 1-1s, regular personal retrospectives/personal development/feedback sessions, or quarterly retreats.

  • Give yourself a name (like group houses do)

  • Think through mechanisms for communication, systematize information sharing and collective memory retention in a way that suits your purposes.

  • Increase the investment in whatever institutions you have already established; e.g. you could decide to put more time into challenging each other’s personal development or working on shared projects.

  • Consider living together. Being closer to each other is generally valuable, whether that means living in the same building, quarter or city. It increases serendipity and decreases the cost of interaction.

  • Define long(er) term commitments, be that writing a blog together, starting an organization, supporting each other in raising families, etc.

  • Figure out how to handle somebody moving away (temporarily) or new potential members showing up.

The specifics depend on your tribe’s goals and circumstances. We recommend regularly reviewing the purpose of your tribe. What value are people getting out of it? What value would people ideally like to get out of it? Keep your tribe alive and purposeful by making sure you regularly and collectively ask and answer the questions that matter.

If successful, explicit and credible long-term commitment by everyone will create  a shared understanding. Your tribe will be perceived as something worth investing in and something that will yield further benefits. 

You can complement the core idea of reification with the notions of (1) co-ownership, and (2) a minimum viable set of coordination mechanisms. 

Co-ownership

People don’t respond as well to principles that have been externally imposed on them. They are more likely to enact principles that are “theirs”. If, in their mind, an action is clearly linked to something they already care about, it becomes much easier to take the action. 

This is important to keep in mind during the reification phase. Everyone should be involved in establishing your tribe's goals and values, and deciding how you coordinate with each other. True belonging is always the product of co-construction. This requires mechanisms that: create buy-in, ensure people are and feel heard, facilitate collective decision making and  error-correction.

If your tribe is relatively small and highly value-aligned, the creation of co-ownership likely won’t require the introduction of new formal structures. The larger the group, the more beneficial formalizing coordination mechanisms tends to be. 

This retrospective of a high-commitment co-living experiment contains several valuable lessons on this matter.

Minimum viable set of coordination mechanisms

We’ve mostly talked about increasing commitment and input. At this point, you might think to yourself “hell, this sounds like a lot of work! Even if the idea is cool in theory, I’m not sure I could invest that much effort.”

We hear you! As much as having a tribe can be rewarding, it also needs to be practically viable to upkeep, else it won’t last long enough to produce the juiciest rewards.

Aiming for a set of minimal viable coordination mechanisms is particularly helpful in making the project last. It’s surprising how much you can achieve if you’re well organised—even without putting in a lot of effort, and with people living in different countries. Weekly meetings, weekly 1-1s, goal tracking and evaluation, quarterly retreats can already sound like a lot. However, there are three reasons why it requires a lot less effort than commonly thought:

  • you can share the workload with other tribe members

  • many processes can be automated and systematized

  • you learn to do these things better and more efficiently over time


Here are a few examples:

  • If you run a weekly meeting with a handful of people, you can rotate who is doing the bulk of preparation. This means any one person will only be responsible for preparing the meeting approximately once a month (a minimal rotation sketch follows this list).

  • You can create templates, checklists and automated reminders for recurring structures or tasks (e.g. reminder for who is responsible for preparing the next weekly meeting, a (baseline) retreat timetable, templates for personal retrospectives or feedback templates, a spreadsheet where you collect and score discussion topics for 1-1s or discussions).

  • After having organized a first event of a specific format (e.g. the first retreat), subsequent iterations will likely require 50% or less of the initial preparation effort. By solidifying how to run certain aspects of your activities, you also free up resources to experiment with other aspects to keep improving how much your group is getting out of it.
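As a small illustration of how cheap such systems can be, here is a minimal, hypothetical sketch of the rotation idea mentioned above (the member names and the weekly cadence are assumptions made purely for the example):

```python
from datetime import date

# Hypothetical round-robin rotation for weekly meeting prep.
MEMBERS = ["Alice", "Bob", "Chana", "Dana"]  # example names, not real members

def prep_owner(meeting_date: date, members=MEMBERS) -> str:
    """Rotate responsibility by ISO week number, so each member prepares
    the meeting roughly once every len(members) weeks."""
    week = meeting_date.isocalendar()[1]
    return members[week % len(members)]

print(prep_owner(date(2022, 3, 7)))  # prints whoever's turn falls in that ISO week
```

Hook something like this up to an automated reminder and nobody has to remember whose turn it is.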

Over time, you learn more about what works for your group and what doesn’t. It becomes easier to run minimal-viable versions of the activity that still provide a lot of value. This gives the group slack in case you are going through a particularly busy time. Conversely, when you have more time at hand or are particularly motivated, you can make additional investments and run more innovative or polished versions of your activities. 

For a community to remain healthy, it is crucial to periodically “prune” its structures—removing systems that no longer support the purpose of the tribe. You want to  preserve the essence, while getting rid of the frills. Good pruning mechanisms also create more space for innovation—you don’t risk getting stuck with useless systems that stick around due to inertia and weigh you down. 

It takes time to clarify the core values of your tribe. This effort will make it easier to decide which systems to keep and which ones to remove in your pruning process.

V. Adaptation

Success is not a function of zero failures, but one of resilience and learning in the face of failure.

We've shared a lot of advice on how to make your tribe strong and robust. But, even if you follow all of it, from time to time, your coordination efforts will still fail.

That’s okay.

A successful tribe is not one that never fails at coordinating. It’s one that is antifragile—one that has a set of proven-to-be-robust mechanisms to handle failures.

Antifragile groups are able to learn from coordination failures, as opposed to letting them shatter the attempt at building a tribe. A group that has no tolerance for miscommunication and coordination failures is unlikely to last for long, let alone grow and improve over time. The real world is too unpredictable and noisy for this strategy to be successful. 

An antifragile tribe will even see occasional friction as something desirable. It's a mechanism that helps build the skills involved in conflict resolution and confidence in the group’s ability to “figure things out”. Keeping the locus of control within the group will make it grow stronger over time. 

To strengthen your tribe's ability to grow from coordination failures it's worth paying attention to the following areas: 

  • Conflict management: coordination failure can be caused by or lead to interpersonal tensions. It is thus necessary to have some capacity to constructively deal with conflicts and, where possible, resolve them. Non-violent communication (NVC) seems like a good starting tool for raising the tribe’s ability to manage conflicts.

  • Learning: transform failures into lessons that trigger growth and make you stronger.

    • For example, you can use meetings or retreats to discuss whether the tribe is providing the value you are looking for. Regularly ask yourself what is working for the group and what isn’t. Keep track of your collective learnings and set up systems that remind you of them from time to time.

    • An important part of learning is the ability to confront and be confronted on issues that might require improvement. Neither making nor receiving confrontations is easy, but it’s a skill that can be practiced. And if not with your tribe, with whom then?

  • Trust in the meta-level: Are you 100% confident in your tribe’s ability to have constructive conversations and avoid talking past one another? No. And you don’t need to. Instead, it suffices to be confident that someone will notice when you are talking past each other and that you will be able to figure out why this happens and fix it. Similarly, you cannot be absolutely certain that the way your tribe pursues its goals is a good one. But, you can be pretty confident that flaws will eventually be recognized and the approach updated. This is a collective version of confidence all the way up.

  • Holding values/standards high: adaptation and learning are important, but they're not the same as giving up on your principles or values in response to the slightest signs of resistance. You also have to maintain your values and preserve the essence. Just because a coordination attempt around a given standard failed doesn’t immediately mean you will want to drop that standard altogether.

Last but not least: as the world around you changes, your tribe, too, needs to change.

Tribes are living organisms. Even if nobody joins or leaves, members of the group change and the world around the group changes. Your rules, norms and goals will evolve. This can be great, because you are learning more and the updated rules, norms and goals move you closer to what you care about. The tricky bit is to find the right balance between innovative and preserving forces. The key, again, is to have processes that guide the development of the tribe while preserving its essence - whatever that is.

To sum up this section: groups need to be adaptive, because success is a function of how well you’re able to adapt to, and become stronger within, a changing world.

Challenges

Let’s take a final look at some challenges a group might face when going through the above process. 

We will keep this section short, in part because we don’t add much new content but rather summarize some of our key points through the lens of common challenges. These are dynamics to look out for. 

(A number of these challenges have also been described by Duncan in Open Problems in Group Rationality. ) 

Defection vs miscommunication

Defection happens, but if you selected your friends on the basis of shared values, genuine defection is pretty rare.

The most common cause of perceived defection is miscommunication. As illustrated by the stag hunt example in the section on communication, it is easy to misunderstand others. Internal lives are complex. We all, to some extent, live in “different worlds”, have different priors, and are subject to our personal narratives. Individual context shapes our perception of the option space and what the “payoff matrix” looks like. 

It is useful to learn to decouple the feeling that someone defected on you from your immediate course of action. As long as you don't have strong evidence that you should abandon the good faith principle, the more promising option is likely to cooperate again, not to retaliate. (Nicky Case’s “The Evolution of Trust” is an excellent illustration of this point, in particular section 6 “Making Mistakes”). In a world where genuine mistakes and misunderstandings are possible, forgiveness is not some lofty virtue, but a solid strategy for success.

The real-world version of “cooperate again” often means talking about what happened: explaining how you interpreted someone’s behaviour or clarifying how you feel. Nonviolent communication offers a framework to have these types of conversations without further escalating the dynamics.

Making the discourse about defection vs miscommunication explicit, and keeping in mind the difference between “feeling defected on” and “being defected on”, is helpful. It requires striking a tough balance: you’ll need to acknowledge each other's differing subjective experiences without giving up on the idea of there being an objective reality. Striking this balance is what allows you to help each other become stronger. Then, collectively, you can strategize about how to handle similar situations better in the future. 

Minor grievances

Another common way in which a lack of communication can lead to defection is through the accumulation of minor and repeated grievances that remain unvoiced and unresolved. This is how you end up with each person maintaining a list of unreturned favours or sacrifices that weren’t properly acknowledged. The disputes that eventually erupt from such dynamics are particularly hard to resolve—everyone feels in the right and defensive, making it hard for anyone to break the cycle. Proactive and transparent communication can preempt these dynamics from escalating and is therefore an important norm. 

Adopting a culture of transparent communication where grievances are voiced proactively has its own pitfalls. A common one is that communication can become entrenched in being subtly negative or adversarial. To prevent this from arising, it can be helpful to make sure your tribe has an equally strong norm of expressing loving, caring or otherwise positive emotions towards each other.

Safety and standards

Creating a space that is safe for people to be in, physically and emotionally, is an important precondition to fostering deep relationships and personal growth. However, groups that value safety above all else tend to lose “their essence”. 

There are situations where considerations about safety (or comfort) trade off against holding each other to certain standards and striving to become stronger. Genuine growth will periodically be uncomfortable: giving and receiving candid feedback, confronting each other about ways we are not currently living up to our own standards, challenging each other to become better and aim higher.

Insisting on certain standards can make people feel periodically unsafe.  For a lot of people it is hard to emotionally differentiate between being challenged on a specific problem and being attacked as a person. That’s a fact about human psychology and doesn’t say much about a person. It is something we can get better at disentangling.

This is genuinely difficult to navigate, but the answer is not to give up on either standards or safety.

This is where communication comes back into the picture. We can strive for communication norms that allow us to more easily differentiate matters of safety from matters of standards. To do so, we need to first collectively acknowledge the difference between local discomfort from challenge and genuine lack of safety (even if this difference is not always clear). Developing shared language around the topic can further help to identify, address and unpack difficult social dynamics where someone tries to enforce standards and another person feels threatened. 

It's up to your tribe to figure out where the trade-off between threats-to-safety vs threats-to-standards should fall. Making these negotiations explicit is what allows members of your tribe to consciously decide whether they take this deal. The point is not to be eternally bound to a set of rules - people are allowed to change their minds as they and the world around them change. The point is to help each individual more easily understand their ideal and strive for it. For example, even if your volition is a continuous search for growth, you might still occasionally fall for a more myopic, comfort-seeking outlook. Your friends can act as a scaffold to help you align these seemingly conflicting preferences. 

Independence, autonomy and cooperation

When you coordinate, you give up some amount of your independence for collective gain, for example in the form of safety, emotional support or success. At least that’s the idea. 

As we’ve seen before, the correct trade-off depends on the specific people in your tribe. And again, better communication norms enable you to negotiate these trade-offs more successfully. 

Cooperating implies giving up some amount of personal independence by relying on others and having others rely on you. But it does not mean that you’re gambling away your autonomy—your right to make decisions about what you do and what happens to you. 

As a tribe, you need to coordinate without a clear authority. According to the model of co-ownership, decisions are made collectively. That doesn’t necessarily mean that everyone has to be part of every decision. You can collectively decide on processes for someone to make decisions for the tribe - but that is never irreversible. In any case, collective decision making is not a trivial feat. You will have to figure out how to productively combine each other’s views to make progress. 

Different people have different needs when it comes to things like autonomy and support. Different people have different default strategies for solving their problems or processing their emotions. To find good trade-offs, you want to discuss these differences.

Scaling and dilution

Growing your tribe creates more resources that can allow you to reach higher goals. New people bring in new knowledge, new ideas, more attention, etc. At the same time, as the tribe grows, it becomes increasingly hard to maintain highly trust-based group dynamics. The potential for conflict increases and so does the time required for maintaining or establishing new norms or common knowledge. Simultaneously, the amount of time you can spend having in-depth personal conversations with individual group members decreases. 

We don’t have a good sense for what the maximal “carrying capacity” of a tribe is. If you are interested in growing your tribe, you probably want to adopt a slow and careful approach, one that allows you to slow down further or take a step back if it looks like you have been moving too fast. Generally, it seems valuable to also experiment with different settings. Working with someone on a project, or renting a place for a month to test co-living, will provide valuable data.  

Of course, the decision doesn’t have to be between having someone join your tribe as a full member and not interacting with them at all. We sometimes refer to this idea as the “onion model”. There are a lot of great people whom you can mutually benefit from cooperating with. Not only do you benefit from new input and faces from time to time, your tribe probably also has a lot to offer to the world. 

Summary

At its heart, this post is motivated by the belief that “doing community/friendship well” is a) something that we can gain insight on and get better at and b) something that is worth pursuing. 

We have shared some of our learnings and thoughts on the important stages of fostering increasingly strong coordination. There are other plausible ways to delineate and name the different stages, and you will have noticed that several ideas recur throughout the post. Some of the most important ones are:

  • Trust and communication

  • An intricate balance between consistency and change

  • A constant dialogue about the purpose/essence of your group

  • An appreciation of the fact that we are all running on monkey-brains

The post is dense and yet merely provides a glimpse into what it means to build a tribe, but it’s a topic we’re interested in exploring further. Don’t hesitate to reach out to us, discuss further, or write your own version of this post to highlight differences and other models. We’re just as keen to engage in exchange and learn from you as we are happy to share whatever we can. 

Resources

 

Many thanks to Marta and Chana for invaluable feedback and help with editing, and to Maxime and  Jan Pieter for being part of this journey.

Compilation of thoughts on impact-oriented interdisciplinary research

The below doesn’t come in the format of a traditional post; instead of a coherent start-to-end narration, it’s a compilation of topically related thoughts. I am posting them here as one single post because I want to have these thoughts accessible in one place.

[Cross-posted to the EA forum in shortform: 1 2 3.]

(1) Motivation

Below, I briefly discuss some motivating reasons, as I see them, to foster more interdisciplinary thought in EA. This includes ways EA's current set of research topics might have emerged for suboptimal reasons. 

The ocean of knowledge is vast. But the knowledge commonly referenced within EA and longtermism represents only a tiny fraction of this ocean. 

I argue that EA's knowledge tradition is skewed for reasons including, but not limited to, the epistemic merit of those bodies of knowledge. There are good reasons for EA to focus on certain areas:

  • Direct relevance (e.g. if you're trying to do good, it seems clearly relevant to look into philosophy a bunch; if you're trying to do good effectively, it seems clearly relevant to look into economics (among others) a bunch; if you came to think that existential risks are a big deal, it is clearly relevant to look into bioengineering, international relations, etc. a bunch; etc.)

  • Evidence of epistemic merit (e.g. physics has more evidence for epistemic merit than psychology, which in turn has more evidence for epistemic merit than astrology; in other words, beliefs gathered from different fields are likely to pay more or less rent, or are likely to be more or less explanatorily virtuous)

However, some of the reasons we’ve ended up with our current foci may not be as good:

  • Founder effects

  • The partly arbitrary way academic disciplines have been carved up

  • Inferential distances between knowledge traditions that hamper the free diffusion of knowledge between disciplines and schools of thought

Having a skewed knowledge base is problematic. There is a significant likelihood that we are missing out on insights or perspectives that might critically advance our undertaking. We don’t know what we don’t know. We have every reason to expect that we have blindspots. 

***

I am interested in the potential value and challenges of interdisciplinary research. 

Neglectedness

(Academic) incentives make it harder for transdisciplinary thought to flourish, resulting in what I expect to be an undersupply thereof. One way of thinking about why we would see an undersupply of interdisciplinary thought is in terms of "market inefficiencies". For one, individual actors are incentivised (because it’s less risky) to work on topics that are already recognised as interesting by the community (“exploitation”), as opposed to venturing into new bodies of knowledge that might or might not prove insightful (“exploration”). What is “already recognized as valuable by the community”, however, will only in part be determined by epistemic considerations, and in another part be shaped by path-dependencies. 

For two, “markets” are insufficiently liquid and thus tend to fail where we cannot easily specify what we want. This is true for intellectual work in general, but likely even more so for transdisciplinary work, because the relatively siloed structure of academia adds additional “transaction costs” to attempts at communicating across disciplinary boundaries. 

One way to reduce these inefficiencies is by improving the interfaces between disciplines. “Domain scanning” and “epistemic translation” are precisely about creating such interfaces. Their purpose is to identify knowledge that is concretely relevant to a given target domain and make that knowledge accessible to thinkers entrenched in the “vocabulary” of that target domain. A useful interface between political philosophy and computer science, for example, might require a mathematical formalization of central ideas such as justice. 
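To give a feel for what such a formalization could look like, here is a minimal, hypothetical sketch (the fairness notion, function names and example values are illustrative assumptions on my part, not a claim about how this interface should actually be built): one classic way to make a justice-related idea computable is to state envy-freeness of an allocation as an executable check.

```python
# Minimal sketch: formalizing one fairness notion ("no agent envies another
# agent's bundle") so that it can be used directly in code. Purely illustrative.

def is_envy_free(allocation, valuations):
    """allocation: dict mapping agent -> set of items
    valuations: dict mapping agent -> dict of item -> subjective value
    Returns True if no agent values another agent's bundle above their own."""
    def bundle_value(agent, bundle):
        return sum(valuations[agent].get(item, 0) for item in bundle)

    return all(
        bundle_value(a, allocation[a]) >= bundle_value(a, allocation[b])
        for a in allocation for b in allocation if a != b
    )

# Example: two agents, three items.
allocation = {"alice": {"book", "pen"}, "bob": {"laptop"}}
valuations = {
    "alice": {"book": 3, "pen": 1, "laptop": 2},
    "bob":   {"book": 1, "pen": 1, "laptop": 5},
}
print(is_envy_free(allocation, valuations))  # True: neither envies the other
```

Envy-freeness captures only a narrow slice of what political philosophers mean by justice, but once a notion is stated this precisely, a computer scientist can actually work with it.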
 

Challenges

At the same time, doing interdisciplinary work well is challenging. For example, interdisciplinary research can only be as valuable as a researcher's ability to identify knowledge relevant to their target domain, or as a research community's quality assurance and error correction mechanisms. Phenomena like citogenesis or motivatiogenesis are manifestations of these difficulties. 

There have been various attempts at overcoming these incentive barriers: for example, the Santa Fe Institute, whose organizational structure completely disregards scientific disciplines (ARPA-style agencies have a similar flavour); the field of cybernetics, which proposed an inherently transdisciplinary view on regulatory systems; or the recent surge in the literature on “mental models” (e.g. here or here).  

A closer inspection of such examples - to what extent they were successful and how they went about it - might bear some interesting insights. I don't have the capacity to properly pursue such case studies in the near future, but it's definitely something on my list of potentially promising (side) projects. 

If readers are aware of other examples of innovative approaches trying to solve this problem that might make for insightful case studies, I’d love to hear them.

(2) A model: “domain scanning” and “epistemic translation”

The below provides definitions and explanations of “domain scanning” and “epistemic translation”, in an attempt to add further gears to how interdisciplinary research works.

I suggest understanding domain scanning and epistemic translation as a specific type of research that either plays (or ought to play) an important role as part of a larger research process, or can be usefully pursued as “its own thing”. 

Domain Scanning

By domain scanning, I mean the activity of searching through diverse bodies and traditions of knowledge with the goal of identifying insights, ontologies or methods relevant to another body of knowledge or to a research question (e.g. AI alignment, Longtermism, EA). 

I call source domains those bodies of knowledge from which insights are being drawn. The body of knowledge that we are trying to inform through this approach is called the target domain. A target domain can be as broad as an entire field or subfield, or as narrow as a specific research problem (in which case I often use the term target problem instead of target domain).  

Domain scanning isn’t about comprehensively surveying the entire ocean of knowledge, but instead about selectively scouting for “bright spots” - domains that might importantly inform the target domain or problem. 

An important rationale for domain scanning is the belief that model selection is a critical part of the research process. By model selection, I mean the way we choose to conceptualize a problem at a high-level of abstraction (as opposed to, say, working out the details given a certain model choice). In practice, however, this step often doesn’t happen at all because most research happens within a paradigm that is already “in the water”. 

As an example, say an economist wants to think about a research question related to economic growth. They will think about how to model economic growth and will make choices according to the shape of their research problem. They might for example decide between using an endogenous growth or an exogenous growth model, and other modeling choices at a similar level of abstraction. However, those choices happen within an already comparatively limited space of assumptions - in this case, neoclassical economics. It's at this higher level of abstraction that I think we're often not sufficiently looking beyond a given paradigm. Like fish in the water. 
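To make the “fish in the water” point more tangible, here is a minimal sketch of what working within one such paradigm looks like: a textbook exogenous (Solow-style) growth recursion. The parameter values are illustrative assumptions, not drawn from the text; the thing to notice is how many framing choices (an aggregate production function, a fixed savings rate, convergence to equilibrium) are already baked in before any detailed modeling begins.

```python
# Minimal Solow-style (exogenous growth) sketch; parameter values are illustrative.
# Every choice below is made inside the neoclassical paradigm: an aggregate
# production function, a constant savings rate, and equilibrium dynamics.

ALPHA = 0.3   # capital share in production
S = 0.2       # savings rate
DELTA = 0.05  # depreciation rate

def next_capital(k: float) -> float:
    output = k ** ALPHA
    return S * output + (1 - DELTA) * k

k = 1.0
for _ in range(200):
    k = next_capital(k)
print(round(k, 2))  # approaches the steady state where S * k**ALPHA == DELTA * k
```

Questioning those baked-in assumptions, rather than the parameter values, is the level of model selection I am pointing at here.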

Neoclassical economics, as an example, is based on assumptions such as agents being rational and homogeneous, and the economy being an equilibrium system. Those are, in fact, not straightforward assumptions to make, as heterodox economists have in recent years slowly been bringing to the attention of the field. Complexity economics, for example, drops the above-mentioned assumptions, which helps broaden our understanding of economics in ways I think are really important. Notably, complexity economics is inspired by the study of non-equilibrium systems in physics, and its conception of heterogeneous and boundedly rational agents comes from fields such as psychology and organizational studies. 

Research within established paradigms is extremely useful a lot of the time, and I am not suggesting that an economist who tackles their research question from a neoclassical angle is necessarily doing something wrong. However, this type of research can only ever make incremental progress. As a research community, we have a strong interest in fostering, at a structural level, the quality of interdisciplinary transfer. 

The role of model selection is particularly important in the case of pre-paradigmatic fields (examples include AI Safety or Complexity Science). In this case, your willingness to test different frameworks for conceiving of a given problem seems particularly valuable in expectation. Converging too early on one specific way of framing the problem risks locking in the burgeoning field too early. Pre-paradigmatic fields can often appear fairly chaotic, unorganized and unprincipled (“high entropy”). While this is sometimes evidence against the epistemic merit of a research community, I tend to abstain from holding this against emerging fields because, since the variance of outcomes is higher, the potential upsides are higher too. (Of course, one’s overall judgement of the promise of an emerging paradigm will also depend on more than just this factor.)

 

Epistemic Translation

By epistemic translation, I mean the activity of rendering knowledge commensurable between different domains. In other words, epistemic translation refers to the intellectual work necessary to i) understand a body of knowledge, ii) identify its relevance for your target domain/problem, and iii) render relevant conceptual insights accessible to (the research community of) the target domain, often by integrating it. 

Epistemic translation isn’t just about translating one vocabulary into another or merely sharing factual information. It’s about expanding the concept space of the target domain by integrating new conceptual insights and perspectives. 

The world is complicated and we are at any one time working with fairly simple models of reality. By analogy, when I look at a three-dimensional cube, I can only see a part of the entire cube at any one time. By taking different perspectives on the same cube and putting these perspectives together - an exercise one might call “triangulating reality” -, I can start to develop an increasingly accurate understanding of the cube. The box inversion hypothesis by Jan Kulveit is another, AI alignment specific example of what I’m thinking about.

I think something like this is true for understanding reality at large - though it is magnitudes more difficult than the cube example suggests. Domain scanning is about seeking new perspectives on your object of inquiry, and epistemic translation is required for integrating these numerous perspectives with one another in an epistemically faithful manner. 

In the case of translation between technical and non-technical fields - say translating central notions of political philosophy into game theoretic or CS language - the major obstacle to epistemic translation is formalization. A computer scientist might well be aware of, say, the depth of discourse on topics like justice or democracy. But that doesn’t yet mean that they can integrate this knowledge into their own research or engineering. Formalization is central to creating useful disciplinary interfaces, and close to no resources are spent on systematically speeding up this process. 

Somewhere in between domain scanning and epistemic translation, we could talk about “prospecting” as the activity of providing epistemic updates on how valuable a certain source domain is likely to be. This involves some scanning and some translation work (hence “in between the two”), and would serve as a community mechanism for coordinating what the community might want to pay attention to.

(3) A list of fields/questions for interdisciplinary AI alignment research

The following list of fields and leading questions could be interesting for interdisciplinary AI alignment research. I started to compile this list to provide some anchorage for evaluating the value of interdisciplinary research for EA causes, specifically AI alignment. 

Some comments on the list: 

  • Some of these domains are likely already very much on the radar of some people, others are more speculative.

  • In some cases I have a decent idea of concrete lines of questioning that might be interesting; in other cases all I do is gesture very broadly that “something here might be of interest”.

  • I don’t mean this list to be comprehensive or authoritative. On the contrary, this list is definitely skewed by domains I happened to have come across and found myself interested in.

  • While this list is specific to AI alignment (/safety/governance), I think the same rationale applies to other EA-relevant domains and I'd be excited for other people to compile similar lists relevant to their area of interest/expertise.

 

Very interested in hearing thoughts on the below!

 

Target domain: AI alignment/safety/governance 

  1. Evolutionary biology

    1. Evolutionary biology seems to have a lot of potentially interesting things to say about AI alignment. Just a few examples include:

      1. The relationship between environment, agent, and evolutionary paths (which e.g. relates to the role of training environments)

      2. Niche construction as an angle on embedded agency

      3. The nature of intelligence

  2. Linguistics and Philosophy of language

    1. Lots of things that are relevant to understanding the nature and origin of (general) intelligence better.

    2. Sub-domains such as semiotics could, for example, have relevant insights on topics like delegation and interpretability.

  3. Cognitive science and neuroscience

    1. Examples include Minsky’s Society of Mind (“The power of intelligence stems from our vast diversity, not from any single, perfect principle”), Hawkins’s A Thousand Brains (the role of reference frames for general intelligence), Friston et al.’s Predictive Coding/Predictive Processing (in its most ambitious versions a near-universal theory of all things cognition, perception, comprehension and agency), and many more.

  4. Information theory

    1. Information theory is hardly news to the AI alignment idea space. However, there might still be value on the table from deeper dives or more out-of-the-ordinary applications of its insights. One example of this might be this paper on The Information Theory of Individuality.

  5. Cybernetics/Control Systems

    1. Cybernetics seems straightforwardly relevant to AI alignment. Personally, I’d love to have a piece of writing synthesising the most exciting intellectual developments in cybernetics, written by someone with awareness of where the AI alignment field currently stands.

  6. Complex systems studies

    1. What does the study of complex systems have to say about robustness, interoperability, and emergent alignment? It also offers insights into and methodology for approaching self-organization and collective intelligence, which are particularly interesting in multi-multi scenarios.

  7. Heterodox schools of economic thinking

    1. These schools of thought are trying to reimagine the economy/capitalism and (political) organization, e.g. through decentralization and self-organization, by working on antitrust, or by trying to understand the potentially radical implications of digitalization for the fabric of the economy. Complexity economics, for example, can help us understand the out-of-equilibrium dynamics that shape much of our economy and lives.

  8. Political economy

    1. An interesting framework for thinking about AI alignment as a socio-technical challenge. Particularly relevant from a multi-multi perspective, or for thinking along the lines of cooperative AI. Pointer: Mapping the Political Economy of Reinforcement Learning Systems: The Case of Autonomous Vehicles

  9. Political theory

    1. The richness of the history of political thought is astonishing; the most obvious examples might be ideas related to social choice or principles of governance. (A dense while also high-quality overview is offered by the podcast series History Of Ideas.) The crux in making the depth of political thought available and relevant to AI alignment is formalization, which seems extremely undersupplied in current academia for very similar reasons as I’ve argued above.

  10. Management and organizational theory, Institutional economics and Institutional design

    1. Has things to say about e.g. interfaces (read this to get a gist of why I think interfaces are interesting for AI alignment); delegation (e.g. Organizations and Markets by Herbert Simon); and (potentially) the ontology of forms and (the relevant) agent boundaries (e.g. The secret to social forms has been in institutional economics all along?).

    2. Talks for example about desiderata for institutions like robustness (e.g. here), or about how to understand and deal with institutional path-dependencies (e.g. here).

Early 2021


Maps of Maps, and Empty Expectations

[cross-posted here]

This post is a recap of some naturalism studies I did a few months back. At the outset, I didn’t know where the studies would take me. Now, I believe they taught me about how to do the “real” version of the thing you’re trying to do, where “real” tries to point at things like “the territory” and “what actually matters”.

To communicate my idea, I will rely on two central examples and offer two frames of interpretation. I want to offer people several, slightly different footholds, such that, if one example or one frame clicks for you while the others don’t, you can run with the one that did. I end the post with some thoughts on how to respond to the problem I’m outlining. 

Outline

  1. Examples

    1. “Ideal solutions”

    2. “Being helpful”

  2. Interpretative Frames

    1. Maps of maps

    2. Empty expectations

  3. What to do in response?

Let’s go!

Examples

“Ideal solutions”

Some time ago, I was working on an important work project. I cared about doing it well. Initially, things went well but soon I started to feel stuck and grew increasingly averse to working on the project. Whenever I would turn to thinking about the project, I had an experience of my mind “cramping” or “tensing up”, and a veil of fog settling over my mind, preventing me from thinking clearly and making progress. I’d start to feel increasingly frustrated, self-judgemental, and averse to working on the project. 

A central part of the phenomenology of this experience is a sense of “trying really hard”, but failing to really get a grip on the problem. It’s a chain of trigger-action patterns where, when I notice I get stuck in this way (trigger), my mind reacts by “trying harder” (action), which in turn makes my mind tense up more. This creates more of the mind-fog preventing me from being able to think clearly, meaning I’m even less able to make progress (trigger), causing my mind to want to try even harder (action), etc. The resulting experience is one of “drifting away from” rather than “towards” a clear understanding of the task/problem. 

When I first ran into this problem, I didn’t understand what was going on. So, I started studying this tensing-up experience, which eventually led me to an insight that was central to my making progress on this problem.

I realized that I was holding the subconscious belief that there existed, somewhere, an ideal, a perfect solution to my project. I didn’t have access to this “ideal solution”, nor did my belief about its existence contain any details on what it looked like. And yet, part of me was convinced that this perfect solution existed.

As a result of this belief - and until that point unnoticed by myself - my orientation to the project had shifted. What I was trying to do when working on it had moved away from “thinking about and solving the problem” towards “finding and replicating the ideal solution”. Anything I did in fact produce - necessarily - fell short of its perfection, thus breeding frustration and a sense of insufficiency, and further pulling my attention away from the object level of actually solving the problem. 

I’m inclined to call the “ideal solution” a construct of my social cognition. It was created out of beliefs about how I was supposed to carry out my project, and out of comparisons with how others would do it. It wasn’t that the project itself was so unbelievably hard that I couldn’t have made meaningful progress on it. Rather, linked to my desire to do it well - to live up to some high yet ill-defined standards, as well as my desire to be seen as doing so - my thinking had become tangled up with a lot of thoughts, only some of which still had to do with the plain task of making progress on my project.

Once I noticed that, in some sense, I had ceased to plainly try to solve the problem - combined with some additional hacks that helped me get back on track with doing just that (described in more detail below) - my frustration and sense of stuckness with regards to the project gradually started to melt away. With my mind clear (instead of fogged up) and able to sustain a gentle focus (instead of cramping), I would return to thinking clearly, which allowed me to regain traction and soon reconnect to the joyful creativity of genuine problem-solving. 

“Being helpful”

Some time ago, I helped a friend debug a problem of theirs. They wanted to support their partner, who was going through an intense couple of months. The problem was that my friend's attempts at helping their partner weren’t always as successful as one would hope. 

We had talked about this problem before, but this time my friend came to me with a specific new insight - one that proved to be particularly juicy with respect to making progress on their problem. 

“I realize that, sometimes, when my partner asks me to do something for them, I switch into the mode of ‘trying to be helpful’ or ‘trying to be seen as helpful’, instead of ‘trying to solve the problem they asked me to solve for them’.” 

We went on deconstructing this dynamic further. It looked like “being seen as helpful” served two distinct purposes. For one, my friend wanted to be the sort of person who is a supportive partner. This was, in essence, a need that they themselves had. Second,  they wanted to signal to their partner that they “had their back”. Importantly, this desire to offer emotional support was genuine and did serve an important purpose (which their partner recognized and valued). They could signal their commitment by, for example, adjusting their body posture, engaging in certain verbal patterns, being generous in what resources they were willing to spend on solving the problem. 

However, their kind intention alone wasn’t solving the problems their partner had asked them for their help with. In optimizing for “being a supportive/helpful partner”, they would give up on some things that normally made them more efficient at solving problems. For example, in discussing with their partner what solutions were most appropriate, they felt disinclined to disagree with their partner and clarify where the disagreement (or confusion) came from. Voicing disagreement felt like it was going against their goal of making their partner feel supported. However, it also decreased their ability to fully understand what it was their partner cared about. 

Just like myself in the earlier example, my friend got trapped in - tangled up with - a behaviour that was optimizing for something other than “solving the actual problem”. Given that my friend genuinely cared about “solving the actual problem”, their strategy for achieving this goal was non-ideal. This is not to say that the other things their behaviour was optimizing for might not also be valuable in their own right. By gaining clarity on what was going on, and acknowledging that they did in fact care about several things in this situation, my friend found a way to satisfice each of them separately. As a result, the tension they used to feel when trying to help their partner reduced, and they became more effective at actually helping them. 

Interpretative Frames

I now want to offer two complementary ways in which one can interpret and draw lessons from the above two case studies. 

Maps of maps

The two examples are trying to point at a way of orienting to reality - namely, engaging in a sort of “guessing the teacher’s password” move, rather than trying to solve the real problem - that comes with detrimental effects on one’s cognitive/epistemic processes. Let me unpack:

In “rationalist lingo”, I would describe the shift from “trying to solve the problem” to “trying to replicate the ideal solution” as follows: Instead of trying to create my own map of the territory, I was trying to create a map of someone else’s map of the territory. 

However, whenever the problem that you’re trying to solve resides in reality, the epistemic process of “trying to create a map of someone else’s map of reality” is inherently misguided and likely to lead you astray. One of the most important aspects of this, in my experience, is that you cannot interact with someone else’s map of the territory in the same way you can interact with - say, run experiments on - the territory itself. 

Two caveats on what I just said seem appropriate:

First, this is not to say that you cannot (or that you should not) interact with and learn from other people’s maps of the territory. In fact, other people’s maps are a great source of information, and I’m all in favour of downloading and adequately integrating parts of their maps into your own. What I am trying to point at, however, is that the type of relationship between you and someone else’s map is importantly distinct from the relationship between you and the territory. There is a difference between a) treating someone else’s beliefs as evidence about the territory, and integrating that evidence into your own overall view, and b) confusing someone else’s map of the territory for the territory itself, forgetting that there is an actual territory and that the way your actions cash out depends on the territory, not the other person’s map. 

Second, things become a bit more complicated in cases where the problem you care about does in fact reside in someone else’s map of reality, and the evidence you’re looking for is evidence about their map, not the territory per se. These types of problems exist, and in these cases, you are correct in trying to build a model of the other person’s model. What does remain valid, however, is the importance of tracking what level it is you actually care about, and what level it is you’re currently on.

In my experience, one of the simplest-while-still-robustly-useful ways for (re-)orienting is to pause, take a (mental or actual) step back and ask “Soo.. what is it that I’m actually interested in/trying to do here?” 

This is another way of saying that the “maps of maps” problem is a type of Goodhart’s problem: you tried to solve a problem, you picked (consciously or not) a metric that was at some point correlated with the thing you actually cared about, but eventually, once you optimized enough for that metric, it ceased to capture the thing you actually care about. (For example: goal - helping; metric - being seen as helpful.) Asking “what’s the thing I actually care about here?” helps you recalibrate where you’ve come since you last asked this question, and what looks like the right direction to be moving in now. 

“Empty expectations”

Here’s another way of describing what is going on in the above examples. It revolves around the cognitive move/phenomenon of “having expectations”. In the first example, say, I had expectations about what the ideal solution looks like. But, as I will argue in a bit, something was off about my expectation - it was empty.

First, let us do some groundwork. Expectations come in different types. There is a type of expectation that works like a prediction. Prediction-type expectations are great because they contain a lot of useful information. Assume I write a post. As I re-read it, I have some sense of the current draft “falling short of my expectations”. I can now use my inner-sim to poke at this sense of “not quite right”, and it will tell me things about how exactly I am falling short, and what a better version of the post would look like. Maybe I need to add an example, or maybe this sentence is too wordy, etc.

There is another kind of expectation. To understand how it works, let’s take the “ideal solutions” example from earlier. There, the expectation I (subconsciously) held (about the existence of a perfect solution) was empty, non-specified. All it had to say was that whatever I had produced so far “surely wasn’t perfect”. But my expectation had nothing at all to say about how my current draft solution was falling short, or in what direction I should be travelling in. I call this type of expectation an “empty expectation”. 

Note that, sometimes, a prediction-type expectation might reside somewhere in the blackbox-y parts of your mind such that, at first sight, it looks like an empty expectation. However, just because it is difficult to succinctly verbalize how what you are seeing falls short of the expectation - let’s say the expectation is preverbal - that doesn’t yet mean it is empty. The preverbal expectation still clearly carries information about where I ought to look or what I ought to do next, even if it would be hard for me to explain that to someone else. An empty expectation doesn’t have that. It might require some extra attention to correctly distinguish preverbal from empty expectations in practice. In my own experience, however, once I look more closely, the distinction quickly becomes evident. 

Since giving this phenomenon of empty expectations a name, it has become even more salient to me just how widespread it is. Do any of the following sound familiar to you?

  • You think about [starting a project], but then you think: “Nah, I'm not good enough. I couldn't possibly do that."

  • You have been working hard on [a project], but you keep thinking to yourself: “I’m not doing enough. I’m not doing it right. I have to do more. I have to do better.”

These aren’t always cases of empty expectations. To check whether they are, you can ask yourself some of the following questions: 

  • Does the thought have anything specific to say about in what way you’re not currently up to the task, what you’re lacking, or what, concretely, it would look like for you to be? Is there any bit of evidence that could cause you to think that you were up to it? If not, that’s an empty expectation.

  • Does the thought have anything realistic to say about what it would look like for you to do enough or will it continue to ask for ‘more’ indefinitely? If there is no realistic and concrete answer to that, that’s an empty expectation.

  • And here’s a bonus question: What if you swapped out “you” for some other person in the same scenario. Do you get the same or a different answer? If different, why does one answer not apply to the other case (and vice versa)?

In the “ideal solutions” example, it was important for me to internalize that the “ideal solution” I had been trying to replicate doesn’t exist in the way I was conceiving of it, and that I could stop trying to look for it. This also meant I could give myself permission to think for myself a bit more recklessly, and to come up with my own solutions a bit more desperately.

***

So far, I’ve been talking about what happens if we shift away from ”trying to solve the actual problem”, towards some other, more convoluted way of orienting to reality. We might describe this convoluted orientation as not noticing that you’re building maps of maps, instead of maps of reality; or as being fooled by empty expectations. 

Conversely, we might wonder what it is that happens if I orient back to “trying to solve the actual problem”. What is the mode that I am advocating for here? 

I believe the best way to answer this question is to try it out yourself. What does happen when you ask yourself what you’re actually trying to do, and then do that; when you - nothing but - genuinely try to do the thing you’re trying to do? 

Beyond that, all I have are pointers. What I find when trying to orient to reality in this way is related to original seeing, to the mode of orienting to the world that is a constant undercurrent of the Replacing Guilt series, to “the thing” that (according to me at least) most deserves to be called “research” or “truth-seeking” or “sensemaking”. 

What to do in response

Having worked through some examples and interpretative frames, let me now share some observations about what might help - what helped me - when you notice yourself getting tangled up in things other than what lets you do the “real” version of the thing you’re trying to do.

  • Noticing

    • Learning to notice when the situation occurs (e.g. noticing the tensing-up experience from example one, or noticing the experience of “trying to be helpful” in example two) is extremely helpful in terms of gaining surface area with this specific way of getting stuck.

    • Initially, I would only notice the experience after having been struggling along for maybe an hour or so. Over time, I learnt to notice the experience sooner; maybe after 20 minutes, then 10, then after 2. Eventually, I wasn’t so much noticing the experience itself, but rather a "precursor experience" (which is to say, the thoughts and mental moves responsible for the experience). This process of becoming better at noticing can dramatically increase your ability to do something about the problem. (More on what to do about it below.)

  • Replacing TAPs

    • In example one, my default reaction to the tensing-up experience was to “try harder”, causing a detrimental and self-reinforcing cycle to kick in. Understanding this allowed me to replace this natural trigger-action-pattern with a more conducive behaviour/mental move (such as asking myself what I was actually trying to do, or “letting go”/”backing up”; see below).

  • Asking yourself “What am I actually trying to do?”

    • Somehow, those “empty expectations” don’t exist in the space where I’m genuinely connected to what truly matters in what I’m trying to do. Again, this mental move can be turned into a TAP which is a great way for learning to track the “real thing”/the territory more reliably.

    • “What am I actually trying to do?” is the version of this question that has been most robustly useful to me. However, depending on the situation, you might want to play around with, tweak and customize what exact question(s) you’re asking. For example:

      • What am I doing right now? Why am I doing what I'm doing? What problem/thing am I trying to solve/achieve?

      • According to my behaviour, what am I trying to achieve (on top of my explicit goal)? Or in other words, what is my current behaviour buying me? Is the way I currently try to achieve all of these goals the best way to do it? If not, can I separate out these processes, so that they cease to interfere with each other?

      • What can I gather (from the world, from my interlocutor, ..) that tells me about what’s actually valuable here?

      • Am I trying to be [helpful]? Is it still worth doing X if it's not seen as [helpful]? If no one else seemed to care, what would I still care about there?

      • etc.

  • Ways of “letting go”/”backing up”

    • A lot of mental moves that proved helpful to me are in essence about shifting from “trying harder” to “not applying force”. These things overlap a lot with what I would do to get more grounded, such as engaging/focusing on my sensory experiences (e.g. looking at some natural structure, reading poetry, observing my breath, checking what my body feels like, drawing something, ...). All of these moves involve observing reality directly. When I look at a tree or feel my own body, there's little chance I’ll accidentally try to look at someone else's model of a tree. Speculatively, this might help by reminding myself of how to make maps, rather than maps of maps.

    • Something that initially was very helpful for me was to “de-prime” my mind from the “the stuckness” and frustration that I felt around my project. Successfully de-priming my mind would mean that I could go back to working on the project without immediately falling back into the same cognitive pattern, but instead being able to maintain my newly gained orientation. Doing the groundedness exercises and breaking the above-mentioned trigger-action-chain were helpful with that.

    • Notes on Actually Trying also contains some pointers at how to back up and properly reorient to a task/project.

  • Noticing picas or bucket errors

    • In all generality, we get derailed from tracking reality directly for reasons. Usually, these reasons are valid in their own right (such as my friend's desire to provide emotional support to their partner), even if the way we currently pursue them interferes with other things we also care about (e.g. solving the actual problem). The fact that they interfere with each other is often not an inherent problem but emerges because we might be committing a bucket error (e.g. disagreeing with my partner means that I’m not being supportive), or we might be pursuing what we care about in a convoluted way (e.g. the act of obsessively worrying about something in our minds is like the “ice” we eat to address our “iron deficiency”, say, our wish for the project to go well). When we can understand these dynamics, we can often find ways to serve all of the different goals through alternative, separate actions.

  • Beware social cognition, and drop it wherever it doesn’t serve you

    • Social cognition refers to the set of mental processes that are concerned with the social world; they aim to perceive, make sense of, remember or attend to other people and our relationship with them. Social cognition is powerful and critically useful to a lot of aspects of human life. However, spending social cognition on, say, solving a math problem is rarely helpful. It might make you ask questions like: “How quickly do I think Anne solved this problem? What will Bob think of me if they learn that I got the answer wrong for all of my first three attempts? If I don’t manage to solve this problem, what will this mean about me?” Again, I am not saying these types of thoughts are never useful for anything. I am saying that they are barely useful for actually solving things like math problems.

    • So, what can you do? First, observe your thoughts for a few moments. How many of your thoughts are exclusively about what you’re working on, and how much of them are social cognition? If you have a lot of social cognition going on, ask what work this is doing for you, what needs it might be addressing? In all generality, we do things for reasons, even if the way we pursue them might not be the most effective. Now, either take care of these needs right away, or credibly commit (to yourself) to taking care of them at some later point in time. Often, the crux is in creating enough internal emotional safety such that, at this point, you can simply drop whatever social cognition that isn’t serving you and return back to the object level of what you’re working on. (Be sure though to come back to these needs if you committed to doing so. Trust (including self-trust) is a precious thing.)

  • Maintaining a healthy distrust in labels

    • If you want to maintain a pure orientation towards what actually matters about the thing you’re working on, it is often important to be careful with labels, i.e. your choices about how to carve up the problem space into relevant entities/concepts (ontology) and the act of naming and refining these concepts (labelling).

    • Practically speaking, it is often useful to come up with short and condensed descriptions of what you’re doing (e.g. “I am a teacher”, “I am working on our annual impact evaluation”, “I’m writing a strategy document”, etc.). However, in my experience, these labels (e.g. “teacher”, “impact evaluation”, “strategy”) can become increasingly obstructive to my sense-making/problem-solving process - as if the label itself prevented me from seeing clearly what is actually there and what matters. For example, it becomes easier to move towards optimizing for a neatly formatted strategy doc, as opposed to working out relevant considerations about what your medium- and long-term plans should look like.

    • Especially in the earlier phases of a process, your S1 is often better at understanding the most relevant aspects of your problem than what an initial, S2-generated ontology will be able to capture. If you run with this initial ontology unquestioningly, you risk running off in the wrong direction. My guess is that this is related to (something like) verbal overshadowing, i.e. the fact that preverbal, S1-based intuitions can be fragile and easily overwritten by the more “forceful”, verbal, top-down S2 processes. In order to avoid overwriting what your S1 might know, you can start by maintaining a decent amount of distrust in any of the concepts and labels you initially come up with (or avoid them altogether). Later, once your S1 understanding of the problem space has become sufficiently nuanced and robust, you can start eliciting better labels that will (hopefully) carve reality (more) at its joints.

 

Thanks to Logan Strohl for developing and coaching me in the methodology of naturalism. If you want to learn more about it, this is a good place to start.
Thanks to various other people for helpful discussions and comments on this post. 

The shore - March 2021


Types of generalism

I am interested in the nature of inter- and transdisciplinary research, which often involves some notion of “generalism”. There are different ways to further conceptualize generalism in this context.

First, a bit of terminology that I will rely on throughout this post: I call the bodies of knowledge from which insights are being drawn “source domains”. The body of knowledge that is being informed by this approach is called the “target domain”. 

Directionality of generalism

We can distinguish SFI-style generalism from FHI-style generalism (h/t particlemania for first formulating this idea):

  • In the case of SFI-style generalism, the source domain is fixed and they have a portfolio of target domains that may gain value from “export”. 

  • In the case of FHI-style generalism, the target domain is fixed and the approach is to build a portfolio of diverse source expertise. 

In the case of SFI, their source domain is the study of complex systems, which they apply to topics as varied as life and intelligence, cities, economics and institutions, opinion formation, etc.

In the case of FHI, the target domain is fixed - although more vaguely than it might be - via the problem of civilization-scale consequentialism, and source domains include philosophy, international relations, machine learning, and more.

Full vs partial generalism

Partial generalism: Any one actor should focus on one (or a similarly small number of) source domains to draw from. 

Arguments: 

  • Ability: Any one actor can only be well-positioned to work with a small number of source domains because doing this work well requires expertise with the source domain. Expertise takes time to develop, so naturally, the number of source domains a single person will be able to draw upon (with adequate epistemic rigor) is limited. 

  • Increasing returns to depth: The deeper an actor’s expertise in the two fields they are translating between, the higher the expected value of their work. This can apply to individual researchers as well as to a team/organization doing generalist research. 

Full generalism: As long as you fix your target domain, an actor can and should venture into many source domains. 

Arguments: 

  • Ability: An actor can do high-quality research while drawing from a (relatively) large number of source domains, some of which they only learn about as they discover them. This “ability” could come from several sources: 

    • The researchers’ inherent cognitive abilities

    • The structure (i.e. lack of depth) of the field (sometimes a field might be sufficiently shallow in its structure that the assumption that someone can get adequately oriented within this field is justified)

    • Error correction mechanisms within the intellectual community being sufficiently fit (which means that, even if an individual starts out by getting some important things wrong, error correction mechanisms guarantee that these mistakes will be readily discovered and corrected for). 

  • Increasing returns to scope: The richer (in intellectual diversity) an actor’s expertise, the juicier the insights. Again, this argument could apply to an individual or groups of individuals working closely together.

Note that you can achieve full generalism at an organizational level while having a team of individuals that all engage in partial generalism.

In praise of being a stranger

I like to be a stranger to the places I live in. 

By stranger, I don’t mean to refer to external facts like how long you have lived in a place or your residence permit. I mean an internal posture, a way of relating to the world around you. I mean choosing to be a stranger rather than a familiar; a visitor rather than a resident. 

This is an exploration of what I think is so praiseworthy about it. 

I haven’t seen many people voice praise for being a stranger. In fact, I believe most people consider strangeness something undesirable, and overcoming it an achievement. Since I have a lot of praise for being a stranger, I want to share some of my musings.

This is not meant as unambiguous endorsement, nor an attempt to convince anyone. It’s rather an invitation, of sorts, to lean into a perspective - maybe different from the perspective you take usually - on what it could be like to relate to the world around you as a stranger.  

Seeing

I can see places that I have no personal business with - places I’m a friendly stranger to - in ways it’s harder for me to see places that my identity is entangled with. This is a way of looking that I am hungry for. 

As a stranger, I am a bit like a scientist, a dispassionate observer: the place’s flaws and mistakes don’t offend me as much. Its virtues, too, shine more purely. This isn’t about suspending my ability to make judgements about what I think is good or bad, or what I endorse or don’t. It’s about protecting my ability to look.

The virtues and flaws, in the clarity in which they present themselves to me, become windows through which I catch glimpses of reality. 

Is this just about novelty? I think not. From the first to the last day I kept finding awe and perspective in Davos’ mountains; I kept savouring the humid, heavy, sultry scent of the Beiruti air; I kept connecting to calm and clarity in the ocean waves of El Medano; the beauty of gazing upon the Swiss midlands from high up the mountain top never fades. 

Yes, novelty is powerful in helping us get glimpses of beauty. But powerful, too, is the skill of re-seeing and re-discovering which we can cultivate.

What Hayao Miyazaki - Studio Ghibli - does in his movies is equivalent to what I call being a stranger. Miyazaki helps us see,  through the eyes of his protagonists, the enchanted nature of his fictional worlds. He sometimes directs our attention at vastness and mightiness, and sometimes at smallness and inconspicuousness. Wonder and a sense for the rawness of reality can be found in any of these places.

Relating

There is something about relating to - and as - a stranger that is pure, and unaffected by mental constructs of identity. 

A lot of people “stand for” things: ideas, roles, communities, places. And a lot of interactions happen between people who all “stand for” something. Or more accurately, those interactions happen between the things they stand for, not between the people themselves. 

To me, this way of relating often feels fake, impoverished and confused. If someone addresses me as whatever the thing is they stand for, I might exchange words with them; pre-cached, usually polite yet ultimately rather empty words. And I’d feel a sense of disgust - not at them - but at the fakeness of our interaction and the nothingness between us. 

The thing is, I don’t think I know how to be in a relationship with anything that is not a person. 

Relatedly, living in a place where people know me - or believe they know me; where they have conceptions of who I am - can make me end up feeling marshalled into “standing for” something myself. This makes me feel caged; it imposes upon me an identity or role I never wanted, or don’t want anymore.

The beauty of meeting a stranger, in a place I’ve nothing to do with - the beauty of meeting someone distant from me in a plethora of ways - is that it bears the potential of completely stripping away all mental constructs. Into this nothingness, a mere smile can birth a connection between two personhoods. 

I used to be surprised - though I no longer am - by how the genuine act of looking at someone, of actually and merely looking, can have a profound impact on them and me. 

There is a human bond that can exist between two strangers - despite, across and beyond the distance of ignorance about each other’s lives, of language, culture, lifestyles. A bond created by not more than a simple smile unleashed into silence; a bond nearly ephemeral, yet deeply real. 

This bond is among the strongest types of connection I have felt. Its depth and richness, I believe, come as much from the connection between two people, as from connecting to our shared humanity. A richness that is accessed by looking - unafraid, unashamed and truthfully.

The woman that smiled back at me in the streets of Ashrafieh, on my way to university, as if she saw something in me that day, that I couldn’t. The waiter at the café I used to go to in order to lose myself in thought, asking me how I was doing, causing me to snap out of my inner world, and, in the reflection of my surprise, creating a moment of genuine encounter. The old grape farmer in the hot valley in Turpan, with whom I couldn’t exchange a single word, but whose eyes held all the stories. 

I used to be astonished - though I no longer am - by just how easy it is to love someone. How easy it is to love a stranger.  

Practice

Being a stranger, to me, is an invitation to freedom and truthfulness. When I'm relating to the world around me as a stranger, I can see it more clearly. Or more precisely, it is easier to access this clear seeing. This way of looking is extremely precious to me and I would (indeed do, arguably) pay a lot for protecting and cultivating it. And so I tend to make myself a visitor, rather than a resident, of the places I am in.

I don’t mean to say that relating as a stranger is the only way to see clearly and truthfully. Far from. It’s a mental posture that resonates with me, and won’t for others. It’s a mental posture that is useful to me today, and might cease to be in the future. 

Being a stranger is the training regime, rather than the state I aspire to. What I practice is a certain way of seeing and relating to the world. Being a stranger is like a crutch I use to walk. What I aspire to is to dance.  

I haven’t addressed what might come as a cost from being a stranger. I understand and acknowledge that such costs exist, even if, at this point in my life, I am welcoming them readily and smilingly. 

There is another piece of writing that would naturally go here, alongside these thoughts on being a stranger. That piece is about being alone, of being with oneself. I have praise for that, too. I might write that piece, some other day.


***

There is a sense in which we are and will always be strangers. The act of pretending we’re not is also the act of inviting a veil of fog to be lowered over whatever it is that allows us to see clearly. We want to pretend we are not strangers because we fear an eternity of solitude; we fear being cut off, never quite let in, never quite being part of the circle of true familiars. This fear, I believe, is an illusion - like many of its kind. Stopping the pretence is the act of sinking into strangeness, fully and unafraid, in a way that opens, rather than closes, the world to us.

Stranger Faces - February 2021


Some thoughts on Patient Longtermism

Cross-posted to the EA forum

Epistemic status: I have written this fairly quickly with the goal of getting the gist of the ideas out, rather than trying to make a very rigorous argument. Also, I spent a limited time reading up on patient longtermism, and it might be that I am unaware of or misunderstood relevant arguments. If this turns out to be the case, I apologize and will very happily update. 

 

Tl;dr: The question “What sorts of things store and accumulate impact potential?" ought to be one of the central questions of patient longtermism. We can further break it down into a) exploring what such ‘things’ (other than money), with the desired properties (i.e. storing and accumulating), exist, and b) how to render their impact potential commensurable, so as to be able to assess in which one(s) a patient longtermist would want to invest. I believe this question/frame currently does not receive enough attention. 

 

This post provides a good, short introduction to the key tenets of patient longtermism. For a more extensive treatment of the subject, see this paper by Philip Trammell and this 80,000 Hours interview with Philip.

I think the considerations raised by patient longtermism are important, and I am in favour of acting upon its implications with some of our resources, in view of diversifying our altruistic portfolio. That said, based on the arguments I’ve heard so far, I remain very sceptical of what one might take as a first-order implication: investing most of one’s money to be used in a couple of hundred or thousand years. 

Patient Longtermism as a reference frame

Arguments about patient longtermism seem to naturally gravitate towards money as the key resource under discussion. I don’t mean to say that patient longtermism necessarily implies that investing money is the best way to act upon its implications, nor that people who have thought about it most suggest it does. However, I believe there currently exists, as a matter of fact, such a tendency. Reasons for this include the fact that money provides a straightforward way to quantify the discussion, and that we understand things about how it accumulates over time (e.g. historical interest rates). However, limiting our analysis to money would be a mistake. 
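
To make the compounding intuition concrete, here is a minimal worked example (the return rate and horizon are my own illustrative assumptions, not figures from the patient longtermism literature):

P_t = P_0 × (1 + r)^t, e.g. $1,000 × (1.05)^200 ≈ $17 million after 200 years at a 5% real return.

This kind of quantifiable, compounding growth is the benchmark that any alternative store of impact potential would have to match or beat.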

In particular, I believe that the following ought to be one of the central questions of patient longtermism: 

What types of things can store and accumulate ‘impact potential’ over time? 

I think we should spend more time thinking about a) what such ‘things’ other than money exist, and b) how to render their impact potential commensurable so as to be able to assess in which one a patient longtermist would want to invest. 

I will share a few thoughts on potential candidates for a), and some brief arguments for how they might compare to money as a means to store and accumulate impact potential. 

Types of things that store and accumulate “impact potential“ over time

My number one candidate is knowledge. I believe that knowledge is critical for any attempt at doing good and steering humanity into a flourishing future. Importantly, I also believe that knowledge has similar "accumulative" properties to invested money. 

Knowledge created now will have us ask better questions tomorrow; knowledge can create better tools for generating more knowledge; and knowledge creation has the potential to uncover critical considerations, illuminate known unknowns and uncover unknown unknowns. Personally, I think the rationale for investing in knowledge creation is extremely strong. If I was given some large sum of money to spend on altruistic causes, I would invest it in knowledge creation rather than longterm financial investments.

Another candidate, although pretty hard to measure, is something like "civic capacity" or "the quality of our institutions". According to me, there are two main reasons in favour of investing in better institutions. One, institutional development has high path-dependency. This suggests that improving institutions earlier rather than later might have large payoffs compared to the counterfactual scenario. (Some relevant academic work I am aware of on this topic is by Jenna Bednar, Scott Page, Elinor Ostrom and Douglass North.) Two, institutions interface with nearly every other area of concern, such as economics, science/knowledge creation, innovation, and national and international politics. For example, the field of development economics has thought about a version of this issue for some time now and offers a bunch of reasons why investing in good institutions might be extremely "lucrative”, among others because the health of an economy and the nature of its political institutions are tightly linked. 

However, this category does seem to me definitely harder to think about than, say, knowledge or money. It is, for example, not straightforward how you get better institutions (“tractability concern”). There are many examples of institutional changes that did in fact not lead to an accumulation of positive consequences downstream. In the case of improving institutions in the developing world specifically, there is already an entire sector trying to work away on this, so that's less neglected (“neglectedness concern”). 

However, I think it would be a mistake to limit the mission of creating better institutions to development economics. For example, I consider AI governance, or more generally the governance of emergent technologies, to be part of this category. I, like many others, believe that this is likely a very impactful issue to work on. A final example of what we might want to consider part of the “better institutions” bucket is something like “getting our social media and information ecosystem dilemma right”. Lots of money in the future but a very confused and fragmented world is likely not the ideal scenario from an altruistic perspective.

Growing - Summer 2020


The Inner Workings of Resourcefulness

Cross-posted to Less Wrong.

Meta: If not for its clumsiness, I would have titled this piece “some of the inner workings of resourcefulness”. In other words, I do not want to pretend this piece offers a comprehensive account; it merely offers partial insights. 

About this time a year ago - when for a lot of people, the world came crashing in on them - I started obsessing over an idea. Maybe, I thought to myself, one of the most valuable things I could be focusing on at the moment is becoming more resourceful.

In this time of crisis, starting early 2020, what was, in a sense, most needed were people capable of dealing flexibly with new, difficult situations and of finding and enacting creative solutions despite constraints and uncertainty. People capable of staying oriented, in the face of unfolding chaos, while also being relentlessly effective in their action. People who can give, plentifully, when others are in need of help. Resourceful people.

For much of March, I wasn't resourceful. I had shit of my own to figure out, and a lot of my energy flowed (sank!) into that, and into handling the wave of uncertainty that, for a couple of days, took away my breath. 

With opportunities for meaningful mitigative action at my fingertips yet much of my resources gobbled up, I couldn't stop ruminating on the question of what I could have done to be a more valuable actor now.

For some time now, I have been deliberately working on making myself more valuable to this world. Never before in my lifetime was the world so much in need of me. And yet I felt so utterly unprepared. 

This is when it came to me that resourcefulness was an important concept in this quest of mine.

Turns out, I didn’t know - not really - what being resourceful means. Or rather, while I could give a description of what resourcefulness might look like, I wasn’t able to “pull back the curtains” and look at the “inner workings” of resourcefulness. I needed to remedy this situation, and I have been pondering the nature of resourcefulness ever since then.

This morning, a day in early 2021, like a bird flying up to my window bench in joyful chatter, the thought came to me: What have I learnt about resourcefulness over the last year? Let’s take stock.

I am confident I haven’t finished my quest to understand the inner workings of resourcefulness. In fact, the rest of this post will adopt only two of many possible angles on the subject. But I have learnt some things about resourcefulness over the past year - since that wave of raw uncertainty had stolen my breath for some time. It is, arguably, the most important axis of progress I've gone through in 2020. 

Importantly, my reflections aren’t meant to only apply to “proper times of crisis”, such as an unravelling pandemic. The world needs resourceful people at any time, and you won’t build resourcefulness any other time than now. Our society is structured such that we see people “on their big days”, such as an athlete on the day of the game. However, every good athlete will tell you that their victory (or failure) hadn’t been determined on the court (or whatever the relevant  equivalent), but on all the days leading up to it. 

Every day is when you build form. Every day is when it matters. 

Roles, Slack and Intrapersonal Freedom

In the rationality community, Slack is a semi-artistic concept referring to the absence of binding constraints on behaviour.

From the post originally introducing the term: 

Slack means margin for error. You can relax.
Slack allows pursuing opportunities. You can explore. You can trade.
Slack prevents desperation. You can avoid bad trades and wait for better spots. You can be efficient.
Slack permits planning for the long term. You can invest.
Slack enables doing things for your own amusement. You can play games. You can have fun.
Slack enables doing the right thing. Stand by your friends. Reward the worthy. Punish the wicked. You can have a code.
Slack presents things as they are without concern for how things look or what others think. You can be honest.
You can do some of these things, and choose not to do others. Because you don’t have to.
Only with slack can one be a righteous dude.

Of particular relevance to my then-situation: Slack allows adjusting your priorities according to what is most needed. This isn't to say that you should never commit part of your resources to longer-term projects. Investing for larger longer-term gains is great. And so is having Slack.

We often start by looking for the source of Slack (or the causes for the lack thereof) in the external world. Someone who has a lot of savings, say, has more Slack than someone who cannot afford to lose their salary, even just for some time. 

Interestingly, the thing that was consuming Slack in the case of Nora-from-early-2020 wasn't object-level constraints at all. Instead, a lot came down to me not being psychologically ready to adjust my priorities as fast and as effectively as would have been adequate. 

In April 2020, I started to contribute to a covid-relevant project. But although my then-employer explicitly allowed me to direct my working hours towards that project, I didn't do so to the extent I should have. 

Instead, I started working what was much more like two jobs, logging more work hours than I had ever done before. I was conscious of this and tried hard to make the situation as sustainable as possible. But the urgency at that time was overwhelming, so I succeeded at this goal only so much. 

Retrospectively, I now think that a significant portion of my "trying hard" was directed at the wrong thing. I focused most of my attention on freeing up time "at the margins" (e.g. streamlining daily routines such as food preparation and consumption, household chores, etc.; making sure my sleep was as good as it could be, including using melatonin to minimize the amount of time I lay in bed not yet asleep but also unproductive; doing shorter but more intense workouts; ...). And sure, these were mostly good and worthwhile interventions. But they weren't where I could have gotten the most bang for the buck.  

What I should instead have done was to "sweep clear" my metaphorical work desk; I should have gone through my list of commitments at the time and marked each one that wasn't critical and urgent ("something breaks if I don't do this now"); I should have explicitly put an "on pause" sign on each of the non-critical ones, and I should have treated this as an administrative as well as a psychological process. 

In particular, I should have talked to my then-employer to clarify that, instead of reducing my workload by a third, I needed to reduce it to 0-10% for the next three weeks, and then upon reevaluation, potentially for longer. 

Of course, this isn't always possible. But in my case, it was. And I knew it was. My then-employer would have approved and supported this decision. The problem was with me. I felt a misguided sense of duty to my "role". My psychology was what was consuming my Slack.

The resource that I was most in need of wasn't just the number of hours. Of course, those mattered. But the scarcer resource was actually my attention. 

Freeing up something like a third of my time allowed me to get a bunch of valuable tasks done for the covid-related project. But it didn't allow me to become a reliable and agentic contributor, able to take on responsibility and show initiative. Something that the project was in dire need of at the time. 

The best example of this is when I at some point half-heartedly tried to take on the task of getting a policy report ready to be shipped. There were indeed a lot of things that made this undertaking hard, and maybe I wouldn't have succeeded at it either way. But the fact is, I didn't give it my all. I hid behind the excuses that I was only doing this to "help out", that I didn't have a sufficient overview of what was going on in all corners of the project, and that I lacked some skills that would have made me a more apt person to lead this project. 

I would like to believe that, had I had more attention as a result of letting go of other temporarily dispensable responsibilities, I would have taken more ownership. I don't know that this is what would have happened - I think I still would have struggled - but I do know it would have been much more likely.

(To be clear, this is not meant as uncontextualized advice to, at any instance of apparent or real urgency, let go of all commitments you have made up to that point.)

What I mean to say here, really, is that what took away my Slack was my own mind in, what turns out to be, an illusionary and misguided search for safety.

The human mind is susceptible to the illusion of safety. Ultimate safety is an illusion, and so is the emotional response to the absence of it. In transcending the excessive striving for safety one finds another dimension of freedom: intrapersonal freedom.

The human mind is particularly prone to placing its sense of safety in other people, or objects, or - as in my case - things like roles and duty. 

By working on transcending this illusion, I have since gained Slack and become more resourceful. 

Personal resilience

A low-resolution summary of this next section might suggest I’m talking about “self-care” - the idea that before you can help others, you need to help yourself. While such a summary wouldn’t exactly be incorrect, it would have me conclude that I didn’t manage to get my point across faithfully. In fact, I feel reluctant towards generic advice of that sort. Not because I think it's wrong, but because it's too low-bandwidth a way of communicating what I consider to be a real thing. I hope that I’ll manage to do better than that.

The second angle on the inner workings of resourcefulness I want to adopt is captured in the idea of resilience. I take resilience to be the capacity to recover quickly from disruptive events and subsequently adapt (“bounce back and become stronger”).

I have found it useful to model my personal resilience as follows: 

Increasing resilience equates to widening and steepening the slopes of my attraction basin of wellbeing - the area where I am well and highly functional. 

The wider my attraction basin, the larger are the turbulences I can withstand without causing the entire system to tip over. The steeper the slopes of my attraction basin, the faster I can recover from turbulences, the sooner I'm back in the driver's seat. 

Here is a bad hand-drawing. If it confuses you more, ignore it. The idea is that attraction basin A is wider than B, and has steeper slopes than C. A is thus preferable over both B and C.


The dimensions that define the basin of attraction are physical, physiological, mental and emotional. For example: 

  • Certain features of my physical space make it easier for me to be functional (e.g. I need my laptop, I work better with additional large screens, places with good weather and accessible nature spaces to support my wellbeing, ...). 

  • My physiological wellbeing is the largest determinant for my overall energy levels and is influenced by my sleeping, eating and workout habits. 

  • My mental energy permits me to think clearly, sustain high levels of concentration for longer and direct my attention to those things/details that matter most. The main factors reducing my mental energy are unsustainable work habits and a lack of mental hygiene (clarity on priorities, personal planning and admin, etc) which primarily manifests in my case as "overwhelm" and "mental restlessness". 

  • Finally, high-flaring emotions can absorb large amounts of resources, which is particularly costly if they are highly unpredictable; but they can also be a force for good as a critical source of information, inspiration and, if channelled correctly, Clear Seeing. 

“Tipping over” in this framework means that, for example, I’ve worked so much that I need a recovery day (or more than one), because a rest day doesn’t do the job. Or I get sick because I haven’t taken care of my body. Or I temporarily lose my motivation (in contrast to maintaining my motivation while deciding to take a break). That sort of thing.

So, according to this framework, how can you strengthen your resilience? 

For one, you can work on making the slopes of the basin of attraction steeper. This amounts to increasing your “generative capacity” - how quickly you are able to recover and restore full functionality after turbulences. 

I suspect that the details of this are fairly person-dependent. Some things that have been helpful for me include having a go-to (physical or mental) list of things that are reliably restorative. If you’re not yet happy with that list of yours, experimentation is key. Even before that, though, I benefited a lot from getting better at noticing and understanding my body, my feelings and my needs. This, straightforwardly, makes it easier to respond to those needs and thus regain your functional equilibrium faster. The skill of being in touch with your needs, yet not absorbed by them, deserves its own post - and I’m not writing that post right now. (I guess it’s easy enough to link to Gendlin’s Focusing as something that has helped me and others I know in making progress on this front. However, this really covers just one aspect of the bigger theme.)

For two, you can work on widening your attraction basin. My preferred method for this is increasing self-alignment. In fact, I wouldn’t hesitate a second in stipulating that the key pillar of my productivity is self-alignment.  

Whenever you require self-control or self-coercion to do something, there is at least one part of you that is at odds with some other part of you. You're pulling in different directions. This is an unproductive use of your energy. Fostering self-alignment seeks to get rid of internal conflict. The goal is for all of you to be able to pull in the same direction. A direction that, having incorporated the data and considerations of all your different parts, you have come to consider most promising. 

However, self-alignment is not built overnight, and - just like with interpersonal friendships - it’s not true that, once gained, you will have it forever without regularly checking in. It is a slow but worthwhile process of building internal trust and strengthening communication bandwidth between different parts of yourself and across time.

Lastly, if self-alignment makes you less agentic, because you’re unwilling to push yourself to do things that are uncomfortable, I posit that you are doing self-alignment wrong. Self-alignment isn’t about optimizing away discomfort or conflict (although a reduction of internal conflict does often occur as a downstream effect). It’s about aligning your actions with your goals and your goals with your values, and doing so in a way that actually works in the long run. 

Acknowledgements: Thanks to Jan Pieter Snoeij and Neel Nanda for useful comments on earlier drafts. 

Studies of wind - February 2021


On the nature of purpose

[cross-posted to LessWrong]

Introduction

Is the concept of purposes, and more generally teleological accounts of behaviour, to be banished from the field of biology? 

For many years - essentially since the idea of Darwinian natural selection started to be properly understood and integrated into the intellectual fabric of the field - “yes” was the consensus answer among scholars. Much more recently, however, interest in this question has been rekindled - notably driven by voices that contradict that former consensus. 

This is the context in which this letter exchange between the philosophers Alex Rosenberg and Daniel Dennett is taking place. What is the nature of "purposes"? Are they real? But mostly, what would it even mean for them to be?

In the following, I will provide a summary and discussion of what I consider the key points and lines of disagreement between the two. Quotes, if not specified otherwise, are taken from the letter exchange.

Rosenberg’s crux

Rosenberg and Dennett agree on large parts of their respective worldviews. They both share a "disenchanted" naturalist's view - they believe that reality is (nothing but) causal and (in principle) explainable. They subscribe to the narrative of reductionism, which celebrates how scientific progress emancipated, first, the world of physics, and later the chemical and biological ones, from metaphysical beliefs. Through Darwin, we have come to understand the fundamental drivers of life as we know it - variation and natural selection. 

But despite their shared epistemic foundation, Rosenberg suspects a fundamental difference in their views concerning the nature of purpose. Rosenberg - contrary to Dennett - sees a necessity for science (and scientists) to disabuse themselves, entirely, of any anthropocentric talk of purpose and meaning. Anyone who considers the use of the “intentional stance” justified would, according to Rosenberg, have to answer the following: 

        What is the mechanism by which Darwinian natural selection turns reasons (tracked by the individual as purpose, meaning, beliefs and intentions) into causes (affecting the material world)? 

Rosenberg, of course, doesn't deny that humans - what he refers to as Gregorian creatures shaped by biological as well as cultural evolution - experience higher-level properties like emotions, intentions and meaning. Wilfrid Sellars calls this the "manifest image": the framework in terms of which we ordinarily perceive and make sense of ourselves and the world. [1] But Rosenberg sees a tension between the scientific and the manifest image - one that is, to his eyes, irreconcilable.

"Darwinism is the only game in town", so Rosenberg. Everything can, and ought to be, explained in terms of it. These higher-level properties - sweetness, cuteness, sexiness, funniness, colour, solidity, weight (not mass!) - are radically illusionary. Darwin's account of natural selection doesn't explain purpose, it explains it away. Just like physics and biology, so do cognitive sciences and psychology now have to become disabused from the “intentional stance”. 

In other words, it's the recalcitrance of meaning that bothers Rosenberg - the fact that we appear to need it in how we make sense of the world, while also being unable to properly integrate it in our scientific understanding. 

As Quine put it: "One may accept the Brentano thesis [about the nature of intentionality] as either showing the indispensability of intentional idioms and the importance of an autonomous science of intention, or as showing the baselessness of intentional idioms and the emptiness of a science of intention." [2]

Rosenberg is compelled by the latter path. In his view, the recalcitrance of meaning is "the last bastion of resistance to the scientific world view. Science can do without them, in fact, it must do without them in its description of reality." He doesn't claim that notions of meaning have never been useful, but that they have "outlived their usefulness", replaced, today, with better tools of scientific inquiry.

As I understand it, Rosenberg argues that purposes aren't real because they aren’t tied into reality and are unable to affect the physical world. Acting as if they were real (by relying on the concept to explain observations) contributes to confusion and convoluted thinking. We ought, instead, to resort to the classical Darwinian explanations, where all behaviour boils down to evolutionary advantages and procreation (in a way that explains purpose away).

Rosenberg’s crux (or rather, my interpretation thereof) is that, if you want to claim that purposes are real - if you want to maintain purpose as a scientifically justified concept, one that is reconcilable with science - you need to be able to account for how reasons turn into causes.

***

Perfectly real illusions

While Dennett recognizes the challenges presented by Rosenberg, he refuses to be troubled by them. Dennett paints a possible "third path" through Quine’s puzzle by suggesting that we understand the manifest image (i.e. mental properties, qualia) neither as "as real as physics" (thereby making it incomprehensible to science) nor as "radically illusionary" (thereby troubling our self-understanding as Gregorian creatures). Instead, Dennett suggests, we can understand it as a user-illusion: "ways of being informed about things that matter to us in the world (our affordances) because of the way we and the environment we live in (microphysically [3]) are." 

I suggest that this is, in essence, a deeply pragmatic account. (What account other than pragmatism, really, could utter, with the same ethos, a sentence like: "These are perfectly real illusions!") 

While not explicitly saying so, we can interpret Dennett as invoking the bounded nature of human minds and their perceptual capacity. Mental representations, while not representing reality fully truthfully (e.g. there is no microphysical account of colours, just photons), also aren't arbitrary. They are issued (in part) from reality, and through the compression inherent to the mind’s cognitive processes, these representations get distorted so as to form false, yet in all likelihood useful, illusions. 

These representations are useful because they have evolved to be such: after all, it is through the interaction with the causal world that the Darwinian fitness of an agent is determined; whether we live or die, procreate or fail to do so. Our ability to perceive has been shaped by evolution to track reality (i.e. to be truthful), but only exactly to the extent that this gives us a fitness advantage (i.e. is useful). Our perceptions are neither completely unrestrained nor completely constrained by reality, and therefore they are neither entirely arbitrary nor entirely accurate. 

Let’s talk about the nature of patterns for a moment. Patterns are critical to how intelligent creatures make sense of and navigate the world. They allow (what would otherwise be far too much) data to be compressed, while still granting predictive power. But are patterns real? Patterns directly stem from reality - they are to be found in reality - and, in this very sense, they are real. But, if there wasn’t anyone or anything to perceive and make use of this structural property of the real world, it wouldn’t be meaningful to talk of patterns. Reality doesn’t care about patterns. Observers/agents do. 

This same reasoning can be applied to intentions. Intentions are meaningful patterns in the world. An observer with limited resources who wants to make sense of the world (i.e. an agent that wants to reduce sample complexity) can abstract along the dimension of "intentionality" to reliably get good predictions about the world. (Except that "abstracting along the dimension of intentionality" isn't an active choice of the observer so much as something that emerges because intentions are a meaningful pattern.) The "intentionality-based" prediction does well at ignoring variables that aren't sufficiently predictive and capturing the ones that are, which is critical in the context of a bounded agent.

Another case in point: affordances. In the preface to his book Surfing Uncertainty, Andy Clark writes: “[...] different (but densely interanimated) neural populations learn to predict various organism-salient regularities pertaining at many spatial and temporal scales. [...] The world is thus revealed as a world tailored to human needs, tasks and actions. It is a world built of affordances - opportunities for action and intervention.“ Just as with patterns, the world isn’t literally made up of affordances. And yet they are real in the sense of what Dennett calls user-illusions. 

***

The cryptographer’s constraint

Dennett goes on to endow these illusionary reasons with further “realness” by invoking the cryptographer's constraint: 

        It is extremely hard - practically infeasible - to design an even minimally complex system for the code of which there exists more than one reasonable decryption/translation. 

Dennett uses a  simple crossword puzzle to illustrate the idea: “Consider a crossword puzzle that has two different solutions, and in which there is no fact that settles which is the “correct” solution. The composer of the crossword went to great pains to devise a puzzle with two solutions. [...] If making a simple crossword puzzle with two solutions is difficult, imagine how difficult it would be to take the whole corpus of human utterances in all languages and come up with a pair of equally good versions of Google Translate that disagreed!” [slight edits to improve readability]

The practical consequence of the constraint is that, “if you can find one reasonable decryption of a cipher-text, you’ve found the decryption.” Furthermore, this constraint is a general property of all forms of encryption/decryption.  

Let’s look at the sentence: “Give me a peppermint candy!”

Given the cryptographer’s constraint, there are, practically speaking, very (read: astronomically) few plausible interpretations of the words “peppermint”, “candy”, etc. This is at the heart of what makes meaning non-arbitrary and language reliable. 

To add a bit of nuance: the fact that the concept "peppermint" reliably translates to the same meaning across minds requires iterated interactions. In other words, Dennett doesn’t claim that, if I just now came up with an entirely new concept (say "klup"), its meaning would immediately be unambiguously clear. But its meaning (across minds) would become increasingly precise and robust after using it for some time, and - on evolutionary time horizons - we can be preeetty sure we mean (to all practical relevance) the same things by the words we use.

But what does all this have to do with the question of whether purpose is real? Here we go: 

The cryptographer's constraint - which I will henceforth refer to as the principle of pragmatic reliability [4] - is an essential puzzle piece for understanding what allows representations of reasons (e.g. a sentence making a claim) to turn into causes (e.g. a human taking a certain action because of that claim). 

We are thus starting to get closer to Rosenberg’s crux as stated above: a scientific account for how reasons become causes. There is one more leap to take.

***

Reasons-turning-causes

Having invoked the role of pragmatic reliability, let’s examine another pillar of Dennett's view - one that will eventually get us all the way to addressing Rosenberg’s crux. 

Rosenberg says: "I see how we represent in public language, turning inscriptions and noises into symbols. I don’t see how, prior to us and our language, mother nature (a.k.a Darwinian natural selection) did it." 

What Rosenberg conceives to be an insurmountable challenge to Dennett’s view, the latter prefers to walk around rather than over, figuratively speaking. As developed at length in his book From Bacteria to Bach and Back, Dennett suggests that "mother nature didn’t represent reasons at all", nor did it need to. 

First, the mechanism of natural selection uncovers what Dennett calls "free-floating rationales” - reasons that existed billions of years before, and independently of, reasoners. Only when the tree of life grew a particular (and so far unique) branch - humans, together with their use of language - did these reasons start to get represented.

"We humans are the first reason representers on the planet and maybe in the universe. Free-floating rationales are not represented anywhere, not in the mind of God, or Mother Nature, and not in the organisms who benefit from all the arrangements of their bodies for which there are good reasons. Reasons don’t have to be representations; representations of reasons do."

This is to say: it isn't exactly the reasons, so much as their representations, that become causes.

Reasons-turning-causes, according to Dennett, is unique to humans because only humans represent reasons. I would nuance this by saying that the capacity to represent lives on a spectrum rather than being binary. Some other animals seem to be able to do something like representation, too. [5] That said, humans remain unchallenged in the degree to which they have developed the capacity to represent (among the forms of life we are currently aware of).  

"Bears have a good reason to hibernate, but they don’t represent that reason, any more than trees represent their reason for growing tall: to get more sunlight than the competition." While there are rational explanations for the bear’s or the tree’s behaviour, they don’t understand, think about or represent these reasons. The rationale has been discovered by natural selection, but the bear/tree doesn’t know - nor does it need to - why it wants to stay in their dens during winter.  

Language plays a critical role in this entire representation-jazz. Language is instrumental to our ability to represent; whether as necessary precursor, mediator or (ex-post) manifestation of that ability remains a controversial question among philosophers of language. Less controversial, however, is the role of language in allowing us to externalize representations of reasons, thereby “creating causes” not only for ourselves but also for people around us. Wilfrid Sellars suggested that language bore what he calls “the space of reasons” - the space of argumentation, explanation, query and persuasion. [6] In other words, language bore the space in which reasons can become causes. 

We can even go a step further: while acknowledging the role of natural selection in shaping what we are - the fact that the purposes of our genes are determined by natural selection - we are still free to make our own choices. To put it differently: "Humans create the purposes they are subject to; we are not subject to purposes created by something external to us.” [7] 

In From Darwin to Derrida: Selfish Genes, Social Selves, and the Meanings of Life, David Haig argues for this point of view by suggesting that there need not be full concordance, nor congruity, between our psychological motivations (e.g. wanting to engage in sexual activity because it is pleasurable, wanting to eat a certain food because it is tasty) and the reasons why we have those motivations (e.g. in order to pass on our genetic material).

There is a piece of folk wisdom that goes: “the meaning of life is the meaning we give it”. Based on what has been discussed in this essay, we can see this saying in a different, more scientific light: as a testimony to the fact that we humans are creatures that represent meaning, and in doing so we turn “free-floating rationales” into causes that govern our own lives. 

Thanks to particlemania, Kyle Scott and Romeo Stevens for useful discussions and comments on earlier drafts of this post. 
 

***

[1] Sellars, Wilfrid. "Philosophy and the scientific image of man." Science, perception and reality 2 (1963).
Also see: deVries, Willem. "Wilfrid Sellars", The Stanford Encyclopedia of Philosophy (Fall 2020 Edition). Retrieved from: https://plato.stanford.edu/archives/fall2020/entries/sellars/

[2]  Quine, Willard Van Orman. "Word and Object. New Edition." MIT Press (1960).

[3]  I.e. the physical world at the level of atoms 

[4]  AI safety relevant side note: The idea that translations of meaning need only be sufficiently reliable in order to be reliably useful might provide an interesting avenue for AI safety research. 

Language works, as evidenced by the striking success of human civilisations, made possible through advanced coordination, which in turn requires advanced communication. (Sure, humans miscommunicate what feels like a whole lot, but in the bigger scheme of things, we still appear to be pretty damn good at this communication thing.)  

Notably, language works without there being theoretically air-tight proofs that map meanings on words. 

Right there, we have an empirical case study of a symbolic system that functions in a (merely) pragmatically reliable regime. We can use it to inform our priors on how well this regime might work in other systems, such as AI, and how and why it tends to fail.

One might argue that a pragmatically reliable alignment isn’t enough - not given the sheer optimization power of the systems we are talking about. Maybe that is true; maybe we do need more certainty than pragmatism can provide. Nevertheless, I believe that there are sufficient reasons for why this is an avenue worth exploring further. 

Personally, I am most interested in this line of thinking from an AI ecosystems/CAIS point of view, and as a way of addressing (what I consider a major challenge) the problem of the transient and contextual nature of preferences. 

[5] People wanting to think about this more might be interested in looking into vocal (production) learning - the ability to “modify acoustic and syntactic sounds, acquire new sounds via imitation, and produce vocalizations”. This conversation might be a good starting point.

[6] Sellars, Wilfrid. In the space of reasons: Selected essays of Wilfrid Sellars. Harvard University Press (2007).

[7] Quoted from:  https://twitter.com/ironick/status/1324778875763773448 

January 2021: Chandoling, donkeys and utilities


“It's not about making sense!” - boggling at human behaviour

I regularly feel confused about behaviour that I observe in other people. And I genuinely mean confused - just as I am confused when, say, I fail to understand a mathematical equation. I believe that there must be reasons for those people to behave in that way - I just can't see them from where I'm currently standing.  

I specifically mean behaviours that one would generally consider common, mostly quite ordinary. For example, why do people paint their nails? Why do people dress up in ways that go far beyond the fundamental reasons why we wear clothes in the first place? Why do people own so many clothes? Why do people take long hot baths, and also light candles and put on soothing music when they do that? Why do some people make it look like they're having a conversation, but really there is close to no information being exchanged in that "conversation"? Why do people put extremely expensive creams and oils on their body? Why do people buy way more popcorn in the cinema than they can eat? Why do teenagers like to gather in large groups in public places in order to… sit around and say mostly meaningless things? Why do some couples spend nearly all of their time together, every day, while also ending up not doing anything much at all?  

And before you raise your eyebrows, thinking to yourself that I sure must be a naive person for asking those questions: I get it. There *are* explanations for why we see such behaviour. Signalling, hormones, evolution, “trends” and stuff. 

These are valid and good explanations at a certain level of abstraction. But what keeps bugging me is that my brain still cannot model how and why, locally, someone feels inclined to do [insert one of the examples I give above].

***

Recently, I sat in a train, and the woman in front of me had long artificial nails glued onto her fingertips. And, seriously, I’m happy for her to wear whatever the heck she wants. I just… I want to understand what’s going on in her mind as she’s putting these nails on, or as she’s going through her day with them. I don’t currently get that, and that feels like a hole in my model of the world - which I have to try filling!

And then, it suddenly dawned on me. The puzzle piece I was looking for.

All along, I have been foolishly asking the wrong question. My nagging curiosity was circling around the question of how does that make sense. It has to make sense somehow. 

But it’s not about "making sense" at all.

It feels good. It’s about what it feels like when you do the thing. 

Putting on nice clothes, makeup, jewellery, perfume, grooming your beard.... People like the way it makes them feel. You might feel more confident, more beautiful, more handsome, more put together, more like an adult, more whatever.

Having conversations without exchanging information - it makes you feel like you’re part of your community, or like you’ve done your good deed for the day (having called grandma). 

Spending hours on end with your boy/girlfriend makes you feel like there is someone there, someone who “picked you”. That feels good. (Although your day might have been boring as hell.)

The positive valence of conducting a certain behaviour is the vector through which evolution takes its course in a boundedly rational actor. You can’t make a boundedly rational actor seriously consider and weigh up all their options most of the time. Instead, you have to compress a bunch of information into a single signal. That signal is the mechanism through which a behaviour becomes an attractor.

“When I do X, I feel good. I want to do more X.” That’s the immediate vector via which evolution (via mechanisms such as signalling) takes its course. 


***

Epilogue: In order for this mechanism to work (“When I do X, I feel good. I want to do more X.”), the agent needs to be somewhat in touch with their phenomenological experience. This seems more “close to home” for some people than for others. 

And even though my initial, somewhat clumsy attempt to make sense of my environment might suggest otherwise, this does make sense to me, too. There are definitely things that I recognise as primarily "feeling good", in a way that pushes far into the background the question of whether or not I have a neat story for how these things make sense.

But I have to admit that, for me, a lot of this requires a fairly conscious act of learning. For example, I once painted my nails and consciously paid attention to what it made me feel. It was only then that I realized that I could indeed notice something like “This looks beautiful. That feels nice.” I did a similar thing with taking a bath and trying to actually understand what it is that some people really like about it. While this hasn’t led me to constantly have my fingernails painted or to take hot baths on a frequent basis, I’m glad to have expanded, and to keep expanding, my understanding of human existence. I sure have lots left to learn.

Summer 2020: Cool people lying next to me on the grass.


Sense-making: insights and advice for future-me

[Epistemic status: I have different levels of confidence about different points I’m making. Overall though, they are all mostly based on observations and some introspection, so take them with a couple of grains of salt.
I’m writing this with my future-self as the primary target audience. This might mean that in some instances, I’m using references I’m confident will make sense to my future-self, but that might not necessarily make sense to other people. Where it was particularly cheap, I tried to avoid this, but there are still some instances where I did not bother to provide much context.]

Preamble: A pandemic - “is this real?”

Early March 2020: Watching the unfolding of the global outbreak of covid-19 has taught me a bunch of things. Things I think are meaningful insights about how the world and our society work. Things I think are meaningful to my personal growth. 

Trying to make sense of the covid-19 pandemic has offered me a real-world learning environment of an intensity I don’t think I had before. Here, I aim to capture some insights such that my future-self can remember and build upon them. Note that these aren’t insights about object-level questions of how the world works, but insights that mostly concern the functioning of cognition and epistemic processes, which I hope will improve my general capacity for ‘making sense of the world’.

To most readers, I expect the below ‘insights’ will mostly sound obvious at first sight. And they are in a sense. What I’m trying to capture for my future-self is a tacit understanding of those insights. I want to be able to do these things, all the time, perfectly. 

Part 1. Prepare your mind to accept the evidence

>> Watch your ‘emotional brain’. It might not like what the evidence suggests. It might very much like coming to certain conclusions, and not to others. No, your emotional brain is not dumb, it’s not a ‘bad actor’. But it might be… scared.

There are a bunch of things that help with aligning your emotional brain with whatever evidence suggests. Personally, I think the most valuable approach is to be thinking in the framework of ‘long-term relationship-building with yourself’. 

By ‘relationship building’, I mean things like a) setting up high-bandwidth communication channels with your emotional brain, b) building trust between different parts of your mind, c) building trust, coherence and consistency between different temporal versions of yourself. 

While building this sort of relationship to yourself (and different ‘parts’ of your mind) is extremely valuable, it does require you to start early.

This is why, next to the encouragement for long-term relationship building with yourself, I also want to list a couple of more immediately actionable techniques that are helpful. (Note that some of these are techniques with a lot of pedagogical depth, which I am not intending to do justice to here. For anyone who isn’t familiar with these techniques/concepts, the links provided should make for a good starting point.)

The point is: engage with the worries, objections and rationalizations your emotional brain presents to you - instead of either i) running with whatever your emotional brain wants, or ii) inconsiderately denying their realness, legitimacy and informational content. I like the analogy of how I think good parents engage with their children. Namely, by taking them seriously, by taking them as a whole, entire person - while engaging with them in a way that is appropriate and that the child can understand. 


Part 2. Intellectual agency

>> Develop your intellectual agency. You can figure stuff out through reasoning. You can actually make progress by using your cognition. A more granular breakdown of this lesson includes:

  1. This is not a school exam. You won’t find the platonic The-Correct-Answer, and even if you did, no one will put a checkmark behind your answer so you know you got it right. More importantly, you don’t have to know the-correct-answer in order to be allowed to engage in the sense-making and problem-solving process. Another way to say this: There are no adults in the room. Not a single one. Nihil supernum.

  2. Good object-level information and understanding matter. Collect your facts, identify your sources, compare them, understand their epistemic status, let yourself be moved by evidence, don’t over-update on speculations, consider alternative hypotheses, think about how these factors interact, think at different levels of abstraction, use different lenses to look at the world (e.g. economic, sociological, psychological, …), reason, use your cognition, etc.

    1. Information is the link through which you engage with Reality (as opposed to with phantoms that only live in your or others’ heads). Seek real-world feedback loops. Make sure you stay in touch with reality, make sure you stay grounded

    2. Build informational self-efficacy. Strengthening your information base empowers you. You will be less vulnerable to random speculation you get exposed to, you will be better equipped to integrate new pieces of evidence, you will be better prepared to defend your groundedness against denial and panic. 

    3. Information is the fuel your ability to reason depends on. Read that sentence again.

  3. You can make progress. 

    1. You can make progress. One piece of information builds on the other. When you just started the jigsaw puzzle, it feels hopeless trying to guess what picture it will show at the end. But if you keep going, at some point, contours start to show, shapes start to appear, patterns start emerging. Be a builder, and have the patience and trust to start building, one puzzle piece at a time. 

    2. You can make progress, often faster than you think. It is astonishing how much headway you can make in just a day of reading and sense-making. 

  4. There is a skill to dealing with uncertainty and risk. I have only just started really getting some gears on what it means to appropriately think about and deal with uncertainty and risk, and to build tacit knowledge for how to do so. Part of this, yet worth mentioning explicitly, is the skill of dealing with uncertainty and risk while staying grounded. While staying in touch with Reality. 

    1. Focus your uncertainty. I’ve observed that people often like to indulge in what I like to call ‘arbitrary amounts of precision’. This comes up particularly often in relation to forecasting. People seem to love to tweak parts of their model that, relative to other sources of uncertainty, amount to pure ‘intellectual manicure’. Always ask yourself: what are the biggest sources of my uncertainty, and where/how can I reduce that uncertainty most?

  5. Letting your mind go wherever the evidence leads you, too, is a skill and can be trained. You might want to remember drawing ’intellectual lines of retreat’. Some truths are hard to swallow for the unprepared mind. This might be ‘a failure of flesh’ - but it’s the flesh we all share. Hence, not much to bemoan, though much to learn about how to work with our human irrationalities. 

  6. Finally, build strong experiential snapshots/phenomenological memories of what it means to be grounded and of what it means to be intellectually agentic (and other ‘virtues’), and install TAPs to remember to embody these virtues while engaging with the messy reality. To me, those function as anchors and signposts. 

Overall, there is an important cognitive shift happening if you start ‘taking the world seriously’ as well as ‘taking yourself seriously'. I like to call it intellectual agency. Note that progress on this dimension is continuous, not discrete.  


Part 3. Other people as means to sense-making

>> Talk to people while keeping in mind that talking to people has different purposes. 

  1. Some people you talk to to get data.

    1. I often use people as a source of data. I think this is often ludicrously cheap compared to digging up all of that information yourself. That said, don’t forget to build informational self-sufficiency. (Which, however, I claim has more to do with how you process the data you receive than with where you get it from.) In particular, there are meaningful things you can ~only get by talking to people: their models. People don’t ever give you raw data. The data they give you has already gone through their sense-making apparatus and is now embedded in a web made of assumptions and prior beliefs. Remember, data from other people can be both a gold mine and a major source of epistemic blunder. 

  2. Some people you talk to to get perspectives. 

    1. People will have different views on many aspects of the issue: what are the risks, what are the crucial considerations, what does an ‘adequate response’ look like, should we be more worried or less, …? Some people, you will learn, have more groundedness than others; some people might be scope-insensitive or rely too heavily on pragmatism. Other people are epistemic hypochondriacs or secretly seek a video-game-like thrill in their lives, a sense of being important or having important things to say. Expose yourself to different such perspectives, all the while remembering that most likely none of these is the exactly-true perspective to be held. Especially early on in your own sense-making process, before you have started to build your own, substantiated working model of the situation, it’s in fact likely that, at some point in the future, you will talk to a different person, with a different perspective, and that their perspective, too, will sound pretty convincing to you (at least on the face of it). 

    2. Ideally, develop strong internal models of people whose judgement you respect and who represent prime examples of certain virtues (e.g. scepticism, groundedness, alertness, inside-view reasoning, wisdom, perspective-taking…) so you can consult shoulder-versions of those people when in need.

  3. Some people will be your co-puzzlers/co-sense-makers.

    1. Especially early on, especially while still building the skills of intellectual agency, taking the world seriously, groundedness, etc - get a good and reasonable person to be your figuring-things-out buddy. Most things just are much better when you do them with the right people. 


Part 4. Communicating beliefs effectively

[None of this post aims to be exhaustive, but for this part in particular, I feel some urge to reiterate this fact.]

Sense-making is important. But if you want to be a good epistemic citizen, you will also have to learn how to communicate your current understanding effectively to other people. This matters both for the epistemic process itself (e.g. to compare your own model to others, to disagree constructively, etc.) as well as for coordinating with other people to take collective action.

>> Start to build models about how to communicate your model/beliefs effectively (e.g. assessment regarding risk levels and adequate responses) to different people. 

  1. While communicating with other people about your current beliefs, always remember to stay grounded and truthful. I believe it is often correct to start by trying to understand where the other person is at, in order to tailor one’s communication strategy to that. However, I think people often end up doing this badly. If you are overly worried about tailoring your message to your audience, you risk skewing its content so much that it becomes ungrounded, which, in the worst case, may result in epistemically unsound information-cascade-like dynamics. 

  2. When, for example, your goal is to convince people you care about (say, your parents) that covid-19 is a bigger risk (especially but not only given its tails) than they realize - be careful. This is tricky territory. 

    1. Try to understand first how their self-alignment technology is shaped, and why it is there. Don’t just tear it down.
      In most cases, people will just bounce (i.e. homoeostasis). And that’s a good outcome here. In other cases, they won’t bounce but will incorporate the information, though in ways that might be really, really bad for them (i.e. chaos). Minds are delicate!

    2. Then, try to understand which words (i.e. pieces of data, presented in what way) will get through to them (without causing harm, see above). For example, the person in question might be concerned most of all with their personal health and risk of fatality. Then give them the data and also help them interpret what those numbers mean for them. Or they might not be concerned about their personal risks, but they might really care about being a good, responsible citizen. In this case, you might want to tell them the story about how it is important to take the issue seriously in order to prevent things from getting really bad, and many more people dying than would have needed to. While doing that, make sure to always keep true to the facts.

  3. Communicate your arguments in order of their importance (let’s call it ‘load-bearing-ness’). I’m under the impression that, in discussions, we sometimes tend to give reasons for our beliefs that aren't our true reasons for believing a thing, and often they aren’t the ‘best’ (as in, most convincing) reasons either. This seems to be increasingly true as the discussion becomes more engaged/heated. I don’t understand why this is - it doesn’t seem to make much sense, really. When I listen to debates/discussions, the times I feel impressed by a speaker's reasoning abilities are when the reasons they lay out are the most important reasons for taking X seriously, without getting distracted by their opponents' rhetorical games or strawman representations of the argument.
    It seems plausible to me that what is in fact happening here is that some people have cleaner, more well-structured belief-webs than others. It takes cognitive work to identify which of your arguments are load-bearing. In comparison, making long lists of reasons in favour of X, irrespective of whether these reasons are important or not, is easy. Putting forth your most load-bearing reasons is thus a sign of clarity of thought and reasoning, which in turn makes it less surprising that, by default, most people suck at putting forth the most important reasons for their beliefs. 

Oxford, Jan 2020


Grounding my Moral Compass

I am privileged enough to have travelled to quite a lot of places in the world - first with my parents, then through school exchanges and later on my own. Most of them are part of the developing world, the ‘global south’ (mostly in Africa and Asia). Travelling (and I mean the local-means-of-transport-type travelling) has always felt very 'real' to me; it puts me into a very particular state of mind - the words I'd use to describe it are authentic, pure, grounded, immersed, flow.

Interacting with the locals; spending hours with them - among them - in the same cramped, old bus; visiting their schools, their markets, their farms and fields... it creates in me a sense similar to the one I got from your message.

There is something straightforward about seeing this - unfavourable life circumstances, as well as our shared, naked human-ness in all of this. It feels straightforward because 'my moral compass knows exactly what to care about'. I can't help but be filled with compassion for the world.

I've learnt to take this compassion further - extrapolating it to deeply caring about the long-term future. I think a lot of the 'why' and 'how' of my caring for the long-term future is rooted in these very experiences.

I believe there is something good about regularly reminding myself of these experiences. Something healthy about the grounding it gives me. Although it comes with certain costs.

One is the risk of bias - for evolution hasn't shaped us to be effective altruists, nor to seek truth. This is deeply worrisome to me. 

The other cost is the Weltschmerz that comes with it. Parts of me have a strong narrative around wanting to learn to sit with this (and other) pain. For its sobering effects. For you can only navigate the territory if you have a damn good map. And you can only get a damn good map if you don't close your eyes to Reality. Even if Reality is painful at times.

Nonetheless, it sometimes feels like an incredibly difficult stretch: How can I know I do the right thing? There are so many ways we can be misled. Emotions (left on their own) can do it. But so can the warm comfort of intellectual inquiry (if sought after purely for the sake of intellectual inquiry, or even for reasons of status). We need to guard ourselves against these pitfalls. 

But that's only a lesser part of the difficulty.

There also is the deep-rooted epistemic uncertainty about cause and effect, about the ultimate gears governing the universe. And my/our role in its unfolding.

All of this leaves me confused and torn. 

How, the hell, are we supposed to know? 

And, damn, is it important to know! For we live in a world where it is the consequences that count. Where your good intentions alone won't buy you anything.

Grapes and Sulfur - Urumqi, Summer 2015


A short account on Noticing

[Early this summer, I stumbled upon the opportunity to do some ‘serious rationality exploration’ in the scope of a CFAR instructor training. I have a pile of notes, though most of them are unpolished and require a fair bit of work to be made legible to anyone-that-isn't-me. I hope to get around to tidying up most of these notes at some point soon.

The following is part 1/2 of my notes on noticing.]

I believe that noticing is a key meta-skill for rationalists.

It’s a key faculty if you want to master the 5-second-level of rationality - the place where you either win (at this entire rationality enterprise), or you don’t.

I surely am not the first person to remark on this; nor will I make the case as elegantly as others have before me. There exists a range of great resources on noticing, most saliently a series of articles from agentyduck (all of which I highly recommend!). A quote: 

“It doesn't matter how many techniques you've learned if you miss all the opportunities to apply them, and fail to notice the obstacles when they get in your way. Opportunities and obstacles are everywhere.” 

My personal observation is that the serious exploration of and training in noticing I have recently engaged in has yielded a lot of benefits. In particular, it feels like it set me up to become ‘a better rationalist’ (these are strong apostrophes). 

Some examples of how noticing has been immensely valuable to me:

  • see more of ‘what is going on’, e.g. seeing where I’m ‘going astray’ in my thinking/acting; identifying recurrent thinking or behavioural patterns, internally and interpersonally; noticing shoulds, flinching, rationalization, bugs; possibly build a taste for blind spots (?);

  • build better theories and hypotheses about what is happening, when and why. Subsequently, this allows me to engage in better experiments about the utility of different interventions (techniques) and to improve/tailor/refine the interventions as well as the underlying theory more effectively;

  • create affordance to deal with mental states (e.g. confusion, overwhelm, anxiety, resistance, impetus, impatience, excitement, ...) and eventually increase agency

***

This is why, over the recent weeks and months, I have spent a fair bit of time and mental energy at trying to improve my noticing skills. An account on how I went about this will follow in an upcoming post. 

What is noticing?

Noticing propagates normally ‘unreported’ experiences into your conscious mind. 

In other words, noticing is about moving an experience from your peripheral awareness into your conscious attention. 

Here is a ‘working model’ that I like to use:  

  1. Everything we ever pay attention to first appears in our peripheral awareness.

  2. In order to ‘treat’ something, we have to become conscious/aware of it first.
    If we want to be conscious of what-is-going-on-when-it-is-going-on (as opposed to some significant time later), we have to get good at moving the object from awareness to attention.

  3. Doing that requires us to ‘notice’ the thing. Normally, we propagate into our attention only things that are threatening, surprising or otherwise salient to us.

  4. However, we can train our minds to notice more, and/or more of-a-specific-‘thing’.

  5. There are roughly two (related) approaches that I am aware of: 

A) Increasing your general capacity (bandwidth) of conscious awareness. 

B) Priming your brain to notice a specific thing, i.e. making it ‘artificially’ more salient to your brain.


A) is sometimes referred to as increased meta-cognition. Some forms of meditation train this skill.

Related to this is what I currently call ‘granular noticing’. The specifier ‘granular’ ought to point at the capacity to notice what my mind does, instance by instance, in a very fine-grained manner.

“The more tiny and subtle are the thoughts you're aware of, the more precise can be the control you gain over the workings of your mind, and the faster can be your cognitive reflexes.” (agentyduck)


B) is how I have heard the term noticing being used most often in the rationality community. Many of my Trigger-Action-Plans (TAPs) are basically just this. 

These ‘mental TAPs’ have proven really useful to me. In ‘Why mere noticing solves so much’, agentyduck writes: “If you recognize something as a mistake, part of you probably has at least some idea of what to do instead. Indeed, anything besides ignoring the mistake is often a good thing to do instead. So merely noticing when you're going wrong can be over half the battle.”

I agree with this, and also want to add an aspect that feels important to me. 

Being able to notice an internal experience as it happens gives us the chance to ‘reconsider’ the default choice. (Basically, we manage to spot a trigger-action pattern - a process that is usually happening fully on autopilot. For more, see here.) 

If I am sufficiently self-aligned, meaning if all of me agrees about the ‘correct’ or ‘desired’ (re)action in this situation, the act of noticing alone can build the necessary affordance to change the course of action. In other words, whenever I manage to pick up on a mental move (for example: flinching, rationalizing, getting triggered, defensiveness, ...) I have significantly higher chances than otherwise of following my true volition.

Using a metaphor to illustrate this, affordance feels to me like standing on the top of a hill from where any direction I could take is downhill. Meaning, I don’t need to exert energy or willpower to pick a specific course of actions. I can do whatever seems most appropriate without having to ‘overcome an inner demon’. Things can just roll. 


A framework I like 

“Do you know what you are doing? And why you are doing it?”

At EA Global San Francisco, Duncan gave a class that I liked a lot on ’How to invent your own rationality’. I often use the proposed framework when thinking about what (and when) I want to notice (something). 

The following is my attempt at rephrasing his idea - what I came to call the “Where am I?” framework, or “WAI”-Framework for short (^^): 


There is You, there is a Goal; and there is the path along which you move from where you are now to where you want to be at t+1.

However, there is also a path that leads you away from your goal. More often than not, we somehow end up on this second path. 

Can we learn to more reliably stay on the ‘good’ path? 

(… ever heard of applied rationality?!)

So, this is where noticing comes in.

You can notice:

  1. What happened way before you derailed.

    • Example: Your work is piling up and a deadline is approaching fast. You had an unpleasant and still unresolved disagreement with one of your co-workers that you keep pondering. On your way home from work, you realize how mentally and emotionally exhausted you are.
      Maybe that’s a good time to put up some yellow flags. Maybe you want to let your partner know what is occupying you, so that you can avoid getting caught up in one of those unnecessary fights over dinner that usually stem from a mismatch in expectations.   

    • This can help you to formulate theories/hypotheses about contextual factors that systematically hamper/support you in achieving your goals.

  2. The very moment of (or just before?) derailing.

    • Example: The moment when you reach for the third cookie and in your mind a belief forms that "ah, f.. it! I've eaten too many of them already anyway."

    • This awareness alone can often be enough to build the affordance necessary to activate your wisdom and self-efficacy and ‘stay on track’. 

  3. Shortly after you derailed.

    • Example: Noticing that you are being defensive (e.g. you raised your voice, you are trying to defend something, you feel upset, frustrated and angry with the other person, ...).

    • This awareness alone can often be enough to let go of the undesired behaviour and 'get back on track'.

  4. When you ended up at the wrong destination

    • Example: the cookie jar is empty.

    • This is late, yes, but rather late than never. You can still learn from your experience and reflect back on what happened before and along the way. And maybe you want to train yourself to, next time, notice earlier when things were starting to go off-track.

***

June 2019 - a little slanted…
