Dubey, Rachit, Tom Griffiths, and Peter Dayan. 2021. “Why It Is Hard to Be Happy with What We Have: A Reinforcement Learning Perspective.” PsyArXiv. December 31. doi:10.31234/osf.io/8jd2x
Abstract: The pursuit of happiness is not easy. Habituation to positive changes in lifestyle and constant comparisons leave us unhappy even in the best of conditions. Given their disruptive impact, it remains a puzzle why habituation and comparisons have come to be a part of cognition in the first place. Here, we present computational evidence that suggests that these features might play an important role in promoting adaptive behavior. Using the framework of reinforcement learning, we explore the benefit of employing a reward function that, in addition to the reward provided by the underlying task, also depends on prior expectations and relative comparisons. We find that while agents equipped with this reward function are less "happy", they learn faster and significantly outperform standard reward-based agents in a wide range of environments. The fact that these features provide considerable adaptive benefits might explain why we have the propensity to keep wanting more, even if it contributes to depression, materialism, and overconsumption.
Discussion
Sensibly or not, people often find it hard to remain happy with what they have. One enjoys a newly bought car for a time, but over time it brings fewer positive feelings and one eventually begins dreaming of the next rewarding thing to pursue. As a consequence, we keep getting lured by the promise of unfathomable future happiness whilst hardly enjoying the riches of the present. Here, we have presented a series of simulations that suggest that these seemingly maladaptive “flaws” might perhaps play an important role in promoting adaptive behavior. Using the idea of reward design, we explored the value of adaptive expectations and relative comparisons as a useful reward signal and found that across a wide range of environments, these features help an agent learn faster and be more robust to changes in the environment. Thus, even though comparisons to the past and future often induce unhappiness, they might still motivate one to strive to escape the unpleasant or (even worse) mundane present.
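The idea of a reward signal that combines task reward with prior expectations and relative comparisons can be illustrated with a minimal sketch. This excerpt does not give the exact functional form used in the simulations, so everything below is an assumption made for illustration: the weighted-sum form, the weight names (`w_task`, `w_expect`, `w_compare`), the fixed `aspiration` level, and the running-average `expectation` that models habituation are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SubjectiveReward:
    """Illustrative subjective reward: task reward plus expectation and comparison terms.

    All parameter names and default values are assumptions for this sketch,
    not values taken from the paper.
    """
    w_task: float = 1.0            # weight on the objective task reward
    w_expect: float = 0.5          # weight on (reward - prior expectation)
    w_compare: float = 0.5         # weight on (reward - aspiration level)
    aspiration: float = 1.0        # fixed aspiration level used for comparison
    habituation_rate: float = 0.1  # how quickly the expectation adapts
    expectation: float = 0.0       # running expectation of reward

    def __call__(self, task_reward: float) -> float:
        # Expectation term: positive when the outcome beats what was expected.
        expect_term = task_reward - self.expectation
        # Comparison term: positive when the outcome exceeds the aspiration level.
        compare_term = task_reward - self.aspiration
        subjective = (
            self.w_task * task_reward
            + self.w_expect * expect_term
            + self.w_compare * compare_term
        )
        # Habituation: the expectation drifts toward recently received rewards.
        self.expectation += self.habituation_rate * (task_reward - self.expectation)
        return subjective


reward_fn = SubjectiveReward()
first = reward_fn(1.0)   # same objective reward received twice in a row
second = reward_fn(1.0)  # the second receipt is valued less because of habituation
```

Under these assumptions, a repeated identical outcome yields a shrinking subjective reward, which mirrors the car example above: the objective reward is unchanged, but the expectation has caught up with it.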
While relative comparisons were generally advantageous, we also found that they can be quite harmful in certain settings. For instance, in an environment with many similar options, comparisons resulted in constant dissatisfaction without any improvement in performance (Exp 2a). Thus, one lesson that can be taken from our results is that when presented with many similar choices, a decision-maker is better off curtailing comparisons and making decisions without relying on them. This also accords with the view that given the explosion of choices in modern times, learning to accept good enough will increase satisfaction and simplify decision-making35, 67. However, this leaves an open question about how a decision-maker can come to manage and curtail comparisons in the first place. Future research should investigate possible mechanisms via which an agent can set and learn its own aspiration level, which can then provide insights on how to design interventions to reduce comparisons. A promising direction in this vein could be studying how aspiration levels might be shaped via a functional relationship between a model-free and model-based system68, 69. For instance, a model-based system might alter the aspiration level of the model-free system based on fluctuations in the environment. This in turn could also be helpful to understand what leads someone to develop unreasonably high aspirations70–72.
We observed that an agent’s internal happiness was not necessarily reflective of its performance in the environment, and both being ‘too happy’ and ‘too unhappy’ led to unwanted outcomes. Agents with unreasonably high aspiration levels developed sub-optimal behavior and were also very ‘unhappy’ in their lifetimes (due to unmet aspirations). Similarly, agents that had a very low aspiration level also performed poorly as they were prone to getting stuck at a local minimum. However, these agents, despite accumulating very low objective rewards, were ‘very happy’ in their lifetimes. Together, these findings provide computational support to a growing body of research which documents the “dark side” of being too happy and are consistent with early philosophical ideas that extreme levels of any emotion, including happiness, can be undesirable73–75. Further, our finding that ‘moderately unhappy’ agents obtain the highest objective rewards can be loosely compared to the finding that people who experience slightly lower levels of happiness are more successful in terms of income and education level compared to people with the highest levels of happiness76.
Our results also speak to a literature in economics that explores the types of evolutionary pressures that could have produced habituation and relative consumption77–80. Similar to our work, these studies model happiness using the metaphorical principal-agent framework, where the principal (evolution) wishes the agent to be maximally fit and has the ability to choose the utility function of the agent to her best advantage. One such study shows that when an agent has limited ability to make fine distinctions (i.e., it cannot tell apart two values that are within a small distance from each other) and when it has a limited range of utility levels (i.e., it has a bound on the minimum and maximum level of happiness it can experience), then evolution would favor a utility function that is adaptive and depends on relative comparisons78. In our view, the primary contribution of these studies is showing how cognitive limitations could have favored a happiness function that depends on prior expectations and relative comparisons, and our work complements these studies by suggesting that, regardless of the agent’s constraints, this function could have also been favored because of the learning advantages it confers.
Closely related to our research is recent work that posits a role for mood in learning81–84. In these proposals, mood is formalized as the moving average of reward prediction errors (and more recently, an estimate of the Advantage function84), and is considered to represent environmental momentum. Momentum indicates whether an environment is improving or worsening and can be an important variable for adaptive behavior. Our results augment these studies by showing how (myopic) reward prediction errors (in the form of prior expectations) are a valuable aid to relative comparisons and accelerate learning in a wide variety of environments. Studying the interaction of mood with prior expectations and relative comparisons is an important question for future work.
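The formalization of mood as a moving average of reward prediction errors can be sketched in a few lines. This is only an illustrative sketch under stated assumptions: the single-state value estimate, the function name `track_mood`, and the step sizes `value_lr` and `mood_lr` are hypothetical choices, not details taken from the cited proposals.

```python
def track_mood(rewards, value_lr=0.2, mood_lr=0.1):
    """Return the mood trajectory for a sequence of rewards.

    Mood is modeled as an exponential moving average of reward prediction
    errors (RPEs). The single scalar value estimate and both learning rates
    are assumptions made for this sketch.
    """
    value = 0.0  # running estimate of expected reward
    mood = 0.0   # moving average of recent RPEs
    moods = []
    for reward in rewards:
        rpe = reward - value           # reward prediction error
        value += value_lr * rpe        # update the reward expectation
        mood += mood_lr * (rpe - mood) # mood tracks recent RPEs
        moods.append(mood)
    return moods


improving = [0.1 * t for t in range(50)]   # rewards steadily increase
worsening = [-0.1 * t for t in range(50)]  # rewards steadily decrease
stable = [1.0] * 200                       # rewards stay constant

mood_improving = track_mood(improving)[-1]
mood_worsening = track_mood(worsening)[-1]
mood_stable = track_mood(stable)[-1]
```

In this sketch, a steadily improving environment keeps RPEs positive and so sustains a positive mood, a worsening one sustains a negative mood, and a stable environment lets mood decay toward zero once expectations catch up. This is one way to read mood as a signal of environmental momentum.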
One observation in the context of mood is that the sorts of adaptive relativities for learning that we have discussed can lead to instabilities in evaluation, modeling aspects of dynamic diseases such as bipolar disorder81. Certainly, the subjective values of states that are taught by the subjective reward functions can vary greatly from their objective values, which is problematic if, for instance, the parameters of the subjective reward function change over time. More generally, it would be worth exploring whether dysfunctions such as anhedonic depression85–87 partly arise because of problems with subjective rather than objective components of reward sensitivity. The same issues might be more broadly relevant, given the chain of reasoning that leads from disturbed average rates of reward88 to altered motivation in depression89, potentially negative symptoms in schizophrenia90, and indeed transdiagnostically across a number of psychiatric and neurological conditions91. Nevertheless, it would be remiss not to point out the careful distinctions made between hedonic and motivational aspects of rewards, as between ‘liking’ and ‘wanting’52, 53, that we have blurred.
Our work has several limitations which should be addressed in order to draw more concrete parallels between our simulation-based results and psychological research on happiness. For one, we assumed that the agent designer directly provided the reward function to the agent and the agent had no say in what reward function it received. This simplification meant that we were not able to study how an agent might develop biased expectations or aspirations, nor the consequences of an agent being able to control its own happiness. A productive avenue for future research could be studying reward design using the meta-learning framework, such that an agent learns to choose the parameters of its happiness function in response to the environment it faces92, 93. Relatedly, we also did not investigate in detail the potential interaction of discounting with prior expectations and relative comparisons (since we kept a fixed value for the discount factor in our experiments). Studying this further would be an important question for the future. Another limitation of our work is that we did not consider how aspirations can be influenced by social comparisons. Future research could address this by conducting multi-agent simulations wherein agents also compare themselves to other agents in the environment. This could also help understand how relative comparisons might interact with other components of happiness such as guilt and jealousy. Future work should also consider how the components of happiness we have considered here might interact with other affective states such as anxiety94 and boredom95. Lastly, while our choice of environments was driven in part by their popularity within the RL community, it is not completely clear how much our results will generalize to more real-world situations, and therefore caution must be exercised when generalizing our simulation results.
We conclude by providing some perspective on the problem of overconsumption, an extremely pressing issue that severely threatens future generations. Constant habituation to modern luxuries and ever-rising aspirations are leading us to consume Earth’s natural resources at an alarming rate and resulting in rapid deterioration of our planet96–100. Paradoxically, people in modern societies are hardly more satisfied than previous generations101–104, yet we remain caught in the rat race of consumption and continue the modern obsession with growth at all costs105–109. One implication of our results is that, given how advantageous habituation and relative comparisons are in promoting adaptive behavior, these features might be very deeply entrenched in our minds. Thus, any steps to reduce overconsumption will also need serious consideration of how to tackle these biases of the human mind and will require the expertise of scientists from multiple disciplines. For better or worse, we are prone to becoming trapped in a cycle of never-ending wants and desires, and it is more urgent than ever to develop concrete policies and large-scale interventions to reduce habituation and comparisons.