Sunday, March 26, 2023

Lifespans of the European Elite, 800–1800: Marked increases around 1400 and again around 1650

Lifespans of the European Elite, 800–1800. Neil Cummins. The Journal of Economic History, Volume 77, Issue 2, June 2017, pp. 406-439. https://doi.org/10.1017/S0022050717000468

Abstract: I analyze the adult age at death of 115,650 European nobles from 800 to 1800. Longevity began increasing long before 1800 and the Industrial Revolution, with marked increases around 1400 and again around 1650. Declines in violent deaths from battle contributed to some of this increase, but the majority must reflect other changes in individual behavior. There are historic spatial contours to European elite mortality; Northwest Europe achieved greater adult lifespans than the rest of Europe even by 1000 AD.

DISCUSSION

This study has characterized adult noble lifespans from 800 to 1800. The consistent and large association uncovered between sex and plague mortality for nobles runs counter to the indiscriminate reputation of the Black Death and counter to recent paleodemographic analysis on skeletons from fourteenth century London (DeWitte 2009). If plague killed more women than men, a simple supply-side effect increasing female agency in the marriage market could explain the origin of the European Marriage Pattern (Hajnal 1965; De Moor and Van Zanden 2010; Voigtländer and Voth 2013). Of course, this is premature speculation; the patterns reported here would have to be convincingly established for the population at large.

The sharp decline in the proportion of male nobles dying from battle, from over 600 years of a steady 30 percent, to less than 5 percent in the sixteenth century, predates the arrival of the Industrial Revolution by two centuries. The long-run decline in violence is cited as one of the principal correlates of the emergence of the modern world, with the “civilizing process” needing the transformation of warrior nobles into gentleman courtiers (Elias 1982).

One can perhaps ask why battlefield violence declined among European nobility. Nobility certainly did not lose its taste for military life. The Wars of Religion following 1500 were aristocratic feuds at least as much as earlier wars. However, the decline in battlefield death amongst nobles corresponds to the emergence of modern warfare: artillery, standing armies, and the replacement of privilege with merit. The power of hereditary warrior status declined in battle as modern and larger standing armies, led by increasingly wealthy princes, focused upon artillery and infantry (Keen 1984, pp. 1, 238–53). The decline of cavalry meant that nobility became officers, inherently a more administrative role than before (Keen 1984, p. 240). In war, nobility still led, but from the safety of the rear guard, not the front lines.

I estimate the time-trend of adult noble lifespan over the millennium between 800 and 1800. The findings on the timing of the modern rise in age at death agree almost exactly with de la Croix and Licandro (2012) (the birth cohort of 1640–1649). The nobility are, in general, forerunners of Europe's mortality transition, as David, Johansson, and Pozzi (2010, figs. 3(a) and 3(b), p. 28) argue too. This may provide a clue for those who seek to explain why mortality declined. There could be an important role for individual behavior and a demonstration effect (e.g., hygiene and other behavioral traits) as this rise predates modern medicine or any public health measures. It also predates the Industrial Revolution. Whilst modern evidence suggests that life expectancy does not matter for economic growth (Acemoglu and Johnson 2007), the case has not been proven for the preindustrial era.

Unlike de la Croix and Licandro (2012), this study argues that lifespan was not a stationary trend before 1650. There are significant oscillations, most importantly the sharp Europe-wide rise in noble lifespan after 1400. The rise is stronger over the 1400–1600 interval in Ireland, Scotland, and in particular, England and Wales (Figures 11 and 12). This pattern has remained hidden. Only long and deep time series of at least a millennium in length could uncover it. For England, this result can be directly compared with existing estimates of adult mortality. The dramatic rise from the fourteenth to the fifteenth and sixteenth centuries revealed in Figure 12(a) is in broad accordance with Russell's estimates of life expectancy at age 25 (e25) for tenants-in-chief of the crown from the Inquisitions Post Mortem (Smith 2012, Figure 10, p. 79). However, recent re-estimates of e25 for these same data (Poos, Oeppen, and Smith 2012) suggest a much higher level and a flat trend, at about 30 years, during the fourteenth century. Monastic evidence from communities in Durham, Canterbury, and Westminster points to a decline in e25 from 1450 to 1500 (Poos, Oeppen, and Smith 2012, Figure 8.2, p. 162). This is not the pattern I find. Figure 12(a) reports the opposite trend for the English elite: a sharply rising trend in predicted average age at death, for those dying over 20, from 1450–1500. The evidence I have assembled and analyzed in this article strongly suggests a marked improvement in lifespan in the fifteenth century for the English elite.

No conclusions can be drawn as to why adult noble lifespan increased so much after 1400. No known medical innovations in Europe before 1500 could be responsible. Nutrition, in terms of calories consumed, also cannot explain this rise. These elites could be expected to have always filled their bellies. For this reason, for those who argue that the “modern rise of population” was a result of nutrition, the equality of aristocratic and peasant lifespans in the past has presented a paradox (see Fogel (1986, pp. 480–84) and McKeown (1976, pp. 139–42)). Robert Fogel attributed this “peerage paradox” to the vast quantities of alcohol the English elite consumed (1986, p. 483). Perhaps diet changed in other ways. The late fourteenth century did witness an increase in the proportion of manuscripts on health. Works such as the Tacuinum Sanitatis, incorporating Arabic and Ancient knowledge, recommended moderation in food and alcohol, adequate rest, and exercise and, similar to modern medicine, emphasized the importance of vegetables and fruit to human health (Janick, Daunay, and Paris 2010). Of course, the actual effect of these manuscripts is speculation at this point.

The rise in elite adult age at death for those born after 1400 could also be the result of a Darwinian selection effect from the half century of recurring plague that returned in 1347. Plague killed those susceptible to plague but would also have purged the population of other frailties that may have been correlated with plague susceptibility. However, most people, even during the plague era, died from other causes. The real long-term demographic effect of the Black Death could have been through its effect on the disease climate. Noble lifespan in Figure 8 corresponds closely to the trend in real wages in England (Clark 2005, fig. 4, p. 1311) and to recent estimates of gross domestic product (GDP) per capita (Broadberry et al. 2015, p. 206). Improved nutrition amongst the general population, from higher real incomes via Malthusian dynamics, could have led to a reduction in the incidence of other infectious diseases among plague survivors and their offspring. Nutritional status did little to diminish plague lethality (see Fogel 1986, table 9.11, p. 481) but together with a “purging” effect, the Black Death could have led to an improved climate against infectious disease, especially in cities.

The cause of the 1400 rise in adult noble lifespan is unknown. Presently only speculations can be made. Future empirical work, perhaps linking estate account books (to reconstruct diet) to specific time and location (rural/urban) effects and genealogies of the kind analyzed here, will have great potential to answer this mystery.

This article documents a geographic pattern to European elite lifespans. The mortality gradient runs South-North and East-West, and has existed since before the Black Death. The long existence of such a geographic “effect,” and the factors which are causing it, may have implications for recent work which stresses the “little divergence” between Northwest Europe and the Southeast (Voigtländer and Voth 2013; Broadberry 2013; de Pleijt and van Zanden 2013). The Black Death is not the first turning point. There was something about Northwest Europe long before 1346 that led to nobles living longer lives.


People indulge in the belief that social justice is on a uniform path to betterment

Beliefs About Linear Social Progress. Julia D. Hur, Rachel L. Ruttan. Personality and Social Psychology Bulletin, March 23, 2023. https://doi.org/10.1177/01461672231158843

Abstract: Society changes, but the degree to which it has changed can be difficult to evaluate. We propose that people possess beliefs that society has made, and will make, progress in a linear fashion toward social justice. Five sets of studies (13 studies in total) demonstrate that American participants consistently estimated that over time, society has made positive, linear progress toward social issues, such as gender equality, racial diversity, and environmental protection. These estimates were often not aligned with reality, where much progress has been made in a nonlinear fashion. We also ruled out some potential alternative explanations (Study 3) and explored the potential correlates of linear progress beliefs (Study 4). We further showed that these beliefs reduced the perceived urgency and effort needed to make further progress on social issues (Study 5), which may ultimately inhibit people’s willingness to act.

Saturday, March 25, 2023

Replication failure... No effects of exposure to women's fertile window body scents on men's hormonal and psychological responses

No effects of exposure to women's fertile window body scents on men's hormonal and psychological responses. James R. Roney et al. Evolution and Human Behavior, March 22 2023. https://doi.org/10.1016/j.evolhumbehav.2023.03.003

Abstract: Do men respond to women's peri-ovulatory body odors in functional ways? Prior studies reported more positive changes in men's testosterone and cortisol after exposure to women's scents collected within the putative fertile window (i.e., cycle days when conception is possible) compared to comparison odors, and also psychological priming effects that were differentially larger in response to the fertile window odors. We tested replication of these patterns in a study with precise estimation of women's ovulatory timing. Both axillary and genital scent samples were collected from undergraduate women on six nights spaced five days apart. Here, we tested men's responses to a subset of these samples that were chosen strategically to represent three cycle regions from each of 28 women with confirmed ovulation: the follicular phase prior to the start of the fertile window, the fertile window, and the luteal phase. A final sample of 182 men were randomly assigned to each smell one scent sample or plain water. Saliva samples were collected before and after smelling to assess changes in testosterone and cortisol, and psychological measures of both sexual priming and social approach motivation were assessed after stimulus exposure. Planned comparisons of fertile window to other stimuli revealed no statistically significant effects for any dependent variable, in spite of sufficient power to detect effect sizes reported in prior studies. Our findings thus failed to replicate prior publications that showed potentially adaptive responses to women's ovulatory odors. Discussion addresses the implications of these findings for the broader question of concealed ovulation in humans.

Keywords: Scent attractiveness; Concealed ovulation; Testosterone; Cortisol; Human mating

4. Discussion

As a general summary, we found no compelling evidence that men exhibit differential hormonal or psychological responses to women's body odors collected near ovulation relative to their responses to body odors from other cycle regions (or to plain water). The nonsignificant findings occurred despite our study having sufficient power to detect effect sizes that have been reported in the prior literature. Furthermore, Bayes factors computed for each of our dependent variables suggested that the observed data were about 10 to 13 times more likely under null models than under models including the fertile window contrast, and Bayes factors in this range have been argued to provide strong evidence in favor of the null hypothesis (see Schonbrodt & Wagenmakers, 2018).
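To make the Bayes factor range in the paragraph above concrete, the following minimal sketch converts a Bayes factor in favor of the null model (BF01) into a posterior probability for the null. It assumes equal prior odds for the null and alternative models, which is an illustrative assumption and not something stated in the study; the function name `posterior_null` and its `prior_null` parameter are hypothetical.

```python
def posterior_null(bf01: float, prior_null: float = 0.5) -> float:
    """Posterior probability of the null model given a Bayes factor BF01.

    bf01: P(data | null) / P(data | alternative).
    prior_null: prior probability of the null model; 0.5 (equal odds)
        is an assumption made only for illustration.
    """
    if bf01 <= 0:
        raise ValueError("bf01 must be positive")
    if not 0 < prior_null < 1:
        raise ValueError("prior_null must lie strictly between 0 and 1")
    prior_odds = prior_null / (1.0 - prior_null)
    # Bayes' rule in odds form: posterior odds = Bayes factor * prior odds.
    posterior_odds = bf01 * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    # The range reported in the discussion: roughly 10 to 13.
    for bf in (10, 13):
        print(f"BF01 = {bf}: P(null | data) = {posterior_null(bf):.3f}")
```

Under the equal-prior assumption, Bayes factors of 10 and 13 correspond to posterior probabilities for the null of about 0.91 and 0.93. A reader with a different prior can pass another value for `prior_null`.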

4.1. Hormone responses to scent stimuli

Our results did not replicate prior findings of more positive testosterone or cortisol changes after exposure to scents collected near ovulation relative to comparison scents (Cerda-Molina et al., 2013; Miller & Maner, 2010). Our findings for testosterone were more similar to those of Roney and Simmons (2012), who reported no significant differences in hormone changes after exposure to peri-ovulatory scents vs. after exposure to plain water. Among prior studies, only Cerda-Molina et al. (2013) measured differential cortisol responses to women's peri-ovulatory body odors, and they reported a complex pattern whereby cortisol rose above basal concentrations at 15 min. post-exposure for peri-ovulatory stimuli and at 30 min. post-exposure for luteal vulvar stimuli, but fell below baseline concentrations for luteal axillary stimuli at 15 and 60 min. post-exposure. Our results at 15 min. post-exposure did not replicate those patterns.

What may account for differences between results of the current study and those of prior studies that have reported significant hormone responses to women's peri-ovulatory body scents? Miller and Maner (2010) used whole T-shirts as scent stimuli as opposed to our use of gauze pads; although Cerda-Molina et al. (2013) employed similar collection methods to those in the present study, we cannot rule out the possibility that hormone responses may be more reliable in response to shirt stimuli. A possible limitation of our method was the longer time that we stored frozen samples before use in testing (up to a year, as opposed to samples being used within a week in Miller and Maner (2010) and Cerda-Molina et al. (2013)), although studies that have varied length of storage have provided evidence that responses to human body scents are not affected by long freezing times (Gomes et al., 2020; Lenochova et al., 2009). We estimated ovulatory timing more precisely via use of LH tests than did Miller and Maner (2010), who used highly error-prone counting methods (see Gangestad et al., 2016), and this should have increased our probability of finding true effects. Cerda-Molina et al. (2013) cited two factors that might explain discrepancies between their results and the null effects reported by Roney and Simmons (2012), namely the longer stimulus collection time in their study and evidence that men in their study were aware that they were smelling women, but both of these differences were eliminated in the present study, in which women collected scents overnight and male participants were explicitly told that they were smelling odors from women.

A salient difference between our methods and those of Cerda-Molina et al. (2013) was their use of a nebulizer containing scent stimuli (or plain air) in order to forcefully project odors into participants' nasal passages. It is possible that this method produces hormone responses in perceivers that are absent after taking deep sniffs from jars containing scent stimuli. The ecological validity of the nebulizer delivery method is uncertain. On the one hand, it may deliver stimuli of supra-normal intensity that are not encountered under real-world conditions. On the other hand, it is possible that this method approximates the greater intensity of odor exposure that might occur during some forms of sexual contact. In any case, this difference in scent delivery method presents a possible reason for the discrepancy in findings across the two studies.

It is also possible that prior positive findings for men's hormone responses to women's peri-ovulatory body odors were false positive results. The patterns described in Cerda-Molina et al. (2013) were particularly striking in that men generally responded with testosterone increases after smelling peri-ovulatory stimuli but testosterone decreases after exposure to luteal stimuli. That pattern suggested that men's hormone responses might be strong enough that they could accurately diagnose women's ovulatory timing from scent cues alone. The current findings shed at least some doubt on the robustness of those findings. Future research would ideally provide additional evidence.

4.2. Psychological responses to scent stimuli

Our results also failed to replicate prior findings suggesting the priming of sexual concepts after exposure to peri-ovulatory scent stimuli relative to comparison stimuli. For two dependent variables, we employed measures verbatim from Miller and Maner (2011): the word stem completion task, and a measure of attribution of sexual arousal to the scent donor. A difference in data analysis between studies was the addition of the Chemical Sensitivity Scale (CSS; Nordin, Millqvist, Lowhagen, & Bende, 2003) to the data analyses in Miller and Maner (2011). The scale measures participants' conscious awareness of odors in their environment. For the word stem task, Miller and Maner (2011) added controls for main and interaction effects for scores on this scale in the model testing effects of scent exposure condition. For the sexual arousal attribution task, they reported no main effect of scent exposure condition but a significant interaction between scent condition and CSS scores such that only among men with high smell sensitivity was greater sexual arousal attributed to the peri-ovulatory scents relative to luteal scents. We did not administer this scale, and this difference in method could help to explain discrepant findings for these variables. However, simulation data show that the addition of covariates and testing for interactions with individual difference variables are practices that can inflate type I error rates (Simmons, Nelson, & Simonsohn, 2011), which adds some doubt to the positive findings for the word stem and arousal attribution variables. Furthermore, if sexual priming effects were specialized adaptations for responding to cues of women's ovulatory timing, one would not expect their expression to be restricted only to men with highly sensitive senses of smell. Thus, the overall data pertaining to these variables, including the non-significant findings in the present study, appear to provide weak evidence for adaptations that produce sexual priming effects in response to ovulatory scent cues.

As a more direct measure of sexual priming, we also queried how much sexual desire men felt after exposure to scent cues. There were no significant effects of cycle phase for this variable (see Table 1 and Fig. 4c). Cerda-Molina et al. (2013) administered an “interest in sex” scale and reported higher scores after exposure to peri-ovulatory scent stimuli, but the scale was quite heterogeneous and included trait-like items (e.g., “[how high do] you think that your sexual desire normally is?”) in addition to measures of current states. As with the hormone responses, it is possible that a nebulizer delivery of scents would produce stronger fertile window effects on men's self-reported sexual desire than those found here.

We did find a main effect of stimulus type on sexual desire such that men who smelled the armpit stimuli reported higher desire than those who smelled the pantyliner stimuli. Additional data analyses supported subjective scent attractiveness ratings as mediating this effect of stimulus type. The positive correlation between scent attractiveness ratings and sexual desire supports the possibility that desire responds to odor attractiveness in general even if it does not respond reliably to scents produced during the fertile window. Odor attractiveness may be related to variables like health (e.g., Olsson et al., 2014) or immune compatibility (e.g., Thornhill et al., 2003), and thus responding to it with desire may have functions aside from ovulation detection.

We also administered a custom social approach motivation scale but scores on it were not differentially higher after exposure to scents from the fertile window (see Table 1 and Fig. 4d). Tan and Goldman (2015) used an indirect behavioral measure to provide evidence that men exposed to peri-ovulatory scents were motivated to sit closer to women, and it is possible that our findings would have differed with such a measure. Oren and Shimone-Tsoory (2019) provided evidence that single but not paired men exhibited greater social perception abilities after exposure to peri-ovulatory scents. Although we did not measure social perception, we did assess the possible moderating influence of relationship status for the effects of cycle phase on our dependent measures, in part motivated by the findings of the social perception study. Results presented in SOM provide no compelling evidence that hormonal or psychological responses to fertile window stimuli were consistent with prior positive findings in the subset of single men.

4.3. Implications for concealed ovulatory timing

Our findings argue against the possibility that human ovulatory timing is detectable from body odors. Mei et al. (2022) recently used signal detection analyses to show that increased scent attractiveness during the fertile window was not substantial enough to reliably diagnose ovulatory timing. That finding left open the possibility that diagnostic cues of ovulatory timing might be revealed via adaptive patterns of responses to scents, such as reactive hormone changes. The present results failed to detect any such putatively adaptive responses, however, and thus argue against that possibility.

The present study addressed only odor cues of ovulatory timing. Cues from other sensory modalities could in principle provide more information, or a combination of cues across modalities could prove more diagnostic. With respect to the latter possibility, Miller and Maner (2011) provided evidence that men who interacted in person with a woman confederate were more likely to mimic her movements and to increase their risk-taking when her estimated conception risk was higher. Perhaps in cases like that, a combination of odor, voice, face, and behavioral cues might more accurately cue fertile window timing.

The strongest tests of multi-modal cuing of ovulatory timing should in principle come from studies that measure the responses of women's long-term romantic partners to the women's cycle phases. Such partners should have the most intimate and detailed information regarding changes in any perceptible stimuli, and would also have clear functional reasons to respond to cues of ovulatory timing for the purpose of ensuring paternity confidence. A recent study of nearly 400 couples with preregistered data analyses and many thousands of observations found no significant effects of women's estimated fertile window timing on male partners' ratings of the women's attractiveness, sexual desire for their partners, feelings of jealousy, or levels of attention to and desire to have contact with the women (Schleifenbaum et al., 2022). Those findings corroborate earlier studies that have generally found that men's rates of sexual initiation are flat across phases of their partners' menstrual cycles (Adams, Gold, & Burt, 1978; Caruso et al., 2014; Van Goozen, Wiegant, Endert, Helmond, & VandePoll, 1997; cf. Harvey, 1987). Likewise, and pertinent to the hormonal responses tested in the current study, studies have failed to find significant shifts in men's testosterone concentrations across different phases of their romantic partners' menstrual cycles (Ström, Ingberg, Druvefors, Theodorsson, & Theodorsson, 2012; Ström, Ingberg, Slezak, Theodorsson, & Theodorsson, 2018). Collectively, these patterns are unexpected if women's body odors provide diagnostic information regarding their ovulatory timing, or if multi-modal stimulus cues jointly reveal fertile window timing.

50-70% of all dreams include residue from the previous day, especially in the early stages of sleep, while later stages refer to more distant memories

Memory reactivations during sleep: a neural basis of dream experiences? Claudia Picard-Deland et al. Trends in Cognitive Sciences, March 22 2023. https://doi.org/10.1016/j.tics.2023.02.006


Abstract: Newly encoded memory traces are spontaneously reactivated during sleep. Since their discovery in the 1990s, these memory reactivations have been discussed as a potential neural basis for dream experiences. New results from animal and human research, as well as from the rapidly growing field of sleep and dream engineering, provide essential insights into this question, and reveal both strong parallels and disparities between the two phenomena. We suggest that, although memory reactivations may contribute to subjective experiences across different states of consciousness, they are not likely to be the primary neural basis of dreaming. We identify important limitations in current research paradigms and suggest novel strategies to address this question empirically.


Systematic review of all published fMRI research on psychopathy: No reproducible evidence suggests that psychopathy is associated with a functional neurobiological profile

Jalava, J., Griffiths, S., & Larsen, R. R. (2023). How to keep unreproducible neuroimaging evidence out of court: A case study in fMRI and psychopathy. Psychology, Public Policy, and Law, 29(1), 1–18. Feb 2023. https://doi.org/10.1037/law0000383

Abstract: The amount of neuroimaging evidence introduced in courts continues to increase. Meanwhile, neuroimaging research is in the midst of a reproducibility crisis, as many published findings appear to be false positives. The problem is mostly due to small sample sizes, lack of direct replications, and questionable research practices. There are concerns that a significant proportion of neuroimaging evidence introduced in court may therefore be unreliable. Guidelines governing the admissibility of scientific evidence—Frye and Daubert—are not designed to weed out such data. We propose supplementing Frye and Daubert with minimal reproducibility criteria that allow judges to make informed admissibility decisions about neuroimaging research. To demonstrate how this could work, we subjected functional magnetic resonance imaging (fMRI) findings on psychopathy—evidence that has been admitted in court—to a minimal reproducibility test. A systematic PRISMA search found 64 relevant studies but no sufficiently powered, directly replicated evidence of a psychopathy-related neurobiological profile. This illustrates two things: (a) the probability of false positives in this data set is likely to be unacceptably high and (b) the reproducibility of similar neuroimaging evidence can be evaluated in a straightforward way. Our findings suggest an urgent need to modify admissibility guidelines to exclude low-quality neuroimaging data.

Check also Is the Psychopathic Brain an Artifact of Coding Bias? A Systematic Review. Jarkko Jalava et al. Front. Psychol., April 12 2021. https://www.bipartisanalliance.com/2021/04/is-psychopathic-brain-artifact-of.html

Friday, March 24, 2023

People may not be able to tell if they are envied by another person at a particular moment, but they know who the notoriously envious ones are among the people they have known for a longer time

Lange, Jens, Birk Hagemeyer, Thomas Lösch, and Katrin Rentzsch. 2019. “Accuracy and Bias in the Social Perception of Envy.” OSF Preprints. June 16. doi:10.31219/osf.io/8jc7x

Abstract: Research converges on the notion that when people feel envy, they disguise it towards others. This implies that a person’s envy in a given situation cannot be accurately perceived by peers, as envy lacks a specific display that could be used as a perceptual cue. In contrast to this reasoning, research supports that envy contributes to the regulation of status hierarchies. If envy threatens status positions, people should be highly attentive to identify enviers. The combination of the two led us to expect that (a) state envy is difficult to accurately perceive in unacquainted persons and (b) dispositional enviers can be accurately identified by acquaintances. To investigate these hypotheses, we used actor-partner interdependence models to disentangle accuracy and bias in the perception of state and trait envy. In Study 1, 436 unacquainted dyad members competed against each other and rated their own and the partner’s state envy. Perception bias was significantly positive, yet perception accuracy was non-significant. In Study 2, 502 acquainted dyad members rated their own and the partner’s dispositional benign and malicious envy as well as trait authentic and hubristic pride. Accuracy coefficients were positive for dispositional benign and malicious envy and robust when controlling for trait authentic and hubristic pride. Moreover, accuracy for dispositional benign envy increased with the depth of the relationship. We conclude that enviers might be identifiable but only after extended contact and discuss how this contributes to research on the ambiguous experience of being envied.


Whether intelligence can be achieved without any agency or intrinsic motivation is an important philosophical question; equipping LLMs with agency & intrinsic motivation is a fascinating & important direction for future work

Sparks of Artificial General Intelligence: Early experiments with GPT-4. Sebastien Bubeck et al. Mar 22 2023. https://arxiv.org/pdf/2303.12712.pdf

Abstract: Artificial intelligence (AI) researchers have been developing and refining large language models (LLMs) that exhibit remarkable capabilities across a variety of domains and tasks, challenging our understanding of learning and cognition. The latest model developed by OpenAI, GPT-4 [Ope23], was trained using an unprecedented scale of compute and data. In this paper, we report on our investigation of an early version of GPT-4, when it was still in active development by OpenAI. We contend that (this early version of) GPT4 is part of a new cohort of LLMs (along with ChatGPT and Google’s PaLM for example) that exhibit more general intelligence than previous AI models. We discuss the rising capabilities and implications of these models. We demonstrate that, beyond its mastery of language, GPT-4 can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more, without needing any special prompting. Moreover, in all of these tasks, GPT-4’s performance is strikingly close to human-level performance, and often vastly surpasses prior models such as ChatGPT. Given the breadth and depth of GPT-4’s capabilities, we believe that it could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system. In our exploration of GPT-4, we put special emphasis on discovering its limitations, and we discuss the challenges ahead for advancing towards deeper and more comprehensive versions of AGI, including the possible need for pursuing a new paradigm that moves beyond next-word prediction. We conclude with reflections on societal influences of the recent technological leap and future research directions.

---
For example, whether intelligence can be achieved without any agency or intrinsic motivation is an important philosophical question. Equipping LLMs with agency and intrinsic motivation is a fascinating and important direction for future work. With this direction of work, great care would have to be taken on alignment and safety per a system’s abilities to take autonomous actions in the world and to perform autonomous self-improvement via cycles of learning. We discuss a few other crucial missing components of LLMs next.