Citation: Gartstein MA, Seamon DE, Mattera JA, Bosquet Enlow M, Wright RJ, Perez-Edgar K, et al. (2022) Using machine learning to understand age and gender classification based on infant temperament. PLoS ONE 17(4): e0266026. https://doi.org/10.1371/journal.pone.0266026
Abstract: Age and gender differences are prominent in the temperament literature, with the former particularly salient in infancy and the latter noted as early as the first year of life. This study represents a meta-analysis utilizing Infant Behavior Questionnaire-Revised (IBQ-R) data collected across multiple laboratories (N = 4,438) to overcome limitations of smaller samples in elucidating links among temperament, age, and gender in early childhood. Algorithmic modeling techniques were leveraged to discern the extent to which the 14 IBQ-R subscale scores accurately classified participating children as boys (n = 2,298) and girls (n = 2,093), and into three age groups: youngest (< 24 weeks; n = 1,102), mid-range (24 to 48 weeks; n = 2,557), and oldest (> 48 weeks; n = 779). Additionally, simultaneous classification into age and gender categories was performed, providing an opportunity to consider the extent to which gender differences in temperament are informed by infant age. Results indicated that overall age group classification was more accurate than child gender models, suggesting that age-related changes are more salient than gender differences in early childhood with respect to temperament attributes. However, gender-based classification was superior in the oldest age group, suggesting temperament differences between boys and girls are accentuated with development. Fear emerged as the subscale contributing most notably to accurate classifications overall. This study leads infancy research and meta-analytic investigations more broadly in a new direction as a methodological demonstration, and also provides optimal comparative data for the IBQ-R based on the largest and most representative dataset to date.
Discussion
We set out to leverage existing IBQ-R datasets from multiple laboratories (N = 4,438) to address an important gap in research by investigating age and gender classifications in early childhood, and overcoming limitations of the published studies such as small sample sizes that cannot be considered representative or provide widely generalizable results. Relying on algorithmic modeling techniques, 14 IBQ-R subscale scores served as features used to classify participating children as boys (n = 2,298) and girls (n = 2,093), and into three age groups: youngest (< 24 weeks; n = 1,102), mid-range (24 to 48 weeks; n = 2,557), and oldest (> 48 weeks; n = 779). Importantly, this approach allowed us to simultaneously classify infants into age and gender categories, providing an opportunity for the first time to consider the extent to which gender differences are informed by infant age. This study also makes an important contribution to the literature as a novel methodological demonstration. That is, the present application of machine learning algorithms provides a new direction for infancy and temperament research, as well as meta-analytic investigations more broadly.
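The grouping described above can be illustrated with a minimal sketch. The function names and the combined gender-by-age label are hypothetical and serve only as illustration; treating the 24- and 48-week boundaries as belonging to the mid-range group is an assumption based on the stated ranges, not a detail confirmed by the study.

```python
def age_group(age_weeks):
    """Assign one of the three age groups described in the text.

    Assumption: the mid-range group includes both boundaries
    (24 and 48 weeks), following "< 24", "24 to 48", and "> 48".
    """
    if age_weeks < 24:
        return "youngest"
    if age_weeks <= 48:
        return "mid-range"
    return "oldest"


def gender_by_age_label(gender, age_weeks):
    """Build a combined target label for simultaneous classification.

    Hypothetical encoding: one label per gender-by-age-group cell.
    """
    return f"{gender}/{age_group(age_weeks)}"


# Illustrative values only, not study data.
example_labels = [
    gender_by_age_label("girl", 12),
    gender_by_age_label("boy", 30),
    gender_by_age_label("girl", 52),
]
```

Crossing two genders with three age groups yields six cells, which is one straightforward way to frame simultaneous classification as a single multiclass problem.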
Results based on accuracy indicators (the complement of misclassification rates), Cohen’s kappa coefficients, and AUC (incorporating sensitivity and specificity parameters) demonstrated that temperament features provided superior classification of age groups relative to gender, which is consistent with the existing literature insofar as age effects have generally been more robust (e.g., not dependent on methodology; [5,26,52]). As noted, gender differences in infancy have been largely limited to activity level and fear/behavioral inhibition, with higher activity level and approach reported for boys [29,30] and greater fear/behavioral inhibition for girls [14,25,31,35,36]. These gender differences are somewhat controversial due to a lack of consensus regarding their origin (i.e., biologically based or largely a function of socialization; [53]) and questions regarding the role of parental expectations. That is, parents could rate boys and girls differently not due to actual variability in behavior but as a function of their own culturally influenced ideas about what is typical behavior in boys vs. girls. This explanation cannot be ruled out completely, although existing research suggests that gender differences are not entirely dependent on methodology (i.e., have been identified via behavioral observations along with parent report; [33,52]).
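For readers less familiar with these indicators, the following is a minimal sketch of how accuracy, Cohen's kappa, and a binary AUC can be computed from labels and predictions. It is not the study's analysis code; the toy labels and scores are invented for illustration, and the AUC shown covers only the two-class case (a three-class problem such as age group would require an extension, for example one-vs-rest).

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Proportion classified correctly (1 minus the misclassification rate)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohens_kappa(y_true, y_pred):
    """Observed agreement corrected for agreement expected by chance."""
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement from the marginal label frequencies.
    p_chance = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when only one class occurs")
    return (p_observed - p_chance) / (1.0 - p_chance)


def binary_auc(y_true, scores, positive):
    """AUC as the probability that a positive case outscores a negative one.

    Ties between a positive and a negative score count as half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): scores are a model's estimated probability of "boy".
y_true = ["boy", "boy", "boy", "girl", "girl", "girl"]
boy_scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]
y_pred = ["boy" if s >= 0.5 else "girl" for s in boy_scores]

acc = accuracy(y_true, y_pred)                 # 4 of 6 correct
kappa = cohens_kappa(y_true, y_pred)           # chance agreement is 0.5 here
auc = binary_auc(y_true, boy_scores, "boy")    # 8 of 9 pairs ranked correctly
```

Kappa is informative alongside accuracy because the age groups are unequal in size, so a model that favors the largest group could reach moderate accuracy with little agreement beyond chance.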
Importantly, results for gender classification within age groups suggest that classification is most effective for the oldest age group, in line with literature indicating that gender differences in temperament attributes become more pronounced with age [54]. Although a number of factors could be contributing to this pattern of results (accentuated gender differences in temperament with increasing age and, conversely, more accurate classification of gender from temperament features for the oldest participants), socialization is often described as critical among these. The primary mechanism invoked in such explanations involves the infants’ interactional history, and is consistent with literature that indicates mothers respond differently to their sons and daughters [55–59], presenting with different affordances as social interaction partners (e.g., [60]). Over time, such differences could result in divergent trajectories with respect to temperament due to differences in socialization goals/approaches for boys vs. girls. Specifically, parents may prioritize relationship orientation for daughters, but competence and autonomy for sons [61–63]. These and other socialization-related pathways may be responsible for the stronger temperament-based classification of boys and girls later in infancy observed herein.
At the same time, gender is viewed as a marker for a host of sex-linked distinctions in physiological processes. For example, prenatal exposure to high levels of androgen is predictive of later behavior problems, primarily of the externalizing type (e.g., ADHD; [64]), and used to explain early vulnerability observed in boys with respect to this set of problems [65]. Postpartum biological effects are also possible, for example via testosterone increases for boys in infancy, referred to as “mini-puberty,” peaking by the second month and returning to baseline at about 6 months [66]. Sex-linked differentiation in brain structures and functions occurs with maturation, resulting in greater discrepancies with age. For example, Goldstein et al. [67] reported that the amygdala tends to be larger in males and the hippocampus larger in females (see Hines [68] for a related review).
Follow-up analyses outlining feature importance for classification models were performed for the Ensembled Decision Trees (Random Forest) to aid interpretation of the observed results. Random Forest methods provide an effective mechanism for feature selection and importance, ranking features via the mean decrease in Gini impurity, i.e., the reduction in the probability that a random sample in a particular tree node would be mislabeled using the distribution of the node sample, averaged across all trees [69]. Figures provided in Supplemental Materials (S1–S3 Figs) demonstrate that while Fear was the most important feature in distinguishing boys and girls for the youngest and mid-range age groups, for the oldest infants, Low Intensity Pleasure was most influential. In fact, for the youngest infants (S3 Fig), all three distress-related scales (Fear, Distress to Limitations, Sadness) were of primary importance in classifying infants accurately by gender via the Random Forest algorithm. Positive emotionality and regulatory dimensions of temperament (e.g., Falling Reactivity, Approach) begin to take on greater importance for mid-range and oldest infants. Notably, certain temperament features detracted from model accuracy in classifying infants by gender (i.e., were associated with the lowest, negative importance values), particularly Cuddliness, Vocal Reactivity, and Smiling and Laughter in the youngest age group and Smiling and Laughter, Perceptual Sensitivity, and Activity in the oldest age group. These results identify the temperament attributes that did not differentiate boys and girls effectively, and it is of interest that the list of these poorly differentiating features varied by age.
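To make the Gini impurity criterion concrete, the sketch below computes the impurity of a node and the impurity decrease produced by a single split. It is an illustration under stated assumptions rather than the procedure used in the study: the node contents are invented, and the split on a Fear threshold is hypothetical. A Random Forest importance score aggregates such decreases over every node that splits on a given feature and averages them across trees.

```python
from collections import Counter


def gini_impurity(labels):
    """Probability that a random sample in the node would be mislabeled
    if labeled at random according to the node's class distribution."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted_children = (
        len(left) / n * gini_impurity(left)
        + len(right) / n * gini_impurity(right)
    )
    return gini_impurity(parent) - weighted_children


# Hypothetical node: 4 boys and 4 girls, split on an assumed Fear threshold.
parent = ["boy"] * 4 + ["girl"] * 4
left = ["boy", "boy", "boy", "girl"]     # e.g., lower Fear scores
right = ["boy", "girl", "girl", "girl"]  # e.g., higher Fear scores

parent_gini = gini_impurity(parent)                   # 0.5 for a 50/50 node
split_gain = impurity_decrease(parent, left, right)   # 0.5 - 0.375 = 0.125
```

A split that separated boys and girls perfectly would reduce impurity by the full 0.5, whereas a split that left both children at 50/50 would yield a decrease of zero.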
When the most important features were considered for the models classifying by age only and by gender only, Fear again emerged as the critical dimension, which is in line with the extensive literature documenting the developmental progression as well as gender differences for this domain of temperament [2,13,14,26,54].
This work is not without limitations, chief among these our reliance on a single method (i.e., parent report) in the assessment of infant temperament. Future studies should aggregate datasets providing different sources of information, including behavioral observations and physiological measures, such as cortisol reactivity, heart rate variability/respiratory sinus arrhythmia, and/or frontal alpha asymmetry ascertained via electroencephalogram (EEG) recordings. In addition, the outcomes examined in this study were limited to child gender and age. Future studies with older children should conduct classification analyses with additional dependent variables, particularly symptom and disorder classifications (e.g., clinical/subclinical/asymptomatic ADHD). It should be noted that we did not consider classification based on race/ethnicity because of a far more limited literature suggesting these differences can be discerned on the basis of temperament, and future research should examine related models as relevant studies accumulate. Finally, the present modeling approach could be extended and potentially improved by applying ensemble modeling approaches (i.e., using multiple algorithms simultaneously), as opposed to relying on individual modeling frameworks.
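One simple form that such an ensemble could take is majority (hard) voting across the predictions of several algorithms. The sketch below is offered only as an illustration of that idea, assuming each model has already produced a label for every case; the model names and predictions are hypothetical, and other ensembling strategies (e.g., stacking or averaging predicted probabilities) are equally plausible.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine label predictions from several models by majority vote.

    predictions_per_model: one list of labels per model, aligned by case.
    Ties go to the label encountered first among the tied labels, so an
    odd number of models is advisable for two-class problems.
    """
    ensemble = []
    for votes in zip(*predictions_per_model):
        ensemble.append(Counter(votes).most_common(1)[0][0])
    return ensemble


# Hypothetical predictions from three separately trained models.
model_a = ["youngest", "mid-range", "oldest", "mid-range"]
model_b = ["youngest", "mid-range", "mid-range", "mid-range"]
model_c = ["mid-range", "mid-range", "oldest", "oldest"]

combined = majority_vote([model_a, model_b, model_c])
```

Voting can offset the idiosyncratic errors of individual algorithms, although it helps only insofar as the combined models err on different cases.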
This study underscores the importance of meta-analytic investigations and cross-laboratory collaborations, which can provide otherwise elusive answers to questions that have not been previously addressed, such as those related to intersections of gender and age in temperament development. Because of the large cross-laboratory sample included herein, this study provides optimal comparative data for the IBQ-R (Table 2), which has emerged as a widely used infant temperament assessment tool. Importantly, the present investigation serves as a methodological illustration for application of machine learning techniques in infancy and temperament research, as well as developmental science more broadly. Given the propensity for differing algorithmic methods to have strengths and weaknesses that may bias predictive outcomes and classification accuracy, we selected 11 established algorithmic modeling and classification techniques to quantify the most robust outcomes, simultaneously demonstrating the viability of machine learning approaches in this area of scientific inquiry. Results of this study make an important contribution to developmental temperament research, demonstrating effective age group classification on the basis of fine-grained temperament features, and indicating more effective gender classification for the oldest age group, with multiple implications for future mechanistic research examining potential socialization and biological contributors.