From 2019... What Makes Foreign Policy Teams Tick: Explaining Variation in Group Performance at Geopolitical Forecasting. Michael Horowitz, Brandon M. Stewart, Dustin Tingley, Michael Bishop, Laura Resnick Samotin, Margaret Roberts, Welton Chang, Barbara Mellers and Philip Tetlock. The Journal of Politics, Vol. 81, No. 4, Oct 2019. https://www.journals.uchicago.edu/doi/abs/10.1086/704437
Abstract: When do groups—be they countries, administrations, or other organizations—more or less accurately understand the world around them and assess political choices? Some argue that group decision-making processes often fail due to biases induced by groupthink. Others argue that groups, by aggregating knowledge, are better at analyzing the foreign policy world. To advance knowledge about the intersection of politics and group decision making, this paper draws on evidence from a multiyear geopolitical forecasting tournament with thousands of participants sponsored by the US government. We find that teams outperformed individuals in making accurate geopolitical predictions, with regression discontinuity analysis demonstrating specific teamwork effects. Moreover, structural topic models show that more cooperative teams outperformed less cooperative teams. These results demonstrate that information sharing through groups, cultivating reasoning to hedge against cognitive biases, and ensuring all perspectives are heard can lead to greater success for groups at forecasting and understanding politics.
5 What Kinds of Teams Succeed? Modelling Team Communication
To test hypothesis 2 and hypothesis 3 concerning what explains variation in the ability
of groups to forecast, we focus on the content of forecast explanations. In particular, we
examine explanations given by individuals in the team conditions. By understanding how
different kinds of teams (trained teams, untrained teams, and top teams) use explanations,
we can begin unpacking what makes teams more or less effective. We find several patterns
in the content of explanations that help to explain top team success.
When making their predictions, participants in both the individual and team conditions could also choose to provide an explanation for their forecast. A comment box appeared beneath the field where forecasts were entered, and participants were encouraged to leave a comment explaining their reasoning. For participants in an individual experimental condition, only the researchers would see those explanations. For participants in a team experimental condition, however, their teammates could also see the explanation. These explanations therefore
potentially provide useful information to help identify what leads to forecasting accuracy,
giving us a way to test hypotheses 2 and 3.
5.1 The Conversational Norms Of Successful Geopolitical Forecasting Groups
An obvious starting point is to ask whether, on average, individuals differ in how extensively they provided explanations (i.e., how many comments per IFP) and how intensively (i.e., how long the comments were). Both of these metrics give us a sense of forecaster engagement, since those who explain their predictions are likely more engaged than those who do not. We contrast behavior by whether a forecaster was on a team, whether that team received training, and whether they were on a top team. Below, we move from the extent of engagement to the intensity of engagement when it occurs.
To calculate the degree of extensive engagement, for each individual we first calculated the total number of explanations made on each IFP for which the individual made at least one explanation. We then calculated their average number of comments per IFP, averaging over all of the forecasting questions they answered. Thus, for any person we know the average number of explanations they give for a prediction task.
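To make this measure concrete, the following is a minimal sketch of the extensive-engagement calculation in Python. It assumes a hypothetical long-format table of explanations with columns user_id, ifp_id, and text; these names, and the use of pandas, are illustrative assumptions rather than the authors' actual pipeline.

import pandas as pd

def extensive_engagement(comments: pd.DataFrame) -> pd.Series:
    # One row of `comments` = one explanation; `user_id` and `ifp_id` are
    # hypothetical column names. Count explanations per forecaster on each
    # IFP they commented on, then average those counts across that
    # forecaster's commented-on IFPs.
    per_ifp = (
        comments.groupby(["user_id", "ifp_id"])
        .size()
        .rename("n_explanations")
        .reset_index()
    )
    return per_ifp.groupby("user_id")["n_explanations"].mean()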
Figure 3 plots the resulting distribution of this value for each group (individuals, untrained teams, trained teams, and top teams). The x-axis shows each individual's score on a base-10 log scale because the distribution is heavily skewed; the log transformation reduces the visual influence of extreme outliers. Each group is presented as a separate density plot, with the height of the curve giving a relative estimate of how many observations fall at a particular value of the x-axis.15 We observe that both individuals and untrained teams have relatively low average numbers of responses per IFP, while trained teams, and particularly top teams, have considerably higher averages.
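A sketch of this kind of display is given below, again assuming a hypothetical per-forecaster table (here scores, with columns condition and avg_explanations); the kernel density estimate and base-10 log transform stand in for the plotting choices described above, not the authors' exact figure code.

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

def plot_log10_densities(scores):
    # One density curve per condition, with each forecaster's average number
    # of explanations per IFP shown on a base-10 log scale to reduce the
    # influence of extreme outliers.
    fig, ax = plt.subplots()
    for condition, grp in scores.groupby("condition"):
        x = np.log10(grp["avg_explanations"].to_numpy())
        grid = np.linspace(x.min(), x.max(), 200)
        ax.plot(grid, gaussian_kde(x)(grid), label=condition)
    ax.set_xlabel("Average explanations per IFP (log10 scale)")
    ax.set_ylabel("Density")
    ax.legend()
    return fig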
Next we calculate how intensively individuals engage with explaining their predictions. For each individual we calculated the median length of their first explanation on an IFP. We use the first explanation for two reasons. First, as seen in Figure 3, individuals who were not on a team, or who were on untrained teams, rarely made more than one explanation per IFP. Second, we are most interested in individuals providing information and analysis to others on their team, and someone's first explanation is an important first step in doing so. Figure 4 shows the distribution for the four conditions. We see that individuals on top teams clearly engage in more intensive explanation than individuals in other conditions.
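As a sketch, the intensive-engagement measure could be computed as follows, assuming the same hypothetical comments table plus a timestamp column for ordering explanations; length is measured here in words for illustration, and whether the original measure used words or characters is not specified in this section.

import pandas as pd

def intensive_engagement(comments: pd.DataFrame) -> pd.Series:
    # Keep only each forecaster's first explanation on each IFP, then take
    # the median length of those first explanations across their IFPs.
    first = (
        comments.sort_values("timestamp")
        .drop_duplicates(subset=["user_id", "ifp_id"], keep="first")
        .copy()
    )
    first["length"] = first["text"].str.split().str.len()  # length in words
    return first.groupby("user_id")["length"].median()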
Next, we combine Figures 3 and 4 and plot each individual's extensive engagement against their intensive engagement in Figure 5. Here we separate the plots by condition and overlay a contour plot to give a sense of the distribution of the data in this space. As expected, we observe that top teams tend to have more individuals who engage both more extensively per IFP and more intensively. By contrast, while people not on teams occasionally provided multiple explanations per IFP, most did not. Teams with and without training had individuals who provided lengthier explanations, but these teams lacked individuals who both supplied multiple responses to an IFP and began their engagement with a lengthy explanation (which could then be read by other participants on their team).
We also examined other metrics of intensive engagement. Figure 6 plots the fraction
of total words in explanations that came after the first response.16 The plot shows a low
proportion of total words coming after the very first explanation from individuals. Teams
did better, with more intensive engagement after the first explanation by trained teams
and top teams.
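A minimal sketch of this quantity, under the same hypothetical data layout, computes for each forecaster the share of their explanation words that arrive after their first explanation on an IFP; computing it at the forecaster level is an assumption for illustration.

import pandas as pd

def fraction_words_after_first(comments: pd.DataFrame) -> pd.Series:
    # Word counts per explanation, ordered within each forecaster/IFP thread;
    # order 0 marks the first explanation on that IFP.
    df = comments.sort_values("timestamp").copy()
    df["words"] = df["text"].str.split().str.len()
    df["order"] = df.groupby(["user_id", "ifp_id"]).cumcount()
    total = df.groupby("user_id")["words"].sum()
    after_first = df[df["order"] > 0].groupby("user_id")["words"].sum()
    # Forecasters who never comment twice on an IFP get a share of zero.
    return (after_first / total).fillna(0.0)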
Figure 7 investigates the degree to which explanations are generated by a single member of a team or emerge from a broader discussion among multiple participants. To measure this, for each team and each IFP we calculated the total number of explanations made by the most prolific responder. We then divided this by the average number of responses within the team on that IFP to generate a score for each team/IFP combination. Figure 7 plots the distribution of these scores by condition. It shows a distinct pattern for one particular type of team: top teams. Prolific posters on top teams posted four times as much as the team average, but on non-top teams the relative contribution of the most prolific posters was significantly higher. Essentially, in non-top teams a single person often completely dominated the conversation, while top teams featured broader conversations among more team members.
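A sketch of this dominance score, assuming a hypothetical comments table with team_id, user_id, and ifp_id columns, is given below. Note that the team average here is taken over members who posted at least one explanation on the IFP, since silent members do not appear in the comment data; handling them would require the team roster.

import pandas as pd

def dominance_scores(comments: pd.DataFrame) -> pd.Series:
    # Explanation counts per member within each team/IFP combination.
    per_member = (
        comments.groupby(["team_id", "ifp_id", "user_id"])
        .size()
        .rename("n")
        .reset_index()
    )
    grouped = per_member.groupby(["team_id", "ifp_id"])["n"]
    # Most prolific member's count relative to the team-average count.
    return grouped.max() / grouped.mean()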