The effects of COVID-19 on test-retest reliability in a behavioral measure for impulsivity. Paul Romanowich & Qian Chen. International Journal of Social Research Methodology, Jun 6 2021. https://doi.org/10.1080/13645579.2021.1935821
Abstract: The predictive power of many behavioral measures relies on high test-retest reliability, whereby a measure yields similar data when it is administered repeatedly at spaced intervals. However, major environmental disruptions between administrations may affect test-retest reliability. The novel coronavirus (COVID-19) pandemic caused just such a major environmental disruption. We collected impulsivity data via a delay discounting task before, during, and after this disruption. Test-retest reliability was generally statistically significant throughout the study, even as delay discounting rates changed in the expected direction between the two experimental groups. Importantly, non-significant correlation coefficients (i.e. poor test-retest reliability) typically occurred immediately after the environmental disruption. Participants' anecdotal self-reports corroborated COVID-19's temporary disruptive impact. Although the disruption was not a planned manipulation, these data provide useful information about whether major environmental disruptions, which may not be replicable in a controlled experiment, affect test-retest reliability. Social and behavioral scientists attempting behavioral measurement through well-validated measures should consider whether large environmental changes can affect measure reliability, and how long such a disruption may last.
Keywords: Coronavirus; delay discounting; impulsiveness; pandemic; reliability
Results & discussion
There were no significant differences between EFT and SET participants on any measured demographic variable. Overall, the median age was 24 years (range 21–72), and most participants self-identified as male (88%) and Hispanic (67%). The same percentage of participants in each group (75%) self-reported drinking, and no participants self-reported smoking.
Figure 1 shows test-retest Pearson correlation coefficients plotted as a function of delay discounting task administration date for EFT (top) and SET (bottom) participants. EFT participants completed eight delay discounting tasks throughout the semester, whereas SET participants completed seven. Only the first (week 1, baseline) and last delay discounting tasks were administered to both groups at the same time. EFT group data (top graph) show that most correlation coefficients were above the significance threshold (r > 0.553; two-tailed p < 0.05). All points below the significance threshold involved delay discounting data collected immediately after COVID-19 shifted all university courses to online instruction (i.e. Time 3; the online-instruction announcement was made on 11 March 2020). The black closed triangles indicate correlations between the delay discounting data collected at Time 3 and each subsequent delay discounting measurement. Only one subsequent delay discounting task was significantly correlated with the Time 3 data. Correlation coefficients also increased gradually for delay discounting tasks further in time from Time 3 (i.e. closer to the end of the semester).
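The significance threshold in Figure 1 depends only on how many paired observations stand behind each correlation (df = n − 2). The Python sketch below, which is not the authors' analysis code, computes every pairwise test-retest correlation across administrations and flags those whose magnitude exceeds the two-tailed critical r. It assumes bivariate normality (the usual basis for tabled cutoffs), derives the cutoff from the exact null distribution of r integrated numerically with Simpson's rule, and uses hypothetical function names (`pearson_r`, `two_tailed_p`, `critical_r`, `retest_correlations`) and a hypothetical input layout: one list of discounting rates per administration, with participants in the same order in every list.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def two_tailed_p(r, n, steps=2000):
    """Two-tailed p-value for a correlation r from n pairs (null: rho = 0).

    Assumes bivariate normality. Under the null, r has density
    (1 - x**2) ** ((n - 4) / 2) / B(1/2, (n - 2) / 2) on [-1, 1];
    the upper tail beyond |r| is integrated with Simpson's rule.
    """
    if n < 4:
        raise ValueError("need at least 4 pairs")
    if steps % 2:
        raise ValueError("Simpson's rule needs an even number of steps")
    k = (n - 4) / 2
    # log of the Beta function B(1/2, (n - 2) / 2)
    log_beta = math.lgamma(0.5) + math.lgamma((n - 2) / 2) - math.lgamma((n - 1) / 2)

    def density_kernel(x):
        return (1 - x * x) ** k

    a = min(abs(r), 1.0)  # clamp tiny floating-point overshoot above 1
    h = (1.0 - a) / steps
    total = density_kernel(a) + density_kernel(1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density_kernel(a + i * h)
    upper_tail = total * h / 3 / math.exp(log_beta)
    return min(1.0, 2 * upper_tail)


def critical_r(n, alpha=0.05):
    """Smallest |r| whose two-tailed p-value is at most alpha (bisection on [0, 1])."""
    lo, hi = 0.0, 1.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if two_tailed_p(mid, n) > alpha:
            lo = mid
        else:
            hi = mid
    return hi


def retest_correlations(sessions, alpha=0.05):
    """Correlate every pair of administrations and flag significant ones.

    sessions: one list of discounting rates per administration, with the same
    participants in the same order in every list.
    Returns (threshold, {(i, j): (r, is_significant)}).
    """
    threshold = critical_r(len(sessions[0]), alpha)
    results = {}
    for i, j in combinations(range(len(sessions)), 2):
        r = pearson_r(sessions[i], sessions[j])
        results[(i, j)] = (r, abs(r) > threshold)
    return threshold, results


if __name__ == "__main__":
    for n_pairs in (12, 13):
        print(n_pairs, "pairs -> critical r =", round(critical_r(n_pairs), 3))
```

Run as a script, it prints the cutoffs for 12 and 13 pairs; these should land near 0.576 and 0.553, the standard tabled two-tailed values at α = 0.05 for df = 10 and df = 11, so the 0.553 cutoff quoted in the text corresponds to 13 pairs. Any missing session would have to be dropped pairwise before calling `retest_correlations`, which changes n and therefore the cutoff.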
In contrast, SET participants (Figure 1, bottom) did not show any systematic change in Pearson correlation coefficients as a function of when the delay discounting task was administered. As with EFT participants, test-retest correlations for SET participants were generally statistically significant. Two correlation coefficients associated with the second-to-last delay discounting measure were below the significance threshold; however, the other two correlation coefficients associated with that measure were above it.
Significant Pearson correlation coefficients could have resulted from little or no change in delay discounting rates over the semester. That is, if each participant's delay discounting rate does not change, repeated measurements will necessarily be highly correlated and show high test-retest reliability. Therefore, EFT and SET effects on delay discounting were assessed as the percentage of subsequent delay discounting rates that fell below the week 1 (baseline) rate. In the EFT group, 39% (26 of 67) of subsequent delay discounting rates were less than week 1 rates. For SET participants, only 7% (4 of 52) of delay discounting rates were less than week 1 rates. EFT participants produced significantly more delay discounting rates below week 1 than SET participants¹ (χ² = 16.58, p < 0.001, Φ = 0.37). Seven of the 12 EFT participants had at least one delay discounting rate less than week 1, whereas only three of the 12 SET participants did. Consistent with previous literature (Hollis-Hansen et al., 2019), EFT was more likely than SET to decrease delay discounting rates.
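The baseline comparison described above can be sketched in Python under stated assumptions: each follow-up rate is compared with the same participant's week 1 rate, and the group comparison is a Pearson chi-square on a 2 × 2 table of counts (group by below/not below baseline) without continuity correction, treating every follow-up rate as one observation. The function names are hypothetical. Because footnote 1 and the exact test variant are not part of this excerpt, the sketch is not guaranteed to reproduce the reported χ² and Φ values.

```python
import math


def count_below_baseline(rates):
    """Count follow-up rates below each participant's week 1 (baseline) rate.

    rates: one list per participant; element 0 is the week 1 rate and later
    elements are follow-up rates. None marks a missing session (assumption).
    Returns (n_below, n_followups) summed over all participants.
    """
    below = total = 0
    for row in rates:
        baseline = row[0]
        for rate in row[1:]:
            if rate is None:
                continue
            total += 1
            if rate < baseline:
                below += 1
    return below, total


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction), phi, and p for [[a, b], [c, d]].

    Rows are groups; columns are 'below baseline' and 'not below baseline'.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = n * (a * d - b * c) ** 2 / denom
    phi = math.sqrt(chi2 / n)           # magnitude of phi; sign convention omitted
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, phi, p


if __name__ == "__main__":
    # Counts stated in the text: EFT 26 of 67 below baseline, SET 4 of 52.
    chi2, phi, p = chi_square_2x2(26, 67 - 26, 4, 52 - 4)
    print(f"chi2 = {chi2:.2f}, phi = {phi:.2f}, p = {p:.5f}")
```

For a chi-square with one degree of freedom, the p-value equals `erfc(sqrt(chi2 / 2))`, so no statistics library is needed for this 2 × 2 case.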
To further explore whether the COVID-19 disruption was associated with changes in test-retest reliability, EFT descriptions were scored for COVID-19 content. If COVID-19 was a large but temporary disruption for participants, then more COVID-19-related content should appear in EFT descriptions written closer in time to the drop in delay discounting test-retest reliability (i.e. Time 3 for EFT participants). The first EFT training, held before March 2020, contained no COVID-19-related content. The second EFT training, on 22 March 2020, contained seven COVID-19-related descriptions: six 1-month descriptions and one 3-month description. Examples included ‘In one month, I will be at home because of the virus. I will be in my room watching television. I will be calm as I pass the time to try and get through the days’ and ‘I will be passing all my classes and starting to prepare myself for the real world. I also will be staying home for the remainder of the semester because of the virus outbreak.’ The third and fourth EFT trainings contained seven and two COVID-19-related descriptions, respectively. The third EFT training contained three 1-month, two 3-month, and two 1-year descriptions; both descriptions from the fourth EFT training were 3-month descriptions. By comparison, each SET training contained only one COVID-19-related statement. For example, one participant wrote ‘Reading news articles on my phone includes national news about COVID-19, politics, and front-page stories.’ Thus, immediately after the pandemic began, EFT participants provided COVID-19-related content mostly in the shortest-horizon (1-month) episodic descriptions. These descriptions then became less frequent and shifted to more temporally distant (3-month and 1-year) episodic descriptions.
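The excerpt does not describe the procedure used to score descriptions for COVID-19 content. As a rough, hypothetical first pass, a keyword filter could flag candidate descriptions and tally them by training session and time horizon, as in the Python sketch below. The keyword list, the `(training, horizon, text)` record layout, and the function names are all assumptions, and the sample records are illustrative rather than study data.

```python
import re
from collections import Counter

# Hypothetical keyword list (assumption); the study's scoring criteria are not given here.
COVID_TERMS = re.compile(
    r"\b(covid(-19)?|coronavirus|virus|pandemic|outbreak|quarantine|lockdown)\b",
    re.IGNORECASE,
)


def is_covid_related(text):
    """True if the description mentions any term in the hypothetical keyword list."""
    return COVID_TERMS.search(text) is not None


def tally_covid_content(descriptions):
    """Count flagged descriptions per (training, horizon).

    descriptions: iterable of (training, horizon, text) tuples,
    e.g. (2, "1-month", "In one month, I will be at home ...").
    """
    counts = Counter()
    for training, horizon, text in descriptions:
        if is_covid_related(text):
            counts[(training, horizon)] += 1
    return counts


if __name__ == "__main__":
    # Illustrative records, not study data; horizon labels are hypothetical.
    sample = [
        (2, "1-month", "In one month, I will be at home because of the virus."),
        (2, "1-year", "In one year, I will be starting my first job after graduation."),
    ]
    for (training, horizon), count in sorted(tally_covid_content(sample).items()):
        print(f"training {training}, {horizon}: {count}")
```

A keyword match is only a screen: it would miss indirect references that never name the virus and would flag unrelated uses such as "computer virus", so both flagged and unflagged descriptions would still need human review.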
In sum, the current results provide a non-experimental window into a potential relationship between an unexpected major environmental event and delay discounting test-retest reliability. Although major environmental disruptions cannot be controlled, social and behavioral researchers who happen to be using the same measures during such a disruption could profitably compare or combine their data to validate test-retest reliability findings through replication.
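One standard way to combine test-retest correlations from several independent samples (a general meta-analytic technique, not a procedure described in this paper) is a fixed-effect average of Fisher z-transformed correlations, weighting each sample by n − 3, the inverse of the approximate sampling variance of z. The Python sketch below assumes each sample reports a correlation r with |r| < 1 and its number of pairs n > 3; the function name and the example values are hypothetical.

```python
import math


def pool_correlations(studies, z_crit=1.959964):
    """Fixed-effect pooled correlation via Fisher's z transform.

    studies: iterable of (r, n) pairs, one per independent sample.
    Returns (pooled_r, (ci_low, ci_high)); z_crit = 1.959964 gives a 95% CI.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for r, n in studies:
        if n <= 3 or not -1 < r < 1:
            raise ValueError("each sample needs n > 3 and |r| < 1")
        weight = n - 3                          # 1 / Var(z), since Var(z) ~ 1 / (n - 3)
        weighted_sum += weight * math.atanh(r)  # Fisher z = atanh(r)
        total_weight += weight
    z_bar = weighted_sum / total_weight
    se = 1 / math.sqrt(total_weight)            # standard error of the pooled z
    low, high = z_bar - z_crit * se, z_bar + z_crit * se
    return math.tanh(z_bar), (math.tanh(low), math.tanh(high))


if __name__ == "__main__":
    # Hypothetical (r, n) pairs from three labs using the same measure.
    print(pool_correlations([(0.62, 12), (0.71, 20), (0.48, 15)]))
```

A fixed-effect average assumes one common true correlation across samples; if reliability genuinely differs between labs or disruption periods, a random-effects model would be the more appropriate choice.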