Namespotting: Username toxicity and actual toxic behavior on reddit. Rafal Urbaniak et al. Computers in Human Behavior, July 1 2022, 107371. https://doi.org/10.1016/j.chb.2022.107371
Highlights
• First large-scale comparative exploration of the toxic behavior of Reddit users with toxic usernames.
• Our analysis employs algorithmic methods and Bayesian statistical models without relying on self-reported data.
• Users with toxic usernames produce more toxic content (of various types) than those with neutral usernames.
• Users with toxic usernames are more likely to have their account suspended than those with neutral usernames.
Abstract
Without relying on any user reports, we use algorithmic detection and Bayesian statistical methods to analyse two large data streams (329k users) of Reddit content to study the correlations between username toxicity (of various types, such as offensive or sexually explicit) and the users' online toxic behavior (personal attacks and sexual harassment, among others).
As it turns out, username toxicity (type) is a useful predictor in online profiling. Users with toxic usernames produce more toxic content than their neutral counterparts, with the difference in predicted mean increasing with activity (predicted 1.9 vs. 1.4 toxic comments a week for users with regular activity, and 5.6 vs. 4 for top 5% active users). A larger share of users with toxic usernames engage in toxic behavior than of users with neutral usernames (around 40% vs. 30%). They are also around 2.2 times more likely to have their account suspended by moderators (3.2% vs. 1.5% probability of suspension for regular and 4.5% vs. 2% for top 5% users). Detailed results vary depending on the username toxicity type and toxic behavior type. Thus, username toxicity can be used in the efforts of online communities to predict toxic behavior and to provide more safety to their users.
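As a quick arithmetic check on the figures quoted in the abstract, the short sketch below recomputes the toxic-to-neutral ratios and the weekly comment gaps from the reported point values. The variable and function names are illustrative only, and the paper's own estimates come from Bayesian models, so these simple ratios are an approximation rather than the authors' calculation.

```python
# Point values quoted in the abstract, as (toxic usernames, neutral usernames).
suspension_pct = {
    "regular": (3.2, 1.5),
    "top 5%": (4.5, 2.0),
}
weekly_toxic_comments = {
    "regular": (1.9, 1.4),
    "top 5%": (5.6, 4.0),
}


def ratios(table):
    """Return the toxic/neutral ratio per activity group, rounded to 2 places."""
    return {
        group: round(toxic / neutral, 2)
        for group, (toxic, neutral) in table.items()
    }


suspension_ratio = ratios(suspension_pct)
comment_ratio = ratios(weekly_toxic_comments)

# Absolute gap in predicted toxic comments per week (toxic minus neutral).
comment_gap = {
    group: round(toxic - neutral, 1)
    for group, (toxic, neutral) in weekly_toxic_comments.items()
}

print("suspension ratio:", suspension_ratio)
print("toxic comment ratio:", comment_ratio)
print("toxic comment gap per week:", comment_gap)
```

The suspension ratios come out at roughly 2.13 and 2.25, which brackets the "around 2.2 times" figure, and the weekly gap grows from 0.5 to 1.6 comments, in line with the statement that the difference increases with activity.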
Keywords: Verbal aggression online; Usernames; Social media; Artificial intelligence; Bayesian data analysis; Reddit
☆ During the study, we utilized content that is publicly available on Reddit.com and can be accessed via the Reddit API or similar technologies. Since usernames were essential for the analysis, we could not fully anonymize them in the released datasets. To give our subjects more anonymity (even though Reddit is characterized by site-wide norms that discourage using one's real name; Proferes, Jones, Gilbert, Fiesler, & Zimmer, 2021), we additionally obfuscated the usernames with an alteration that greatly impedes, if not prevents, access to each individual's account, without changing the semantics of the usernames. This study was also not interventional research, and no posts or comments of particular users are quoted. To maximize confidentiality, the released datasets include only summarised data points together with the altered usernames. For these reasons, we assume that no distress or harm is involved and that no informed consent was required (following point 8.05 of the Ethical Principles of Psychologists and Code of Conduct of the American Psychological Association). All the data sets, source code, and technical documentation are available at https://rfl-urbaniak.github.io/namespotting/. The toxicity identification tools are not open-sourced, but their use is free for researchers in academia and NGOs, and anyone who contacts the company wishing to reproduce the results will obtain access.
☆☆ Disclaimer is anonymised due to Double Blind Peer Review policy.
☆☆☆ Some computing power and part of this research have been funded by the National Science Center research grant number 2016/22/E/HS1/00304. More computing power and further research have been funded by Samurai Labs.