Monday, March 25, 2019

54–58% of Google search snippets amplify partisanship, likely because of journalistic practice: articles use terms & quotes from partisan politicians in their introduction (& meta-data), which is the text favored by the summarization algorithm

Auditing the Partisanship of Google Search Snippets. Desheng Hu et al. To be presented at the International World Wide Web Conference 2019. https://cbw.sh/static/pdf/hu-www19.pdf

Abstract: The text snippets presented in web search results provide users with a slice of page content that they can quickly scan to help inform their click decisions. However, little is known about how these snippets are generated or how they relate to a user’s search query. Motivated by the growing body of evidence suggesting that search engine rankings can influence undecided voters, we conducted an algorithm audit of the political partisanship of Google Search snippets relative to the webpages they are extracted from. To accomplish this, we constructed a lexicon of partisan cues to measure partisanship and constructed a set of left- and right-leaning search queries. Then, we collected a large dataset of Search Engine Results Pages (SERPs) by running our partisan queries and their autocomplete suggestions on Google Search. After using our lexicon to score the machine-coded partisanship of snippets and webpages, we found that Google Search’s snippets generally amplify partisanship, and that this effect is robust across different types of webpages, query topics, and partisan (left- and right-leaning) queries.

---
• We present the first large-scale analysis of machine-coded partisanship in Google Search snippets, covering 4,570 political queries and their autocomplete suggestions.
• We audit the behavior of Google Search’s document summarization algorithm, and find that snippets tend to be drawn from text that is near the beginning of webpages. We further observe that the algorithm leverages visible text and textual meta-data (such as alt-text on images) from webpages.
• Overall, we find that 54–58% of snippets amplify partisanship, depending on the fraction of our lexicon that is used for scoring, i.e., the snippets contain stronger partisan cues on average than the corresponding webpage they were synthesized from. This finding remains consistent across SERPs from left- and right-leaning queries and pages with and without structured meta-data that may influence Google Search’s document summarization algorithm [28, 29].
• Surprisingly, we find that 19–24% of snippets have the inverse partisanship of the corresponding webpage.
• We identify 31 websites where Google Search consistently produces snippets that differ from the underlying webpages in terms of the machine-coded partisanship, with high statistical significance. These websites include prominent news and social media services.
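
To make the "amplify" and "inverse" categories in the bullets above concrete, here is a minimal Python sketch of lexicon-based scoring. It is an illustration under stated assumptions, not the authors' implementation: the tiny lexicon, its scores (negative = left-leaning, positive = right-leaning), and the function names `partisan_score` and `compare` are all hypothetical, and the paper's actual lexicon and scoring rules may differ.

```python
import re

# Hypothetical toy lexicon: term -> partisan score in [-1, 1].
# Negative = left-leaning cue, positive = right-leaning cue.
# These terms and values are invented for illustration only.
TOY_LEXICON = {
    "death tax": 0.8,
    "estate tax": -0.6,
    "undocumented": -0.7,
    "illegal aliens": 0.9,
}


def partisan_score(text, lexicon=TOY_LEXICON):
    """Return the mean score of all lexicon cues found in text (0.0 if none)."""
    lowered = text.lower()
    hits = []
    for term, score in lexicon.items():
        # Count whole-phrase occurrences of the cue.
        count = len(re.findall(r"\b" + re.escape(term) + r"\b", lowered))
        hits.extend([score] * count)
    return sum(hits) / len(hits) if hits else 0.0


def compare(snippet, page, lexicon=TOY_LEXICON):
    """Label how a snippet's score relates to the score of its source page."""
    s = partisan_score(snippet, lexicon)
    p = partisan_score(page, lexicon)
    if s * p < 0:
        return "inverse"   # snippet and page lean in opposite directions
    if abs(s) > abs(p):
        return "amplify"   # snippet cues are stronger on average
    if abs(s) < abs(p):
        return "dampen"    # snippet cues are weaker on average
    return "same"


page = (
    "Senator says illegal aliens strain budgets. "
    "Advocates reply that undocumented workers pay taxes. "
    "The senator repeated the phrase illegal aliens."
)
print(compare("Senator says illegal aliens strain budgets.", page))
print(compare("Advocates reply that undocumented workers pay taxes.", page))
```

In this toy example the page mixes cues from both sides, so a snippet drawn only from its first sentence scores as more partisan than the page as a whole, while a snippet drawn from the second sentence leans the opposite way.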

We believe that it is highly unlikely that Google has intentionally engineered their document summarization algorithm to amplify partisan cues. Instead, a more likely explanation for our findings is that journalistic practice encourages the use of partisan terms and quotes from partisan politicians in the introduction (and meta-data) of articles, which are also the types of text favored by the summarization algorithm.

