Title: Identifying the Gaps in the Coverage of Web Domains in Wikipedia and Wikidata for Credibility Assessment Purposes
In February 2021, Google Search added a new interface feature to support the evaluation of web domains, known as the “About this result” feature. A prominent part of this feature is a snippet of text pulled automatically from Wikipedia, if a Wikipedia page for the web domain exists. While conducting large-scale audits of Google Search, we discovered that less than 40% of web domains shown in Google Search results have a Wikipedia page. We then retrieved the Wikidata entries for these domains and examined the extent to which they incorporate features related to W3C credibility signals. The lack of information for many signals points to avenues for expanding Wikidata coverage.
Award ID(s):
1751087
PAR ID:
10575983
Author(s) / Creator(s):
;
Publisher / Repository:
Wiki Workshop (10th edition)
Date Published:
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This study assesses the awareness and perceived utility of two features Google Search introduced in February 2021: “About this result” and “More about this page”. Google stated that the goal of these features is to help users vet unfamiliar web domains (or sources). We investigated whether the features were sufficiently prominent to be detected by frequent users of Google Search, and their perceived utility for making credibility judgments of sources, in one-on-one user studies with 25 undergraduate college students, who identify as frequent users of Google Search. Our results indicate a lack of adoption or awareness of these features by our participants and neutral-positive perceptions of their utility in evaluating web sources. We also examined the perceived usefulness of nine other domain credibility signals collected from the W3C. 
  2. The contemporary Google Search Engine Results Page (SERP) supplements classic blue hyperlinks with complex components. These components produce tensions between searchers, 3rd-party websites, and Google itself over clicks and attention. In this study, we examine 12 SERP components from two categories: (1) extracted results (e.g., featured-snippets) and (2) Google Services (e.g., shopping-ads) to determine their effect on people’s behavior. We measure behavior with two variables: (1) click-through rate (CTR) to Google’s own domains versus 3rd-party domains and (2) time spent on the SERP. We apply causal inference methods to an ecologically valid trace dataset comprising 477,485 SERPs from 1,756 participants. We find that multiple components substantially increase CTR to Google domains, while others decrease CTR and increase time on the SERP. These findings may inform efforts to regulate the design of powerful intermediary platforms like Google. 
  3. Google Search is an important way that people seek information about politics [8], and Google states that it is “committed to providing timely and authoritative information on Google Search to help voters understand, navigate, and participate in democratic processes.” This paper studies the extent to which government-maintained web domains are represented in the online electoral information environment, as captured through 3.45 Google Search result pages collected during the 2022 US midterm elections for 786 locations across the United States. Focusing on state, county, and local government domains that provide locality-specific information, we study not only the extent to which these sources appear in organic search results, but also the extent to which these sources are correctly targeted to their respective constituents. We label misalignment between the geographic area that non-federal domains serve and the locations for which they appear in search results as algorithmic mistargeting, a subtype of algorithmic misjudgement in which the search algorithm targets locality-specific information to users in different (incorrect) locations. In the context of the 2022 US midterm elections, we find that 71% of all occurrences of state, county, and local government sources were mistargeted, with some domains appearing disproportionately often among organic results despite providing locality-specific information that may not be relevant to all voters. However, we also find that mistargeting often occurs in low ranks. We conclude by considering the potential consequences of extensive mistargeting of non-federal government sources and argue that ensuring the correct targeting of these sources to their respective constituents is a critical part of Google’s role in facilitating access to authoritative and locally-relevant electoral information. 
  4. How do Google Search results change following an impactful real-world event, such as the U.S. Supreme Court decision on June 24, 2022 to overturn Roe v. Wade? And what do they tell us about the nature of event-driven content, generated by various participants in the online information environment? In this paper, we present a dataset of more than 1.74 million Google Search results pages collected between June 24 and July 17, 2022, intended to capture what Google Search surfaced in response to queries about this event of national importance. These search pages were collected for 65 locations in 13 U.S. states, a mix of red, blue, and purple states, with respect to their voting patterns. We describe the process of building a set of circa 1,700 phrases used for searching Google, how we gathered the search results for each location, and how these results were parsed to extract information about the most frequently encountered web domains. We believe that this dataset, which comprises raw data (search results as HTML files) and processed data (extracted links organized as CSV files) can be used to answer research questions that are of interest to computational social scientists as well as communication and media studies scholars. 
  5. Structured data peer production (SDPP) platforms like Wikidata play an important role in knowledge production. Compared to traditional peer production platforms like Wikipedia, Wikidata data is more structured and intended to be used by machines, not (directly) by people; end-user interactions with Wikidata often happen through intermediary "invisible machines." Given this distinction, we wanted to understand Wikidata contributor motivations and how they are affected by usage invisibility caused by the machine intermediaries. Through an inductive thematic analysis of 15 interviews, we find that: (i) Wikidata editors take on two archetypes---Architects who define the ontological infrastructure of Wikidata, and Masons who build the database through data entry and editing; (ii) the structured nature of Wikidata reveals novel editor motivations, such as an innate drive for organizational work; (iii) most Wikidata editors have little understanding of how their contributions are used, which may demotivate some. We synthesize these insights to help guide the future design of SDPP platforms in supporting the engagement of different types of editors. 