skip to main content


This content will become publicly available on June 5, 2024

Title: Capturing the Aftermath of the Dobbs v. Jackson Women’s Health Organization Decision in Google Search Results across the U.S.
How do Google Search results change following an impactful real-world event, such as the U.S. Supreme Court decision on June 24, 2022 to overturn Roe v. Wade? And what do they tell us about the nature of event-driven content, generated by various participants in the online information environment? In this paper, we present a dataset of more than 1.74 million Google Search results pages collected between June 24 and July 17, 2022, intended to capture what Google Search surfaced in response to queries about this event of national importance. These search pages were collected for 65 locations in 13 U.S. states, a mix of red, blue, and purple states, with respect to their voting patterns. We describe the process of building a set of circa 1,700 phrases used for searching Google, how we gathered the search results for each location, and how these results were parsed to extract information about the most frequently encountered web domains. We believe that this dataset, which comprises raw data (search results as HTML files) and processed data (extracted links organized as CSV files) can be used to answer research questions that are of interest to computational social scientists as well as communication and media studies scholars.  more » « less
Award ID(s):
1751087
NSF-PAR ID:
10438930
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Proceedings of the International AAAI Conference on Web and Social Media
Volume:
17
ISSN:
2162-3449
Page Range / eLocation ID:
1063 to 1072
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The COVID-19 pandemic has dramatically altered family life in the United States. Over the long duration of the pandemic, parents had to adapt to shifting work conditions, virtual schooling, the closure of daycare facilities, and the stress of not only managing households without domestic and care supports but also worrying that family members may contract the novel coronavirus. Reports early in the pandemic suggest that these burdens have fallen disproportionately on mothers, creating concerns about the long-term implications of the pandemic for gender inequality and mothers’ well-being. Nevertheless, less is known about how parents’ engagement in domestic labor and paid work has changed throughout the pandemic, what factors may be driving these changes, and what the long-term consequences of the pandemic may be for the gendered division of labor and gender inequality more generally.

    The Study on U.S. Parents’ Divisions of Labor During COVID-19 (SPDLC) collects longitudinal survey data from partnered U.S. parents that can be used to assess changes in parents’ divisions of domestic labor, divisions of paid labor, and well-being throughout and after the COVID-19 pandemic. The goal of SPDLC is to understand both the short- and long-term impacts of the pandemic for the gendered division of labor, work-family issues, and broader patterns of gender inequality.

    Survey data for this study is collected using Prolifc (www.prolific.co), an opt-in online platform designed to facilitate scientific research. The sample is comprised U.S. adults who were residing with a romantic partner and at least one biological child (at the time of entry into the study). In each survey, parents answer questions about both themselves and their partners. Wave 1 of SPDLC was conducted in April 2020, and parents who participated in Wave 1 were asked about their division of labor both prior to (i.e., early March 2020) and one month after the pandemic began. Wave 2 of SPDLC was collected in November 2020. Parents who participated in Wave 1 were invited to participate again in Wave 2, and a new cohort of parents was also recruited to participate in the Wave 2 survey. Wave 3 of SPDLC was collected in October 2021. Parents who participated in either of the first two waves were invited to participate again in Wave 3, and another new cohort of parents was also recruited to participate in the Wave 3 survey. This research design (follow-up survey of panelists and new cross-section of parents at each wave) will continue through 2024, culminating in six waves of data spanning the period from March 2020 through October 2024. An estimated total of approximately 6,500 parents will be surveyed at least once throughout the duration of the study.

    SPDLC data will be released to the public two years after data is collected; Waves 1 and 2 are currently publicly available. Wave 3 will be publicly available in October 2023, with subsequent waves becoming available yearly. Data will be available to download in both SPSS (.sav) and Stata (.dta) formats, and the following data files will be available: (1) a data file for each individual wave, which contains responses from all participants in that wave of data collection, (2) a longitudinal panel data file, which contains longitudinal follow-up data from all available waves, and (3) a repeated cross-section data file, which contains the repeated cross-section data (from new respondents at each wave) from all available waves. Codebooks for each survey wave and a detailed user guide describing the data are also available. Response Rates: Of the 1,157 parents who participated in Wave 1, 828 (72%) also participated in the Wave 2 study. Presence of Common Scales: The following established scales are included in the survey:
    • Self-Efficacy, adapted from Pearlin's mastery scale (Pearlin et al., 1981) and the Rosenberg self-esteem scale (Rosenberg, 2015) and taken from the American Changing Lives Survey
    • Communication with Partner, taken from the Marriage and Relationship Survey (Lichter & Carmalt, 2009)
    • Gender Attitudes, taken from the National Survey of Families and Households (Sweet & Bumpass, 1996)
    • Depressive Symptoms (CES-D-10)
    • Stress, measured using Cohen's Perceived Stress Scale (Cohen, Kamarck, & Mermelstein, 1983)
    Full details about these scales and all other items included in the survey can be found in the user guide and codebook
    The second wave of the SPDLC was fielded in November 2020 in two stages. In the first stage, all parents who participated in W1 of the SPDLC and who continued to reside in the United States were re-contacted and asked to participate in a follow-up survey. The W2 survey was posted on Prolific, and messages were sent via Prolific’s messaging system to all previous participants. Multiple follow-up messages were sent in an attempt to increase response rates to the follow-up survey. Of the 1,157 respondents who completed the W1 survey, 873 at least started the W2 survey. Data quality checks were employed in line with best practices for online surveys (e.g., removing respondents who did not complete most of the survey or who did not pass the attention filters). After data quality checks, 5.2% of respondents were removed from the sample, resulting in a final sample size of 828 parents (a response rate of 72%).

    In the second stage, a new sample of parents was recruited. New parents had to meet the same sampling criteria as in W1 (be at least 18 years old, reside in the United States, reside with a romantic partner, and be a parent living with at least one biological child). Also similar to the W1 procedures, we oversampled men, Black individuals, individuals who did not complete college, and individuals who identified as politically conservative to increase sample diversity. A total of 1,207 parents participated in the W2 survey. Data quality checks led to the removal of 5.7% of the respondents, resulting in a final sample size of new respondents at Wave 2 of 1,138 parents.

    In both stages, participants were informed that the survey would take approximately 20 minutes to complete. All panelists were provided monetary compensation in line with Prolific’s compensation guidelines, which require that all participants earn above minimum wage for their time participating in studies.
    To be included in SPDLC, respondents had to meet the following sampling criteria at the time they enter the study: (a) be at least 18 years old, (b) reside in the United States, (c) reside with a romantic partner (i.e., be married or cohabiting), and (d) be a parent living with at least one biological child. Follow-up respondents must be at least 18 years old and reside in the United States, but may experience changes in relationship and resident parent statuses. Smallest Geographic Unit: U.S. State

    This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. In accordance with this license, all users of these data must give appropriate credit to the authors in any papers, presentations, books, or other works that use the data. A suggested citation to provide attribution for these data is included below:            

    Carlson, Daniel L. and Richard J. Petts. 2022. Study on U.S. Parents’ Divisions of Labor During COVID-19 User Guide: Waves 1-2.  

    To help provide estimates that are more representative of U.S. partnered parents, the SPDLC includes sampling weights. Weights can be included in statistical analyses to make estimates from the SPDLC sample representative of U.S. parents who reside with a romantic partner (married or cohabiting) and a child aged 18 or younger based on age, race/ethnicity, and gender. National estimates for the age, racial/ethnic, and gender profile of U.S. partnered parents were obtained using data from the 2020 Current Population Survey (CPS). Weights were calculated using an iterative raking method, such that the full sample in each data file matches the nationally representative CPS data in regard to the gender, age, and racial/ethnic distributions within the data. This variable is labeled CPSweightW2 in the Wave 2 dataset, and CPSweightLW2 in the longitudinal dataset (which includes Waves 1 and 2). There is not a weight variable included in the W1-W2 repeated cross-section data file.
     
    more » « less
  2. Search engines, by ranking a few links ahead of million others based on opaque rules, open themselves up to criticism of bias. Previous research has focused on measuring political bias of search engine algorithms to detect possible search engine manipulation effects on voters or unbalanced ideological representation in search results. Insofar that these concerns are related to the principle of fairness, this notion of fairness can be seen as explicitly oriented toward election candidates or political processes and only implicitly oriented toward the public at large. Thus, we ask the following research question: how should an auditing framework that is explicitly centered on the principle of ensuring and maximizing fairness for the public (i.e., voters) operate? To answer this question, we qualitatively explore four datasets about elections and politics in the United States: 1) a survey of eligible U.S. voters about their information needs ahead of the 2018 U.S. elections, 2) a dataset of biased political phrases used in a large-scale Google audit ahead of the 2018 U.S. election, 3) Google’s “related searches” phrases for two groups of political candidates in the 2018 U.S. election (one group is composed entirely of women), and 4) autocomplete suggestions and result pages for a set of searches on the day of a statewide election in the U.S. state of Virginia in 2019. We find that voters have much broader information needs than the search engine audit literature has accounted for in the past, and that relying on political science theories of voter modeling provides a good starting point for informing the design of voter-centered audits. 
    more » « less
  3. The salt controversy is the public health debate about whether a population-level salt reduction is beneficial. This dataset covers 82 publications--14 systematic review reports (SRRs) and 68 primary study reports (PSRs)--addressing the effect of sodium intake on cerebrocardiovascular disease or mortality. These present a snapshot of the status of the salt controversy as of September 2014 according to previous work by epidemiologists: The reports and their opinion classification (for, against, and inconclusive) were from Trinquart et al. (2016) (Trinquart, L., Johns, D. M., & Galea, S. (2016). Why do we think we know what we know? A metaknowledge analysis of the salt controversy. International Journal of Epidemiology, 45(1), 251–260. https://doi.org/10.1093/ije/dyv184 ), which collected 68 PSRs, 14 SRRs, 11 clinical guideline reports, and 176 comments, letters, or narrative reviews. Note that our dataset covers only the 68 PSRs and 14 SRRs from Trinquart et al. 2016, not the other types of publications, and it adds additional information noted below. This dataset can be used to construct the inclusion network and the co-author network of the 14 SRRs and 68 PSRs. A PSR is "included" in an SRR if it is considered in the SRR's evidence synthesis. Each included PSR is cited in the SRR, but not all references cited in an SRR are included in the evidence synthesis or PSRs. Based on which PSRs are included in which SRRs, we can construct the inclusion network. The inclusion network is a bipartite network with two types of nodes: one type represents SRRs, and the other represents PSRs. In an inclusion network, if an SRR includes a PSR, there is a directed edge from the SRR to the PSR. The attribute file (report_list.csv) includes attributes of the 82 reports, and the edge list file (inclusion_net_edges.csv) contains the edge list of the inclusion network. Notably, 11 PSRs have never been included in any SRR in the dataset. They are unused PSRs. If visualized with the inclusion network, they will appear as isolated nodes. We used a custom-made workflow (Fu, Y. (2022). Scopus author info tool (1.0.1) [Python]. https://github.com/infoqualitylab/Scopus_author_info_collection ) that uses the Scopus API and manual work to extract and disambiguate authorship information for the 82 reports. The author information file (salt_cont_author.csv) is the product of this workflow and can be used to compute the co-author network of the 82 reports. We also provide several other files in this dataset. We collected inclusion criteria (the criteria that make a PSR eligible to be included in an SRR) and recorded them in the file systematic_review_inclusion_criteria.csv. We provide a file (potential_inclusion_link.csv) recording whether a given PSR had been published as of the search date of a given SRR, which makes the PSR potentially eligible for inclusion in the SRR. We also provide a bibliography of the 82 publications (supplementary_reference_list.pdf). Lastly, we discovered minor discrepancies between the inclusion relationships identified by Trinquart et al. (2016) and by us. Therefore, we prepared an additional edge list (inclusion_net_edges_trinquart.csv) to preserve the inclusion relationships identified by Trinquart et al. (2016). UPDATES IN THIS VERSION COMPARED TO V1 (Fu, Yuanxi; Hsiao, Tzu-Kun; Joshi, Manasi Ballal (2022): The Salt Controversy Systematic Review Reports and Primary Study Reports Network Dataset. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-6128763_V1) - We added two new columns in salt_cont_author.csv, "author_id_scopus" and "author_id_mannual" to indicate which author ids were from Scopus and which were assigned by us. - We corrected a few mistakes in "last_search_year," "last_search_month," and "last_search_day" column in systematic_review_inclusion_criteria.csv. - We systematically adjusted the information related to report #12 in report_list.csv, systematic_review_inclusion_criteria.csv, supplementary_reference_list.pdf, salt_cont_author.csv, and inclusion_net_edges.csv to reflect information found in Adler 2014 (Adler, A. J., Taylor, F., Martin, N., Gottlieb, S., Taylor, R. S., & Ebrahim, S. (2014). Reduced dietary salt for the prevention of cardiovascular disease. Cochrane Database of Systematic Reviews, 12. https://doi.org/10.1002/14651858.CD009217.pub3). See our explaination in section "Explanations about report #12". - We sorted the salt_cont_author.csv file by "author_id," not by "ID" (the id of the report). 
    more » « less
  4. The salt controversy is the public health debate about whether a population-level salt reduction is beneficial. This dataset covers 82 publications--14 systematic review reports (SRRs) and 68 primary study reports (PSRs)--addressing the effect of sodium intake on cerebrocardiovascular disease or mortality. These present a snapshot of the status of the salt controversy as of September 2014 according to previous work by epidemiologists: The reports and their opinion classification (for, against, and inconclusive) were from Trinquart et al. (2016) (Trinquart, L., Johns, D. M., & Galea, S. (2016). Why do we think we know what we know? A metaknowledge analysis of the salt controversy. International Journal of Epidemiology, 45(1), 251–260. https://doi.org/10.1093/ije/dyv184 ), which collected 68 PSRs, 14 SRRs, 11 clinical guideline reports, and 176 comments, letters, or narrative reviews. Note that our dataset covers only the 68 PSRs and 14 SRRs from Trinquart et al. 2016, not the other types of publications, and it adds additional information noted below. This dataset can be used to construct the inclusion network and the co-author network of the 14 SRRs and 68 PSRs. A PSR is "included" in an SRR if it is considered in the SRR's evidence synthesis. Each included PSR is cited in the SRR, but not all references cited in an SRR are included in the evidence synthesis or PSRs. Based on which PSRs are included in which SRRs, we can construct the inclusion network. The inclusion network is a bipartite network with two types of nodes: one type represents SRRs, and the other represents PSRs. In an inclusion network, if an SRR includes a PSR, there is a directed edge from the SRR to the PSR. The attribute file (report_list.csv) includes attributes of the 82 reports, and the edge list file (inclusion_net_edges.csv) contains the edge list of the inclusion network. Notably, 11 PSRs have never been included in any SRR in the dataset. They are unused PSRs. If visualized with the inclusion network, they will appear as isolated nodes. We used a custom-made workflow (Fu, Y. (2022). Scopus author info tool (1.0.1) [Python]. https://github.com/infoqualitylab/Scopus_author_info_collection ) that uses the Scopus API and manual work to extract and disambiguate authorship information for the 82 reports. The author information file (salt_cont_author.csv) is the product of this workflow and can be used to compute the co-author network of the 82 reports. We also provide several other files in this dataset. We collected inclusion criteria (the criteria that make a PSR eligible to be included in an SRR) and recorded them in the file systematic_review_inclusion_criteria.csv. We provide a file (potential_inclusion_link.csv) recording whether a given PSR had been published as of the search date of a given SRR, which makes the PSR potentially eligible for inclusion in the SRR. We also provide a bibliography of the 82 publications (supplementary_reference_list.pdf). Lastly, we discovered minor discrepancies between the inclusion relationships identified by Trinquart et al. (2016) and by us. Therefore, we prepared an additional edge list (inclusion_net_edges_trinquart.csv) to preserve the inclusion relationships identified by Trinquart et al. (2016). UPDATES IN THIS VERSION COMPARED TO V2 (Fu, Yuanxi; Hsiao, Tzu-Kun; Joshi, Manasi Ballal (2022): The Salt Controversy Systematic Review Reports and Primary Study Reports Network Dataset. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-6128763_V2) - We added a new column "pub_date" to report_list.csv - We corrected mistakes in supplementary_reference_list.pdf for report #28 and report #80. The author of report #28 is not Salisbury D but Khaw, K.-T., & Barrett-Connor, E. Report #80 was mistakenly mixed up with report #81. 
    more » « less
  5. The salt controversy is the public health debate about whether a population-level salt reduction is beneficial. This dataset covers 82 publications--14 systematic review reports (SRRs) and 68 primary study reports (PSRs)--addressing the effect of sodium intake on cerebrocardiovascular disease or mortality. These present a snapshot of the status of the salt controversy as of September 2014 according to previous work by epidemiologists: The reports and their opinion classification (for, against, and inconclusive) were from Trinquart et al. (2016) (Trinquart, L., Johns, D. M., & Galea, S. (2016). Why do we think we know what we know? A metaknowledge analysis of the salt controversy. International Journal of Epidemiology, 45(1), 251–260. https://doi.org/10.1093/ije/dyv184), which collected 68 PSRs, 14 SRRs, 11 clinical guideline reports, and 176 comments, letters, or narrative reviews. Note that our dataset covers only the 68 PSRs and 14 SRRs from Trinquart et al. 2016, not the other types of publications, and it adds additional information noted below. This dataset can be used to construct the inclusion network and the co-author network of the 14 SRRs and 68 PSRs. A PSR is "included" in an SRR if it is considered in the SRR's evidence synthesis. Each included PSR is cited in the SRR, but not all references cited in an SRR are included in the evidence synthesis or PSRs. Based on which PSRs are included in which SRRs, we can construct the inclusion network. The inclusion network is a bipartite network with two types of nodes: one type represents SRRs, and the other represents PSRs. In an inclusion network, if an SRR includes a PSR, there is a directed edge from the SRR to the PSR. The attribute file (report_list.csv) includes attributes of the 82 reports, and the edge list file (inclusion_net_edges.csv) contains the edge list of the inclusion network. Notably, 11 PSRs have never been included in any SRR in the dataset. They are unused PSRs. If visualized with the inclusion network, they will appear as isolated nodes. We used a custom-made workflow (Fu, Y. (2022). Scopus author info tool (1.0.1) [Python]. https://github.com/infoqualitylab/Scopus_author_info_collection ) that uses the Scopus API and manual work to extract and disambiguate authorship information for the 82 reports. The author information file (salt_cont_author.csv) is the product of this workflow and can be used to compute the co-author network of the 82 reports. We also provide several other files in this dataset. We collected inclusion criteria (the criteria that make a PSR eligible to be included in an SRR) and recorded them in the file systematic_review_inclusion_criteria.csv. We provide a file (potential_inclusion_link.csv) recording whether a given PSR had been published as of the search date of a given SRR, which makes the PSR potentially eligible for inclusion in the SRR. We also provide a bibliography of the 82 publications (supplementary_reference_list.pdf). Lastly, we discovered minor discrepancies between the inclusion relationships identified by Trinquart et al. (2016) and by us. Therefore, we prepared an additional edge list (inclusion_net_edges_trinquart.csv) to preserve the inclusion relationships identified by Trinquart et al. (2016). 
    more » « less