skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Government websites as data: a methodological pipeline with application to the websites of municipalities in the United States
Award ID(s):
1637089 2028675
PAR ID:
10313585
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Journal of Information Technology & Politics
ISSN:
1933-1681
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    In the United States, every state has a tourism website. These sites highlight the main attractions of the state, travel tips, and blog posts among other relevant information. The funding for these websites often comes from occupancy taxes, a form of taxes that comes from tourists who stay in hotels and visit attractions. Therefore, current and past tourists fund the efforts to draw future tourists into the state. Since state tourism is funded by the success of past tourism efforts, it is important for researchers to spend their time and resources on finding out what efforts were successful and which weren’t. With this comes the importance of seeing trends in past tourism endeavors. By examining past tourism websites, patterns can be drawn about information that changed, from season to season and year to year. These patterns can be used to see what researchers deemed as successful tourism efforts, and help guide future state tourism decisions. Our client, Dr. Florian Zach of the Howard Feiertag Department of Hospitality and Tourism Management, wants to use this historical analysis on state tourism information to help with his research on trends in state tourism website content. Iterations of the California state tourism website, among other sites, are stored as snapshots on the Internet Archive and can be accessed to see changes in websites over time. Our team was given Parquet files of these snapshots dating back to 2008. The goal of the project was to assist Dr. Zach by using the California state tourism website, visitcalifornia.com, and these snapshots as an avenue to explore data extraction and visualization techniques on tourism patterns to later be expanded to other states’ tourism websites. Python’s Pandas library was utilized to examine and extract relevant pieces of data from the given Parquet files. Once the data was extracted, we used Python’s Natural Language Processing Toolkit to remove non-English words, punctuation, and a set of unimportant “stop words”. With this refined data, we were able to make visualizations regarding the frequency of words in the headers and body of the website snapshots. The data was examined in its entirety as well as in groups of seasons and years. Microsoft Excel functions were utilized to examine and visualize the data in these formats. These data extraction and visualization techniques that we became familiar with will be passed down to a future team. The research on state tourism site information can be expanded to different metadata sets and to other states. 
    more » « less
  2. Misinformation poses a threat to democracy and to people’s health. Reliability criteria for news websites can help people identify misinformation. But despite their importance, there has been no empirically substantiated list of criteria for distinguishing reliable from unreliable news websites. We identify reliability criteria, describe how they are applied in practice, and compare them to prior work. Based on our analysis, we distinguish between manipulable and less manipulable criteria and compare politically diverse laypeople as end-users and journalists as expert users. We discuss 11 widely recognized criteria, including the following 6 criteria that are difficult to manipulate: content, political alignment, authors, professional standards, what sources are used, and a website’s reputation. Finally, we describe how technology may be able to support people in applying these criteria in practice to assess the reliability of websites. 
    more » « less