This content will become publicly available on January 1, 2023

Title: Supercalifragilisticexpialidocious: Why Using the “Right” Readability Formula in Children’s Web Search Matters
Readability is a core component of information retrieval (IR) tools, as the complexity of a resource directly affects its relevance: a resource is only useful if the user can comprehend it. Even so, the link between readability and IR is often overlooked. As a step toward understanding how readability influences IR, we focus on Web search for children. We explore how traditional formulas, which are simple, efficient, and portable, fare when estimating the readability of English-language Web resources for children. We then present a formula well suited to estimating the readability of child-friendly Web resources. Lastly, we empirically show that readability can sway children's information access. Outcomes from this work reveal that: (i) for Web resources targeting children, a simple formula suffices as long as it accounts for contemporary terminology and audience requirements, and (ii) rather than defaulting to Flesch-Kincaid, a popular formula, using the “right” formula can shape Web search tools to best serve children. The work we present herein builds on three pillars: Audience, Application, and Expertise. It serves as a blueprint for identifying the readability estimation methods that best apply to, and inform, IR applications serving varied audiences.
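To make the comparison concrete, here is a minimal Python sketch of two traditional formulas: the Flesch-Kincaid grade level, which the abstract names, and the revised Spache formula, a traditional formula aimed at primary-grade text that relies on a list of familiar words. Spache is our choice of illustration; the abstract does not name it, and the formula the paper proposes is not reproduced here. The sentence splitter and syllable counter are crude heuristics, and `FAMILIAR_WORDS` is a hypothetical, tiny stand-in for a real familiar-word list.

```python
import re

# Hypothetical stand-in for a familiar-word list. The real Spache list is far
# larger; a formula that "considers contemporary terminology" would update it.
FAMILIAR_WORDS = {
    "the", "a", "cat", "sat", "on", "mat", "dog", "ran", "to", "it",
    "is", "big", "and", "was", "red",
}


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break on runs of ., !, or ? and drop empty pieces.
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def split_words(text: str) -> list[str]:
    # Lowercased tokens made of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def count_syllables(word: str) -> int:
    # Heuristic: count vowel groups, drop a silent final "e" (but not "-le"),
    # and never return fewer than one syllable.
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def flesch_kincaid_grade(text: str) -> float:
    # FKGL = 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    # Assumes the text contains at least one sentence and one word.
    sentences = split_sentences(text)
    words = split_words(text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


def spache_grade(text: str, familiar: set[str] = FAMILIAR_WORDS) -> float:
    # Revised Spache: 0.121 * (words / sentences) + 0.082 * (% unfamiliar words) + 0.659
    # In this sketch each distinct unfamiliar word is counted once.
    sentences = split_sentences(text)
    words = split_words(text)
    unfamiliar = {w for w in words if w not in familiar}
    pct_unfamiliar = 100 * len(unfamiliar) / len(words)
    return 0.121 * len(words) / len(sentences) + 0.082 * pct_unfamiliar + 0.659
```

For the two sentences "The cat sat on the mat. The dog ran to it.", this sketch gives a Spache grade of about 1.32 and a Flesch-Kincaid grade of -1.645; the negative value illustrates how Flesch-Kincaid can leave a meaningful range for very simple text. Swapping in an updated word list changes only `familiar`, which is one way a simple formula could "consider contemporary terminology".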
Hagen, Matthias and
Journal Name: 44th European Conference on Information Retrieval (ECIR)
Sponsoring Org: National Science Foundation
More Like this
  1. Bicycle design has not changed in a long time, as bicycles are well crafted for those who possess the skills to ride, i.e., adults. Those learning to ride, however, often need additional support in the form of training wheels. Searching for information on the Web is much like riding a bicycle: modern search engines (the bicycle) are optimized for general use and adult users but lack the functionality to support non-traditional audiences and environments. In this thesis, we introduce a set of training wheels in the form of a learning to rank model that augments standard search engines to support classroom search activities for children (ages 6–11). This new model extends the known listwise learning to rank framework by balancing risk and reward. Doing so enables the model to prioritize Web resources with high educational alignment, appropriateness, and adequate readability by analyzing the URLs, snippets, and page titles of Web resources retrieved by a given mainstream search engine. Experiments, including an ablation study and comparisons with existing baselines, showcase the correctness of the proposed model. Outcomes of this work demonstrate the value of considering multiple perspectives inherent to the classroom setting, e.g., educational alignment, readability, and objectionability, when designing algorithms that can better support children's information discovery.
  2. Given how widespread natural language interfaces have become, it is increasingly important to understand who is accessing those interfaces and how they are being used. In this paper, we explore spellchecking in the context of web search with children as the target audience. In particular, via a literature review we show that, while widely used, popular search tools are ill-designed for children. We then use spellcheckers as a case study to highlight the need for an interdisciplinary approach that brings together natural language processing, education, and human-computer interaction to address a known information retrieval problem: query misspelling. We conclude that it is imperative that those for whom the interfaces are designed have a voice in the design process.
  3. Children use popular web search tools, which are generally designed for adult users. Because children have different developmental needs than adults, these tools may not always adequately support their search for information. Moreover, even though search tools offer support to help with query formulation, this support is also aimed at adults and may hinder children rather than help them. This calls for the examination of existing technologies in this area, to better understand what remains to be done to facilitate query-formulation tasks for young users. In this paper, we investigate interaction elements of query formulation, including query suggestion algorithms, for children. The primary goals of our research efforts are to: (i) examine existing plug-ins and interfaces that explicitly aid children's query formulation; (ii) investigate children's interactions with suggestions offered by a general-purpose query suggestion strategy vs. a counterpart designed with children in mind; and (iii) identify, via participatory design sessions, children's preferences regarding tools and strategies that can help them find information and guide them through the query formulation process. Our analysis shows that existing tools do not meet children's needs and expectations; the outcomes of our work can guide researchers and developers as they implement query formulation strategies for children.
  4. Precipitation measurements with high spatiotemporal resolution are a vital input for hydrometeorological and water resources studies; decision-making in disaster management; and weather, climate, and hydrological forecasting. Moreover, real-time precipitation estimation with high precision is pivotal for monitoring and managing catastrophic hydroclimate disasters such as flash floods, which frequently occur after extreme rainfall. While algorithms that exclusively use satellite infrared (IR) data as input are attractive owing to their rich spatiotemporal resolution and near-instantaneous availability, their sole reliance on cloud-top brightness temperature (Tb) readings causes underestimates in wet regions and overestimates in dry regions; this is especially evident over the western contiguous United States (CONUS). We introduce an algorithm, the Precipitation Estimations from Remotely Sensed Information Using Artificial Neural Networks (PERSIANN) Dynamic Infrared–Rain rate model (PDIR), which uses climatological data to construct a dynamic (i.e., laterally shifting) Tb–rain rate relationship that has several notable advantages over other quantitative precipitation-estimation algorithms and noteworthy skill over the western CONUS. Validation of PDIR over the western CONUS shows a promising degree of skill, notably at the annual scale, where it performs well in comparison to other satellite-based products. Analysis of two extreme landfalling atmospheric rivers shows that the solely IR-based PDIR performs reasonably well compared to other IR- and passive microwave (PMW)-based satellite rainfall products, indicating its potential for effective real-time monitoring of extreme storms. This research suggests that IR-based algorithms, which offer the spatiotemporal richness and near-instantaneous availability needed for rapid natural hazards response, may soon achieve the skill needed for hydrologic and water resource applications.
  5. The design of cyber-physical systems (CPSs) requires methods and tools that can efficiently reason about the interaction between discrete models, e.g., representing the behaviors of “cyber” components, and continuous models of physical processes. Boolean methods such as satisfiability (SAT) solving have been successful in tackling large combinatorial search problems for the design and verification of hardware and software components. On the other hand, problems in control, communications, signal processing, and machine learning often rely on convex programming as a powerful solution engine. However, despite their strengths, neither approach alone suffices for CPSs. In this paper, we present a new satisfiability modulo convex programming (SMC) framework that integrates SAT solving and convex optimization to efficiently reason about Boolean and convex constraints at the same time. We exploit the properties of a class of logic formulas over Boolean and nonlinear real predicates, termed monotone satisfiability modulo convex formulas, whose satisfiability can be checked via a finite number of convex programs. Following the lazy satisfiability modulo theories (SMT) paradigm, we develop a new decision procedure for monotone SMC formulas, which coordinates SAT solving and convex programming to provide a satisfying assignment or determine that the formula is unsatisfiable. A key step in our coordination scheme is the efficient generation of succinct infeasibility proofs for inconsistent constraints that can support conflict-driven learning and accelerate the search. We demonstrate our approach on different CPS design problems, including spacecraft docking mission control, robotic motion planning, and secure state estimation. We show that SMC can handle more complex problem instances than state-of-the-art alternative techniques based on SMT solving and mixed integer convex programming.
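Item 1 above extends “the known listwise learning to rank framework”. As a hedged illustration of what a listwise objective looks like, the sketch below implements the standard ListNet top-one loss: the cross-entropy between the softmax of the ground-truth relevance labels and the softmax of the model's scores for one query's result list. Whether the thesis builds on ListNet specifically is an assumption, and its risk-and-reward extension is not reproduced. In practice the scores would come from a model over features of URLs, snippets, and page titles; here they are plain numbers.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def listnet_loss(scores: list[float], relevance: list[float]) -> float:
    # ListNet top-one loss for a single query: cross-entropy between the
    # top-one probabilities implied by the labels and those implied by the scores.
    p_true = softmax(relevance)
    p_model = softmax(scores)
    return -sum(t * math.log(p) for t, p in zip(p_true, p_model))
```

The loss is smallest when the model's score distribution matches the label distribution, so a ranking that agrees with the labels (scores `[3, 2, 1]` for labels `[2, 1, 0]`) is penalized less than the reversed ranking. Because the loss is computed over the whole list rather than over individual documents or pairs, it is a listwise objective.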
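Item 5 above describes a lazy loop in which a SAT solver proposes Boolean assignments, a convex solver checks them, and succinct infeasibility proofs are fed back as learned clauses. The toy sketch below mimics that loop under heavy simplifying assumptions and is not the paper's SMC solver: a brute-force search stands in for a real SAT solver, and each Boolean variable, when true, activates a one-dimensional interval constraint `lo <= x <= hi` (a trivially convex constraint) rather than a general convex program. A false variable imposes nothing, in the spirit of convex constraints appearing only positively. For intervals, an infeasibility certificate is just the pair of active constraints with the largest lower bound and the smallest upper bound. All names below are hypothetical.

```python
from itertools import product


def check_intervals(active, bounds):
    """Return (x, None) if the active intervals intersect, else (None, (i, j))
    where intervals i and j alone are already disjoint (the conflict)."""
    if not active:
        return 0.0, None  # no active constraint: any x works
    i_lo = max(active, key=lambda i: bounds[i][0])  # tightest lower bound
    i_hi = min(active, key=lambda i: bounds[i][1])  # tightest upper bound
    lo, hi = bounds[i_lo][0], bounds[i_hi][1]
    if lo <= hi:
        return lo, None
    return None, (i_lo, i_hi)


def brute_force_sat(n, clauses):
    """Return the first assignment satisfying every clause, or None.
    Clauses are lists of signed 1-based literals, e.g. [1, -3] means b1 or not b3."""
    for bits in product([False, True], repeat=n):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in c) for c in clauses):
            return list(bits)
    return None


def lazy_smc(n, clauses, bounds):
    """Return (boolean_model, x) for a consistent assignment, or None if unsatisfiable."""
    clauses = [list(c) for c in clauses]
    while True:
        model = brute_force_sat(n, clauses)
        if model is None:
            return None
        active = [i for i in range(n) if model[i]]
        x, conflict = check_intervals(active, bounds)
        if conflict is None:
            return model, x
        # Learn a blocking clause: the two conflicting constraints cannot both hold.
        i, j = conflict
        clauses.append([-(i + 1), -(j + 1)])
```

For example, with clauses `[[1, 2], [3]]` and intervals `[(0, 1), (5, 6), (0, 3)]`, the first candidate activates the disjoint intervals of variables 2 and 3, the loop learns that they cannot both be true, and it then returns `([True, False, True], 0)`. Each learned clause rules out the current assignment, so the loop terminates; a practical solver would use an incremental SAT solver and a real convex optimizer instead.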