The increasing popularity of adblockers has prompted online publishers to retaliate against adblock users by deploying anti-adblock scripts, which detect adblock users and bar them from accessing content unless they disable their adblocker. To circumvent antiadblockers, adblockers rely on manually curated anti-adblock filter lists for removing anti-adblock scripts. Anti-adblock filter lists currently rely on informal crowdsourced feedback from users to add/remove filter list rules. In this paper, we present the first comprehensive study of anti-adblock filter lists to analyze their effectiveness against anti-adblockers. Specifically, we compare and contrast the evolution of two popular anti-adblock filter lists. We show that these filter lists are implemented very differently even though they currently have a comparable number of filter list rules. We then use the Internet Archive’s Wayback Machine to conduct a retrospective coverage analysis of these filter lists on Alexa top-5K websites over the span of last five years. We find that the coverage of these filter lists has considerably improved since 2014 and they detect anti-adblockers on about 9% of Alexa top-5K websites. To improve filter list coverage and speedup addition of new filter rules, we also design and implement a machine learning based method to automatically
more »
« less
Top-k List Aggregation: Mathematical Formulations and Polyhedral Comparisons
Top-k lists are being increasingly utilized in various fields and applications including information retrieval, machine learning, and recommendation systems. Since multiple top-k lists may be generated by different algorithms to evaluate the same set of entities or system of interest, there is often a need to consolidate this collection of heterogeneous top-k lists to obtain a more robust and coherent list. This work introduces various exact mathematical formulations of the top-k list aggregation problem under the generalized Kendall tau distance. Furthermore, the strength of the proposed formulations is analyzed from a polyhedral point of view.
more »
« less
- Award ID(s):
- 1850355
- PAR ID:
- 10336860
- Date Published:
- Journal Name:
- International Symposium on Combinatorial Optimization
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Relevance feedback techniques assume that users provide relevance judgments for the top k (usually 10) documents and then re-rank using a new query model based on those judgments. Even though this is effective, there has been little research recently on this topic because requiring users to provide substantial feedback on a result list is impractical in a typical web search scenario. In new environments such as voice-based search with smart home devices, however, feedback about result quality can potentially be obtained during users' interactions with the system. Since there are severe limitations on the length and number of results that can be presented in a single interaction in this environment, the focus should move from browsing result lists to iterative retrieval and from retrieving documents to retrieving answers. In this paper, we study iterative relevance feedback techniques with a focus on retrieving answer passages. We first show that iterative feedback can be at least as effective as the top-k approach on standard TREC collections, and more effective on answer passage collections. We then propose an iterative feedback model for answer passages based on semantic similarity at passage level and show that it can produce significant improvements compared to both word-based iterative feedback models and those based on term-level semantic similarity.more » « less
-
Abstract Recently, Dvořák, Norin, and Postle introduced flexibility as an extension of list coloring on graphs (J Graph Theory 92(3):191–206, 2019, https://doi.org/10.1002/jgt.22447 ). In this new setting, each vertex v in some subset of V ( G ) has a request for a certain color r ( v ) in its list of colors L ( v ). The goal is to find an L coloring satisfying many, but not necessarily all, of the requests. The main studied question is whether there exists a universal constant $$\varepsilon >0$$ ε > 0 such that any graph G in some graph class $$\mathscr {C}$$ C satisfies at least $$\varepsilon$$ ε proportion of the requests. More formally, for $$k > 0$$ k > 0 the goal is to prove that for any graph $$G \in \mathscr {C}$$ G ∈ C on vertex set V , with any list assignment L of size k for each vertex, and for every $$R \subseteq V$$ R ⊆ V and a request vector $$(r(v): v\in R, ~r(v) \in L(v))$$ ( r ( v ) : v ∈ R , r ( v ) ∈ L ( v ) ) , there exists an L -coloring of G satisfying at least $$\varepsilon |R|$$ ε | R | requests. If this is true, then $$\mathscr {C}$$ C is called $$\varepsilon$$ ε - flexible for lists of size k . Choi, Clemen, Ferrara, Horn, Ma, and Masařík (Discrete Appl Math 306:20–132, 2022, https://doi.org/10.1016/j.dam.2021.09.021 ) introduced the notion of weak flexibility , where $$R = V$$ R = V . We further develop this direction by introducing a tool to handle weak flexibility. We demonstrate this new tool by showing that for every positive integer b there exists $$\varepsilon (b)>0$$ ε ( b ) > 0 so that the class of planar graphs without $$K_4, C_5 , C_6 , C_7, B_b$$ K 4 , C 5 , C 6 , C 7 , B b is weakly $$\varepsilon (b)$$ ε ( b ) -flexible for lists of size 4 (here $$K_n$$ K n , $$C_n$$ C n and $$B_n$$ B n are the complete graph, a cycle, and a book on n vertices, respectively). We also show that the class of planar graphs without $$K_4, C_5 , C_6 , C_7, B_5$$ K 4 , C 5 , C 6 , C 7 , B 5 is $$\varepsilon$$ ε -flexible for lists of size 4. The results are tight as these graph classes are not even 3-colorable.more » « less
-
The skip list is an elegant dictionary data structure that is com- monly deployed in RAM. A skip list with N elements supports searches, inserts, and deletes in O(logN) operations with high probability (w.h.p.) and range queries returning K elements in O(log N + K) operations w.h.p. A seemingly natural way to generalize the skip list to external memory with block size B is to “promote” with probability 1/B, rather than 1/2. However, there are practical and theoretical obsta- cles to getting the skip list to retain its efficient performance, space bounds, and high-probability guarantees. We give an external-memory skip list that achieves write- optimized bounds. That is, for 0 < ε < 1, range queries take O(logBε N + K/B) I/Os w.h.p. and insertions and deletions take O((logBε N)/B1−ε) amortized I/Os w.h.p. Our write-optimized skip list inherits the virtue of simplicity from RAM skip lists. Moreover, it matches or beats the asymptotic bounds of prior write-optimized data structures such as Bε trees or LSM trees. These data structures are deployed in high-performance databases and file systems.more » « less
-
Abstract We introduce theReverseSpatial Top-kKeyword (RSK)query, which is defined as:given a query term q, an integer k and a neighborhood size find all the neighborhoods of that size where q is in the top-k most frequent terms among the social posts in those neighborhoods. An obvious approach would be to partition the dataset with a uniform grid structure of a given cell size and identify the cells where this term is in the top-k most frequent keywords. However, this answer would be incomplete since it only checks for neighborhoods that are perfectly aligned with the grid. Furthermore, for every neighborhood (square) that is an answer, we can define infinitely more result neighborhoods by minimally shifting the square without including more posts in it. To address that, we need to identify contiguous regions where any point in the region can be the center of a neighborhood that satisfies the query. We propose an algorithm to efficiently answer an RSK query using an index structure consisting of a uniform grid augmented by materialized lists of term frequencies. We apply various optimizations that drastically improve query latency against baseline approaches. We also provide a theoretical model to choose the optimal cell size for the index to minimize query latency. We further examine a restricted version of the problem (RSKR) that limits the scope of the answer and propose efficientapproximatealgorithms. Finally, we examine how parallelism can improve performance by balancing the workload using a smartload slicingtechnique. Extensive experimental performance evaluation of the proposed methods using real Twitter datasets and crime report datasets, shows the efficiency of our optimizations and the accuracy of the proposed theoretical model.more » « less
An official website of the United States government

