NSF PAR Search | NSF Public Access Repository

Target Span Detection for Implicit Harmful Content

https://doi.org/10.1145/3664190.3672525

Jafari, Nazanin; Allan, James; Sarwar, Sheikh Muhammad (August 2024, ACM)

Identifying the targets of hate speech is a crucial step in grasping the nature of such speech and, ultimately, in improving the detection of offensive posts on online forums. Much harmful content on online platforms uses implicit language – especially when targeting vulnerable and protected groups – such as stereotypical characteristics in place of explicit target names, which makes the language harder to detect and mitigate. In this study, we focus on identifying implied targets of hate speech, which is essential for recognizing subtler hate speech and enhancing the detection of harmful content on digital platforms. We define a new task aimed at identifying the targets even when they are not explicitly stated. To address that task, we collect and annotate target spans in three prominent implicit hate speech datasets: SBIC, DynaHate, and IHC. We call the resulting merged collection Implicit-Target-Span. We build the collection using an innovative pooling method with matching scores based on human annotations and Large Language Models (LLMs). Our experiments indicate that Implicit-Target-Span provides a challenging test bed for target span detection methods.
Full Text Available
Utility of Missing Concepts in Query-biased Summarization

https://doi.org/10.1145/3404835.3463121

Sarwar, Sheikh Muhammad; Moraes, Felipe; Jiang, Jiepu; Allan, James (July 2021, Proceedings of The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 21))
Query Biased Summarization (QBS) aims to produce a summary of the documents retrieved against a query to reduce the human effort of inspecting the full-text content of a document. Typical summarization approaches extract a document text snippet that has term overlap with the query and show that to a searcher. While snippets show relevant information in a document, to the best of our knowledge, there does not exist a summarization system that shows which relevant concepts are missing from a document. Our study focuses on reducing user effort in finding relevant documents by showing users the relevant information that a document omits. To this end, we use a classical approach, DSPApprox, to find terms or phrases relevant to a query. Then we identify which terms or phrases are missing in a document, present them in a search interface, and ask crowd workers to judge document relevance based on snippets and missing information. Experimental results show both benefits and limitations of this approach.
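As a rough illustration of the "identify which terms or phrases are missing in a document" step, the sketch below checks a list of query-relevant terms against a document's text. It assumes the relevant terms have already been produced by some upstream method (the abstract names DSPApprox, which is not implemented here); the function name `missing_terms` and the simple token-based matching are hypothetical choices for this example, not the authors' implementation.

```python
import re


def missing_terms(relevant_terms, document_text):
    """Return the relevant terms or phrases that do not occur in the document.

    Matching is a simplifying assumption: case-insensitive, whole-token,
    exact phrase match. The paper may use a different matching strategy.
    """
    # Normalize the document into a space-padded string of lowercase tokens
    tokens = re.findall(r"[a-z0-9]+", document_text.lower())
    normalized_doc = " " + " ".join(tokens) + " "

    missing = []
    for term in relevant_terms:
        # Normalize the term the same way so phrases match token by token
        term_tokens = re.findall(r"[a-z0-9]+", term.lower())
        needle = " " + " ".join(term_tokens) + " "
        if needle not in normalized_doc:
            missing.append(term)
    return missing


# Hypothetical example data, not taken from the study
terms = ["solar panel", "battery storage", "inverter"]
doc = "A solar panel feeds the inverter."
print(missing_terms(terms, doc))
```

A search interface could then display the returned terms next to the snippet, so a searcher sees both what the document covers and what it leaves out.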
Full Text Available
SearchIE: A Retrieval Approach for Information Extraction

https://doi.org/10.1145/3341981.3344248

Sarwar, Sheikh Muhammad; Allan, James (September 2019, Proceedings of the International Conference on the Theory of Information Retrieval (ICTIR '19))

We address the problem of entity extraction from very few examples using an information retrieval approach. Existing extraction approaches consider millions of features extracted from a large number of training data cases. Generally, these data cases are generated by a distant supervision approach with entities in a knowledge base. A model is then learned and entities are extracted. However, with extremely limited data, a ranked list of relevant entities can help obtain user feedback and thereby more training data. As Information Retrieval (IR) is a natural choice for ranked list generation, we explore its effectiveness in such a limited data case. To this end, we propose SearchIE, a hybrid IR and NLP approach that indexes documents represented using handcrafted NLP features. At query time, SearchIE samples terms from a Logistic Regression model trained with extremely limited data. We show that SearchIE outperforms state-of-the-art NLP models at finding civilians killed by US police officers with even a single civilian name as an example.
Full Text Available
Sentence Retrieval for Entity List Extraction with a Seed, Context, and Topic

https://doi.org/10.1145/3341981.3344250

Sarwar, Sheikh Muhammad; Foley, John; Yang, Liu; Allan, James (September 2019, Proceedings of the 2019 International Conference on the Theory of Information Retrieval (ICTIR 2019))

We present a variation of the corpus-based entity set expansion and entity list completion task. A user-specified query and a sentence containing one seed entity are the input to the task. The output is a list of sentences that contain other instances of the entity class indicated by the input. We construct a semantic query expansion model that leverages topical context around the seed entity and scores sentences. The proposed model finds 46% of the target entity class by retrieving 20 sentences on average. It achieves a 16% improvement over the BM25 model in terms of recall@20.
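For readers unfamiliar with the recall@20 metric mentioned above, the following minimal sketch computes the fraction of target entities that appear in the top-k retrieved sentences. The function name `recall_at_k`, the substring-based entity matching, and the example data are assumptions made for illustration; the paper's exact evaluation procedure may differ.

```python
def recall_at_k(ranked_sentences, target_entities, k=20):
    """Fraction of target entities mentioned in the top-k ranked sentences.

    Assumption: an entity counts as found if its name occurs as a
    case-insensitive substring of any top-k sentence.
    """
    if not target_entities:
        return 0.0
    top_k = [s.lower() for s in ranked_sentences[:k]]
    found = {e for e in target_entities if any(e.lower() in s for s in top_k)}
    return len(found) / len(target_entities)


# Hypothetical example data
ranked = [
    "Paris is the capital of France.",
    "Berlin hosts many museums.",
    "Rome is an ancient city.",
]
targets = {"Paris", "Berlin", "Rome", "Madrid"}
print(recall_at_k(ranked, targets, k=2))  # 2 of 4 targets found -> 0.5
```

Under this definition, a reported figure of 46% at k = 20 would mean that, on average, just under half of the target entities surface somewhere in the first 20 sentences.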
Full Text Available
SQuID: Semantic Similarity-Aware Query Intent Discovery

https://doi.org/10.1145/3183713.3193548

Fariha, Anna; Sarwar, Sheikh Muhammad; Meliou, Alexandra (July 2018, Proceedings of the 2018 International Conference on Management of Data)

Full Text Available
