skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Search for: All records

Creators/Authors contains: "Satadisha Saha Bhowmick"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. An important task for Information Extraction from Microblogs is Named Entity Recognition (NER) that extracts mentions of real-world entities from microblog messages and meta-information like entity type for better entity characterization. A lot of microblog NER systems have rightly sought to prioritize modeling the non-literary nature of microblog text. These systems are trained on offline static datasets and extract a combination of surface-level features – orthographic, lexical, and semantic – from individual messages for noisy text modeling and entity extraction. But given the constantly evolving nature of microblog streams, detecting all entity mentions from such varying yet limited context in short messages remains a difficult problem to generalize. In this paper, we propose the NER Globalizer pipeline better suited for NER on microblog streams. It characterizes the isolated message processing by existing NER systems as modeling local contextual embeddings, where learned knowledge from the immediate context of a message is used to suggest seed entity candidates. Additionally, it recognizes that messages within a microblog stream are topically related and often repeat mentions of the same entity. This suggests building NER systems that go beyond localized processing. By leveraging occurrence mining, the proposed system therefore follows up traditional NER modeling by extracting additional mentions of seed entity candidates that were previously missed. Candidate mentions are separated into well-defined clusters which are then used to generate a pooled global embedding drawn from the collective context of the candidate within a stream. The global embeddings are utilized to separate false positives from entities whose mentions are produced in the final NER output. Our experiments show that the proposed NER system exhibits superior effectiveness on multiple NER datasets with an average Macro F1 improvement of 47.04% over the best NER baseline while adding only a small computational overhead. 
    more » « less