skip to main content

Search for: All records

Creators/Authors contains: "Betke, Margrit"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Media framing refers to highlighting certain aspect of an issue in the news to promote a particular interpretation to the audience. Supervised learning has often been used to recognize frames in news articles, requiring a known pool of frames for a particular issue, which must be identified by communication researchers through thorough manual content analysis. In this work, we devise an unsupervised learning approach to discover the frames in news articles automatically. Given a set of news articles for a given issue, e.g., gun violence, our method first extracts frame elements from these articles using related Wikipedia articles and the Wikipedia category system. It then uses a community detection approach to identify frames from these frame elements. We discuss the effectiveness of our approach by comparing the frames it generates in an unsupervised manner to the domain-expert-derived frames for the issue of gun violence, for which a supervised learning model for frame recognition exists.
    Free, publicly-accessible full text available June 1, 2023
  2. Text information in scanned documents becomes accessible only when extracted and interpreted by a text recognizer. For a recognizer to work successfully, it must have detailed location information about the regions of the document images that it is asked to analyse. It will need focus on page regions with text skipping non-text regions that include illustrations or photographs. However, text recognizers do not work as logical analyzers. Logical layout analysis automatically determines the function of a document text region, that is, it labels each region as a title, paragraph, or caption, and so on, and thus is an essential part of a document understanding system. In the past, rule-based algorithms have been used to conduct logical layout analysis, using limited size data sets. We here instead focus on supervised learning methods for logical layout analysis. We describe LABA, a system based on multiple support vector machines to perform logical Layout Analysis of scanned Books pages in Arabic. The system detects the function of a text region based on the analysis of various images features and a voting mechanism. For a baseline comparison, we implemented an older but state-of-the-art neural network method. We evaluated LABA using a data set of scannedmore »pages from illustrated Arabic books and obtained high recall and precision values. We also found that the F-measure of LABA is higher for five of the tested six classes compared to the state-of-the-art method.« less
    Free, publicly-accessible full text available April 1, 2023
  3. We propose a five-step computational framing analysis framework that researchers can use to analyze multilingual news data. The framework combines unsupervised and supervised machine learning and leverages a state-of-the-art multilingual deep learning model, which can significantly enhance frame prediction performance while requiring a considerably small sample of manual annotations. Most importantly, anyone can perform the proposed computational framing analysis using a free, open-sourced system, created by a team of communication scholars, computer scientists, web designers and web developers. Making advanced computational analysis available to researchers without a programming background to some degree bridges the digital divide within the communication research discipline in particular and the academic community in general.
    Free, publicly-accessible full text available February 15, 2023
  4. Datasets of documents in Arabic are urgently needed to promote computer vision and natural language processing research that addresses the specifics of the language. Unfortunately, publicly available Arabic datasets are limited in size and restricted to certain document domains. This paper presents the release of BE-Arabic-9K, a dataset of more than 9000 high-quality scanned images from over 700 Arabic books. Among these, 1500 images have been manually segmented into regions and labeled by their functionality. BE-Arabic-9K includes book pages with a wide variety of complex layouts and page contents, making it suitable for various document layout analysis and text recognition research tasks. The paper also presents a page layout segmentation and text extraction baseline model based on fine-tuned Faster R-CNN structure (FFRA). This baseline model yields cross-validation results with an average accuracy of 99.4% and F1 score of 99.1% for text versus non-text block classification on 1500 annotated images of BE-Arabic-9K. These results are remarkably better than those of the state-of-the-art Arabic book page segmentation system ECDP. FFRA also outperforms three other prior systems when tested on a competition benchmark dataset, making it an outstanding baseline model to challenge.
  5. Unsupervised domain adaptation for semantic segmentation has been intensively studied due to the low cost of the pixel-level annotation for synthetic data. The most common approaches try to generate images or features mimicking the distribution in the target domain while preserving the semantic contents in the source domain so that a model can be trained with annotations from the latter. However, such methods highly rely on an image translator or feature extractor trained in an elaborated mechanism including adversarial training, which brings in extra complexity and instability in the adaptation process. Furthermore, these methods mainly focus on taking advantage of the labeled source dataset, leaving the unlabeled target dataset not fully utilized. In this paper, we propose a bidirectional style-induced domain adaptation method, called BiSIDA, that employs consistency regularization to efficiently exploit information from the unlabeled target domain dataset, requiring only a simple neural style transfer model. BiSIDA aligns domains by not only transferring source images into the style of target images but also transferring target images into the style of source images to perform high-dimensional perturbation on the unlabeled target images, which is crucial to the success in applying consistency regularization in segmentation tasks. Extensive experiments show that ourmore »BiSIDA achieves new state-of-the-art on two commonly-used synthetic-to-real domain adaptation benchmarks: GTA5-to-CityScapes and SYNTHIA-to-CityScapes. Code and pretrained style transfer model are available at: https://github.com/wangkaihong/BiSIDA.« less
  6. In this work, we propose a video-based transfer learning approach for predicting problem outcomes of students working with an intelligent tutoring system (ITS). By analyzing a student's face and gestures, our method predicts the outcome of a student answering a problem in an ITS from a video feed. Our work is motivated by the reasoning that the ability to predict such outcomes enables tutoring systems to adjust interventions, such as hints and encouragement, and to ultimately yield improved student learning. We collected a large labeled dataset of student interactions with an intelligent online math tutor consisting of 68 sessions, where 54 individual students solved 2,749 problems. We will release this dataset publicly upon publication of this paper. It will be available at https://www.cs.bu.edu/faculty/betke/research/learning/. Working with this dataset, our transfer-learning challenge was to design a representation in the source domain of pictures obtained “in the wild” for the task of facial expression analysis, and transferring this learned representation to the task of human behavior prediction in the domain of webcam videos of students in a classroom environment. We developed a novel facial affect representation and a user-personalized training scheme that unlocks the potential of this representation. We designed several variants ofmore »a recurrent neural network that models the temporal structure of video sequences of students solving math problems. Our final model, named ATL-BP for Affect Transfer Learning for Behavior Prediction, achieves a relative increase in mean F -score of 50 % over the state-of-the-art method on this new dataset.« less
    Free, publicly-accessible full text available December 15, 2022
  7. A major challenge for online learning is the inability of systems to support student emotion and to maintain student engagement. In response to this challenge, computer vision has become an embedded feature in some instructional applications. In this paper, we propose a video dataset of college students solving math problems on the educational platform MathSpring.org with a front facing camera collecting visual feedback of student gestures. The video dataset is annotated to indicate whether students’ attention at specific frames is engaged or wandering. In addition, we train baselines for a computer vision module that determines the extent of student engagement during remote learning. Baselines include state-of-the-art deep learning image classifiers and traditional conditional and logistic regression for head pose estimation. We then incorporate a gaze baseline into the MathSpring learning platform, and we are evaluating its performance with the currently implemented approach.