NSF PAR Search | NSF Public Access Repository


Evolution and differentiation of the cybersecurity communities in three social question and answer sites: A mixed-methods analysis

https://doi.org/10.1371/journal.pone.0261954

Wu, Muting; Aranovich, Raul; Filkov, Vladimir (December 2021, PLOS ONE)
Haldorai, Anandakumar (Ed.)
Cybersecurity affects us all in our daily lives. New knowledge on best practices, new vulnerabilities, and timely fixes for cybersecurity issues is growing super-linearly, and is spread across numerous, heterogeneous sources. Because of that, community contribution-based question and answer sites have become clearinghouses for cybersecurity-related inquiries, as they have for many other topics. Historically, Stack Overflow has been the most popular platform for different kinds of technical questions, including cybersecurity questions. That has been changing, however, with the advent of Security Stack Exchange, a site specifically designed for cybersecurity-related questions and answers. More recently, some cybersecurity-related subreddits have become hubs for cybersecurity-related questions and discussions. The availability of multiple overlapping communities has created a complex terrain to navigate for someone looking for an answer to a cybersecurity question. In this paper, we investigate how and why people choose among three prominent, overlapping question and answer communities for their cybersecurity knowledge needs. We aggregated several consecutive years of cybersecurity-related questions from Stack Overflow, Security Stack Exchange, and Reddit, and performed statistical, linguistic, and longitudinal analyses. To triangulate the results, we also conducted user surveys. We found that user behavior differs across the three communities in most cases. Likewise, the cybersecurity-related questions asked on the three sites differ: they are more technical on Security Stack Exchange and Stack Overflow, and more subjective and personal on Reddit. Moreover, there appears to have been a differentiation of the communities along the same lines, accompanied by overall popularity trends suggestive of Stack Overflow’s decline and Security Stack Exchange’s rise within the cybersecurity community. Reddit is addressing the more subjective, discussion-type needs of the lay community, and is growing rapidly.
Full Text Available
Beyond NVD: Cybersecurity meets the Semantic Web.

https://doi.org/10.1145/3498891.3501259

Aranovich, Raúl; Wu, Muting; Yu, Dian; Katsy, Katya; Ahmadnia, Benyamin; Bishop, Matthew; Filkov, Vladimir; Sagae, Kenji (October 2021, NSPW '21: New Security Paradigms Workshop)

Full Text Available
Augmented Spanish-Persian Neural Machine Translation

https://doi.org/10.5220/0010369804820488

Ahmadnia, Benyamin; Aranovich, Raul (January 2021, Proceedings of the 13th International Conference on Agents and Artificial Intelligence (ICAART 2021))
Neural Machine Translation (NMT) trains a neural network with an encoder-decoder architecture. However, the quality of neural translations depends predominantly on the availability of a large bilingual training dataset. In this paper, we explore the performance of translations predicted by attention-based NMT systems for the low-resource Spanish-Persian language pair. We analyze the errors that NMT systems make in the Persian language and provide an in-depth comparison of system performance across variations in sentence length and training dataset size. We evaluate our translation results using BLEU and human evaluation measures based on adequacy, fluency, and overall rating.
Full Text Available
Automatically Exposing Problems with Neural Dialog Models

https://doi.org/10.18653/v1/2021.emnlp-main.37

Yu, Dian; Sagae, Kenji (January 2021, Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing)

Neural dialog models are known to suffer from problems such as generating unsafe and inconsistent responses. Even though these problems are crucial and prevalent, they are mostly identified manually by model designers through interactions. Recently, some research has instructed crowdworkers to goad the bots into triggering such problems. However, humans leverage superficial clues such as hate speech, while leaving systematic problems undiscovered. In this paper, we propose two methods, including reinforcement learning, to automatically trigger a dialog model into generating problematic responses. We show the effect of our methods in exposing safety and contradiction issues with state-of-the-art dialog models.
Full Text Available
Attribute Alignment: Controlling Text Generation from Pre-trained Language Models

https://doi.org/10.18653/v1/2021.findings-emnlp.194

Yu, Dian; Yu, Zhou; Sagae, Kenji (January 2021, Findings of the Association for Computational Linguistics: EMNLP 2021)

Large language models benefit from training with a large amount of unlabeled text, which gives them increasingly fluent and diverse generation capabilities. However, using these models for text generation that takes into account target attributes, such as sentiment polarity or specific topics, remains a challenge. We propose a simple and flexible method for controlling text generation by aligning disentangled attribute representations. In contrast to recent efforts on training a discriminator to perturb the token level distribution for an attribute, we use the same data to learn an alignment function to guide the pre-trained, non-controlled language model to generate texts with the target attribute without changing the original language model parameters. We evaluate our method on sentiment- and topic-controlled generation, and show large performance gains over previous methods while retaining fluency and diversity.
Full Text Available
Language Embeddings for Typology and Cross-lingual Transfer Learning

https://doi.org/10.18653/v1/2021.acl-long.560

Yu, Dian; He, Taiqi; Sagae, Kenji (January 2021, Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers))
Cross-lingual language tasks typically require a substantial amount of annotated data or parallel translation data. We explore whether language representations that capture relationships among languages can be learned and subsequently leveraged in cross-lingual tasks without the use of parallel data. We generate dense embeddings for 29 languages using a denoising autoencoder, and evaluate the embeddings using the World Atlas of Language Structures (WALS) and two extrinsic tasks in a zero-shot setting: cross-lingual dependency parsing and cross-lingual natural language inference.
Full Text Available
Strengthening Low-resource Neural Machine Translation through Joint Learning: The Case of Farsi-Spanish

https://doi.org/10.5220/0010362604750481

Ahmadnia, Benyamin; Aranovich, Raul; Dorr, Bonnie (January 2021, Proceedings of the 13th International Conference on Agents and Artificial Intelligence - Volume 1: NLPinAI)
This paper describes a systematic study of an approach to Farsi-Spanish low-resource Neural Machine Translation (NMT) that leverages monolingual data for joint learning of forward and backward translation models. As is standard for NMT systems, the training process begins using two pre-trained translation models that are iteratively updated by decreasing translation costs. In each iteration, either translation model is used to translate monolingual texts from one language to another, to generate synthetic datasets for the other translation model. Two new translation models are then learned from bilingual data along with the synthetic texts. The key distinguishing feature between our approach and standard NMT is an iterative learning process that improves the performance of both translation models, simultaneously producing a higher-quality synthetic training dataset upon each iteration. Our empirical results demonstrate that this approach outperforms baselines.
Full Text Available
Impact of Filtering Generated Pseudo Bilingual Texts in Low-Resource Neural Machine Translation Enhancement: The Case of Persian-Spanish

https://doi.org/10.1016/j.procs.2021.05.093

Ahmadnia, Benyamin; Dorr, Bonnie J.; Aranovich, Raul (January 2021, Procedia Computer Science)
Full Text Available
An Effective Optimization Method for Neural Machine Translation: The Case of English-Persian Bilingually Low-Resource Scenario

Ahmadnia, Benyamin; Aranovich, Raul (December 2020, Proceedings of the 7th Workshop on Asian Translation)
In this paper, we propose an optimization method for low-resource Neural Machine Translation (NMT) by investigating the effectiveness of multiple neural network optimization algorithms. Our results confirm that applying the proposed optimization method to English-Persian translation can exceed the translation quality of the English-Persian Statistical Machine Translation (SMT) paradigm.
Full Text Available
