With the aim of analyzing large multidimensional single-cell datasets, we describe CosTaL (Cosine-based Tanimoto similarity-refined graph for community detection using Leiden’s algorithm). As a graph-based clustering method, CosTaL transforms cells with high-dimensional features into a weighted k-nearest-neighbor (kNN) graph, in which each vertex represents a cell and an edge between two vertices indicates that the two cells are closely related. Specifically, CosTaL builds an exact kNN graph using cosine similarity and then re-weights the edges with the Tanimoto coefficient to improve clustering effectiveness. Compared with other state-of-the-art graph-based clustering methods, including PhenoGraph, Scanpy, and PARC, CosTaL generally achieves equivalent or higher effectiveness scores on seven benchmark cytometry datasets and six single-cell RNA-sequencing datasets across six evaluation metrics. The combined evaluation metrics also indicate that CosTaL is highly efficient on small datasets and scales acceptably to large datasets, which is beneficial for large-scale analysis.
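The graph-construction step described above can be illustrated with a minimal standard-library sketch. It is not the CosTaL implementation and rests on stated assumptions: the kNN search is brute force, the Tanimoto coefficient is computed on each cell's binary neighbor-membership set (including the cell itself), which makes it coincide with the Jaccard index, and the Leiden community-detection step is omitted (in practice the weighted edges would be passed to a Leiden implementation such as the `leidenalg` package). All function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def exact_knn(cells, k):
    """Brute-force (exact) kNN: for each cell, the set of its k most cosine-similar other cells."""
    neighbors = []
    for i, u in enumerate(cells):
        sims = [(cosine(u, v), j) for j, v in enumerate(cells) if j != i]
        sims.sort(key=lambda t: (-t[0], t[1]))  # highest similarity first; ties broken by index
        neighbors.append({j for _, j in sims[:k]})
    return neighbors


def tanimoto_refined_edges(neighbors):
    """Weight each undirected kNN edge (i, j) by |A & B| / |A | B|, where A and B are the
    neighbor sets of i and j, each including the cell itself (an assumption of this sketch)."""
    edges = {}
    for i, nbrs in enumerate(neighbors):
        a = nbrs | {i}
        for j in nbrs:
            key = (min(i, j), max(i, j))
            if key not in edges:
                b = neighbors[j] | {j}
                edges[key] = len(a & b) / len(a | b)
    return edges


# Hypothetical toy data: two pairs of nearly parallel 2D "cells".
cells = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(tanimoto_refined_edges(exact_knn(cells, k=1)))  # {(0, 1): 1.0, (2, 3): 1.0}
```

With `k=2` on the same toy data, each cell also picks up one neighbor from the other pair, but those cross-pair edges receive weight 0.5 while the within-pair edges keep weight 1.0. That down-weighting of edges between cells with few shared neighbors is the kind of refinement the re-weighting step is meant to provide before community detection.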
LexDivPara: A Measure of Paraphrase Quality with Integrated Sentential Lexical Complexity
We present a novel method that automatically measures the quality of sentential paraphrasing. Our method balances two conflicting criteria: semantic similarity and lexical diversity. Using a diverse annotated corpus, we built learning-to-rank models on edit distance, BLEU, ROUGE, and cosine similarity features. Extrinsic evaluation on the STS Benchmark and ParaBank Evaluation datasets yielded a model ensemble of moderate to high quality. We applied our method to both small benchmarking datasets and large-scale datasets, which we release as resources for the community.
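The tension between similarity and diversity can be made concrete with a small feature-extraction sketch. It assumes whitespace tokenization and computes only two of the four feature families named above: a word-level edit distance normalized by the longer sentence (a lexical-diversity signal) and a bag-of-words cosine (a crude similarity signal; the paper's cosine feature may instead be computed over embeddings). BLEU, ROUGE, and the learning-to-rank models are not reproduced, and all function names are hypothetical.

```python
import math
from collections import Counter


def word_edit_distance(a, b):
    """Levenshtein distance between two token lists (unit-cost insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        cur = [i]
        for j, wb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                    # delete wa
                           cur[j - 1] + 1,                 # insert wb
                           prev[j - 1] + int(wa != wb)))   # substitute (free if equal)
        prev = cur
    return prev[-1]


def bow_cosine(a, b):
    """Cosine similarity between the bag-of-words count vectors of two token lists."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def paraphrase_features(source, candidate):
    """Two illustrative features: lexical diversity (normalized word edit distance)
    and surface similarity (bag-of-words cosine)."""
    s, c = source.lower().split(), candidate.lower().split()
    return {
        "norm_edit_distance": word_edit_distance(s, c) / max(len(s), len(c), 1),
        "bow_cosine": bow_cosine(s, c),
    }


print(paraphrase_features("the cat sat on the mat", "on the mat the cat sat"))
```

A pure reordering like the one above keeps a bag-of-words cosine of essentially 1 while its edit distance is nonzero. A ranking model trained on such features can therefore reward candidates that preserve content yet change surface form, rather than relying on either signal alone.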
- PAR ID:
- 10293412
- Date Published:
- Journal Name:
- International Workshop on Intelligent Systems and Applications
- ISSN:
- 2159-1539
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Abstract -
Protein-protein interaction (PPI) network alignment plays an important role in uncovering crucial underlying biological knowledge, such as functionally homologous proteins and conserved evolutionary pathways across species. Existing PPI network alignment methods often try to improve the coverage ratio of the alignment by aligning all proteins from different species. However, a fundamental biological premise must be considered carefully: not every protein in a species can, or should, have homologous proteins in other species. In this work, we propose a novel alignment method that maps only the most similar proteins across the PPI networks of multiple species. To measure protein similarity, we integrate topological features with biological characteristics, providing stronger support for the alignment procedure. For topological features, we apply a representation-learning method to the networks that generates, for each protein, a low-dimensional vector embedding capturing its surrounding structural features. The topological similarity of proteins from different PPI networks can thus be expressed as the similarity of their vector representations, which provides a new way to comprehensively quantify topological similarity between proteins. We also propose a new measure for the topological evaluation of alignment results that better reveals the structural quality of an alignment across multiple networks. Both biological and topological evaluations of the alignment results on real datasets demonstrate that our approach is promising and compares favorably with previous multiple-alignment methods.
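A pairwise version of the idea of aligning only the most similar proteins can be sketched as follows. Everything here is a hypothetical illustration, not the paper's method: each protein is assumed to already have a topological embedding vector (for example, from a representation-learning method such as node2vec), biological similarity is assumed to be a score in [0, 1] per protein pair, the two signals are mixed with a weight `alpha`, and a pair is kept only if the two proteins are each other's best match and the combined score clears `threshold`. The paper aligns multiple species at once; this sketch handles only two networks.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def align_pairs(emb_a, emb_b, bio_sim, alpha=0.5, threshold=0.8):
    """Return (p, q) pairs that are mutual best matches under the combined score
    alpha * topological + (1 - alpha) * biological, keeping only scores >= threshold.

    emb_a, emb_b: dict mapping protein id -> embedding vector (hypothetical inputs).
    bio_sim: dict mapping (p, q) -> biological similarity in [0, 1]; missing pairs count as 0.
    """
    score = {}
    for p, u in emb_a.items():
        for q, v in emb_b.items():
            topo = cosine(u, v)
            score[(p, q)] = alpha * topo + (1 - alpha) * bio_sim.get((p, q), 0.0)
    best_a = {p: max(emb_b, key=lambda q: score[(p, q)]) for p in emb_a}
    best_b = {q: max(emb_a, key=lambda p: score[(p, q)]) for q in emb_b}
    # Proteins without a confident mutual partner are deliberately left unaligned.
    return sorted((p, q) for p, q in best_a.items()
                  if best_b[q] == p and score[(p, q)] >= threshold)


# Hypothetical toy networks A and B with 2D embeddings.
emb_a = {"A1": [1.0, 0.0], "A2": [0.0, 1.0], "A3": [1.0, 1.0]}
emb_b = {"B1": [1.0, 0.05], "B2": [0.02, 1.0]}
bio_sim = {("A1", "B1"): 0.9, ("A2", "B2"): 0.8}
print(align_pairs(emb_a, emb_b, bio_sim))  # [('A1', 'B1'), ('A2', 'B2')]
```

In this toy input, protein `A3` has no mutual best partner, so it stays unaligned instead of being forced into a match. This mirrors the premise that not every protein should have a homolog in the other species.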
-
Ossi, Federico ; Hachem, Fatima ; Robira, Benjamin ; Ellis Soto, Diego ; Rutz, Christian ; Dodge, Somayeh ; Cagnacci, Francesca ; Damiani, Maria Luisa (Ed.)
Data collected about routine human activity and mobility is used in diverse applications to improve our society. Robust models are needed to address the challenges of our increasingly interconnected world. Methods capable of portraying the dynamic properties of complex human systems, such as simulation modeling, must meet rigorous data requirements. Modern data sources, such as SafeGraph, provide aggregate data collected from location-aware technologies, and incorporating these new data into existing analysis and modeling methods presents both opportunities and challenges. Our research employs a multiscale spatial similarity index to compare diverse origin-destination mobility datasets. Established distance ranges accommodate spatial variability in the model’s datasets. This paper explores how similarity scores change under different aggregations to address discrepancies in the temporal granularity of the source data. We suggest possible explanations for variations in the similarity scores and extract characteristics of human mobility for the study area. The multiscale spatial similarity index may be integrated into a wide range of analysis and modeling workflows (e.g., agent-based models), either during preliminary analysis or in later evaluation phases as a method of data validation. We propose that the demonstrated tool has the potential to enhance mobility modeling methods in the context of complex human systems.
-
Assessing similarity between design ideas is an inherent part of many design evaluations that measure novelty. In such evaluation tasks, humans excel at making mental connections among diverse knowledge sets to score ideas on their uniqueness. However, their decisions about novelty are often subjective and difficult to explain. In this paper, we demonstrate a way to uncover human judgment of design idea similarity using two-dimensional (2D) idea maps. We derive these maps by asking participants for simple similarity comparisons of the form “Is idea A more similar to idea B or to idea C?” We show that these maps give insight into the relationships between ideas and help in understanding the design domain. We also propose that novel ideas can be identified by finding outliers on these idea maps. To demonstrate our method, we conduct experimental evaluations on two datasets: colored polygons (known answer) and milk frother sketches (unknown answer). We show that idea maps shed light on the factors participants consider when judging idea similarity and that the maps are robust to noisy ratings. We also compare physical maps made by participants on a whiteboard with their computationally generated idea maps to examine how people think about the spatial arrangement of design items. This method provides a new direction of research into deriving ground-truth novelty metrics by combining human judgments and computational methods.
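The abstract does not specify how triplet answers are turned into a 2D map, so the following is a hypothetical sketch of one standard approach rather than the authors' procedure. It runs stochastic gradient descent on a hinge (triplet) loss, max(0, margin + |x_a − x_b|² − |x_a − x_c|²), for each answer "a is more similar to b than to c", then scores each item by its mean distance to its k nearest neighbors on the map, so isolated items surface as candidate novel ideas. The learning rate, margin, epoch count, `k`, and all function names are assumptions.

```python
import math
import random


def fit_idea_map(n_items, triplets, dim=2, epochs=200, lr=0.05, margin=1.0, seed=0):
    """Embed items in `dim` dimensions from triplets (a, b, c) meaning
    'a is more similar to b than to c', via SGD on a squared-distance hinge loss."""
    rng = random.Random(seed)
    x = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_items)]
    order = list(triplets)
    for _ in range(epochs):
        rng.shuffle(order)
        for a, b, c in order:
            d_ab = sum((x[a][k] - x[b][k]) ** 2 for k in range(dim))
            d_ac = sum((x[a][k] - x[c][k]) ** 2 for k in range(dim))
            if margin + d_ab - d_ac <= 0:
                continue  # triplet already satisfied with the required margin
            for k in range(dim):
                # Gradients of (d_ab - d_ac) with respect to coordinate k of a, b, and c.
                g_a = 2 * (x[c][k] - x[b][k])
                g_b = -2 * (x[a][k] - x[b][k])
                g_c = 2 * (x[a][k] - x[c][k])
                x[a][k] -= lr * g_a
                x[b][k] -= lr * g_b
                x[c][k] -= lr * g_c
    return x


def novelty_scores(points, k=2):
    """Mean Euclidean distance to the k nearest other points; larger means more isolated."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


# Hypothetical answers in which item 3 is always judged the least similar.
triplets = [(0, 1, 3), (1, 0, 3), (2, 0, 3)]
idea_map = fit_idea_map(4, triplets)
print(novelty_scores(idea_map))
```

The outlier score is deliberately simple. Any density-based outlier measure could replace it, and with real ratings the triplets will contain contradictions that the hinge loss tolerates rather than fits exactly.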
-
Assessing similarity between design ideas is an inherent part of many design evaluations that measure novelty. In such evaluation tasks, humans excel at making mental connections among diverse knowledge sets and scoring ideas on their uniqueness. However, their decisions on novelty are often subjective and difficult to explain. In this paper, we demonstrate a way to uncover human judgment of design idea similarity using two-dimensional idea maps. We derive these maps by asking humans for simple similarity comparisons of the form “Is idea A more similar to idea B or to idea C?” We show that these maps give insight into the relationships between ideas and help in understanding the domain. We also propose that the novelty of ideas can be estimated by measuring how far apart items are on these maps. We demonstrate our methodology through experimental evaluations on two datasets: colored polygons (known answer) and milk frother sketches (unknown answer). We show that these maps shed light on the factors raters consider when judging idea similarity. We also show how the maps change when less data is available or when false or noisy ratings are provided. This method provides a new direction of research into deriving ground-truth novelty metrics by combining human judgments and computational methods.