
We give a concentration inequality for a stochastic version of the facility location problem. We show that the objective C_n = min_{F⊆[0,1]²} |F| + Σ_{x∈X} min_{f∈F} ‖x − f‖ is concentrated in an interval of length O(n^{1/6}) and that E[C_n] = Θ(n^{2/3}) if the input X consists of i.i.d. uniform points in the unit square. Our main tool is a geometric quantity, previously used in the design of approximation algorithms for the facility location problem, which we use to analyze a martingale process. Many of our techniques generalize to other settings.
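The objective C_n can be evaluated directly for any candidate facility set. The sketch below is illustrative only (the grid placement and parameter choices are our own, not the paper's analysis): placing facilities on a k×k grid with k ≈ n^{1/3} balances the facility cost |F| = k² against the total service distance, which is how the Θ(n^{2/3}) scale arises.

```python
import random
import math

def facility_cost(facilities, points):
    """Objective |F| + sum over x in X of the distance from x to its nearest facility."""
    return len(facilities) + sum(
        min(math.dist(x, f) for f in facilities) for x in points
    )

random.seed(0)
n = 1000
X = [(random.random(), random.random()) for _ in range(n)]

# Heuristic placement: facilities on a k x k grid with k ~ n^{1/3},
# so |F| = k^2 and each point is within ~1/k of some facility.
k = round(n ** (1 / 3))
F = [((i + 0.5) / k, (j + 0.5) / k) for i in range(k) for j in range(k)]
print(facility_cost(F, X))  # |F| = 100 plus total service distance, on the n^(2/3) scale
```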

Frequency estimation is one of the most fundamental problems in streaming algorithms. Given a stream S of elements from some universe U = {1…n}, the goal is to compute, in a single pass, a short sketch of S so that for any element i∈U, one can estimate the number xi of times i occurs in S based on the sketch alone. Two state-of-the-art solutions to this problem are the CountMin and CountSketch algorithms. The frequency estimator x~ produced by CountMin, using O(1/ε⋅log n) dimensions, guarantees that ∥x~−x∥∞ ≤ ε∥x∥1 with high probability, and x~ ≥ x holds deterministically. Note that CountMin works under the assumption that x ≥ 0. On the other hand, CountSketch, using O(1/ε²⋅log n) dimensions, guarantees that ∥x~−x∥∞ ≤ ε∥x∥2 with high probability. A natural question is whether it is possible to design a best-of-both-worlds sketching method, with error guarantees depending on the ℓ2 norm and space comparable to CountSketch, but which (like CountMin) also has the no-underestimation property. Our main set of results shows that the answer to this question is negative. We show this in two incomparable computational models: linear sketching and streaming algorithms. We also study the complementary problem, where the sketch is required to not overestimate, i.e., x~ ≤ x should hold.
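The no-underestimation property of CountMin follows from its structure: each row of counters can only overcount an item (collisions add, never subtract), so the minimum over rows is still an overestimate. A minimal self-contained sketch (parameters and hashing scheme are our own, for illustration):

```python
import random

class CountMin:
    """Count-Min sketch: d rows of w counters, one hash per row."""
    def __init__(self, w, d, seed=0):
        rng = random.Random(seed)
        self.w, self.d = w, d
        self.table = [[0] * w for _ in range(d)]
        self.salts = [rng.getrandbits(32) for _ in range(d)]

    def _bucket(self, row, item):
        return hash((self.salts[row], item)) % self.w

    def update(self, item, count=1):
        for r in range(self.d):
            self.table[r][self._bucket(r, item)] += count

    def query(self, item):
        # Collisions only add to a counter, so every row overestimates;
        # the minimum over rows therefore satisfies query(i) >= x_i.
        return min(self.table[r][self._bucket(r, item)] for r in range(self.d))

stream = ["a"] * 50 + ["b"] * 30 + list("cdefgh") * 2
cm = CountMin(w=64, d=4)
for item in stream:
    cm.update(item)
print(cm.query("a"))  # >= 50 deterministically; typically exactly 50 here
```

CountSketch differs by also hashing each item to a random sign, which makes collision noise cancel in expectation and yields the ℓ2 error bound, at the cost of the one-sided guarantee.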

Credit cards have become an essential part of daily life due to their ease of use and flexibility of payment. A critical consequence of increased credit card use is the occurrence of fraudulent transactions, which allow illegitimate users to obtain money and goods via unauthorized usage. Artificial Intelligence (AI) and Machine Learning (ML) have become effective techniques used in different applications to ensure cybersecurity. This paper proposes our fraud detection system, called ManEnsemble CCFD, an ensemble-learning model with two stages of classification and detection. Stage one, called ML-CCFD, utilizes ten machine learning (ML) algorithms to classify credit card transactions into class 1 (a fraudulent transaction) or class 0 (a legitimate transaction). We compared their classification reports, specifically precision, recall (sensitivity), and F1-score, and selected the most accurate ML algorithms based on their classification performance and prediction accuracy. The second stage, known as Ensemble-learning CCFD, is an ensemble model that applies the ManEnsemble method to the most effective ML algorithms from stage one. The output of the second stage is the final prediction, instead of using common types of ensemble …
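The two-stage pipeline can be illustrated generically: stage one scores several base classifiers on validation data, and stage two combines the top performers by majority vote. This is a hypothetical pure-Python sketch of that pattern, not the paper's ManEnsemble implementation; the threshold-rule "classifiers" and the single amount feature are invented for illustration.

```python
def accuracy(clf, data):
    """Fraction of (x, y) pairs the classifier labels correctly."""
    return sum(clf(x) == y for x, y in data) / len(data)

def two_stage_ensemble(classifiers, val_data, top_k=3):
    """Stage 1: rank base classifiers by validation accuracy.
    Stage 2: majority-vote over the top_k performers (0 = legit, 1 = fraud)."""
    ranked = sorted(classifiers, key=lambda c: accuracy(c, val_data), reverse=True)
    chosen = ranked[:top_k]
    def predict(x):
        votes = sum(c(x) for c in chosen)
        return 1 if votes * 2 > len(chosen) else 0
    return predict

# Toy base "classifiers": threshold rules on a transaction-amount feature.
clfs = [lambda x, t=t: int(x > t) for t in (50, 100, 200, 500, 1000)]
val = [(30, 0), (80, 0), (150, 1), (700, 1), (2000, 1)]
model = two_stage_ensemble(clfs, val, top_k=3)
print(model(900))  # -> 1 (flagged as fraud)
```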

We study the problem of representing all distances between n points in R^d, with arbitrarily small distortion, using as few bits as possible. We give asymptotically tight bounds for this problem, for Euclidean metrics, for ℓ1 (a.k.a. Manhattan) metrics, and for general metrics. Our bounds for Euclidean metrics mark the first improvement over compression schemes based on discretizing the classical dimensionality reduction theorem of Johnson and Lindenstrauss (Contemp. Math. 1984). Since it is known that no better dimension reduction is possible, our results establish that Euclidean metric compression is possible beyond dimension reduction.

There has been a flurry of recent literature studying streaming algorithms for which the input stream is chosen adaptively by a black-box adversary who observes the output of the streaming algorithm at each time step. However, these algorithms fail when the adversary has access to the internal state of the algorithm, rather than just its output. We study streaming algorithms in the white-box adversarial model, where the stream is chosen adaptively by an adversary who observes the entire internal state of the algorithm at each time step. We show that non-trivial algorithms are still possible. We first give a randomized algorithm for the L1-heavy hitters problem that outperforms the optimal deterministic Misra-Gries algorithm on long streams. If the white-box adversary is computationally bounded, we use cryptographic techniques to reduce the memory of our L1-heavy hitters algorithm even further and to design a number of additional algorithms for graph, string, and linear algebra problems. The existence of such algorithms is surprising, as the streaming algorithm does not even have a secret key in this model, i.e., its state is entirely known to the adversary. One algorithm we design is for estimating the number of distinct elements in a …
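For reference, the deterministic Misra-Gries baseline mentioned above maintains at most k−1 counters and guarantees that every element occurring more than m/k times in a stream of length m survives. A minimal sketch (the stream and parameters are illustrative):

```python
def misra_gries(stream, k):
    """Misra-Gries heavy hitters with at most k-1 counters.
    Any element occurring more than len(stream)/k times is kept."""
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k - 1:
            counters[x] = 1
        else:
            # Decrement every counter; drop those that reach zero.
            # Each decrement round "cancels" k distinct stream elements.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

stream = ["a"] * 6 + ["b"] * 4 + ["c", "d", "e", "f"]
print(misra_gries(stream, k=3))  # -> {'a': 2}: only "a" exceeds 14/3 occurrences
```

The retained counts are underestimates of the true frequencies; a second pass (or the standard error bound) recovers the actual heavy hitters.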

A* is a classic and popular method for graph search and pathfinding. It assumes the existence of a heuristic function h(u, t) that estimates the shortest distance from any input node u to the destination t. Traditionally, heuristics have been hand-crafted by domain experts. However, over the last few years, there has been growing interest in learning heuristic functions. Such learned heuristics estimate the distance between given nodes based on "features" of those nodes. In this paper we formalize and initiate the study of such feature-based heuristics. In particular, we consider heuristics induced by norm embeddings and distance labeling schemes, and provide lower bounds for the tradeoffs between the number of dimensions or bits used to represent each graph node and the running time of the A* algorithm. We also show that, under natural assumptions, our lower bounds are almost optimal.
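To make the setup concrete, here is a minimal A* implementation where h is supplied as a function of the node "features". The toy graph and the one-dimensional embedding (node labels as coordinates, so h(u, t) = |u − t|) are our own illustrative assumptions, not from the paper; this h happens to be consistent on the example, so A* returns the exact shortest distance.

```python
import heapq

def a_star(graph, h, start, goal):
    """A* search: graph[u] maps each neighbor v to the edge weight w(u, v);
    h(u, goal) must lower-bound the true distance from u to goal."""
    dist = {start: 0}
    frontier = [(h(start, goal), start)]  # priority = g(u) + h(u, goal)
    while frontier:
        _, u = heapq.heappop(frontier)
        if u == goal:
            return dist[u]
        for v, w in graph[u].items():
            nd = dist[u] + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(frontier, (nd + h(v, goal), v))
    return float("inf")

# Toy graph; the "feature" of each node is just its integer label,
# and the heuristic is the distance between embedded features.
graph = {0: {1: 1, 2: 4}, 1: {2: 1, 3: 5}, 2: {3: 1}, 3: {}}
h = lambda u, t: abs(u - t)
print(a_star(graph, h, 0, 3))  # -> 3, via the path 0 -> 1 -> 2 -> 3
```

A learned or compressed heuristic trades the size of these per-node representations against how many nodes A* must expand, which is exactly the tradeoff the lower bounds address.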

Vapor loss and molecular absorption make the transmission distance in sub-Terahertz bands a challenge, especially in mobile scenarios such as UAV communication. The molecular absorption term is an essential part of the path loss in THz channel modeling and cannot be neglected. Along this direction, we investigated UAV trajectories in the sub-THz band. To maximize the secrecy rate of UAV communication, we propose an optimization problem that jointly optimizes the trajectory and transmit power. To enhance the obtained average secrecy rate, we employ MIMO communication and a cooperative UAV jammer strategy. We also present analysis and simulation results showing the performance of UAV-to-ground communication at THz frequencies. Finally, we obtain the secrecy outage probability for each UAV trajectory over different flight periods to examine the performance of the physical-layer security added to UAV-to-ground communication in the sub-THz band.

Data-driven algorithms can adapt their internal structure or parameters to inputs from unknown application-specific distributions, by learning from a training sample of inputs. Several recent works have applied this approach to problems in numerical linear algebra, obtaining significant empirical gains in performance. However, no theoretical explanation for their success was known. In this work we prove generalization bounds for those algorithms, within the PAC-learning framework for data-driven algorithm selection proposed by Gupta and Roughgarden (SICOMP 2017). Our main results are closely matching upper and lower bounds on the fat-shattering dimension of the learning-based low-rank approximation algorithm of Indyk et al. (NeurIPS 2019). Our techniques are general, and provide generalization bounds for many other recently proposed data-driven algorithms in numerical linear algebra, covering both sketching-based and multigrid-based methods. This considerably broadens the class of data-driven algorithms for which a PAC-learning analysis is available.

We explore algorithms and limitations for sparse optimization problems such as sparse linear regression and robust linear regression. The goal of the sparse linear regression problem is to identify a small number of key features, while the goal of the robust linear regression problem is to identify a small number of erroneous measurements. Specifically, the sparse linear regression problem seeks a k-sparse vector x ∈ R^d to minimize ‖Ax − b‖2, given an input matrix A ∈ R^{n×d} and a target vector b ∈ R^n, while the robust linear regression problem seeks a set S that ignores at most k rows and a vector x to minimize ‖(Ax − b)_S‖2. We first show bicriteria NP-hardness of approximation for robust regression, building on the work of [OWZ15], which implies a similar result for sparse regression. We further show fine-grained hardness of robust regression through a reduction from the minimum-weight k-clique conjecture. On the positive side, we give an algorithm for robust regression that achieves arbitrarily accurate additive error and uses runtime that closely matches the lower bound from the fine-grained hardness result, as well as an algorithm for sparse regression with similar runtime. Both our upper and lower bounds rely …
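The naive baseline these hardness results are measured against enumerates all supports, which takes roughly n^k time. For k = 1 this is easy to write out exactly: for each candidate column a_j, the optimal coefficient is the scalar least-squares fit ⟨a_j, b⟩ / ⟨a_j, a_j⟩. The example below is our own illustrative sketch, not an algorithm from the paper.

```python
def one_sparse_regression(A, b):
    """Exact 1-sparse regression by brute force over supports: for each
    column j, fit the scalar least squares coefficient and keep the
    support with the smallest residual ||A x - b||_2^2."""
    n, d = len(A), len(A[0])
    best = (float("inf"), None, None)  # (residual, column index, coefficient)
    for j in range(d):
        col = [A[i][j] for i in range(n)]
        denom = sum(c * c for c in col)
        coef = sum(c * bi for c, bi in zip(col, b)) / denom
        resid = sum((coef * c - bi) ** 2 for c, bi in zip(col, b))
        if resid < best[0]:
            best = (resid, j, coef)
    return best

# b is (almost) 2x column 1 of A, so the best support is {1}.
A = [[1.0, 2.0], [0.0, 1.0], [1.0, 0.0]]
b = [4.0, 2.0, 0.1]
print(one_sparse_regression(A, b))  # best support {1}, coefficient 2.0
```

Extending this to general k multiplies the work by the number of size-k supports, which is exactly the exponential dependence the fine-grained lower bounds show is hard to avoid.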