
Title: Sequences of Sets
Sequential behavior such as sending emails, gathering in groups, tagging posts, or authoring academic papers may be characterized by a set of recipients, attendees, tags, or coauthors, respectively. Such "sequences of sets" show complex repetition behavior, sometimes repeating prior sets wholesale, and sometimes creating new sets from partial copies or partial merges of earlier sets. In this paper, we provide a stochastic model to capture these patterns. The model has two classes of parameters. First, a correlation parameter determines how much of an earlier set will contribute to a future set. Second, a vector of recency parameters captures the fact that a set in a sequence is more similar to recent sets than to more distant ones. Comparing against a strong baseline, we find that modeling both correlation and recency structures is required for high accuracy. We also find that both parameter classes vary widely across domains, so they must be optimized on a per-dataset basis. We present the model in detail, provide a theoretical examination of its asymptotic behavior, and perform a set of detailed experiments on its predictive performance.
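The generative idea described above can be sketched in a few lines: each new set is built by sampling an earlier set with recency-weighted probability, copying part of it, and filling the remainder with fresh elements. This is an illustrative toy only; the parameter names (corr, the 1/age recency weights) are hypothetical stand-ins for the paper's correlation and recency parameters, not its actual formulation.

```python
import random

def next_set(history, corr=0.6, size=5, universe=range(100), rng=random):
    """One step of a toy 'sequence of sets' generator: pick an earlier
    set with recency-weighted probability, copy roughly a corr-fraction
    of it, and fill the remainder with fresh elements from the universe.
    Sketch only; corr and the 1/age weights are hypothetical."""
    if not history:
        return set(rng.sample(list(universe), size))
    # More recent sets get higher weight (weight ~ 1 / age).
    weights = [1.0 / (len(history) - i) for i in range(len(history))]
    source = rng.choices(history, weights=weights, k=1)[0]
    n_copy = min(len(source), round(corr * size))
    new_set = set(rng.sample(sorted(source), n_copy))
    fresh = [x for x in universe if x not in new_set]
    while len(new_set) < size:
        new_set.add(rng.choice(fresh))
    return new_set
```

Running this step repeatedly produces a sequence that mixes wholesale repeats, partial copies, and novel elements, the qualitative behavior the model is designed to capture.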
Award ID(s):
1740822
PAR ID:
10064641
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
The 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Nonlinear response history analysis (NLRHA) is generally considered to be a reliable and robust method to assess the seismic performance of buildings under strong ground motions. While NLRHA is fairly straightforward for evaluating individual structures under a select set of ground motions at a specific building site, it becomes less practical for performing large numbers of analyses to evaluate either (1) multiple models of alternative design realizations with a site‐specific set of ground motions, or (2) individual archetype building models at multiple sites with multiple sets of ground motions. In this regard, surrogate models offer an alternative to running repeated NLRHAs for variable design realizations or ground motions. In this paper, a recently developed surrogate modeling technique, called probabilistic learning on manifolds (PLoM), is presented to estimate structural seismic response. Essentially, the PLoM method provides an efficient stochastic model to develop mappings between random variables, which can then be used to efficiently estimate the structural responses for systems with variations in design/modeling parameters or ground motion characteristics. The PLoM algorithm is introduced and then used in two case studies of 12‐story buildings for estimating probability distributions of structural responses. The first example focuses on the mapping between variable design parameters of a multidegree‐of‐freedom analysis model and its peak story drift and acceleration responses. The second example applies the PLoM technique to estimate structural responses for variations in site‐specific ground motion characteristics. In both examples, training data sets are generated for orthogonal input parameter grids, and test data sets are developed for input parameters with prescribed statistical distributions. Validation studies are performed to examine the accuracy and efficiency of the PLoM models.
Overall, both examples show good agreement between the PLoM model estimates and verification data sets. Moreover, in contrast to other common surrogate modeling techniques, the PLoM model is able to preserve correlation structure between peak responses. Parametric studies are conducted to understand the influence of different PLoM tuning parameters on its prediction accuracy. 
  2. Model-based approaches to navigation, control, and fault detection that utilize precise nonlinear models of vehicle plant dynamics will enable more accurate control and navigation, assured autonomy, and more complex missions for such vehicles. This paper reports novel theoretical and experimental results addressing the problem of parameter estimation of plant and actuator models for underactuated underwater vehicles operating in 6 degrees-of-freedom (DOF) whose dynamics are modeled by finite-dimensional Newton-Euler equations. This paper reports the first theoretical approach and experimental validation to identify simultaneously plant-model parameters (parameters such as mass, added mass, hydrodynamic drag, and buoyancy) and control-actuator parameters (control-surface models and thruster models) in 6-DOF. Most previously reported studies on parameter identification assume that the control-actuator parameters are known a priori. Moreover, this paper reports the first proof of convergence of the parameter estimates to the true set of parameters for this class of vehicles under a persistence of excitation condition. The reported adaptive identification (AID) algorithm does not require instrumentation of 6-DOF vehicle acceleration, which is required by conventional approaches to parameter estimation such as least squares. Additionally, the reported AID algorithm is applicable under any arbitrary open-loop or closed-loop control law. We report simulation and experimental results for identifying the plant-model and control-actuator parameters for an L3 OceanServer Iver3 autonomous underwater vehicle. We believe this general approach to AID could be extended to apply to other classes of machines and other classes of marine, land, aerial, and space vehicles. 
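The adaptive identification idea above can be illustrated on a much simpler plant than the paper's 6-DOF Newton-Euler dynamics. The sketch below uses a textbook gradient adaptive law for a scalar system xdot = a*x + b*u with a persistently exciting input; all gains and the 1-DOF setup are hypothetical simplifications, not the paper's algorithm. Note that, as in the paper's approach, the update laws use only the state x and input u, not the acceleration xdot.

```python
import math

def adaptive_id(a=-1.0, b=1.0, a_m=-2.0, gamma=5.0, dt=1e-3, steps=200_000):
    """Gradient-based adaptive identification for a scalar plant
    xdot = a*x + b*u with unknown (a, b), driven by the persistently
    exciting input u(t) = sin(t).  Textbook 1-DOF sketch only.
    Observer:  xhat_dot = a_m*(xhat - x) + a_hat*x + b_hat*u
    Laws:      a_hat_dot = -gamma*e*x,  b_hat_dot = -gamma*e*u
    with e = xhat - x and a_m < 0 so the error dynamics are stable."""
    x = xhat = 0.0
    a_hat = b_hat = 0.0
    for k in range(steps):
        u = math.sin(k * dt)
        e = xhat - x
        # Compute all derivatives from the current state, then Euler-step.
        dx = a * x + b * u
        dxhat = a_m * e + a_hat * x + b_hat * u
        da = -gamma * e * x
        db = -gamma * e * u
        x += dt * dx
        xhat += dt * dxhat
        a_hat += dt * da
        b_hat += dt * db
    return a_hat, b_hat
```

Under persistence of excitation, a standard Lyapunov argument (V = e^2/2 + (a_err^2 + b_err^2)/(2*gamma), giving Vdot = a_m*e^2 <= 0) shows the estimates converge to the true parameters, mirroring the convergence result the paper proves for the 6-DOF case.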
  3. The introduction of large-scale data sets in psychology allows for more robust accounts of various cognitive mechanisms, one of which is human learning. However, these data sets provide participants with complete autonomy over their own participation in the task, and therefore require precisely studying the factors influencing dropout alongside learning. In this work, we present such a data set where 1,234,844 participants play 10,874,547 games of a challenging variant of tic-tac-toe. We establish that there is a correlation between task performance and total experience, and independently analyze participants’ dropout behavior and learning trajectories. We find evidence for stopping patterns as a function of playing strength and investigate the processes underlying playing strength increases with experience using a set of metrics derived from a planning model. Finally, we develop a joint model to account for both dropout and learning functions which replicates our empirical findings. 
  4. Algorithms often have tunable parameters that impact performance metrics such as runtime and solution quality. For many algorithms used in practice, no parameter settings admit meaningful worst-case bounds, so the parameters are made available for the user to tune. Alternatively, parameters may be tuned implicitly within the proof of a worst-case approximation ratio or runtime bound. Worst-case instances, however, may be rare or nonexistent in practice. A growing body of research has demonstrated that a data-driven approach to parameter tuning can lead to significant improvements in performance. This approach uses a training set of problem instances sampled from an unknown, application-specific distribution and returns a parameter setting with strong average performance on the training set. We provide techniques for deriving generalization guarantees that bound the difference between the algorithm's average performance over the training set and its expected performance on the unknown distribution. Our results apply no matter how the parameters are tuned, be it via an automated or manual approach. The challenge is that for many types of algorithms, performance is a volatile function of the parameters: slightly perturbing the parameters can cause a large change in behavior. Prior research [e.g., 12, 16, 20, 62] has proved generalization bounds by employing case-by-case analyses of greedy algorithms, clustering algorithms, integer programming algorithms, and selling mechanisms. We streamline these analyses with a general theorem that applies whenever an algorithm's performance is a piecewise-constant, piecewise-linear, or, more generally, piecewise-structured function of its parameters. Our results, which are tight up to logarithmic factors in the worst case, also imply novel bounds for configuring dynamic programming algorithms from computational biology.
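The empirical-risk-style tuning procedure that the guarantees above apply to is simple to state in code. This is a minimal sketch with hypothetical names; the paper's contribution is bounding the gap between the empirical average this procedure optimizes and the true expectation over the instance distribution.

```python
def tune(params, train_instances, perf):
    """Data-driven parameter tuning in its simplest form: return the
    parameter setting with the best average performance on a training
    set of instances drawn from the application-specific distribution.
    Names (params, perf) are hypothetical placeholders."""
    def avg_perf(p):
        return sum(perf(p, x) for x in train_instances) / len(train_instances)
    return max(params, key=avg_perf)
```

For example, tuning a scalar parameter against performance perf(p, x) = -|p - x| over instances [1.0, 2.0, 3.0] selects the candidate closest to the instances on average.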
  5. Inverse problems play a central role in data analysis across the fields of science. Many techniques and algorithms provide parameter estimation, including the best-fitting model and the parameter statistics. Here, we concern ourselves with the robustness of parameter estimation under constraints, with a focus on assimilation of noisy data with potential outliers, a situation all too familiar in Earth science, particularly in the analysis of remote-sensing data. We assume a linear, or linearized, forward model relating the model parameters to multiple data sets with a priori unknown uncertainties that are left to be characterized. This is relevant for global navigation satellite system and synthetic aperture radar data that involve intricate processing for which uncertainty estimation is not available. The model is constrained by additional equalities and inequalities resulting from the physics of the problem, but the weights of equalities are unknown. We formulate the problem from a Bayesian perspective with non-informative priors. The posterior distribution of the model parameters, weights, and outliers conditioned on the observations is then inferred via Gibbs sampling. We demonstrate the practical utility of the method based on a set of challenging inverse problems with both synthetic and real space-geodetic data associated with earthquakes and nuclear explosions. We provide the associated computer codes and expect the approach to be of practical interest for a wide range of applications.
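The Gibbs-sampling strategy for a linear forward model can be illustrated on the simplest possible case. The sketch below samples a Bayesian linear regression with a flat prior on the coefficients and a Jeffreys prior on the noise variance; it deliberately omits the paper's data-set weights, outlier indicators, and equality/inequality constraints, so it shows only the general mechanism of alternating conditional draws.

```python
import numpy as np

def gibbs_linear(X, y, n_iter=2000, seed=0):
    """Minimal Gibbs sampler for y = X b + e, e ~ N(0, s2 I), with a
    flat prior on b and Jeffreys prior p(s2) ~ 1/s2.  Illustrative
    sketch only; the paper's sampler additionally infers per-data-set
    weights and outliers under physical constraints."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    XtX = X.T @ X
    b_hat = np.linalg.solve(XtX, X.T @ y)  # OLS / conditional posterior mean
    XtX_inv = np.linalg.inv(XtX)
    s2 = 1.0
    samples = []
    for _ in range(n_iter):
        # Draw b | s2, y  ~  N(b_hat, s2 * (X'X)^{-1})
        b = rng.multivariate_normal(b_hat, s2 * XtX_inv)
        # Draw s2 | b, y  ~  Inv-Gamma(n/2, ||y - X b||^2 / 2)
        resid = y - X @ b
        s2 = (resid @ resid / 2) / rng.gamma(n / 2)
        samples.append((b, s2))
    return samples
```

Alternating these two conditional draws yields samples from the joint posterior; the paper's method extends the same alternation to the additional weight and outlier variables.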