skip to main content

Title: Structured Disentangled Representations
Deep latent-variable models learn representations of high-dimensional data in an unsupervised manner. A number of recent efforts have focused on learning representations that disentangle statistically independent axes of variation by introducing modifications to the standard objective function. These approaches generally assume a simple diagonal Gaussian prior and as a result are not able to reliably disentangle discrete factors of variation. We propose a two-level hierarchical objective to control relative degree of statistical independence between blocks of variables and individual variables within blocks. We derive this objective as a generalization of the evidence lower bound, which allows us to explicitly represent the trade-offs between mutual information between data and representation, KL divergence between representation and prior, and coverage of the support of the empirical data distribution. Experiments on a variety of datasets demonstrate that our objective can not only disentangle discrete variables, but that doing so also improves disentanglement of other variables and, importantly, generalization even to unseen combinations of factors
Authors:
; ; ; ; ; ; ; ;
Award ID(s):
1835309
Publication Date:
NSF-PAR ID:
10107752
Journal Name:
Proceedings of Machine Learning Research
Volume:
89
Page Range or eLocation-ID:
2525--2534
ISSN:
2640-3498
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract: Jury notetaking can be controversial despite evidence suggesting benefits for recall and understanding. Research on note taking has historically focused on the deliberation process. Yet, little research explores the notes themselves. We developed a 10-item coding guide to explore what jurors take notes on (e.g., simple vs. complex evidence) and how they take notes (e.g., gist vs. specific representation). In general, jurors made gist representations of simple and complex information in their notes. This finding is consistent with Fuzzy Trace Theory (Reyna & Brainerd, 1995) and suggests notes may serve as a general memory aid, rather than verbatim representation. Summary: The practice of jury notetaking in the courtroom is often contested. Some states allow it (e.g., Nebraska: State v. Kipf, 1990), while others forbid it (e.g., Louisiana: La. Code of Crim. Proc., Art. 793). Some argue notes may serve as a memory aid, increase juror confidence during deliberation, and help jurors engage in the trial (Hannaford & Munsterman, 2001; Heuer & Penrod, 1988, 1994). Others argue notetaking may distract jurors from listening to evidence, that juror notes may be given undue weight, and that those who took notes may dictate the deliberation process (Dann, Hans, & Kaye, 2005). Whilemore »research has evaluated the efficacy of juror notes on evidence comprehension, little work has explored the specific content of juror notes. In a similar project on which we build, Dann, Hans, and Kaye (2005) found jurors took on average 270 words of notes each with 85% including references to jury instructions in their notes. In the present study we use a content analysis approach to examine how jurors take notes about simple and complex evidence. We were particularly interested in how jurors captured gist and specific (verbatim) information in their notes as they have different implications for information recall during deliberation. According to Fuzzy Trace Theory (Reyna & Brainerd, 1995), people extract “gist” or qualitative meaning from information, and also exact, verbatim representations. Although both are important for helping people make well-informed judgments, gist-based understandings are purported to be even more important than verbatim understanding (Reyna, 2008; Reyna & Brainer, 2007). As such, it could be useful to examine how laypeople represent information in their notes during deliberation of evidence. Methods Prior to watching a 45-minute mock bank robbery trial, jurors were given a pen and notepad and instructed they were permitted to take notes. The evidence included testimony from the defendant, witnesses, and expert witnesses from prosecution and defense. Expert testimony described complex mitochondrial DNA (mtDNA) evidence. The present analysis consists of pilot data representing 2,733 lines of notes from 52 randomly-selected jurors across 41 mock juries. Our final sample for presentation at AP-LS will consist of all 391 juror notes in our dataset. Based on previous research exploring jury note taking as well as our specific interest in gist vs. specific encoding of information, we developed a coding guide to quantify juror note-taking behaviors. Four researchers independently coded a subset of notes. Coders achieved acceptable interrater reliability [(Cronbach’s Alpha = .80-.92) on all variables across 20% of cases]. Prior to AP-LS, we will link juror notes with how they discuss scientific and non-scientific evidence during jury deliberation. Coding Note length. Before coding for content, coders counted lines of text. Each notepad line with at minimum one complete word was coded as a line of text. Gist information vs. Specific information. Any line referencing evidence was coded as gist or specific. We coded gist information as information that did not contain any specific details but summarized the meaning of the evidence (e.g., “bad, not many people excluded”). Specific information was coded as such if it contained a verbatim descriptive (e.g.,“<1 of people could be excluded”). We further coded whether this information was related to non-scientific evidence or related to the scientific DNA evidence. Mentions of DNA Evidence vs. Other Evidence. We were specifically interested in whether jurors mentioned the DNA evidence and how they captured complex evidence. When DNA evidence was mention we coded the content of the DNA reference. Mentions of the characteristics of mtDNA vs nDNA, the DNA match process or who could be excluded, heteroplasmy, references to database size, and other references were coded. Reliability. When referencing DNA evidence, we were interested in whether jurors mentioned the evidence reliability. Any specific mention of reliability of DNA evidence was noted (e.g., “MT DNA is not as powerful, more prone to error”). Expert Qualification. Finally, we were interested in whether jurors noted an expert’s qualifications. All references were coded (e.g., “Forensic analyst”). Results On average, jurors took 53 lines of notes (range: 3-137 lines). Most (83%) mentioned jury instructions before moving on to case specific information. The majority of references to evidence were gist references (54%) focusing on non-scientific evidence and scientific expert testimony equally (50%). When jurors encoded information using specific references (46%), they referenced non-scientific evidence and expert testimony equally as well (50%). Thirty-three percent of lines were devoted to expert testimony with every juror including at least one line. References to the DNA evidence were usually focused on who could be excluded from the FBIs database (43%), followed by references to differences between mtDNA vs nDNA (30%), and mentions of the size of the database (11%). Less frequently, references to DNA evidence focused on heteroplasmy (5%). Of those references that did not fit into a coding category (11%), most focused on the DNA extraction process, general information about DNA, and the uniqueness of DNA. We further coded references to DNA reliability (15%) as well as references to specific statistical information (14%). Finally, 40% of jurors made reference to an expert’s qualifications. Conclusion Jury note content analysis can reveal important information about how jurors capture trial information (e.g., gist vs verbatim), what evidence they consider important, and what they consider relevant and irrelevant. In our case, it appeared jurors largely created gist representations of information that focused equally on non-scientific evidence and scientific expert testimony. This finding suggests note taking may serve not only to represent information verbatim, but also and perhaps mostly as a general memory aid summarizing the meaning of evidence. Further, jurors’ references to evidence tended to be equally focused on the non-scientific evidence and the scientifically complex DNA evidence. This observation suggests jurors may attend just as much to non-scientific evidence as they to do complex scientific evidence in cases involving complicated evidence – an observation that might inform future work on understanding how jurors interpret evidence in cases with complex information. Learning objective: Participants will be able to describe emerging evidence about how jurors take notes during trial.« less
  2. The deep sea (>500 m ocean depth) is the largest global habitat, characterized by cool temperatures, low ambient light, and food-poor conditions relative to shallower waters. Deep-sea teleosts generally grow more slowly than those inhabiting shallow water. However, this is a generalization, and even amongst deep-sea teleosts, there is a broad continuum of growth rates. The importance of potential drivers of growth rate variability amongst deep-sea species, such as temperature, food availability, oxygen concentration, metabolic rate, and phylogeny, have yet to be fully evaluated. We present a meta-analysis in which age and size data were collected for 53 species of teleosts whose collective depth ranges span from surface waters to 4000 m. We calculated growth metrics using both calendar and thermal age, and compared them with environmental, ecological, and phylogenetic variables. Temperature alone explained up to 30% of variation in the von Bertalanffy growth coefficient ( K , yr -1 ), and 21% of the variation in the average annual increase in mass (AIM, %), a metric of growth prior to maturity. After correcting for temperature effects, depth was still a significant driver of growth, explaining up to 20 and 10% of the remaining variation in K and AIM, respectively.more »Oxygen concentration also explained ~11% of remaining variation in AIM following temperature correction. Relatively minor amounts of variation may be explained by food availability, phylogeny, and the locomotory mode of the teleosts. We also found strong correlation between growth and metabolic rate, which may be an underlying driver also related to temperature, depth, and other factors, or the 2 parameters may simply covary as a result of being linked by evolutionary pressures. Evaluating the influence of ecological and/or environmental drivers of growth is a vital step in understanding both the evolution of life history parameters across the depth continuum as well as their implications for species’ resilience to increasing anthropogenic stressors.« less
  3. Decoding auditory stimulus from neural activity can enable neuroprosthetics and direct communication with the brain. Some recent studies have shown successful speech decoding from intracranial recording using deep learning models. However, scarcity of training data leads to low quality speech reconstruction which prevents a complete brain-computer-interface (BCI) application. In this work, we propose a transfer learning approach with a pre-trained GAN to disentangle representation and generation layers for decoding. We first pre-train a generator to produce spectrograms from a representation space using a large corpus of natural speech data. With a small amount of paired data containing the stimulus speech and corresponding ECoG signals, we then transfer it to a bigger network with an encoder attached before, which maps the neural signal to the representation space. To further improve the network generalization ability, we introduce a Gaussian prior distribution regularizer on the latent representation during the transfer phase. With at most 150 training samples for each tested subject, we achieve a state-of-the-art decoding performance. By visualizing the attention mask embedded in the encoder, we observe brain dynamics that are consistent with findings from previous studies investigating dynamics in the superior temporal gyrus (STG), pre-central gyrus (motor) and inferior frontal gyrusmore »(IFG). Our findings demonstrate a high reconstruction accuracy using deep learning networks together with the potential to elucidate interactions across different brain regions during a cognitive task.« less
  4. Abstract Expert testimony varies in scientific quality and jurors have a difficult time evaluating evidence quality (McAuliff et al., 2009). In the current study, we apply Fuzzy Trace Theory principles, examining whether visual and gist aids help jurors calibrate to the strength of scientific evidence. Additionally we were interested in the role of jurors’ individual differences in scientific reasoning skills in their understanding of case evidence. Contrary to our preregistered hypotheses, there was no effect of evidence condition or gist aid on evidence understanding. However, individual differences between jurors’ numeracy skills predicted evidence understanding. Summary Poor-quality expert evidence is sometimes admitted into court (Smithburn, 2004). Jurors’ calibration to evidence strength varies widely and is not robustly understood. For instance, previous research has established jurors lack understanding of the role of control groups, confounds, and sample sizes in scientific research (McAuliff, Kovera, & Nunez, 2009; Mill, Gray, & Mandel, 1994). Still others have found that jurors can distinguish weak from strong evidence when the evidence is presented alone, yet not when simultaneously presented with case details (Smith, Bull, & Holliday, 2011). This research highlights the need to present evidence to jurors in a way they can understand. Fuzzy Trace Theory purportsmore »that people encode information in exact, verbatim representations and through “gist” representations, which represent summary of meaning (Reyna & Brainerd, 1995). It is possible that the presenting complex scientific evidence to people with verbatim content or appealing to the gist, or bottom-line meaning of the information may influence juror understanding of that evidence. Application of Fuzzy Trace Theory in the medical field has shown that gist representations are beneficial for helping laypeople better understand risk and benefits of medical treatment (Brust-Renck, Reyna, Wilhelms, & Lazar, 2016). Yet, little research has applied Fuzzy Trace Theory to information comprehension and application within the context of a jury (c.f. Reyna et. al., 2015). Additionally, it is likely that jurors’ individual characteristics, such as scientific reasoning abilities and cognitive tendencies, influence their ability to understand and apply complex scientific information (Coutinho, 2006). Methods The purpose of this study was to examine how jurors calibrate to the strength of scientific information, and whether individual difference variables and gist aids inspired by Fuzzy Trace Theory help jurors better understand complicated science of differing quality. We used a 2 (quality of scientific evidence: high vs. low) x 2 (decision aid to improve calibration - gist information vs. no gist information), between-subjects design. All hypotheses were preregistered on the Open Science Framework. Jury-eligible community participants (430 jurors across 90 juries; Mage = 37.58, SD = 16.17, 58% female, 56.93% White). Each jury was randomly assigned to one of the four possible conditions. Participants were asked to individually fill out measures related to their scientific reasoning skills prior to watching a mock jury trial. The trial was about an armed bank robbery and consisted of various pieces of testimony and evidence (e.g. an eyewitness testimony, police lineup identification, and a sweatshirt found with the stolen bank money). The key piece of evidence was mitochondrial DNA (mtDNA) evidence collected from hair on a sweatshirt (materials from Hans et al., 2011). Two experts presented opposing opinions about the scientific evidence related to the mtDNA match estimate for the defendant’s identification. The quality and content of this mtDNA evidence differed based on the two conditions. The high quality evidence condition used a larger database than the low quality evidence to compare to the mtDNA sample and could exclude a larger percentage of people. In the decision aid condition, experts in the gist information group presented gist aid inspired visuals and examples to help explain the proportion of people that could not be excluded as a match. Those in the no gist information group were not given any aid to help them understand the mtDNA evidence presented. After viewing the trial, participants filled out a questionnaire on how well they understood the mtDNA evidence and their overall judgments of the case (e.g. verdict, witness credibility, scientific evidence strength). They filled this questionnaire out again after a 45-minute deliberation. Measures We measured Attitudes Toward Science (ATS) with indices of scientific promise and scientific reservations (Hans et al., 2011; originally developed by National Science Board, 2004; 2006). We used Drummond and Fischhoff’s (2015) Scientific Reasoning Scale (SRS) to measure scientific reasoning skills. Weller et al.’s (2012) Numeracy Scale (WNS) measured proficiency in reasoning with quantitative information. The NFC-Short Form (Cacioppo et al., 1984) measured need for cognition. We developed a 20-item multiple-choice comprehension test for the mtDNA scientific information in the cases (modeled on Hans et al., 2011, and McAuliff et al., 2009). Participants were shown 20 statements related to DNA evidence and asked whether these statements were True or False. The test was then scored out of 20 points. Results For this project, we measured calibration to the scientific evidence in a few different ways. We are building a full model with these various operationalizations to be presented at APLS, but focus only on one of the calibration DVs (i.e., objective understanding of the mtDNA evidence) in the current proposal. We conducted a general linear model with total score on the mtDNA understanding measure as the DV and quality of scientific evidence condition, decision aid condition, and the four individual difference measures (i.e., NFC, ATS, WNS, and SRS) as predictors. Contrary to our main hypotheses, neither evidence quality nor decision aid condition affected juror understanding. However, the individual difference variables did: we found significant main effects for Scientific Reasoning Skills, F(1, 427) = 16.03, p <.001, np2 = .04, Weller Numeracy Scale, F(1, 427) = 15.19, p <.001, np2 = .03, and Need for Cognition, F(1, 427) = 16.80, p <.001, np2 = .04, such that those who scored higher on these measures displayed better understanding of the scientific evidence. In addition there was a significant interaction of evidence quality condition and scores on the Weller’s Numeracy Scale, F(1, 427) = 4.10, p = .04, np2 = .01. Further results will be discussed. Discussion These data suggest jurors are not sensitive to differences in the quality of scientific mtDNA evidence, and also that our attempt at helping sensitize them with Fuzzy Trace Theory-inspired aids did not improve calibration. Individual scientific reasoning abilities and general cognition styles were better predictors of understanding this scientific information. These results suggest a need for further exploration of approaches to help jurors differentiate between high and low quality evidence. Note: The 3rd author was supported by an AP-LS AP Award for her role in this research. Learning Objective: Participants will be able to describe how individual differences in scientific reasoning skills help jurors understand complex scientific evidence.« less
  5. Embedding properties of network realizations of dissipative reduced order models Jörn Zimmerling, Mikhail Zaslavsky,Rob Remis, Shasri Moskow, Alexander Mamonov, Murthy Guddati, Vladimir Druskin, and Liliana Borcea Mathematical Sciences Department, Worcester Polytechnic Institute https://www.wpi.edu/people/vdruskin Abstract Realizations of reduced order models of passive SISO or MIMO LTI problems can be transformed to tridiagonal and block-tridiagonal forms, respectively, via dierent modications of the Lanczos algorithm. Generally, such realizations can be interpreted as ladder resistor-capacitor-inductor (RCL) networks. They gave rise to network syntheses in the rst half of the 20th century that was at the base of modern electronics design and consecutively to MOR that tremendously impacted many areas of engineering (electrical, mechanical, aerospace, etc.) by enabling ecient compression of the underlining dynamical systems. In his seminal 1950s works Krein realized that in addition to their compressing properties, network realizations can be used to embed the data back into the state space of the underlying continuum problems. In more recent works of the authors Krein's ideas gave rise to so-called nite-dierence Gaussian quadrature rules (FDGQR), allowing to approximately map the ROM state-space representation to its full order continuum counterpart on a judicially chosen grid. Thus, the state variables can be accessed directly from themore »transfer function without solving the full problem and even explicit knowledge of the PDE coecients in the interior, i.e., the FDGQR directly learns" the problem from its transfer function. This embedding property found applications in PDE solvers, inverse problems and unsupervised machine learning. Here we show a generalization of this approach to dissipative PDE problems, e.g., electromagnetic and acoustic wave propagation in lossy dispersive media. Potential applications include solution of inverse scattering problems in dispersive media, such as seismic exploration, radars and sonars. To x the idea, we consider a passive irreducible SISO ROM fn(s) = Xn j=1 yi s + σj , (62) assuming that all complex terms in (62) come in conjugate pairs. We will seek ladder realization of (62) as rjuj + vj − vj−1 = −shˆjuj , uj+1 − uj + ˆrj vj = −shj vj , (63) for j = 0, . . . , n with boundary conditions un+1 = 0, v1 = −1, and 4n real parameters hi, hˆi, ri and rˆi, i = 1, . . . , n, that can be considered, respectively, as the equivalent discrete inductances, capacitors and also primary and dual conductors. Alternatively, they can be viewed as respectively masses, spring stiness, primary and dual dampers of a mechanical string. Reordering variables would bring (63) into tridiagonal form, so from the spectral measure given by (62 ) the coecients of (63) can be obtained via a non-symmetric Lanczos algorithm written in J-symmetric form and fn(s) can be equivalently computed as fn(s) = u1. The cases considered in the original FDGQR correspond to either (i) real y, θ or (ii) real y and imaginary θ. Both cases are covered by the Stieltjes theorem, that yields in case (i) real positive h, hˆ and trivial r, rˆ, and in case (ii) real positive h,r and trivial hˆ,rˆ. This result allowed us a simple interpretation of (62) as the staggered nite-dierence approximation of the underlying PDE problem [2]. For PDEs in more than one variables (including topologically rich data-manifolds), a nite-dierence interpretation is obtained via a MIMO extensions in block form, e.g., [4, 3]. The main diculty of extending this approach to general passive problems is that the Stieltjes theory is no longer applicable. Moreover, the tridiagonal realization of a passive ROM transfer function (62) via the ladder network (63) cannot always be obtained in port-Hamiltonian form, i.e., the equivalent primary and dual conductors may change sign [1]. 100 Embedding of the Stieltjes problems, e.g., the case (i) was done by mapping h and hˆ into values of acoustic (or electromagnetic) impedance at grid cells, that required a special coordinate stretching (known as travel time coordinate transform) for continuous problems. Likewise, to circumvent possible non-positivity of conductors for the non-Stieltjes case, we introduce an additional complex s-dependent coordinate stretching, vanishing as s → ∞ [1]. This stretching applied in the discrete setting induces a diagonal factorization, removes oscillating coecients, and leads to an accurate embedding for moderate variations of the coecients of the continuum problems, i.e., it maps discrete coecients onto the values of their continuum counterparts. Not only does this embedding yields an approximate linear algebraic algorithm for the solution of the inverse problems for dissipative PDEs, it also leads to new insight into the properties of their ROM realizations. We will also discuss another approach to embedding, based on Krein-Nudelman theory [5], that results in special data-driven adaptive grids. References [1] Borcea, Liliana and Druskin, Vladimir and Zimmerling, Jörn, A reduced order model approach to inverse scattering in lossy layered media, Journal of Scientic Computing, V. 89, N1, pp. 136,2021 [2] Druskin, Vladimir and Knizhnerman, Leonid, Gaussian spectral rules for the three-point second dierences: I. A two-point positive denite problem in a semi-innite domain, SIAM Journal on Numerical Analysis, V. 37, N 2, pp.403422, 1999 [3] Druskin, Vladimir and Mamonov, Alexander V and Zaslavsky, Mikhail, Distance preserving model order reduction of graph-Laplacians and cluster analysis, Druskin, Vladimir and Mamonov, Alexander V and Zaslavsky, Mikhail, Journal of Scientic Computing, V. 90, N 1, pp 130, 2022 [4] Druskin, Vladimir and Moskow, Shari and Zaslavsky, Mikhail LippmannSchwingerLanczos algorithm for inverse scattering problems, Inverse Problems, V. 37, N. 7, 2021, [5] Mark Adolfovich Nudelman The Krein String and Characteristic Functions of Maximal Dissipative Operators, Journal of Mathematical Sciences, 2004, V 124, pp 49184934 Go back to Plenary Speakers Go back to Speakers Go back« less