
Title: Shape-based functional data analysis

Functional data analysis (FDA) is a fast-growing area of research and development in statistics. While most FDA literature imposes the classical $$\mathbb{L}^2$$ Hilbert structure on function spaces, there is an emergent need for a different, shape-based approach for analyzing functional data. This paper reviews and develops fundamental geometrical concepts that help connect traditionally diverse fields of shape and functional analyses. It showcases that focusing on shapes is often more appropriate when structural features (number of peaks and valleys and their heights) carry salient information in data. It recaps recent mathematical representations and associated procedures for comparing, summarizing, and testing the shapes of functions. Specifically, it discusses three tasks: shape fitting, shape fPCA, and shape regression models. The latter refers to the models that separate the shapes of functions from their phases and use them individually in regression analysis. The ensuing results provide better interpretations and tend to preserve geometric structures. The paper also discusses an extension where the functions are not real-valued but manifold-valued. The article presents several examples of this shape-centric functional data analysis using simulated and real data.
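As an illustration of why the plain L2 metric can be a poor match for shape, the following minimal sketch compares two sampled functions that share the same peaks (number and heights) but differ in phase. It is not taken from the paper: the test function, the warping `gamma(t) = t**2`, and the helper names `l2_distance` and `count_peaks` are all assumptions chosen for the example.

```python
import numpy as np


def l2_distance(f, g, t):
    """Approximate the L2 distance between two functions sampled on a uniform grid t."""
    dt = t[1] - t[0]
    return float(np.sqrt(np.sum((f - g) ** 2) * dt))


def count_peaks(f):
    """Count strict interior local maxima of a sampled function."""
    interior = f[1:-1]
    return int(np.sum((interior > f[:-2]) & (interior > f[2:])))


t = np.linspace(0.0, 1.0, 2001)

# Hypothetical phase (time-warping) function: increasing, gamma(0)=0, gamma(1)=1.
gamma = t ** 2

f = np.sin(2 * np.pi * t) ** 2      # two peaks of height 1
g = np.sin(2 * np.pi * gamma) ** 2  # same function composed with the warping

dist = l2_distance(f, g, t)
peaks_f, peaks_g = count_peaks(f), count_peaks(g)

print("L2 distance:", dist)
print("peaks:", peaks_f, peaks_g)
print("peak heights:", f.max(), g.max())
```

Under these assumptions the two curves have the same number of peaks and essentially the same peak heights, yet their L2 distance is far from zero, because the peaks are not aligned in time. A shape-based comparison would treat this difference as phase rather than shape.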

Publisher / Repository:
Springer Science + Business Media
Medium: X Size: p. 1-47
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    It has been recently established in David and Mayboroda (Approximation of green functions and domains with uniformly rectifiable boundaries of all dimensions. arXiv:2010.09793) that on uniformly rectifiable sets the Green function is almost affine in the weak sense, and moreover, in some scenarios such Green function estimates are equivalent to the uniform rectifiability of a set. The present paper tackles a strong analogue of these results, starting with the “flagship” degenerate operators on sets with lower dimensional boundaries. We consider the elliptic operators $$L_{\beta,\gamma} = -{\text{div}} D^{d+1+\gamma-n} \nabla$$ associated to a domain $$\Omega \subset {\mathbb{R}}^n$$ with a uniformly rectifiable boundary $$\Gamma$$ of dimension $$d < n-1$$, the now usual distance to the boundary $$D = D_\beta$$ given by $$D_\beta(X)^{-\beta} = \int_{\Gamma} |X-y|^{-d-\beta} \, d\sigma(y)$$ for $$X \in \Omega$$, where $$\beta > 0$$ and $$\gamma \in (-1,1)$$. In this paper we show that the Green function $$G$$ for $$L_{\beta,\gamma}$$, with pole at infinity, is well approximated by multiples of $$D^{1-\gamma}$$, in the sense that the function $$\big| D \nabla \big( \ln \big( \frac{G}{D^{1-\gamma}} \big) \big) \big|^2$$ satisfies a Carleson measure estimate on $$\Omega$$. We underline that the strong and the weak results are different in nature and, of course, at the level of the proofs: the latter extensively used compactness arguments, while the present paper relies on some intricate integration by parts and the properties of the “magical” distance function from David et al. (Duke Math J, to appear).

  2. Abstract

    Harmonic Hilbert spaces on locally compact abelian groups are reproducing kernel Hilbert spaces (RKHSs) of continuous functions constructed by Fourier transform of weighted $$L^2$$ spaces on the dual group. It is known that for suitably chosen subadditive weights, every such space is a Banach algebra with respect to pointwise multiplication of functions. In this paper, we study RKHSs associated with subconvolutive functions on the dual group. Sufficient conditions are established for these spaces to be symmetric Banach $$^*$$-algebras with respect to pointwise multiplication and complex conjugation of functions (here referred to as RKHAs). In addition, we study aspects of the spectra and state spaces of RKHAs. Sufficient conditions are established for an RKHA on a compact abelian group $$G$$ to have the same spectrum as the $$C^*$$-algebra of continuous functions on $$G$$. We also consider one-parameter families of RKHSs associated with semigroups of self-adjoint Markov operators on $$L^2(G)$$, and show that in this setting subconvolutivity is a necessary and sufficient condition for these spaces to have RKHA structure. Finally, we establish embedding relationships between RKHAs and a class of Fourier–Wermer algebras that includes spaces of dominating mixed smoothness used in high-dimensional function approximation.

  3. Abstract

    In this paper, we study multistage stochastic mixed-integer nonlinear programs (MS-MINLP). This general class of problems encompasses, as important special cases, multistage stochastic convex optimization with non-Lipschitzian value functions and multistage stochastic mixed-integer linear optimization. We develop stochastic dual dynamic programming (SDDP) type algorithms with nested decomposition, deterministic sampling, and stochastic sampling. The key ingredient is a new type of cuts based on generalized conjugacy. Several interesting classes of MS-MINLP are identified, where the new algorithms are guaranteed to obtain the global optimum without the assumption of complete recourse. This significantly generalizes the classic SDDP algorithms. We also characterize the iteration complexity of the proposed algorithms. In particular, for a $$(T+1)$$-stage stochastic MINLP satisfying $$L$$-exact Lipschitz regularization with $$d$$-dimensional state spaces, to obtain an $$\varepsilon$$-optimal root node solution, we prove that the number of iterations of the proposed deterministic sampling algorithm is upper bounded by $${\mathcal{O}}((\frac{2LT}{\varepsilon})^d)$$, and is lower bounded by $${\mathcal{O}}((\frac{LT}{4\varepsilon})^d)$$ for the general case or by $${\mathcal{O}}((\frac{LT}{8\varepsilon})^{d/2-1})$$ for the convex case. This shows that the obtained complexity bounds are rather sharp. It also reveals that the iteration complexity depends polynomially on the number of stages. We further show that the iteration complexity depends linearly on $$T$$, if all the state spaces are finite sets, or if we seek a $$(T\varepsilon)$$-optimal solution when the state spaces are infinite sets, i.e. allowing the optimality gap to scale with $$T$$. To the best of our knowledge, this is the first work that reports global optimization algorithms as well as iteration complexity results for solving such a large class of multistage stochastic programs.
The iteration complexity study resolves a conjecture by the late Prof. Shabbir Ahmed in the general setting of multistage stochastic mixed-integer optimization.

  4. Abstract

    We propose a new observable for the measurement of the forward–backward asymmetry $$(A_{FB})$$ in Drell–Yan lepton production. At hadron colliders, the $$A_{FB}$$ distribution is sensitive to both the electroweak (EW) fundamental parameter $$\sin^{2} \theta_{W}$$, the weak mixing angle, and the parton distribution functions (PDFs). Hence, the determination of $$\sin^{2} \theta_{W}$$ and the updating of PDFs by directly using the same $$A_{FB}$$ spectrum are strongly correlated. This correlation would introduce large bias or uncertainty into both precise measurements of EW and PDF sectors. In this article, we show that the sensitivity of $$A_{FB}$$ on $$\sin^{2} \theta_{W}$$ is dominated by its average value around the $$Z$$ pole region, while the shape (or gradient) of the $$A_{FB}$$ spectrum is insensitive to $$\sin^{2} \theta_{W}$$ and contains important information on the PDF modeling. Accordingly, a new observable related to the gradient of the spectrum is introduced, and demonstrated to be able to significantly reduce the potential bias on the determination of $$\sin^{2} \theta_{W}$$ when updating the PDFs using the same $$A_{FB}$$ data.

  5. Abstract

    This paper presents a search for dark matter, $$\chi$$, using events with a single top quark and an energetic $$W$$ boson. The analysis is based on proton–proton collision data collected with the ATLAS experiment at $$\sqrt{s}=13$$ TeV during LHC Run 2 (2015–2018), corresponding to an integrated luminosity of 139 fb$$^{-1}$$. The search considers final states with zero or one charged lepton (electron or muon), at least one $$b$$-jet and large missing transverse momentum. In addition, a result from a previous search considering two-charged-lepton final states is included in the interpretation of the results. The data are found to be in good agreement with the Standard Model predictions and the results are interpreted in terms of 95% confidence-level exclusion limits in the context of a class of dark matter models involving an extended two-Higgs-doublet sector together with a pseudoscalar mediator particle. The search is particularly sensitive to on-shell production of the charged Higgs boson state, $$H^{\pm}$$, arising from the two-Higgs-doublet mixing, and its semi-invisible decays via the mediator particle, $$a$$: $$H^{\pm} \rightarrow W^{\pm} a (\rightarrow \chi \chi)$$. Signal models with $$H^{\pm}$$ masses up to 1.5 TeV and $$a$$ masses up to 350 GeV are excluded assuming a $$\tan \beta$$ value of 1. For masses of $$a$$ of 150 (250) GeV, $$\tan \beta$$ values up to 2 are excluded for $$H^{\pm}$$ masses between 200 (400) GeV and 1.5 TeV. Signals with $$\tan \beta$$ values between 20 and 30 are excluded for $$H^{\pm}$$ masses between 500 and 800 GeV.
