Title: FLAME: A Fast Large-scale Almost Matching Exactly Approach to Causal Inference
A classical problem in causal inference is that of matching, where treatment units need to be matched to control units based on covariate information. In this work, we propose a method that computes high quality almost-exact matches for high-dimensional categorical datasets. This method, called FLAME (Fast Large-scale Almost Matching Exactly), learns a distance metric for matching using a hold-out training data set. In order to perform matching efficiently for large datasets, FLAME leverages techniques that are natural for query processing in the area of database management, and two implementations of FLAME are provided: the first uses SQL queries and the second uses bit-vector techniques. The algorithm starts by constructing matches of the highest quality (exact matches on all covariates), and successively eliminates variables in order to match exactly on as many variables as possible, while still maintaining interpretable high-quality matches and balance between treatment and control groups. We leverage these high quality matches to estimate conditional average treatment effects (CATEs). Our experiments show that FLAME scales to huge datasets with millions of observations where existing state-of-the-art methods fail, and that it achieves significantly better performance than other matching methods. more »« less
Awan, Usaid; Morucci, Marco; Orlandi, Vittorio; Roy, Sudeepa; Rudin, Cynthia; Volfovsky, Alexander
(, Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics)
null
(Ed.)
We propose a matching method that recovers direct treatment effects from randomized experiments where units are connected in an observed network, and units that share edges can potentially influence each others’ outcomes. Traditional treatment effect estimators for randomized experiments are biased and error prone in this setting. Our method matches units almost exactly on counts of unique subgraphs within their neighborhood graphs. The matches that we construct are interpretable and high-quality. Our method can be extended easily to accommodate additional unit-level covariate information. We show empirically that our method performs better than other existing methodologies for this problem, while producing meaningful, interpretable results.
Harsh Parikh; Cynthia Rudin; Alexander Volfovsky
(, Journal of machine learning research)
We introduce a flexible framework that produces high-quality almost-exact matches for causal inference. Most prior work in matching uses ad-hoc distance metrics, often leading to poor quality matches, particularly when there are irrelevant covariates. In this work, we learn an interpretable distance metric for matching, which leads to substantially higher quality matches. The learned distance metric stretches the covariate space according to each covariate's contribution to outcome prediction: this stretching means that mismatches on important covariates carry a larger penalty than mismatches on irrelevant covariates. Our ability to learn flexible distance metrics leads to matches that are interpretable and useful for the estimation of conditional average treatment effects.
Balsara, Dinshaw S.; Samantaray, Saurav; Subramanian, Sethupathy
(, Communications on Applied Mathematics and Computation)
Abstract Adaptive mesh refinement (AMR) is the art of solving PDEs on a mesh hierarchy with increasing mesh refinement at each level of the hierarchy. Accurate treatment on AMR hierarchies requires accurate prolongation of the solution from a coarse mesh to a newly defined finer mesh. For scalar variables, suitably high-order finite volume WENO methods can carry out such a prolongation. However, classes of PDEs, such as computational electrodynamics (CED) and magnetohydrodynamics (MHD), require that vector fields preserve a divergence constraint. The primal variables in such schemes consist of normal components of the vector field that are collocated at the faces of the mesh. As a result, the reconstruction and prolongation strategies for divergence constraint-preserving vector fields are necessarily more intricate. In this paper we present a fourth-order divergence constraint-preserving prolongation strategy that is analytically exact. Extension to higher orders using analytically exact methods is very challenging. To overcome that challenge, a novel WENO-like reconstruction strategy is invented that matches the moments of the vector field in the faces, where the vector field components are collocated. This approach is almost divergence constraint-preserving, therefore, we call it WENO-ADP. To make it exactly divergence constraint-preserving, a touch-up procedure is developed that is based on a constrained least squares (CLSQ) method for restoring the divergence constraint up to machine accuracy. With the touch-up, it is called WENO-ADPT. It is shown that refinement ratios of two and higher can be accommodated. An item of broader interest in this work is that we have also been able to invent very efficient finite volume WENO methods, where the coefficients are very easily obtained and the multidimensional smoothness indicators can be expressed as perfect squares. We demonstrate that the divergence constraint-preserving strategy works at several high orders for divergence-free vector fields as well as vector fields, where the divergence of the vector field has to match a charge density and its higher moments. We also show that our methods overcome the late time instability that has been known to plague adaptive computations in CED.
Mundra, Pranay; Zhang, Jianhao; Nargesian, Fatemeh; Augsten, Nikolaus
(, 2023 IEEE 39th International Conference on Data Engineering (ICDE))
We study the top-k set similarity search problem using semantic overlap. While vanilla overlap requires exact matches between set elements, semantic overlap allows elements that are syntactically different but semantically related to increase the overlap. The semantic overlap is the maximum matching score of a bipartite graph, where an edge weight between two set elements is defined by a user-defined similarity function, e.g., cosine similarity between embeddings. Common techniques like token indexes fail for semantic search since similar elements may be unrelated at the character level. Further, verifying candidates is expensive (cubic versus linear for syntactic overlap), calling for highly selective filters. We propose Koios, the first exact and efficient algorithm for semantic overlap search. Koios leverages sophisticated filters to minimize the number of required graph-matching calculations. Our experiments show that for medium to large sets less than 5% of the candidate sets need verification, and more than half of those sets are further pruned without requiring the expensive graph matching. We show the efficiency of our algorithm on four real datasets and demonstrate the improved result quality of semantic over vanilla set similarity search.
Bennett, Andrew; Kallus, Nathan; Schnabel, Tobias
(, Advances in neural information processing systems)
Instrumental variable analysis is a powerful tool for estimating causal effects when randomization or full control of confounders is not possible. The application of standard methods such as 2SLS, GMM, and more recent variants are significantly impeded when the causal effects are complex, the instruments are high-dimensional, and/or the treatment is high-dimensional. In this paper, we propose the DeepGMM algorithm to overcome this. Our algorithm is based on a new variational reformulation of GMM with optimal inverse-covariance weighting that allows us to efficiently control very many moment conditions. We further develop practical techniques for optimization and model selection that make it particularly successful in practice. Our algorithm is also computationally tractable and can handle large-scale datasets. Numerical results show our algorithm matches the performance of the best tuned methods in standard settings and continues to work in high-dimensional settings where even recent methods break.
Wang, Tianyu, Morucci, Marco, Awan, M. Usaid, Liu, Yameng, Roy, Sudeepa, Rudin, Cynthia, and Volfovsky, Alexander. FLAME: A Fast Large-scale Almost Matching Exactly Approach to Causal Inference. Retrieved from https://par.nsf.gov/biblio/10291692. Journal of machine learning research 22.31
Wang, Tianyu, Morucci, Marco, Awan, M. Usaid, Liu, Yameng, Roy, Sudeepa, Rudin, Cynthia, and Volfovsky, Alexander.
"FLAME: A Fast Large-scale Almost Matching Exactly Approach to Causal Inference". Journal of machine learning research 22 (31). Country unknown/Code not available. https://par.nsf.gov/biblio/10291692.
@article{osti_10291692,
place = {Country unknown/Code not available},
title = {FLAME: A Fast Large-scale Almost Matching Exactly Approach to Causal Inference},
url = {https://par.nsf.gov/biblio/10291692},
abstractNote = {A classical problem in causal inference is that of matching, where treatment units need to be matched to control units based on covariate information. In this work, we propose a method that computes high quality almost-exact matches for high-dimensional categorical datasets. This method, called FLAME (Fast Large-scale Almost Matching Exactly), learns a distance metric for matching using a hold-out training data set. In order to perform matching efficiently for large datasets, FLAME leverages techniques that are natural for query processing in the area of database management, and two implementations of FLAME are provided: the first uses SQL queries and the second uses bit-vector techniques. The algorithm starts by constructing matches of the highest quality (exact matches on all covariates), and successively eliminates variables in order to match exactly on as many variables as possible, while still maintaining interpretable high-quality matches and balance between treatment and control groups. We leverage these high quality matches to estimate conditional average treatment effects (CATEs). Our experiments show that FLAME scales to huge datasets with millions of observations where existing state-of-the-art methods fail, and that it achieves significantly better performance than other matching methods.},
journal = {Journal of machine learning research},
volume = {22},
number = {31},
author = {Wang, Tianyu and Morucci, Marco and Awan, M. Usaid and Liu, Yameng and Roy, Sudeepa and Rudin, Cynthia and Volfovsky, Alexander},
editor = {null}
}
Warning: Leaving National Science Foundation Website
You are now leaving the National Science Foundation website to go to a non-government website.
Website:
NSF takes no responsibility for and exercises no control over the views expressed or the accuracy of
the information contained on this site. Also be aware that NSF's privacy policy does not apply to this site.