skip to main content

Search for: All records

Creators/Authors contains: "Kahn, Michael G."

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. The quality of data is extremely important for data analytics. Data quality tests typically involve checking constraints specified by domain experts. Existing approaches detect trivial constraint violations and identify outliers without explaining the constraints that were violated. Moreover, domain experts may specify constraints in an ad hoc manner and miss important ones. We describe an automated data quality test approach, ADQuaTe2, which uses an autoencoder to (1) discover constraints that may have been missed by experts, (2) label as suspicious those records that violate the constraints, and (3) provide explanations about the violations. An interactive learning technique incorporates expert feedback, which improves the accuracy. We evaluate the effectiveness of ADQuaTe2 on real-world datasets from health and plant domains. We also use datasets from the UCI repository to evaluate the improvement in the accuracy after incorporating ground truth knowledge.