Outlier detection is a statistical procedure that aims to find suspicious events or items that are different from the normal form of a dataset. It has drawn considerable interest in the field of data mining and machine learning. Outlier detection is important in many applications, including fraud detection in credit card transactions and network intrusion detection. There are two general types of outlier detection: global and local. Global outliers fall outside the normal range for an entire dataset, whereas local outliers may fall within the normal range for the entire dataset, but outside the normal range for the surrounding data points. This paper addresses local outlier detection. The best-known technique for local outlier detection is the Local Outlier Factor (LOF), a density-based technique. There are many LOF algorithms for a static data environment; however, these algorithms cannot be applied directly to data streams, which are an important type of big data. In general, local outlier detection algorithms for data streams are still deficient and better algorithms need to be developed that can effectively analyze the high velocity of data streams to detect local outliers. This paper presents a literature review of local outlier detection algorithms in static and stream environments, …
QSAR without borders
Prediction of chemical bioactivity and physical properties has been one of the most important applications of statistical and, more recently, machine learning and artificial intelligence methods in the chemical sciences. This field of research, broadly known as quantitative structure–activity relationships (QSAR) modeling, has developed many important algorithms and has found a broad range of applications in physical organic and medicinal chemistry in the past 55+ years. This Perspective summarizes recent technological advances in QSAR modeling, but it also highlights the applicability of algorithms, modeling methods, and validation practices developed in QSAR to a wide range of research areas outside of traditional QSAR boundaries, including synthesis planning, nanotechnology, materials science, biomaterials, and clinical informatics. As modern research methods generate rapidly increasing amounts of data, the knowledge of robust data-driven modeling methods professed within the QSAR field can become essential for scientists working both within and outside of chemical research. We hope that this contribution highlighting the generalizable components of QSAR modeling will serve to address this challenge.
- Award ID(s): 1802831
- Publication Date:
- NSF-PAR ID: 10198968
- Journal Name: Chemical Society Reviews
- Volume: 49
- Issue: 11
- Page Range or eLocation-ID: 3525 to 3564
- ISSN: 0306-0012
- Sponsoring Org: National Science Foundation
More Like this
Abstract
This dataset includes rainfall, cloud, river and stream hydrochemistry of the Plynlimon research catchments. The data are from weekly monitoring of stream hydrochemistry of the River Hafren (Severn) at both the Lower and Upper Hafren sites from 1998, stream hydrochemistry of the River Hore at the Lower Hore site from 1983 and the Upper Hore site from 1984, as well as rainfall hydrochemistry near the Carreg Wen meteorological site from 1983 and cloud hydrochemistry near the Carreg Wen meteorological site from 1990. Data for over 50 chemical determinands are presented alongside data for some in-situ measurements such as water temperature. Full descriptions of the analytical methods used for each determinand are included. The Plynlimon research catchments lie within the headwaters of the River Severn and the River Wye in the uplands of mid-Wales. Intensive and long-term monitoring within the catchments underpins a wealth of hydrological and hydrochemical research; other linked datasets include river flow, meteorology and a variety of detailed spatial datasets representing the topography, soils and rivers of the catchments. Monitoring is funded by the Centre for Ecology & Hydrology and has been ongoing since 1968.
Methods
Originally designed to improve understanding of water use by coniferous forests, monitoring within …
-
Machine learning represents a milestone in data-driven research, including material informatics, robotics, and computer-aided drug discovery. With the continuously growing virtual and synthetically available chemical space, efficient and robust quantitative structure–activity relationship (QSAR) methods are required to uncover molecules with desired properties. Herein, we propose variable-length-array SMILES-based (VLA-SMILES) structural descriptors that expand conventional SMILES descriptors widely used in machine learning. This structural representation extends the family of numerically coded SMILES, particularly binary SMILES, to expedite the discovery of new deep learning QSAR models with high predictive ability. VLA-SMILES descriptors were shown to speed up the training of QSAR models based on multilayer perceptron (MLP) with optimized backpropagation (ATransformedBP), resilient propagation (iRPROP‒), and Adam optimization learning algorithms featuring rational train–test splitting, while improving the predictive ability toward the more compute-intensive binary SMILES representation format. All the tested MLPs under the same length-array-based SMILES descriptors showed similar predictive ability and convergence rate of training in combination with the considered learning procedures. Validation with the Kennard–Stone train–test splitting based on the structural descriptor similarity metrics was found to be more effective than the partitioning with the ranking by activity based on biological activity values metrics for the entire set of VLA-SMILES featured QSAR. Robustness and …
-
BACKGROUND
Optical sensing devices measure the rich physical properties of an incident light beam, such as its power, polarization state, spectrum, and intensity distribution. Most conventional sensors, such as power meters, polarimeters, spectrometers, and cameras, are monofunctional and bulky. For example, classical Fourier-transform infrared spectrometers and polarimeters, which characterize the optical spectrum in the infrared and the polarization state of light, respectively, can occupy a considerable portion of an optical table. Over the past decade, the development of integrated sensing solutions by using miniaturized devices together with advanced machine-learning algorithms has accelerated rapidly, and optical sensing research has evolved into a highly interdisciplinary field that encompasses devices and materials engineering, condensed matter physics, and machine learning. To this end, future optical sensing technologies will benefit from innovations in device architecture, discoveries of new quantum materials, demonstrations of previously uncharacterized optical and optoelectronic phenomena, and rapid advances in the development of tailored machine-learning algorithms.
ADVANCES
Recently, a number of sensing and imaging demonstrations have emerged that differ substantially from conventional sensing schemes in the way that optical information is detected. A typical example is computational spectroscopy. In this new paradigm, a compact spectrometer first collectively captures the comprehensive spectral information of …
-
Research interest in nanoscale biomaterials has continued to grow in the past few decades, driving the need to form families of nanomaterials grouped by similar physical or chemical properties. Nanotubes have occupied a unique space in this field, primarily due to their high versatility in a wide range of biomedical applications. Although similar in morphology, members of this nanomaterial family differ widely in synthesis methods, mechanical and physicochemical properties, and therapeutic applications. As this field continues to develop, it is important to provide insight into novel biomaterial developments and their overall impact on current technology and therapeutics. In this review, we aim to characterize and compare two members of the nanotube family: carbon nanotubes (CNTs) and Janus base nanotubes (JBNts). While CNTs have been extensively studied for decades, JBNts provide a fresh perspective on many therapeutic modalities bound by the limitations of carbon-based nanomaterials. Herein, we characterize the morphology, synthesis, and applications of CNTs and JBNts to provide a comprehensive comparison between these nanomaterial technologies.