skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Application of Systems Engineering Principles and Techniques in Biological Big Data Analytics: A Review
In the past few decades, we have witnessed tremendous advancements in biology, life sciences and healthcare. These advancements are due in no small part to the big data made available by various high-throughput technologies, the ever-advancing computing power, and the algorithmic advancements in machine learning. Specifically, big data analytics such as statistical and machine learning has become an essential tool in these rapidly developing fields. As a result, the subject has drawn increased attention and many review papers have been published in just the past few years on the subject. Different from all existing reviews, this work focuses on the application of systems, engineering principles and techniques in addressing some of the common challenges in big data analytics for biological, biomedical and healthcare applications. Specifically, this review focuses on the following three key areas in biological big data analytics where systems engineering principles and techniques have been playing important roles: the principle of parsimony in addressing overfitting, the dynamic analysis of biological data, and the role of domain knowledge in biological data analytics.  more » « less
Award ID(s):
1805950
PAR ID:
10291800
Author(s) / Creator(s):
;
Date Published:
Journal Name:
Processes
Volume:
8
Issue:
8
ISSN:
2227-9717
Page Range / eLocation ID:
951
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    In the last few years, the field of data science has been growing rapidly as various businesses have adopted statistical and machine learning techniques to empower their decision-making and applications. Scaling data analyses to large volumes of data requires the utilization of distributed frameworks. This can lead to serious technical challenges for data analysts and reduce their productivity. AFrame, a data analytics library, is implemented as a layer on top of Apache AsterixDB, addressing these issues by providing the data scientists' familiar interface, Pandas Dataframe, and transparently scaling out the evaluation of analytical operations through a Big Data management system. While AFrame is able to leverage data management facilities (e.g., indexes and query optimization) and allows users to interact with a large volume of data, the initial version only generated SQL++ queries and only operated against AsterixDB. In this work, we describe a new design that retargets AFrame's incremental query formation to other query-based database systems, making it more flexible for deployment against other data management systems with composable query languages. 
    more » « less
  2. null (Ed.)
    Mixed-initiative visual analytics systems incorporate well-established design principles that improve users' abilities to solve problems. As these systems consider whether to take initiative towards achieving user goals, many current systems address the potential for cognitive bias in human initiatives statically, relying on fixed initiatives they can take instead of identifying, communicating and addressing the bias as it occurs. We argue that mixed-initiative design principles can and should incorporate cognitive bias mitigation strategies directly through development of mitigation techniques embedded in the system to address cognitive biases in situ. We identify domain experts in machine learning adopting visual analytics techniques and systems that incorporate existing mixed-initiative principles and examine their potential to support bias mitigation strategies. This examination considers the unique perspective these experts bring to visual analytics and is situated in existing user-centered systems that make exemplary use of design principles informed by cognitive theory. We then suggest informed opportunities for domain experts to take initiative toward addressing cognitive biases in light of their existing contributions to the field. Finally, we contribute open questions and research directions for designers seeking to adopt visual analytics techniques that incorporate bias-aware initiatives in future systems. 
    more » « less
  3. From wool to Kevlar, one-dimensional (1D) fiber has experienced the transition from clothing materials to structural applications in the past centuries. However, the recent advancements in tooling engineering and manufacturing processes have attracted much attention from both academia and industry to fabricate novel, versatile fibers with unique microstructures and unprecedented properties. This mini-review focuses on the fabrication techniques of porous, coaxial, layer-by-layer, and segmented fibers with continuous solution and melt fiber spinning methods. In each section of this review article, the unique structure-related applications, including intelligent devices, healthcare devices, energy storage systems, wearable electronics, and sustainable products, are discussed and evaluated. Finally, the combination of additive manufacturing (AM) for 1D fiber patterning in two-dimensional (2D) and three-dimensional (3D) devices, in addition to challenges in the reviewed fiber microstructures, is briefly introduced in the conclusion section. 
    more » « less
  4. null (Ed.)
    The advancements of information technology and related processing techniques have created a fertile base for progress in many scientific fields and industries. In the fields of drug discovery and development, machine learning techniques have been used for the development of novel drug candidates. The methods for designing drug targets and novel drug discovery now routinely combine machine learning and deep learning algorithms to enhance the efficiency, efficacy, and quality of developed outputs. The generation and incorporation of big data, through technologies such as high-throughput screening and high through-put computational analysis of databases used for both lead and target discovery, has increased the reliability of the machine learning and deep learning incorporated techniques. The use of these virtual screening and encompassing online information has also been highlighted in developing lead synthesis pathways. In this review, machine learning and deep learning algorithms utilized in drug discovery and associated techniques will be discussed. The applications that produce promising results and methods will be reviewed. 
    more » « less
  5. null (Ed.)
    Research and experimentation using big data sets, specifically large sets of electronic health records (EHR) and social media data, is demonstrating the potential to understand the spread of diseases and a variety of other issues. Applications of advanced algorithms, machine learning, and artificial intelligence indicate a potential for rapidly advancing improvements in public health. For example, several reports indicate that social media data can be used to predict disease outbreak and spread (Brown, 2015). Since real-world EHR data has complicated security and privacy issues preventing it from being widely used by researchers, there is a real need to synthetically generate EHR data that is realistic and representative. Current EHR generators, such as Syntheaä (Walonoski et al., 2018) only simulate and generate pure medical-related data. However, adding patients’ social media data with their simulated EHR data would make combined data more comprehensive and realistic for healthcare research. This paper presents a patients’ social media data generator that extends an EHR data generator. By adding coherent social media data to EHR data, a variety of issues can be examined for emerging interests, such as where a contagious patient may have been and others with whom they may have been in contact. Social media data, specifically Twitter data, is generated with phrases indicating the onset of symptoms corresponding to the synthetically generated EHR reports of simulated patients. This enables creation of an open data set that is scalable up to a big-data size, and is not subject to the security, privacy concerns, and restrictions of real healthcare data sets. This capability is important to the modeling and simulation community, such as scientists and epidemiologists who are developing algorithms to analyze the spread of diseases. It enables testing a variety of analytics without revealing real-world private patient information. 
    more » « less