Title: AI in Health: State of the Art, Challenges, and Future Directions
Introduction: Artificial intelligence (AI) technologies have attracted growing interest from a broad range of disciplines in recent years, including health. The growth of computer hardware and software applications in medicine, together with the digitization of health-related data, is fueling progress in the development and use of AI in medicine. This progress provides new opportunities and challenges, as well as directions for the future of AI in health. Objective: The goals of this survey are to review the current state of AI in health, along with opportunities, challenges, and practical implications. This review highlights recent developments over the past five years and directions for the future. Methods: Publications over the past five years reporting the use of AI in health in clinical and biomedical informatics journals, as well as computer science conferences, were selected according to Google Scholar citations. Publications were then categorized into five different classes, according to the type of data analyzed. Results: The major data types identified were multi-omics, clinical, behavioral, environmental, and pharmaceutical research and development (R&D) data. The current state of AI related to each data type is described, followed by associated challenges and practical implications that have emerged over the last several years. Opportunities and future directions based on these advances are discussed. Conclusion: Technologies have enabled the development of AI-assisted approaches to healthcare. However, challenges remain. Work is currently underway to address multi-modal data integration, the balance between quantitative algorithm performance and qualitative model interpretability, protection of model security, federated learning, and model bias.
Award ID(s):
1650723
NSF-PAR ID:
10157506
Journal Name:
Yearbook of Medical Informatics
Volume:
28
Issue:
01
ISSN:
0943-4747
Page Range / eLocation ID:
016 to 026
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Obeid, I. (Ed.)
    The Neural Engineering Data Consortium (NEDC) is developing the Temple University Digital Pathology Corpus (TUDP), an open source database of high-resolution images from scanned pathology samples [1], as part of its National Science Foundation-funded Major Research Instrumentation grant titled “MRI: High Performance Digital Pathology Using Big Data and Machine Learning” [2]. The long-term goal of this project is to release one million images. We have currently scanned over 100,000 images and are in the process of annotating breast tissue data for our first official corpus release, v1.0.0. This release contains 3,505 annotated images of breast tissue including 74 patients with cancerous diagnoses (out of a total of 296 patients). In this poster, we will present an analysis of this corpus and discuss the challenges we have faced in efficiently producing high quality annotations of breast tissue. It is well known that state of the art algorithms in machine learning require vast amounts of data. Fields such as speech recognition [3], image recognition [4] and text processing [5] are able to deliver impressive performance with complex deep learning models because they have developed large corpora to support training of extremely high-dimensional models (e.g., billions of parameters). Other fields that do not have access to such data resources must rely on techniques in which existing models can be adapted to new datasets [6]. A preliminary version of this breast corpus release was tested in a pilot study using a baseline machine learning system, ResNet18 [7], that leverages several open-source Python tools. The pilot corpus was divided into three sets: train, development, and evaluation. Portions of these slides were manually annotated [1] using the nine labels in Table 1 [8] to identify five to ten examples of pathological features on each slide. 
Not every pathological feature is annotated, meaning excluded areas can include focuses particular to these labels that are not used for training. A summary of the number of patches within each label is given in Table 2. To maintain a balanced training set, 1,000 patches of each label were used to train the machine learning model. Throughout all sets, only annotated patches were involved in model development. The performance of this model in identifying all the patches in the evaluation set can be seen in the confusion matrix of classification accuracy in Table 3. The highest performing labels were background, 97% correct identification, and artifact, 76% correct identification. A correlation exists between labels with more than 6,000 development patches and accurate performance on the evaluation set. Additionally, these results indicated a need to further refine the annotation of invasive ductal carcinoma (“indc”), inflammation (“infl”), nonneoplastic features (“nneo”), normal (“norm”) and suspicious (“susp”). This pilot experiment motivated changes to the corpus that will be discussed in detail in this poster presentation. To increase the accuracy of the machine learning model, we modified how we addressed underperforming labels. One common source of error arose with how non-background labels were converted into patches. Large areas of background within other labels were isolated within a patch resulting in connective tissue misrepresenting a non-background label. In response, the annotation overlay margins were revised to exclude benign connective tissue in non-background labels. Corresponding patient reports and supporting immunohistochemical stains further guided annotation reviews. The microscopic diagnoses given by the primary pathologist in these reports detail the pathological findings within each tissue site, but not within each specific slide. 
The microscopic diagnoses informed revisions specifically targeting annotated regions classified as cancerous, ensuring that the labels “indc” and “dcis” were used only in situations where a micropathologist diagnosed it as such. Further differentiation of cancerous and precancerous labels, as well as the location of their focus on a slide, could be accomplished with supplemental immunohistochemically (IHC) stained slides. When distinguishing whether a focus is a nonneoplastic feature versus a cancerous growth, pathologists employ antigen targeting stains to the tissue in question to confirm the diagnosis. For example, a nonneoplastic feature of usual ductal hyperplasia will display diffuse staining for cytokeratin 5 (CK5) and no diffuse staining for estrogen receptor (ER), while a cancerous growth of ductal carcinoma in situ will have negative or focally positive staining for CK5 and diffuse staining for ER [9]. Many tissue samples contain cancerous and non-cancerous features with morphological overlaps that cause variability between annotators. The informative fields IHC slides provide could play an integral role in machine model pathology diagnostics. Following the revisions made on all the annotations, a second experiment was run using ResNet18. Compared to the pilot study, an increase of model prediction accuracy was seen for the labels indc, infl, nneo, norm, and null. This increase is correlated with an increase in annotated area and annotation accuracy. Model performance in identifying the suspicious label decreased by 25% due to the decrease of 57% in the total annotated area described by this label. A summary of the model performance is given in Table 4, which shows the new prediction accuracy and the absolute change in error rate compared to Table 3. The breast tissue subset we are developing includes 3,505 annotated breast pathology slides from 296 patients. The average size of a scanned SVS file is 363 MB. The annotations are stored in an XML format. 
A CSV version of the annotation file is also available, which provides a flat, or simple, annotation that is easy for machine learning researchers to access and interface to their systems. Each patient is identified by an anonymized medical reference number. Within each patient’s directory, one or more sessions are identified, also anonymized to the first of the month in which the sample was taken. These sessions are broken into groupings of tissue taken on that date (in this case, breast tissue). A deidentified patient report stored as a flat text file is also available. Within these slides there are a total of 16,971 annotated regions with an average of 4.84 annotations per slide. Among those annotations, 8,035 are non-cancerous (normal, background, null, and artifact), 6,222 are carcinogenic signs (inflammation, nonneoplastic, and suspicious), and 2,714 are cancerous labels (ductal carcinoma in situ and invasive ductal carcinoma). The individual patients are split up into three sets: train, development, and evaluation. Of the 74 cancerous patients, 20 were allotted to each of the development and evaluation sets, while the remaining 34 were allotted to the training set. The remaining 222 patients were split up to preserve the overall distribution of labels within the corpus. This was done in hope of creating control sets for comparable studies. Overall, the development and evaluation sets each have 80 patients, while the training set has 136 patients. In a related component of this project, slides from the Fox Chase Cancer Center (FCCC) Biosample Repository (https://www.foxchase.org/research/facilities/genetic-research-facilities/biosample-repository-facility) are being digitized in addition to slides provided by Temple University Hospital. This data includes 18 different types of tissue including approximately 38.5% urinary tissue and 16.5% gynecological tissue. 
These slides and the metadata provided with them are already anonymized and include diagnoses in a spreadsheet with sample and patient ID. We plan to release over 13,000 unannotated slides from the FCCC Corpus simultaneously with v1.0.0 of TUDP. Details of this release will also be discussed in this poster. Few digitally annotated databases of pathology samples like TUDP exist due to the extensive data collection and processing required. The breast corpus subset should be released by November 2021. By December 2021 we should also release the unannotated FCCC data. We are currently annotating urinary tract data as well. We expect to release about 5,600 processed TUH slides in this subset. We have an additional 53,000 unprocessed TUH slides digitized. Corpora of this size will stimulate the development of a new generation of deep learning technology. In clinical settings where resources are limited, an assistive diagnosis model could support pathologists’ workload and even help prioritize suspected cancerous cases. ACKNOWLEDGMENTS This material is supported by the National Science Foundation under grants nos. CNS-1726188 and 1925494. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. REFERENCES [1] N. Shawki et al., “The Temple University Digital Pathology Corpus,” in Signal Processing in Medicine and Biology: Emerging Trends in Research and Applications, 1st ed., I. Obeid, I. Selesnick, and J. Picone, Eds. New York City, New York, USA: Springer, 2020, pp. 67–104. https://www.springer.com/gp/book/9783030368432. [2] J. Picone, T. Farkas, I. Obeid, and Y. Persidsky, “MRI: High Performance Digital Pathology Using Big Data and Machine Learning.” Major Research Instrumentation (MRI), Division of Computer and Network Systems, Award No. 1726188, January 1, 2018 – December 31, 2021. https://www.isip.piconepress.com/projects/nsf_dpath/. 
[3] A. Gulati et al., “Conformer: Convolution-augmented Transformer for Speech Recognition,” in Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), 2020, pp. 5036–5040. https://doi.org/10.21437/interspeech.2020-3015. [4] C.-J. Wu et al., “Machine Learning at Facebook: Understanding Inference at the Edge,” in Proceedings of the IEEE International Symposium on High Performance Computer Architecture (HPCA), 2019, pp. 331–344. https://ieeexplore.ieee.org/document/8675201. [5] I. Caswell and B. Liang, “Recent Advances in Google Translate,” Google AI Blog: The latest from Google Research, 2020. [Online]. Available: https://ai.googleblog.com/2020/06/recent-advances-in-google-translate.html. [Accessed: 01-Aug-2021]. [6] V. Khalkhali, N. Shawki, V. Shah, M. Golmohammadi, I. Obeid, and J. Picone, “Low Latency Real-Time Seizure Detection Using Transfer Deep Learning,” in Proceedings of the IEEE Signal Processing in Medicine and Biology Symposium (SPMB), 2021, pp. 1–7. https://www.isip.piconepress.com/publications/conference_proceedings/2021/ieee_spmb/eeg_transfer_learning/. [7] J. Picone, T. Farkas, I. Obeid, and Y. Persidsky, “MRI: High Performance Digital Pathology Using Big Data and Machine Learning,” Philadelphia, Pennsylvania, USA, 2020. https://www.isip.piconepress.com/publications/reports/2020/nsf/mri_dpath/. [8] I. Hunt, S. Husain, J. Simons, I. Obeid, and J. Picone, “Recent Advances in the Temple University Digital Pathology Corpus,” in Proceedings of the IEEE Signal Processing in Medicine and Biology Symposium (SPMB), 2019, pp. 1–4. https://ieeexplore.ieee.org/document/9037859. [9] A. P. Martinez, C. Cohen, K. Z. Hanley, and X. (Bill) Li, “Estrogen Receptor and Cytokeratin 5 Are Reliable Markers to Separate Usual Ductal Hyperplasia From Atypical Ductal Hyperplasia and Low-Grade Ductal Carcinoma In Situ,” Arch. Pathol. Lab. Med., vol. 140, no. 7, pp. 686–689, Apr. 2016. https://doi.org/10.5858/arpa.2015-0238-OA. 
  2. Background Digital health is poised to transform health care and redefine personalized health. As Internet and mobile phone usage increases, as technology develops new ways to collect data, and as clinical guidelines change, all areas of medicine face new challenges and opportunities. Inflammatory bowel disease (IBD) is one of many chronic diseases that may benefit from these advances in digital health. This review intends to lay a foundation for clinicians and technologists to understand future directions and opportunities together. Objective This review covers mobile health apps that have been used in IBD, how they have fit into a clinical care framework, and the challenges that clinicians and technologists face in approaching future opportunities. Methods We searched PubMed, Scopus, and ClinicalTrials.gov to identify mobile apps that have been studied and were published in the literature from January 1, 2010, to April 19, 2019. The search terms were (“mobile health” OR “eHealth” OR “digital health” OR “smart phone” OR “mobile app” OR “mobile applications” OR “mHealth” OR “smartphones”) AND (“IBD” OR “Inflammatory bowel disease” OR “Crohn's Disease” (CD) OR “Ulcerative Colitis” (UC) OR “UC” OR “CD”), followed by further analysis of citations from the results. We searched the Apple iTunes app store to identify a limited selection of commercial apps to include for discussion. Results A total of 68 articles met the inclusion criteria. A total of 11 digital health apps were identified in the literature and 4 commercial apps were selected to be described in this review. While most apps have some educational component, the majority of apps focus on eliciting patient-reported outcomes related to disease activity, and a few are for treatment management. Significant benefits have been seen in trials relating to education, quality of life, quality of care, treatment adherence, and medication management. No studies have reported a negative impact on any of the above. 
There are mixed results in terms of effects on office visits and follow-up. Conclusions While studies have shown that digital health can fit into, complement, and improve the standard clinical care of patients with IBD, there is a need for further validation and improvement, from both a clinical and patient perspective. Exploring new research methods, like microrandomized trials, may allow for more implementation of technology and rapid advancement of knowledge. New technologies that can objectively and seamlessly capture remote data, as well as complement the clinical shift from symptom-based to inflammation-based care, will help the clinical and health technology communities to understand the full potential of digital health in the care of IBD and other chronic illnesses. 
  3.
    Today’s classrooms are remarkably different from those of yesteryear. In place of individual students responding to the teacher from neat rows of desks, one more typically finds students working in groups on projects, with a teacher circulating among groups. AI applications in learning have been slow to catch up, with most available technologies focusing on personalizing or adapting instruction to learners as isolated individuals. Meanwhile, an established science of Computer Supported Collaborative Learning has come to prominence, with clear implications for how collaborative learning could best be supported. In this contribution, I will consider how intelligence augmentation could evolve to support collaborative learning as well as three signature challenges of this work that could drive AI forward. In conceptualizing collaborative learning, Kirschner and Erkens (2013) provide a useful 3x3 framework in which there are three aspects of learning (cognitive, social, and motivational), three levels (community, group/team, and individual), and three kinds of pedagogical supports (discourse-oriented, representation-oriented, and process-oriented). As they engage in this multiply complex space, teachers and learners are both learning to collaborate and collaborating to learn. Further, questions of equity arise as we consider who is able to participate and in which ways. Overall, this analysis helps us see the complexity of today’s classrooms and, within this complexity, the opportunities for augmentation or “assistance” to become important and even essential. An overarching design concept has emerged in the past 5 years in response to this complexity: the idea of intelligent augmentation for “orchestrating” classrooms (Dillenbourg et al., 2013). As a metaphor, orchestration can suggest the need for a coordinated performance among many agents who are each playing different roles or voicing different ideas. 
Practically speaking, orchestration suggests that “intelligence augmentation” could help many smaller things go well, and in doing so, could enable the overall intention of the learning experience to succeed. Those smaller things could include helping the teacher stay aware of students or groups who need attention, supporting formation of groups or transitions from one activity to the next, facilitating productive social interactions in groups, suggesting learning resources that would support teamwork, and more. A recent panel of AI experts identified orchestration as an overarching concept that is an important focus for near-term research and development for intelligence augmentation (Roschelle, Lester & Fusco, 2020). Tackling this challenging area of collaborative learning could also be beneficial for advancing AI technologies overall. Building AI agents that better understand the social context of human activities has broad importance, as does designing AI agents that can appropriately interact within teamwork. Collaborative learning has a trajectory over time, and designing AI systems that support teams not just with a short-term recommendation or suggestion but in long-term developmental processes is important. Further, classrooms that are engaged in collaborative learning could become very interesting hybrid environments, with multiple human and AI agents present at once and addressing dual outcome goals of learning to collaborate and collaborating to learn; addressing a hybrid environment like this could lead to developing AI systems that more robustly help many types of realistic human activity. In conclusion, the opportunity to make a societal impact by attending to collaborative learning, the availability of a growing science of computer-supported collaborative learning, and the need to push new boundaries in AI together suggest collaborative learning as a challenge worth tackling in coming years. 
  4. Studies on human mobility have a long history with increasingly strong interdisciplinary connections across social science, environmental science, information and technology, computer science, engineering, and health science. However, what is lacking in the current research is a synthesis of the studies to identify the evolutionary pathways and future research directions. To address this gap, we conduct a systematic review of human mobility-related studies published from 1990 to 2020. Drawing on the selected publications retrieved from the Web of Science, we provide a bibliometric analysis and network visualisation using CiteSpace and VOSviewer on the number of publications and year published, authors and their countries and affiliations, citations, topics, abstracts, keywords, and journals. Our findings show that human mobility-related studies have become increasingly interdisciplinary and multi-dimensional, a trend strengthened by the use of so-called ‘big data’ from multiple sources, the development of computer technologies, the innovation of modelling approaches, and novel applications in various areas. Based on our synthesis of the work by top cited authors, we identify four directions for future research relating to data sources, modelling methods, applications, and technologies. We advocate for more in-depth research on human mobility using multi-source big data, improving modelling methods, and integrating advanced technologies including artificial intelligence and machine and deep learning to address real-world problems and contribute to social good. 
  5. Future wearable electronics and smart textiles face a major challenge in the development of energy storage devices that are high-performing while still being flexible, lightweight, and safe. Fiber supercapacitors are one of the most promising energy storage technologies for such applications due to their excellent electrochemical characteristics and mechanical flexibility. Over the past decade, researchers have put in tremendous effort and made significant progress on fiber supercapacitors. It is now the time to assess the outcomes to ensure that this kind of energy storage device will be practical for future wearable electronics and smart textiles. While the materials, fabrication methods, and energy storage performance of fiber supercapacitors have been summarized and evaluated in many previous publications, this review paper focuses on two practical questions: Are the reported devices providing sufficient energy and power densities to wearable electronics? Are the reported devices flexible and durable enough to be integrated into smart textiles? To answer the first question, we not only review the electrochemical performance of the reported fiber supercapacitors but also compare them to the power needs of a variety of commercial electronics. To answer the second question, we review the general approaches to assess the flexibility of wearable textiles and suggest standard methods to evaluate the mechanical flexibility and stability of fiber supercapacitors for future studies. Lastly, this article summarizes the challenges for the practical application of fiber supercapacitors and proposes possible solutions. 
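The patient-level split described in the Temple University Digital Pathology Corpus abstract (item 1 above) can be illustrated with a short sketch. The counts (296 patients, 74 cancerous, 20 cancerous patients in each of the development and evaluation sets, 80 patients in each of those sets) come from that abstract. Everything else is an assumption made for illustration: the function name `split_patients`, the placeholder patient IDs, and the use of a seeded random shuffle. In particular, the abstract says the 222 non-cancerous patients were split to preserve the overall label distribution but does not say how, so the plain random assignment below is a stand-in and not the authors' procedure.

```python
import random

# Counts taken from the abstract.
N_PATIENTS = 296
N_CANCEROUS = 74
CANCEROUS_PER_HELDOUT = 20  # cancerous patients in each of dev and eval
HELDOUT_SIZE = 80           # total patients in each of dev and eval


def split_patients(cancerous_ids, other_ids, seed=0):
    """Assign whole patients to train/dev/eval so no patient spans two sets.

    Assumption: a seeded random shuffle stands in for the (unspecified)
    label-distribution-preserving procedure used for the real corpus.
    """
    rng = random.Random(seed)
    cancerous = sorted(cancerous_ids)
    others = sorted(other_ids)
    rng.shuffle(cancerous)
    rng.shuffle(others)

    others_per_heldout = HELDOUT_SIZE - CANCEROUS_PER_HELDOUT  # 60
    splits = {}
    c = o = 0
    for name in ("dev", "eval"):
        splits[name] = (
            cancerous[c:c + CANCEROUS_PER_HELDOUT]
            + others[o:o + others_per_heldout]
        )
        c += CANCEROUS_PER_HELDOUT
        o += others_per_heldout
    # Whatever is left (34 cancerous + 102 other patients) goes to train.
    splits["train"] = cancerous[c:] + others[o:]
    return splits


# Hypothetical anonymized IDs; the real corpus uses its own reference numbers.
cancerous_ids = [f"c{i:03d}" for i in range(N_CANCEROUS)]
other_ids = [f"n{i:03d}" for i in range(N_PATIENTS - N_CANCEROUS)]

splits = split_patients(cancerous_ids, other_ids)
sizes = {name: len(ids) for name, ids in splits.items()}
print(sizes)
```

Splitting by patient rather than by slide or patch keeps all images from one patient in a single set, which avoids leaking patient-specific tissue appearance from training into evaluation. With the abstract's counts, this arrangement yields 80 development, 80 evaluation, and 136 training patients.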