Title: Artificial Swarm Intelligence employed to Amplify Diagnostic Accuracy in Radiology
Swarm Intelligence (SI) is a biological phenomenon in which groups of organisms amplify their combined intelligence by forming real-time systems. It has been studied for decades in fish schools, bird flocks, and bee swarms. Recent advances in networking and AI technologies have enabled distributed human groups to form closed-loop systems modeled after natural swarms. The process is referred to as Artificial Swarm Intelligence (ASI) and has been shown to significantly amplify group intelligence. The present research applies ASI technology to the field of medicine, exploring whether small groups of networked radiologists can improve their diagnostic accuracy when reviewing chest X-rays for the presence of pneumonia by “thinking together” as an ASI system. Data were collected for individual diagnoses as well as for diagnoses made by the group working as a real-time ASI system. Diagnoses were also collected using a state-of-the-art deep learning system developed by Stanford University School of Medicine. Results showed that a small group of networked radiologists, when working as a real-time closed-loop ASI system, was significantly more accurate than the individuals on their own, reducing errors by 33%, and was also significantly more accurate (22%) than a state-of-the-art software-only solution using deep learning.
Award ID(s):
1840937
NSF-PAR ID:
10125861
Journal Name:
IEMCON 2018
Page Range / eLocation ID:
1186 to 1191
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In the natural world, Swarm Intelligence (SI) is a well-known phenomenon that enables groups of organisms to make collective decisions with significantly greater accuracy than the individuals could achieve on their own. In recent years, a new AI technology called Artificial Swarm Intelligence (ASI) has been developed that enables similar benefits for human teams. It works by connecting networked teams into real-time systems modeled on natural swarms. Commonly referred to as “human swarms” or “hive minds,” these closed-loop systems have been shown to amplify group performance across a wide range of tasks, from financial forecasting to strategic decision-making. The current study explores the ability of ASI technology to amplify the IQ of small teams. Five small teams answered a series of questions from a commonly used intelligence test known as the Raven’s Standard Progressive Matrices (RSPM) test. Participants took the test first as individuals, and then as groups moderated by swarming algorithms (i.e., “swarms”). The average individual achieved 53.7% correct, while the average swarm achieved 76.7% correct, corresponding to an estimated IQ increase of 14 points. When the individual responses were aggregated by majority vote, the groups scored 56.7% correct, still 12 IQ points lower than the real-time swarming method.
  2. Many social species amplify their decision-making accuracy by deliberating in real-time closed-loop systems. Known as Swarm Intelligence (SI), this natural process has been studied extensively in schools of fish, flocks of birds, and swarms of bees. The present research looks at human groups and tests their ability to make financial forecasts by working together in systems modeled after natural swarms. Specifically, groups of financial traders were tasked with forecasting the weekly trends of four common market indices (SPX, GLD, GDX, and Crude Oil) over a period of 19 consecutive weeks. Results showed that individual forecasters, who averaged 56.6% accuracy when predicting weekly trends on their own, amplified their accuracy to 77.0% when predicting together as real-time swarms. This reflects a 36% relative increase in forecasting accuracy and shows high statistical significance (p<0.001). Further, if investments had been made according to these swarm-based forecasts, the group would have netted a 13.3% return on investment (ROI) over the 19 weeks, compared to the individuals’ 0.7% ROI. This suggests that enabling groups of traders to form real-time systems online, governed by swarm intelligence algorithms, has the potential to significantly increase the accuracy and ROI of financial forecasts.
  3. Sales forecasts are critical to businesses of all sizes, enabling teams to project revenue, prioritize marketing, plan distribution, and scale inventory levels. To date, however, sales forecasts of new products have been shown to be highly inaccurate, due in large part to the lack of data about each new product and the subjective judgements required to compensate for this lack of data. The present study explores product sales forecasting performed by human groups and compares the accuracy of group forecasts generated by traditional polls to those made using Artificial Swarm Intelligence (ASI), a technique which has been shown to amplify the forecasting accuracy of groups in a wide range of fields. In collaboration with a major fashion retailer and a major fashion publisher, groups of fashion-conscious millennial women predicted the relative sales volumes of eight sweaters promoted during the 2018 holiday season, first by ranking each sweater’s sales in an online poll, and then by using Swarm software to form an ASI system. The Swarm-based forecast was significantly more accurate than the poll. In fact, the top four sweaters as ranked by the swarm sold 23.7% more units, or $600,000 worth of sweaters, during the target period than the top four sweaters as ranked by the survey (p = 0.0497), indicating that swarms of small consumer groups can be used to forecast sales with significantly higher accuracy than a traditional poll.
  4. The aggregation of individual personality tests to predict team performance is widely accepted in management theory but has significant limitations: the isolated nature of individual personality surveys fails to capture much of the team dynamics that drive real-world team performance. Artificial Swarm Intelligence (ASI), a technology that enables networked teams to think together in real-time and answer questions as a unified system, promises a solution to these limitations by enabling teams to take personality tests together, whereby the team uses ASI to converge upon answers that best represent the group’s disposition. In the present study, the group personality of 94 small teams was assessed by having teams take a standard Big Five Inventory (BFI) test both as individuals and as a real-time system enabled by an ASI technology known as Swarm AI. The predictive accuracy of each personality assessment method was evaluated by correlating the BFI personality traits with a range of real-world performance metrics. The results showed that assessments of personality generated using Swarm AI were far more predictive of team performance than the traditional survey-based method, showing a significant improvement in correlation with at least 25% of performance metrics, and in no case showing a significant decrease in predictive performance. This suggests that Swarm AI technology may be used as a highly effective team personality assessment tool that more accurately predicts future team performance than traditional survey approaches.
  5. Obeid, I. (Ed.)
    The Neural Engineering Data Consortium (NEDC) is developing the Temple University Digital Pathology Corpus (TUDP), an open-source database of high-resolution images from scanned pathology samples [1], as part of its National Science Foundation-funded Major Research Instrumentation grant titled “MRI: High Performance Digital Pathology Using Big Data and Machine Learning” [2]. The long-term goal of this project is to release one million images. We have currently scanned over 100,000 images and are in the process of annotating breast tissue data for our first official corpus release, v1.0.0. This release contains 3,505 annotated images of breast tissue including 74 patients with cancerous diagnoses (out of a total of 296 patients). In this poster, we will present an analysis of this corpus and discuss the challenges we have faced in efficiently producing high-quality annotations of breast tissue. It is well known that state-of-the-art algorithms in machine learning require vast amounts of data. Fields such as speech recognition [3], image recognition [4] and text processing [5] are able to deliver impressive performance with complex deep learning models because they have developed large corpora to support training of extremely high-dimensional models (e.g., billions of parameters). Other fields that do not have access to such data resources must rely on techniques in which existing models can be adapted to new datasets [6]. A preliminary version of this breast corpus release was tested in a pilot study using a baseline machine learning system, ResNet18 [7], that leverages several open-source Python tools. The pilot corpus was divided into three sets: train, development, and evaluation. Portions of these slides were manually annotated [1] using the nine labels in Table 1 [8] to identify five to ten examples of pathological features on each slide. 
Not every pathological feature is annotated, meaning excluded areas can include foci particular to these labels that are not used for training. A summary of the number of patches within each label is given in Table 2. To maintain a balanced training set, 1,000 patches of each label were used to train the machine learning model. Throughout all sets, only annotated patches were involved in model development. The performance of this model in identifying all the patches in the evaluation set can be seen in the confusion matrix of classification accuracy in Table 3. The highest performing labels were background, with 97% correct identification, and artifact, with 76% correct identification. A correlation exists between labels with more than 6,000 development patches and accurate performance on the evaluation set. Additionally, these results indicated a need to further refine the annotation of invasive ductal carcinoma (“indc”), inflammation (“infl”), nonneoplastic features (“nneo”), normal (“norm”) and suspicious (“susp”). This pilot experiment motivated changes to the corpus that will be discussed in detail in this poster presentation. To increase the accuracy of the machine learning model, we modified how we addressed underperforming labels. One common source of error arose from how non-background labels were converted into patches. Large areas of background within other labels were isolated within a patch, resulting in connective tissue misrepresenting a non-background label. In response, the annotation overlay margins were revised to exclude benign connective tissue in non-background labels. Corresponding patient reports and supporting immunohistochemical stains further guided annotation reviews. The microscopic diagnoses given by the primary pathologist in these reports detail the pathological findings within each tissue site, but not within each specific slide. 
The microscopic diagnoses informed revisions specifically targeting annotated regions classified as cancerous, ensuring that the labels “indc” and “dcis” were used only in situations where a micropathologist diagnosed it as such. Further differentiation of cancerous and precancerous labels, as well as the location of their focus on a slide, could be accomplished with supplemental immunohistochemically (IHC) stained slides. When distinguishing whether a focus is a nonneoplastic feature or a cancerous growth, pathologists apply antigen-targeting stains to the tissue in question to confirm the diagnosis. For example, a nonneoplastic feature of usual ductal hyperplasia will display diffuse staining for cytokeratin 5 (CK5) and no diffuse staining for estrogen receptor (ER), while a cancerous growth of ductal carcinoma in situ will have negative or focally positive staining for CK5 and diffuse staining for ER [9]. Many tissue samples contain cancerous and non-cancerous features with morphological overlaps that cause variability between annotators. The informative fields that IHC slides provide could play an integral role in machine-learning-based pathology diagnostics. Following the revisions made on all the annotations, a second experiment was run using ResNet18. Compared to the pilot study, an increase in model prediction accuracy was seen for the labels indc, infl, nneo, norm, and null. This increase is correlated with an increase in annotated area and annotation accuracy. Model performance in identifying the suspicious label decreased by 25% due to the decrease of 57% in the total annotated area described by this label. A summary of the model performance is given in Table 4, which shows the new prediction accuracy and the absolute change in error rate compared to Table 3. The breast tissue subset we are developing includes 3,505 annotated breast pathology slides from 296 patients. The average size of a scanned SVS file is 363 MB. The annotations are stored in an XML format. 
A CSV version of the annotation file is also available, which provides a flat, or simple, annotation that is easy for machine learning researchers to access and interface with their systems. Each patient is identified by an anonymized medical reference number. Within each patient’s directory, one or more sessions are identified, also anonymized to the first of the month in which the sample was taken. These sessions are broken into groupings of tissue taken on that date (in this case, breast tissue). A deidentified patient report stored as a flat text file is also available. Within these slides there are 16,971 annotated regions in total, with an average of 4.84 annotations per slide. Among those annotations, 8,035 are non-cancerous (normal, background, null, and artifact), 6,222 are carcinogenic signs (inflammation, nonneoplastic and suspicious), and 2,714 are cancerous labels (ductal carcinoma in situ and invasive ductal carcinoma). The individual patients are split up into three sets: train, development, and evaluation. Of the 74 cancerous patients, 20 were allotted to each of the development and evaluation sets, while the remaining 34 were allotted to train. The remaining 222 patients were split up to preserve the overall distribution of labels within the corpus. This was done in the hope of creating control sets for comparable studies. Overall, the development and evaluation sets each have 80 patients, while the training set has 136 patients. In a related component of this project, slides from the Fox Chase Cancer Center (FCCC) Biosample Repository (https://www.foxchase.org/research/facilities/genetic-research-facilities/biosample-repository-facility) are being digitized in addition to slides provided by Temple University Hospital. This data includes 18 different types of tissue including approximately 38.5% urinary tissue and 16.5% gynecological tissue. 
These slides and the metadata provided with them are already anonymized and include diagnoses in a spreadsheet with sample and patient ID. We plan to release over 13,000 unannotated slides from the FCCC Corpus simultaneously with v1.0.0 of TUDP. Details of this release will also be discussed in this poster. Few digitally annotated databases of pathology samples like TUDP exist due to the extensive data collection and processing required. The breast corpus subset should be released by November 2021. By December 2021 we should also release the unannotated FCCC data. We are currently annotating urinary tract data as well. We expect to release about 5,600 processed TUH slides in this subset. We have an additional 53,000 unprocessed TUH slides digitized. Corpora of this size will stimulate the development of a new generation of deep learning technology. In clinical settings where resources are limited, an assistive diagnostic model could support pathologists’ workload and even help prioritize suspected cancerous cases. ACKNOWLEDGMENTS This material is supported by the National Science Foundation under grants nos. CNS-1726188 and 1925494. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. REFERENCES [1] N. Shawki et al., “The Temple University Digital Pathology Corpus,” in Signal Processing in Medicine and Biology: Emerging Trends in Research and Applications, 1st ed., I. Obeid, I. Selesnick, and J. Picone, Eds. New York City, New York, USA: Springer, 2020, pp. 67–104. https://www.springer.com/gp/book/9783030368432. [2] J. Picone, T. Farkas, I. Obeid, and Y. Persidsky, “MRI: High Performance Digital Pathology Using Big Data and Machine Learning.” Major Research Instrumentation (MRI), Division of Computer and Network Systems, Award No. 1726188, January 1, 2018 – December 31, 2021. https://www.isip.piconepress.com/projects/nsf_dpath/. 
[3] A. Gulati et al., “Conformer: Convolution-augmented Transformer for Speech Recognition,” in Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), 2020, pp. 5036–5040. https://doi.org/10.21437/interspeech.2020-3015. [4] C.-J. Wu et al., “Machine Learning at Facebook: Understanding Inference at the Edge,” in Proceedings of the IEEE International Symposium on High Performance Computer Architecture (HPCA), 2019, pp. 331–344. https://ieeexplore.ieee.org/document/8675201. [5] I. Caswell and B. Liang, “Recent Advances in Google Translate,” Google AI Blog: The latest from Google Research, 2020. [Online]. Available: https://ai.googleblog.com/2020/06/recent-advances-in-google-translate.html. [Accessed: 01-Aug-2021]. [6] V. Khalkhali, N. Shawki, V. Shah, M. Golmohammadi, I. Obeid, and J. Picone, “Low Latency Real-Time Seizure Detection Using Transfer Deep Learning,” in Proceedings of the IEEE Signal Processing in Medicine and Biology Symposium (SPMB), 2021, pp. 1–7. https://www.isip.piconepress.com/publications/conference_proceedings/2021/ieee_spmb/eeg_transfer_learning/. [7] J. Picone, T. Farkas, I. Obeid, and Y. Persidsky, “MRI: High Performance Digital Pathology Using Big Data and Machine Learning,” Philadelphia, Pennsylvania, USA, 2020. https://www.isip.piconepress.com/publications/reports/2020/nsf/mri_dpath/. [8] I. Hunt, S. Husain, J. Simons, I. Obeid, and J. Picone, “Recent Advances in the Temple University Digital Pathology Corpus,” in Proceedings of the IEEE Signal Processing in Medicine and Biology Symposium (SPMB), 2019, pp. 1–4. https://ieeexplore.ieee.org/document/9037859. [9] A. P. Martinez, C. Cohen, K. Z. Hanley, and X. (Bill) Li, “Estrogen Receptor and Cytokeratin 5 Are Reliable Markers to Separate Usual Ductal Hyperplasia From Atypical Ductal Hyperplasia and Low-Grade Ductal Carcinoma In Situ,” Arch. Pathol. Lab. Med., vol. 140, no. 7, pp. 686–689, Apr. 2016. https://doi.org/10.5858/arpa.2015-0238-OA. 
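The patient-level split described in item 5 (74 cancerous patients divided 20/20/34, with development and evaluation sets of 80 patients each and a training set of 136) can be illustrated with a short sketch. This is not the authors' procedure: the patient identifiers, the `split_patients` helper, and the seeded random shuffle are all assumptions made for illustration, and the sketch stratifies only on cancerous versus non-cancerous status rather than on the full label distribution the abstract mentions.

```python
import random


def split_patients(cancer_ids, other_ids, n_dev_cancer=20, n_eval_cancer=20,
                   n_dev_total=80, n_eval_total=80, seed=0):
    """Split patients into train/dev/eval sets at the patient level.

    Hypothetical helper: stratifies only on cancerous vs. non-cancerous,
    using the set sizes reported in the abstract as defaults.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    cancer = list(cancer_ids)
    other = list(other_ids)
    rng.shuffle(cancer)
    rng.shuffle(other)

    # Non-cancerous patients fill each held-out set up to its total size.
    n_dev_other = n_dev_total - n_dev_cancer
    n_eval_other = n_eval_total - n_eval_cancer

    dev = cancer[:n_dev_cancer] + other[:n_dev_other]
    evl = (cancer[n_dev_cancer:n_dev_cancer + n_eval_cancer]
           + other[n_dev_other:n_dev_other + n_eval_other])
    # Everything not assigned to dev or eval goes to train.
    train = (cancer[n_dev_cancer + n_eval_cancer:]
             + other[n_dev_other + n_eval_other:])
    return {"train": train, "dev": dev, "eval": evl}


# Made-up anonymized IDs: 74 cancerous and 222 non-cancerous patients.
cancer_ids = [f"c{i:03d}" for i in range(74)]
other_ids = [f"n{i:03d}" for i in range(222)]

splits = split_patients(cancer_ids, other_ids)
print({name: len(ids) for name, ids in splits.items()})
# → {'train': 136, 'dev': 80, 'eval': 80}
```

Splitting by patient rather than by slide or patch keeps all images from one patient in a single set, which avoids leaking patient-specific features between training and evaluation.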