Abstract: Recent advances show the wide-ranging applications of machine learning to multidisciplinary problems such as detecting cancer cell growth and modeling cancer growth and treatment. There is growing interest among the faculty and students at Clayton State University in studying the applications of machine learning to medical imaging and in proposing new algorithms, based on a recently funded NSF grant proposal covering medical imaging, skin cancer detection, associated smartphone apps, and a user-friendly web-based diagnosis interface. We tested many available open-source, ML-based software packages in Python as applied to medical image data processing and to modeling used to predict cancer growth and treatment. We study the use of ML concepts that promote efficient, accurate, and secure computation over medical images, the identification and classification of cancer cells, and the modeling of cancer cell growth. In this collaborative project with another university, we follow a holistic approach to data analysis, leading to more efficient cancer detection based on both cell analysis and image recognition. Here, we compare ML-based software methods and analyze their detection accuracy. In addition, we acquire publicly available cancer cell image data and analyze it using deep learning algorithms to detect benign and suspicious image samples. We apply current pattern-matching algorithms and study the available data with a view to possible diagnosis of cancer types.
Mitigating Racial Biases for Machine Learning Based Skin Cancer Detection
Machine learning (ML) based skin cancer detection tools are an example of a transformative medical technology that could democratize early detection of skin cancer for everyone. However, because it depends on datasets for training, ML-based skin cancer detection suffers from a systemic racial bias. Racial and ethnic communities that are not well represented in the training datasets will not be able to use these tools, which amplifies health disparities. Based on empirical observations, we posit that skin cancer training data is biased because it mostly represents communities with lighter skin tones, despite skin cancer being far more lethal for people of color. In this paper we use domain adaptation techniques, employing CycleGANs, to mitigate the racial biases in state-of-the-art ML-based skin cancer detection tools by adapting minority images to appear as the majority. Using these domain adaptation techniques to augment our minority datasets, we improve the accuracy, precision, recall, and F1 score of typical image classification models for skin cancer classification, raising accuracy from the biased 50% to 79% when testing on minority skin tone images. We also evaluate and demonstrate a proof-of-concept smartphone application.
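As an illustration of the four metrics named in the abstract, the following is a minimal sketch of how accuracy, precision, recall, and F1 score can be computed for a binary classifier. The function name `classification_metrics` and the example labels are hypothetical and are not taken from the paper's data or code.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for binary labels.

    Hypothetical helper for illustration; `positive` marks the class
    treated as the positive (e.g. malignant) label.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels: 4 positive and 4 negative samples, 2 misclassified.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```

Reporting these metrics separately for each skin tone group, rather than over the pooled test set, is what makes a gap such as the one described above visible.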
- Award ID(s): 1950778
- PAR ID: 10511266
- Publisher / Repository: ACM
- Date Published:
- Journal Name: Proceedings of ACM MobiHoc REUNS Workshop
- ISBN: 9781450399265
- Page Range / eLocation ID: 556 to 561
- Format(s): Medium: X
- Location: Washington DC USA
- Sponsoring Org: National Science Foundation
More Like this
- Deep learning (DL) models have demonstrated state-of-the-art performance in the classification of diagnostic imaging in oncology. However, DL models for medical images can be compromised by adversarial images, where pixel values of input images are manipulated to deceive the DL model. To address this limitation, our study investigates the detectability of adversarial images in oncology using multiple detection schemes. Experiments were conducted on thoracic computed tomography (CT) scans, mammography, and brain magnetic resonance imaging (MRI). For each dataset we trained a convolutional neural network to classify the presence or absence of malignancy. We trained five DL and machine learning (ML)-based detection models and tested their performance in detecting adversarial images. Adversarial images generated using projected gradient descent (PGD) with a perturbation size of 0.004 were detected by the ResNet detection model with an accuracy of 100% for CT, 100% for mammogram, and 90.0% for MRI. Overall, adversarial images were detected with high accuracy in settings where adversarial perturbation was above set thresholds. Adversarial detection should be considered alongside adversarial training as a defense technique to protect DL models for cancer imaging classification from the threat of adversarial images.
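To make the projected gradient descent (PGD) attack mentioned above concrete, here is a minimal sketch on a toy logistic-regression classifier. It is not the study's convolutional network or code: the weights, input, step size `alpha`, and step count are assumptions chosen for illustration, and only the perturbation size of 0.004 comes from the abstract.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce_loss(x, y, w, b):
    """Binary cross-entropy of a logistic model on one input."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def input_gradient(x, y, w, b):
    """Gradient of the loss with respect to the input: (p - y) * w."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(p - y) * wi for wi in w]


def pgd_attack(x, y, w, b, eps=0.004, alpha=0.001, steps=10):
    """L-infinity PGD: repeated signed-gradient ascent steps, each
    followed by projection back into the eps-ball around x."""
    x_adv = list(x)
    for _ in range(steps):
        grad = input_gradient(x_adv, y, w, b)
        for i, g in enumerate(grad):
            sign = (g > 0) - (g < 0)
            xi = x_adv[i] + alpha * sign                # ascend the loss
            xi = min(max(xi, x[i] - eps), x[i] + eps)   # project to eps-ball
            x_adv[i] = min(max(xi, 0.0), 1.0)           # keep pixels in [0, 1]
    return x_adv


# Toy "image" of three pixels with an assumed model (w, b) and label y = 1.
w, b = [2.0, -3.0, 1.0], 0.0
x, y = [0.5, 0.5, 0.5], 1
x_adv = pgd_attack(x, y, w, b)
```

The perturbed input stays within 0.004 of the original in every pixel while the classifier's loss on it increases, which is why such images are hard to spot by eye and motivate a dedicated detection model.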
- Phishing websites remain a persistent security threat. Thus far, machine learning approaches appear to have the best potential as defenses. However, there are two main concerns with existing machine learning approaches for phishing detection. The first is the large number of training features used and the lack of validating arguments for these feature choices. The second is that the datasets used in the literature are inadvertently biased with respect to features based on the website URL or content. To address these concerns, we put forward the intuition that the domain name of a phishing website is the tell-tale sign of phishing and holds the key to successful phishing detection. Accordingly, we design features that model the relationships, visual as well as statistical, of the domain name to the key elements of a phishing website, which are used to snare end users. The main value of our feature design is that, to bypass detection, an attacker will find it very difficult to tamper with the visual content of the phishing website without arousing the suspicion of the end user. Our feature set ensures that there is minimal or no bias with respect to a dataset. Our learning model trains with only seven features and achieves a true positive rate of 98% and a classification accuracy of 97% on a sample dataset. Compared to the state-of-the-art work, our per-instance classification is 4 times faster for legitimate websites and 10 times faster for phishing websites. Importantly, we demonstrate the shortcomings of using features based on URLs, as they are likely to be biased towards specific datasets. We show the robustness of our learning algorithm by testing on unknown live phishing URLs and achieve a high detection accuracy of 99.7%.
- Optical network failure management (ONFM) is a promising application of machine learning (ML) to optical networking. Typical ML-based ONFM approaches exploit historical monitored data, retrieved in a specific domain (e.g., a link or a network), to train supervised ML models and learn failure characteristics (a signature) that will be helpful upon future failure occurrence in that domain. Unfortunately, in operational networks, data availability often constitutes a practical limitation to the deployment of ML-based ONFM solutions, due to the scarcity of labeled data comprehensively modeling all possible failure types. One could purposely inject failures to collect training data, but this is time-consuming and undesirable for operators. A possible solution is transfer learning (TL), i.e., training ML models on a source domain (SD), e.g., a laboratory testbed, and then deploying the trained models on a target domain (TD), e.g., an operator network, possibly fine-tuning the learned models by re-training with a small amount of TD data. Moreover, in cases where TL re-training is not successful (e.g., due to the intrinsic difference between SD and TD), another solution is domain adaptation, which consists of combining unlabeled SD and TD data before model training. We investigate domain adaptation and TL for failure detection and failure-cause identification across different lightpaths, leveraging real optical SNR data. We find that, for the considered scenarios, an accuracy increase of up to 20 percentage points can be obtained with domain adaptation for failure detection, while for failure-cause identification, only combining domain adaptation with model re-training provides significant benefit, reaching an accuracy increase of 4 to 5 percentage points in the considered cases.