Title: Knee Osteoarthritis Classification Using 3D CNN and MRI
Osteoarthritis (OA) is the most common form of arthritis and often occurs in the knee. While convolutional neural networks (CNNs) have been widely used to study medical images, the application of 3-dimensional (3D) CNNs to knee OA diagnosis is limited. This study uses a 3D CNN model to analyze sequences of knee magnetic resonance (MR) images to perform knee OA classification. An advantage of 3D CNNs is the ability to analyze the whole sequence of 3D MR images as a single unit, as opposed to a traditional 2D CNN, which examines one image at a time. As a result, 3D features that may not be detectable in a single 2D image can be extracted from adjacent slices. The input data for each knee were a sequence of double-echo steady-state (DESS) MR images, and each knee was labeled by the Kellgren and Lawrence (KL) grade of severity at levels 0–4. In addition to the 5-category KL grade classification, we further examined a 2-category classification that distinguishes non-OA (KL ≤ 1) from OA (KL ≥ 2) knees. Clinically, diagnosing a patient with knee OA is the ultimate goal of assigning a KL grade. On a dataset with 1100 knees, the 3D CNN model that classifies knees with and without OA achieved an accuracy of 86.5% on the validation set and 83.0% on the testing set. We further conducted a comparative study between MRI and X-ray. Compared with a CNN model using X-ray images trained from the same group of patients, the proposed 3D model with MR images achieved higher accuracy in both the 5-category classification (54.0% vs. 50.0%) and the 2-category classification (83.0% vs. 77.0%). The result indicates that MRI, with the application of a 3D CNN model, has greater potential to improve diagnosis accuracy for knee OA clinically than the currently used X-ray methods.
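The 2-category labeling described in the abstract (non-OA for KL ≤ 1, OA for KL ≥ 2) can be illustrated with a minimal Python sketch. The function name `kl_to_binary` and the 0/1 encoding are assumptions made for illustration only; the abstract does not describe how the authors encoded or stored their labels.

```python
def kl_to_binary(kl_grade: int) -> int:
    """Map a Kellgren-Lawrence grade (0-4) to a 2-category label.

    Returns 0 for non-OA (KL <= 1) and 1 for OA (KL >= 2).
    The 0/1 encoding is an illustrative assumption.
    """
    if kl_grade not in (0, 1, 2, 3, 4):
        raise ValueError(f"KL grade must be an integer from 0 to 4, got {kl_grade!r}")
    return 1 if kl_grade >= 2 else 0


# Apply the mapping to every possible KL grade.
binary_labels = [kl_to_binary(grade) for grade in range(5)]
print(binary_labels)
```

Grades 0 and 1 fall in the non-OA class and grades 2 through 4 in the OA class, which matches the threshold stated in the abstract.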
Award ID(s):
1723420
NSF-PAR ID:
10280994
Journal Name:
Applied Sciences
Volume:
11
Issue:
11
Page Range or eLocation-ID:
5196
ISSN:
2076-3417
Sponsoring Org:
National Science Foundation
More Like this
  1. Obeid, I. (Ed.)
    The Neural Engineering Data Consortium (NEDC) is developing the Temple University Digital Pathology Corpus (TUDP), an open source database of high-resolution images from scanned pathology samples [1], as part of its National Science Foundation-funded Major Research Instrumentation grant titled “MRI: High Performance Digital Pathology Using Big Data and Machine Learning” [2]. The long-term goal of this project is to release one million images. We have currently scanned over 100,000 images and are in the process of annotating breast tissue data for our first official corpus release, v1.0.0. This release contains 3,505 annotated images of breast tissue including 74 patients with cancerous diagnoses (out of a total of 296 patients). In this poster, we will present an analysis of this corpus and discuss the challenges we have faced in efficiently producing high quality annotations of breast tissue. It is well known that state of the art algorithms in machine learning require vast amounts of data. Fields such as speech recognition [3], image recognition [4] and text processing [5] are able to deliver impressive performance with complex deep learning models because they have developed large corpora to support training of extremely high-dimensional models (e.g., billions of parameters). Other fields that do not have access to such data resources must rely on techniques in which existing models can be adapted to new datasets [6]. A preliminary version of this breast corpus release was tested in a pilot study using a baseline machine learning system, ResNet18 [7], that leverages several open-source Python tools. The pilot corpus was divided into three sets: train, development, and evaluation. Portions of these slides were manually annotated [1] using the nine labels in Table 1 [8] to identify five to ten examples of pathological features on each slide.
Not every pathological feature is annotated, meaning excluded areas can include focuses particular to these labels that are not used for training. A summary of the number of patches within each label is given in Table 2. To maintain a balanced training set, 1,000 patches of each label were used to train the machine learning model. Throughout all sets, only annotated patches were involved in model development. The performance of this model in identifying all the patches in the evaluation set can be seen in the confusion matrix of classification accuracy in Table 3. The highest performing labels were background, 97% correct identification, and artifact, 76% correct identification. A correlation exists between labels with more than 6,000 development patches and accurate performance on the evaluation set. Additionally, these results indicated a need to further refine the annotation of invasive ductal carcinoma (“indc”), inflammation (“infl”), nonneoplastic features (“nneo”), normal (“norm”) and suspicious (“susp”). This pilot experiment motivated changes to the corpus that will be discussed in detail in this poster presentation. To increase the accuracy of the machine learning model, we modified how we addressed underperforming labels. One common source of error arose with how non-background labels were converted into patches. Large areas of background within other labels were isolated within a patch resulting in connective tissue misrepresenting a non-background label. In response, the annotation overlay margins were revised to exclude benign connective tissue in non-background labels. Corresponding patient reports and supporting immunohistochemical stains further guided annotation reviews. The microscopic diagnoses given by the primary pathologist in these reports detail the pathological findings within each tissue site, but not within each specific slide. 
The microscopic diagnoses informed revisions specifically targeting annotated regions classified as cancerous, ensuring that the labels “indc” and “dcis” were used only in situations where a micropathologist diagnosed it as such. Further differentiation of cancerous and precancerous labels, as well as the location of their focus on a slide, could be accomplished with supplemental immunohistochemically (IHC) stained slides. When distinguishing whether a focus is a nonneoplastic feature versus a cancerous growth, pathologists employ antigen targeting stains to the tissue in question to confirm the diagnosis. For example, a nonneoplastic feature of usual ductal hyperplasia will display diffuse staining for cytokeratin 5 (CK5) and no diffuse staining for estrogen receptor (ER), while a cancerous growth of ductal carcinoma in situ will have negative or focally positive staining for CK5 and diffuse staining for ER [9]. Many tissue samples contain cancerous and non-cancerous features with morphological overlaps that cause variability between annotators. The informative fields IHC slides provide could play an integral role in machine model pathology diagnostics. Following the revisions made on all the annotations, a second experiment was run using ResNet18. Compared to the pilot study, an increase of model prediction accuracy was seen for the labels indc, infl, nneo, norm, and null. This increase is correlated with an increase in annotated area and annotation accuracy. Model performance in identifying the suspicious label decreased by 25% due to the decrease of 57% in the total annotated area described by this label. A summary of the model performance is given in Table 4, which shows the new prediction accuracy and the absolute change in error rate compared to Table 3. The breast tissue subset we are developing includes 3,505 annotated breast pathology slides from 296 patients. The average size of a scanned SVS file is 363 MB. The annotations are stored in an XML format. 
A CSV version of the annotation file is also available which provides a flat, or simple, annotation that is easy for machine learning researchers to access and interface to their systems. Each patient is identified by an anonymized medical reference number. Within each patient’s directory, one or more sessions are identified, also anonymized to the first of the month in which the sample was taken. These sessions are broken into groupings of tissue taken on that date (in this case, breast tissue). A deidentified patient report stored as a flat text file is also available. Within these slides there are a total of 16,971 annotated regions with an average of 4.84 annotations per slide. Among those annotations, 8,035 are non-cancerous (normal, background, null, and artifact), 6,222 are carcinogenic signs (inflammation, nonneoplastic and suspicious), and 2,714 are cancerous labels (ductal carcinoma in situ and invasive ductal carcinoma). The individual patients are split up into three sets: train, development, and evaluation. Of the 74 cancerous patients, 20 were allotted to each of the development and evaluation sets, while the remaining 34 were allotted to train. The remaining 222 patients were split up to preserve the overall distribution of labels within the corpus. This was done in the hope of creating control sets for comparable studies. Overall, the development and evaluation sets each have 80 patients, while the training set has 136 patients. In a related component of this project, slides from the Fox Chase Cancer Center (FCCC) Biosample Repository (https://www.foxchase.org/research/facilities/genetic-research-facilities/biosample-repository-facility) are being digitized in addition to slides provided by Temple University Hospital. This data includes 18 different types of tissue including approximately 38.5% urinary tissue and 16.5% gynecological tissue.
These slides and the metadata provided with them are already anonymized and include diagnoses in a spreadsheet with sample and patient ID. We plan to release over 13,000 unannotated slides from the FCCC Corpus simultaneously with v1.0.0 of TUDP. Details of this release will also be discussed in this poster. Few digitally annotated databases of pathology samples like TUDP exist due to the extensive data collection and processing required. The breast corpus subset should be released by November 2021. By December 2021 we should also release the unannotated FCCC data. We are currently annotating urinary tract data as well. We expect to release about 5,600 processed TUH slides in this subset. We have an additional 53,000 unprocessed TUH slides digitized. Corpora of this size will stimulate the development of a new generation of deep learning technology. In clinical settings where resources are limited, an assistive diagnoses model could support pathologists’ workload and even help prioritize suspected cancerous cases. ACKNOWLEDGMENTS This material is supported by the National Science Foundation under grants nos. CNS-1726188 and 1925494. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. REFERENCES [1] N. Shawki et al., “The Temple University Digital Pathology Corpus,” in Signal Processing in Medicine and Biology: Emerging Trends in Research and Applications, 1st ed., I. Obeid, I. Selesnick, and J. Picone, Eds. New York City, New York, USA: Springer, 2020, pp. 67–104. https://www.springer.com/gp/book/9783030368432. [2] J. Picone, T. Farkas, I. Obeid, and Y. Persidsky, “MRI: High Performance Digital Pathology Using Big Data and Machine Learning.” Major Research Instrumentation (MRI), Division of Computer and Network Systems, Award No. 1726188, January 1, 2018 – December 31, 2021. https://www.isip.piconepress.com/projects/nsf_dpath/.
[3] A. Gulati et al., “Conformer: Convolution-augmented Transformer for Speech Recognition,” in Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), 2020, pp. 5036–5040. https://doi.org/10.21437/interspeech.2020-3015. [4] C.-J. Wu et al., “Machine Learning at Facebook: Understanding Inference at the Edge,” in Proceedings of the IEEE International Symposium on High Performance Computer Architecture (HPCA), 2019, pp. 331–344. https://ieeexplore.ieee.org/document/8675201. [5] I. Caswell and B. Liang, “Recent Advances in Google Translate,” Google AI Blog: The latest from Google Research, 2020. [Online]. Available: https://ai.googleblog.com/2020/06/recent-advances-in-google-translate.html. [Accessed: 01-Aug-2021]. [6] V. Khalkhali, N. Shawki, V. Shah, M. Golmohammadi, I. Obeid, and J. Picone, “Low Latency Real-Time Seizure Detection Using Transfer Deep Learning,” in Proceedings of the IEEE Signal Processing in Medicine and Biology Symposium (SPMB), 2021, pp. 1–7. https://www.isip.piconepress.com/publications/conference_proceedings/2021/ieee_spmb/eeg_transfer_learning/. [7] J. Picone, T. Farkas, I. Obeid, and Y. Persidsky, “MRI: High Performance Digital Pathology Using Big Data and Machine Learning,” Philadelphia, Pennsylvania, USA, 2020. https://www.isip.piconepress.com/publications/reports/2020/nsf/mri_dpath/. [8] I. Hunt, S. Husain, J. Simons, I. Obeid, and J. Picone, “Recent Advances in the Temple University Digital Pathology Corpus,” in Proceedings of the IEEE Signal Processing in Medicine and Biology Symposium (SPMB), 2019, pp. 1–4. https://ieeexplore.ieee.org/document/9037859. [9] A. P. Martinez, C. Cohen, K. Z. Hanley, and X. (Bill) Li, “Estrogen Receptor and Cytokeratin 5 Are Reliable Markers to Separate Usual Ductal Hyperplasia From Atypical Ductal Hyperplasia and Low-Grade Ductal Carcinoma In Situ,” Arch. Pathol. Lab. Med., vol. 140, no. 7, pp. 686–689, Apr. 2016.
https://doi.org/10.5858/arpa.2015-0238-OA.
  2. We present morphological classifications of ∼27 million galaxies from the Dark Energy Survey (DES) Data Release 1 (DR1) using a supervised deep learning algorithm. The classification scheme separates: (a) early-type galaxies (ETGs) from late-types (LTGs), and (b) face-on galaxies from edge-on. Our Convolutional Neural Networks (CNNs) are trained on a small subset of DES objects with previously known classifications. These typically have m_r ≲ 17.7 mag; we model fainter objects to m_r < 21.5 mag by simulating what the brighter objects with well determined classifications would look like if they were at higher redshifts. The CNNs reach 97% accuracy to m_r < 21.5 on their training sets, suggesting that they are able to recover features more accurately than the human eye. We then used the trained CNNs to classify the vast majority of the other DES images. The final catalog comprises five independent CNN predictions for each classification scheme, helping to determine if the CNN predictions are robust or not. We obtain secure classifications for ∼87% and 73% of the catalog for the ETG vs. LTG and edge-on vs. face-on models, respectively. Combining the two classifications (a) and (b) helps to increase the purity of the ETG sample and to identify edge-on lenticular galaxies (as ETGs with high ellipticity). Where a comparison is possible, our classifications correlate very well with Sérsic index (n), ellipticity (ε) and spectral type, even for the fainter galaxies. This is the largest multi-band catalog of automated galaxy morphologies to date.
  3. The segmentation of the ventricular wall and the blood pool in cardiac magnetic resonance imaging (MRI) has been investigated for decades, given its important role for delineation of cardiac functioning and diagnosis of heart diseases. One of the major challenges is that the inner epicardium boundary is not always visible in the image domain, due to the mixture of blood and muscle structures, especially at the end of contraction, or systole. To address it, we propose a novel approach for the cardiac segmentation in the short-axis (SAX) MRI: coupled deep neural networks and deformable models. First, a 2D U-Net is adopted for each magnetic resonance (MR) slice, and a 3D U-Net refines the segmentation results along the temporal dimension. Then, we propose a multi-component deformable model to extract accurate contours for both endo- and epicardium with global and local constraints. Finally, a partial blood classification is explored to estimate the presence of boundary pixels near the trabeculae and solid wall, and to avoid moving the endocardium boundary inward. Quantitative evaluation demonstrates the high accuracy, robustness, and efficiency of our approach for the slices acquired at different locations and different cardiac phases.
  4. In recent decades, computer vision has proven remarkably effective in addressing diverse issues in public health, from determining the diagnosis, prognosis, and treatment of diseases in humans to predicting infectious disease outbreaks. Here, we investigate whether convolutional neural networks (CNNs) can also demonstrate effectiveness in classifying the environmental stages of parasites of public health importance and their invertebrate hosts. We used schistosomiasis as a reference model. Schistosomiasis is a debilitating parasitic disease transmitted to humans via snail intermediate hosts. The parasite affects more than 200 million people in tropical and subtropical regions. We trained our CNN, a feed-forward neural network, on a limited dataset of 5,500 images of snails and 5,100 images of cercariae obtained from schistosomiasis transmission sites in the Senegal River Basin, a region in western Africa that is hyper-endemic for the disease. The image set included both images of two snail genera that are relevant to schistosomiasis transmission – that is, Bulinus spp. and Biomphalaria pfeifferi – as well as snail images that are non-component hosts for human schistosomiasis. Cercariae shed from Bi. pfeifferi and Bulinus spp. snails were classified into 11 categories, of which only two, S. haematobium and S. mansoni, are major etiological agents of human schistosomiasis. The algorithms, trained on 80% of the snail and parasite dataset, achieved 99% and 91% accuracy for snail and parasite classification, respectively, when used on the hold-out validation dataset – a performance comparable to that of experienced parasitologists. The promising results of this proof-of-concept study suggest that this CNN model, and potentially similar replicable models, have the potential to support the classification of snails and parasites of medical importance.
In remote field settings where machine learning algorithms can be deployed on cost-effective and widely used mobile devices, such as smartphones, these models can be a valuable complement to laboratory identification by trained technicians. Future efforts must be dedicated to increasing dataset sizes for model training and validation, as well as testing these algorithms in diverse transmission settings and geographies.
  5. In this paper, we quantify the joint acoustic emissions (JAEs) from the knees of children with juvenile idiopathic arthritis (JIA) and support their use as a novel biomarker of the disease. JIA is the most common rheumatic disease of childhood; it has a highly variable presentation and few reliable biomarkers, which makes diagnosis and personalization of care difficult. The knee is the most commonly affected joint, with hallmark synovitis and inflammation that can extend to damage the underlying cartilage and bone. During movement of the knee, internal friction creates JAEs that can be non-invasively measured. We hypothesize that these JAEs contain clinically relevant information that could be used for the diagnosis and personalization of treatment of JIA. In this study, we record and compare the JAEs from 25 patients with JIA, 10 of whom were recorded a second time 3–6 months later, and 18 healthy age- and sex-matched controls. We compute signal features from each of the recorded cycles of flexion/extension and train a logistic regression classification model. The model classified each cycle as having JIA or being healthy with 84.4% accuracy using leave-one-subject-out cross validation (LOSO-CV). When assessing the full JAE recording of a subject (which contained at least 8 cycles of flexion/extension), a majority vote of the cycle labels accurately classified the subjects as having JIA or being healthy 100% of the time. We then use the output probabilities of the JIA class as the basis for a joint health score and test it on the follow-up patient recordings. In all 10 of our 6-week follow-up recordings, the score accurately tracked with successful treatment of the condition. Our proposed JAE-based classification model of JIA presents a compelling case for incorporating this novel joint health assessment technique into the clinical work-up and monitoring of JIA.
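The subject-level decision described in the last abstract, a majority vote over per-cycle labels, can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name `majority_vote`, the 0/1 label encoding (1 = JIA, 0 = healthy), and the `tie_label` parameter are all hypothetical, and the abstract does not say how ties were handled.

```python
def majority_vote(cycle_labels, tie_label=1):
    """Combine per-cycle predictions (0 = healthy, 1 = JIA) into one subject label.

    tie_label is a hypothetical choice for an even split between the
    two classes; the source abstract does not specify tie handling.
    """
    if not cycle_labels:
        raise ValueError("at least one cycle label is required")
    if any(label not in (0, 1) for label in cycle_labels):
        raise ValueError("cycle labels must be 0 or 1")
    positives = sum(cycle_labels)            # cycles predicted as JIA
    negatives = len(cycle_labels) - positives  # cycles predicted as healthy
    if positives == negatives:
        return tie_label
    return 1 if positives > negatives else 0


# Hypothetical per-cycle predictions for one subject with 8 cycles.
subject_cycles = [1, 1, 0, 1, 0, 1, 1, 0]
subject_label = majority_vote(subject_cycles)
print(subject_label)
```

Because the vote aggregates many cycle-level decisions, a subject can be classified correctly even when some individual cycles are mislabeled, which is consistent with the gap between the cycle-level and subject-level accuracies reported above.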