skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Transfer learning for galaxy feature detection: Finding giant star-forming clumps in low-redshift galaxies using Faster Region-based Convolutional Neural Network
Abstract Giant star-forming clumps (GSFCs) are areas of intensive star-formation that are commonly observed in high-redshift (z ≳ 1) galaxies but their formation and role in galaxy evolution remain unclear. Observations of low-redshift clumpy galaxy analogues are rare but the availability of wide-field galaxy survey data makes the detection of large clumpy galaxy samples much more feasible. Deep Learning (DL), and in particular Convolutional Neural Networks (CNNs), have been successfully applied to image classification tasks in astrophysical data analysis. However, one application of DL that remains relatively unexplored is that of automatically identifying and localizing specific objects or features in astrophysical imaging data. In this paper, we demonstrate the use of DL-based object detection models to localize GSFCs in astrophysical imaging data. We apply the Faster Region-based Convolutional Neural Network object detection framework (FRCNN) to identify GSFCs in low-redshift (z ≲ 0.3) galaxies. Unlike other studies, we train different FRCNN models on observational data that was collected by the Sloan Digital Sky Survey and labelled by volunteers from the citizen science project ‘Galaxy Zoo: Clump Scout’. The FRCNN model relies on a CNN component as a ‘backbone’ feature extractor. We show that CNNs, that have been pre-trained for image classification using astrophysical images, outperform those that have been pre-trained on terrestrial images. In particular, we compare a domain-specific CNN – ‘Zoobot’ – with a generic classification backbone and find that Zoobot achieves higher detection performance. Our final model is capable of producing GSFC detections with a completeness and purity of ≥0.8 while only being trained on ∼5000 galaxy images.  more » « less
Award ID(s):
2006894
PAR ID:
10501020
Author(s) / Creator(s):
; ; ; ; ; ; ; ; ; ;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
RAS Techniques and Instruments
Volume:
3
Issue:
1
ISSN:
2752-8200
Format(s):
Medium: X Size: p. 174-197
Size(s):
p. 174-197
Sponsoring Org:
National Science Foundation
More Like this
  1. Zelinski, Michael E.; Taha, Tarek M.; Howe, Jonathan (Ed.)
    Image classification forms an important class of problems in machine learning and is widely used in many realworld applications, such as medicine, ecology, astronomy, and defense. Convolutional neural networks (CNNs) are machine learning techniques designed for inputs with grid structures, e.g., images, whose features are spatially correlated. As such, CNNs have been demonstrated to be highly effective approaches for many image classification problems and have consistently outperformed other approaches in many image classification and object detection competitions. A particular challenge involved in using machine learning for classifying images is measurement data loss in the form of missing pixels, which occurs in settings where scene occlusions are present or where the photodetectors in the imaging system are partially damaged. In such cases, the performance of CNN models tends to deteriorate or becomes unreliable even when the perturbations to the input image are small. In this work, we investigate techniques for improving the performance of CNN models for image classification with missing data. In particular, we explore training on a variety of data alterations that mimic data loss for producing more robust classifiers. By optimizing the categorical cross-entropy loss function, we demonstrate through numerical experiments on the MNIST dataset that training with these synthetic alterations can enhance the classification accuracy of our CNN models. 
    more » « less
  2. Deep learning algorithms are exceptionally valuable tools for collecting and analyzing the catastrophic readiness and countless actionable flood data. Convolutional neural networks (CNNs) are one form of deep learning algorithms widely used in computer vision which can be used to study flood images and assign learnable weights to various objects in the image. Here, we leveraged and discussed how connected vision systems can be used to embed cameras, image processing, CNNs, and data connectivity capabilities for flood label detection. We built a training database service of >9000 images (image annotation service) including the image geolocation information by streaming relevant images from social media platforms, Department of Transportation (DOT) 511 traffic cameras, the US Geological Survey (USGS) live river cameras, and images downloaded from search engines. We then developed a new python package called “FloodImageClassifier” to classify and detect objects within the collected flood images. “FloodImageClassifier” includes various CNNs architectures such as YOLOv3 (You look only once version 3), Fast R–CNN (Region-based CNN), Mask R–CNN, SSD MobileNet (Single Shot MultiBox Detector MobileNet), and EfficientDet (Efficient Object Detection) to perform both object detection and segmentation simultaneously. Canny Edge Detection and aspect ratio concepts are also included in the package for flood water level estimation and classification. The pipeline is smartly designed to train a large number of images and calculate flood water levels and inundation areas which can be used to identify flood depth, severity, and risk. “FloodImageClassifier” can be embedded with the USGS live river cameras and 511 traffic cameras to monitor river and road flooding conditions and provide early intelligence to emergency response authorities in real-time. 
    more » « less
  3. e apply a new deep learning technique to detect, classify, and deblend sources in multi-band astronomical images. We train and evaluate the performance of an artificial neural network built on the Mask R-CNN image processing framework, a general code for efficient object detection, classification, and instance segmentation. After evaluating the performance of our network against simulated ground truth images for star and galaxy classes, we find a precision of 92% at 80% recall for stars and a precision of 98% at 80% recall for galaxies in a typical field with ∼30 galaxies/arcmin2. We investigate the deblending capability of our code, and find that clean deblends are handled robustly during object masking, even for significantly blended sources. This technique, or extensions using similar network architectures, may be applied to current and future deep imaging surveys such as LSST and WFIRST. Our code, Astro R-CNN, is publicly available at https://github.com/burke86/astro_rcnn 
    more » « less
  4. ABSTRACT We compare the two largest galaxy morphology catalogues, which separate early- and late-type galaxies at intermediate redshift. The two catalogues were built by applying supervised deep learning (convolutional neural networks, CNNs) to the Dark Energy Survey data down to a magnitude limit of ∼21 mag. The methodologies used for the construction of the catalogues include differences such as the cutout sizes, the labels used for training, and the input to the CNN – monochromatic images versus gri-band normalized images. In addition, one catalogue is trained using bright galaxies observed with DES (i < 18), while the other is trained with bright galaxies (r < 17.5) and ‘emulated’ galaxies up to r-band magnitude 22.5. Despite the different approaches, the agreement between the two catalogues is excellent up to i < 19, demonstrating that CNN predictions are reliable for samples at least one magnitude fainter than the training sample limit. It also shows that morphological classifications based on monochromatic images are comparable to those based on gri-band images, at least in the bright regime. At fainter magnitudes, i > 19, the overall agreement is good (∼95 per cent), but is mostly driven by the large spiral fraction in the two catalogues. In contrast, the agreement within the elliptical population is not as good, especially at faint magnitudes. By studying the mismatched cases, we are able to identify lenticular galaxies (at least up to i < 19), which are difficult to distinguish using standard classification approaches. The synergy of both catalogues provides an unique opportunity to select a population of unusual galaxies. 
    more » « less
  5. null (Ed.)
    Abstract We present morphological classifications of ∼27 million galaxies from the Dark Energy Survey (DES) Data Release 1 (DR1) using a supervised deep learning algorithm. The classification scheme separates: (a) early-type galaxies (ETGs) from late-types (LTGs), and (b) face-on galaxies from edge-on. Our Convolutional Neural Networks (CNNs) are trained on a small subset of DES objects with previously known classifications. These typically have mr ≲ 17.7mag; we model fainter objects to mr < 21.5 mag by simulating what the brighter objects with well determined classifications would look like if they were at higher redshifts. The CNNs reach 97% accuracy to mr < 21.5 on their training sets, suggesting that they are able to recover features more accurately than the human eye. We then used the trained CNNs to classify the vast majority of the other DES images. The final catalog comprises five independent CNN predictions for each classification scheme, helping to determine if the CNN predictions are robust or not. We obtain secure classifications for ∼ 87% and 73% of the catalog for the ETG vs. LTG and edge-on vs. face-on models, respectively. Combining the two classifications (a) and (b) helps to increase the purity of the ETG sample and to identify edge-on lenticular galaxies (as ETGs with high ellipticity). Where a comparison is possible, our classifications correlate very well with Sérsic index (n), ellipticity (ε) and spectral type, even for the fainter galaxies. This is the largest multi-band catalog of automated galaxy morphologies to date. 
    more » « less