
Title: A naive Bayes classifier for identifying Class II YSOs

A naive Bayes classifier for identifying Class II YSOs has been constructed and applied to a region of the Northern Galactic Plane containing 8 million sources with good-quality Gaia EDR3 parallaxes. The classifier uses five features: Gaia G-band variability, WISE mid-infrared excess, UKIDSS and 2MASS near-infrared excess, IGAPS Hα excess, and overluminosity with respect to the main sequence. A list of candidate Class II YSOs is obtained by choosing a posterior threshold appropriate to the task at hand, balancing the competing demands of completeness and purity. Selecting sources with a posterior greater than 0.5, our classifier identifies 6504 candidate Class II YSOs. At this threshold, we find a false positive rate of around 0.02 per cent and a true positive rate of approximately 87 per cent for identifying Class II YSOs. The receiver operating characteristic (ROC) curve rises rapidly to a true positive rate of almost one, with an area under the curve of around 0.998 or better, indicating that the classifier is efficient at identifying candidate Class II YSOs. Our map of these candidates reveals what may be three previously undiscovered clusters or associations. Comparing our results with published catalogues from other young-star classifiers, we find that between one quarter and three quarters of the high-probability candidates are unique to each classifier, which tells us that no single classifier is finding all young stars.
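The posterior-threshold logic described above can be sketched in Python. This is a minimal illustration, not the authors' implementation. It assumes each of the five features has already been reduced to a log-likelihood ratio ln[p(x | Class II) / p(x | other)]. The function names, the prior and the toy inputs are hypothetical.

Under the naive (conditional-independence) assumption, the per-feature log-likelihood ratios simply add, and the posterior follows from the prior odds. The sketch also computes the quantities quoted in the abstract: the true and false positive rates at a threshold, and the area under the ROC curve (here via the rank-sum, or Mann–Whitney, statistic).

```python
import math
from typing import Sequence


def naive_bayes_posterior(log_likelihood_ratios: Sequence[float], prior: float) -> float:
    """Posterior P(Class II | features) for a two-class naive Bayes model.

    log_likelihood_ratios: one ln[p(x_i | Class II) / p(x_i | other)] per feature
    (hypothetical inputs; how each ratio is estimated is not shown here).
    prior: assumed fraction of Class II YSOs among all sources, 0 < prior < 1.
    """
    # Conditional independence lets the per-feature log ratios simply add.
    log_odds = math.log(prior / (1.0 - prior)) + sum(log_likelihood_ratios)
    # The logistic function converts log-odds back to a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))


def rates_at_threshold(posteriors, labels, threshold=0.5):
    """True and false positive rates when selecting posterior > threshold."""
    selected = [p > threshold for p in posteriors]
    tp = sum(1 for s, y in zip(selected, labels) if s and y)
    fp = sum(1 for s, y in zip(selected, labels) if s and not y)
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    return tp / n_pos, fp / n_neg


def roc_auc(posteriors, labels):
    """Area under the ROC curve: the probability that a random positive
    outranks a random negative (ties count one half). O(n_pos * n_neg)."""
    pos = [p for p, y in zip(posteriors, labels) if y]
    neg = [p for p, y in zip(posteriors, labels) if not y]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy source: five hypothetical log-likelihood ratios and an assumed prior.
    post = naive_bayes_posterior([2.0, 1.5, 0.5, 1.0, 0.0], prior=1e-3)
    print(f"posterior = {post:.3f}")
```

Class II YSOs are rare among millions of field stars, so a small prior means several features must show an excess before the posterior exceeds 0.5. This is one way to see why the choice of threshold trades completeness against purity.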

Author(s) / Creator(s):
; ; ;
Publisher / Repository: Oxford University Press
Date Published:
Journal Name: Monthly Notices of the Royal Astronomical Society
Page Range / eLocation ID: pp. 354–388
Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Context. The Central Molecular Zone (CMZ), a ∼200 pc sized region around the Galactic Centre, is peculiar in that its star formation rate (SFR) is suppressed relative to the amount of dense gas available. Young stellar objects (YSOs) can be used to study the SFR in the CMZ. Here we present radio observations of 334 infrared sources, detected at 2.2 μm, that have been identified as YSO candidates. Aims: Our goal is to search for centimetre-wavelength radio continuum counterparts to this sample of YSO candidates, which we use to constrain the current SFR in the CMZ. Methods: As part of the GLObal view on STAR formation (GLOSTAR) survey, Very Large Array D-configuration data were obtained for the Galactic Centre, covering −2° < l < 2° and −1° < b < 1° over a frequency range of 4–8 GHz. We matched YSOs with radio continuum sources using a set of selection criteria, classified these radio sources as potential H II regions, and determined their physical properties. Results: Of the 334 YSO candidates, we found 35 with radio continuum counterparts. We find that 94 YSOs are associated with dense dust condensations identified in the 870 μm ATLASGAL survey, of which 14 have a GLOSTAR counterpart. Of the 35 YSOs with radio counterparts, 11 are confirmed as H II regions based on their spectral indices and the literature. We estimated their Lyman continuum photon fluxes in order to estimate the masses of the ionising stars. Combining these with known sources, we calculate the present-day SFR in the CMZ to be ∼0.068 M⊙ yr⁻¹, which is ∼6.8% of the Galactic SFR. Candidate YSOs that lack radio counterparts may not yet have evolved to the stage of exhibiting an H II region or, conversely, may be older and have dispersed their natal clouds. Since many lack dust emission, the latter is more likely. Our SFR estimate for the CMZ agrees with previous estimates in the literature.
  2. We present two catalogues of active galactic nucleus (AGN) candidates selected from the latest data of two all-sky surveys – Data Release 2 of the Gaia mission and the unWISE catalogue of the Wide-field Infrared Survey Explorer (WISE). We train a random forest classifier to predict the probability, P_RF, that each source in the joint Gaia–unWISE sample is an AGN, based on Gaia astrometric and photometric measurements and unWISE photometry. The two catalogues, which we designate C75 and R85, are constructed by applying different P_RF threshold cuts to achieve an overall completeness of 75 per cent (≈90 per cent at Gaia G ≤ 20 mag) and a reliability of 85 per cent, respectively. The C75 (R85) catalogue contains 2 734 464 (2 182 193) AGN candidates across the effective 36 000 deg² sky, of which ≈0.91 (0.52) million are new discoveries. Photometric redshifts of the AGN candidates are derived by a random forest regressor using Gaia and WISE magnitudes and colours. The estimated overall photometric redshift accuracy is 0.11. Cross-matching the AGN candidates with a sample of known bright cluster galaxies, we identify a high-probability strongly lensed AGN candidate system, SDSS J1326+4806, with a large image separation of 21.06 arcsec. By the end of the mission, all the AGN candidates in our catalogues will have ∼5-yr-long light curves from Gaia, and will thus be a great resource for AGN variability studies. Our AGN catalogues will also be helpful for AGN target selection in future spectroscopic surveys, especially those in the Southern hemisphere. The C75 catalogue can be downloaded at
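For item 1, classifying a radio counterpart as an H II region from its spectral index can be illustrated with a short sketch. It assumes two flux densities measured at the ends of the 4–8 GHz band, with S_ν ∝ ν^α. It uses the textbook range for thermal free-free emission, −0.1 ≤ α ≤ 2, spanning the optically thin to optically thick cases. The function names and the example fluxes are hypothetical, and the GLOSTAR team's actual criteria, which also draw on the literature, may differ.

```python
import math


def spectral_index(s1_mjy: float, nu1_ghz: float, s2_mjy: float, nu2_ghz: float) -> float:
    """Spectral index alpha, defined by S_nu ∝ nu**alpha, from two flux densities."""
    return math.log(s2_mjy / s1_mjy) / math.log(nu2_ghz / nu1_ghz)


def consistent_with_thermal(alpha: float, alpha_err: float = 0.0) -> bool:
    """True if alpha (within its error) overlaps the free-free range -0.1 <= alpha <= 2.

    Optically thin thermal emission gives alpha ~ -0.1, optically thick gives
    alpha ~ 2; clearly more negative indices point to non-thermal emission.
    """
    return alpha + alpha_err >= -0.1 and alpha - alpha_err <= 2.0


if __name__ == "__main__":
    # Hypothetical counterpart measured at 4 and 8 GHz.
    alpha = spectral_index(2.0, 4.0, 2.4, 8.0)
    print(f"alpha = {alpha:.2f}, thermal: {consistent_with_thermal(alpha, 0.2)}")
```

Passing the measurement error into the check keeps sources whose index is only marginally outside the thermal range. This is a deliberately permissive choice; a stricter cut would require the whole error interval to lie inside the range.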
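For item 2, the two catalogue cuts can be illustrated as threshold searches over a labelled validation set:

- C75 takes the highest P_RF threshold that still keeps at least 75 per cent of known AGN (completeness).
- R85 takes the lowest threshold at which at least 85 per cent of selected sources are AGN (reliability).

This is a hedged sketch with hypothetical function names and toy data. It is not the authors' code, and their exact threshold-selection procedure (including any magnitude dependence) may differ.

```python
import math


def completeness_reliability(scores, labels, t):
    """Completeness TP/P and reliability TP/(TP+FP) when selecting score >= t."""
    selected = [s >= t for s in scores]
    tp = sum(1 for sel, y in zip(selected, labels) if sel and y)
    n_sel = sum(selected)
    n_pos = sum(1 for y in labels if y)
    return tp / n_pos, (tp / n_sel if n_sel else float("nan"))


def threshold_for_completeness(scores, labels, target):
    """Highest threshold t such that selecting score >= t keeps at least
    `target` (0 < target <= 1) of the true positives."""
    pos_scores = sorted((s for s, y in zip(scores, labels) if y), reverse=True)
    # Number of positives that must be retained; the epsilon guards float round-up.
    k = max(1, math.ceil(target * len(pos_scores) - 1e-9))
    return pos_scores[k - 1]


def threshold_for_reliability(scores, labels, target):
    """Lowest threshold t such that selecting score >= t has reliability of at
    least `target`; returns None if no threshold achieves it."""
    ranked = sorted(zip(scores, labels), key=lambda sy: sy[0], reverse=True)
    best = None
    tp = fp = 0
    for i, (s, y) in enumerate(ranked):
        if y:
            tp += 1
        else:
            fp += 1
        # Evaluate only once every source tied at this score is included.
        if i + 1 < len(ranked) and ranked[i + 1][0] == s:
            continue
        if tp / (tp + fp) >= target:
            best = s  # record the lowest score at which the target is met
    return best


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]  # toy P_RF values
    labels = [1, 1, 0, 1, 0, 0]              # toy ground truth (1 = AGN)
    t_c = threshold_for_completeness(scores, labels, 0.75)
    t_r = threshold_for_reliability(scores, labels, 0.85)
    print(t_c, completeness_reliability(scores, labels, t_c))
    print(t_r, completeness_reliability(scores, labels, t_r))
```

Reliability is not monotonic in the threshold, so this version returns the lowest qualifying threshold rather than stopping at the first failure. The two cuts pull in opposite directions, which is why the completeness-targeted C75 catalogue is larger than the reliability-targeted R85 catalogue.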

    Very metal-poor stars ([Fe/H] < −2) in the Milky Way are fossil records of early chemical evolution and of the assembly and structure of the Galaxy. However, they are rare and hard to find. Gaia DR3 has provided over 200 million low-resolution (R ≈ 50) XP spectra, offering an opportunity to greatly increase the number of candidate metal-poor stars. In this work, we use the XGBoost classification algorithm to identify ∼200 000 very metal-poor star candidates. Compared with past work, we increase the candidate metal-poor sample by about an order of magnitude, with comparable or better purity than past studies. First, we develop three classifiers for bright stars (BP < 16): Classifier-T (for turn-off stars), Classifier-GC (for giant stars, with high completeness) and Classifier-GP (for giant stars, with high purity). Their expected purities are 52/45/76 per cent and their completenesses 32/93/66 per cent, respectively. These three classifiers yield a total of 11 000/111 000/44 000 bright metal-poor candidates. We apply Classifier-T and Classifier-GP to faint stars (BP > 16) and obtain 38 000/41 000 additional metal-poor candidates, with purities of 29/52 per cent, respectively. We make our metal-poor star catalogues publicly available for further exploration of the metal-poor Milky Way.
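The purity and completeness figures quoted above follow the usual confusion-matrix definitions:

- purity = TP/(TP + FP)
- completeness = TP/(TP + FN)

Here TP, FP and FN count true positives, false positives and false negatives against a labelled validation sample. The sketch below (hypothetical function names, toy inputs) computes both. It also shows how a purity converts a candidate count into an expected number of genuine very metal-poor stars. For example, 11 000 Classifier-T candidates at 52 per cent purity correspond to roughly 5 700 true very metal-poor stars, assuming the validation purity carries over to the full candidate list.

```python
def purity_completeness(predicted, truth):
    """Purity TP/(TP+FP) and completeness TP/(TP+FN) for boolean sequences."""
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)
    fn = sum(1 for p, t in zip(predicted, truth) if not p and t)
    purity = tp / (tp + fp) if tp + fp else float("nan")
    completeness = tp / (tp + fn) if tp + fn else float("nan")
    return purity, completeness


def expected_true_members(n_candidates, purity):
    """Expected number of genuine members among candidates, assuming the
    validation purity applies to the whole candidate list."""
    return n_candidates * purity


if __name__ == "__main__":
    # Bright-star classifiers: candidate counts and expected purities from the text.
    for name, n, purity in [("T", 11_000, 0.52), ("GC", 111_000, 0.45), ("GP", 44_000, 0.76)]:
        print(name, round(expected_true_members(n, purity)))
```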


    Carbon-enhanced metal-poor (CEMP) stars make up almost a third of stars with [Fe/H] < −2, although their origins are still poorly understood. It is highly likely that one sub-class (CEMP-s stars) is tied to mass-transfer events in binary stars, while another sub-class (CEMP-no stars) is enriched by the nucleosynthetic yields of the first generations of stars. Previous studies of CEMP stars have concentrated primarily on the Galactic halo, but more recently CEMP stars have also been detected in the thick disc and bulge of the Milky Way. Gaia DR3 has provided an unprecedented sample of over 200 million low-resolution (R ≈ 50) spectra from the BP and RP photometers. Training on the CEMP catalogue from the SDSS/SEGUE database, we use XGBoost to identify the largest all-sky sample of CEMP candidate stars to date. In total, we find 58 872 CEMP star candidates, with an estimated contamination rate of 12 per cent. Comparing with literature high-resolution catalogues, we positively identify 60–68 per cent of the CEMP stars present in the data, which validates our results and indicates a high completeness. Our final catalogue of CEMP candidates spans the inner to the outer Milky Way, with Galactocentric distances ranging from r ∼ 0.8 kpc to r > 30 kpc. Future higher-resolution spectroscopic follow-up of these candidates will validate their classifications. It will also enable investigations of the frequency of CEMP-s and CEMP-no stars throughout the Galaxy, further constraining the nature of their progenitors.
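Checking a candidate list against literature catalogues, as done above for the high-resolution CEMP samples, amounts to a positional cross-match followed by a recovery fraction. The sketch below is hypothetical: the function names, the 1 arcsec match radius and the toy coordinates are assumptions, not the authors' choices. It computes great-circle separations with the haversine formula and reports the fraction of literature stars that have a candidate within the radius. Because the quoted 60–68 per cent refers to CEMP stars present in the data, the literature list should first be restricted to stars that actually have XP spectra.

```python
import math


def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation (haversine formula); inputs in degrees, output in arcsec."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # min() guards against rounding pushing h slightly above 1.
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, h)))) * 3600.0


def recovery_fraction(literature, candidates, radius_arcsec=1.0):
    """Fraction of literature (ra, dec) positions with a candidate within the radius.

    Brute force O(N * M); an all-sky match would use a spatial index instead.
    """
    recovered = sum(
        1
        for ra, dec in literature
        if any(angular_sep_arcsec(ra, dec, cra, cdec) <= radius_arcsec
               for cra, cdec in candidates)
    )
    return recovered / len(literature)


if __name__ == "__main__":
    lit = [(10.0, 10.0), (50.0, -30.0)]        # hypothetical literature CEMP stars
    cands = [(10.0 + 0.5 / 3600.0, 10.0)]      # hypothetical XP-based candidates
    print(recovery_fraction(lit, cands))
```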


    The evolutionary sequence for high-mass star formation starts with massive starless clumps, which go on to form protostellar objects, young stellar objects and then compact H II regions. While there are many examples of the three later stages, the very early stages have proved elusive. We follow up a sample of 110 mid-infrared dark clumps selected from the ATLASGAL catalogue with the IRAM telescope in an effort to identify a robust sample of massive starless clumps. We have used the HCO+ and HNC (1–0) transitions to identify clumps associated with infall motion, and the SiO (2–1) transition to identify outflow candidates. We find blue asymmetric line profiles in 65 per cent of the sample, and have measured the infall velocities and mass infall rates (0.6–36 × 10⁻³ M⊙ yr⁻¹) for 33 of these clumps. We find a trend for the mass infall rate to decrease as the ratio of bolometric luminosity to clump mass increases, i.e. as star formation within the clumps evolves. Using the SiO (2–1) line, we have identified good outflow candidates. Combining the infall and outflow tracers reveals that 67 per cent of quiescent clumps are already undergoing gravitational collapse or are associated with star formation. These clumps provide our best opportunity to determine the initial conditions and study the earliest stages of massive star formation. Finally, we provide an overview of a systematic high-resolution ALMA study of selected quiescent clumps, which will allow us to develop a detailed understanding of the earliest stages and their subsequent evolution.
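Infall velocities from blue-asymmetric HCO+/HNC profiles are commonly estimated with the two-layer model of Myers et al. (1996):

V_in ≈ σ²/(v_red − v_blue) ln[(1 + e T_BD/T_D)/(1 + e T_RD/T_D)]

The symbols are defined as follows:

- σ is the velocity dispersion of an optically thin line.
- v_red and v_blue are the velocities of the red and blue peaks.
- T_BD and T_RD are the heights of the blue and red peaks above the self-absorption dip.
- T_D is the brightness temperature of the dip.

A mass infall rate then follows from Ṁ = 4πR²ρV_in. The abstract does not state which estimator was used, so the sketch below is an assumption throughout. This includes the mean molecular weight μ = 2.8 per H₂ molecule, the hypothetical function names and the example inputs.

```python
import math

M_H_G = 1.6735575e-24   # hydrogen atom mass [g]
MSUN_G = 1.98847e33     # solar mass [g]
PC_CM = 3.0857e18       # parsec [cm]
YR_S = 3.15576e7        # Julian year [s]


def myers_infall_velocity(sigma, v_red, v_blue, t_bd, t_rd, t_d):
    """Two-layer infall velocity (Myers et al. 1996), same units as sigma.

    sigma: dispersion of an optically thin line; v_red, v_blue: peak velocities;
    t_bd, t_rd: blue/red peak heights above the dip; t_d: dip brightness temperature.
    A stronger blue peak (t_bd > t_rd) gives a positive (infall) velocity.
    """
    ratio = (1.0 + math.e * t_bd / t_d) / (1.0 + math.e * t_rd / t_d)
    return sigma ** 2 / (v_red - v_blue) * math.log(ratio)


def mass_infall_rate(v_in_kms, radius_pc, n_h2_cm3, mu=2.8):
    """Mdot = 4 pi R^2 rho V_in in Msun/yr, with rho = mu * m_H * n(H2) (assumed)."""
    rho = mu * M_H_G * n_h2_cm3          # g cm^-3
    r = radius_pc * PC_CM                # cm
    v = v_in_kms * 1e5                   # cm s^-1
    mdot_g_s = 4.0 * math.pi * r ** 2 * rho * v
    return mdot_g_s * YR_S / MSUN_G


if __name__ == "__main__":
    # Hypothetical clump: sigma = 0.8 km/s, peaks at -0.5 and +0.5 km/s.
    v_in = myers_infall_velocity(0.8, 0.5, -0.5, t_bd=2.0, t_rd=1.0, t_d=1.0)
    print(f"V_in = {v_in:.2f} km/s, Mdot = {mass_infall_rate(v_in, 0.5, 1e4):.2e} Msun/yr")
```

With V_in = 0.5 km s⁻¹, R = 0.5 pc and n(H₂) = 10⁴ cm⁻³, `mass_infall_rate` gives about 1.1 × 10⁻³ M⊙ yr⁻¹. This lies within the range quoted above, though the inputs are illustrative rather than taken from the clump sample.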
