<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>The NANOGrav 11 yr Data Set: Evolution of Gravitational-wave Background Statistics</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>02/20/2020</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10315469</idno>
					<idno type="doi">10.3847/1538-4357/ab68db</idno>
					<title level='j'>The Astrophysical Journal</title>
<idno>1538-4357</idno>
<biblScope unit="volume">890</biblScope>
<biblScope unit="issue">2</biblScope>					

					<author>J. S. Hazboun</author><author>J. Simon</author><author>S. R. Taylor</author><author>M. T. Lam</author><author>S. J. Vigeland</author><author>K. Islo</author><author>J. S. Key</author><author>Z. Arzoumanian</author><author>P. T. Baker</author><author>A. Brazier</author><author>P. R. Brook</author><author>S. Burke-Spolaor</author><author>S. Chatterjee</author><author>J. M. Cordes</author><author>N. J. Cornish</author><author>F. Crawford</author><author>K. Crowter</author><author>H. T. Cromartie</author><author>M. DeCesar</author><author>P. B. Demorest</author><author>T. Dolch</author><author>J. A. Ellis</author><author>R. D. Ferdman</author><author>E. Ferrara</author><author>E. Fonseca</author><author>N. Garver-Daniels</author><author>P. Gentile</author><author>D. Good</author><author>A. M. Holgado</author><author>E. A. Huerta</author><author>R. Jennings</author><author>G. Jones</author><author>M. L. Jones</author><author>A. R. Kaiser</author><author>D. L. Kaplan</author><author>L. Z. Kelley</author><author>T. J. Lazio</author><author>L. Levin</author><author>A. N. Lommen</author><author>D. R. Lorimer</author><author>J. Luo</author><author>R. S. Lynch</author><author>D. R. Madison</author><author>M. A. McLaughlin</author><author>S. T. McWilliams</author><author>C. M. Mingarelli</author><author>C. Ng</author><author>D. J. Nice</author><author>T. T. Pennucci</author><author>N. S. Pol</author><author>S. M. Ransom</author><author>P. S. Ray</author><author>X. Siemens</author><author>R. Spiewak</author><author>I. H. Stairs</author><author>D. R. Stinebring</author><author>K. Stovall</author><author>J. Swiggum</author><author>J. E. Turner</author><author>M. Vallisneri</author><author>R. van Haasteren</author><author>C. A. Witt</author><author>W. W. Zhu</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[An ensemble of inspiraling supermassive black hole binaries should produce a stochastic background of very low frequency gravitational waves. This stochastic background is predicted to be a power law, with a gravitational-wave strain spectral index of -2/3, and it should be detectable by a network of precisely timed millisecond pulsars, widely distributed on the sky. This paper reports a new "time slicing" analysis of the 11 yr data release from the North American Nanohertz Observatory for Gravitational Waves (NANOGrav) using 34 millisecond pulsars. Methods to flag potential "false-positive" signatures are developed, including techniques to identify]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Pulsar timing arrays (PTAs; <ref type="bibr">Sazhin 1978;</ref><ref type="bibr">Detweiler 1979;</ref><ref type="bibr">Foster &amp; Backer 1990</ref>) are poised to detect the stochastic background of gravitational waves (GWs) from a population of supermassive binary black holes (SMBBHs) within approximately the next 5 yr <ref type="bibr">(Siemens et al. 2013;</ref><ref type="bibr">Rosado et al. 2015;</ref><ref type="bibr">Taylor et al. 2016;</ref><ref type="bibr">Kelley et al. 2017)</ref>. There are three PTA collaborations that have been in operation for over a decade: the North American Nanohertz Observatory for Gravitational Waves (NANOGrav; McLaughlin 2013), the European Pulsar Timing Array <ref type="bibr">(Desvignes et al. 2016)</ref>, and the Parkes Pulsar Timing Array <ref type="bibr">(Hobbs 2013)</ref>. A number of emerging collaborations, including the Chinese Pulsar Timing Array <ref type="bibr">(Lee 2016)</ref>, the Indian Pulsar Timing Array <ref type="bibr">(Joshi et al. 2018)</ref>, and telescope-centered timing groups such as MeerTime <ref type="bibr">(Bailes et al. 2018)</ref> and CHIME/Pulsar <ref type="bibr">(Ng 2018)</ref>, all have a component of their programs directed toward nanohertz GW detection and characterization. Together with the more established PTAs, these groups form the International Pulsar Timing Array Collaboration <ref type="bibr">(Verbiest et al. 2016)</ref>.</p><p>The NANOGrav Collaboration has so far released four data sets based on, respectively, 5 yr of precision pulsar timing observations <ref type="bibr">(Demorest et al. 2013, hereafter NG5a)</ref>, 9 yr of observations <ref type="bibr">(Arzoumanian et al. 2015, hereafter NG9a)</ref>, 11 yr of observations <ref type="bibr">(Arzoumanian et al. 2018b</ref>, hereafter NG11a), and 12.5 yr of observations (M. <ref type="bibr">Alam et al. 2020, in preparation)</ref>. 
<ref type="foot">42</ref> The present analysis was carried out on NG11a, since the newest data release has only recently become available.</p><p>The dominant signal expected at nanohertz GW frequencies (where the regime of sensitivity is set by the cadence &#916;t and total baseline T_total of the pulsar time-series sampling: 1/T_total &lt; f &lt; 1/(2&#916;t)) is the stochastic background of GWs from an SMBBH population (see, e.g., <ref type="bibr">Burke-Spolaor et al. 2019)</ref>. There are several models in the literature that predict the amplitude and spectral shape of this GW background (GWB; e.g., <ref type="bibr">Sesana 2013;</ref><ref type="bibr">McWilliams et al. 2014;</ref><ref type="bibr">Simon &amp; Burke-Spolaor 2016;</ref><ref type="bibr">Kelley et al. 2017;</ref><ref type="bibr">Chen et al. 2019)</ref>. These models employ a range of galaxy surveys, galaxy evolution scenarios, and simulations to identify the most likely demographics of SMBBHs detectable by PTAs. Recent results from NANOGrav <ref type="bibr">(Arzoumanian et al. 2018a, hereafter NG11b)</ref> and other PTAs <ref type="bibr">(Lentati et al. 2015;</ref><ref type="bibr">Shannon et al. 2015)</ref> have reported constraints on the GWB characteristic strain amplitude that intersect astrophysically interesting regions of SMBBH parameter space. Using techniques developed in <ref type="bibr">Sampson et al. (2015)</ref>, <ref type="bibr">Simon &amp; Burke-Spolaor (2016)</ref>, and <ref type="bibr">Taylor et al. (2017b)</ref>, the 11 yr data set was used to constrain the relationship of supermassive black hole masses to those of their host galaxies, as well as galactic center environments that may influence the final parsec of binary dynamical evolution.</p><p>Most searches for the GWB rightly focus on the most recent data set, first searching for and then, in the absence of a signal, setting upper limits (ULs) on the GWB. 
These results are often juxtaposed with earlier work from shorter data sets to illustrate the gains in sensitivity of these galactic-scale GW detectors. Here we analyze the past evolution of our statistics by slicing the NG11a data set in time and performing the analyses from NG11b on each slice. This allows us to characterize the growth of NANOGrav's GW sensitivity as a function of time, as well as diagnose previously unmodeled noise processes. With regard to the latter, in this article we discuss how a noise transient produced a high-significance false positive during the time span between the release of NG5a and NG9a. Understanding this spurious signal and tracking down the pulsars from which it originates will be the subject of most of this paper.</p><p>The paper is organized as follows. In Section 2 we discuss our methods for obtaining the pulsar timing data used in this study, and in Section 3 we review our data analysis methods, including a detailed introduction to our noise models. In Section 4 we discuss the motivation for understanding the evolution of our GW statistics and introduce theoretical models for the evolution of that signal. In Section 5 we present the results of the initial time-slice analysis, including anomalous evidence for GWs in our data set. We then turn, in Section 6, to identifying which pulsars are responsible for this behavior and elucidate the various data analysis methods, noise models, and other mitigation strategies used to understand and remove it. In Section 7 we connect the shallow spectral index of the common process in early slice analyses to the first interstellar medium (ISM) event in PSR J1713+0747 and show how models extant in the literature can mitigate this behavior. Finally, in Section 8 we conclude with a summary of the issues encountered in this analysis and possible paths forward for future PTA noise mitigation.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">The 11 yr Data Set</head><p>The NANOGrav 11 yr data set contains observations of 45 pulsars made between 2004 and 2015. Details of the observations and pulsars can be found in NG11a. We briefly describe the data set here.</p><p>We made observations using two radio telescopes: the 100 m Robert C. Byrd Green Bank Telescope (GBT) of the Green Bank Observatory in Green Bank, West Virginia, and the 305 m William E. Gordon Telescope (Arecibo) of Arecibo Observatory in Arecibo, Puerto Rico. Since Arecibo is more sensitive than the GBT, all pulsars that can be observed from Arecibo (0&#176; &lt; &#948; &lt; 39&#176;) were observed with it, while those outside Arecibo's declination range were observed with the GBT. Two pulsars were observed with both telescopes: PSR J1713+0747 and PSR B1937+21. We observed most pulsars once a month. In addition, we started a high-cadence observing campaign in 2013, in which we made weekly observations of two pulsars with the GBT (PSR J1713+0747 and PSR J1909-3744) and five pulsars with Arecibo (PSR J0030+0451, PSR J1640+2224, PSR J1713+0747, PSR J2043+1711, and PSR J2317+1439).</p><p>At the GBT, the monthly observations used the 820 MHz and 1.4 GHz receivers, while weekly observations used only the 1.4 GHz receiver. At Arecibo, pulsars were observed with two of four possible receivers (327 MHz, 430 MHz, 1.4 GHz, and 2.3 GHz), though always including the 1.4 GHz receiver. Back-end instrumentation was upgraded about midway through our project from the ASP and GASP systems, which had bandwidths of 64 MHz, to the wideband systems PUPPI and GUPPI, processing up to 800 MHz for certain receivers <ref type="bibr">(DuPlain et al. 2008)</ref>.</p><p>For each pulsar, the observed times of arrival (TOAs) were fit to a timing model that described the pulsar's spin period and spin period derivative, sky location, proper motion, and distance. 
To this model were added a number of parameters that described the radio-frequency-dependent behavior of the pulse arrival times. Additionally, for those pulsars in binaries the timing model also included five Keplerian parameters that described the binary orbit, as well as additional post-Keplerian parameters that described relativistic binary effects if they statistically improved the timing fit.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Data Analysis Methods</head><p>The analysis techniques in this work largely follow the stochastic signal procedures in NG11b. The JPL solar system ephemeris DE436 <ref type="bibr">(Folkner &amp; Park 2016)</ref> was used along with the TT(BIPM2016) timescale. Our model likelihood is based on pulsar timing residuals, constructed for each pulsar &#948;t as</p><p>Tb describes noise contributions modeled with Gaussian processes, including uncertainties in the pulsar timing model and low-frequency time-correlated (red) noise, t describes white noise (WN), and s describes residuals induced by a GWB, also modeled with a Gaussian process.</p><p>The WN is modeled using the rms template-fitting errors for the TOAs. These are inflated using two additional parameters, a term added in quadrature (EQUAD) and a multiplicative factor (EFAC),</p><p>In practice, we build the WN correlation matrix by adding these diagonal contributions to the off-diagonal pieces, which model the correlated WN between TOAs observed in different subbands during the same observation (ECORR; <ref type="bibr">Arzoumanian et al. 2014)</ref>.</p><p>The standard likelihood for GW analysis with PTAs is well documented in the literature <ref type="bibr">(Demorest et al. 2013;</ref><ref type="bibr">Lentati et al. 2013,</ref> <ref type="bibr">2014,</ref> <ref type="bibr">2015,</ref> <ref type="bibr">2016;</ref><ref type="bibr">van Haasteren &amp; Levin 2013;</ref><ref type="bibr">Shannon et al. 2015;</ref><ref type="bibr">van Haasteren &amp; Vallisneri 2015;</ref><ref type="bibr">Arzoumanian et al. 2016;</ref><ref type="bibr">NG11b;</ref><ref type="bibr">Taylor et al. 2017a</ref>). Here we focus on a more detailed introduction to the types of time-correlated (red) noise models we use for the analyses.</p></div>
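<div xmlns="http://www.tei-c.org/ns/1.0"><p>As an illustration of the WN construction described in this section, the following minimal Python sketch assembles a WN covariance matrix for a handful of TOAs. It is not the NANOGrav implementation: the function name, argument names, and numerical values are hypothetical, and the convention that the diagonal variance equals EFAC&#178; times the squared TOA error plus EQUAD&#178; is an assumption, since other conventions for combining EFAC and EQUAD are in use. ECORR is assumed to add ECORR&#178; to every pair of TOAs taken during the same observation.</p>
<p><![CDATA[
```python
def white_noise_covariance(toa_errs, obs_ids, efac=1.0, equad=0.0, ecorr=0.0):
    """Build a white-noise covariance matrix (units of s^2) as nested lists.

    Assumed convention (hypothetical, for illustration only):
      diagonal:  efac^2 * toa_err^2 + equad^2
      ECORR:     ecorr^2 added to every pair of TOAs with the same obs_id,
                 which includes the diagonal entries.
    """
    n = len(toa_errs)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # EFAC scales the template-fitting error; EQUAD is added in quadrature.
        cov[i][i] = (efac * toa_errs[i]) ** 2 + equad ** 2
        for j in range(n):
            # ECORR correlates TOAs from different subbands of one observation.
            if obs_ids[i] == obs_ids[j]:
                cov[i][j] += ecorr ** 2
    return cov


# Three TOAs: the first two come from the same observation (obs_id 0).
toa_errs = [1.0e-7, 2.0e-7, 1.5e-7]  # template-fitting errors in seconds
obs_ids = [0, 0, 1]
cov = white_noise_covariance(toa_errs, obs_ids, efac=1.1, equad=5.0e-8, ecorr=3.0e-8)
```
]]></p>
<p>In this sketch TOAs from different observations remain uncorrelated, so the matrix is block diagonal when the TOAs are sorted by observation.</p></div>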
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Red-noise Models and the GWB</head><p>The precision TOAs of radio pulses from millisecond pulsars have been used to measure a myriad of astrophysical interactions. Perhaps most famous is the observation of a negative binary period derivative accurately explained by the emission of GWs in the context of general relativity <ref type="bibr">(Taylor &amp; Weisberg 1982)</ref>. Pulsar timing measurements are also responsible for the first detection of an extrasolar planet <ref type="bibr">(Wolszczan &amp; Frail 1992)</ref> and are used to monitor the content and movement of the galactic ionized ISM (e.g., <ref type="bibr">Keith et al. 2013;</ref><ref type="bibr">Jones et al. 2017)</ref>. In fact, observations from many pulsars have been put together to map the ISM content of the galaxy <ref type="bibr">(Cordes &amp; Lazio 2002;</ref><ref type="bibr">Yao et al. 2017)</ref>. Lensing events from the ISM can also be monitored using pulsars, and a recent reoccurrence of an apparent lensing event in PSR J1713+0747 has been studied extensively in <ref type="bibr">Lam et al. (2018b)</ref>.</p><p>From the perspective of a search for GWs in pulsar timing data, these astrophysical interactions are considered sources of noise, i.e., they must be removed or mitigated in order to detect a GW. The deterministic processes are modeled by a pulsar timing model; however, in order to account for the stochastic astrophysical signals, various types of models are used in our GW search analyses <ref type="bibr">(Lentati et al. 2016;</ref><ref type="bibr">Lam et al. 2018b;</ref><ref type="bibr">Madison et al. 2019)</ref>.</p><p>Since the GWB manifests as a low-frequency, time-correlated stochastic process (a "red" spectrum), it is especially important to model astrophysical noise sources that leave a similar signature in our timing residuals. 
In our analyses both the GWB and the red noise intrinsic to a pulsar's line of sight are built with the same types of models. Most commonly they are built using a normal-kernel Gaussian process in a Fourier basis with a power-law prior <ref type="bibr">(Williams &amp; Rasmussen 2006;</ref><ref type="bibr">van Haasteren &amp; Vallisneri 2014;</ref><ref type="bibr">Lentati et al. 2016)</ref>,</p><p>as a power spectral density. Here we work in terms of the timing residual spectral index, rather than GW strain, for convenience. The spectral index in strain (&#945;) is related to &#947; by &#947; = 3 - 2&#945;. The prior on the spectral index &#947; is restricted to the range from 0 to 7, meaning that the model must be either "red" (higher power at lower frequencies) or "flat" ("white"), i.e., &#947; = 0. The signal spectrum is then built using a Fourier basis from N frequencies (30 frequencies in most of the analyses presented here). Usually these frequencies are linearly spaced from 1/T to 30/T, T being the time span of the full data set, in this case 11.4 yr. The prior used in the Gaussian process is an Ansatz for the type of time-correlated process that one expects to find in the residuals and describes the power spectral density of that stochastic process modeled in the frequency domain.</p><p>The same frequency domain describes the GWB and non-GW red-noise sources and spans the nanohertz regime. Using a power law is the simplest model; however, a few more complex models have been used in the literature, including a turnover model, a free spectral model, and trained Gaussian process models <ref type="bibr">(Lentati et al. 2013;</ref><ref type="bibr">Taylor et al. 2013)</ref>. The free spectral model is the most generic model for a time-correlated stochastic process. This allows for a different coefficient for the Fourier basis at each frequency and is not restricted by any model for the power spectral density. 
While this model is very flexible, it incurs a large Occam penalty since it involves a large parameter volume. As in <ref type="bibr">Arzoumanian et al. (2016)</ref> and NG11b, the models used to search for the GWB and mitigate noise in individual pulsar data sets are not dependent on the radio frequency of the TOAs; hence, we refer to these as "achromatic" red-noise models. <ref type="foot">43</ref></p><p>The flagship Bayesian analysis for a PTA GW search includes the Hellings-Downs (HD) spatial correlations <ref type="bibr">(Hellings &amp; Downs 1983)</ref>. In practice, these analyses are often the final analysis completed since the nondiagonal correlation matrix inversion is computationally expensive in the Bayesian framework. The HD correlation Bayesian search takes advantage of the spatial correlations between pulsars and the time correlations due to the GWB, modeled as an achromatic red-noise Gaussian process. In the weak signal regime, the autocorrelations within pulsars are a reasonable first estimate for a correlated stochastic background and have the same spectral content. In much of this manuscript we discuss the latter type of search for a common red-noise process, since we will need to run numerous iterations of search types over the 18 time slices we have made.</p><p>As mentioned above, the models used for the GWB and other time-correlated processes particular to the pulsar lines of sight are very similar. Historically, red noise intrinsic to a pulsar, as well as to its line of sight, has been modeled with a power law with varying amplitude and spectral index (&#947;). In the most common GW analyses, the common noise process caused by the GWB is also modeled as a power law, but with the spectral index fixed to that expected for a GWB, &#947; = 13/3. As we will see, however, it is also informative to allow the spectral index to vary when searching for the GWB. 
Effectively, these models are identical, but for each pulsar the model for intrinsic red noise has its own set of parameters, and no spatial correlations between other pulsars are considered. Unlike the power-law model, the free spectral model allows one to analyze noise independently at multiple frequencies, and as we will see, this can help to disentangle degeneracy between the noise process unique to the pulsar and the GWB. Additionally, the other functional forms of the spectral models mentioned above can be used for both pulsar red noise and the GWB.</p><p>"Chromatic" (radio-frequency-dependent) versions of the noise models, most often modeling dispersion with a 1/&#957;&#178; dependence on radio frequency, can be found in the literature <ref type="bibr">(Lee et al. 2014;</ref><ref type="bibr">Caballero et al. 2016;</ref><ref type="bibr">Lentati et al. 2016;</ref><ref type="bibr">Reardon et al. 2016</ref>) and were recently used in <ref type="bibr">Lam et al. (2018b)</ref> regarding the previously mentioned ISM events in the timing of PSR J1713+0747. The standard NANOGrav analysis has not included these types of noise models, instead using a piecewise binning of dispersion measure (DM; the integrated line-of-sight electron density causing the 1/&#957;&#178; dependence in the arrival times) fluctuations, called DMX, implemented as part of the timing model using the pulsar timing software TEMPO/TEMPO2 (NG11a). This method works well at describing broadband DM fluctuations that have a timescale longer than individual DMX bins (1 week; <ref type="bibr">Jones et al. 2017)</ref>. There are a significant number of possible chromatic effects (e.g., <ref type="bibr">Cordes &amp; Shannon 2010;</ref><ref type="bibr">Lam et al. 2018a)</ref>, primarily due to radio propagation through the ISM, which need to be modeled appropriately <ref type="bibr">(Shannon &amp; Cordes 2017)</ref>. 
An analysis of the total chromatic noise, assuming misestimation of DM, was performed by <ref type="bibr">Lam et al. (2017)</ref> on NG9a.</p></div>
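<div xmlns="http://www.tei-c.org/ns/1.0"><p>The power-law red-noise model of this section can be sketched in a few lines of Python. The normalization used here, P(f) = A&#178;/(12&#960;&#178;) (f/f_yr)^(-&#947;) f_yr^(-3), is the convention commonly adopted in the PTA literature and is an assumption of this sketch rather than a quotation of the expression used in the analysis; the function names and the example amplitude are hypothetical. The sketch also checks that a strain spectral index of -2/3 corresponds to &#947; = 13/3 and builds the 30 linearly spaced frequencies from 1/T to 30/T.</p>
<p><![CDATA[
```python
import math

F_YR = 1.0 / (365.25 * 86400.0)  # reference frequency of 1/yr, in Hz


def strain_to_residual_index(alpha):
    """Convert a strain spectral index alpha to the residual index gamma = 3 - 2*alpha."""
    return 3.0 - 2.0 * alpha


def powerlaw_psd(f, amplitude, gamma):
    """Timing-residual power spectral density in s^3 (assumed normalization)."""
    return amplitude ** 2 / (12.0 * math.pi ** 2) * (f / F_YR) ** (-gamma) * F_YR ** (-3)


def fourier_frequencies(t_span_s, n_freqs=30):
    """Linearly spaced frequencies 1/T, 2/T, ..., n_freqs/T in Hz."""
    return [k / t_span_s for k in range(1, n_freqs + 1)]


gamma_gwb = strain_to_residual_index(-2.0 / 3.0)  # expected GWB value, 13/3
t_span = 11.4 * 365.25 * 86400.0                  # 11.4 yr in seconds
freqs = fourier_frequencies(t_span)
psd = [powerlaw_psd(f, 1.0e-15, gamma_gwb) for f in freqs]  # hypothetical amplitude
```
]]></p>
<p>Setting gamma to 0 in this sketch gives a flat spectrum, which is the lower edge of the prior range quoted in the text.</p></div>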
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Slicing the Data Set</head><p>Here we use the methods of NG11b to analyze various slices of the NANOGrav 11 yr data set presented in NG11a. The data set was partitioned by setting Modified Julian Date (MJD) cutoffs in 6-month increments after an initial 3 yr span. Three years is the nominal length of individual pulsar data sets used in NG11b, and hence it is adopted here as the minimal time span of data needed to do a worthwhile analysis. The slices were cumulative, adding 6 months of data at a time. In order to understand the noise evolution in the pulsars, we performed single-pulsar noise analyses at every slice, where all of the WN and red-noise parameters are allowed to vary in a Bayesian analysis. This follows from the general philosophy throughout this investigation: use the information known at the time of each slice to do the analysis. The WN maximum likelihood values from these analyses were then used to set the WN parameters for the full PTA analyses, analogous to NG11b.</p><p>The Bayesian analysis was done using the enterprise software suite <ref type="bibr">(Ellis et al. 2017</ref>) and the Markov Chain Monte Carlo (MCMC) sampling software PTMCMCSampler <ref type="bibr">(Ellis &amp; van Haasteren 2017)</ref>. Detection statistics were acquired by using log-uniform priors on the amplitudes of both the individual pulsar red noise (A_RN) and the common red-noise process (A_GWB). ULs were acquired by running analyses using linear exponential priors (meant to emulate a uniform prior but sampled in log space) for the GWB amplitude and the individual red-noise amplitudes.</p><p>A frequentist analysis was also undertaken using the same software as above and the optimal statistic submodule in the PTA model software package enterprise_extensions <ref type="bibr">(Taylor et al. 2018)</ref>. 
A noise-marginalized analysis <ref type="bibr">(Vigeland et al. 2018</ref>) was done at each slice using the MCMC chains from the Bayesian runs to sample over the red-noise parameters. The maximum likelihood values were then used to calculate the noise-maximized values. In both cases the optimal statistic and signal-to-noise ratio (S/N) were calculated.</p><p>In most cases the spectral index for the common red-noise process was set to &#947; = 13/3, the theoretical spectral index (in terms of timing residuals) for a stochastic GWB originating from circular binary inspirals, where the loss of energy in the binary is driven by the radiation of GWs <ref type="bibr">(Phinney 2001)</ref>. In addition, an analysis was done where the spectral index of the common process was also allowed to vary.</p></div>
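<div xmlns="http://www.tei-c.org/ns/1.0"><p>The cumulative slicing scheme described in this section can be sketched as follows. This is a minimal illustration, not the pipeline used for the analysis: the function names are hypothetical, and the starting epoch of MJD 53216 is an assumption inferred from the slice end dates quoted in Section 5 (for example, the 6.5 yr slice ending near MJD 55,590). With a 3 yr initial span, 6-month steps, and an 11.4 yr baseline, the sketch yields 18 slices.</p>
<p><![CDATA[
```python
DAYS_PER_YEAR = 365.25
START_MJD = 53216.0  # assumed start of the data set (inferred, not quoted)


def slice_cutoffs(start_mjd=START_MJD, first_span_yr=3.0, step_yr=0.5,
                  total_span_yr=11.4):
    """Return (span in yr, cutoff MJD) pairs for cumulative slices.

    The last slice is the first one whose span covers the full baseline.
    """
    cutoffs = []
    span = first_span_yr
    while span - step_yr < total_span_yr:
        cutoffs.append((span, start_mjd + span * DAYS_PER_YEAR))
        span += step_yr
    return cutoffs


def slice_toas(toa_mjds, cutoff_mjd):
    """Keep only TOAs observed on or before the cutoff (slices are cumulative)."""
    return [t for t in toa_mjds if t <= cutoff_mjd]


cutoffs = slice_cutoffs()
# Hypothetical TOA epochs for one pulsar, one per 30 days.
toa_mjds = [START_MJD + 30.0 * k for k in range(139)]
first_slice = slice_toas(toa_mjds, cutoffs[0][1])
```
]]></p>
<p>Because each slice contains all earlier data, a single-pulsar noise analysis rerun on every slice sees only the information available up to that cutoff.</p></div>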
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Evolution of GWB Statistics</head><p>The first signal detected by PTAs is expected to be the stochastic sum of SMBBHs from the cosmological neighborhood <ref type="bibr">(Rosado et al. 2015)</ref> and should grow very steeply as our data sets become sensitive farther into the nanohertz regime. Unlike the first detections of GWs from compact binary coalescences <ref type="bibr">(Abbott et al. 2016,</ref> <ref type="bibr">2017)</ref>, a detection of the GWB will not appear as a single event, but rather as a steady growth in significance over the course of a number of data releases. The evolution of detection statistics has been studied in the literature extensively <ref type="bibr">(Siemens et al. 2013;</ref><ref type="bibr">Vigeland &amp; Siemens 2016)</ref>, including theoretical studies of the scaling of the frequentist optimal statistic and numerical simulations using realistic data to predict when PTAs will reach specified sensitivities. Work of this kind is important for understanding the context of current data releases and the near-future ability to characterize nanohertz GW astrophysics. These types of studies also have obvious applications to the strategic planning of future PTA facilities.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Theory</head><p>The scaling laws presented in <ref type="bibr">Siemens et al. (2013)</ref> provide a straightforward framework for the comparison of NANOGrav's evolving sensitivity to the GWB. In the simple case of a PTA where all pulsars only have (identical) WN, the expectation value of the S/N, &#961; = A_GWB&#178;/&#963;_0, where &#963;_0 is the standard deviation of A_GWB&#178;, is shown to evolve in the weak signal regime as</p><p>where &#967;_IJ is the HD spatial correlation between pulsars I and J, c is the cadence of observations, &#963; is the measurement error of TOAs, &#947; is the spectral index of the power-law background, T is the total time of observations, A_GWB is the amplitude of the GWB at 1/yr (f_yr), and b contains the frequency dependence of the GWB signal,</p><p>Similarly, in the intermediate signal regime the S/N scales as</p><p>where &#952; is a function of the spectral index that includes the &#915; function,</p><p>The S/N can be related to the Bayes factor, B_10, from a GWB model versus noise-only model comparison using the Laplace approximation</p><p>where &#916;V_M is the characteristic spread of the likelihood around the maximum and V_M is the total parameter space volume of the model. The second term on the right-hand side is negative and imposes an Occam penalty, favoring models with fewer parameters. While this expression is simple, in practice such a calculation requires detailed knowledge about the likelihood function for both the signal model and noise-only model. Current Bayesian PTA analyses use a nested model approach and a Savage-Dickey approximation to the Bayes factor, which does not furnish the noise-only likelihood function. 
Nonetheless, it is obvious from Equations (4) and (6) that one expects a monotonically increasing Bayes factor as the observation time for a PTA increases.</p><p>One can relate the aforementioned scaling laws to a UL by using the complementary error function,</p><p>where &#949; is the significance threshold for the limit, e.g., 0.95 for a 95% UL. The expectation value yields</p><p>, and given the time dependence of the S/N in Equations (<ref type="formula">4</ref>) and (6), the ULs for the optimal statistic should evolve as</p><p>where &#948;_weak and &#948;_int are shorthand for the coefficients of time, T, in Equations (<ref type="formula">4</ref>) and (6), except for A_GWB. Since these relationships are based on a frequentist statistic, it is prudent to compare them to the ULs obtained from a Bayesian analysis on simulated data sets, as it is well known that frequentist and Bayesian ULs can have different interpretations (see, e.g., <ref type="bibr">Rover et al. 2011)</ref>.</p></div>
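<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the expected UL evolution concrete, the following Python sketch evaluates a weak-regime UL as a function of observing time. It rests on stated assumptions rather than on the exact expressions of this section: the optimal-statistic estimate is taken to be Gaussian, so that the squared UL equals A_GWB&#178; plus &#963;_0 times &#8730;2 erfc&#8315;&#185;[2(1 - &#949;)], with &#963;_0 = A_GWB&#178;/&#961; and a weak-regime S/N of &#961; = &#948;_weak A_GWB&#178; T^&#947;. The function name and the value of &#948;_weak are hypothetical. The identity &#8730;2 erfc&#8315;&#185;[2(1 - &#949;)] = &#934;&#8315;&#185;(&#949;), with &#934; the standard normal cumulative distribution, lets the sketch use only the Python standard library.</p>
<p><![CDATA[
```python
import math
from statistics import NormalDist


def ul_weak(t_yr, a_gwb, delta_weak, gamma=13.0 / 3.0, eps=0.95):
    """Assumed weak-regime upper limit on the GWB amplitude after t_yr years.

    delta_weak collects every coefficient of T in the weak-regime S/N
    except the amplitude, so that rho = delta_weak * a_gwb**2 * t_yr**gamma.
    """
    # Gaussian quantile, equal to sqrt(2) * erfcinv(2 * (1 - eps)).
    z = NormalDist().inv_cdf(eps)
    # sigma_0 = A^2 / rho, which is independent of the amplitude here.
    sigma0 = 1.0 / (delta_weak * t_yr ** gamma)
    return math.sqrt(a_gwb ** 2 + z * sigma0)


# Hypothetical coefficient, chosen only to give amplitudes near 1e-15.
DELTA_WEAK = 1.0e27
ul_curve = [(t, ul_weak(t, 1.0e-15, DELTA_WEAK)) for t in (5.0, 7.0, 9.0, 11.0)]
```
]]></p>
<p>Under these assumptions the UL falls as T^(-&#947;/2) when no signal is present and flattens toward the true amplitude once the data become informative.</p></div>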
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Simulations</head><p>We simulated NANOGrav-like data sets following <ref type="bibr">Taylor et al. (2017b)</ref> using the Python wrapper for TEMPO2 <ref type="bibr">(Hobbs et al. 2006</ref>) and PTA simulation package libstempo <ref type="bibr">(Vallisneri 2015)</ref>. <ref type="foot">44</ref> We ran a UL analysis on each simulated data set with an injected GWB of known amplitude. The simulated data sets are based on the noise properties and epochs of observation for the 11 yr data set. A UL analysis was run for 200 different realizations of a GWB at A_GWB of 1 &#215; 10&#8315;&#185;&#8310;, 1 &#215; 10&#8315;&#185;&#8309;, and 3 &#215; 10&#8315;&#185;&#8309;.</p><p>In Figure <ref type="figure">1</ref> the results are summarized for the simulations with an injection of A_GWB = 1 &#215; 10&#8315;&#185;&#8309;. The mean of the ULs and the 90% confidence interval are shown, along with the level of the injection and a fit to the theoretical evolution for the UL in the weak regime, given by Equation (12a). The curve is fit to the mean values greater than 5 yr into the data set by varying the &#948;_weak parameter.</p><p>Such a close fit from 5 to 11 yr is extraordinary given that many of the other parameters in this relationship are changing with time, e.g., the average cadence and average TOA error. In part, it is these large changes in parameters at the beginning of the data set that are responsible for the poor fit to the theoretical prediction at early times. This fit for &#948;_weak is then used in Figure <ref type="figure">2</ref> to compare the theoretical evolution of Equation (12a) to the evolution of the UL with larger and smaller injections of the GWB. 
With the same value for &#948;_weak, only changing the injection strength, A_GWB, accordingly, the Bayesian analyses of these simulations follow the theoretical predictions at late times.</p><p>Note that since the published UL in NG11b is A_GWB = 1.45 &#215; 10&#8315;&#185;&#8309;, the sensitivity of this data set is such that a GWB with A_GWB = 3 &#215; 10&#8315;&#185;&#8309; would be counted as a detection in the full data set. Still one can calculate a 95% UL in this case, and these simulations act as an exercise to demonstrate that the UL scalings derived in Section 4.1 hold robustly in this higher-amplitude case. In addition to the work cited on detection scaling laws earlier in the section, forthcoming work looks at injections into the real data as a way of characterizing the evolution of sensitivity in our data sets.</p><p>Armed with a general understanding of the expected evolution of GWB analysis statistics, we move on to the sliced analysis of the NANOGrav 11 yr data set.</p></div>
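<div xmlns="http://www.tei-c.org/ns/1.0"><p>The one-parameter fit described in this section, in which only the time coefficient of the weak-regime UL curve is varied and only slices beyond 5 yr are used, can be illustrated with a closed-form least-squares sketch. The UL model is an assumption of this sketch (squared UL equal to the squared injected amplitude plus z/(&#948;_weak T^&#947;), with z the Gaussian quantile for the chosen significance) and may differ in detail from the fitted expression; the function name is hypothetical, and the data below are synthetic values generated from the same assumed model, not results from the simulations.</p>
<p><![CDATA[
```python
import math
from statistics import NormalDist

GAMMA_GWB = 13.0 / 3.0


def fit_delta_weak(t_yr, ul, a_inj, gamma=GAMMA_GWB, eps=0.95, t_min=5.0):
    """Least-squares estimate of delta_weak from mean ULs at times t_yr.

    The assumed model UL^2 - a_inj^2 = z / (delta_weak * T^gamma) is linear
    in 1/delta_weak, so the fit has a closed form. Only T > t_min is used.
    """
    z = NormalDist().inv_cdf(eps)
    xs, ys = [], []
    for t, u in zip(t_yr, ul):
        if t > t_min:
            xs.append(z / t ** gamma)
            ys.append(u ** 2 - a_inj ** 2)
    slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return 1.0 / slope


# Synthetic check: generate ULs from a known coefficient and recover it.
delta_true = 1.0e27   # hypothetical value
a_inj = 1.0e-15       # injected amplitude used for the synthetic curve
z95 = NormalDist().inv_cdf(0.95)
times = [3.0 + 0.5 * k for k in range(18)]
uls = [math.sqrt(a_inj ** 2 + z95 / (delta_true * t ** GAMMA_GWB)) for t in times]
delta_fit = fit_delta_weak(times, uls, a_inj)
```
]]></p>
<p>With real simulation output the early slices would be expected to deviate from this curve, which is why the sketch, like the fit in the text, discards times of 5 yr or less.</p></div>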
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Standard Analysis Results</head><p>Here we report the results of both our standard detection and UL analyses of the NANOGrav 11 yr data set for a GWB across the sliced data set.</p><p>The expectations laid out in Section 4 are that in this type of analysis, as more data are added (i.e., we have more information about the system we are studying), the posteriors for A_GWB should get narrower. In broad strokes this means that the Savage-Dickey Bayes factor approximation would start near 1 and begin to increase as more data are added. The S/N should also increase according to the evolution described in <ref type="bibr">Siemens et al. (2013)</ref>, while the UL should steadily decrease with more data until the data become sufficiently informative to run up against the actual signal amplitude, as shown in Section 4. In both of our Bayesian analyses, the evolution of these statistics does not conform to these reasonable predictions.</p><p>In the case of a varied spectral index analysis we expect similar behavior, but as a detection becomes imminent, the significance of a steep spectral index common process will increase. In our varied spectral index analysis, however, rather than seeing a steep spectral index near &#947; = 13/3, a shallow &#947; &#8776; 0 process appears early in the observation period.</p></div>
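<div xmlns="http://www.tei-c.org/ns/1.0"><p>The two Bayesian summary statistics discussed in this section can be estimated from posterior samples as in the following sketch. It is a simplified illustration with hypothetical function names, not the code used for the analysis. The Savage-Dickey ratio is approximated as the prior density divided by the posterior density at the lower edge of a log-uniform amplitude prior, with the posterior density estimated from the fraction of samples in a bin of assumed width at that edge. The UL is taken as a quantile of amplitude samples, which is meaningful when those samples come from the uniform-in-amplitude style of prior used for ULs.</p>
<p><![CDATA[
```python
def savage_dickey_bf(log10_a_samples, log10_a_min, log10_a_max, bin_width=0.5):
    """Approximate Savage-Dickey Bayes factor for signal versus noise only.

    Assumes a log-uniform prior on the amplitude between the two bounds and
    estimates the posterior density at the lower bound from a single bin.
    """
    prior_density = 1.0 / (log10_a_max - log10_a_min)
    n_low = sum(1 for x in log10_a_samples if x <= log10_a_min + bin_width)
    if n_low == 0:
        # No samples near the prior edge: the ratio cannot be estimated.
        return float("inf")
    post_density = n_low / len(log10_a_samples) / bin_width
    return prior_density / post_density


def upper_limit(a_samples, quantile=0.95):
    """Return the amplitude below which the given fraction of samples lies."""
    ordered = sorted(a_samples)
    idx = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[idx]


# Samples spread evenly over the prior mimic an uninformative data set.
flat_samples = [-18.0 + 4.0 * (k + 0.5) / 1000 for k in range(1000)]
bf_flat = savage_dickey_bf(flat_samples, -18.0, -14.0)

# Samples piled up far from the lower edge mimic a strong detection.
peaked_samples = [-14.8 + 0.0001 * k for k in range(1000)]
bf_peaked = savage_dickey_bf(peaked_samples, -18.0, -14.0)
```
]]></p>
<p>The infinite return value for the peaked case reflects the situation in which no posterior samples fall near the lower prior bound, so that a detection is too strong for this approximation to quantify.</p></div>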
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.">Fixed Spectral Index Analyses</head><p>The Bayes factor in the bottom panel of Figure <ref type="figure">3</ref> remains between 0.5 and 1.5 until 7.0 yr into the data set, and then it increases dramatically for a few slices, before decreasing again. As can be seen in the top panel of Figure <ref type="figure">3</ref>, the UL decreases monotonically until 6.5 yr into the data set (slice ending about MJD 55,590 (2011.08)) and sharply increases over the next year until the 7.5 yr slice (slice ending about MJD 55,956 (2012.08)), before beginning to decrease again. We refer to this period of time as the "kink" for brevity.</p><p>The frequentist statistics are mixed. While the ULs calculated from the optimal statistic noise-marginalized posteriors are a bit lower during the "kink," the same trend (particularly the increase) in the UL can be seen centered around the 7.5 yr slice. The S/Ns are, however, drastically different from the Bayes factor trends and do not show the dramatic increase in this era. Since these calculations involve spatial correlations, they are often used as the quickest estimate of detection capabilities of a given data set. The fact that they are so different in this era from the Bayesian results is troubling and is the impetus for most of the remainder of this paper.</p><p>A full sliced analysis using HD spatial correlations was undertaken on the data set. These analyses hint at an even stronger detection of a GWB during the "kink" era. The posteriors for this analysis indicate a GWB detection too strong to be estimated using the Savage-Dickey Bayes factor approximation. Since these analyses take up to 10 times longer  to run, we restricted our follow-up analyses to searching for an uncorrelated common red-noise process among all pulsars.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.">Varied Spectral Index</head><p>One further analysis to summarize is that with a varying spectral index for the common red-noise process used to model the GWB. This analysis shows a drastic change in the posteriors of the parameters, but this change is not contemporaneous with the "kink" in the analyses where the spectral index is fixed to &#947; = 13/3. Here we see that the spectral index describing the common red noise between the pulsars changes dramatically in the 4.5 yr slice ending at MJD 54,860 (2009.08) and butts up against &#947; = 0 (Figure <ref type="figure">4</ref>).</p><p>The feature slowly dissipates until roughly the 7.0 yr slice. This change in the spectral index is contemporaneous with the "first" chromatic timing event in PSR J1713+0747 <ref type="bibr">(Demorest et al. 2013;</ref><ref type="bibr">Lam et al. 2018b)</ref>. Investigating further, we see that a similar feature exists in the single-pulsar red-noise analysis for this pulsar, shown by the orange dashed curve in Figure <ref type="figure">4</ref>. We will return to this feature and investigate this correlation with the "first" PSR J1713+0747 ISM event in Section 7.</p><p>Additionally, the recovered value of &#947; coincident with the "kink" does not significantly vary. Rather than moving toward larger values, indicative of the recovery of a "steeper" process, the spectral index recovery remains broad and stagnant. While this is in no way dismissive of a GW-related event occurring during this time, it is further evidence of the anomalous behavior of our Bayesian analyses throughout the "kink."</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Investigating the Anomalous Signal</head><p>In this section we investigate the anomalous GWB signal in our sliced analysis, which includes the "kink" in the UL analysis and the large spike in Bayes factor contemporaneously. We run through a number of the diagnostic analyses that were completed and summarize our mitigation strategy.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.1.">Slice-specific Simulations</head><p>While the evolution of our UL and Bayes factors seems to be unexpected, these types of statistics are expected to evolve stochastically depending on how the particular noise and Gaussian process realizations interact with the data. This is evident from the simulations run in Section 4. Therefore, it is no surprise that the UL or Bayes factor is nonmonotonic in moving from any given slice of the data to the next. The parameter space for a PTA is large, and the data sets for the individual pulsars may interact in complex ways with various lengths of observation, red-noise frequencies used, and WN parameter characterization. However, such a large and continuous rise in the Bayes factors and ULs seems to be worth further investigation.</p><p>Figure <ref type="figure">3</ref>. Results of a standard PTA GW data analysis. In both plots the blue solid line/circles show the Bayesian results at each slice, while the open orange circles show the frequentist results. Note the large rise in the UL and the Bayes factor for the Bayesian analysis starting in the 6.5 yr slice and peaking in the 7.5 yr slice. The UL calculated from the noise-marginalized optimal statistic has a similar trend to the Bayesian UL, but with lower values. The S/N calculated from the optimal statistic shows no sign of a detection in the same era.</p><p>The "kink" in the UL time series of Figure <ref type="figure">3</ref> lies far outside of the 90% confidence intervals of the simulations in Section 4. Because noise parameter characterization can change substantially over time, this large deviation inspired a new set of simulations, in order to better characterize the degree to which the "kink" is just a statistical fluctuation. This set of simulations uses the same techniques as in Section 4.2, with one major difference. 
Rather than use the noise parameters from the full 11 yr analysis to simulate the pulsar data sets, the noise parameters retrieved from a single-pulsar analysis at each slice were used to build a simulation of that slice. This gave us an injection that was "truer" to the knowledge at the time of a particular slice. In practice, this involves making a whole new data set for each slice. This analysis was done on a set of data sets with an injection of A GWB = 10&#8315;&#185;&#8309;.</p><p>The mean UL and 90% confidence interval are plotted in Figure <ref type="figure">5</ref>. Besides the mostly monotonic trend in the average UL, there is a characteristic plateau that starts just around the increase in the real data. This plateau appears near the changeover in our pulsar back end at the GBT from GASP to GUPPI. GUPPI and PUPPI allow for much wider band observations of radio pulses, which allows us to do more accurate mitigation of DM fluctuations. While no causal relation has been found between this changeover and the "kink," we will present a number of results in Section 6.7 that summarize further investigations.</p><p>Even with this plateau in the same era as the "kink," the values of the UL in the 7.5 yr slice are outside of the 90% confidence interval of the simulations done. The fact that the "kink" is lessened in significance when simulating with noise estimated from each slice suggests either significant covariance between noise and GW signal at these epochs or nonstationary noise features. The results from these simulations, along with the large Bayes factors, prompted us to examine which pulsars seemed to be most responsible for the signal.</p></div>
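<div xmlns="http://www.tei-c.org/ns/1.0"><p>The comparison between the real-data UL and the simulated 90% band can be sketched as follows. This is an illustrative outline only: the function names and the dictionary layout (slice length in years mapped to UL values) are assumptions made for the example and do not describe the pipeline used here.</p>

```python
def central_interval(values, coverage=0.90):
    """Return (low, high) bounds of the central interval of a list of values.

    Quantiles are found by linear interpolation between sorted samples.
    """
    data = sorted(values)

    def quantile(q):
        pos = q * (len(data) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(data) - 1)
        return data[lo] + (pos - lo) * (data[hi] - data[lo])

    tail = (1.0 - coverage) / 2.0
    return quantile(tail), quantile(1.0 - tail)


def slices_outside_band(real_ul, simulated_ul, coverage=0.90):
    """List the slices whose real-data UL lies outside the simulated band.

    real_ul: dict mapping slice length (yr) to the UL from the real data.
    simulated_ul: dict mapping slice length (yr) to a list of ULs, one per
    simulated realization of that slice.
    """
    flagged = []
    for slice_yr in sorted(real_ul):
        low, high = central_interval(simulated_ul[slice_yr], coverage)
        if real_ul[slice_yr] > high or low > real_ul[slice_yr]:
            flagged.append(slice_yr)
    return flagged
```

<p>With one list of simulated ULs per slice, the second function returns the slices that warrant a closer look, which is the role the 7.5 yr slice plays in the discussion above.</p></div>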
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.2.">Dropout Analysis</head><p>It is important to point out that a number of different types of GW signal search are done by the NANOGrav Collaboration on each new data set. In addition to searching for various types of stochastic backgrounds, a search for single sources of GWs from SMBBHs <ref type="bibr">(Aggarwal et al. 2019</ref>) and a search for GW memory <ref type="bibr">(Aggarwal et al. 2020</ref>) from binary coalescences<ref type="foot">foot_4</ref> were performed. The most recent studies involving these searches cover both the NG9a and NG11a data sets. In both cases there was mild evidence for GW signals in NG9a, which decreases in the analyses of NG11a. In the case of the single-source search of <ref type="bibr">Aggarwal et al. (2019)</ref>, PSR J0613-0200 was found to be responsible for the spurious signal, and both PSR J0030+0451 and PSR J1909-3744 were responsible for the burst-with-memory anomaly in NG9a. As mentioned in those manuscripts, both noise in the individual pulsars and the low-sensitivity sky positions of PSR J0030+0451 and PSR J0613-0200 were to blame. We will see that these pulsars again appear as culprit pulsars in this analysis. These pulsars were tagged as being the source of the spurious signals using a Bayesian dropout analysis. In none of the cases above was a robust detection of GWs found.</p><p>Following <ref type="bibr">Aggarwal et al. (2019)</ref>, a dropout analysis was undertaken on the sliced data set. The dropout analysis is a new technique for isolating spurious noise sources from particular pulsars in a PTA GW analysis <ref type="bibr">(Aggarwal et al. 2019)</ref>. In a dropout analysis the signal being analyzed is coded with a so-called dropout parameter. These parameters multiply the signal amplitude and are sampled in the MCMC analysis. They are binary in the sense that, depending on the sample, the dropout parameter is either 1 or 0, turning the signal on/off. 
This allows one to use the Bayesian analysis to determine the signal model in terms of which pulsars prefer the signal to be turned on in the analysis. See S. J. <ref type="bibr">Vigeland et al. (2019, in preparation)</ref> for more details.</p><p>Here we have used the GWB dropout analysis in enterprise_extensions to look at which pulsars favor a common red-noise signal across the slice analysis. The only pulsar with an odds ratio significantly higher than 1 in the 7.5 yr slice (i.e., more samples favor the presence of a stochastic background) is PSR B1937+21. It has long been known that this pulsar has a great deal of red noise <ref type="bibr">(Kaspi et al. 1994)</ref>, so much that it was not included in the analysis by <ref type="bibr">Arzoumanian et al. (2016)</ref>. As can be seen in Figure <ref type="figure">6</ref>, removing this pulsar from the analysis decreases the Bayes factor and the UL during this era. Therefore, some of the spurious signal in the "kink" era is due to this pulsar. However, while the UL and Bayes factors decrease across this set of time slices, the main features of the "kink" are still present in the statistics.</p></div>
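<div xmlns="http://www.tei-c.org/ns/1.0"><p>The odds ratio quoted for each pulsar can be read as a ratio of sample counts, as in the minimal sketch below. It assumes the dropout parameter has already been reduced to a list of 0/1 samples with equal prior weight on the two states. The function name is hypothetical, and this is not the enterprise_extensions interface.</p>

```python
def dropout_odds(indicator_samples):
    """Odds that the common signal is switched on for one pulsar.

    indicator_samples: MCMC samples of the binary dropout parameter
    (1 = signal on, 0 = signal off). Equal prior odds are assumed.
    """
    samples = list(indicator_samples)
    n_on = sum(1 for k in samples if k == 1)
    n_off = len(samples) - n_on
    if n_off == 0:
        # The sampler never switched the signal off for this pulsar.
        return float("inf")
    return n_on / n_off
```

<p>An odds ratio near 1 means the pulsar is indifferent to the common signal, while values well above 1 mark a pulsar that favors it.</p></div>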
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.3.">Single-pulsar GWB Statistics</head><p>In search of other pulsars that might be responsible for this spurious signal, an exhaustive analysis of the GWB statistics was done for each individual pulsar. This analysis is often carried out to characterize the robustness of PTA GW statistics and has been used in the past to rank the sensitivity of pulsars to the GWB. These individual pulsar ULs are in turn used to do cumulative analyses where pulsars are added until the UL asymptotes to a stable and more robust value.</p><p>Figure <ref type="figure">5</ref>. UL analyses of simulated NANOGrav 11 yr-like data sets with slice-dependent noise characteristics. Data were simulated for each slice based on the noise parameters recovered from the sliced analyses on the real data set. This is in contrast to the simulations done in Section 4, where the noise parameters from the full data set were used to simulate a full data set, which was then sliced for the UL analyses. One can see that the difference in noise parameters makes the "kink" of the blue trace somewhat less significant. The orange line shows the expected evolution and is identical to the blue line in Figure <ref type="figure">1</ref>. The vertical lines show when the new GUPPI/PUPPI back ends came into use and when the old GASP/ASP back ends were phased out.</p><p>In an attempt to track down pulsars that could possibly be culprits in causing the "kink," we ran individual Bayes factor and UL analyses on all 34 pulsars in the GW analysis, over the slices in question. This was not done initially because it is computationally intensive, requiring upward of 1000 individual analyses. Plots of all of the pulsars' statistical evolution can be found online, but Figure <ref type="figure">7</ref> summarizes those of one interesting candidate source of the anomalous statistics, PSR J0030+0451. 
Four culprit pulsars were identified in these analyses: PSR B1855+09, PSR J0030+0451, PSR J0613-0200, and PSR J1909-3744. These pulsars all show either a similar feature to the full PTA analysis or a sharp rise in the Bayes factor in the 6.5-8.5 yr time span. This same feature does not appear in the UL and Bayes factor time series for PSR B1937+21, but we group it with these pulsars since it affects the GWB statistics in this era, as determined by the dropout analysis. After finding these candidates, the most straightforward test of their responsibility for the "kink" was to remove each of them and run the analysis on the remaining pulsars in the PTA.</p><p>Analyses were done with each of the pulsars removed individually and in all subsets of the culprit pulsars. We only discuss the results of removing either all of the pulsars or four of the pulsars while keeping PSR J1909-3744 in the analysis, since these are the most interesting cases.</p><p>Figure <ref type="figure">6</ref> shows these cases. When four of the culprits (not including PSR J1909-3744) were removed from the analysis, the "kink" strongly decreases in the time series and the evolution of the UL falls well within the 90% confidence interval of the simulations in Figure <ref type="figure">5</ref>. The Bayes factor time series still shows a remnant feature during this period, but when PSR J1909-3744 is removed, the Bayes factor time series decreases even further. The removal of this pulsar drastically reduces our sensitivity to the GWB across the time span of interest. This is not surprising since PSR J1909-3744 is one of our most precisely timed pulsars, but in the next section it will be important to include this pulsar in our mitigation strategy.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.4.">Free Spectral Red-noise Models</head><p>While it is important to identify which pulsars have such strong, spurious GW detections, we are primarily interested in finding a way of mitigating the noise in these pulsars so they can be included in the full PTA analysis. In Section 6.3 we demonstrated that removing a small subset of the pulsars from the GW analysis removed the "kink." While it is reassuring that we have isolated the spurious detection of a GWB to a handful of pulsars, we would like to devise a mitigation strategy for this type of noise and understand the root cause more thoroughly, especially since one of our most sensitive pulsars is included in the list.</p><p>As discussed in Section 3, a free spectral model is another tool in the standard PTA data analysis toolbox that we can use to model the red noise in these individual pulsars. The free spectral model uses a free parameter for the amplitude of the Fourier basis red noise at each frequency modeled, which provides a much larger parameter space for the noise by not restricting the noise in a given pulsar to follow any type of functional dependence in frequency. Hence, such a model can model the noise in the lowest-frequency bins of an individual pulsar and any unmodeled higher-frequency noise independently, which the power-law model is unable to do.</p><p>In order for the free spectral model to cover the same number of frequencies as the power-law model, upward of 30 parameters need to be added per pulsar. Historically, this is the primary reason for not including free spectral models in PTA noise analyses, as the increased parameter space becomes computationally infeasible to search over. Additionally, forthcoming work <ref type="bibr">(J. Simon et al. 
2019, in preparation)</ref> shows that these models do not compete well in a Bayesian model selection framework because the large Occam penalty of the additional parameters cancels out the ability of these many parameters to describe the noise accurately.</p><p>[Figure caption fragment] ... shows the median and 68% and 95% confidence intervals for the posterior on A GWB at each slice. The bottom panel shows the Savage-Dickey Bayes factor calculated at each slice. Note, in both cases, the jump that occurs at the 7.5 yr slice.</p><p>With those caveats in mind, we undertook another set of PTA GW analyses using free spectral models for the culprit pulsars. The results are summarized in Figure <ref type="figure">8</ref>. These results demonstrate that the free spectral model is effective at mitigating the spurious noise features in both the UL and Bayes factor analyses of the NANOGrav 11 yr data set. It is salient to compare the longest slices, i.e., &gt;10 yr, where the Bayes factor is slowly increasing. With all of the culprit pulsars removed in Figure <ref type="figure">6</ref>, the Bayes factor still remained at or below 1 in these late slices, showing no early signs of any type of signal. However, even though the use of a free spectral noise model for the five culprit pulsars mitigates the spurious features identified in this paper, it begins to favor the signal model in the long run (see the points at &gt;10 yr in Figure <ref type="figure">8</ref>). If this growth in the Bayes factor were the beginning indications of a real signal, its growth would be indicative of the amplitude of the underlying GWB. A separate, ongoing investigation is addressing this question by injecting GWB signals into the 11 yr data set and will be published separately.</p><p>Red-noise amplitude spectral densities for the 7.5 yr slice are shown in Figure <ref type="figure">9</ref> for PSR J0030+0451. 
The free spectral parameter posteriors are compared to a sample of the power-law posterior amplitude spectral densities. The thick straight lines show the power-law spectrum for the maximum likelihood values of these power-law parameters. The bottom panel shows the 2D posteriors for the power-law parameters, in both the single-pulsar noise run and the full PTA analysis. In the top panel note that the only significant free spectral parameter (i.e., sufficiently separated from the minimum amplitude) is the one for the lowest frequency. This points to a possible cause for the anomalous signal we are seeing from a few pulsars. The lowest frequency will model all power within a &#948;f defined by the inverse of the time span, but it is limited by the second-lowest frequency, where there is no substantial evidence for power. One conjecture, partly substantiated by comparing the different 2D power-law posteriors, is that the power law is able to find this power at low frequencies, but since the signal is only significant at one frequency, this power is allowed to transfer between the pulsar red-noise model and the GWB common red-noise process.</p><p>[Figure caption fragment] ... violin plots show the posteriors for the free spectral-noise model at each frequency. One can judge the significance of a detection by how separated the violin plot is from the lowest amplitudes. The crowding at higher frequencies stems from the linearly spaced frequencies on a log scale. The vertical dashed lines show frequencies at 1 and 2 yr&#8315;&#185;. The bottom panel shows the 2D posteriors for the power-law noise models, in &#947; and log 10 A GWB . The blue contours show the posterior from the individual noise run, while the orange heat map shows the posterior for the full PTA run for the same pulsar, PSR J0030+0451. Note that while the individual pulsar noise run shows closed contours for the power-law model, the full PTA has a very diffuse, nonsignificant posterior. It is suspected that this RN power has moved into the common red-noise process, which shows a strong detection in the 7.5 yr slice.</p><p>Compare these results to the same data products from the full NANOGrav 11 yr data set in Figure <ref type="figure">10</ref>. The second-lowest-frequency free spectral parameter is more significant, and the power-law model is much more consistent between the single-pulsar noise analysis and the full PTA analysis.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.5.">Anomalous Signal Toy Model</head><p>To try to further understand the spurious signal seen in the 7.5 yr slice, we ran a number of simulations using PSR J0030+0451-like data sets in an attempt to duplicate the jump in the Bayes factor seen in Figure <ref type="figure">7</ref>. In Figure <ref type="figure">11</ref> we show the results of the simulations and analyses. In each case the noise parameters obtained from the analysis of the real PSR J0030+0451 data were used to try to replicate the same Bayes factor. In the first four cases no GWB was injected. Two different types of noise injection were used, corresponding to either the power-law noise model values for this pulsar or the free spectral noise model parameters. When the same model is used in the analysis as the injection model, the Bayes factor is near 1. However, when a free spectral noise model is used in the injection and a power-law noise model is used in the analysis, the Bayes factor roughly triples. This is further evidence in support of the main conclusion in this work: the use of an inaccurate noise model can lead to anomalous detections of a GWB.</p><p>One may question whether the free spectral model will remove any evidence for a GWB entirely. The last two GWB Bayes factors in Figure <ref type="figure">11</ref> show simulations where an actual power-law GWB was injected in addition to the free spectral noise. The A GWB injected was the maximum likelihood value from the 7.5 yr slice of PSR J0030+0451's real data set. One can see that the data analysis in the 7.5 yr slice does not detect the GWB as a separate red-noise injection. The full 11 yr data set is able to better differentiate the GWB. This supports the conclusion from Section 6.4, which is borne out in the full time span analysis with free spectral models shown in Figure <ref type="figure">8</ref>. 
One expects earlier slices to have lower Bayes factors, as well as a slow rise in the Bayes factor as we accrue longer data sets.</p><p>Obviously these Bayes factors are still rather close to 1, i.e., even odds, but since the GWB signal significance is expected to grow slowly with time, it is the comparison and trends with which we are most concerned.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.6.">Power Accounting</head><p>One would like to quantify the difference between the noise model using a power-law prior for the Gaussian processes and the free spectral models that have effectively mitigated the "kink," removing the false-positive detection of a GWB. One way in which this can be done is to calculate the posterior probability distributions for the power in these red-noise channels. Here we compare the power by calculating it from the models used as priors for the Gaussian process over the frequencies sampled in the Gaussian process coefficients. For the power law this will be a sum of the power-law values across the sampled frequencies times the frequency bin size.</p><p>[Figure caption fragment] The various elements are the same as in Figure <ref type="figure">9</ref>, but the results reveal how the "kink" is in part mitigated with time. Looking at the top panel, in addition to the lowest frequency, the second-lowest-frequency posterior for the free spectral-noise model is also above the WN floor. The power-law model is not as hindered by the WN floor, and as can be seen in the bottom panel, the power-law noise model effectively holds much of the red-noise power in this pulsar, rather than allowing it to all move into the common red-noise process.</p><p>Figure <ref type="figure">11</ref>. Bayes factor for the GWB amplitude from various noise injections and analyses. Error bars are included but are smaller than the markers in most cases. The labels show what type of injection and noise model were used. Note that using a free spectral model injection while using a power-law model in the analysis results in a higher significance detection of a GWB.</p><p>In Figure <ref type="figure">12</ref> we show the calculated posteriors for the power using these two models on the data from PSR J0030+0451, one of the main culprit pulsars in Section 6.4. 
The WN parameter posteriors are basically unchanged for this pulsar between the two models, so the free spectral model is effectively absorbing power that is otherwise unmodeled. While the lowest frequency is most likely the issue for this pulsar, as mentioned in Section 6, the majority of the power in the free spectral model is at high frequencies and is probably not to blame for the spurious GWB detection in the previous section. However, since the power posteriors for all of the culprit pulsars are different by approximately an order of magnitude between the power-law and free spectral models, this noise has been flagged as an obvious area for improvement in our per pulsar noise modeling. A number of in-progress projects and a forthcoming paper <ref type="bibr">(J. Simon et al. 2019, in preparation)</ref> are devoted to mitigating noise of this sort in a number of NANOGrav pulsars.</p></div>
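<div xmlns="http://www.tei-c.org/ns/1.0"><p>The power accounting described in this section can be sketched as below. The explicit expression used in the paper is not reproduced in this text, so the sketch assumes a parameterization that is common in PTA software: a power-law spectral density A^2/(12 pi^2) f_yr^(gamma-3) f^(-gamma) summed over frequencies spaced at multiples of 1/T, and a free spectral model whose parameters are log 10 of the per-frequency rms in seconds. The function names and the constant F_YR are illustrative.</p>

```python
import math

F_YR = 1.0 / (365.25 * 86400.0)  # 1 / yr expressed in Hz


def powerlaw_power(log10_A, gamma, freqs_hz):
    """Sum a power-law spectral density over the sampled frequencies (s^2).

    Assumption: frequencies are linearly spaced at i / T, so the bin width
    equals the lowest sampled frequency.
    """
    df = freqs_hz[0]
    amp2 = (10.0 ** log10_A) ** 2
    norm = amp2 / (12.0 * math.pi ** 2) * F_YR ** (gamma - 3.0)
    return sum(norm * f ** (-gamma) * df for f in freqs_hz)


def free_spectral_power(log10_rho):
    """Sum the per-frequency variances of a free spectral model (s^2).

    Assumption: each parameter is log10 of the rms amplitude in seconds.
    """
    return sum(10.0 ** (2.0 * r) for r in log10_rho)
```

<p>Evaluating both functions on every posterior sample yields the two power posteriors being compared. Because the power-law sum is tied to a single slope, it cannot place excess power at high frequencies the way the free spectral sum can.</p></div>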
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.7.">Ineffective Analyses</head><p>While the use of the free spectral model has mitigated the spurious GW signal of Figure <ref type="figure">3</ref>, there are a number of additional investigated analyses that had either a neutral or a minimal effect on the GW statistics. Here we summarize them, in part to inform the interested expert and in part to motivate their use in upcoming work that investigates more comprehensive noise models for pulsars.</p><p>The standard analyses were run with various versions of the JPL solar system ephemeris and the Bayesian solar system ephemeris model, BayesEphem (NG11b). All results were qualitatively the same as those shown in the sections above for DE421, DE430, and DE436 <ref type="bibr">(Folkner et al. 2009</ref><ref type="bibr">, 2014;</ref> <ref type="bibr">Folkner &amp; Park 2016)</ref>, and when using BayesEphem. In particular, all results showed the same anomalous GW statistics in the 6.5-8.5 yr slices.</p><p>The choice of frequencies sampled by the Gaussian processes that are modeling the underlying stochastic signals has been shown to affect the signal analysis (van Haasteren &amp; Vallisneri 2015; Ellis &amp; Cornish 2016). A number of strategies for choosing the frequencies, including using a log spacing rather than a linear spacing and choosing the frequencies uniformly across the slices, were carried out. These had some effect in mitigating the "kink" but were not as effective as the methods described in previous sections.</p><p>Lastly, the proximity in time of the "kink" to the changeover in the back ends used for observing is intriguing; however, analyses testing any causation were inconclusive. These included a number of analyses with different combinations of the overlapping GASP/ASP and GUPPI/PUPPI data. Here it is difficult to separate the effect of narrower bandwidths from the change in GW statistics. 
We also modeled back-end red noise, similar to the "band noise" of <ref type="bibr">Lentati et al. (2016)</ref>, but rather than restricting the noise to a specific observing frequency band, we restricted the noise to a specific back-end type. While the red-noise parameters seem to be significantly different between back ends in some pulsars, the use of these models did not help to mitigate the spurious GW statistics.<ref type="foot">foot_5</ref> </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">Mitigating Low Spectral Index Noise</head><p>Here we turn our attention to mitigating the transient WN feature described at the end of Section 5. The analysis in which the spectral index is varied points to PSR J1713+0747 as a strong candidate for this WN event in the common red-noise process. Sharp noise features in the time-series domain can manifest as low spectral index noise in the power spectral density; hence, an attempt at mitigating the noise in this pulsar was undertaken.</p><p>The recent observation of a second ISM event in PSR J1713+0747 has prompted new work on chromatic noise models for our pulsars <ref type="bibr">(Lam et al. 2018b</ref>). This second event does not occur during the time span of the NANOGrav 11 yr data set, but the first ISM event occurring near <ref type="bibr">MJD 54,750 (2008.78)</ref> and first observed in <ref type="bibr">Demorest et al. (2013)</ref> does, between the 4.0 and 4.5 yr slices in this analysis. As has been shown in other recent publications <ref type="bibr">(Aggarwal et al. 2019</ref><ref type="bibr">, 2020)</ref>, unmodeled noise in a single pulsar can appear in a common PTA signal. In order to investigate whether the first ISM event in PSR J1713+0747 is the cause of the significant WN appearing in the common red-noise signal, we ran the analysis over again using the same DM noise model<ref type="foot">foot_6</ref> first used in <ref type="bibr">Lam et al. (2018b)</ref>. That model consists of a timing-model fit for a linear and quadratic trend in the DM (DM1 and DM2), chromatic red noise modeled with a Gaussian process, and a phenomenological model for the dip in the DM variations. In <ref type="bibr">Lam et al. (2018b)</ref> this consisted of two exponential dips modeled as</p><p>where &#920;(t 0 ) is the Heaviside function and the amplitude, time of occurrence (t 0 ), and decay time (&#964;) were fit for in the Bayesian analysis. 
In the present work we only fit for one exponential dip, to model the first ISM event. The Gaussian process and exponential dips are implemented in enterprise. This model replaces the piecewise DMX model used in NG11a and NG11b. This model was also studied in depth in <ref type="bibr">Wang et al. (2019)</ref>, where a Bayesian cross-validation study showed convincing evidence for the preference of this model. The better performance of this model for DM variations in this particular case is explained by a lack of DMX bins in the time span around the minimum of the rapid fluctuation. From MJD 54,707 (2008.66) until 100 days past the event there are only five DMX bins. This coarse sampling is possibly inadequate for such a relatively short-timescale, high-amplitude event. The lack of inter-observing-band TOAs also limits the precision of the &#916;DM measurement, because the measurement is done within a single, narrow receiver band.</p><p>The results of this newer model on the posteriors for the spectral index are dramatic, as can be seen in Figure <ref type="figure">13</ref>.</p><p>The posteriors show reduced significance of an unmodeled WN transient, i.e., the posterior is no longer butted up against the &#947; = 0 end of the prior. This noise is mitigated not only in the individual pulsar noise but also in the common process, revealing that this event can have an important effect on the GWB analysis and can be mitigated with a more tailored noise model. <ref type="foot">48</ref> With the appearance of a second ISM event in PSR J1713+0747 in the NANOGrav 12.5 yr data set, this type of model will be necessary to properly use this pulsar in our GW analyses.</p></div>
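<div xmlns="http://www.tei-c.org/ns/1.0"><p>A single exponential dip of the kind described in this section can be sketched as a deterministic delay. The paper's own expression is not reproduced in this text, so the form below is an assumed generic one: a step at t 0 followed by exponential recovery with timescale &#964;, scaled by the inverse square of the observing frequency as expected for dispersion. The function name, the sign convention, the 1400 MHz reference frequency, and the use of log 10 of the amplitude are all illustrative assumptions.</p>

```python
import math


def exp_dip_delay(t_mjd, nu_mhz, log10_amp, t0_mjd, tau_days, ref_mhz=1400.0):
    """Timing delay in seconds from one exponential DM dip (assumed form).

    The delay is zero before t0 (Heaviside step), then a negative offset of
    size 10**log10_amp that decays with e-folding time tau_days. The factor
    (ref_mhz / nu_mhz)**2 applies the dispersive frequency scaling.
    """
    if t0_mjd > t_mjd:
        return 0.0
    dip = -(10.0 ** log10_amp) * math.exp(-(t_mjd - t0_mjd) / tau_days)
    return dip * (ref_mhz / nu_mhz) ** 2
```

<p>In a Bayesian fit the amplitude, t 0 , and &#964; would be sampled together with the other noise parameters. Unlike DMX, such a model has no bin edges, so it can follow an event that falls between sparsely sampled epochs.</p></div>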
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="8.">Summary and Conclusions</head><p>Here we have used the standard tools of PTA GW analysis to investigate the evolution of GW statistics in the NANOGrav 11 yr data set. After finding transient features in the sliced data GW analyses, we undertook an in-depth analysis to characterize and mitigate any possible sources of noise that might lead to these features. The transient GWB detection that peaks during the 7.5 yr slice was found to be due, in large part, to five "culprit" pulsars. These pulsars were identified by a combination of their individual red-noise analyses, single-pulsar GWB upper limits, and a new Bayesian PTA data analysis technique known as the dropout method. In order to test whether these pulsars were responsible for the large GWB signal, a new set of GW statistics were derived for the NANOGrav 11 yr data set with TOAs from these pulsars removed. Once a set of pulsars was identified to be responsible for these artifacts, the false signal was mitigated using a free spectral noise model. This demonstrates the importance of characterizing the noise in pulsars correctly and demonstrates that incorrect noise models can lead to false positives in our Bayesian analyses. These results highlight a number of strategies important when searching PTA data for a stochastic GWB:</p><p>1. Study the noise evolution of individual pulsars. 2. Look at the evolution of single-pulsar UL and detection analyses to see when various signals become significant and how long they remain significant. 3. Attempt to use more tailored noise models for a given pulsar.</p><p>This last point is especially important for the type of transient WN feature seen in the 4.5 yr slice. 
As shown in Section 7, this transient feature, which causes the varying spectral index analysis to prefer a spectral index of zero and is contemporaneous with the first ISM event in PSR J1713+0747, was mitigated using a phenomenological model for the chromatic time delays consisting of a Gaussian process plus an exponential dip. This work has shown that the standard tools for GW analysis in pulsar timing data, while sufficient to mitigate some of the noise features in NG11a, need to be updated, as the sensitivity of our detector has revealed a new noise floor. These considerations have moved NANOGrav to undertake a program of Bayesian model selection using a full suite of individual, tailored noise models for our pulsars that pay closer attention to the astrophysics causing the noise in each case. An upcoming paper <ref type="bibr">(J. Simon et al. 2020, in preparation)</ref> will present these new models and the results of model selection on the most sensitive NG11a pulsars.</p><p>Author contributions. This paper is the result of the work of dozens of people over the course of more than 13 years. We list specific contributions below. J.S.H. ran the sliced analyses and led the paper writing. J.S., S.R.T., M.T.L., S.J.V., K.I., and J.S.K. contributed substantially to paper writing, discussion, and interpretation of results. M.T.L. helped with analyses. J.S. </p><p>Comparison of posteriors for the spectral index of a power-law red-noise process. Posteriors are shown for both the full PTA analysis and the individual PSR J1713+0747 noise analysis. Note that using a more tailored noise model on this one pulsar has a significant impact on the spectral index of the common process in the full PTA.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="42" xml:id="foot_0"><p>The release is available at https://data.nanograv.org.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_1"><p>The Astrophysical Journal, 890:108 (15pp), 2020 February 20, Hazboun et al.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="43" xml:id="foot_2"><p>This nomenclature is sometimes confusing, as there are two frequency domains discussed in pulsar timing: the frequencies of the GWB and red noise, and the radio frequencies of the pulsar observations.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="44" xml:id="foot_3"><p>https://github.com/vallis/libstempo</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="45" xml:id="foot_4"><p>A rudimentary version of a generic burst search, based on the signal model and search algorithm of <ref type="bibr">Ellis &amp; Cornish (2016)</ref>, was also done on this data set in the course of these investigations, with no significant evidence for GWs.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="46" xml:id="foot_5"><p>Anyone interested in other analyses undertaken during this research, or seeing the results of those discussed in this subsection, should feel free to contact the first author for more information.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="47" xml:id="foot_6"><p>This type of model has been used previously in other analyses <ref type="bibr">(Lentati et al. 2016</ref>).</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="48" xml:id="foot_7"><p>It should be noted that there appears to be a large amount of power at lower spectral indices, shallower than expected from either pulsar spin noise or a GWB. This points to a systemic issue with the current noise models used in PTA GW analysis and will be confronted in an upcoming NANOGrav publication (J. Simon et al. 2020, in preparation).</p></note>
		</body>
		</text>
</TEI>
