<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Neural Bootstrapper</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>2021</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10333136</idno>
					<idno type="doi"></idno>
					<title level='j'>Advances in neural information processing systems</title>
					<idno type="ISSN">1049-5258</idno>
					<biblScope unit="volume">34</biblScope>
					<biblScope unit="issue"></biblScope>
					<author>M. Shin</author><author>H. Cho</author><author>H-S. Min</author><author>S. Lim</author>
					<editor>M. Ranzato</editor><editor>A. Beygelzimer</editor><editor>Y. Dauphin</editor><editor>P. S. Liang</editor><editor>J. Wortman Vaughan</editor>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Bootstrapping has been a primary tool for ensemble and uncertainty quantification in machine learning and statistics. However, because it requires repeated resampling and training, bootstrapping deep neural networks is computationally burdensome, which hinders its practical application to uncertainty estimation and related tasks. To overcome this computational bottleneck, we propose a novel approach called Neural Bootstrapper (NeuBoots), which learns to generate bootstrapped neural networks through single model training. NeuBoots injects bootstrap weights into the high-level feature layers of the backbone network and outputs bootstrapped predictions of the target without additional parameters or repetitive computations from scratch. We apply NeuBoots to various machine learning tasks related to uncertainty quantification, including prediction calibration in image classification and semantic segmentation, active learning, and detection of out-of-distribution samples. Our empirical results show that NeuBoots outperforms other bagging-based methods at a much lower computational cost without losing the validity of bootstrapping.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Bootstrapping <ref type="bibr">[7]</ref> and bagging <ref type="bibr">[3]</ref> procedures have been commonly used as primary tools for quantifying uncertainty in statistical inference, e.g., for evaluating standard errors, confidence intervals, and null distributions of test statistics. Despite their success in statistics and machine learning, the naive use of bootstrap procedures in deep neural network applications has been impractical because of their computational intensity. Bootstrap procedures require evaluating many models, and training multiple deep neural networks is infeasible in practice in terms of computational cost.</p><p>To utilize the bootstrap for deep neural networks, we propose a novel bootstrapping procedure called Neural Bootstrapper (NeuBoots). The proposed method is mainly motivated by the Generative Bootstrap Sampler (GBS) <ref type="bibr">[38]</ref>, which trains a bootstrap generator by parameterizing the model within the Random Weight Bootstrapping (RWB, <ref type="bibr">[37]</ref>) framework. For many statistical models, GBS is better justified theoretically than the amortized bootstrap <ref type="bibr">[31]</ref>, which trains an implicit model to approximate the bootstrap distribution over model parameters. However, GBS hardly scales to modern deep neural networks containing millions of parameters.</p><p>In contrast, the proposed method scales effortlessly and applies universally to various architectures. The key idea of NeuBoots is simple: instead of parameterizing the model, it multiplies bootstrap weights into the final layer of the backbone network. Hence it outputs bootstrapped predictions of the target without additional parameters and without repetitive computations from scratch. NeuBoots outperforms previous sampling-based methods <ref type="bibr">[13,</ref><ref type="bibr">24,</ref><ref type="bibr">31]</ref> on various uncertainty quantification tasks with deep convolutional networks <ref type="bibr">[17,</ref><ref type="bibr">20,</ref><ref type="bibr">22]</ref>. Throughout this paper, we show that NeuBoots has multiple advantages over existing uncertainty quantification procedures in terms of memory efficiency and computational speed (see Table <ref type="table">1</ref>.1).</p><p>To verify the empirical power of the proposed method, we apply NeuBoots to a wide range of experiments related to uncertainty quantification and bagging: prediction calibration, active learning, out-of-distribution (OOD) detection, semantic segmentation, and learning on imbalanced datasets. Notably, we test the proposed method on high-resolution biomedical data, the NIH3T3 dataset <ref type="bibr">[5]</ref>. In Section 4, our results show that NeuBoots achieves performance comparable to or better than the state-of-the-art methods in the considered applications.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Table <ref type="table">1</ref>.1. Computational comparison between bagging-based uncertainty estimation methods, namely Standard Bootstrap <ref type="bibr">[7]</ref>, MCDrop <ref type="bibr">[13]</ref>, DeepEnsemble <ref type="bibr">[24]</ref>, and NeuBoots, with respect to memory efficiency, fast training, and fast prediction.</p></div>
<note xmlns="http://www.tei-c.org/ns/1.0" place="foot"><p>* Equal contribution.</p><p>&#8224; Corresponding author.</p><p>35th Conference on Neural Information Processing Systems (NeurIPS 2021).</p></note>
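<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Preliminaries</head><p>As preliminaries, we briefly review standard bootstrapping <ref type="bibr">[7]</ref> and introduce the Generative Bootstrap Sampler (GBS, <ref type="bibr">[38]</ref>), which is the primary motivation of the proposed method. Let <formula notation="TeX">[m] := \{1, \ldots, m\}</formula> and denote the given training data by <formula notation="TeX">\mathcal{D} = \{(X_i, y_i) : i \in [n]\}</formula>, where each feature <formula notation="TeX">X_i \in \mathcal{X} \subset \mathbb{R}^p</formula> and its response <formula notation="TeX">y_i \in \mathbb{R}^d</formula>. We denote the class of models <formula notation="TeX">f : \mathbb{R}^p \to \mathbb{R}^d</formula> by <formula notation="TeX">\mathcal{M}</formula>. For standard bootstrapping, we sample <formula notation="TeX">B</formula> sets of bootstrap data <formula notation="TeX">\mathcal{D}^{(b)} = \{(X_i^{(b)}, y_i^{(b)}) : i \in [n]\}</formula>, <formula notation="TeX">b \in [B]</formula>, each obtained by drawing <formula notation="TeX">n</formula> pairs from <formula notation="TeX">\mathcal{D}</formula> with replacement.</p><p>For each bootstrap data set <formula notation="TeX">\mathcal{D}^{(b)}</formula>, we define a loss functional <formula notation="TeX">\mathcal{L}</formula> on <formula notation="TeX">f \in \mathcal{M}</formula>:</p><formula notation="TeX" n="(2.1)">\mathcal{L}(f, \mathcal{D}^{(b)}) := \frac{1}{n} \sum_{i=1}^{n} \ell\big(f(X_i^{(b)}), y_i^{(b)}\big),</formula><p>where <formula notation="TeX">\ell : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}</formula> is an arbitrary loss function. We then minimize (2.1) with respect to <formula notation="TeX">f \in \mathcal{M}</formula> to obtain the bootstrapped models</p><formula notation="TeX" n="(2.2)">\widehat{f}^{(b)} = \arg\min_{f \in \mathcal{M}} \mathcal{L}(f, \mathcal{D}^{(b)}), \quad b \in [B].</formula><p>Random Weight Bootstrapping It is well known that the standard bootstrap uses only about 63% of the distinct observations for each bootstrap evaluation <ref type="bibr">[24]</ref>. To resolve this problem, we use Random Weight Bootstrapping (RWB, <ref type="bibr">[37]</ref>), which reformulates (2.2) as sampling bootstrap weights for a weighted loss functional. Let <formula notation="TeX">\mathcal{W} = \{w \in \mathbb{R}^n_{+} : \sum_{i=1}^{n} w_i = n\}</formula> be the dilated standard <formula notation="TeX">(n-1)</formula>-simplex. For <formula notation="TeX">w = (w_1, \ldots, w_n) \in \mathcal{W}</formula> and the original training data <formula notation="TeX">\mathcal{D}</formula>, we define the Weighted Bootstrapping Loss (WBL) functional on <formula notation="TeX">f \in \mathcal{M}</formula> as follows:</p><formula notation="TeX" n="(2.3)">\mathcal{L}(f, w, \mathcal{D}) := \frac{1}{n} \sum_{i=1}^{n} w_i \, \ell\big(f(X_i), y_i\big).</formula><p>Then for any resampled data set <formula notation="TeX">\mathcal{D}^{(b)}</formula>, there exists a unique <formula notation="TeX">w \in \mathcal{W}</formula> such that (2.1) matches (2.3), namely <formula notation="TeX">w_i</formula> equal to the number of times <formula notation="TeX">(X_i, y_i)</formula> appears in <formula notation="TeX">\mathcal{D}^{(b)}</formula>. This reformulation provides a relaxation that considers the full data set without any omission in bootstrapping. Precisely, as a continuous relaxation of the standard bootstrap, we use the Dirichlet distribution <ref type="bibr">[32]</ref>: <formula notation="TeX">w \sim P_{\mathcal{W}} = n \times \mathrm{Dirichlet}(1, \ldots, 1)</formula>, where <formula notation="TeX">P_{\mathcal{W}}</formula> is a probability distribution on the simplex <formula notation="TeX">\mathcal{W}</formula>. Hence RWB fully utilizes the observed data points, since the sampled bootstrap weights <formula notation="TeX">w \sim P_{\mathcal{W}}</formula> are strictly positive. Also, <ref type="bibr">[34]</ref> showed that RWB achieves the same theoretical properties as the standard bootstrap, which corresponds to <formula notation="TeX">P_{\mathcal{W}} = \mathrm{Multinomial}(n; 1/n, \ldots, 1/n)</formula> in (2.3).</p></div>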
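<div xmlns="http://www.tei-c.org/ns/1.0"><p>The correspondence between (2.1) and (2.3), the Dirichlet relaxation, and the familiar 63% figure (the expected share of distinct points, 1 - (1 - 1/n)^n, which tends to 1 - 1/e) can be checked numerically. The following is a minimal standard-library Python sketch under our own assumptions, not the authors' implementation: the per-example losses are hypothetical placeholders and all function names are illustrative.</p>
<eg><![CDATA[
```python
import random


def counts_from_indices(indices, n):
    """Standard bootstrap as weights: w_i = number of times point i appears in D^(b)."""
    w = [0] * n
    for i in indices:
        w[i] += 1
    return w


def dirichlet_weights(n, rng):
    """RWB relaxation: w ~ n * Dirichlet(1, ..., 1), so every w_i > 0 and sum(w) == n."""
    g = [rng.gammavariate(1.0, 1.0) for _ in range(n)]  # i.i.d. Gamma(1, 1) draws
    total = sum(g)
    return [n * gi / total for gi in g]


def resampled_loss(indices, losses):
    """Loss functional (2.1) evaluated on an explicitly resampled data set D^(b)."""
    return sum(losses[i] for i in indices) / len(indices)


def weighted_bootstrap_loss(w, losses):
    """WBL functional (2.3): (1/n) * sum_i w_i * loss_i on the original data."""
    return sum(wi * li for wi, li in zip(w, losses)) / len(losses)


def expected_distinct_fraction(n):
    """Expected share of distinct points in one bootstrap resample of size n."""
    return 1.0 - (1.0 - 1.0 / n) ** n


if __name__ == "__main__":
    rng = random.Random(0)
    losses = [0.3, 1.2, 0.7, 2.0, 0.1]  # hypothetical values of l(f(X_i), y_i)
    n = len(losses)
    idx = [rng.randrange(n) for _ in range(n)]  # draw n points with replacement
    w = counts_from_indices(idx, n)
    print(resampled_loss(idx, losses), weighted_bootstrap_loss(w, losses))  # equal up to rounding
    print(dirichlet_weights(n, rng))  # strictly positive, sums to n
    print(expected_distinct_fraction(1000))  # about 0.632
```
]]></eg>
<p>Because the sampled Dirichlet weights never vanish, every observation contributes to each weighted loss, which is exactly the property RWB relies on.</p></div>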
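<div xmlns="http://www.tei-c.org/ns/1.0"><head>Bootstrap Distribution Generator</head><p>Although RWB resolves the data discard problem, training multiple networks <formula notation="TeX">\widehat{f}^{(1)}, \ldots, \widehat{f}^{(B)}</formula> remains computationally expensive, and one has to store the parameters of every network for prediction. To reduce these computational bottlenecks, GBS <ref type="bibr">[38]</ref> trains a generator function of bootstrapped estimators for parametric statistical models. The main idea of GBS is to parameterize the model parameter by the bootstrap weight <formula notation="TeX">w \in \mathcal{W}</formula>. When GBS is applied to bootstrapping neural networks, it considers a bootstrap generator <formula notation="TeX">g : \mathbb{R}^p \times \mathcal{W} \to \mathbb{R}^d</formula> with parameter <formula notation="TeX">\theta(w)</formula>, which collects all the neural net parameters of <formula notation="TeX">g</formula>, so that <formula notation="TeX">g(X, w) = g_{\theta(w)}(X)</formula>. Based on (2.3), we define a new WBL functional:</p><formula notation="TeX">\mathcal{L}(g, w, \mathcal{D}) := \frac{1}{n} \sum_{i=1}^{n} w_i \, \ell\big(g(X_i, w), y_i\big),</formula><p>and GBS trains <formula notation="TeX">g</formula> by minimizing its expectation <formula notation="TeX">\mathbb{E}_{w \sim P_{\mathcal{W}}}\big[\mathcal{L}(g, w, \mathcal{D})\big]</formula>.</p><p>The above procedure converges asymptotically to the same target distribution as the conventional non-block bootstrap. See Appendix A for the detailed procedure and proofs.</p></div>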
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Neural Bootstrapper</head><p>We now propose the Neural Bootstrapper (NeuBoots), which reduces the computational complexity and memory requirements of learning the bootstrap distribution, making it suitable for deep neural networks.</p><p>How should the bootstrap generator <formula notation="TeX">g</formula> be implemented for deep neural networks? One may consider applying GBS directly to existing deep neural networks by modeling a neural net <formula notation="TeX">\theta(\cdot)</formula> that outputs the neural net parameters of <formula notation="TeX">g</formula>. However, this approach is computationally challenging because the output of <formula notation="TeX">\theta(\cdot)</formula> is extremely high-dimensional. In practice, <ref type="bibr">[38]</ref> proposes an architecture that concatenates the bootstrap weight vector to every layer of a given neural network (Figure <ref type="figure">3</ref>.1(b)) and trains it with (2.6). However, the bagging performance of GBS gradually degrades as it is applied to deeper neural networks. This may be because the information carried by the bootstrap weights in earlier layers propagates poorly, since training shrinks the parameters acting on those weights.</p></div>
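<div xmlns="http://www.tei-c.org/ns/1.0"><p>The cost of concatenating the bootstrap weights to every layer can be made concrete by counting parameters. The sketch below is a toy standard-library Python MLP that appends a bootstrap weight vector of length weight_dim to the input of every dense layer; the layer sizes, initialization, ReLU activations, and helper names are our assumptions for illustration and not the reference implementation of [38].</p>
<eg><![CDATA[
```python
import random


def init_layer(n_in, n_out, rng):
    """Dense layer: weight matrix of shape (n_out, n_in) and a zero bias."""
    weight = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weight, [0.0] * n_out


def dense(layer, x):
    weight, bias = layer
    return [sum(wij * xj for wij, xj in zip(row, x)) + bj for row, bj in zip(weight, bias)]


def build_concat_mlp(sizes, weight_dim, rng):
    """Every layer receives [h, w], so its input width grows by weight_dim."""
    return [init_layer(sizes[k] + weight_dim, sizes[k + 1], rng)
            for k in range(len(sizes) - 1)]


def concat_forward(layers, x, w):
    h = x
    for k, layer in enumerate(layers):
        h = dense(layer, h + w)  # list concatenation appends the bootstrap weights
        if k < len(layers) - 1:
            h = [max(0.0, v) for v in h]  # ReLU on hidden layers
    return h


def count_params(layers):
    return sum(len(weight) * len(weight[0]) + len(bias) for weight, bias in layers)


if __name__ == "__main__":
    rng = random.Random(0)
    sizes = [4, 8, 8, 2]  # hypothetical: input p = 4, two hidden layers, output d = 2
    print(count_params(build_concat_mlp(sizes, 0, rng)))    # 130: plain MLP
    print(count_params(build_concat_mlp(sizes, 100, rng)))  # 1930: weight_dim = 100
    layers = build_concat_mlp(sizes, 3, rng)
    print(concat_forward(layers, [0.5, -1.0, 2.0, 0.0], [1.2, 0.3, 1.5]))
```
]]></eg>
<p>Each layer gains weight_dim times its output width in extra parameters, so the overhead grows with both depth and the length of the weight vector.</p></div>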
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Adaptive Block Bootstrapping</head><p>We found that the bootstrap weights in the final layer mainly determine the bootstrap performance of GBS. This observation motivates the following adaptive block bootstrapping, which is the key idea of NeuBoots. Take a neural network <formula notation="TeX">f_{\theta} \in \mathcal{M}</formula> with parameter <formula notation="TeX">\theta</formula>. Let <formula notation="TeX">M_{\theta_1}</formula> and <formula notation="TeX">F_{\theta_2}</formula> be the single-layer network in the final layer and the feature extractor of <formula notation="TeX">f</formula>, respectively, with <formula notation="TeX">\theta = (\theta_1, \theta_2)</formula>, so that we can decompose <formula notation="TeX">f_{\theta} = M_{\theta_1} \circ F_{\theta_2}</formula>. Set <formula notation="TeX">S := \dim(F_{\theta_2}(X))</formula> as the number of blocks for block bootstrapping. Then we redefine the bootstrap generator as follows:</p><formula notation="TeX" n="(3.1)">\widehat{g}_{\theta}(X, \alpha) := M_{\theta_1}\big(F_{\theta_2}(X) \odot \alpha\big), \quad \alpha \in \mathbb{R}^{S},</formula><p>where <formula notation="TeX">\odot</formula> denotes elementwise multiplication. The bootstrap generator (3.1) can also be trained with (2.6); hence the optimized <formula notation="TeX">\widehat{g}_{\theta}(X, \cdot)</formula> generates bootstrapped predictions as we plug in <formula notation="TeX">\alpha</formula>. This modification brings a computational benefit, since we can generate bootstrap samples quickly and memory-efficiently by reusing the precomputed tensor <formula notation="TeX">F_{\theta_2}(X)</formula> without repetitive computation from scratch. See Figure <ref type="figure">3</ref>.1 for a comparison between the previous methods and NeuBoots. In our empirical experience, the bootstrap evaluations were consistent across different groupings for all examples examined in this article.</p><p>Figure 3.1: A comparison between the bootstrapping procedures of (a) standard bootstrapping <ref type="bibr">[7]</ref>, (b) GBS <ref type="bibr">[38]</ref>, and (c) NeuBoots. This figure is best viewed in color.</p><p>Training and Prediction At every epoch, we randomly resample the block weights <formula notation="TeX">w_{\alpha} = \{\alpha_{u(1)}, \ldots, \alpha_{u(n)}\}</formula>, where <formula notation="TeX">u : [n] \to [S]</formula> assigns each training point to a block, and approximate the expectation in (2.6) by the average over the sampled weights. We update the parameter <formula notation="TeX">\theta</formula> with stochastic gradient descent (SGD) over a mini-batch sequence.</p><p>Algorithm 2: Prediction step in NeuBoots. Compute <formula notation="TeX">F_{\theta_2}(X^{*})</formula> once; then, for each <formula notation="TeX">b \in [B]</formula>, sample <formula notation="TeX">\alpha^{(b)} \sim S \times \mathrm{Dirichlet}(1, \ldots, 1)</formula> and evaluate <formula notation="TeX">\widehat{y}^{(b)} = M_{\theta_1}\big(F_{\theta_2}(X^{*}) \odot \alpha^{(b)}\big)</formula>.</p></div>
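<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make (3.1) and the prediction step concrete, here is a minimal standard-library Python sketch. Assumptions: the final layer M is a plain linear map, the cached feature vector and parameters are hypothetical numbers, and the block assignment u is drawn at random; the function names are ours and do not come from the NeuBoots repository.</p>
<eg><![CDATA[
```python
import random


def scaled_dirichlet(k, scale, rng):
    """Draw scale * Dirichlet(1, ..., 1) of length k from normalized Gamma(1, 1) samples."""
    g = [rng.gammavariate(1.0, 1.0) for _ in range(k)]
    total = sum(g)
    return [scale * gi / total for gi in g]


def neuboots_head(theta1, features, alpha):
    """Generator (3.1) with a linear final layer: M_theta1(F_theta2(X) * alpha)."""
    weight, bias = theta1
    gated = [f * a for f, a in zip(features, alpha)]  # elementwise product with alpha
    return [sum(wij * gj for wij, gj in zip(row, gated)) + bj
            for row, bj in zip(weight, bias)]


def block_weights(alpha, u):
    """w_alpha = (alpha_{u(1)}, ..., alpha_{u(n)}): point i inherits the weight of block u(i)."""
    return [alpha[block] for block in u]


def predict_bootstrap(theta1, features, num_samples, rng):
    """Prediction step: cached features F_theta2(X*) are reused; only alpha is resampled."""
    s = len(features)  # S = dim(F_theta2(X)), the number of blocks
    return [neuboots_head(theta1, features, scaled_dirichlet(s, s, rng))
            for _ in range(num_samples)]


if __name__ == "__main__":
    rng = random.Random(0)
    theta1 = ([[1.0, 0.0, -1.0], [0.5, 2.0, 0.0]], [0.0, 0.1])  # hypothetical final layer
    cached = [0.8, 0.2, 1.5]  # hypothetical F_theta2(X*), computed once
    samples = predict_bootstrap(theta1, cached, 5, rng)  # B = 5 bootstrapped outputs
    bagged = [sum(col) / len(samples) for col in zip(*samples)]
    print(bagged)
    # Training-time weights for n = 6 points in S = 3 blocks (grouping u assumed random).
    u = [rng.randrange(3) for _ in range(6)]
    print(block_weights(scaled_dirichlet(3, 3, rng), u))
```
]]></eg>
<p>Only the cheap final layer runs B times, while the feature extractor runs once per input; this is where the savings over repeated forward passes come from.</p></div>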
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Discussion</head><p>NeuBoots vs Standard Bootstrap To examine the approximation power of NeuBoots, we measured the frequentist coverage rate of confidence bands (Figure <ref type="figure">3</ref>.2(a)). We estimate 95% confidence bands for a nonparametric regression function using NeuBoots and compare them with the credible (or confidence) bands evaluated by the standard bootstrap, Gaussian Process (GP) regression, and MCDrop <ref type="bibr">[13]</ref>. We use Algorithm 1 to train a NeuBoots generator with 3 hidden layers of 500 nodes each. For the standard bootstrap, we train 1000 neural networks. The result shows that the confidence band from NeuBoots stably covers the true regression function at each predictor value with a frequency close to 95%, comparable to the standard bootstrap. In contrast, the coverage of MCDrop is unstable and sometimes falls below 70%. This result indicates that NeuBoots performs comparably to the standard bootstrap in uncertainty quantification tasks.</p><p>NeuBoots vs Amortized Bootstrapping We applied NeuBoots to the classification and regression experiments presented for the amortized bootstrap <ref type="bibr">[31]</ref>. Every experiment demonstrates that NeuBoots outperforms the amortized bootstrap in bagging performance across tasks, including rotated MNIST classification.</p><p>Computation time and cost As mentioned earlier, NeuBoots evaluates the network from scratch only once to store the tensor <formula notation="TeX">F_{\theta_2}(X^{*})</formula>, whereas the standard bootstrap and MCDrop <ref type="bibr">[13]</ref> require repeated feed-forward propagations. We also compare NeuBoots to MIMO <ref type="bibr">[16]</ref> and BatchEnsemble <ref type="bibr">[42]</ref> in terms of training, test, and memory complexity (see Table <ref type="table">3</ref>.2).</p><p>Table <ref type="table">3</ref>.2. A comparison of computational costs. Notation: L is the number of layers, K the number of bootstrap samples (or ensemble members), M the parameter size of a single model, and I the memory size of the input data.</p><p>Diversity of predictions The diversity of predictions has been a reliable measure for examining overfitting and the uncertainty quantification performance of ensemble procedures <ref type="bibr">[10,</ref><ref type="bibr">35,</ref><ref type="bibr">42]</ref>. In the presence of overfitting, we expect the diversity of ensemble predictions to be minimal, because the ensemble members would produce similar predictions that are overfitted to the training data points. To examine the diversity of NeuBoots, we consider various diversity measures, including ratio-error, Q-statistics, correlation coefficient, and prediction disagreement (see <ref type="bibr">[1,</ref><ref type="bibr">10,</ref><ref type="bibr">35]</ref>). For CIFAR-100 with DenseNet-100, Table <ref type="table">3</ref></p></div>
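<div xmlns="http://www.tei-c.org/ns/1.0"><p>Several of these diversity measures have standard pairwise definitions based on whether two ensemble members are correct on each test point. The following standard-library Python sketch uses the textbook oracle-output definitions of the Q-statistic and correlation coefficient, plus label-level prediction disagreement; we assume, without having confirmed it, that these match the variants in [1, 10, 35]. Ratio-error is omitted, and the toy labels and predictions are hypothetical.</p>
<eg><![CDATA[
```python
import math


def correctness_counts(pred_a, pred_b, labels):
    """2x2 table of (member A correct?, member B correct?) over the test set."""
    n11 = n10 = n01 = n00 = 0
    for a, b, y in zip(pred_a, pred_b, labels):
        if a == y and b == y:
            n11 += 1
        elif a == y:
            n10 += 1
        elif b == y:
            n01 += 1
        else:
            n00 += 1
    return n11, n10, n01, n00


def q_statistic(pred_a, pred_b, labels):
    n11, n10, n01, n00 = correctness_counts(pred_a, pred_b, labels)
    den = n11 * n00 + n01 * n10
    return (n11 * n00 - n01 * n10) / den if den else 0.0


def correlation_coefficient(pred_a, pred_b, labels):
    n11, n10, n01, n00 = correctness_counts(pred_a, pred_b, labels)
    den = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n01 * n10) / den if den else 0.0


def prediction_disagreement(pred_a, pred_b):
    """Fraction of inputs on which the two members predict different labels."""
    return sum(a != b for a, b in zip(pred_a, pred_b)) / len(pred_a)


if __name__ == "__main__":
    labels = [0, 1, 1, 2, 1, 0]
    pred_a = [0, 1, 1, 2, 0, 1]  # hypothetical predictions of two bootstrap samples
    pred_b = [0, 1, 2, 2, 1, 1]
    print(q_statistic(pred_a, pred_b, labels))              # 0.5
    print(correlation_coefficient(pred_a, pred_b, labels))  # 0.25
    print(prediction_disagreement(pred_a, pred_b))          # 0.333...
```
]]></eg>
<p>For a B-member ensemble, such pairwise values are typically averaged over all pairs of members; lower Q-statistics and correlations, and higher disagreement, indicate more diverse predictions.</p></div>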
<div xmlns="http://www.tei-c.org/ns/1.0"><head>NeuBoots vs Dropout</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Empirical Studies</head><p>In this section, we conduct a wide range of empirical studies of NeuBoots for uncertainty quantification and bagging performance. We apply NeuBoots to prediction calibration, active learning, out-of-distribution detection, bagging for semantic segmentation, and learning on imbalanced datasets with various deep convolutional neural networks. Our code is publicly available<ref type="foot">foot_0</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Prediction Calibration</head><p>Setting We evaluate the proposed method on prediction calibration for image classification, applying NeuBoots to CIFAR and SVHN with ResNet-110 and DenseNet-100. We use k = 5 predictions of MCDrop and DeepEnsemble for calibration.</p><p>For fair comparison, we also set the number of bootstrap samples to B = 5 and keep the other hyperparameters the same as in the baseline models. All models are trained using SGD with a momentum of 0.9, an initial learning rate of 0.1, a weight decay of 0.0005, and a mini-batch size of 128. We use cosine annealing as the learning rate scheduler. We implement MCDrop and evaluate its performance with dropout rate p = 0.2, which is close to the setting of the original paper. For DeepEnsemble, we use adversarial training with the Brier loss <ref type="bibr">[24]</ref>, and we also consider the cross-entropy loss <ref type="bibr">[2]</ref>. As metrics, we evaluate the error rate, expected calibration error (ECE), negative log-likelihood (NLL), and Brier score. We also measure each method's training and prediction times to compare its speed relative to the baseline.</p><p>Results See Table <ref type="table">B</ref>.</p></div>
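<div xmlns="http://www.tei-c.org/ns/1.0"><p>For reference, the four metrics can be computed from the bagged prediction, i.e., the average of the B bootstrapped probability vectors of each input. This standard-library Python sketch uses the common equal-width-binning definition of ECE; the 15-bin default and the function names are our assumptions, since the number of bins is not stated in this section.</p>
<eg><![CDATA[
```python
import math


def bagged_probs(samples):
    """Average B bootstrapped probability vectors for one input."""
    return [sum(col) / len(samples) for col in zip(*samples)]


def expected_calibration_error(probs, labels, num_bins=15):
    """ECE = sum_m |B_m| / N * |acc(B_m) - conf(B_m)| over equal-width confidence bins."""
    n = len(probs)
    correct_sum = [0.0] * num_bins
    conf_sum = [0.0] * num_bins
    for p, y in zip(probs, labels):
        conf = max(p)
        pred = p.index(conf)
        m = min(int(conf * num_bins), num_bins - 1)  # confidence 1.0 goes to the last bin
        correct_sum[m] += 1.0 if pred == y else 0.0
        conf_sum[m] += conf
    # |B_m|/N * |acc - conf| equals |sum(correct) - sum(conf)| / N within each bin.
    return sum(abs(c - s) for c, s in zip(correct_sum, conf_sum)) / n


def negative_log_likelihood(probs, labels, eps=1e-12):
    return -sum(math.log(max(p[y], eps)) for p, y in zip(probs, labels)) / len(probs)


def brier_score(probs, labels):
    """Mean over inputs of the squared error against the one-hot label."""
    return sum(sum((pk - (1.0 if k == y else 0.0)) ** 2 for k, pk in enumerate(p))
               for p, y in zip(probs, labels)) / len(probs)


def error_rate(probs, labels):
    return sum(p.index(max(p)) != y for p, y in zip(probs, labels)) / len(probs)


if __name__ == "__main__":
    # Hypothetical B = 2 bootstrapped outputs for each of three inputs.
    per_input = [
        [[0.9, 0.1], [1.0, 0.0]],
        [[0.5, 0.5], [0.6, 0.4]],
        [[0.2, 0.8], [0.3, 0.7]],
    ]
    labels = [0, 1, 1]
    probs = [bagged_probs(s) for s in per_input]
    print(error_rate(probs, labels), expected_calibration_error(probs, labels))
    print(negative_log_likelihood(probs, labels), brier_score(probs, labels))
```
]]></eg></div>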
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Active Learning</head><p>Setting We evaluate NeuBoots on active learning with the ResNet-18 architecture on CIFAR. For comparison, we consider MCDrop and DeepEnsemble with entropy-based sampling, as well as random sampling. We follow the standard procedure for evaluating active learning (see <ref type="bibr">[29]</ref>). Initially, 2,000 randomly sampled labeled images are given, and we train a model. Based on each model's uncertainty estimates, we sample 2,000 additional images from the unlabeled dataset and add them to the labeled dataset for the next stage. We repeat this process ten times per trial and run five trials for each model.</p><p>shows the sequential performance improvement on CIFAR-10 and CIFAR-100. Note that CIFAR-100 is a more challenging dataset than CIFAR-10. Both plots demonstrate that NeuBoots is superior to the other sampling methods in the active learning task. NeuBoots reaches 71.6% accuracy on CIFAR-100, a 2.5% margin over MCDrop and DeepEnsemble. This experiment verifies that NeuBoots has a significant advantage in active learning.</p></div>
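<div xmlns="http://www.tei-c.org/ns/1.0"><p>The entropy-based acquisition step can be sketched as follows, assuming (our assumption, not stated above) that the score of an unlabeled image is the entropy of its bagged prediction averaged over bootstrap samples; in the experiments the budget would be 2,000 images per stage. Function names and toy outputs are hypothetical.</p>
<eg><![CDATA[
```python
import math
import random


def predictive_entropy(prob_samples):
    """Entropy of the mean of B bootstrapped probability vectors for one input."""
    mean = [sum(col) / len(prob_samples) for col in zip(*prob_samples)]
    return -sum(p * math.log(p) for p in mean if p > 0)


def entropy_acquisition(pool_ids, sample_fn, budget):
    """Return the `budget` unlabeled ids whose bagged prediction has the highest entropy."""
    ranked = sorted(pool_ids, key=lambda i: predictive_entropy(sample_fn(i)), reverse=True)
    return ranked[:budget]


def random_acquisition(pool_ids, budget, rng):
    """Random-sampling baseline."""
    return rng.sample(pool_ids, budget)


if __name__ == "__main__":
    # Hypothetical bootstrapped outputs (B = 2) for four unlabeled images.
    outputs = {
        0: [[1.0, 0.0], [1.0, 0.0]],
        1: [[0.5, 0.5], [0.5, 0.5]],
        2: [[0.9, 0.1], [0.1, 0.9]],
        3: [[0.8, 0.2], [0.8, 0.2]],
    }
    print(entropy_acquisition(list(outputs), outputs.get, budget=2))  # ids 1 and 2
    print(random_acquisition(list(outputs), 2, random.Random(0)))
```
]]></eg>
<p>Image 2 is selected even though each bootstrap sample is individually confident, because the samples disagree; this is the kind of uncertainty that bagging exposes.</p></div>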
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">Out-of-Distribution Detection</head><p>Setting As an important application of uncertainty quantification, we apply NeuBoots to the detection of out-of-distribution (OOD) samples. The OOD setting follows the Mahalanobis method <ref type="bibr">[26]</ref>. First, we train ResNet-34 for classification using only the training set of CIFAR-10 (in-distribution). Then we evaluate NeuBoots for OOD detection on the test sets of the in-distribution dataset and of SVHN (out-of-distribution). Using a validation set separate from the test sets, we train a logistic-regression-based detector to discriminate OOD samples from in-distribution samples. As input to the OOD detector, we extract four statistics from the sampled output vectors of NeuBoots: the maximum of the predictive mean vector, the standard deviation of the logit vectors, the expected entropy, and the predictive entropy. To evaluate the detector, we measure the true negative rate (TNR) at 95% true positive rate (TPR), the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPR), and the detection accuracy. For comparison, we examine the baseline method <ref type="bibr">[19]</ref>, MCDrop, DeepEnsemble <ref type="bibr">[24]</ref>, DeepEnsemble_CE (trained with cross-entropy loss) <ref type="bibr">[2]</ref>, ODIN <ref type="bibr">[27]</ref>, and Mahalanobis <ref type="bibr">[26]</ref>.</p><p>Results Table <ref type="table">4</ref>.1 shows that NeuBoots significantly outperforms the baseline method <ref type="bibr">[19]</ref>, DeepEnsemble <ref type="bibr">[2,</ref><ref type="bibr">24]</ref>, and ODIN <ref type="bibr">[27]</ref> without any calibration technique in OOD detection. Furthermore, with the input pre-processing technique studied in <ref type="bibr">[27]</ref>, NeuBoots is superior in most metrics to Mahalanobis <ref type="bibr">[26]</ref>, which employs both feature ensembling and input pre-processing as calibration techniques. This validates that NeuBoots can discriminate OOD samples effectively. To see how the performance of the OOD detector changes with the bootstrap sample size, we evaluate the predictive standard deviation estimated by the proposed method for <formula notation="TeX">B \in \{2, 5, 10, 20, 30\}</formula>.</p></div>
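<div xmlns="http://www.tei-c.org/ns/1.0"><p>The four detector inputs can be computed from the B sampled logit vectors of a single image, and the TNR at 95% TPR from the detector scores. This standard-library Python sketch (Python 3.8+ for statistics.fmean) rests on stated assumptions: the predictive mean is taken over softmax probabilities, the standard deviation of logit vectors is the per-class standard deviation across samples averaged over classes, and the logistic-regression detector itself is omitted. All names are hypothetical.</p>
<eg><![CDATA[
```python
import math
import statistics


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p):
    return -sum(pk * math.log(pk) for pk in p if pk > 0)


def ood_features(logit_samples):
    """Four detector inputs from B bootstrapped logit vectors of a single input."""
    probs = [softmax(z) for z in logit_samples]
    mean_prob = [sum(col) / len(probs) for col in zip(*probs)]
    max_mean = max(mean_prob)
    # Assumption: per-class standard deviation across samples, averaged over classes.
    logit_std = statistics.fmean(statistics.pstdev(col) for col in zip(*logit_samples))
    expected_entropy = statistics.fmean(entropy(p) for p in probs)
    predictive_entropy = entropy(mean_prob)
    return [max_mean, logit_std, expected_entropy, predictive_entropy]


def tnr_at_95_tpr(in_scores, out_scores):
    """In-distribution is the positive class; a higher score means 'more in-distribution'."""
    ranked = sorted(in_scores)
    threshold = ranked[int(0.05 * len(ranked))]  # about 95% of in-distribution scores are at or above it
    return sum(s < threshold for s in out_scores) / len(out_scores)


if __name__ == "__main__":
    print(ood_features([[2.0, 0.0], [2.0, 0.0]]))  # agreeing samples: zero logit spread
    print(ood_features([[1.0, 0.0], [0.0, 1.0]]))  # disagreeing samples: higher predictive entropy
    print(tnr_at_95_tpr([float(i) for i in range(100)], [-1.0, 3.0, 10.0, 50.0]))
```
]]></eg>
<p>The gap between predictive and expected entropy grows when the bootstrap samples disagree, which is one reason these statistics are informative inputs for the detector.</p></div>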
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4">Bagging Performance for Semantic Segmentation</head><p>Setting To demonstrate the applicability of NeuBoots to other computer vision tasks, we validate NeuBoots on the PASCAL VOC 2012 semantic segmentation benchmark <ref type="bibr">[9]</ref> with DeepLab-v3 <ref type="bibr">[4]</ref> on ResNet-50 and ResNet-101 backbones. We modify the final 1 × 1 convolution layer after the Atrous Spatial Pyramid Pooling (ASPP) module by multiplying channel-wise bootstrap weights into its input features. This is a natural modification of the segmentation architecture, analogous to the fully connected layer of classification networks. Additionally, we apply NeuBoots to a real 3D image segmentation task on the commercial ODT microscopy NIH3T3 dataset <ref type="bibr">[5]</ref>, which is challenging not only for models but also for humans because of its large 512 × 512 × 64 resolution and endogenous cellular variability. We use two different U-Net-like models for this 3D segmentation task, U-ResNet and SCNAS, and amend the bottleneck layer in the same way as in the 2D version. As in image classification, we set B = 5 and k = 5; otherwise, we follow the standard setting.</p><p>Results Table <ref type="table">4</ref>.2 shows that NeuBoots significantly improves mean IoU and ECE compared to the baseline. Furthermore, as in image classification, NeuBoots achieves faster prediction than MCDrop and DeepEnsemble. This experiment verifies that NeuBoots applies to a wider range of computer vision tasks beyond image classification.</p></div>
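<div xmlns="http://www.tei-c.org/ns/1.0"><p>The channel-wise modification amounts to applying (3.1) at every pixel, with S equal to the number of channels entering the final 1 × 1 convolution. The following standard-library Python sketch works under our assumptions: nested [C][H][W] lists stand in for the ASPP output, each draw uses a Dirichlet vector scaled by C as in Algorithm 2, and all names are hypothetical.</p>
<eg><![CDATA[
```python
import random


def conv1x1(weight, bias, fmap):
    """1x1 convolution on a [C][H][W] feature map with weight [K][C]; returns [K][H][W]."""
    channels, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [[[sum(weight[k][c] * fmap[c][i][j] for c in range(channels)) + bias[k]
              for j in range(width)]
             for i in range(height)]
            for k in range(len(weight))]


def channel_bootstrap(fmap, alpha):
    """Scale every spatial position of channel c by alpha[c]."""
    return [[[alpha[c] * v for v in row] for row in fmap[c]] for c in range(len(fmap))]


def fold_alpha(weight, alpha):
    """Equivalent view: rescale column c of the 1x1 kernel instead of the features."""
    return [[w_kc * a_c for w_kc, a_c in zip(row, alpha)] for row in weight]


def bootstrap_segmentation_logits(weight, bias, fmap, num_samples, rng):
    """Per draw, sample alpha ~ C * Dirichlet(1, ..., 1) and rerun only the final 1x1 conv."""
    channels = len(fmap)
    outputs = []
    for _ in range(num_samples):
        g = [rng.gammavariate(1.0, 1.0) for _ in range(channels)]
        total = sum(g)
        alpha = [channels * gi / total for gi in g]
        outputs.append(conv1x1(weight, bias, channel_bootstrap(fmap, alpha)))
    return outputs


if __name__ == "__main__":
    fmap = [[[1.0, 2.0]], [[3.0, 4.0]]]  # hypothetical ASPP output: C = 2, H = 1, W = 2
    weight, bias = [[1.0, 1.0], [2.0, -1.0]], [0.0, 0.5]  # K = 2 classes
    alpha = [2.0, 0.5]
    print(conv1x1(weight, bias, channel_bootstrap(fmap, alpha)))
    print(conv1x1(fold_alpha(weight, alpha), bias, fmap))  # same logits
    print(len(bootstrap_segmentation_logits(weight, bias, fmap, 5, random.Random(0))))  # 5
```
]]></eg>
<p>Because the scaling is linear, each bootstrap sample can equivalently be produced by rescaling the kernel columns, so the backbone and ASPP module need to run only once per image.</p></div>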
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.5">Imbalanced Dataset</head><p>Setting To validate its efficacy on imbalanced data, we apply NeuBoots with ResNet-18 to two imbalanced datasets: imbalanced CIFAR-10 and a white blood cell dataset. In particular, NeuBoots outperforms the other methods, with low variance, in identifying eosinophils, the class with the fewest samples in the white blood cell dataset. This result shows that NeuBoots, via a simple implementation, boosts prediction power for underrepresented classes with high stability.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Related Work</head><p>Bootstrapping Neural Network Since <ref type="bibr">[7]</ref> first proposed the nonparametric bootstrap to quantify uncertainty in general settings, a rich literature has investigated the theoretical advantages of bootstrap procedures for parametric models <ref type="bibr">[8,</ref><ref type="bibr">14,</ref><ref type="bibr">15]</ref>. For neural networks, <ref type="bibr">[12]</ref> investigated the bootstrap consistency of a one-layer MLP under some strong regularity conditions.</p><p><ref type="bibr">[36]</ref> considered using conventional nonparametric bootstrapping to robustify classifiers under noisy labels. However, because of its repetitive computations, its practical application to large data sets is not trivial. <ref type="bibr">[31]</ref> proposed an approximation of bootstrapping for neural networks by applying amortized variational Bayes. Despite its computational efficiency, the amortized bootstrap does not induce the exact target bootstrap distribution, and it lacks theoretical justification.</p><p>Recently, <ref type="bibr">[25]</ref> proposed a bootstrapping method for neural processes. They used the residual bootstrap to resolve the data discard problem, but their approach is not scalable since it requires multiple encoder computations.</p><p>Ensemble Methods Various advances in neural net ensembles have been made to improve computational efficiency and uncertainty quantification performance. Havasi et al. <ref type="bibr">[16]</ref> introduced Multiple Input Multiple Output (MIMO), which approximates independent neural nets by imposing multiple inputs and outputs, and Wen et al. <ref type="bibr">[42]</ref> proposed a low-rank approximation of ensemble networks, called BatchEnsemble. Latent Posterior Bayes NN (LP-BNN, <ref type="bibr">[11]</ref>) extends BatchEnsemble to a Bayesian paradigm by imposing a VAE structure on the individual low-rank factors; LP-BNN outperforms MIMO and BatchEnsemble in prediction calibration and OOD detection, but its computational burden is heavier than that of BatchEnsemble. Stochastic Weight Averaging Gaussian (SWAG, <ref type="bibr">[28]</ref>) computes the posterior of the base neural net via a low-rank approximation with batch sampling. Even though these strategies reduce the computational cost of training each ensemble member, unlike NeuBoots, they still demand multiple optimizations, and their computational cost increases linearly with the ensemble size.</p><p>Uncertainty Estimation There are numerous approaches to quantifying the uncertainty in predictions of NNs. Deep Confidence <ref type="bibr">[6]</ref> proposes a framework to compute confidence intervals for individual predictions using snapshot ensembling and conformal prediction. A calibration procedure to approximate a confidence interval has also been proposed based on Bayesian neural networks <ref type="bibr">[23]</ref>. Gal and Ghahramani <ref type="bibr">[13]</ref> proposed MCDrop, which captures model uncertainty by casting dropout training in neural networks as an approximation of variational Bayes. Smith and Gal <ref type="bibr">[39]</ref> examined various measures of uncertainty for adversarial example detection. Lakshminarayanan et al. <ref type="bibr">[24]</ref> proposed a non-Bayesian approach, called DeepEnsemble, to estimate predictive uncertainty based on ensembles and adversarial training. Compared to DeepEnsemble, NeuBoots requires neither adversarial training nor learning multiple models.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Conclusion</head><p>We introduced NeuBoots, a novel and scalable bootstrapping method for neural networks. We applied it to a wide range of machine learning tasks related to uncertainty quantification: prediction calibration, active learning, out-of-distribution detection, and imbalanced datasets. NeuBoots also demonstrates superior bagging performance in semantic segmentation. Our empirical studies show that NeuBoots has significant potential for quantifying uncertainty in large-scale applications, such as high-resolution biomedical data analysis. As future work, one can apply NeuBoots to natural language processing tasks using the Transformer <ref type="bibr">[41]</ref>.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_0"><p>https://github.com/sungbinlim/NeuBoots</p></note>
		</body>
		</text>
</TEI>
