Learning optimal individualized treatment rules (ITRs) has become increasingly important in the modern era of precision medicine. Many statistical and machine learning methods for learning optimal ITRs have been developed in the literature. However, most existing methods are based on data collected from traditional randomized controlled trials and thus cannot take advantage of the accumulative evidence when patients enter the trials sequentially. It is also ethically important that future patients should have a high probability to be treated optimally based on the updated knowledge so far. In this work, we propose a new design called sequentially rule-adaptive trials to learn optimal ITRs based on the contextual bandit framework, in contrast to the response-adaptive design in traditional adaptive trials. In our design, each entering patient will be allocated with a high probability to the current best treatment for this patient, which is estimated using the past data based on some machine learning algorithm (for example, outcome weighted learning in our implementation). We explore the tradeoff between training and test values of the estimated ITR in single-stage problems by proving theoretically that for a higher probability of following the estimated ITR, the training value converges to the optimal value at a faster rate, while the test value converges at a slower rate. This problem is different from traditional decision problems in the sense that the training data are generated sequentially and are dependent. We also develop a tool that combines martingale with empirical process to tackle the problem that cannot be solved by previous techniques for i.i.d. data. We show by numerical examples that without much loss of the test value, our proposed algorithm can improve the training value significantly as compared to existing methods. Finally, we use a real data study to illustrate the performance of the proposed method.
more »
« less
Probability Functional Descent: A Unifying Perspective on GANs, Variational Inference, and Reinforcement Learning
The goal of this paper is to provide a unifying view of a wide range of problems of interest in machine learning by framing them as the minimization of functionals defined on the space of probability measures. In particular, we show that generative adversarial networks, variational inference, and actor-critic methods in reinforcement learning can all be seen through the lens of our framework. We then discuss a generic optimization algorithm for our formulation, called probability functional descent (PFD), and show how this algorithm recovers existing methods developed independently in the settings mentioned earlier.
more »
« less
- Award ID(s):
- 1820942
- PAR ID:
- 10110456
- Date Published:
- Journal Name:
- Proceedings of the 36th International Conference on Machine Learning
- Volume:
- 97
- Page Range / eLocation ID:
- 1213-1222
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Area under the ROC curve (AUC) is a standard metric that is used to measure classification performance for imbalanced class data. Developing stochastic learning algorithms that maximize AUC over accuracy is of practical interest. However, AUC maximization presents a challenge since the learning objective function is defined over a pair of instances of opposite classes. Existing methods circumvent this issue but with high space and time complexity. From our previous work of redefining AUC optimization as a convex-concave saddle point problem, we propose a new stochastic batch learning algorithm for AUC maximization. The key difference from our previous work is that we assume that the underlying distribution of the data is uniform, and we develop a batch learning algorithm that is a stochastic primal-dual algorithm (SPDAM) that achieves a linear convergence rate. We establish the theoretical convergence of SPDAM with high probability and demonstrate its effectiveness on standard benchmark datasets.more » « less
-
Oh, A; Naumann, T; Globerson, A; Saenko, K; Hardt, M; Levine, S (Ed.)We investigate replicable learning algorithms. Informally a learning algorithm is replicable if the algorithm outputs the same canonical hypothesis over multiple runs with high probability, even when different runs observe a different set of samples from the unknown data distribution. In general, such a strong notion of replicability is not achievable. Thus we consider two feasible notions of replicability called {\em list replicability} and {\em certificate replicability}. Intuitively, these notions capture the degree of (non) replicability. The goal is to design learning algorithms with optimal list and certificate complexities while minimizing the sample complexity. Our contributions are the following. 1. We first study the learning task of estimating the biases of $$d$$ coins, up to an additive error of $$\varepsilon$$, by observing samples. For this task, we design a $(d+1)$-list replicable algorithm. To complement this result, we establish that the list complexity is optimal, i.e there are no learning algorithms with a list size smaller than $d+1$ for this task. We also design learning algorithms with certificate complexity $$\tilde{O}(\log d)$$. The sample complexity of both these algorithms is $$\tilde{O}(\frac{d^2}{\varepsilon^2})$$ where $$\varepsilon$$ is the approximation error parameter (for a constant error probability). 2. In the PAC model, we show that any hypothesis class that is learnable with $$d$$-nonadaptive statistical queries can be learned via a $(d+1)$-list replicable algorithm and also via a $$\tilde{O}(\log d)$$-certificate replicable algorithm. The sample complexity of both these algorithms is $$\tilde{O}(\frac{d^2}{\nu^2})$$ where $$\nu$$ is the approximation error of the statistical query. We also show that for the concept class \dtep, the list complexity is exactly $d+1$ with respect to the uniform distribution. To establish our upper bound results we use rounding schemes induced by geometric partitions with certain properties. We use Sperner/KKM Lemma to establish the lower bound results.more » « less
-
Abstract To know if a lightning discharge reaches the ground or remains within the thundercloud is critical for lightning safety as cloud‐to‐ground lightning poses the greatest threat to life and property. The current classification methods for most lightning detection networks, which are based on the classification of electromagnetic pulses produced by lightning, still have plenty of room to improve, including some known issues to be addressed. We present a machine‐learning approach to classify lightning discharges. The classification model used in this study is based on Support Vector Machines (SVMs). Compared with traditional multiparameter methods, our algorithm does not require extraction of individual pulse parameters and additionally provides a probability for each prediction. Using a representative lightning pulse data collected by the Cordoba Marx Meter Array in Argentina, we found the classification accuracy of our machine‐learning algorithm to be 97%, which is higher than that for the existing lightning detection networks.more » « less
-
This letter studies how a stochastic gradient algorithm (SG) can be controlled to hide the estimate of the local stationary point from an eavesdropper. Such prob- lems are of significant interest in distributed optimization settings like federated learning and inventory management. A learner queries a stochastic oracle and incentivizes the oracle to obtain noisy gradient measurements and per- form SG. The oracle probabilistically returns either a noisy gradient of the function or a non-informative measure- ment, depending on the oracle state and incentive. The learner’s query and incentive are visible to an eavesdropper who wishes to estimate the stationary point. This letter formulates the problem of the learner performing covert optimization by dynamically incentivizing the stochastic oracle and obfuscating the eavesdropper as a finite-horizon Markov decision process (MDP). Using conditions for interval-dominance on the cost and transition probability structure, we show that the optimal policy for the MDP has a monotone threshold structure. We propose searching for the optimal stationary policy with the threshold structure using a stochastic approximation algorithm and a multi– armed bandit approach. The effectiveness of our methods is numerically demonstrated on a covert federated learning hate-speech classification task.more » « less
An official website of the United States government

