
This content will become publicly available on June 27, 2026

Title: On logistic regression and maximum entropy approaches
Logistic regression is a widely used generalized linear model applied in classification settings to assign probabilities to class labels. It is also well known that logistic regression is a maximum entropy procedure subject to what are sometimes called the balance conditions. The dominant view in existing explanations is discriminative, i.e., modeling labels given the data. This paper adds to the maximum entropy interpretation by establishing a generative, maximum entropy explanation for the commonly used logistic regression training and optimization procedures. We show that logistic regression models the conditional distribution on the instance space given class labels with a maximum entropy model subject to a first-moment constraint on the training data, and that the commonly used fitting procedure would be a Monte Carlo fit for the generative view.
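The balance (first-moment) conditions mentioned in the abstract can be checked numerically: at the maximum-likelihood fit, the model's probabilities reproduce the empirical first moments of the features on the training set. Below is a minimal sketch on a toy 1-D dataset with a plain gradient-ascent fit; the data and fitting loop are illustrative, not the paper's procedure.

```python
import math

# Toy 1-D training set (illustrative): labels are not linearly separable,
# so the maximum-likelihood weights are finite.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0, 1, 0, 1, 1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Fit w, b by gradient ascent on the log-likelihood.
w, b, lr = 0.0, 0.0, 0.1
for _ in range(20000):
    grad_w = sum(x * (y - sigmoid(w * x + b)) for x, y in zip(xs, ys))
    grad_b = sum(y - sigmoid(w * x + b) for x, y in zip(xs, ys))
    w += lr * grad_w
    b += lr * grad_b

# Balance conditions: the fitted probabilities match the empirical
# moments sum(y_i) and sum(x_i * y_i) on the training data.
ps = [sigmoid(w * x + b) for x in xs]
print(abs(sum(ps) - sum(ys)))                       # near 0
print(abs(sum(x * p for x, p in zip(xs, ps))
          - sum(x * y for x, y in zip(xs, ys))))    # near 0
```

Both residuals vanish at the optimum because they are exactly the components of the log-likelihood gradient, which is the standard discriminative statement of the balance conditions.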
Award ID(s):
2244574 2324396
PAR ID:
10586900
Author(s) / Creator(s):
Publisher / Repository:
IEEE Symposium on Information Theory
Date Published:
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Ordinal regression classifies an object into one of a given set of classes whose labels possess a natural order. It is relevant to a wide array of domains, including risk assessment, sentiment analysis, image ranking, and recommender systems. As in standard classification, the primary goal of ordinal regression is accuracy. Yet, in this context, the severity of prediction errors varies; e.g., in risk assessment, Critical Risk is more urgent than High Risk and significantly more urgent than No Risk. This leads to a modified objective: ensuring that the model's output is as close as possible to the correct class, considering the order of labels. Therefore, ordinal regression models should use an ordinality-aware loss for training. In this work, we focus on two properties of ordinality-aware losses, namely monotonicity and balance sensitivity. We show that existing ordinal loss functions lack these properties and introduce SLACE (Soft Labels Accumulating Cross Entropy), a novel loss function that provably satisfies them. We demonstrate empirically that SLACE outperforms state-of-the-art ordinal loss functions on most tabular ordinal regression benchmarks.
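The monotonicity property discussed above can be illustrated with a generic soft-label cross-entropy, where the target mass decays with distance from the true class. This is only a sketch of the idea; the exact accumulation scheme of SLACE, and the decay rate `alpha` used here, are assumptions for illustration.

```python
import math

def soft_labels(y, n_classes, alpha=1.0):
    """Target distribution whose mass decays with distance from true class y.
    (Generic illustration; SLACE's exact accumulation scheme may differ.)"""
    w = [math.exp(-alpha * abs(k - y)) for k in range(n_classes)]
    s = sum(w)
    return [v / s for v in w]

def soft_label_ce(probs, y, alpha=1.0):
    """Cross-entropy of predicted probs against the ordinal soft labels."""
    q = soft_labels(y, len(probs), alpha)
    return -sum(qk * math.log(pk) for qk, pk in zip(q, probs))

def peaked(c, n_classes, p=0.8):
    """A prediction concentrating probability p on class c."""
    rest = (1.0 - p) / (n_classes - 1)
    return [p if k == c else rest for k in range(n_classes)]

# Monotonicity: predictions peaked farther from the true class (2) cost more.
losses = [soft_label_ce(peaked(c, 5), y=2) for c in range(5)]
print(losses)   # decreases toward class 2, then increases again
```

A plain (hard-label) cross-entropy would assign the same loss to every wrong peak, which is exactly the insensitivity to label order that motivates ordinality-aware losses.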
  2. Novel machine learning algorithms that make the best use of a significantly smaller amount of data are of great interest. For example, active learning (AL) addresses this problem by iteratively training a model on a small number of labeled data, running the whole dataset through the trained model, and then querying the labels of selected samples, which are then used to train a new model. This paper presents a fast and accurate data selection method in which the selected samples are optimized to span the subspace of all data. We propose a new selection algorithm, referred to as iterative projection and matching (IPM), with linear complexity w.r.t. the number of data points and without any parameters to be tuned. In our algorithm, at each iteration, the maximum information from the structure of the data is captured by one selected sample, and the captured information is excluded in subsequent iterations by projection onto the null space of previously selected samples. The computational efficiency and selection accuracy of our proposed algorithm outperform those of conventional methods. Furthermore, the superiority of the proposed algorithm is shown on active learning for video action recognition on the UCF-101 dataset; learning using representatives on ImageNet; training a generative adversarial network (GAN) to generate multi-view images from a single-view input on the CMU Multi-PIE dataset; and video summarization on the UTE Egocentric dataset.
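The select-then-project loop described above can be sketched greedily: pick the sample whose direction captures the most energy in the (residual) data, then deflate by projecting everything onto that sample's null space. The scoring rule below is an assumed simplification, not the paper's exact matching step, and the toy 2-D points are made up for illustration.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def ipm_like_select(data, k):
    """Greedy sketch of projection-based selection (simplified; the
    exact IPM matching criterion may differ)."""
    residual = [list(row) for row in data]
    selected = []
    for _ in range(k):
        best, best_score = None, -1.0
        for i, r in enumerate(residual):
            n2 = dot(r, r)
            if n2 < 1e-12:      # already in the span of selected samples
                continue
            # Energy of the residual data captured along direction r.
            score = sum(dot(x, r) ** 2 for x in residual) / n2
            if score > best_score:
                best, best_score = i, score
        if best is None:        # data fully spanned; nothing left to select
            break
        selected.append(best)
        u, un2 = residual[best], dot(residual[best], residual[best])
        # Deflate: remove each sample's component along the selected direction.
        residual = [[xj - dot(x, u) / un2 * uj for xj, uj in zip(x, u)]
                    for x in residual]
    return selected

points = [[3.0, 0.0], [0.0, 2.0], [1.0, 1.0], [2.0, 2.0]]
print(ipm_like_select(points, 2))
```

Because the data here are 2-D, two selections span the whole set, after which every residual is zero and further selection stops, mirroring the "captured information is excluded" behavior described in the abstract.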
  3. We consider the differentially private sparse learning problem, where the goal is to estimate the underlying sparse parameter vector of a statistical model in the high-dimensional regime while preserving the privacy of each training example. We propose a generic differentially private iterative gradient hard thresholding algorithm with a linear convergence rate and a strong utility guarantee. We demonstrate the superiority of our algorithm through two specific applications: sparse linear regression and sparse logistic regression. Specifically, for sparse linear regression, our algorithm achieves the best known utility guarantee without the extra support selection procedure used in previous work [Kifer et al., 2012]. For sparse logistic regression, our algorithm obtains a utility guarantee with a logarithmic dependence on the problem dimension. Experiments on both synthetic data and real-world datasets verify the effectiveness of our proposed algorithm.
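The core iterate of gradient hard thresholding is simple to sketch: take a (noisy) gradient step, then keep only the s largest-magnitude coordinates. The step size and noise scale below are illustrative defaults, not the privacy-calibrated values from the paper.

```python
import random

def hard_threshold(v, s):
    """Keep the s largest-magnitude entries of v; zero out the rest."""
    keep = set(sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)[:s])
    return [v[i] if i in keep else 0.0 for i in range(len(v))]

def dp_iht_step(theta, grad, lr=0.5, s=2, noise_scale=0.0, rng=None):
    """One iterate: noisy gradient step followed by hard thresholding.
    (Illustrative; a DP analysis calibrates noise_scale to the privacy budget.)"""
    rng = rng or random.Random(0)
    noisy = [g + rng.gauss(0.0, noise_scale) for g in grad]
    stepped = [t - lr * g for t, g in zip(theta, noisy)]
    return hard_threshold(stepped, s)

print(hard_threshold([0.1, -3.0, 2.0, 0.5], 2))   # [0.0, -3.0, 2.0, 0.0]
```

The thresholding step is what maintains sparsity across iterations; the Gaussian perturbation of the gradient is the standard mechanism for making each iterate differentially private.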
  4. Categorical data analysis becomes challenging when high-dimensional sparse covariates are involved, as is often the case for omics data. We introduce a statistical procedure based on multinomial logistic regression analysis for such scenarios, covering variable screening, model selection, order selection for response categories, and variable selection. We apply our procedure to high-dimensional gene expression data with 801 patients, 2426 genes, and five types of cancerous tumors. As a result, we recommend three finalized models: one with 74 genes achieves extremely low cross-entropy loss and a zero predictive error rate under five-fold cross-validation; the two others, with 31 and 4 genes respectively, are recommended as prognostic multi-gene signatures.
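The multinomial logistic model underlying this procedure maps per-class scores to probabilities with a softmax and is fit by minimizing cross-entropy loss. A minimal sketch with made-up scores for one sample over five classes:

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def cross_entropy(probs, true_class):
    """Multinomial cross-entropy loss for a single sample."""
    return -math.log(probs[true_class])

# Illustrative class scores for one sample over five tumor-type classes.
scores = [2.0, 0.5, -1.0, 0.1, 1.2]
probs = softmax(scores)
print(sum(probs))                 # 1.0 (up to rounding)
print(probs.index(max(probs)))    # 0: the highest-scoring class
```

In the high-dimensional setting of the abstract, each score would be a sparse linear function of the selected genes, which is why variable screening and selection precede the fit.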
  5.
    Background: Machine learning has been used for classification of physical behavior bouts from hip-worn accelerometers; however, this research has been limited due to the challenges of directly observing and coding human behavior “in the wild.” Deep learning algorithms, such as convolutional neural networks (CNNs), may offer better representation of data than other machine learning algorithms without the need for engineered features and may be better suited to dealing with free-living data. The purpose of this study was to develop a modeling pipeline for evaluation of a CNN model on a free-living data set and compare CNN inputs and results with the commonly used machine learning random forest and logistic regression algorithms. Method: Twenty-eight free-living women wore an ActiGraph GT3X+ accelerometer on their right hip for 7 days. A concurrently worn thigh-mounted activPAL device captured ground truth activity labels. The authors evaluated logistic regression, random forest, and CNN models for classifying sitting, standing, and stepping bouts. The authors also assessed the benefit of performing feature engineering for this task. Results: The CNN classifier performed best (average balanced accuracy for bout classification of sitting, standing, and stepping was 84%) compared with the other methods (56% for logistic regression and 76% for random forest), even without performing any feature engineering. Conclusion: Using the recent advancements in deep neural networks, the authors showed that a CNN model can outperform other methods even without feature engineering. This has important implications for both the model’s ability to deal with the complexity of free-living data and its potential transferability to new populations.
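The balanced accuracy reported in that study is the mean of the per-class recalls, so rare activity classes count as much as common ones. A minimal sketch with toy labels (the label encoding below is an assumption for illustration):

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so infrequent classes (e.g. standing
    bouts) weigh as much as frequent ones (e.g. sitting bouts)."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        idx = [i for i, y in enumerate(y_true) if y == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(classes)

# Toy labels: 0 = sitting, 1 = standing, 2 = stepping.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 0, 2, 2]
print(balanced_accuracy(y_true, y_pred))   # (1 + 0.5 + 1) / 3 ≈ 0.833
```

Plain accuracy on the same toy labels would be 5/6; the two metrics diverge more sharply when, as in free-living data, one behavior dominates the recording.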