
Title: Optimal ratio for data splitting
Abstract

It is common to split a dataset into training and testing sets before fitting a statistical or machine learning model. However, there is no clear guidance on how much data should be used for training and testing. In this article, we show that the optimal training/testing splitting ratio is √p : 1, where p is the number of parameters in a linear regression model that explains the data well.
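The splitting rule stated in the abstract (training : testing = √p : 1, with p the number of parameters in the working linear model) translates directly into sample counts. A minimal sketch, not the authors' code; the function name and rounding choice are illustrative:

```python
import math

def optimal_split(n_samples: int, n_params: int) -> tuple[int, int]:
    """Train/test sizes under the sqrt(p):1 rule, where p is the number
    of parameters in a linear regression model that fits the data well."""
    ratio = math.sqrt(n_params)                 # training gets sqrt(p) parts
    n_train = round(n_samples * ratio / (ratio + 1.0))
    return n_train, n_samples - n_train

# 1000 samples with a 9-parameter model -> sqrt(9):1 = 3:1 split
print(optimal_split(1000, 9))  # -> (750, 250)
```

Note that the familiar 50/50 split is recovered only for a single-parameter model (√1 : 1); richer models shift more data toward training.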

 
Award ID(s): 1921646, 1921873
NSF-PAR ID: 10445061
Publisher / Repository: Wiley Blackwell (John Wiley & Sons)
Journal Name: Statistical Analysis and Data Mining: The ASA Data Science Journal
Volume: 15
Issue: 4
ISSN: 1932-1864
Page Range / eLocation ID: p. 531-538
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract

    Background

    Intracranial aneurysms (IAs) are dangerous because of their potential to rupture. We previously found significant RNA expression differences in circulating neutrophils between patients with and without unruptured IAs and trained machine learning models to predict presence of IA using 40 neutrophil transcriptomes. Here, we aim to develop a predictive model for unruptured IA using neutrophil transcriptomes from a larger population and more robust machine learning methods.

    Methods

    Neutrophil RNA extracted from the blood of 134 patients (55 with IA, 79 IA-free controls) was subjected to next-generation RNA sequencing. In a randomly selected training cohort (n = 94), the Least Absolute Shrinkage and Selection Operator (LASSO) selected transcripts, from which we constructed prediction models via 4 well-established supervised machine-learning algorithms (K-Nearest Neighbors, Random Forest, and Support Vector Machines with Gaussian and cubic kernels). We tested the models in the remaining samples (n = 40) and assessed model performance by receiver-operating-characteristic (ROC) curves. Real-time quantitative polymerase chain reaction (RT-qPCR) of 9 IA-associated genes was used to verify gene expression in a subset of 49 neutrophil RNA samples. We also examined the potential influence of demographics and comorbidities on model prediction.

    Results

    Feature selection using LASSO in the training cohort identified 37 IA-associated transcripts. Models trained using these transcripts had a maximum accuracy of 90% in the testing cohort. The testing performance across all methods had an average area under the ROC curve (AUC) of 0.97, an improvement over our previous models. The Random Forest model performed best across both training and testing cohorts. RT-qPCR confirmed expression differences in 7 of 9 genes tested. Gene ontology and IPA network analyses performed on the 37 model genes reflected dysregulated inflammation, cell signaling, and apoptosis processes. In our data, demographics and comorbidities did not affect model performance.
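The AUC reported here has a simple rank-based definition: it is the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. A generic sketch of that computation (not the authors' code; toy labels and scores):

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney rank identity:
    the fraction of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: 3 of the 4 positive/negative pairs are ranked correctly
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # -> 0.75
```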

    Conclusions

    We improved upon our previous IA prediction models based on circulating neutrophil transcriptomes by increasing sample size and by implementing LASSO and more robust machine learning methods. Future studies are needed to validate these models in larger cohorts and to further investigate the effect of covariates.

     
  2. ABSTRACT

    We present a machine learning (ML) approach for the prediction of galaxies’ dark matter halo masses which achieves an improved performance over conventional methods. We train three ML algorithms (XGBoost, random forests, and neural network) to predict halo masses using a set of synthetic galaxy catalogues that are built by populating dark matter haloes in N-body simulations with galaxies and that match both the clustering and the joint distributions of properties of galaxies in the Sloan Digital Sky Survey (SDSS). We explore the correlation of different galaxy- and group-related properties with halo mass, and extract the set of nine features that contribute the most to the prediction of halo mass. We find that mass predictions from the ML algorithms are more accurate than those from halo abundance matching (HAM) or dynamical mass estimates (DYN). Since the danger of this approach is that our training data might not accurately represent the real Universe, we explore the effect of testing the model on synthetic catalogues built with different assumptions than the ones used in the training phase. We test a variety of models with different ways of populating dark matter haloes, such as adding velocity bias for satellite galaxies. We determine that, though training and testing on different data can lead to systematic errors in predicted masses, the ML approach still yields substantially better masses than either HAM or DYN. Finally, we apply the trained model to a galaxy and group catalogue from the SDSS DR7 and present the resulting halo masses.
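The halo abundance matching (HAM) baseline the ML models are compared against is, at its simplest, rank-order matching: the brightest galaxy is assigned the most massive halo, and so on down both ranked lists. A toy numpy sketch under that simplification, with entirely synthetic data (all names and distributions here are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
luminosity = rng.lognormal(0.0, 1.0, size=1000)   # toy galaxy luminosities
halo_mass = rng.lognormal(12.0, 0.5, size=1000)   # toy halo masses (arbitrary units)

# Rank-order abundance matching: the i-th brightest galaxy
# is assigned the i-th most massive halo.
order = np.argsort(-luminosity)          # galaxies from brightest to faintest
assigned = np.empty_like(halo_mass)
assigned[order] = np.sort(halo_mass)[::-1]
```

Real HAM implementations add scatter and use abundance-matched mass functions rather than a fixed halo sample, but the monotonic rank matching above is the core idea the ML predictions improve upon.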

     
  3. Abstract

    Although infrequent, large (Mw 7.5+) earthquakes can be extremely damaging and occur on subduction and intraplate faults worldwide. Earthquake early warning (EEW) systems aim to provide advanced warning before strong shaking and tsunami onsets. These systems estimate earthquake magnitude using the early metrics of waveforms, relying on empirical scaling relationships of abundant past events. However, both the rarity and complexity of great events make it challenging to characterize them, and EEW algorithms often underpredict magnitude and the resulting hazards. Here, we propose a model, M‐LARGE, that leverages deep learning to characterize crustal deformation patterns of large earthquakes for a specific region in real‐time. We demonstrate the algorithm in the Chilean Subduction Zone by training it with more than six million different simulated rupture scenarios recorded on the Chilean GNSS network. M‐LARGE performs reliable magnitude estimation on the testing data set with an accuracy of 99%. Furthermore, the model successfully predicts the magnitude of five real Chilean earthquakes that occurred in the last 11 years. These events were damaging, large enough to be recorded by modern high-rate GNSS instruments, and provide valuable ground truth. M‐LARGE tracks the evolution of the source process and can make faster and more accurate magnitude estimates, significantly outperforming other similar EEW algorithms. This is the first demonstration of our approach. Future work toward generalization will include the addition of more training and testing data, interfacing with existing EEW methods, and applying the method to different tectonic settings to explore performance in these regions.
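For context on the quantity M-LARGE estimates: moment magnitude Mw is defined from the seismic moment M0 by the standard Hanks-Kanamori relation. A minimal sketch of that definition (background material, not part of the paper's method):

```python
import math

def moment_magnitude(m0: float) -> float:
    """Moment magnitude Mw from seismic moment M0 in newton-meters,
    using the standard relation Mw = (2/3) * (log10(M0) - 9.1)."""
    return (2.0 / 3.0) * (math.log10(m0) - 9.1)

# A seismic moment of 10**22.6 N*m corresponds to roughly Mw 9.0,
# the scale of the largest subduction-zone events
print(moment_magnitude(10**22.6))
```

The logarithmic scale is why underprediction is so consequential: each unit of Mw corresponds to about a 32-fold increase in released moment.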

     
  4. Summary

    Spatiotemporal patterns of Spartina alterniflora belowground biomass (BGB) are important for evaluating salt marsh resiliency. To address this, we created the BERM (Belowground Ecosystem Resiliency Model), which estimates monthly BGB (30‐m spatial resolution) from freely available data such as Landsat‐8 and Daymet climate summaries.

    Our modeling framework relied on extreme gradient boosting, and used field observations from four Georgia salt marshes as ground‐truth data. Model predictors included estimated tidal inundation, elevation, leaf area index, foliar nitrogen, chlorophyll, surface temperature, phenology, and climate data. The final model included 33 variables, and the most important variables were elevation, vapor pressure from the previous four months, Normalized Difference Vegetation Index (NDVI) from the previous five months, and inundation.
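Predictors like "vapor pressure from the previous four months" imply a design matrix of lagged copies of each monthly series. A generic sketch of how such lagged features can be assembled (the helper name and toy data are illustrative, not from BERM):

```python
import numpy as np

def lagged_features(series, lags):
    """Design matrix whose columns are the input series shifted back by
    each lag; e.g. lags=(1, 2, 3, 4) gives the previous four months."""
    series = np.asarray(series, dtype=float)
    m = max(lags)
    # Row t of the result holds series[t-l] for each lag l, aligned so
    # that every row has all of its lags available.
    return np.column_stack([series[m - l : len(series) - l] for l in lags])

X = lagged_features([10, 11, 12, 13, 14, 15], lags=(1, 2))
print(X)  # each row holds the values at t-1 and t-2
```

The first max(lags) time steps are dropped because their lagged values would fall before the start of the series.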

    Root mean squared error for BGB from testing data was 313 g m−2 (11% of the field data range), and explained variance (R2) was 0.62–0.77. Testing data results were unbiased across BGB values and were positively correlated with ground‐truth data across all sites and years (r = 0.56–0.82 and 0.45–0.95, respectively).

    BERM can estimate BGB within Spartina alterniflora salt marshes where environmental parameters are within the training data range, and can be readily extended through a reproducible workflow. This provides a powerful approach for evaluating spatiotemporal BGB and associated ecosystem function.

     
  5. ABSTRACT

    In this paper, we introduce a novel data augmentation methodology based on Conditional Progressive Generative Adversarial Networks (CPGAN) to generate diverse black hole (BH) images, accounting for variations in spin and electron temperature prescriptions. These generated images are valuable resources for training deep learning algorithms to accurately estimate black hole parameters from observational data. Our model can generate BH images for any spin value within the range of [−1, 1], given an electron temperature distribution. To validate the effectiveness of our approach, we employ a convolutional neural network to predict the BH spin using both the GRMHD images and the images generated by our proposed model. Our results demonstrate a significant performance improvement when training is conducted with the augmented data set while testing is performed using GRMHD simulated data, as indicated by the high R2 score. Consequently, we propose that GANs can be employed as cost-effective models for black hole image generation and reliably augment training data sets for other parametrization algorithms.
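The R2 score used to validate the spin predictions is the standard coefficient of determination. A minimal self-contained sketch of that metric (generic, not the authors' evaluation code):

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot, where SS_res is
    the residual sum of squares and SS_tot the total variance around the mean."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Toy spin values in [-1, 1]: near-perfect predictions give R2 close to 1
print(r2_score([0.2, 0.5, 0.9], [0.25, 0.45, 0.9]))
```

An R2 of 1 means perfect prediction; 0 means no better than always predicting the mean spin of the test set.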

     