SUFFICIENT SAMPLE SIZE: LIKELIHOOD BOOTSTRAPPING

Kiselev, N. S.; Grabovoi, A. V.

doi:10.7868/S303453325020094

PII

S3034533S0044466925020094-1

Publication type

Article

Status

Published

Authors

N. S. Kiselev

Affiliation: MIPT

A. V. Grabovoi

Affiliation: MIPT

Volume / Issue

Volume 65 / Issue 2

Pages

235-242

Abstract

Determining an appropriate sample size is crucial for building effective machine learning models. Existing methods often either lack a rigorous theoretical basis or rely on specific statistical hypotheses about the model parameters. In this paper, we present two new methods based on likelihood values computed on bootstrapped subsamples. We demonstrate the correctness of one of these methods for the linear regression model. Computational experiments on both synthetic and real datasets show that the proposed functions converge as the sample size increases, highlighting the practical usefulness of the approach.
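The abstract describes the method only at a high level. The following is a minimal sketch of one plausible reading of "likelihood values on bootstrapped subsamples", under assumptions that are ours and not the authors': a single-feature linear regression with Gaussian noise of known unit variance, bootstrap subsamples of size k drawn with replacement, and the full-data average log-likelihood evaluated at the parameters fitted on each subsample. The data generator `make_data`, the grid of sizes, `n_boot`, and the threshold rule `first_size_below` are all hypothetical illustrations, not the paper's criteria.

```python
import math
import random
import statistics

# Assumption: Gaussian noise with known unit variance, so the per-observation
# log-likelihood is -0.5*log(2*pi) - 0.5*residual**2.
LOG_2PI = math.log(2.0 * math.pi)


def make_data(n, intercept=1.0, slope=2.0, seed=0):
    """Hypothetical synthetic dataset: y = intercept + slope * x + N(0, 1) noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, intercept + slope * x + rng.gauss(0.0, 1.0)) for x in xs]


def fit_ols(points):
    """Closed-form least squares for one feature plus an intercept."""
    mx = statistics.fmean(x for x, _ in points)
    my = statistics.fmean(y for _, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return my - slope * mx, slope


def mean_loglik(points, a, b):
    """Average unit-variance Gaussian log-likelihood of points under y = a + b*x."""
    return statistics.fmean(
        -0.5 * LOG_2PI - 0.5 * (y - a - b * x) ** 2 for x, y in points
    )


def likelihood_curve(data, sizes, n_boot=200, seed=1):
    """For each subsample size k, fit OLS on n_boot bootstrap subsamples of size k
    and return (k, mean, variance) of the full-data log-likelihood."""
    rng = random.Random(seed)
    curve = []
    for k in sizes:
        values = []
        while len(values) < n_boot:
            sub = rng.choices(data, k=k)  # bootstrap: sampling with replacement
            if len({x for x, _ in sub}) < 2:
                continue  # degenerate subsample: the slope is undefined
            a, b = fit_ols(sub)
            values.append(mean_loglik(data, a, b))
        curve.append((k, statistics.fmean(values), statistics.pvariance(values)))
    return curve


def first_size_below(curve, eps):
    """Hypothetical stopping rule: smallest k whose log-likelihood variance is below eps."""
    for k, _, var in curve:
        if var < eps:
            return k
    return None


data = make_data(500)
curve = likelihood_curve(data, sizes=[10, 25, 50, 100, 250, 500])
for k, mean, var in curve:
    print(f"k={k:4d}  mean={mean:.4f}  var={var:.2e}")
print("first k with var < 1e-4:", first_size_below(curve, 1e-4))
```

As k grows, the spread of the log-likelihood across bootstrap subsamples should shrink and its mean should level off toward the full-data maximum, which is the kind of convergence the abstract reports. How the paper turns such curves into a sufficient sample size is not stated in this abstract, so the variance threshold above is only an illustrative choice.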

Keywords

sufficient sample size, likelihood bootstrapping, linear regression, computational linear algebra

Date of publication

01.02.2025

Year of publication

2025

GOST	Grabovoi A., Kiselev N. SUFFICIENT SAMPLE SIZE: LIKELIHOOD BOOTSTRAPPING // Computational Mathematics and Mathematical Physics. – 2025. – V. 65. – No. 2. – P. 235-242. URL: https://zhvmmfras.ru/s3034533s0044466925020094-1/?version_id=106963. DOI: 10.7868/S303453325020094
MLA	Grabovoi, A. V., Kiselev, N. S. "SUFFICIENT SAMPLE SIZE: LIKELIHOOD BOOTSTRAPPING." Computational Mathematics and Mathematical Physics 65.2 (2025): 235-242. DOI: 10.7868/S303453325020094
APA	Grabovoi, A., & Kiselev, N. (2025). SUFFICIENT SAMPLE SIZE: LIKELIHOOD BOOTSTRAPPING. Computational Mathematics and Mathematical Physics, 65(2), 235-242. DOI: 10.7868/S303453325020094

RAS Mathematics. Computational Mathematics and Mathematical Physics (Russian edition: Zhurnal Vychislitel'noi Matematiki i Matematicheskoi Fiziki)