A two-step method for variable selection in the analysis of a case-cohort study


This article is free to access.

Abstract

Background: Accurate detection and estimation of true exposure-outcome associations are important in aetiological analysis; when there are multiple potential exposure variables of interest, methods are required for detecting the subset of variables most likely to have true associations with the outcome of interest. Case-cohort studies often collect data on a large number of variables which have not been measured in the entire cohort (e.g. panels of biomarkers). There is a lack of guidance on methods for variable selection in case-cohort studies.

Methods: We describe and explore the application of three variable selection methods to data from a case-cohort study. These are: (i) selecting variables based on their level of significance in univariable (i.e. one-at-a-time) Prentice-weighted Cox regression models; (ii) stepwise selection applied to Prentice-weighted Cox regression; and (iii) a two-step method which first applies a Bayesian variable selection algorithm, using multivariable logistic regression, to obtain posterior probabilities of selection for each variable, and then estimates effects using Prentice-weighted Cox regression.

Results: Across nine different simulation scenarios, the two-step method demonstrated higher sensitivity and lower false discovery rate than the one-at-a-time and stepwise methods. In an application of the methods to data from the EPIC-InterAct case-cohort study, the two-step method identified an additional two fatty acids as being associated with incident type 2 diabetes, compared with the one-at-a-time and stepwise methods.

Conclusions: The two-step method enables more powerful and accurate detection of exposure-outcome associations in case-cohort studies. An R package is available to enable researchers to apply this method.
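The simulation results above are reported as sensitivity and false discovery rate. As a minimal sketch of how these two quantities are conventionally computed for a single variable selection run, the Python snippet below compares a selected set against a known set of truly associated variables. The function name, the variable labels and the example sets are hypothetical and are not taken from the article or its R package; the article may define or average these metrics differently across simulation replicates.

```python
def selection_metrics(selected, truly_associated):
    """Return (sensitivity, false discovery rate) for one selection run.

    Both arguments are iterables of variable names. This is an
    illustrative helper, not code from the article or its R package.
    """
    selected = set(selected)
    truth = set(truly_associated)
    true_positives = len(selected & truth)

    # Sensitivity: share of truly associated variables that were selected.
    sensitivity = true_positives / len(truth) if truth else float("nan")

    # False discovery rate: share of selected variables with no true
    # association. By convention here, an empty selection has FDR 0.
    false_positives = len(selected) - true_positives
    fdr = false_positives / len(selected) if selected else 0.0

    return sensitivity, fdr


# Hypothetical run: three of four true signals found, plus one false positive.
sens, fdr = selection_metrics(
    selected=["x1", "x2", "x3", "x7"],
    truly_associated=["x1", "x2", "x3", "x4"],
)
print(f"sensitivity={sens:.2f}, FDR={fdr:.2f}")
```

A method that is better on both criteria, as the abstract reports for the two-step method, recovers more of the true associations while including fewer spurious variables among those it selects.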



Cite


Newcombe, P. J., Connolly, S., Seaman, S., Richardson, S., & Sharp, S. J. (2018). A two-step method for variable selection in the analysis of a case-cohort study. International Journal of Epidemiology, 47(2), 597–604. https://doi.org/10.1093/ije/dyx224
