Hepatocellular Carcinoma Risk Prediction in the NIH-AARP Diet and Health Study Cohort: A Machine Learning Approach
J Hepatocell Carcinoma. 2022 Feb 15;9:69-81. doi: 10.2147/JHC.S341045. eCollection 2022.
Jonathan Thomas 1, Linda M Liao 2, Rashmi Sinha 2, Tushar Patel # 1, Samuel O Antwi # 3
1 Department of Transplantation, Mayo Clinic, Jacksonville, FL, USA.
2 Division of Cancer Epidemiology and Genetics, The National Cancer Institute, Bethesda, MD, USA.
3 Department of Quantitative Health Sciences, Mayo Clinic, Jacksonville, FL, USA.
Background: Prediction of hepatocellular carcinoma (HCC) development in persons with known risk factors remains a challenge and is an urgent unmet need, considering projected increases in HCC incidence and mortality in the US. We aimed to use machine learning techniques to identify a set of demographic, lifestyle, and health history information that can be used simultaneously for population-level HCC risk prediction.
Methods: Data from 377,065 participants of the NIH-AARP Diet and Health Study, among whom 647 developed HCC over 16 years of follow-up, were analyzed. The sample was randomly divided into independent training (60%) and validation (40%) sets. We evaluated 123 participant characteristics and tested 15 different machine learning algorithms for robustness in predicting HCC risk. Separately, we evaluated variables selected from multivariable logistic regression for risk prediction.
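The random 60%/40% partition described in the Methods can be illustrated with a minimal sketch. This is not the authors' code: the function name `split_train_validation`, the seed, and the use of Python's standard `random` module are assumptions made purely for illustration. Only the cohort size and case count come from the abstract.

```python
import random


def split_train_validation(n_participants, train_fraction=0.6, seed=0):
    """Randomly partition participant indices into training and validation sets.

    Hypothetical helper for illustration; the abstract does not describe
    how the authors implemented their split.
    """
    indices = list(range(n_participants))
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(indices)
    n_train = round(n_participants * train_fraction)
    return indices[:n_train], indices[n_train:]


N_PARTICIPANTS = 377_065  # cohort size reported in the abstract
N_HCC_CASES = 647         # incident HCC cases reported in the abstract

train_idx, valid_idx = split_train_validation(N_PARTICIPANTS)

# Fraction of the cohort that developed HCC during follow-up
case_fraction = N_HCC_CASES / N_PARTICIPANTS

print(len(train_idx), len(valid_idx))
print(f"HCC cases as a share of the cohort: {case_fraction:.4%}")
```

With cases making up well under 0.2% of the cohort, the outcome classes are heavily imbalanced, which is the setting that under-sampling methods such as RUSBoost are designed for.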
Results: The random under-sampling boosting (RUSBoost) algorithm performed best during model testing. Fourteen participant characteristics were selected for risk prediction based on differences between cases and controls (Bonferroni-corrected p-values <0.0004) and on the variables used most frequently in the first two decision trees of the RUSBoost ensemble. A predictive model based on the 14 variables had an AUC of 0.72 (sensitivity=0.68, specificity=0.63) and independent validation AUC of 0.65 (sensitivity=0.68, specificity=0.63). A subset of 9 variables identified through logistic regression also had an AUC of 0.72 (sensitivity=0.67, specificity=0.63) and independent validation AUC of 0.65 (sensitivity=0.70, specificity=0.61).
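For readers unfamiliar with the quantities reported in the Results, the sketch below shows how a Bonferroni threshold, sensitivity, specificity, and AUC are typically computed. It is illustrative only. The significance level of 0.05 is an assumption (the abstract does not state it), although 0.05 divided by the 123 characteristics evaluated gives roughly 0.0004, which is consistent with the reported cutoff. The labels, scores, and cutoff are made-up toy values, not study data, and all function names are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def sensitivity_specificity(labels, scores, cutoff):
    """Classify score >= cutoff as a predicted case and compare with labels (1 = case)."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= cutoff)
    fn = sum(1 for y, s in pairs if y == 1 and s < cutoff)
    tn = sum(1 for y, s in pairs if y == 0 and s < cutoff)
    fp = sum(1 for y, s in pairs if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels, scores):
    """AUC as the probability that a random case outscores a random control.

    Ties count as half. This pairwise form is quadratic in sample size,
    so it suits small examples only.
    """
    case_scores = [s for y, s in zip(labels, scores) if y == 1]
    control_scores = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in case_scores
        for k in control_scores
    )
    return wins / (len(case_scores) * len(control_scores))


# Assumed alpha of 0.05 across the 123 characteristics named in the abstract
threshold = bonferroni_threshold(0.05, 123)

# Toy data (not from the study): 3 cases, 4 controls
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

toy_sens, toy_spec = sensitivity_specificity(toy_labels, toy_scores, cutoff=0.5)
toy_auc = auc(toy_labels, toy_scores)

print(f"Bonferroni threshold: {threshold:.6f}")
print(f"sensitivity={toy_sens:.2f}, specificity={toy_spec:.2f}, AUC={toy_auc:.2f}")
```

Sensitivity and specificity depend on the chosen cutoff, whereas AUC summarizes discrimination across all cutoffs. This is why the models in the Results can share an AUC while differing slightly in sensitivity and specificity.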
Conclusion: Population-level HCC risk prediction can be performed with a machine learning-based algorithm and could inform strategies for improving HCC risk reduction in at-risk groups.