Publication Details
AFRICAN RESEARCH NEXUS
SHINING A SPOTLIGHT ON AFRICAN RESEARCH
computer science
Uncertainty estimation with a finite dataset in the assessment of classification models
Computational Statistics and Data Analysis, Vol. 56, No. 5, 2012
Description
To successfully translate genomic classifiers to clinical practice, it is essential to obtain reliable and reproducible measurements of classifier performance. A point estimate of classifier performance must be accompanied by a measure of its uncertainty. In general, this uncertainty arises from both the finite size of the training set and the finite size of the testing set. The training variability is a measure of classifier stability and is particularly important when the training sample size is small. Methods have been developed for estimating this variability for the performance metric AUC (area under the ROC curve) under two paradigms: a smoothed cross-validation paradigm and an independent validation paradigm. The methodology is demonstrated on three clinical microarray datasets from phase two of the microarray quality control consortium project (MAQC-II): breast cancer, multiple myeloma, and neuroblastoma. The results show that classifier performance is associated with large variability and that the estimated performance can change dramatically across datasets. Moreover, the training variability is found to be of the same order as the testing variability for the datasets and models considered. In conclusion, the feasibility of quantifying both training and testing variability of classifier performance is demonstrated on finite real-world datasets. The large variability of the performance estimates shows that patient sample size remains the bottleneck of the microarray problem and that training variability is not negligible. © 2011 Elsevier B.V. All rights reserved.
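The distinction between training and testing variability described above can be illustrated with a small simulation. The sketch below is not the paper's estimator (which uses a smoothed cross-validation and an independent-validation methodology); it is a minimal, assumed setup with synthetic Gaussian data and a simple mean-difference linear classifier, showing the two sources of variability separately: retraining on independent finite training sets (training variability) versus bootstrapping a fixed test set under one fixed classifier (testing variability).

```python
import numpy as np

rng = np.random.default_rng(0)
p, delta = 10, 0.6                  # feature count and class separation (illustrative choices)

def draw(n_per_class):
    """Balanced two-class Gaussian data: class 1 shifted by delta in every feature."""
    X0 = rng.normal(0.0, 1.0, size=(n_per_class, p))
    X1 = rng.normal(delta, 1.0, size=(n_per_class, p))
    return np.vstack([X0, X1]), np.repeat([0, 1], n_per_class)

def auc(s_pos, s_neg):
    """Empirical AUC via the Mann-Whitney statistic (ties count one half)."""
    d = s_pos[:, None] - s_neg[None, :]
    return np.mean(d > 0) + 0.5 * np.mean(d == 0)

Xte, yte = draw(100)                # one finite test set, held fixed throughout

# Training variability: retrain on independent small training sets, score the same test set.
aucs_train = []
for _ in range(200):
    Xtr, ytr = draw(20)             # small training sample, as in the small-n setting discussed
    w = Xtr[ytr == 1].mean(0) - Xtr[ytr == 0].mean(0)   # mean-difference classifier
    s = Xte @ w
    aucs_train.append(auc(s[yte == 1], s[yte == 0]))

# Testing variability: fix one trained classifier, bootstrap-resample the finite test set.
Xtr, ytr = draw(20)
w = Xtr[ytr == 1].mean(0) - Xtr[ytr == 0].mean(0)
s = Xte @ w
aucs_test = []
for _ in range(200):
    idx = rng.integers(0, len(yte), len(yte))           # bootstrap indices
    sb, yb = s[idx], yte[idx]
    aucs_test.append(auc(sb[yb == 1], sb[yb == 0]))

print(f"training SD of AUC: {np.std(aucs_train):.3f}")
print(f"testing  SD of AUC: {np.std(aucs_test):.3f}")
```

With a training set this small, the two standard deviations typically come out on the same order of magnitude, which is the qualitative point the abstract makes about the MAQC-II datasets.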
Authors & Co-Authors
Chen, Weijie
United States, Rockville
Food and Drug Administration, Center for Devices and Radiological Health
Yousef, Waleed A.
Egypt, Helwan
Faculty of Computers and Artificial Intelligence
Gallas, Brandon D.
United States, Rockville
Food and Drug Administration, Center for Devices and Radiological Health
Hsu, Elizabeth R.
United States, Rockville
Food and Drug Administration, Center for Devices and Radiological Health
Lababidi, Samir
United States, Rockville
Food and Drug Administration, Center for Devices and Radiological Health
Tang, Rong
United States, Rockville
Food and Drug Administration, Center for Devices and Radiological Health
Pennello, Gene A.
United States, Rockville
Food and Drug Administration, Center for Devices and Radiological Health
Symmans, William Fraser
United States, Houston
University of Texas Health Science Center at Houston
Pusztai, Lajos
United States, Houston
University of Texas Health Science Center at Houston
Statistics
Citations: 9
Authors: 9
Affiliations: 3
Identifiers
DOI:
10.1016/j.csda.2011.05.024
ISSN:
0167-9473
Research Areas
Cancer
Health System And Policy