American Statistical Association
In recent years, statisticians have been encountering a new variety of datasets with many more variables than observations, which we call "megavariate" data. The methodology for analyzing megavariate data remains a work in progress, but new methods have been developed that are worthy of consideration. The central issue in these analyses is how to avoid overfitting the data, that is, how to separate the true signal from the spurious signal generated by the abundance of variables. I will present a pipeline of methods for analyzing megavariate data that focuses on the following issues: (1) Individual variable analysis: procedures to identify significant variables by borrowing strength from all the variables; (2) Group analysis: procedures to identify significant groups of variables; and (3) Enriched methods for supervised and unsupervised classification: procedures that improve the performance of classifiers through the use of enrichment methodology. In the biological sciences, megavariate datasets are common in experiments with DNA microarrays, DNA sequencing, mass spectroscopy and molecular imaging, technologies that are increasingly used in laboratory experiments and in clinical settings.
Javier Cabrera, Ph.D., is the director of the Institute of Biostatistics and a professor in the Department of Statistics and Biostatistics at Rutgers University. He received his Ph.D. from Princeton University and has lectured in statistics at Rutgers University, the National University of Singapore and Hong Kong University of Science & Tech. He is the author or coauthor of many publications in the areas of data mining and functional genomics, including a book on the exploration and analysis of DNA microarray and protein array data. He serves on the executive board of the Data Analysis Working Group for the DNA Barcode of Life. His research has been funded by the NSF, and he is a Fulbright Fellow and a recipient of the 2010 SPAIG Award for the Pfizer/Rutgers partnership.
Date: Thursday, November 18, 2010
Time: 4:00 - 5:00 P.M.
Mailman School of Public Health
Department of Biostatistics
722 West 168th Street
Biostatistics Computer Lab
6th Floor - Room 656
New York, New York