Andy Shen

I am a fourth-year PhD Candidate in Statistics at UC Berkeley. I am advised by Haiyan Huang and Sam Pimentel.

My research is motivated by applied problems in epidemiology and medicine. I am interested in developing causal inference methods to help scientists make robust and transparent causal claims. I am also interested in solving data-driven problems using statistical learning and causal inference tools. My research is supported by the NSF Graduate Research Fellowship.

I graduated from UCLA in June 2021 with my B.Sc. in Statistics. I was advised by Rick Schoenberg and Peter Katona. I have spent many summers as a statistics intern at Los Alamos National Laboratory and most recently completed a biostatistics internship at Denali Therapeutics in summer 2023.

Feel free to connect with me via email or LinkedIn.

Email / CV / Google Scholar / GitHub

News

November 2024: Our paper on sensitivity analysis for causal decompositions received a student paper award at the International Conference on Health Policy Statistics!

March 2024: I passed my qualifying exam and advanced to candidacy! Thank you to my committee chair Peng Ding and committee members Haiyan Huang, Sam Pimentel, Peter Bickel, and Jon McAuliffe for their service and feedback.

Research

Sensitivity Analysis for Causal Decomposition Analysis (2024+)
Andy Shen, Elina Visoki, Ran Barzilay, and Sam Pimentel
[arXiv Preprint]

Decomposition analysis is an observational causal inference tool to study health disparities experienced by minority groups. This paper develops a novel sensitivity analysis method in the decomposition regime, allowing clinical researchers to better understand how unmeasured confounding may impact their findings.

A Hybrid Variational Autoencoder for Synthetic Polymer Design (2022)
Shuni Li, Zhiyuan Ruan, Andy Shen, Ivan Jayapurna, Ting Xu, and Haiyan Huang
AAAI Workshop on AI to Accelerate Science and Engineering [PDF]

A recent development in materials science is using synthetic heteropolymers to mimic proteins. We develop a hybrid variational autoencoder to identify monomer compositions that achieve optimal performance in wet-lab experiments.

Forecasting Ebola Spread with Hawkes Point Process Models (2021)
Sarita Lee, Andy Shen, and Rick Schoenberg
Journal of Forecasting [DOI]

Hawkes point process models are traditionally used to forecast earthquake aftershocks and have been generalized to forecast epidemics. Here we prospectively compare multiple Hawkes models for forecasting the spread of Ebola in the Democratic Republic of the Congo.

Teaching

Statistics 135: Concepts of Statistics (Fall 2022)
Graduate Student Instructor (GSI), UC Berkeley

Upper-division undergraduate course covering theoretical and mathematical statistics. Topics included estimation, hypothesis testing, inference and regression. Received Outstanding Graduate Student Instructor award for excellent teaching.

Statistics 100C: Linear Models (Spring 2021)
Undergraduate Learning Assistant, UCLA

Upper-division undergraduate course covering foundations of linear regression. Topics included ordinary least squares, hypothesis testing in linear models, heteroscedasticity/multicollinearity and model checking/selection.

Statistics 20: Statistical Programming with R (Fall 2020, Winter 2021, Spring 2021)
Undergraduate Learning Assistant, UCLA

Lower-division undergraduate course covering R programming basics such as vectors, matrices, flow control, graphics and simulation.

Statistics 13: Statistical Methods for Life and Health Sciences (Winter 2021)
Undergraduate Learning Assistant, UCLA

Lower-division undergraduate course covering introductory statistical concepts for life science majors.