Research

My research focuses on methods for spatial epidemiology, specifically regularized and multi-model methods for spatial and spatio-temporal cluster detection. In collaborative work, I have explored spatial aspects of breast cancer risk and maternal and obstetric outcomes, and have worked on environmental and ecological applications.

Regularized spatial and spatio-temporal cluster detection

Maria Kamenetsky, Junho Lee, Jun Zhu, Ronald Gangnon (Spatial and Spatio-Temporal Epidemiology, 2021)

Spatial and spatio-temporal cluster detection are important tools in public health and many other areas of application. Cluster detection can be approached as a multiple testing problem, typically using a space-time scan statistic. We recast the spatial and spatio-temporal cluster detection problem in a high-dimensional data analysis framework using Poisson or quasi-Poisson regression with a Lasso penalty. We adopt a fast, computationally efficient method based on a novel sparse matrix representation of the effects of potential clusters. The number of clusters and the tuning parameters are selected using (quasi-)information criteria. Using a simulation study, we evaluate the performance of the proposed method, including its false positive detection rate and power. Application of the method is illustrated using breast cancer incidence data from three prefectures in Japan.
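To make the idea of a sparse representation of potential clusters concrete, here is a minimal Python sketch; it is not clusso's implementation. It assumes circular potential clusters centred at each location, with every distinct radius up to a user-chosen `max_radius` defining one candidate, and stores each candidate as the tuple of locations it covers (a sparse 0/1 column). It then evaluates the Poisson mean mu_i = E_i * exp(eta_i), with expected counts E_i as an offset. The function names (`potential_clusters`, `linear_predictor`, `expected_counts`) and `max_radius` are hypothetical, and the Lasso fit and information-criterion selection are not shown.

```python
import math


def potential_clusters(coords, max_radius):
    """Enumerate circular potential clusters as sparse index sets.

    For each location used as a centre, every distinct radius r <= max_radius
    defines one candidate: all locations within distance r of the centre.
    Identical sets produced by different centres are kept once, in first-seen order.
    """
    seen = {}  # dict preserves insertion order and removes duplicates
    for xc, yc in coords:
        dists = [math.hypot(x - xc, y - yc) for x, y in coords]
        for r in sorted({d for d in dists if d <= max_radius}):
            members = tuple(i for i, d in enumerate(dists) if d <= r)
            seen.setdefault(members, None)
    return list(seen)


def linear_predictor(clusters, beta, n):
    """eta_i = sum_j 1{i in cluster j} * beta_j, computed from the sparse columns.

    Only nonzero coefficients are visited; under a Lasso penalty most beta_j are
    exactly zero, so this stays cheap even with many candidate clusters.
    """
    eta = [0.0] * n
    for members, b in zip(clusters, beta):
        if b != 0.0:
            for i in members:
                eta[i] += b
    return eta


def expected_counts(expected, eta):
    """Poisson mean with an offset: mu_i = E_i * exp(eta_i)."""
    return [e * math.exp(h) for e, h in zip(expected, eta)]


if __name__ == "__main__":
    coords = [(0, 0), (1, 0), (2, 0)]
    clusters = potential_clusters(coords, max_radius=1.0)
    beta = [0.0] * len(clusters)
    beta[1] = 0.5  # hypothetical nonzero effect for the candidate {0, 1}
    print(clusters)
    print(linear_predictor(clusters, beta, len(coords)))
```

With three collinear locations one unit apart, this yields six distinct candidates, and a nonzero coefficient on the candidate covering locations 0 and 1 raises only their linear predictors. In practice the sparse columns would be handed to a penalized Poisson solver rather than evaluated by hand.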

The tutorials below supplement “Regularized spatial and spatio-temporal cluster detection” (in press) and accompany the clusso R package, which can be found here.

Introduction to clusso

Mapping with clusso

Using clusso with case-control data

strm

Maria Kamenetsky, Guangqing Chi, Jun Zhu (2020)

strm is an R package that fits a spatio-temporal regression model based on Chi & Zhu, Spatial Regression Models for the Social Sciences (2019). The approach fits a simultaneous autoregressive (SAR) spatial error model while incorporating a temporally lagged response variable and temporally lagged explanatory variables. The GitHub page can be found here, and strm is now available on CRAN.
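As a rough guide to the kind of model described above, one way to write a spatial error model with a temporally lagged response and temporally lagged covariates is shown below. The notation is our own and is only an assumption: y_t is the n-vector of responses at time t, X_t the covariate matrix, W a spatial weight matrix, and phi, beta, gamma, lambda are illustrative parameter names, not necessarily strm's exact parameterization.

```latex
\begin{aligned}
y_t &= \phi\, y_{t-1} + X_t \beta + X_{t-1} \gamma + u_t, \\
u_t &= \lambda W u_t + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \sigma^2 I_n), \\
\Rightarrow\; u_t &= (I_n - \lambda W)^{-1} \varepsilon_t .
\end{aligned}
```

The temporal lags enter the mean structure, while spatial dependence is carried entirely by the error term through lambda and W.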

Clustered Spatio-Temporal Varying Coefficient Regression Model

Junho Lee, Maria Kamenetsky, Ronald Gangnon, Jun Zhu (Statistics in Medicine, 2021)

In regression analysis for spatio-temporal data, identifying clusters of spatial units over time in a regression coefficient could provide insight into the unique relationship between a response and covariates in certain subdomains of space and time windows, relative to the background in other parts of the spatial domain and the time period of interest. In this article, we propose a varying coefficient regression method for spatial data repeatedly sampled over time, with heterogeneity in regression coefficients across both space and time. In particular, we extend a varying coefficient regression model for spatial-only data to spatio-temporal data with flexible temporal patterns. We consider the detection of a potential cylindrical cluster of regression coefficients by testing, for each time point, whether the regression coefficient is the same over the entire spatial domain. For multiple clusters, we develop a sequential identification approach. We assess power and the identification of known clusters via a simulation study. Our proposed methodology is illustrated by the analysis of a cancer mortality dataset from the southeastern U.S.
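The idea of scanning cylindrical candidates for a coefficient that differs from the background can be illustrated with a deliberately simplified Python sketch. It is not the coefclust method: it assumes a single covariate, a no-intercept Gaussian linear model, and candidates formed as a spatial set crossed with a window of consecutive time points. For each candidate it fits one slope inside and one outside and scores the reduction in residual sum of squares; with n fixed, the Gaussian likelihood-ratio statistic n * log(RSS_null / RSS_alt) is monotone in this reduction, so both pick the same best candidate. All function names are hypothetical.

```python
def cylinders(spatial_sets, n_locations, n_times):
    """Space-time candidates: each spatial set crossed with every window of
    consecutive time points. Unit (s, t) is stored at flat index t * n_locations + s."""
    out = []
    for sset in spatial_sets:
        for start in range(n_times):
            for end in range(start, n_times):
                out.append(tuple(t * n_locations + s
                                 for t in range(start, end + 1) for s in sset))
    return out


def _slope_rss(xs, ys):
    """No-intercept least squares: slope b = sum(xy) / sum(x^2) and its RSS."""
    sxx = sum(x * x for x in xs)
    if sxx == 0.0:  # empty group or all-zero covariate: no slope to fit
        return 0.0, sum(y * y for y in ys)
    b = sum(x * y for x, y in zip(xs, ys)) / sxx
    return b, sum((y - b * x) ** 2 for x, y in zip(xs, ys))


def scan_coefficient_clusters(x, y, candidates):
    """Return (best_candidate, rss_reduction): the candidate whose separate
    inside/outside slopes reduce the residual sum of squares the most."""
    _, rss_null = _slope_rss(x, y)
    best, best_stat = None, float("-inf")
    for members in candidates:
        inside = set(members)
        xin = [x[i] for i in range(len(x)) if i in inside]
        yin = [y[i] for i in range(len(y)) if i in inside]
        xout = [x[i] for i in range(len(x)) if i not in inside]
        yout = [y[i] for i in range(len(y)) if i not in inside]
        rss_alt = _slope_rss(xin, yin)[1] + _slope_rss(xout, yout)[1]
        stat = rss_null - rss_alt
        if stat > best_stat:
            best, best_stat = members, stat
    return best, best_stat


if __name__ == "__main__":
    # Units 0 and 1 have slope 3; the rest have slope 1.
    x = [1.0, 2.0, 1.0, 2.0, 1.0, 2.0]
    y = [3.0, 6.0, 1.0, 2.0, 1.0, 2.0]
    print(scan_coefficient_clusters(x, y, [(0, 1), (2, 3), (0, 1, 2, 3)]))
```

A sequential approach for multiple clusters could repeat such a scan after accounting for clusters already found; significance would normally be judged against a reference distribution (for example, by simulation), which this sketch does not attempt.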

Tutorials supplement “Clustered Spatio-Temporal Varying Coefficient Regression Model” (2021) and are associated with the coefclust package, which can be found here.

Introduction to coefclust

Spatio-Temporal Analysis using coefclust

Spatial Regression Analysis of Poverty in R

Maria Kamenetsky, Guangqing Chi, Donghui Wang, Jun Zhu (Spatial Demography, 2019)

Poverty has been studied across many social science disciplines, resulting in a large body of literature. Scholars of poverty research have long recognized that the poor are not uniformly distributed across space. Understanding the spatial aspect of poverty is important because it helps us understand place-based structural inequalities. There are many spatial regression models, but learning to apply them to poverty research involves a steep learning curve. This manuscript introduces the concepts of spatial regression modeling and walks the reader through the steps of conducting poverty research using R: standard exploratory data analysis, standard linear regression, neighborhood structure and spatial weight matrices, exploratory spatial data analysis, and spatial linear regression. We also discuss the spatial heterogeneity and spatial panel aspects of poverty. We provide code for data analysis in the R environment, which readers can modify for their own analyses. We also present results in their raw format to help readers become familiar with the R environment.
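The neighborhood-structure and exploratory-spatial-analysis steps listed above can be illustrated with a small, self-contained Python sketch. It assumes rook contiguity on a regular grid of cells (real analyses would use polygon contiguity), row-standardized weights in which each neighbor of unit i receives weight 1/|N(i)|, and the global Moran's I statistic I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, where z are deviations from the mean and S0 is the sum of all weights. The function names are hypothetical; in R, the spdep package offers `nb2listw(..., style = "W")` and `moran.test()` for the same purpose.

```python
def rook_neighbors(nrows, ncols):
    """Rook contiguity on a grid: cells sharing an edge are neighbors.
    Cell (r, c) is stored at index r * ncols + c."""
    nb = {}
    for r in range(nrows):
        for c in range(ncols):
            steps = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            nb[r * ncols + c] = [rr * ncols + cc for rr, cc in steps
                                 if 0 <= rr < nrows and 0 <= cc < ncols]
    return nb


def row_standardize(nb):
    """Row-standardized weights: each neighbor of i gets 1 / |N(i)|.
    Units without neighbors are dropped (they contribute no weights)."""
    return {i: {j: 1.0 / len(js) for j in js} for i, js in nb.items() if js}


def morans_i(values, weights):
    """Global Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(w for row in weights.values() for w in row.values())
    num = sum(w * z[i] * z[j] for i, row in weights.items() for j, w in row.items())
    den = sum(zi * zi for zi in z)
    return (n / s0) * num / den


if __name__ == "__main__":
    # A 1 x 4 strip of cells with steadily increasing values.
    w = row_standardize(rook_neighbors(1, 4))
    print(morans_i([1.0, 2.0, 3.0, 4.0], w))
```

For the strip of four cells with values 1 to 4, Moran's I is 0.4, well above its expected value under no spatial autocorrelation, -1/(n - 1) = -1/3, indicating that similar values sit next to each other.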

The tutorials below supplement “Spatial Regression Analysis of Poverty in R” by Kamenetsky, Chi, Wang, and Zhu (Spatial Demography, 2019). The SpatialRegPovertyR repository for these tutorials can be found here.

Using tidycensus

Weighting and transformations

Using tmap

Predictive Enforcement of Pollution and Hazardous Waste Violations in New York State - Data Science for Social Good

Eric Potash, Jimmy Jin, Maria Kamenetsky, Dean Magee, Paul van der Boor, Rayid Ghani (2016)

The improper treatment and disposal of hazardous waste can have disastrous effects on the environment and human health. The Resource Conservation and Recovery Act (RCRA) governs hazardous waste management in the United States. To enforce its regulations, the New York State Department of Environmental Conservation (NYSDEC) inspects facilities that handle hazardous materials. However, due to resource constraints, not all facilities can be inspected each year. We worked with NYSDEC to build predictive models that use reporting, monitoring, and enforcement data to prioritize inspection resources.

Details on the project and the conference video can be found here.

Predictive Modeling for Environmental Protection: Hazardous Waste Management can be found here.