ANALYZING INCOMPLETE SPATIAL DATA IN AIR POLLUTION PREDICTION
Abstract
In air pollution studies of large cities such as Bangkok or Saigon, installing new stations to monitor dangerous pollution sources is costly. Statistical models that use data collected at functioning stations to predict air pollution levels at malfunctioning stations are therefore in high demand. We study air pollution prediction by geostatistical methods, using a real dataset collected at considerable cost in Ho Chi Minh City (HCMC). Geostatistics comprises statistical methods for modeling spatially continuous phenomena: data measured at a finite number of locations are used to build suitable models and to estimate and predict values of interest (such as air or water pollutant levels in a geographical region, or oil volumes of reservoirs under the ocean bed) at unmeasured locations. To analyze our multivariate data (SO2, PM-10 and benzene, the last two being common causes of air pollution in large cities) recorded in HCMC since 2003, we first determine suitable co-kriging models for the pollutants and then predict the pollutant concentrations at some unmeasured stations in the city.
The paper's key contributions are, first, formulating co-kriging models and computing their optimal unbiased estimators for air pollution prediction from the observed data on two pollutants; and second, proposing a computational mechanism (progressive co-kriging imputation) to deal with missing data at unmeasured monitoring sites.
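For orientation, the textbook form of an ordinary co-kriging predictor with one primary and one secondary pollutant can be sketched as follows. This is a generic illustration and not necessarily the exact formulation used in this paper; the symbols Z_1, Z_2, s_0, s_i, s'_j, the weights \lambda_i, \mu_j and the sample sizes n_1, n_2 are notation assumed here.

```latex
% Ordinary co-kriging predictor of the primary variable Z_1 at an unmeasured site s_0,
% using n_1 observations of Z_1 and n_2 observations of a secondary variable Z_2.
\hat{Z}_1(s_0) = \sum_{i=1}^{n_1} \lambda_i \, Z_1(s_i) + \sum_{j=1}^{n_2} \mu_j \, Z_2(s'_j),
\qquad
\sum_{i=1}^{n_1} \lambda_i = 1, \quad \sum_{j=1}^{n_2} \mu_j = 0 .
```

Assuming constant but unknown means for both variables, the two constraints make the predictor unbiased, and the weights are chosen to minimize the prediction variance E[(\hat{Z}_1(s_0) - Z_1(s_0))^2] subject to them.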