On the Use of Random Forests and Advances in Data Capture for Small Area Estimation

05.05.2023 11:15 – 12:15

RESEARCH CENTER FOR STATISTICS SEMINAR / ABSTRACT

In this presentation I will discuss current research on the use of random forests, and in particular mixed effects random forests, as a flexible method for small area estimation. Random forests excel in predictive performance, and their automated model selection and detection of covariate interactions make them appealing for prediction problems. Mixed effects random forests, however, appear to go against the algorithmic modelling culture (Breiman, 2001), which treats the prediction mechanism as unknown, and are more in line with the data modelling culture (e.g., Efron, 2020). I will focus on critically evaluating (a) the need to model the dependence structure in the data, for example via random effects; (b) the need to use data-driven transformations when employing random forest methods; and (c) the potential added value of using random forests for small area estimation. Small area predictors are derived using a smearing-type estimator that has previously been explored in small area and survey estimation in the context of outlier-robust estimation (Chambers and Tzavidis, 2006; Chambers et al., 2014; Welsh and Ronchetti, 1998). Empirical work focuses on the estimation of non-linear indicators of poverty and inequality for small areas. Comparisons with Empirical Best Prediction under a linear mixed model are presented using model-based simulations and real data. This work aims to inform the discussion on the use of machine learning methods in the production of official statistics.

This is joint work with Patrick Krennmair (Freie Universität Berlin, Berlin) and Timo Schmid (Otto-Friedrich-Universität Bamberg, Bamberg).

The work described above assumes access to population-level microdata. The typical source of such data, especially in low-resource settings that lack administrative data sources, is the census. Relying on access to census microdata is problematic both because censuses are infrequent (or absent altogether) and because of data access restrictions. In recent years there has been renewed interest in the use of geospatial data as auxiliary information in survey estimation. Geospatial data are publicly available, easy to harness and frequently updated. However, geospatial data are processed at the level of a geographical grid rather than at the household level, creating a spatial misalignment. If time allows, I will discuss a current methodological debate around this misalignment and initial theoretical and numerical results assessing its impact on estimation.

This second part, on methodological advances in data capture, is joint work with Luciano Perfetti-Villa (University of Southampton) and Angela Luna (University of Southampton) and is informed by joint work with the World Bank.

Location

Building: Uni Mail

ONLINE & in Uni Mail

Boulevard du Pont-d'Arve 40
1205 Geneva

Room M 3393, 3rd floor

Organized by

Faculté d'économie et de management
Research Center for Statistics

Speaker

Nikos TZAVIDIS, University of Southampton, UK

Free admission

Classification

Category: Seminar

More information

www.unige.ch/gsem/en/research/seminars/rcs/
