Linear Regressions With Combined Data
13.12.2024 11:15 – 12:15
RESEARCH INSTITUTE FOR STATISTICS AND INFORMATION SCIENCE: STATISTICS SEMINAR
(jointly with Xavier D’Haultfoeuille, CREST, and Arnaud Maurel, Duke University)
ABSTRACT
We study best linear predictions in a context where the outcome of interest and some of the covariates are observed in two different datasets that cannot be matched. Traditional approaches obtain point identification by relying, often implicitly, on exclusion restrictions. We show that without such restrictions, coefficients of interest can still be partially identified and we derive a constructive characterization of the sharp identified set. Technically speaking, our first identification result can be seen as an extension of the Cambanis-Simons-Stout inequality. We then build on this characterization to develop computationally simple and asymptotically normal estimators of the corresponding bounds. To obtain these results, we build on asymptotic results about the optimal transport cost on the real line. We study the performance of these estimators through simulations. Finally, we apply our method to racial inequality in patent applications in the United States. Even when the race of applicants is unobserved in the patent data, our method yields informative bounds without relying on the exclusion restrictions usually imposed in the literature.
Location
Building: Uni Mail
Boulevard du Pont-d'Arve 40
1205 Geneva
Room M 5220, 5th floor
Organized by
Faculté d'économie et de management
Research Institute for Statistics and Information Science
Speakers
Christophe GAILLAC, Professor, GSEM
Free admission
Classification
Category: Seminar
Keywords: best linear prediction, data combination, optimal transport, partial identification, racial innovation gap