Research Interests

Most of my research concentrates on statistical methods applicable in the social sciences, in particular methods for multivariate categorical data. Currently I am working on three topics.

Assessing model fit using incomplete data. Most data sets collected by surveys are incomplete because of nonresponse and coverage errors. In most analyses, researchers try to make up for missing data using imputation methods, but this is justified only if one has reason to believe that the observed data are informative about the missing part of the data. I am looking at ways to assess model fit without making such assumptions, by allowing the unobserved part of the data to be potentially very different from the observed part. Naturally, when such an approach is implemented, the data provide the user with much weaker evidence against a statistical model than in the approach where the unobserved part of the data is assumed to be similar to the observed part.
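The loss of evidence can be seen in a deliberately simple sketch. It is not the method developed in this research; it only assumes a single yes/no survey item, ignores sampling variability, and uses elementary worst-case bounds in which every nonrespondent may have answered either way. The function names and the counts are hypothetical.

```python
from fractions import Fraction


def worst_case_bounds(yes_count, respondents, nonrespondents):
    """Range of the population 'yes' proportion when nothing is assumed
    about how the nonrespondents would have answered."""
    if not 0 <= yes_count <= respondents:
        raise ValueError("yes_count must lie between 0 and respondents")
    total = respondents + nonrespondents
    lower = Fraction(yes_count, total)                   # every nonrespondent says 'no'
    upper = Fraction(yes_count + nonrespondents, total)  # every nonrespondent says 'yes'
    return lower, upper


def is_compatible(p0, yes_count, respondents, nonrespondents):
    """True if the hypothesised proportion p0 cannot be ruled out."""
    lower, upper = worst_case_bounds(yes_count, respondents, nonrespondents)
    return lower <= p0 <= upper


# Hypothetical survey: 80 respondents, 60 of them 'yes', 20 nonrespondents.
observed_share = Fraction(60, 80)          # what one would use if the
                                           # nonrespondents resembled respondents
bounds = worst_case_bounds(60, 80, 20)     # what survives without that assumption
```

In this sketch the respondents alone suggest a share of 3/4, while without any assumption about the nonrespondents every value between 3/5 and 4/5 remains possible, so a hypothesised share of 13/20 can no longer be contradicted by the data.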

Marginal models for categorical data. These are statistical models in which prior knowledge restricts certain marginal distributions of a contingency table. Such models are relevant in several applications, including repeated measurements and panel studies, graphical models that represent Markov-type properties, and the fusion of data sets from different sources. I have mostly worked on existence, characterization and parameterization issues related to such models. Many of the theoretical results are generalizations of results known for log-linear models and may be used to better understand and characterize Markov models associated with directed acyclic graphs and chain graphs. More generally, they may yield new insights into the smoothness properties of conditional independence models.

Treatment selection. When two treatments are compared, the preferable one, in either the causal or the evidential sense, is usually selected using the odds ratio, the relative risk or a similar measure. I am working on finding out how Simpson's paradox is related to the use of the odds ratio or the relative risk to select a treatment, and on developing more consistent ways of reading off from the data which treatment is better.
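For readers unfamiliar with the paradox, the following minimal sketch shows how the odds ratio can favour one treatment within every stratum and the other treatment once the strata are pooled. The counts are illustrative numbers chosen to exhibit the reversal and are not taken from my own work; the names `odds_ratio`, `strata` and `pooled_or` are likewise only for illustration.

```python
def odds_ratio(success_a, failure_a, success_b, failure_b):
    """Odds of success under treatment A divided by the odds under treatment B."""
    return (success_a / failure_a) / (success_b / failure_b)


# Illustrative (successes, failures) counts for treatments A and B in two strata.
strata = {
    "stratum 1": {"A": (81, 6), "B": (234, 36)},
    "stratum 2": {"A": (192, 71), "B": (55, 25)},
}

# Odds ratio within each stratum.
stratum_ors = {
    name: odds_ratio(*cells["A"], *cells["B"]) for name, cells in strata.items()
}

# Collapse the strata by adding the counts cell by cell.
pooled = {
    arm: tuple(sum(cells[arm][i] for cells in strata.values()) for i in range(2))
    for arm in ("A", "B")
}
pooled_or = odds_ratio(*pooled["A"], *pooled["B"])

for name, value in stratum_ors.items():
    print(f"{name}: odds ratio A vs B = {value:.2f}")
print(f"pooled: odds ratio A vs B = {pooled_or:.2f}")
```

With these counts the odds ratio exceeds 1 in both strata, so A looks better within each, yet it falls below 1 in the pooled table, so B looks better overall. A selection rule based on the odds ratio therefore gives conflicting answers depending on whether the stratifying variable is taken into account.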