Good morning everyone,
For my master’s thesis, I am conducting a trend analysis of alcohol consumption in Germany in relation to socioeconomic status over time. While reviewing the SOEP dataset, I noticed that the alcohol-related variables are missing from some waves (the early waves, for example, did not include any alcohol-related questions) and are measured inconsistently in others. For instance, in 2006 the individual questionnaire included the question “How much alcohol do you drink?”, whereas in 2009 a different question, “Have you ever drunk alcohol?”, appeared only in the youth questionnaire.
My main challenge now is harmonizing these scattered variables into consistent measures that can be used for longitudinal analysis.
My first question is whether there is a way to determine exactly in which waves alcohol-related questions were included.
My second question is whether you could provide any guidance or suggestions on how to harmonize these inconsistently measured variables across waves.
Thank you very much for your help.
Best regards,
Christian Carrillo Gonzalez