Date of target interview in SC2

Dear NEPS team,

I am trying to retrieve the target person’s age at the time of the interview. I computed it from the target’s birth date and the interview date variables tx8601y and tx8601m. However, I realised that these variables have a lot of missing values: they are mostly filled in waves 5, 6, and 9, whereas for the remaining waves in which the target interview happened (7, 8, 10, 11) the coverage is very poor, even among participating targets (according to var tx80220). Is there another variable I can use for this? Thanks!

Dear Patricia,

I suspect you used the birthdate variables directly from the pParent dataset without performing any further data transformation?

Since the birthdate generally doesn’t change over time, it wasn’t queried again in every wave, hence the many missing values. Before calculating a new variable "age at time of interview", you should first set the existing birthdate for each child to a single value per person, as is standard practice with time-constant information. There are several common ways to do this. One example is to calculate the mode across all non-missing values. This value is then merged with all data rows from CohortProfile using a one-to-many merge (m:1 when seen from the CohortProfile side), and the age is calculated. An example syntax (e.g., in Stata) might look like this:

use pParent.dta, clear
bysort ID_t: egen birthmonth_mode = mode(p70012m), minmode
bysort ID_t: egen birthyear_mode = mode(p70012y), minmode
keep ID_t birth*_mode  // keep only variables of interest
duplicates drop      // drop unnecessary duplicate information
isid ID_t            // make sure to have only 1 line per ID_t
tempfile birthdates
save `birthdates'

use CohortProfile.dta, clear
merge m:1 ID_t using `birthdates'

* --> go on with the age calculation...
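
For illustration, the age calculation itself could be sketched as follows. This is only a minimal example on invented toy data that stands in for the merged CohortProfile; it assumes the interview year and month are in tx8601y and tx8601m and the birth date in the birthyear_mode and birthmonth_mode variables created above. Because only year and month are known, the age is precise to the month at best.

```stata
* Toy data standing in for the merged CohortProfile (all values are invented)
clear
input ID_t wave tx8601y tx8601m birthyear_mode birthmonth_mode
1 5 2015 2 2005 3
1 6 2016 1 2005 3
2 5 2015 4 2005 8
2 7 .    . 2005 8
end

* ym() returns a monthly date (months since 1960m1);
* it is missing whenever year or month is missing
gen int_ym   = ym(tx8601y, tx8601m)
gen birth_ym = ym(birthyear_mode, birthmonth_mode)
format int_ym birth_ym %tm

* Age as a difference in calendar months, and in (fractional) years
gen age_months = int_ym - birth_ym
gen age_years  = age_months / 12

list ID_t wave int_ym birth_ym age_months age_years, sepby(ID_t)
```

Rows without an interview date (here ID_t 2 in wave 7) simply end up with a missing age, which makes the coverage problem per wave easy to inspect, e.g. with `tab wave if missing(age_months)`.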

Note that you will obviously have to adapt the "use dataset" commands to your file structure. In this example, the minmode option selects the smallest value whenever the mode is not unique, but there are of course other ways to handle the situation where some target individuals have multiple, non-identical birth dates.
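
Before settling on a rule for such cases, it can help to see how many children are affected at all. The following is a rough sketch on invented toy data that stands in for pParent; the flag name birthyear_conflict is made up for this example, and the same check would have to be repeated for the birth month.

```stata
* Toy data standing in for pParent (all values are invented)
clear
input ID_t wave p70012y p70012m
1 1 2005 3
1 2 .    .
2 1 2005 8
2 2 2006 8
end

* egen min()/max() ignore missing values, so the two differ only if
* at least two distinct non-missing birth years were reported for a child
bysort ID_t: egen by_min = min(p70012y)
bysort ID_t: egen by_max = max(p70012y)
gen byte birthyear_conflict = (by_min != by_max)

tab birthyear_conflict
```

Children flagged this way could then be handled separately, for example by inspecting them manually instead of relying on the mode.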

Good luck, and don’t hesitate to reach out again if you have further questions…

Kind regards,

Benno Schönberger

Hey Benno,

Sorry, I was not clear enough in my post. I did not have a problem with the birthday, which, as you say, does not vary; my problem is with the interview date variables (tx8601y, tx8601m). As I mentioned in my previous post, they have a lot of missing values in waves 7, 8, 10, and 11 (they seem OK in waves 5, 6, and 9), even for people whose data were collected (according to var tx80220). Hence, I thought maybe there was a differently named variable for the interview date of each wave, although I could not find one. I could roughly proxy it with the year label that each wave carries in Stata, but I was hoping for a neater solution.

Thanks for your help. Best,

Patricia

Hi Patricia,

The lack of interview dates at the student level in these waves is explained by the fact that mainly paper questionnaires were used in them. It was not recorded when the children filled them out, even though this would have been possible in many cases. In all waves where competency tests were also administered, the children received the paper questionnaires on the exact same day. So, for waves 9 and 11, you can use the information from tx8610*. For the other waves, I would assume that the date of the parent interview can also serve as a good proxy. Alternatively, you can, of course, set all remaining missing dates to the average of the other children or even to a specific month. It naturally depends on the type of analysis you want to perform with the data, but in some cases it is certainly better to accept the imprecision of potentially inaccurately calculated ages than to lose entire cases due to missing data. But that’s your decision, of course…
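
The order of fallbacks described above could be sketched roughly like this. It is only an illustration on invented toy data: test_y/test_m and par_y/par_m are placeholder names for the test date (tx8610*) and the parent interview date, whose actual variable names need to be looked up in the codebook, and date_source is a made-up helper that records where each date came from.

```stata
* Toy data, one row per child and wave (all values are invented);
* test_* and par_* are placeholders, not actual NEPS variable names
clear
input ID_t wave tx8601y tx8601m test_y test_m par_y par_m
1 5 2015 2 .    . 2015 1
2 9 .    . 2019 3 2019 5
3 7 .    . .    . 2017 2
4 7 .    . .    . .    .
end

gen int_ym  = ym(tx8601y, tx8601m)
gen test_ym = ym(test_y, test_m)
gen par_ym  = ym(par_y, par_m)

* Record which source supplies the date, in order of preference
gen byte date_source = 1 if !missing(int_ym)
replace date_source = 2 if missing(date_source) & !missing(test_ym)
replace date_source = 3 if missing(date_source) & !missing(par_ym)

gen date_ym = int_ym
replace date_ym = test_ym if date_source == 2
replace date_ym = par_ym  if date_source == 3

* Last resort: wave-specific mean of the dates found so far,
* rounded to a whole month
bysort wave: egen wave_mean_ym = mean(date_ym)
replace date_source = 4 if missing(date_ym) & !missing(wave_mean_ym)
replace date_ym = round(wave_mean_ym) if date_source == 4

format date_ym %tm
label define src 1 "target interview" 2 "competence test" 3 "parent interview" 4 "wave mean"
label values date_source src
list ID_t wave date_ym date_source
```

Keeping date_source in the data makes it possible to rerun an analysis with and without the imputed dates and check how sensitive the results are.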

Kind regards,

Benno