Linking SC3 and SC4 at the school level


Since some students in SC4 attend the same schools as the SC3 students due to the sampling frame, I was wondering whether the two starting cohorts can be linked via their shared school in these cases. In other words: Is it possible to identify the schools at which both the fifth graders (SC3) and the ninth graders (SC4) participated in the NEPS? The result would be a dataset in which each distinct school contains students from both SC3 and SC4 whenever possible.

If this is feasible, I would also like to know if the pInstitution datasets from SC3 and SC4 are identical for these schools (e.g. because the principals of these schools were surveyed only once).

Thank you very much and best regards
Adrian Greiner

Hi Adrian,

It is possible to merge the pInstitution datasets of SC3 and SC4: the ID_i values are the same, because the two were originally a single dataset shared by both cohorts. So I can answer "yes" to both of your questions.

However, you have to adjust the wave indicator because the wave numbers do not line up chronologically across cohorts; for example, wave 3 in SC4 does not match wave 3 in SC3.

Best regards

Hi Dietmar,

thank you for the quick response. I currently only want to use the first two waves of SC4, so the different timing of the survey waves will hopefully not be an issue.

Would it be possible to identify the schools that contain both cohorts without using pInstitution? I'm asking because the pTarget datasets also contain the variable ID_i.

Best regards

Hello Adrian!

You can use data from CohortProfile as well.

Just keep ID_i from the CohortProfile of SC3 and of SC4, drop the duplicates, and merge the two ID_i lists. After that, you can merge other datasets 1:m on ID_i.
Here is a simple example:

best regards

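* SC4: build a list of distinct, valid school IDs
use ID_i using "C:\Users\bainb201\Desktop\Data\SC4_D_13-0-0\Stata14\SC4_CohortProfile_D_13-0-0.dta", clear
keep if ID_i > 0 & !missing(ID_i)
duplicates drop
tempfile sc4
save "`sc4'", replace

* SC3: same steps, then combine both school lists
use ID_i using "C:\Users\bainb201\Desktop\Data\SC3_D_12-1-0\Stata14\SC3_CohortProfile_D_12-1-0.dta", clear
keep if ID_i > 0 & !missing(ID_i)
duplicates drop
merge 1:1 ID_i using "`sc4'", generate(in_cohort)
* use a separate value label: the default label _merge is shared with the _merge variable of any later merge
label define in_cohort 1 "SC3" 2 "SC4" 3 "SC3+SC4"
label values in_cohort in_cohort

* attach student-level data 1:m; schools that occur only in SC4 remain unmatched here
merge 1:m ID_i using "C:\Users\bainb201\Desktop\Data\SC3_D_12-1-0\Stata14\SC3_CohortProfile_D_12-1-0.dta"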
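
If the goal is a single student-level file that covers both cohorts, one option is to stack the two files and flag shared schools directly. The following is only a minimal sketch on made-up toy data: the IDs are invented, the variable names cohort and both are hypothetical, and the two input blocks stand in for the real SC3 and SC4 files.

```stata
* Toy stand-in for the SC4 student-level data (all IDs are invented)
clear
input ID_t ID_i
4001 10
4002 10
4003 20
4004 -54
end
generate byte cohort = 4
tempfile sc4
save "`sc4'"

* Toy stand-in for the SC3 student-level data
clear
input ID_t ID_i
3001 10
3002 30
3003 30
end
generate byte cohort = 3

* Stack both cohorts and keep valid school IDs only
append using "`sc4'"
keep if ID_i > 0 & !missing(ID_i)

* A school contains both cohorts if its lowest and highest cohort values differ
bysort ID_i (cohort): generate byte both = cohort[1] != cohort[_N]
list, sepby(ID_i)
```

In this toy example only school 10 is flagged, because it is the only school with students from both cohorts. With the real data, keep if both would restrict the file to schools shared by SC3 and SC4.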

Hello Dietmar,

thanks for the help and the code in particular!

Best regards