Hello everyone,
the question of how to identify study dropouts in SC5 has been discussed for several years now. It is repeatedly pointed out that the implementation is complex and depends, among other things, on the definition of a dropout and/or the specific research question (most recently here).
Attached is a (simplified) syntax that we use to create a categorical variable for the final status of the students in SC5. The syntax is commented in detail. Comments are welcome.
Best regards
Daniel Klein und Lars Müller
/*
Final study state in wave 15 (NEPS SC 5)
Date: 02aug2021
Authors: Daniel Klein, Lars Müller
Description:
This do-file creates a three-level indicator variable for
students' final study state at the time of the interview of
wave 15.
We differentiate between:
0 := graduate
1 := dropout
2 := studying
. := unknown (truly missing)
where
graduate := successfully completed at least one study episode
as defined by NEPS (BA, Diploma, etc.; also includes
first State Examination, etc.)
dropout := all study episodes have ended, none has been
successfully completed
additionally: self-reported dropouts in the last CAWI
if none of the study episodes has been successfully
completed and no study episode is confirmed ongoing
in wave 15 (see below)
studying := study episode confirmed to be ongoing in wave 15,
none successfully completed before
unknown := no (recent) information in wave 15, last observed
study episode ongoing, no study episode successfully
completed
The created indicator includes information from both CATI and CAWI.
The created indicator does NOT contain information on
- when an event (graduation, dropout) occurs
- how many study episodes have been completed
- the highest obtained degree
- interruptions of study episodes
- changes of field of studies or pursued degrees, etc.
- field of subject, type of higher education, etc. of any episodes
Note: Some respondents obtained a higher education degree (long) before
the first NEPS interview in 2009/2010 and, thus, are arguably not
part of the initially defined target population. This file does
NOT identify those individuals.
Technically, this file requires some community-contributed commands. We
facilitate the installation of those commands by installing the one
command -rqrs- near the top of the file. -rqrs- then automatically
installs the remaining dependencies.
*/
version 16
// install rqrs from SSC
/**
UNCOMMENT TO INSTALL rqrs FROM SSC
**/
*ssc install rqrs
// required community-contributed commands
rqrs neps nepstools , from("http://nocrypt.neps-data.de/stata")
rqrs numdate
/**
INSERT YOUR FILEPATH TO THE DATA HERE
**/
local NEPS_DATA ".../SC5_D_15-0-0/Stata"
/**/
neps set ///
directory "`NEPS_DATA'" ///
version 15.0.0 ///
level D ///
study SC5
// we start with basic information about the spells
neps : use Biography , clear
foreach x in start end {
numdate monthly sp_`x' = `x'm `x'y , pattern("MY")
}
// drop episodes with unknown dates
drop if mi(sp_start, sp_end)
// whether episode is ongoing
assert inlist(splast, 1, 2)
generate byte sp_last = (splast==1)
assert !mi(sp_start, sp_end, sp_last)
// now bring-in the spell data
neps : merge 1:m ID_t splink using "spVocTrain" , keep(matched) nogenerate
// keep only harmonized and recommended episodes
keep if ( !subspell & (tx20100==1) )
// need to fetch some information from CATI wave 1
preserve
neps : use pTargetCATI , clear
keep if wave == 1
isid ID_t
generate ts15201_cati = 9 if tg01003_g1 == 1
replace ts15201_cati = 10 if tg01003_g1 == 2
/*
note: we cannot distinguish between
University of applied sciences and
colleges of public administration,
dual higher education, etc.
*/
keep ID_t ts15201_cati
tempfile cati_wave1
save "`cati_wave1'"
restore
merge m:1 ID_t using "`cati_wave1'" , keep(master matched) nogenerate
// keep study episodes
replace ts15201 = ts15201_cati if ts15201 == -28
keep if inrange(ts15201, 6, 10)
/*
one person refuses to answer; we drop that one
*/
// mark completed study episodes
generate byte sp_complete = (ts15218 == 1) if inlist(ts15218, 1, 2)
bysort ID_t : replace sp_complete = sum(sp_complete==1) // missing := 0 !
bysort ID_t (sp_complete) : replace sp_complete = (sp_complete[_N]>0)
/*
note: because we treated missing values as 0, we cannot be certain
that sp_complete == 0 indeed means non-completion;
at this point, we can only be certain that those with
sp_complete == 1 have indeed graduated
*/
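/*
optional cross-check (a minimal sketch, not part of the original syntax):
the two -replace- statements above should be equivalent to flagging
respondents with at least one study episode for which ts15218 == 1
chk_complete is a hypothetical helper variable; it is dropped again
*/
bysort ID_t : egen byte chk_complete = max(ts15218 == 1)
assert sp_complete == chk_complete
drop chk_complete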
// now keep the last study episode(s), i.e., the one(s) with the latest end date
bysort ID_t (sp_end) : generate last_sp_end = sp_end[_N]
keep if sp_end == last_sp_end
// flag respondents for whom at least one of the remaining episode(s) is ongoing
bysort ID_t (sp_last) : replace sp_last = sum(sp_last)
bysort ID_t (sp_last) : replace sp_last = (sp_last[_N]>0)
// now reduce to one observation per ID
bysort ID_t : keep if _n == _N
keep ID_t sp_last sp_complete last_sp_end
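/*
optional check (a minimal sketch, not part of the original syntax):
the data should now hold exactly one observation per respondent,
which the 1:1 merges further down rely on
*/
isid ID_t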
// add external exams
preserve
neps : use ID_t ts15304 using spVocExtExam , clear
generate sp_complete_ex = inrange(ts15304, 10, 21) | inlist(ts15304, 29, 30)
bysort ID_t : replace sp_complete_ex = sum(sp_complete_ex==1)
bysort ID_t (sp_complete_ex) : replace sp_complete_ex = (sp_complete_ex[_N]>0)
bysort ID_t : keep if _n == _N
keep ID_t sp_complete_ex
tempfile ex
save "`ex'"
restore
merge 1:1 ID_t using "`ex'" , keep(master matched) nogenerate
replace sp_complete = sp_complete_ex if (sp_complete != 1) & (sp_complete_ex == 1)
drop sp_complete_ex
// add information from CAWI
/*
most of the NEPS generated datasets (e.g., Education, Study States)
appear to ignore information in CAWI
we argue that students who report a successful graduation
in CAWI cannot legitimately be considered dropouts
we also record any self-reported dropout in CAWI
although less certain, we use the self-reported dropouts to
replace missing values in the CATI at the final step
*/
preserve
neps : use pTargetCAWI , clear
generate byte sp_complete_cawi = (tg51004 == 2)
replace sp_complete_cawi = 1 if !sp_complete_cawi & inrange(tg50007, 1, 3)
replace sp_complete_cawi = 1 if !sp_complete_cawi & (tg51002 == 1)
bysort ID_t (wave) : replace sp_complete_cawi = sum(sp_complete_cawi==1)
bysort ID_t (sp_complete_cawi) : replace sp_complete_cawi = (sp_complete_cawi[_N]>0)
bysort ID_t (wave) : keep if _n == _N
generate sp_dropout_cawi = (tg51000 == 2) | (tg51004 == 3)
keep ID_t sp_complete_cawi sp_dropout_cawi
tempfile cawi
save "`cawi'"
restore
merge 1:1 ID_t using "`cawi'" , keep(master matched) nogenerate
replace sp_complete = sp_complete_cawi if (sp_complete != 1) & (sp_complete_cawi == 1)
drop sp_complete_cawi
// studying (still)
/*
people are considered still studying if
- they participate in the latest wave (wave 15)
- AND (at least one) study episode is ongoing in wave 15
- AND the latest wave is CATI (as is the case for wave 15)
note: this part needs to be adjusted if
- the latest wave > 15
- OR the latest wave is CAWI
*/
preserve
neps : use CohortProfile , clear
keep if wave == 15
generate participate_wave15 = (tx80220 == 1) // missing := 0 !
keep ID_t participate_wave15
tempfile wave15
save "`wave15'"
restore
merge 1:1 ID_t using "`wave15'" , keep(matched) nogenerate
// define the final study state
label define final_status ///
0 "graduate" ///
1 "dropout" ///
2 "still studying"
/*
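note: the order below is important; each -replace- only fills values
that are still missing, so a successful graduation takes precedence
over every other state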
*/
generate final_status = 0 if (sp_complete == 1)
replace final_status = 1 if mi(final_status) & (!sp_last)
replace final_status = 2 if mi(final_status) & sp_last & participate_wave15
label values final_status final_status
// add self-reported dropouts from CAWI
replace final_status = 1 if mi(final_status) & (sp_dropout_cawi == 1)
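/*
optional diagnostic (a minimal sketch, not part of the original syntax):
dropouts whose last study episode is still recorded as ongoing can only
have been classified through the CAWI self-report above; counting them
shows how much the dropout category depends on that less certain source
*/
count if (final_status == 1) & sp_last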
keep ID_t final_status
// frequencies
tabulate final_status , missing