SC5: Study dropout, syntax proposal

Hello everyone,

the question of how to identify study dropouts in SC5 has been discussed for a few years now. It is repeatedly pointed out that the implementation is complex and depends, among other things, on the definition of a study dropout and/or the exact research question (most recently here).

Attached is a (simplified) syntax that we use to create a categorical variable for the final status of the students in SC5. The syntax is extensively commented. Comments are welcome.

Best regards
Daniel Klein and Lars Müller

/*
    Final study state in wave 15 (NEPS SC 5)
    
    
    Date:       02aug2021
    
    Authors:    Daniel Klein, Lars Müller
    
    
    Description:
    
        This do-file creates a three-level indicator variable for 
        students' final study state at the time of the interview of 
        wave 15. 
        
        We differentiate between:
        
            0 := graduate
            1 := dropout
            2 := studying
            . := unknown (truly missing)
            
        where
        
            graduate := successfully completed at least one study episode 
                        as defined by NEPS (BA, Diploma, etc.; also includes 
                        first State Examination, etc.)
                        
            dropout  := all study episodes have ended, none has been 
                        successfully completed
                        
                        additionally: self-reported dropouts in the last CAWI 
                        if none of the study episodes has been successfully 
                        completed and no study episode is confirmed ongoing 
                        in wave 15 (see below)
                        
            studying := study episode confirmed to be ongoing in wave 15,
                        none successfully completed before
                        
            unknown  := no (recent) information in wave 15, last observed 
                        study episode ongoing, no study episode successfully
                        completed
                        
        
        The created indicator includes information from both CATI and CAWI.
        
        The created indicator does NOT contain information on 
        
            - when an event (graduation, dropout) occurs
            - how many study episodes have been completed
            - the highest obtained degree
            - interruptions of study episodes
            - changes of field of studies or pursued degrees, etc.
            - field of study, type of higher education, etc. of any episodes
        
        Note: Some respondents obtained a higher education degree (long) before 
              the first NEPS interview in 2009/2010 and, thus, are arguably not 
              part of the initially defined target population. This file does 
              NOT identify those individuals.
        
        
        Technically, this file requires some community-contributed commands. We 
        facilitate the installation of those commands by installing the one 
        command -rqrs- near the top of the file. -rqrs- then automatically 
        installs the remaining dependencies. 
    */

version 16


// install rqrs from SSC
    /**
        UNCOMMENT TO INSTALL rqrs FROM SSC
    **/
*ssc install rqrs


// required community-contributed commands
rqrs neps nepstools , from("http://nocrypt.neps-data.de/stata")
rqrs numdate    


    /**
        INSERT YOUR FILEPATH TO THE DATA HERE
    **/
local NEPS_DATA ".../SC5_D_15-0-0/Stata"
    /**/


neps set                    ///
    directory "`NEPS_DATA'" ///
    version   15.0.0        ///
    level     D             ///
    study     SC5


    
// we start with basic information about the spells
neps : use Biography , clear

foreach x in start end {
    numdate monthly sp_`x' = `x'm `x'y , pattern("MY")
}

    // drop episodes with unknown dates
drop if mi(sp_start, sp_end)

    // whether episode is ongoing
assert inlist(splast, 1, 2)
generate byte sp_last = (splast==1)

assert !mi(sp_start, sp_end, sp_last)


// now bring in the spell data
neps : merge 1:m ID_t splink using "spVocTrain" , keep(matched) nogenerate

    // keep only harmonized and recommended episodes
keep if ( !subspell & (tx20100==1) )

        // need to fetch some information from CATI wave 1
        preserve
        neps : use pTargetCATI , clear
        keep if wave == 1
        isid ID_t
        generate ts15201_cati =  9 if tg01003_g1 == 1
        replace  ts15201_cati = 10 if tg01003_g1 == 2
            /*
                note: we cannot distinguish between
                      University of applied sciences and 
                      colleges of public administration, 
                      dual higher education, etc.
                      
            */
        keep ID_t ts15201_cati
        tempfile cati_wave1
        save "`cati_wave1'"
        restore
        merge m:1 ID_t using "`cati_wave1'" , keep(master matched) nogenerate 
        
    // keep study episodes
replace ts15201 = ts15201_cati if ts15201 == -28
keep if inrange(ts15201, 6, 10)
    /*
        one person refuses to answer; we drop that one
    */

    // mark completed study episodes
generate byte sp_complete = (ts15218 == 1) if inlist(ts15218, 1, 2)
bysort ID_t : replace sp_complete = sum(sp_complete==1) // missing := 0 !
bysort ID_t (sp_complete) : replace sp_complete = (sp_complete[_N]>0)
    /*
        note: because we treated missing values as 0, we cannot be certain 
               that sp_complete == 0 indeed means non-completion
              at this point, we can only be certain that those with 
               sp_complete == 1 have indeed graduated
    */

    // now keep the last study episode(s), i.e., the one(s) with the latest end date
bysort ID_t (sp_end) : generate last_sp_end = sp_end[_N]
keep if sp_end == last_sp_end

    // mark episode(s) that last/are ongoing
bysort ID_t (sp_last) : replace sp_last = sum(sp_last)
bysort ID_t (sp_last) : replace sp_last = (sp_last[_N]>0)
    
    // now reduce to one observation per ID
bysort ID_t : keep if _n == _N
keep ID_t sp_last sp_complete last_sp_end

    
// add external exams
preserve
neps : use ID_t ts15304 using spVocExtExam , clear
generate sp_complete_ex = inrange(ts15304, 10, 21) | inlist(ts15304, 29, 30)
bysort ID_t : replace sp_complete_ex = sum(sp_complete_ex==1)
bysort ID_t (sp_complete_ex) : replace sp_complete_ex = (sp_complete_ex[_N]>0)
bysort ID_t : keep if _n == _N
keep ID_t sp_complete_ex
tempfile ex
save "`ex'"
restore

merge 1:1 ID_t using "`ex'" , keep(master matched) nogenerate

replace sp_complete = sp_complete_ex if (sp_complete != 1) & (sp_complete_ex == 1)
drop sp_complete_ex


// add information from CAWI
    /*
        most of the NEPS generated datasets (e.g., Education, Study States)
        appear to ignore information in CAWI
        
        we argue that students who report a successful graduation
         in CAWI cannot legitimately be considered dropouts
        we also record any self-reported dropout in CAWI
        although less certain, we use the self-reported dropouts to 
         replace missing values in the CATI at the final step
    */
preserve
neps : use pTargetCAWI , clear
generate byte sp_complete_cawi = (tg51004 == 2)
replace sp_complete_cawi = 1 if !sp_complete_cawi & inrange(tg50007, 1, 3)
replace sp_complete_cawi = 1 if !sp_complete_cawi & (tg51002 == 1)
bysort ID_t (wave) : replace sp_complete_cawi = sum(sp_complete_cawi==1)
bysort ID_t (sp_complete_cawi) : replace sp_complete_cawi = (sp_complete_cawi[_N]>0)
bysort ID_t (wave) : keep if _n == _N
generate sp_dropout_cawi = (tg51000 == 2) | (tg51004 == 3)
keep ID_t sp_complete_cawi sp_dropout_cawi
tempfile cawi
save "`cawi'"
restore

merge 1:1 ID_t using "`cawi'" , keep(master matched) nogenerate

replace sp_complete = sp_complete_cawi if (sp_complete != 1) & (sp_complete_cawi == 1)
drop sp_complete_cawi


// studying (still)
    /*
        people are considered still studying if 
            - they participate in the latest wave (wave 15)
            - AND (at least one) study episode is ongoing in wave 15
            - AND the latest wave is CATI (as is the case for wave 15)
            
        note: this part needs to be adjusted if
            - the latest wave > 15
            - OR the latest wave is CAWI
    */
preserve
neps : use CohortProfile , clear
keep if wave == 15
generate participate_wave15 = (tx80220 == 1) // missing := 0 !
keep ID_t participate_wave15
tempfile wave15
save "`wave15'"
restore

merge 1:1 ID_t using "`wave15'" , keep(matched) nogenerate


// define the final study state
label define final_status ///
    0 "graduate"          ///
    1 "dropout"           ///
    2 "still studying"
	
    /*
        note: the order below is important
    */
generate final_status = 0 if (sp_complete == 1)
replace  final_status = 1 if mi(final_status) & (!sp_last)
replace  final_status = 2 if mi(final_status) & sp_last & participate_wave15

label values final_status final_status

    
    // add self-reported dropouts from CAWI
    replace final_status = 1 if mi(final_status) & (sp_dropout_cawi == 1)
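
    // optional sanity checks
    /*
        note: not part of the original syntax; a minimal sketch that 
              only restates the rules above as assertions, assuming 
              the variables created so far are still in memory
    */
isid ID_t
assert inlist(final_status, 0, 1, 2) | mi(final_status)
assert final_status == 0 if sp_complete == 1
assert sp_last & participate_wave15 if final_status == 2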

keep ID_t final_status


// frequencies
tabulate final_status , missing
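

// illustration: the "any within person" idiom
    /*
        note: not part of the original syntax; a toy sketch with 
              made-up values (the variable flag is hypothetical) that 
              shows how the two -bysort- lines used for sp_complete 
              collapse episode-level 0/1/missing values into one 
              person-level 0/1 flag
              -preserve- and -restore- keep the result created above 
              in memory
    */
preserve
clear
input ID_t flag
1 0
1 1
1 .
2 0
2 .
end
bysort ID_t : replace flag = sum(flag==1) // missing := 0 !
bysort ID_t (flag) : replace flag = (flag[_N]>0)
assert flag == 1 if ID_t == 1 // at least one episode flagged
assert flag == 0 if ID_t == 2 // none flagged, missing counted as 0
restore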