I am a new user of NEPS data and am still working to understand it fully. I apologize if my questions seem trivial.
I would like to use and merge different SC3 data files in Stata: CohortProfile, pTarget, pParent, xTargetCompetencies.
Here is what I did so far:
Loading the CohortProfile data and recoding the missing values using nepsmiss _all
Loading the pTarget data, recoding the missing values, and keeping only the variables I am interested in
Trying to merge the two datasets, with CohortProfile as the master file, using this command: merge 1:1 ID_t wave using pTarget, nogen
However, when running this piece of code, I get a message saying that variables ID_t and wave do not uniquely identify the observations.
The same occurs when I run isid ID_t wave on the pTarget data... Since the documentation and the merging matrix both state that ID_t and wave are sufficient for identification, I do not understand... Any recommendations or thoughts about what I might be doing wrong?
And a second question:
If I get it correctly, the merging is going to be 1:1 for CohortProfile, pTarget, and pParent, but xTargetCompetencies is in wide format. So should I...
First convert it to long format (as sometimes recommended in the tutorials), then merge it with the other files using a 1:1 merge?
Merge it with the other files directly using an m:1 merge?
Fortunately, your first problem is a minor one.
Unfortunately, ID_t and wave do not uniquely identify all observations in the pTarget dataset, and this is more or less exactly what the merge command reports as an error. This is due to the fact that there are a few duplicates in the pTarget data, which have to be dealt with in some way.
To cope with this, you will find a variable tx20100 in the data, which is a recommendation as to which duplicate observation to keep and which one to delete.
When preparing the pTarget data, just add keep if tx20100 == 1
to your code and everything will work nicely.
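Put together, the preparation and merge steps described above might look like the following sketch. The file names are placeholders (your local SC3 files will be named differently), and nepsmiss comes from the nepstools package; the duplicates report and isid lines are only optional checks added for illustration.

```stata
* Sketch only: "pTarget.dta", "CohortProfile.dta" and "pTarget_clean.dta"
* are placeholder file names, not the actual NEPS file names.

* Prepare pTarget
use "pTarget.dta", clear
nepsmiss _all
* Optional check: how many ID_t-wave combinations occur more than once?
duplicates report ID_t wave
* Keep the recommended observation among duplicates
keep if tx20100 == 1
* Optional check: stops with an error if ID_t and wave are still not unique
isid ID_t wave
save "pTarget_clean.dta", replace

* Merge into CohortProfile (master file)
use "CohortProfile.dta", clear
nepsmiss _all
merge 1:1 ID_t wave using "pTarget_clean.dta", nogen
```

If you reduce pTarget to a subset of variables, make sure tx20100 is still in the data at the point where the keep if line runs.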
Regarding your second question on how to deal with merging long and wide datasets: it is more or less up to you which method you prefer, and it depends on which competency measures you plan to use, so it cannot be answered that easily.
Just a few thoughts on this:
Obviously you can perform an m:1 or 1:m merge with the wide competency data and the long pTarget data. This will result in a long dataset in which all competency measures used are duplicated across all occurrences of each ID_t. After that you can easily drop the duplicated competency measures for all waves in which the respective measure was not part of the testing.
You can also keep only the competency variables of a particular wave, say wave 5 (you will have to read the documentation carefully to determine which test in which wave a given competency variable stems from). Keep only those variables and ID_t, generate a new wave variable suitable for merging, for instance with gen wave = 5, and then perform a 1:1 merge between this reduced competency dataset and pTarget using ID_t and wave as key identifiers. If you plan to use competency variables from different waves, you can repeat this process for each wave.
If you plan to use raw single competency items, it can also be helpful to rearrange the wide competency data into a wave-like long structure using comp2long, which is also part of the nepstools package (installation using net install nepstools, from("http://nocrypt.neps-data.de/stata")).
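To make the first two options above more concrete, here is a rough sketch of both. Note that comp_w5 is a made-up variable name standing in for a competency measure that you have confirmed (from the documentation) was tested in wave 5, and the file names are placeholders. The sketch also assumes that pTarget has already been cleaned of duplicates and that the competency variable is numeric.

```stata
* Sketch only: comp_w5 is a hypothetical variable name; the .dta file
* names are placeholders for your local files.

* Option A: m:1 merge of the wide competency data onto long pTarget
use "pTarget_clean.dta", clear
merge m:1 ID_t using "xTargetCompetencies.dta", ///
    keepusing(comp_w5) keep(master match) nogen
* The value is now repeated in every wave of a person; set it to
* missing in the waves where it was not tested (assumes a numeric variable)
replace comp_w5 = . if wave != 5

* Option B: reduce the competency data to one wave, then merge 1:1
use "xTargetCompetencies.dta", clear
keep ID_t comp_w5
* Artificial wave variable so that ID_t and wave can serve as merge keys
gen wave = 5
save "comp_w5.dta", replace

use "pTarget_clean.dta", clear
merge 1:1 ID_t wave using "comp_w5.dta", nogen
```

Both variants should leave comp_w5 filled only in the wave 5 rows. Option B keeps using-only rows by default (tested persons without a pTarget row in wave 5), so you may want to decide explicitly whether to keep them.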
You see, there are a lot of possible ways to do this. If you have chosen a particular one and are still facing problems, don't hesitate to ask.
Thank you very much for your clear and detailed response; it has been very helpful! I successfully merged the CohortProfile and pTarget datasets following your instructions.
As for my second question, I'll take your comments into careful consideration and get back to it after reflecting on which approach might best suit my goals.