Number of children in NEPS SC6

Hello everyone!

I have a question about the number of children in the NEPS SC6 data. I found that, at least for respondents born after 1969, the total number of children reported by female respondents is substantially higher (roughly 50%) than the number reported by male respondents. For first births, female respondents from the 70s and 80s cohorts reported about 1700 children up to the last wave, while male respondents reported only around 1200. A similar gap shows up for second births. Do you know why this is? I derived these numbers from the aggregate file children.

Thank you very much!

Best regards,
Chen

To add more information: this is the number of biological children reported by male and female respondents in each cohort. There are sizeable gaps for the 60s, 70s and 80s cohorts, but not for the others. For the 80s cohort, not everyone may have finished their reproductive years, so right censoring is likely an issue, particularly for men, who may have children later. But the gap is also large for the 60s and 70s cohorts.

Gender of parent   cohort      n
[m] männlich           40   1640
[m] männlich           50   3962
[m] männlich           60   3990
[m] männlich           70   1814
[m] männlich           80    651
[w] weiblich           40   1556
[w] weiblich           50   4058
[w] weiblich           60   4834
[w] weiblich           70   2553
[w] weiblich           80    926

Dear Chen,
unfortunately, as you did not give any details on your calculation, I have no idea how you arrived at these numbers or whether there may be a calculation error. I can't even come close to reproducing them. Furthermore, I think it is a rather unusual, if not completely wrong, approach to compare absolute numbers between different cohorts without any form of normalization or weighting. The absolute numbers of targets within the cohorts differ greatly.
In the SC6, for example, there are almost three times as many women in the 60 cohort as in the 40 cohort. Logically, 3x as many women also report around 3x as many births.
In this respect, in my opinion, comparing the overall number of reported biological children makes no sense at all. If you instead calculate, for example, how many children those born in a cohort have on average, you will find that the reported numbers for men and women differ only in the second decimal place, which I personally find very plausible.
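A minimal sketch of this normalization in base R, using the counts for women in the 40 and 60 cohorts as they appear in the result table further down this thread. The data frame and variable names are only illustrative; the sketch is not meant as the official way to compute these figures.

```r
# Counts for female respondents, copied from the result table in this thread
women <- data.frame(
  cohort = c(40, 60),
  n_kid  = c(1556, 4834),  # reported biological children
  people = c(927, 2813)    # respondents in the cohort
)

# Normalize: average number of children per respondent in each cohort
women$kid_per_person <- women$n_kid / women$people

# How much larger is the 60 cohort than the 40 cohort?
size_ratio  <- women$people[2] / women$people[1]  # respondents
birth_ratio <- women$n_kid[2] / women$n_kid[1]    # reported children

print(round(women$kid_per_person, 2))
print(round(c(size_ratio = size_ratio, birth_ratio = birth_ratio), 1))
```

With these counts the 60 cohort has about three times as many women and about three times as many reported children, while the per-person averages stay close to each other (about 1.68 and 1.72).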
Kind regards,
Benno Schönberger

Dear Mr. Schönberger,
Thank you for your message and suggestions.

I’m interested in gender differences in the transition to parenthood. Maybe this was not clear in my previous messages: I don’t compare people between cohorts, I compare men and women within each cohort.

Based on your explanation, I checked the number of parents, children, share of people who became parents, average number of children per parent, and the average number of children per person:
Here is my code in R:

# haven provides read_stata(); dplyr provides %>%, group_by(), summarise(), left_join()
library(haven)
library(dplyr)

basic <- read_stata("P:/NEPS_project 1/Data/Data_sc6/Stata14/SC6_Basics_D_14-0-0.dta")

child <- read_stata("P:/NEPS_project 1/Data/Data_sc6/Stata14/SC6_Children_D_14-0-0.dta")

# Keep info only on people's year/month of birth, their ID and their gender
# (selected by position: the first four columns of the Basics file)
basic_merge <- subset(basic, select = c(1:4))

# Merge basic info with data on children
child_parent <- merge(x = child, y = basic_merge, by = "ID_t", all.x = TRUE)

# Keep only the biological children
child_parent <- subset(child_parent, tx27100 == 1)

# Calculate the cohort (decade of birth, e.g. 1974 -> 70)
child_parent$cohort <- 10 * floor((child_parent$t70000y - 1900) / 10)

# Calculate the total number of children per cohort per gender
# (each row corresponds to one child, so the number of rows is the number of children)
df1 <- child_parent %>%
  group_by(t700001, cohort) %>%
  summarise(n_kid = n())

# Calculate the number of persons who became a parent by cohort and gender
df2 <- child_parent %>%
  group_by(t700001, cohort) %>%
  summarise(parent = n_distinct(ID_t))

# Merge the two tables to calculate the number of children per parent by cohort and sex
df3 <- left_join(df1, df2, by = c("cohort", "t700001"))

df3$kid_per_parent <- df3$n_kid / df3$parent

# Check the number of respondents by gender per cohort
basic$cohort <- 10 * floor((basic$t70000y - 1900) / 10)

df5 <- basic %>%
  group_by(t700001, cohort) %>%
  summarise(people = n_distinct(ID_t))

df6 <- left_join(df3, df5, by = c("cohort", "t700001"))

# Share of respondents who became parents, and children per respondent
df6$share_parent <- df6$parent / df6$people

df6$kid_per_person <- df6$n_kid / df6$people

################################
Then I got this:

t700001            cohort  n_kid  parent  kid_per_parent  people  share_parent  kid_per_person
1 [[m] männlich]       40   1640     806            2.03    1017         0.793           1.61
1 [[m] männlich]       50   3962    1927            2.06    2360         0.817           1.68
1 [[m] männlich]       60   3990    1961            2.03    2581         0.76            1.55
1 [[m] männlich]       70   1814     942            1.93    1414         0.666           1.28
1 [[m] männlich]       80    651     388            1.68    1117         0.347           0.583
2 [[w] weiblich]       40   1556     775            2.01     927         0.836           1.68
2 [[w] weiblich]       50   4058    1975            2.05    2357         0.838           1.72
2 [[w] weiblich]       60   4834    2362            2.05    2813         0.84            1.72
2 [[w] weiblich]       70   2553    1238            2.06    1557         0.795           1.64
2 [[w] weiblich]       80    926     507            1.83     997         0.509           0.929

So the explanation for the gender gap in the transition to parenthood for the 70s and 80s cohorts (my first post here) is that, in the 80s cohort, there are more men in the sample (1117 vs. 997), many of whom have not become fathers yet, and that, in the 70s cohort, the share of fathers is smaller than the share of mothers (0.666 vs. 0.795).
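One way to see which component drives the gap is to split the female/male ratio of total reported children into three factors: n_kid = people × share_parent × kid_per_parent, so the ratio of totals is the product of the three ratios. The following base R sketch does this for the 70s and 80s cohorts with the counts from the table above. The names d, m, w and gap are only illustrative, and this is a rough descriptive decomposition, not a weighted or censoring-adjusted estimate.

```r
# Counts for the 70s and 80s cohorts, copied from the table above
d <- data.frame(
  sex    = c("m", "m", "w", "w"),
  cohort = c(70, 80, 70, 80),
  n_kid  = c(1814, 651, 2553, 926),   # reported biological children
  parent = c(942, 388, 1238, 507),    # respondents with at least one child
  people = c(1414, 1117, 1557, 997)   # all respondents in the cohort
)
d$share_parent   <- d$parent / d$people
d$kid_per_parent <- d$n_kid / d$parent

m <- d[d$sex == "m", ]
w <- d[d$sex == "w", ]  # rows are in the same cohort order as m

# Female/male ratios: total = people * share_parent * kid_per_parent
gap <- data.frame(
  cohort         = m$cohort,
  total          = w$n_kid / m$n_kid,
  people         = w$people / m$people,
  share_parent   = w$share_parent / m$share_parent,
  kid_per_parent = w$kid_per_parent / m$kid_per_parent
)
print(round(gap, 2))
```

With these counts, the ratio of totals is about 1.4 in both cohorts. In the 80s cohort the sample-size ratio is below 1 (fewer women than men), so the gap in totals there comes mainly from the share of respondents who are already parents; in the 70s cohort the sample size, the share of parents and the children per parent all point in the same direction.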

Please correct me if you find any mistakes.

Best regards,
Chen

Hi Chen,
since I’m not that familiar with R code, I can’t guarantee that I fully understand everything, but as far as I remember your results are almost identical to my own calculations. So if you now analyze, instead of the absolute number of children, how many children a person (male or female) in a cohort has on average (kid_per_parent), you will see that the differences you mentioned in your first post are resolved. This all looks plausible to me.

Another note:
To increase the readability of your posts, you can use the predefined formats, for example the </> button in the format bar when including code:
generate example = var * varb

Kind regards,
Benno