Log-linear models Lecture 2: raking

Maarten Buis
office F532
maarten.buis@uni.kn
office hours by appointment

-------------------------------------------------------------------------------

index >>

-------------------------------------------------------------------------------
                               Table of contents

------------------------------------------------------------------------------- Slide table of contents -------------------------------------------------------------------------------

    Standardizing a table

       Standardize the table for homogamy

       Making all the row totals 100

       Making all the column totals 100

       Repeat

       Iterative Proportional Fitting

       Can all tables be standardized? (ancillary)

       Try it yourself

    Standardization to compare tables

       comparing tables

       Try it yourself

    keep the margins as observed, but change the pattern

       Independence revisited

    Standardization to known margins in the population

       Margins in our sample and margins in the population

       Computing post-stratification weights for the cohort 1940-1945
       

------------------------------------------------------------------------------- Supporting materials -------------------------------------------------------------------------------

Datasets
   homogamy_allbus.dta   ALLBUS 1980 - 2016; on slide slide1.smcl
   place.dta             German Life History Study I; on slide slide8.smcl
   margins1940.dta       Volks- und Berufszählung 1970 and 1987; on slide slide13.smcl

Do files
   slide1ex1.do    initial look at the homogamy data; on slide slide1.smcl
   slide2ex1.do    load the data in Mata; on slide slide2.smcl
   slide2ex2.do    adjust the rows to sum to 100; on slide slide2.smcl
   slide3ex1.do    adjust the columns to sum to 100; on slide slide3.smcl
   slide4ex1.do    repeat adjusting rows and columns; on slide slide4.smcl
   slide5ex1.do    write that up in a loop; on slide slide5.smcl
   slide5ex2.do    odds ratio in raw data; on slide slide5.smcl
   slide5ex3.do    odds ratio in adjusted data; on slide slide5.smcl
   slide6ex4.do    do this standardization with stdtable; on slide slide6.smcl
   slide9ex1.do    load data by cohort in Mata; on slide slide9.smcl
   slide9ex2.do    store the desired margins; on slide slide9.smcl
   slide9ex3.do    standardize the 1940 table to 1960 margins; on slide slide9.smcl
   slide9ex4.do    do this with stdtable; on slide slide9.smcl
   slide11ex1.do   use IPF to find the counts under independence; on slide slide11.smcl
   slide13ex1.do   load data and census margins in Stata; on slide slide13.smcl
   slide13ex2.do   load data and census margins in Mata; on slide slide13.smcl
   slide13ex3.do   adjust table to fit census margins; on slide slide13.smcl
   slide13ex4.do   use these adjusted counts to create weights; on slide slide13.smcl

Solutions to "Try it yourself"
   raking_sol1.do   solution; on slide slide8.smcl
   raking_sol2.do   solution; on slide slide10.smcl

-------------------------------------------------------------------------------

<< index >>

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Standardizing a table
-------------------------------------------------------------------------------

Standardize the table for homogamy

Consider the table of the education of the male and female partners in a marriage or stable partnership.

. clear

. use homogamy_allbus.dta
(ALLBUS 1980 - 2016)

. tab meduc feduc, matcell(data)

       male |                      female education
  education |       low  lower voc  medium vo  higher vo  universit |     Total
------------+------------------------------------------------------+----------
        low |     1,378        600        314         87         55 |     2,434
 lower voc. |     3,864      7,665      2,528        407        232 |    14,696
medium voc. |       815      1,847      4,802        809        576 |     8,849
higher voc. |       276        530      1,122        898        596 |     3,422
 university |       387        729      1,828      1,115      2,429 |     6,488
------------+------------------------------------------------------+----------
      Total |     6,720     11,371     10,594      3,316      3,888 |    35,889

    It is hard to interpret this table as is, because of the differences in
    the margins

For example, it appears that a female with medium vocational education marrying a male with lower vocational education is more common than vice versa.

This is contrary to the notion that partnerships where the men are better educated are more common.

But the observed pattern may be due to the fact that lower vocational education is more common among men than women, and medium vocational education is more common among women than men.

Can't we change the table such that the pattern of association remains constant, but all the margins are the same, e.g. 100?

That would make it easier to see patterns.

Last week we solved this problem by looking at how independence was defined in a chi-square test:

We imagined what the table would look like if the margins remained as we observed them, but there was no other association between the row and column variables.

This week we turn this around: We keep the association in the table as observed, but change the margins.

-------------------------------------------------------------------------------

<< index >>

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Standardizing a table
-------------------------------------------------------------------------------

Making all the row totals 100

Notice that I added the option matcell(data) to the tab command. This leaves behind the table as a Stata matrix named data, which in turn can be read into Mata.

. mata
------------------------------------------------- mata (type end to exit) -----
: data = st_matrix("data")

: data
          1      2      3      4      5
    +------------------------------------+
  1 |  1378    600    314     87     55  |
  2 |  3864   7665   2528    407    232  |
  3 |   815   1847   4802    809    576  |
  4 |   276    530   1122    898    596  |
  5 |   387    729   1828   1115   2429  |
    +------------------------------------+

: end
-------------------------------------------------------------------------------

    If we divide all cell entries by the rowsum, then the new rowsum will be
    1.

Multiply the new cell entries by 100, and the rowsum will be 100.

. mata
------------------------------------------------- mata (type end to exit) -----
: muhat = data

: muhat = muhat:/rowsum(muhat):*100

: muhat
                 1             2             3             4             5
    +-----------------------------------------------------------------------+
  1 |  56.61462613   24.65078061   12.90057518   3.574363188   2.259654889  |
  2 |  26.29286881   52.15704954   17.20195972   2.769461078    1.57866086  |
  3 |  9.210080235   20.87241496   54.26601876   9.142275963    6.50921008  |
  4 |  8.065458796    15.4880187   32.78784337   26.24196376   17.41671537  |
  5 |    5.9648582   11.23612824   28.17509248   17.18557337   37.43834772  |
    +-----------------------------------------------------------------------+

: rowsum(muhat)
          1
    +-------+
  1 |  100  |
  2 |  100  |
  3 |  100  |
  4 |  100  |
  5 |  100  |
    +-------+

: colsum(muhat)
                 1             2             3             4             5
    +-----------------------------------------------------------------------+
  1 |  106.1478922    124.404392   145.3314895   58.91363736   65.20258892  |
    +-----------------------------------------------------------------------+

: end
-------------------------------------------------------------------------------
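The row adjustment can also be sketched outside Stata. A minimal Python/NumPy illustration (NumPy is not part of the course materials; the counts are copied from the table above):

```python
import numpy as np

# observed counts, copied from the meduc-by-feduc table above
data = np.array([
    [1378,  600,  314,   87,   55],
    [3864, 7665, 2528,  407,  232],
    [ 815, 1847, 4802,  809,  576],
    [ 276,  530, 1122,  898,  596],
    [ 387,  729, 1828, 1115, 2429],
], dtype=float)

# divide each cell by its row total and multiply by 100:
# every row now sums to 100, but the columns do not (yet)
muhat = data / data.sum(axis=1, keepdims=True) * 100

print(muhat.sum(axis=1))  # all 100
print(muhat.sum(axis=0))  # not yet 100
```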

-------------------------------------------------------------------------------

<< index >>

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Standardizing a table
-------------------------------------------------------------------------------

Making all the column totals 100

The row totals are as we want them, but the column totals are not. What if we repeat this process for the columns?

. mata
------------------------------------------------- mata (type end to exit) -----
: muhat = muhat:/colsum(muhat):*100

: muhat
                 1             2             3             4             5
    +-----------------------------------------------------------------------+
  1 |  53.33561032   19.81504045   8.876655176   6.067123587   3.465590748  |
  2 |  24.77003384   41.92540848   11.83636098   4.700882855    2.42116285  |
  3 |  8.676649198   16.77787626   37.33947746   15.51809797   9.983054643  |
  4 |  7.598322144   12.44973626   22.56072891   44.54310571   26.71169299  |
  5 |    5.6193845   9.031938545   19.38677748   29.17078988   57.41849877  |
    +-----------------------------------------------------------------------+

: rowsum(muhat)
                1
    +---------------+
  1 |  91.56002028  |
  2 |  85.65384901  |
  3 |  88.29515553  |
  4 |   113.863586  |
  5 |  120.6273892  |
    +---------------+

: colsum(muhat)
         1     2     3     4     5
    +-------------------------------+
  1 |  100   100   100   100   100  |
    +-------------------------------+

: end
-------------------------------------------------------------------------------

    Now the column totals are as we want them, but now the row totals are a
    bit off.

However, the row totals are better than in the original table, so maybe we need to repeat this process a couple of times?

-------------------------------------------------------------------------------

<< index >>

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Standardizing a table
-------------------------------------------------------------------------------

Repeat

. mata
------------------------------------------------- mata (type end to exit) -----
: muhat = muhat:/rowsum(muhat):*100

: muhat
                 1             2             3             4             5
    +-----------------------------------------------------------------------+
  1 |  58.25207351   21.64158591   9.694903024   6.626389518   3.785048034  |
  2 |  28.91876328   48.94748919    13.8188314   5.488233056   2.826683071  |
  3 |  9.826868922   19.00203489   42.28938409   17.57525413   11.30645796  |
  4 |  6.673180084   10.93390494   19.81382258   39.11971094   23.45938146  |
  5 |   4.65846483   7.487469145   16.07162155   24.18255927    47.5998852  |
    +-----------------------------------------------------------------------+

: rowsum(muhat)
          1
    +-------+
  1 |  100  |
  2 |  100  |
  3 |  100  |
  4 |  100  |
  5 |  100  |
    +-------+

: colsum(muhat)
                 1             2             3             4             5
    +-----------------------------------------------------------------------+
  1 |  108.3293506   108.0124841   101.6885626   92.99214691   88.97745573  |
    +-----------------------------------------------------------------------+

: muhat = muhat:/colsum(muhat):*100

: muhat
                 1             2             3             4             5
    +-----------------------------------------------------------------------+
  1 |  53.77312166   20.03618942   9.533916866   7.125751731    4.25394051  |
  2 |  26.69522444   45.31651096   13.58936643   5.901824227   3.176853112  |
  3 |   9.07128942   17.59244318   41.58715886   18.89971865   12.70710414  |
  4 |  6.160085005   10.12281593   19.48480936   42.06775759   26.36553413  |
  5 |  4.300279474   6.932040503   15.80474848   26.00494781    53.4965681  |
    +-----------------------------------------------------------------------+

: rowsum(muhat)
                1
    +---------------+
  1 |  94.72292019  |
  2 |  94.67977917  |
  3 |  99.85771425  |
  4 |   104.201002  |
  5 |  106.5385844  |
    +---------------+

: colsum(muhat)
         1     2     3     4     5
    +-------------------------------+
  1 |  100   100   100   100   100  |
    +-------------------------------+

: end
-------------------------------------------------------------------------------

    Notice that each time we get a bit closer to our goal

-------------------------------------------------------------------------------

<< index >>

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Standardizing a table
-------------------------------------------------------------------------------

Iterative Proportional Fitting

The algorithm is called Iterative Proportional Fitting (IPF)

We can automate this repetition with a loop, and continue looping until the table no longer changes.

. mata
------------------------------------------------- mata (type end to exit) -----
: muhat = data

: muhat2 = 0:*data

: i = 1

: while(i<30 & mreldif(muhat2,muhat)>1e-8) {
>     muhat2 = muhat
>     muhat = muhat:/rowsum(muhat):*100
>     muhat = muhat:/colsum(muhat):*100
>     printf("{txt}iteration {res}%f {txt}relative change {res}%f\n", i, mreldif(muhat2,muhat))
>     i = i + 1
> }
iteration 1 relative change 196.018454
iteration 2 relative change .264736172
iteration 3 relative change .085845379
iteration 4 relative change .032427941
iteration 5 relative change .01302549
iteration 6 relative change .005266606
iteration 7 relative change .002120014
iteration 8 relative change .000852175
iteration 9 relative change .000342384
iteration 10 relative change .00013754
iteration 11 relative change .000055248
iteration 12 relative change .000022192
iteration 13 relative change 8.9140e-06
iteration 14 relative change 3.5805e-06
iteration 15 relative change 1.4382e-06
iteration 16 relative change 5.7769e-07
iteration 17 relative change 2.3204e-07
iteration 18 relative change 9.3206e-08
iteration 19 relative change 3.7438e-08
iteration 20 relative change 1.5038e-08
iteration 21 relative change 6.0404e-09

: muhat
                 1             2             3             4             5
    +-----------------------------------------------------------------------+
  1 |   55.2594258   21.01486714   10.56463793    8.17117594   4.989893004  |
  2 |  27.29231899   47.28612524   14.98125546   6.732957746   3.707342411  |
  3 |  8.441187075   16.70825366   41.72879959   19.62468059   13.49707912  |
  4 |  5.378131668    9.02020621   18.34353466   40.98329233   26.27483526  |
  5 |  3.628936464   5.970547749   14.38177236   24.48789339   51.53085021  |
    +-----------------------------------------------------------------------+

: data
          1      2      3      4      5
    +------------------------------------+
  1 |  1378    600    314     87     55  |
  2 |  3864   7665   2528    407    232  |
  3 |   815   1847   4802    809    576  |
  4 |   276    530   1122    898    596  |
  5 |   387    729   1828   1115   2429  |
    +------------------------------------+

: end
-------------------------------------------------------------------------------

    In the raw data the odds that a man with lower vocational education
    marries a woman with lower vocational rather than low education are 4.6
    times the odds for a man with low education:

. di (7665 / 3864 ) / (600 / 1378 )
4.5558877

    In our new table we get the exact same odds ratio:

. di (47.28612524 / 27.29231899) / (21.01486714 / 55.2594258)
4.5558877
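The loop and the odds-ratio check can be sketched in Python/NumPy as well (an illustration only; `ipf` is a hypothetical helper name, not part of any package):

```python
import numpy as np

# observed counts, copied from the meduc-by-feduc table above
data = np.array([
    [1378,  600,  314,   87,   55],
    [3864, 7665, 2528,  407,  232],
    [ 815, 1847, 4802,  809,  576],
    [ 276,  530, 1122,  898,  596],
    [ 387,  729, 1828, 1115, 2429],
], dtype=float)

def ipf(table, row_targets, col_targets, tol=1e-8, maxiter=30):
    """Iterative proportional fitting: alternately rescale rows and
    columns until the margins stop changing."""
    muhat = table.astype(float).copy()
    for _ in range(maxiter):
        old = muhat.copy()
        muhat = muhat / muhat.sum(axis=1, keepdims=True) * row_targets
        muhat = muhat / muhat.sum(axis=0, keepdims=True) * col_targets
        if np.max(np.abs(muhat - old)) < tol:
            break
    return muhat

muhat = ipf(data, np.full((5, 1), 100.0), np.full(5, 100.0))

def oddsratio(t):
    # odds ratio for the 2x2 block of 'low' and 'lower voc.'
    return (t[1, 1] / t[1, 0]) / (t[0, 1] / t[0, 0])

# IPF only multiplies rows and columns by constants,
# so every odds ratio is left unchanged
print(oddsratio(data), oddsratio(muhat))  # both 4.5558877...
```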

    Standardizing a table like this is nice in a teaching setting, because
    you can see what is going on. In a real analysis you would use the
    stdtable package.

-------------------------------------------------------------------------------

<< index >>

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Standardizing a table
-------------------------------------------------------------------------------

Try it yourself

Use IPF to standardize the place of residence table (place.dta) to have margins of all 100.

raking_sol1.do

-------------------------------------------------------------------------------

<< index >>

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Standardization to compare tables
-------------------------------------------------------------------------------

comparing tables

Say we want to compare the tables for the cohorts born in 1940-1945 (i.e. were 20 in 1960-1965) and born in 1960-1965 (i.e. were 20 in 1980-1985).

What if we standardize the table from 1940-1945 to have the margins of the table from 1960-1965?

This makes it easier to compare across cohorts, while still having margins that are more realistic than all 100s.

We start by preparing the data and loading it into Mata.

. use homogamy_allbus.dta, clear
(ALLBUS 1980 - 2016)

. gen coh = cond(inrange(byr, 1960, 1965), 1, ///
>             cond(inrange(byr, 1940, 1945), 0, .))
(36,808 missing values generated)

. label define coh 0 "1940-1945" 1 "1960-1965"

. label value coh coh

. label var coh "resp. birth cohort"

. tab meduc feduc if coh==0, matcell(data1940)

       male |                      female education
  education |       low  lower voc  medium vo  higher vo  universit |     Total
------------+------------------------------------------------------+----------
        low |       146         81         36          9          6 |       278
 lower voc. |       493      1,432        384         48         31 |     2,388
medium voc. |        99        306        376         52         54 |       887
higher voc. |        29         83        119         62         45 |       338
 university |        75        157        312        113        298 |       955
------------+------------------------------------------------------+----------
      Total |       842      2,059      1,227        284        434 |     4,846

. tab meduc feduc if coh==1, matcell(data1960)

       male |                      female education
  education |       low  lower voc  medium vo  higher vo  universit |     Total
------------+------------------------------------------------------+----------
        low |       120         44         56         23          6 |       249
 lower voc. |       181        477        365         73         19 |     1,115
medium voc. |        94        168      1,031        167        117 |     1,577
higher voc. |        44         47        217        185        123 |       616
 university |        30         34        223        174        357 |       818
------------+------------------------------------------------------+----------
      Total |       469        770      1,892        622        622 |     4,375

. mata
------------------------------------------------- mata (type end to exit) -----
: data1940 = st_matrix("data1940")

: data1960 = st_matrix("data1960")

: end
-------------------------------------------------------------------------------

    We extract the desired row and column totals

. mata
------------------------------------------------- mata (type end to exit) -----
: col = colsum(data1960)

: row = rowsum(data1960)

: end
-------------------------------------------------------------------------------

    We can now apply these row and column totals instead of 100.

We can see that a large part of the apparent difference between the cohorts is due to the change in distribution of education between the cohorts.

. mata
------------------------------------------------- mata (type end to exit) -----
: muhat = data1940

: muhat2 = 0:*data1940

: i = 1

: while(i<30 & mreldif(muhat2,muhat)>1e-8) {
>     muhat2 = muhat
>     muhat = muhat:/rowsum(muhat):*row
>     muhat = muhat:/colsum(muhat):*col
>     printf("{txt}iteration {res}%f {txt}relative change {res}%f\n", i, mreldif(muhat2,muhat))
>     i = i + 1
> }
iteration 1 relative change 3.35926798
iteration 2 relative change .412621342
iteration 3 relative change .096720838
iteration 4 relative change .022674838
iteration 5 relative change .005356008
iteration 6 relative change .001267613
iteration 7 relative change .000300115
iteration 8 relative change .000071057
iteration 9 relative change .000016824
iteration 10 relative change 3.9832e-06
iteration 11 relative change 9.4306e-07
iteration 12 relative change 2.2328e-07
iteration 13 relative change 5.2864e-08
iteration 14 relative change 1.2516e-08
iteration 15 relative change 2.9633e-09

: data1940
         1      2      3      4      5
    +------------------------------------+
  1 |  146     81     36      9      6   |
  2 |  493   1432    384     48     31   |
  3 |   99    306    376     52     54   |
  4 |   29     83    119     62     45   |
  5 |   75    157    312    113    298   |
    +------------------------------------+

: muhat
                 1             2             3             4             5
    +-----------------------------------------------------------------------+
  1 |  107.9401832   41.79372109    63.5105883    23.4738278   12.28167952  |
  2 |  206.3512558   418.3106688   383.5347872   70.87817787   35.92510995  |
  3 |  101.1983989    218.300937   917.1484403   187.5222911   152.8299328  |
  4 |  25.01236059    49.9609286   244.9159022   188.6511611   107.4596476  |
  5 |  28.49780156   41.63374457   282.8902821    151.474542   313.5036302  |
    +-----------------------------------------------------------------------+

: data1960
         1      2      3      4      5
    +------------------------------------+
  1 |  120     44     56     23      6   |
  2 |  181    477    365     73     19   |
  3 |   94    168   1031    167    117   |
  4 |   44     47    217    185    123   |
  5 |   30     34    223    174    357   |
    +------------------------------------+

: end
-------------------------------------------------------------------------------

    Alternatively, we can use stdtable

. stdtable meduc feduc, by(coh,baseline(1))

-----------------------------------------------------------------------------
resp. birth |
 cohort and |
       male |                      female education
  education |    low  lower voc.  medium voc.  higher voc.  university |  Total
------------+---------------------------------------------------------+-------
1940-1945   |
        low |    108        41.8         63.5         23.5        12.3 |    249
 lower voc. |    206         418          384         70.9        35.9 |   1115
medium voc. |    101         218          917          188         153 |   1577
higher voc. |     25          50          245          189         107 |    616
 university |   28.5        41.6          283          151         314 |    818
            |
      Total |    469         770         1892          622         622 |   4375
------------+---------------------------------------------------------+-------
1960-1965   |
        low |    120          44           56           23           6 |    249
 lower voc. |    181         477          365           73          19 |   1115
medium voc. |     94         168         1031          167         117 |   1577
higher voc. |     44          47          217          185         123 |    616
 university |     30          34          223          174         357 |    818
            |
      Total |    469         770         1892          622         622 |   4375
-----------------------------------------------------------------------------

-------------------------------------------------------------------------------

<< index >>

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Standardization to compare tables
-------------------------------------------------------------------------------

Try it yourself

Standardize the tables in place.dta such that the margins for all cohorts correspond to the margins of the 1950 cohort.

raking_sol2.do

-------------------------------------------------------------------------------

<< index >>

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
keep the margins as observed, but change the pattern
-------------------------------------------------------------------------------

Independence revisited

We have until now kept the pattern as observed in the data, and changed the margins. Can we not turn that around: keep the margins as observed in the data, and change the pattern?

An interesting baseline pattern would be independence. We would start with a table that satisfies independence, and change the values such that the margins correspond to the observed margins. A table that satisfies independence is a table with all 1s.

. mata:
------------------------------------------------- mata (type end to exit) -----
: row = rowsum(data)

: col = colsum(data)

: muhat = J(5,5,1)

: muhat2 = 0:*muhat

: muhat
[symmetric]
        1   2   3   4   5
    +---------------------+
  1 |  1                  |
  2 |  1   1              |
  3 |  1   1   1          |
  4 |  1   1   1   1      |
  5 |  1   1   1   1   1  |
    +---------------------+

: i = 1

: while(i<30 & mreldif(muhat2,muhat)>1e-8) {
>     muhat2 = muhat
>     muhat = muhat:/rowsum(muhat):*row
>     muhat = muhat:/colsum(muhat):*col
>     printf("{txt}iteration {res}%f {txt}relative change {res}%f\n", i, mreldif(muhat2,muhat))
>     i = i + 1
> }
iteration 1 relative change .999570562
iteration 2 relative change 3.3466e-16

: muhat
                 1             2             3             4             5
    +-----------------------------------------------------------------------+
  1 |  455.7519017    771.183761   718.4874474    224.891861   263.6850288  |
  2 |  2751.737858   4656.251665   4338.081975   1357.851598   1592.076904  |
  3 |  1656.922177   2803.699713   2612.118086   817.6121932   958.6478308  |
  4 |   640.748976   1084.219733   1010.133133   316.1791078   370.7190504  |
  5 |  1214.839087   2055.645128   1915.179359     599.46524   702.8711862  |
    +-----------------------------------------------------------------------+

: end
-------------------------------------------------------------------------------

. tab meduc feduc , exp nofreq

       male |                      female education
  education |       low  lower voc  medium vo  higher vo  universit |     Total
------------+------------------------------------------------------+----------
        low |     455.8      771.2      718.5      224.9      263.7 |   2,434.0
 lower voc. |   2,751.7    4,656.3    4,338.1    1,357.9    1,592.1 |  14,696.0
medium voc. |   1,656.9    2,803.7    2,612.1      817.6      958.6 |   8,849.0
higher voc. |     640.7    1,084.2    1,010.1      316.2      370.7 |   3,422.0
 university |   1,214.8    2,055.6    1,915.2      599.5      702.9 |   6,488.0
------------+------------------------------------------------------+----------
      Total |   6,720.0   11,371.0   10,594.0    3,316.0    3,888.0 |  35,889.0

    Notice that the second iteration added nothing, so the IPF converged in
    one iteration. Also, the estimated counts correspond to the counts we
    computed last week.

This is not a coincidence:

In the first step of the first iteration, each cell gets rowtotal/5

At the beginning of the second step of the first iteration, the column totals are 1/5th of N

So we get: (rowtotal/5)/(N/5)*coltotal = (rowtotal*coltotal)/N
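This little derivation can be checked numerically; a NumPy sketch (the counts are copied from the homogamy table used above):

```python
import numpy as np

# observed counts from the full homogamy table
data = np.array([
    [1378,  600,  314,   87,   55],
    [3864, 7665, 2528,  407,  232],
    [ 815, 1847, 4802,  809,  576],
    [ 276,  530, 1122,  898,  596],
    [ 387,  729, 1828, 1115, 2429],
], dtype=float)
row = data.sum(axis=1, keepdims=True)
col = data.sum(axis=0)
n = data.sum()

# one IPF pass starting from a table of all ones
muhat = np.ones((5, 5))
muhat = muhat / muhat.sum(axis=1, keepdims=True) * row  # each cell: rowtotal/5
muhat = muhat / muhat.sum(axis=0, keepdims=True) * col  # column totals here are n/5

# this already equals the chi-square expected counts,
# so the second iteration has nothing left to do
print(np.allclose(muhat, row * col / n))  # True
```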

-------------------------------------------------------------------------------

<< index >>

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Standardization to known margins in the population
-------------------------------------------------------------------------------

Margins in our sample and margins in the population

Samples often deviate from the population because of

   the way the sample was drawn
   the way the data was collected
   some people being harder to contact
   some people being less likely to participate

What if we had the marginal distributions of our variables from the population?

Can't we use the same trick to standardize our table to those population margins?

This is a classic application of raking, and is often used when computing (post-stratification) weights.

-------------------------------------------------------------------------------

<< index >>

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Standardization to known margins in the population
-------------------------------------------------------------------------------

Computing post-stratification weights for the cohort 1940-1945

Let's get the observed data again and take a look at the population margins.

. use homogamy_allbus, clear
(ALLBUS 1980 - 2016)

. tab meduc feduc if inrange(byr,1940,1945), matcell(data1940)

       male |                      female education
  education |       low  lower voc  medium vo  higher vo  universit |     Total
------------+------------------------------------------------------+----------
        low |       146         81         36          9          6 |       278
 lower voc. |       493      1,432        384         48         31 |     2,388
medium voc. |        99        306        376         52         54 |       887
higher voc. |        29         83        119         62         45 |       338
 university |        75        157        312        113        298 |       955
------------+------------------------------------------------------+----------
      Total |       842      2,059      1,227        284        434 |     4,846

. use margins1940, clear

. list

     +-----------------------------------------+
     | female                   ed          p  |
     |-----------------------------------------|
  1. |   male                basic  .14092565  |
  2. |   male    vocational, lower  .53389831  |
  3. |   male   vocational, middle  .12625549  |
  4. |   male   vocational, higher  .03011818  |
  5. |   male           university  .16880237  |
     |-----------------------------------------|
  6. | female                basic  .33888268  |
  7. | female    vocational, lower  .40107984  |
  8. | female   vocational, middle  .16634283  |
  9. | female   vocational, higher  .02550338  |
 10. | female           university  .06819128  |
     +-----------------------------------------+

    Let's get the data and the desired margins into Mata and compare them
    with the observed margins.

Notice the ' at the end of the line starting with col. This turns the column vector col into a row vector.

. mata
------------------------------------------------- mata (type end to exit) -----
: data1940 = st_matrix("data1940")

: row = st_data((1,5),3)

: col = st_data((6,10),3)'

: n = sum(data1940)

: row = row:*n

: col = col:*n

: row
                1
    +---------------+
  1 |  682.9257144  |
  2 |  2587.271186  |
  3 |   611.834118  |
  4 |  145.9527007  |
  5 |  818.0162805  |
    +---------------+

: rowsum(data1940)
          1
    +--------+
  1 |   278  |
  2 |  2388  |
  3 |   887  |
  4 |   338  |
  5 |   955  |
    +--------+

: col
                 1             2             3             4             5
    +-----------------------------------------------------------------------+
  1 |  1642.225466   1943.632883   806.0973572   123.5893736   330.4549203  |
    +-----------------------------------------------------------------------+

: colsum(data1940)
         1      2      3      4      5
    +------------------------------------+
  1 |  842   2059   1227    284    434   |
    +------------------------------------+

: end
-------------------------------------------------------------------------------

    Now we can apply the same trick as before.

. mata
------------------------------------------------- mata (type end to exit) -----
: muhat = data1940

: muhat2 = 0:*data1940

: i = 1

: while(i<30 & mreldif(muhat2,muhat)>1e-8) {
>     muhat2 = muhat
>     muhat = muhat:/rowsum(muhat):*row
>     muhat = muhat:/colsum(muhat):*col
>     printf("{txt}iteration {res}%f {txt}relative change {res}%f\n", i, mreldif(muhat2,muhat))
>     i = i + 1
> }
iteration 1 relative change 3.15360994
iteration 2 relative change .345717331
iteration 3 relative change .06376476
iteration 4 relative change .016214917
iteration 5 relative change .004553063
iteration 6 relative change .001271767
iteration 7 relative change .000354736
iteration 8 relative change .000098909
iteration 9 relative change .000027576
iteration 10 relative change 7.6877e-06
iteration 11 relative change 2.1432e-06
iteration 12 relative change 5.9750e-07
iteration 13 relative change 1.6657e-07
iteration 14 relative change 4.6438e-08
iteration 15 relative change 1.2946e-08
iteration 16 relative change 3.6092e-09

: data1940
         1      2      3      4      5
    +------------------------------------+
  1 |  146     81     36      9      6   |
  2 |  493   1432    384     48     31   |
  3 |   99    306    376     52     54   |
  4 |   29     83    119     62     45   |
  5 |   75    157    312    113    298   |
    +------------------------------------+

: muhat
                 1             2             3             4             5
    +-----------------------------------------------------------------------+
  1 |  474.5208782   143.3508106   47.95215625   8.169263599   8.932605961  |
  2 |  875.0072936   1383.950116   279.3181446   23.79271058   25.20292251  |
  3 |  131.9710065   222.1147631   205.4160385   19.35907528   32.97323446  |
  4 |  26.30712883   40.99833415   44.24106626   15.70742786   18.69874349  |
  5 |  134.4191584   153.2188596   229.1699516   56.56089627   244.6474139  |
    +-----------------------------------------------------------------------+

: end
-------------------------------------------------------------------------------

    So if the margins in our data corresponded with the margins in the
    population then we would expect to find 475 couples with both low
    education, but in our data we only found 146 such couples.

So a single couple with both low education in our data stands for 475/146=3.3 observations in the table with the population margins.

This 3.3 is our post-stratification weight

. use homogamy_allbus
(ALLBUS 1980 - 2016)

. mata
------------------------------------------------- mata (type end to exit) -----
: muhat:/data1940
                 1             2             3             4             5
    +-----------------------------------------------------------------------+
  1 |  3.250143001   1.769763094    1.33200434   .9076959554    1.48876766  |
  2 |  1.774862664   .9664456116   .7273910014   .4956814704   .8129975004  |
  3 |   1.33304047    .725865239   .5463192514   .3722899092    .610615453  |
  4 |  .9071423736   .4939558331   .3717736661   .2533456107   .4155276332  |
  5 |  1.792255446   .9759163033   .7345190755   .5005389051   .8209644761  |
    +-----------------------------------------------------------------------+

: st_matrix("weights", muhat:/data1940)

: end
-------------------------------------------------------------------------------

. gen weight = weights[meduc,feduc] if inrange(byr,1940,1945) (43,748 missing values generated)

. tab meduc feduc if inrange(byr, 1940, 1945) [iweight=weight]

       male |               female education
  education |        low   lower voc   medium vo   higher vo |      Total
------------+------------------------------------------------+-----------
        low |  474.52089   143.35081   47.952155   8.1692635 |  682.92572
 lower voc. | 875.007285    1,383.95   279.31815  23.7927103 | 2,587.2712
medium voc. |  131.97101   222.11476   205.41604   19.359075 |  611.83412
higher voc. |   26.30713   40.998333   44.241066   15.707428 |   145.9527
 university |  134.41916   153.21886   229.16995   56.560894 |  818.01627
------------+------------------------------------------------+-----------
      Total |  1,642.225   1,943.633   806.09735   123.58937 |      4,846

            |    female
       male | education
  education | universit |      Total
------------+-----------+-----------
        low | 8.9326057 |  682.92572
 lower voc. | 25.202923 | 2,587.2712
medium voc. | 32.973233 |  611.83412
higher voc. | 18.698744 |   145.9527
 university | 244.64741 |  818.01627
------------+-----------+-----------
      Total | 330.45491 |      4,846

-------------------------------------------------------------------------------

<< index

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
digression
-------------------------------------------------------------------------------

stdtable

This method has been implemented in Stata as the stdtable command.

. stdtable meduc feduc

-----------------------------------------------------------------------------
       male |                       female education
  education |        low  lower voc.  medium voc.  higher voc.  university
------------+----------------------------------------------------------------
        low |       55.3          21         10.6         8.17        4.99
 lower voc. |       27.3        47.3           15         6.73        3.71
medium voc. |       8.44        16.7         41.7         19.6        13.5
higher voc. |       5.38        9.02         18.3           41        26.3
 university |       3.63        5.97         14.4         24.5        51.5
            |
      Total |        100         100          100          100         100
-----------------------------------------------------------------------------

-------------------------
            |    female
       male | education
  education |     Total
------------+------------
        low |       100
 lower voc. |       100
medium voc. |       100
higher voc. |       100
 university |       100
            |
      Total |       500
-------------------------
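stdtable is convenient, but the key property behind it is easy to check by hand: raking only multiplies whole rows and whole columns by constants, so it leaves every odds ratio in the table unchanged. A small Python/numpy sketch (illustrative only, not the stdtable implementation), reusing the 1940-1945 counts from earlier:

```python
import numpy as np

# the observed 1940-1945 homogamy counts from the earlier slide
data = np.array([
    [146,   81,  36,   9,   6],
    [493, 1432, 384,  48,  31],
    [ 99,  306, 376,  52,  54],
    [ 29,   83, 119,  62,  45],
    [ 75,  157, 312, 113, 298],
], dtype=float)

std = data.copy()
for _ in range(100):                      # rake all margins to 100
    std = std / std.sum(axis=1, keepdims=True) * 100
    std = std / std.sum(axis=0, keepdims=True) * 100

def odds_ratio(t, r1, r2, c1, c2):
    return (t[r1, c1] * t[r2, c2]) / (t[r1, c2] * t[r2, c1])

# the odds ratio is identical before and after standardizing
print(odds_ratio(data, 0, 1, 0, 1))
print(odds_ratio(std, 0, 1, 0, 1))
```

This is why the standardized table still describes the same pattern of homogamy as the raw counts: only the margins have changed.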

-------------------------------------------------------------------------------

<< index

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
ancillary
-------------------------------------------------------------------------------

Can all tables be standardized?

Consider the following table

    0  0  2
    1  5  2
    8  7  0

In order to make the first row total 100, the top-right cell must be 100.

In order to make the last column total 100, the top-right cell cannot be 100.

This is an example of a table that cannot be standardized. The Mata program we created above will stop after 30 iterations, but the relative change mreldif(muhat2,muhat) will still be larger than 1e-8. In other words, the algorithm has not converged. The stdtable command gives a more explicit warning when that happens.
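The non-convergence is easy to reproduce. Below is a quick Python/numpy sketch of the same 30-iteration loop (mirroring the Mata code, not part of the course materials):

```python
import numpy as np

# the table from this slide: the zeros make the margins unattainable
tab = np.array([[0., 0., 2.],
                [1., 5., 2.],
                [8., 7., 0.]])

muhat = tab.copy()
change = np.inf
i = 1
while i < 30 and change > 1e-8:
    old = muhat.copy()
    muhat = muhat / muhat.sum(axis=1, keepdims=True) * 100  # rows to 100
    muhat = muhat / muhat.sum(axis=0, keepdims=True) * 100  # columns to 100
    change = np.max(np.abs(muhat - old) / (np.abs(old) + 1))
    i += 1

# the loop stops because i hit 30, not because the change dropped below 1e-8
print(i, change)
```

The problematic cell shrinks toward zero only very slowly, so the relative change between iterations never falls below the tolerance.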

-------------------------------------------------------------------------------

index

-------------------------------------------------------------------------------

raking_sol1.do



use place.dta, clear
tab place15 place30, matcell(data)       // store the observed counts
mata
data = st_matrix("data")
muhat = data                             // starting values
muhat2 = 0:*data                         // previous iteration, starts at 0
i = 1
while(i<30 & mreldif(muhat2,muhat)>1e-8) {
	muhat2 = muhat
	muhat = muhat:/rowsum(muhat):*100    // make the rows sum to 100
	muhat = muhat:/colsum(muhat):*100    // make the columns sum to 100
	printf("{txt}iteration {res}%f {txt}relative change {res}%f\n", i, mreldif(muhat2,muhat))
	i = i + 1
}
muhat                                    // the standardized table
data                                     // the raw counts
end
stdtable place15 place30                 // the same standardization with stdtable

-------------------------------------------------------------------------------

<<

-------------------------------------------------------------------------------

raking_sol2.do



use place.dta, clear
tab place15 place30 if coh==30, matcell(data30)
tab place15 place30 if coh==40, matcell(data40)
tab place15 place30 if coh==50, matcell(data50)
mata
data30 = st_matrix("data30")
data40 = st_matrix("data40")
data50 = st_matrix("data50")
row = rowsum(data50)                     // the desired margins come from the coh==50 table
col = colsum(data50)
muhat = data30
muhat2 = 0:*data30
i = 1
while(i<30 & mreldif(muhat2,muhat)>1e-8) {
	muhat2 = muhat
	muhat = muhat:/rowsum(muhat):*row
	muhat = muhat:/colsum(muhat):*col
	printf("{txt}iteration {res}%f {txt}relative change {res}%f\n", i, mreldif(muhat2,muhat))
	i = i + 1
}
muhat30 = muhat
muhat = data40
muhat2 = 0:*data40
i = 1
while(i<30 & mreldif(muhat2,muhat)>1e-8) {
	muhat2 = muhat
	muhat = muhat:/rowsum(muhat):*row
	muhat = muhat:/colsum(muhat):*col
	printf("{txt}iteration {res}%f {txt}relative change {res}%f\n", i, mreldif(muhat2,muhat))
	i = i + 1
}
muhat40 = muhat
muhat30
muhat40
data50
end
stdtable place15 place30, by(coh,baseline(50))

-------------------------------------------------------------------------------

<<

-------------------------------------------------------------------------------