Log-linear models Lecture 2: raking

Maarten Buis
office F532
maarten.buis@uni.kn
office hours by appointment

-------------------------------------------------------------------------------

index >>

-------------------------------------------------------------------------------
                               Table of contents

------------------------------------------------------------------------------- Slide table of contents -------------------------------------------------------------------------------

    Standardizing a table

       Standardize the table for homogamy

       Making all the row totals 100

       Making all the column totals 100

       Repeat

       Iterative Proportional Fitting

       Can all tables be standardized? (ancillary)

       Try it yourself

    Standardization to compare tables

       comparing tables

       Try it yourself

    keep the margins as observed, but change the pattern

       Independence revisited

    Standardization to known margins in the population

       Margins in our sample and margins in the population

       Computing post-stratification weights for the cohort 1940-1945
       

------------------------------------------------------------------------------- Supporting materials -------------------------------------------------------------------------------

Datasets
   homogamy_allbus.dta   ALLBUS 1980 - 2016; on slide slide1.smcl
   place.dta             German Life History Study I; on slide slide8.smcl
   margins1940.dta       Volks- und Berufszählung 1970 and 1987; on slide slide13.smcl

Do files
   slide1ex1.do    initial look at the homogamy data; on slide slide1.smcl
   slide2ex1.do    load the data in Mata; on slide slide2.smcl
   slide2ex2.do    adjust the rows to sum to 100; on slide slide2.smcl
   slide3ex1.do    adjust the columns to sum to 100; on slide slide3.smcl
   slide4ex1.do    repeat adjusting rows and columns; on slide slide4.smcl
   slide5ex1.do    write that up in a loop; on slide slide5.smcl
   slide5ex2.do    odds ratio in raw data; on slide slide5.smcl
   slide5ex3.do    odds ratio in adjusted data; on slide slide5.smcl
   slide6ex4.do    do this standardization with stdtable; on slide slide6.smcl
   slide9ex1.do    load data by cohort in Mata; on slide slide9.smcl
   slide9ex2.do    store the desired margins; on slide slide9.smcl
   slide9ex3.do    standardize the 1940 table to 1960 margins; on slide slide9.smcl
   slide9ex4.do    do this with stdtable; on slide slide9.smcl
   slide11ex1.do   use IPF to find the counts under independence; on slide slide11.smcl
   slide13ex1.do   load data and census margins in Stata; on slide slide13.smcl
   slide13ex2.do   load data and census margins in Mata; on slide slide13.smcl
   slide13ex3.do   adjust table to fit census margins; on slide slide13.smcl
   slide13ex4.do   use these adjusted counts to create weights; on slide slide13.smcl

Solutions to "Try it yourself"
   raking_sol1.do   solution; on slide slide8.smcl
   raking_sol2.do   solution; on slide slide10.smcl

-------------------------------------------------------------------------------

<< index >>

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Standardizing a table
-------------------------------------------------------------------------------

Standardize the table for homogamy

Consider the table of the education of the male and female partners in a marriage or stable partnership.

. clear

. use homogamy_allbus.dta
(ALLBUS 1980 - 2016)

. tab meduc feduc, matcell(data)

       male |                      female education
  education |       low  lower voc  medium vo  higher vo  universit |     Total
------------+------------------------------------------------------+----------
        low |     1,378        600        314         87         55 |     2,434
 lower voc. |     3,864      7,665      2,528        407        232 |    14,696
medium voc. |       815      1,847      4,802        809        576 |     8,849
higher voc. |       276        530      1,122        898        596 |     3,422
 university |       387        729      1,828      1,115      2,429 |     6,488
------------+------------------------------------------------------+----------
      Total |     6,720     11,371     10,594      3,316      3,888 |    35,889

    It is hard to interpret this table as is, because of the differences in
    the margins

For example, it appears that a female with medium vocational education marrying a male with lower vocational education is more common than vice versa.

This is contrary to the notion that partnerships where the men are better educated are more common.

But the observed pattern may be due to the fact that lower vocational education is more common among men than women, and medium vocational education is more common among women than men.

Can't we change the table such that the pattern of association remains constant, but all the margins are the same, e.g. 100?

That would make it easier to see patterns.

Last week we solved this problem by looking at how independence was defined in a chi-square test:

We imagined what the table would look like if the margins remained as we observed them, but there was no other association between the row and column variables.

This week we turn this around: We keep the association in the table as observed, but change the margins.

-------------------------------------------------------------------------------

<< index >>

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Standardizing a table
-------------------------------------------------------------------------------

Making all the row totals 100

Notice that I added the option matcell(data) to the tab command. This leaves behind the table as a Stata matrix named data, which in turn can be read into Mata.

. mata
------------------------------------------------- mata (type end to exit) -----
: data = st_matrix("data")

: data
          1      2      3      4      5
    +------------------------------------+
  1 |  1378    600    314     87     55  |
  2 |  3864   7665   2528    407    232  |
  3 |   815   1847   4802    809    576  |
  4 |   276    530   1122    898    596  |
  5 |   387    729   1828   1115   2429  |
    +------------------------------------+

: end
-------------------------------------------------------------------------------

    If we divide all cell entries by the rowsum, then the new rowsum will be
    1.

Multiply the new cell entries by 100, and the rowsum will be 100.

. mata
------------------------------------------------- mata (type end to exit) -----
: muhat = data

: muhat = muhat:/rowsum(muhat):*100

: muhat
                 1             2             3             4             5
    +-----------------------------------------------------------------------+
  1 |  56.61462613   24.65078061   12.90057518   3.574363188   2.259654889  |
  2 |  26.29286881   52.15704954   17.20195972   2.769461078    1.57866086  |
  3 |  9.210080235   20.87241496   54.26601876   9.142275963    6.50921008  |
  4 |  8.065458796    15.4880187   32.78784337   26.24196376   17.41671537  |
  5 |    5.9648582   11.23612824   28.17509248   17.18557337   37.43834772  |
    +-----------------------------------------------------------------------+

: rowsum(muhat)
          1
    +-------+
  1 |  100  |
  2 |  100  |
  3 |  100  |
  4 |  100  |
  5 |  100  |
    +-------+

: colsum(muhat)
                 1             2             3             4             5
    +-----------------------------------------------------------------------+
  1 |  106.1478922    124.404392   145.3314895   58.91363736   65.20258892  |
    +-----------------------------------------------------------------------+

: end
-------------------------------------------------------------------------------
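The row adjustment can also be sketched outside Stata. A minimal Python/NumPy illustration (NumPy is not part of the course materials; the counts are copied from the table above):

```python
import numpy as np

# observed counts, copied from the meduc-by-feduc table above
data = np.array([
    [1378,  600,  314,   87,   55],
    [3864, 7665, 2528,  407,  232],
    [ 815, 1847, 4802,  809,  576],
    [ 276,  530, 1122,  898,  596],
    [ 387,  729, 1828, 1115, 2429],
], dtype=float)

# divide each cell by its row total and multiply by 100:
# every row now sums to 100, but the columns do not (yet)
muhat = data / data.sum(axis=1, keepdims=True) * 100

print(muhat.sum(axis=1))  # all 100
print(muhat.sum(axis=0))  # not yet 100
```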

-------------------------------------------------------------------------------

<< index >>

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Standardizing a table
-------------------------------------------------------------------------------

Making all the column totals 100

The row totals are as we want them, but the column totals are not. What if we repeat this process for the columns?

. mata
------------------------------------------------- mata (type end to exit) -----
: muhat = muhat:/colsum(muhat):*100

: muhat
                 1             2             3             4             5
    +-----------------------------------------------------------------------+
  1 |  53.33561032   19.81504045   8.876655176   6.067123587   3.465590748  |
  2 |  24.77003384   41.92540848   11.83636098   4.700882855    2.42116285  |
  3 |  8.676649198   16.77787626   37.33947746   15.51809797   9.983054643  |
  4 |  7.598322144   12.44973626   22.56072891   44.54310571   26.71169299  |
  5 |    5.6193845   9.031938545   19.38677748   29.17078988   57.41849877  |
    +-----------------------------------------------------------------------+

: rowsum(muhat)
                1
    +---------------+
  1 |  91.56002028  |
  2 |  85.65384901  |
  3 |  88.29515553  |
  4 |   113.863586  |
  5 |  120.6273892  |
    +---------------+

: colsum(muhat)
         1     2     3     4     5
    +-------------------------------+
  1 |  100   100   100   100   100  |
    +-------------------------------+

: end
-------------------------------------------------------------------------------

    Now the column totals are as we want them, but now the row totals are a
    bit off.

However, the row totals are better than in the original table, so maybe we need to repeat this process a couple of times?

-------------------------------------------------------------------------------

<< index >>

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Standardizing a table
-------------------------------------------------------------------------------

Repeat

. mata
------------------------------------------------- mata (type end to exit) -----
: muhat = muhat:/rowsum(muhat):*100

: muhat
                 1             2             3             4             5
    +-----------------------------------------------------------------------+
  1 |  58.25207351   21.64158591   9.694903024   6.626389518   3.785048034  |
  2 |  28.91876328   48.94748919    13.8188314   5.488233056   2.826683071  |
  3 |  9.826868922   19.00203489   42.28938409   17.57525413   11.30645796  |
  4 |  6.673180084   10.93390494   19.81382258   39.11971094   23.45938146  |
  5 |   4.65846483   7.487469145   16.07162155   24.18255927    47.5998852  |
    +-----------------------------------------------------------------------+

: rowsum(muhat)
          1
    +-------+
  1 |  100  |
  2 |  100  |
  3 |  100  |
  4 |  100  |
  5 |  100  |
    +-------+

: colsum(muhat)
                 1             2             3             4             5
    +-----------------------------------------------------------------------+
  1 |  108.3293506   108.0124841   101.6885626   92.99214691   88.97745573  |
    +-----------------------------------------------------------------------+

: muhat = muhat:/colsum(muhat):*100

: muhat
                 1             2             3             4             5
    +-----------------------------------------------------------------------+
  1 |  53.77312166   20.03618942   9.533916866   7.125751731    4.25394051  |
  2 |  26.69522444   45.31651096   13.58936643   5.901824227   3.176853112  |
  3 |   9.07128942   17.59244318   41.58715886   18.89971865   12.70710414  |
  4 |  6.160085005   10.12281593   19.48480936   42.06775759   26.36553413  |
  5 |  4.300279474   6.932040503   15.80474848   26.00494781    53.4965681  |
    +-----------------------------------------------------------------------+

: rowsum(muhat)
                1
    +---------------+
  1 |  94.72292019  |
  2 |  94.67977917  |
  3 |  99.85771425  |
  4 |   104.201002  |
  5 |  106.5385844  |
    +---------------+

: colsum(muhat)
         1     2     3     4     5
    +-------------------------------+
  1 |  100   100   100   100   100  |
    +-------------------------------+

: end
-------------------------------------------------------------------------------

    Notice that each time we get a bit closer to our goal

-------------------------------------------------------------------------------

<< index >>

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Standardizing a table
-------------------------------------------------------------------------------

Iterative Proportional Fitting

The algorithm is called Iterative Proportional Fitting (IPF)

We can automate this repetition with a loop, and continue looping until the table no longer changes.

. mata
------------------------------------------------- mata (type end to exit) -----
: muhat = data

: muhat2 = 0:*data

: i = 1

: while(i<30 & mreldif(muhat2,muhat)>1e-8) {
>     muhat2 = muhat
>     muhat = muhat:/rowsum(muhat):*100
>     muhat = muhat:/colsum(muhat):*100
>     printf("{txt}iteration {res}%f {txt}relative change {res}%f\n", i, mreldif(muhat2,muhat))
>     i = i + 1
> }
iteration 1 relative change 196.018454
iteration 2 relative change .264736172
iteration 3 relative change .085845379
iteration 4 relative change .032427941
iteration 5 relative change .01302549
iteration 6 relative change .005266606
iteration 7 relative change .002120014
iteration 8 relative change .000852175
iteration 9 relative change .000342384
iteration 10 relative change .00013754
iteration 11 relative change .000055248
iteration 12 relative change .000022192
iteration 13 relative change 8.9140e-06
iteration 14 relative change 3.5805e-06
iteration 15 relative change 1.4382e-06
iteration 16 relative change 5.7769e-07
iteration 17 relative change 2.3204e-07
iteration 18 relative change 9.3206e-08
iteration 19 relative change 3.7438e-08
iteration 20 relative change 1.5038e-08
iteration 21 relative change 6.0404e-09

: muhat
                 1             2             3             4             5
    +-----------------------------------------------------------------------+
  1 |   55.2594258   21.01486714   10.56463793    8.17117594   4.989893004  |
  2 |  27.29231899   47.28612524   14.98125546   6.732957746   3.707342411  |
  3 |  8.441187075   16.70825366   41.72879959   19.62468059   13.49707912  |
  4 |  5.378131668    9.02020621   18.34353466   40.98329233   26.27483526  |
  5 |  3.628936464   5.970547749   14.38177236   24.48789339   51.53085021  |
    +-----------------------------------------------------------------------+

: data
          1      2      3      4      5
    +------------------------------------+
  1 |  1378    600    314     87     55  |
  2 |  3864   7665   2528    407    232  |
  3 |   815   1847   4802    809    576  |
  4 |   276    530   1122    898    596  |
  5 |   387    729   1828   1115   2429  |
    +------------------------------------+

: end
-------------------------------------------------------------------------------

    In the raw data the odds that a man with lower vocational education
    marries a woman with lower vocational rather than low education are 4.6
    times the odds for a man with low education:

. di (7665 / 3864 ) / (600 / 1378 )
4.5558877

    In our new table we get the exact same odds ratio:

. di (47.28612524 / 27.29231899) / (21.01486714 / 55.2594258)
4.5558877
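The loop and the odds-ratio check can be sketched in Python/NumPy as well (an illustration only; `ipf` is a hypothetical helper name, not part of any package):

```python
import numpy as np

# observed counts, copied from the meduc-by-feduc table above
data = np.array([
    [1378,  600,  314,   87,   55],
    [3864, 7665, 2528,  407,  232],
    [ 815, 1847, 4802,  809,  576],
    [ 276,  530, 1122,  898,  596],
    [ 387,  729, 1828, 1115, 2429],
], dtype=float)

def ipf(table, row_targets, col_targets, tol=1e-8, maxiter=30):
    """Iterative proportional fitting: alternately rescale rows and
    columns until the margins stop changing."""
    muhat = table.astype(float).copy()
    for _ in range(maxiter):
        old = muhat.copy()
        muhat = muhat / muhat.sum(axis=1, keepdims=True) * row_targets
        muhat = muhat / muhat.sum(axis=0, keepdims=True) * col_targets
        if np.max(np.abs(muhat - old)) < tol:
            break
    return muhat

muhat = ipf(data, np.full((5, 1), 100.0), np.full(5, 100.0))

def oddsratio(t):
    # odds ratio for the 2x2 block of 'low' and 'lower voc.'
    return (t[1, 1] / t[1, 0]) / (t[0, 1] / t[0, 0])

# IPF only multiplies rows and columns by constants,
# so every odds ratio is left unchanged
print(oddsratio(data), oddsratio(muhat))  # both 4.5558877...
```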

    Standardizing a table like this is nice in a teaching setting, because
    you can see what is going on. In a real analysis you would use the
    stdtable package.

-------------------------------------------------------------------------------

<< index >>

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Standardizing a table
-------------------------------------------------------------------------------

Try it yourself

Use IPF to standardize the place of residence table (place.dta) to have margins of all 100.

raking_sol1.do

-------------------------------------------------------------------------------

<< index >>

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Standardization to compare tables
-------------------------------------------------------------------------------

comparing tables

Say we want to compare the tables for the cohorts born in 1940-1945 (i.e. were 20 in 1960-1965) and born in 1960-1965 (i.e. were 20 in 1980-1985).

What if we standardize the table from 1940-1945 to have the margins of the table from 1960-1965?

This makes it easier to compare across cohorts, while still having margins that are more realistic than all 100s.

We start by preparing the data and loading it into Mata.

. use homogamy_allbus.dta, clear
(ALLBUS 1980 - 2016)

. gen coh = cond(inrange(byr, 1960, 1965), 1, ///
>             cond(inrange(byr, 1940, 1945), 0, .))
(36,808 missing values generated)

. label define coh 0 "1940-1945" 1 "1960-1965"

. label value coh coh

. label var coh "resp. birth cohort"

. tab meduc feduc if coh==0, matcell(data1940)

       male |                      female education
  education |       low  lower voc  medium vo  higher vo  universit |     Total
------------+------------------------------------------------------+----------
        low |       146         81         36          9          6 |       278
 lower voc. |       493      1,432        384         48         31 |     2,388
medium voc. |        99        306        376         52         54 |       887
higher voc. |        29         83        119         62         45 |       338
 university |        75        157        312        113        298 |       955
------------+------------------------------------------------------+----------
      Total |       842      2,059      1,227        284        434 |     4,846

. tab meduc feduc if coh==1, matcell(data1960)

       male |                      female education
  education |       low  lower voc  medium vo  higher vo  universit |     Total
------------+------------------------------------------------------+----------
        low |       120         44         56         23          6 |       249
 lower voc. |       181        477        365         73         19 |     1,115
medium voc. |        94        168      1,031        167        117 |     1,577
higher voc. |        44         47        217        185        123 |       616
 university |        30         34        223        174        357 |       818
------------+------------------------------------------------------+----------
      Total |       469        770      1,892        622        622 |     4,375

. mata
------------------------------------------------- mata (type end to exit) -----
: data1940 = st_matrix("data1940")

: data1960 = st_matrix("data1960")

: end
-------------------------------------------------------------------------------

    We extract the desired row and column totals

. mata
------------------------------------------------- mata (type end to exit) -----
: col = colsum(data1960)

: row = rowsum(data1960)

: end
-------------------------------------------------------------------------------

    We can now apply these row and column totals instead of 100.

We can see that a large part of the apparent difference between the cohorts is due to the change in distribution of education between the cohorts.

. mata
------------------------------------------------- mata (type end to exit) -----
: muhat = data1940

: muhat2 = 0:*data1940

: i = 1

: while(i<30 & mreldif(muhat2,muhat)>1e-8) {
>     muhat2 = muhat
>     muhat = muhat:/rowsum(muhat):*row
>     muhat = muhat:/colsum(muhat):*col
>     printf("{txt}iteration {res}%f {txt}relative change {res}%f\n", i, mreldif(muhat2,muhat))
>     i = i + 1
> }
iteration 1 relative change 3.35926798
iteration 2 relative change .412621342
iteration 3 relative change .096720838
iteration 4 relative change .022674838
iteration 5 relative change .005356008
iteration 6 relative change .001267613
iteration 7 relative change .000300115
iteration 8 relative change .000071057
iteration 9 relative change .000016824
iteration 10 relative change 3.9832e-06
iteration 11 relative change 9.4306e-07
iteration 12 relative change 2.2328e-07
iteration 13 relative change 5.2864e-08
iteration 14 relative change 1.2516e-08
iteration 15 relative change 2.9633e-09

: data1940
         1      2      3      4      5
    +------------------------------------+
  1 |  146     81     36      9      6   |
  2 |  493   1432    384     48     31   |
  3 |   99    306    376     52     54   |
  4 |   29     83    119     62     45   |
  5 |   75    157    312    113    298   |
    +------------------------------------+

: muhat
                 1             2             3             4             5
    +-----------------------------------------------------------------------+
  1 |  107.9401832   41.79372109    63.5105883    23.4738278   12.28167952  |
  2 |  206.3512558   418.3106688   383.5347872   70.87817787   35.92510995  |
  3 |  101.1983989    218.300937   917.1484403   187.5222911   152.8299328  |
  4 |  25.01236059    49.9609286   244.9159022   188.6511611   107.4596476  |
  5 |  28.49780156   41.63374457   282.8902821    151.474542   313.5036302  |
    +-----------------------------------------------------------------------+

: data1960
         1      2      3      4      5
    +------------------------------------+
  1 |  120     44     56     23      6   |
  2 |  181    477    365     73     19   |
  3 |   94    168   1031    167    117   |
  4 |   44     47    217    185    123   |
  5 |   30     34    223    174    357   |
    +------------------------------------+

: end
-------------------------------------------------------------------------------

    Alternatively, we can use stdtable

. stdtable meduc feduc, by(coh,baseline(1))

-----------------------------------------------------------------------------
resp. birth |
 cohort and |
       male |                      female education
  education |    low  lower voc.  medium voc.  higher voc.  university |  Total
------------+---------------------------------------------------------+-------
1940-1945   |
        low |    108        41.8         63.5         23.5        12.3 |    249
 lower voc. |    206         418          384         70.9        35.9 |   1115
medium voc. |    101         218          917          188         153 |   1577
higher voc. |     25          50          245          189         107 |    616
 university |   28.5        41.6          283          151         314 |    818
            |
      Total |    469         770         1892          622         622 |   4375
------------+---------------------------------------------------------+-------
1960-1965   |
        low |    120          44           56           23           6 |    249
 lower voc. |    181         477          365           73          19 |   1115
medium voc. |     94         168         1031          167         117 |   1577
higher voc. |     44          47          217          185         123 |    616
 university |     30          34          223          174         357 |    818
            |
      Total |    469         770         1892          622         622 |   4375
-----------------------------------------------------------------------------

-------------------------------------------------------------------------------

<< index >>

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Standardization to compare tables
-------------------------------------------------------------------------------

Try it yourself

Standardize the tables in place.dta such that the margins for all cohorts correspond to the margins of the 1950 cohort.

raking_sol2.do

-------------------------------------------------------------------------------

<< index >>

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
keep the margins as observed, but change the pattern
-------------------------------------------------------------------------------

Independence revisited

We have until now kept the pattern as observed in the data, and changed the margins. Can we not turn that around: keep the margins as observed in the data, and change the pattern?

An interesting baseline pattern would be independence. We would start with a table that satisfies independence, and change the values such that the margins correspond to the observed margins. A table that satisfies independence is a table with all 1s.

. mata:
------------------------------------------------- mata (type end to exit) -----
: row = rowsum(data)

: col = colsum(data)

: muhat = J(5,5,1)

: muhat2 = 0:*muhat

: muhat
[symmetric]
        1   2   3   4   5
    +---------------------+
  1 |  1                  |
  2 |  1   1              |
  3 |  1   1   1          |
  4 |  1   1   1   1      |
  5 |  1   1   1   1   1  |
    +---------------------+

: i = 1

: while(i<30 & mreldif(muhat2,muhat)>1e-8) {
>     muhat2 = muhat
>     muhat = muhat:/rowsum(muhat):*row
>     muhat = muhat:/colsum(muhat):*col
>     printf("{txt}iteration {res}%f {txt}relative change {res}%f\n", i, mreldif(muhat2,muhat))
>     i = i + 1
> }
iteration 1 relative change .999570562
iteration 2 relative change 3.3466e-16

: muhat
                 1             2             3             4             5
    +-----------------------------------------------------------------------+
  1 |  455.7519017    771.183761   718.4874474    224.891861   263.6850288  |
  2 |  2751.737858   4656.251665   4338.081975   1357.851598   1592.076904  |
  3 |  1656.922177   2803.699713   2612.118086   817.6121932   958.6478308  |
  4 |   640.748976   1084.219733   1010.133133   316.1791078   370.7190504  |
  5 |  1214.839087   2055.645128   1915.179359     599.46524   702.8711862  |
    +-----------------------------------------------------------------------+

: end
-------------------------------------------------------------------------------

. tab meduc feduc , exp nofreq

       male |                      female education
  education |       low  lower voc  medium vo  higher vo  universit |     Total
------------+------------------------------------------------------+----------
        low |     455.8      771.2      718.5      224.9      263.7 |   2,434.0
 lower voc. |   2,751.7    4,656.3    4,338.1    1,357.9    1,592.1 |  14,696.0
medium voc. |   1,656.9    2,803.7    2,612.1      817.6      958.6 |   8,849.0
higher voc. |     640.7    1,084.2    1,010.1      316.2      370.7 |   3,422.0
 university |   1,214.8    2,055.6    1,915.2      599.5      702.9 |   6,488.0
------------+------------------------------------------------------+----------
      Total |   6,720.0   11,371.0   10,594.0    3,316.0    3,888.0 |  35,889.0

    Notice that the second iteration added nothing, so the IPF converged in
    one iteration. Also, the estimated counts correspond to the counts we
    computed last week.

This is not a coincidence:

In the first step of the first iteration, each cell gets rowtotal/5

At the beginning of the second step of the first iteration, the column totals are 1/5th of N

So we get: (rowtotal/5)/(N/5)*coltotal = (rowtotal*coltotal)/N
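This little derivation can be checked numerically; a NumPy sketch (the counts are copied from the homogamy table used above):

```python
import numpy as np

# observed counts from the full homogamy table
data = np.array([
    [1378,  600,  314,   87,   55],
    [3864, 7665, 2528,  407,  232],
    [ 815, 1847, 4802,  809,  576],
    [ 276,  530, 1122,  898,  596],
    [ 387,  729, 1828, 1115, 2429],
], dtype=float)
row = data.sum(axis=1, keepdims=True)
col = data.sum(axis=0)
n = data.sum()

# one IPF pass starting from a table of all ones
muhat = np.ones((5, 5))
muhat = muhat / muhat.sum(axis=1, keepdims=True) * row  # each cell: rowtotal/5
muhat = muhat / muhat.sum(axis=0, keepdims=True) * col  # column totals here are n/5

# this already equals the chi-square expected counts,
# so the second iteration has nothing left to do
print(np.allclose(muhat, row * col / n))  # True
```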

-------------------------------------------------------------------------------

<< index >>

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Standardization to known margins in the population
-------------------------------------------------------------------------------

Margins in our sample and margins in the population

Samples often deviate from the population because of

   the way the sample was drawn
   the way the data was collected
   some people being harder to contact
   some people being less likely to participate

What if we had the marginal distributions of our variables from the population?

Can't we use the same trick to standardize our table to those population margins?

This is a classic application of raking, and is often used when computing (post-stratification) weights.

-------------------------------------------------------------------------------

<< index >>

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Standardization to known margins in the population
-------------------------------------------------------------------------------

Computing post-stratification weights for the cohort 1940-1945

Let's get the observed data again and take a look at the population margins.

. use homogamy_allbus, clear
(ALLBUS 1980 - 2016)

. tab meduc feduc if inrange(byr,1940,1945), matcell(data1940)

       male |                      female education
  education |       low  lower voc  medium vo  higher vo  universit |     Total
------------+------------------------------------------------------+----------
        low |       146         81         36          9          6 |       278
 lower voc. |       493      1,432        384         48         31 |     2,388
medium voc. |        99        306        376         52         54 |       887
higher voc. |        29         83        119         62         45 |       338
 university |        75        157        312        113        298 |       955
------------+------------------------------------------------------+----------
      Total |       842      2,059      1,227        284        434 |     4,846

. use margins1940, clear

. list

     +-----------------------------------------+
     | female                   ed          p  |
     |-----------------------------------------|
  1. |   male                basic  .14092565  |
  2. |   male    vocational, lower  .53389831  |
  3. |   male   vocational, middle  .12625549  |
  4. |   male   vocational, higher  .03011818  |
  5. |   male           university  .16880237  |
     |-----------------------------------------|
  6. | female                basic  .33888268  |
  7. | female    vocational, lower  .40107984  |
  8. | female   vocational, middle  .16634283  |
  9. | female   vocational, higher  .02550338  |
 10. | female           university  .06819128  |
     +-----------------------------------------+

    Let's get the data and the desired margins into Mata and compare them
    with the observed margins.

Notice the ' at the end of the line starting with col. This turns the column vector col into a row vector.

. mata
------------------------------------------------- mata (type end to exit) -----
: data1940 = st_matrix("data1940")

: row = st_data((1,5),3)

: col = st_data((6,10),3)'

: n = sum(data1940)

: row = row:*n

: col = col:*n

: row
                1
    +---------------+
  1 |  682.9257144  |
  2 |  2587.271186  |
  3 |   611.834118  |
  4 |  145.9527007  |
  5 |  818.0162805  |
    +---------------+

: rowsum(data1940)
          1
    +--------+
  1 |   278  |
  2 |  2388  |
  3 |   887  |
  4 |   338  |
  5 |   955  |
    +--------+

: col
                 1             2             3             4             5
    +-----------------------------------------------------------------------+
  1 |  1642.225466   1943.632883   806.0973572   123.5893736   330.4549203  |
    +-----------------------------------------------------------------------+

: colsum(data1940)
         1      2      3      4      5
    +------------------------------------+
  1 |  842   2059   1227    284    434   |
    +------------------------------------+

: end
-------------------------------------------------------------------------------

    Now we can apply the same trick as before.

. mata
------------------------------------------------- mata (type end to exit) -----
: muhat = data1940

: muhat2 = 0:*data1940

: i = 1

: while(i<30 & mreldif(muhat2,muhat)>1e-8) {
>     muhat2 = muhat
>     muhat = muhat:/rowsum(muhat):*row
>     muhat = muhat:/colsum(muhat):*col
>     printf("{txt}iteration {res}%f {txt}relative change {res}%f\n", i, mreldif(muhat2,muhat))
>     i = i + 1
> }
iteration 1 relative change 3.15360994
iteration 2 relative change .345717331
iteration 3 relative change .06376476
iteration 4 relative change .016214917
iteration 5 relative change .004553063
iteration 6 relative change .001271767
iteration 7 relative change .000354736
iteration 8 relative change .000098909
iteration 9 relative change .000027576
iteration 10 relative change 7.6877e-06
iteration 11 relative change 2.1432e-06
iteration 12 relative change 5.9750e-07
iteration 13 relative change 1.6657e-07
iteration 14 relative change 4.6438e-08
iteration 15 relative change 1.2946e-08
iteration 16 relative change 3.6092e-09

: data1940
         1      2      3      4      5
    +------------------------------------+
  1 |  146     81     36      9      6   |
  2 |  493   1432    384     48     31   |
  3 |   99    306    376     52     54   |
  4 |   29     83    119     62     45   |
  5 |   75    157    312    113    298   |
    +------------------------------------+

: muhat
                 1             2             3             4             5
    +-----------------------------------------------------------------------+
  1 |  474.5208782   143.3508106   47.95215625   8.169263599   8.932605961  |
  2 |  875.0072936   1383.950116   279.3181446   23.79271058   25.20292251  |
  3 |  131.9710065   222.1147631   205.4160385   19.35907528   32.97323446  |
  4 |  26.30712883   40.99833415   44.24106626   15.70742786   18.69874349  |
  5 |  134.4191584   153.2188596   229.1699516   56.56089627   244.6474139  |
    +-----------------------------------------------------------------------+

: end
-------------------------------------------------------------------------------

    So if the margins in our data corresponded with the margins in the
    population then we would expect to find 475 couples with both low
    education, but in our data we only found 146 such couples.

So a single couple with both low education in our data stands for 475/146=3.3 observations in the table with the population margins.

This 3.3 is our post-stratification weight

. use homogamy_allbus
(ALLBUS 1980 - 2016)

. mata
------------------------------------------------- mata (type end to exit) -----
: muhat:/data1940
                 1             2             3             4             5
    +-----------------------------------------------------------------------+
  1 |  3.250143001   1.769763094    1.33200434   .9076959554    1.48876766  |
  2 |  1.774862664   .9664456116   .7273910014   .4956814704   .8129975004  |
  3 |   1.33304047    .725865239   .5463192514   .3722899092    .610615453  |
  4 |  .9071423736   .4939558331   .3717736661   .2533456107   .4155276332  |
  5 |  1.792255446   .9759163033   .7345190755   .5005389051   .8209644761  |
    +-----------------------------------------------------------------------+

: st_matrix("weights", muhat:/data1940)

: end
-------------------------------------------------------------------------------

. gen weight = weights[meduc,feduc] if inrange(byr,1940,1945) (43,748 missing values generated)

. tab meduc feduc if inrange(byr, 1940, 1945) [iweight=weight]

       male |               female education
  education |        low   lower voc   medium vo   higher vo |      Total
------------+------------------------------------------------+-----------
        low |  474.52089   143.35081   47.952155   8.1692635 |  682.92572
 lower voc. | 875.007285    1,383.95   279.31815  23.7927103 | 2,587.2712
medium voc. |  131.97101   222.11476   205.41604   19.359075 |  611.83412
higher voc. |   26.30713   40.998333   44.241066   15.707428 |   145.9527
 university |  134.41916   153.21886   229.16995   56.560894 |  818.01627
------------+------------------------------------------------+-----------
      Total |  1,642.225   1,943.633   806.09735   123.58937 |      4,846

            |    female
       male | education
  education | universit |      Total
------------+-----------+-----------
        low | 8.9326057 |  682.92572
 lower voc. | 25.202923 | 2,587.2712
medium voc. | 32.973233 |  611.83412
higher voc. | 18.698744 |   145.9527
 university | 244.64741 |  818.01627
------------+-----------+-----------
      Total | 330.45491 |      4,846

-------------------------------------------------------------------------------

<< index

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
digression
-------------------------------------------------------------------------------

stdtable

This method has been implemented in Stata as the stdtable command.

. stdtable meduc feduc

-----------------------------------------------------------------------------
       male |                       female education
  education |        low  lower voc.  medium voc.  higher voc.  university
------------+----------------------------------------------------------------
        low |       55.3          21         10.6         8.17        4.99
 lower voc. |       27.3        47.3           15         6.73        3.71
medium voc. |       8.44        16.7         41.7         19.6        13.5
higher voc. |       5.38        9.02         18.3           41        26.3
 university |       3.63        5.97         14.4         24.5        51.5
            |
      Total |        100         100          100          100         100
-----------------------------------------------------------------------------

-------------------------
            |    female
       male | education
  education |     Total
------------+------------
        low |       100
 lower voc. |       100
medium voc. |       100
higher voc. |       100
 university |       100
            |
      Total |       500
-------------------------
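stdtable is convenient, but the key property behind it is easy to check by hand: raking only multiplies whole rows and whole columns by constants, so it leaves every odds ratio in the table unchanged. A small Python/numpy sketch (illustrative only, not the stdtable implementation), reusing the 1940-1945 counts from earlier:

```python
import numpy as np

# the observed 1940-1945 homogamy counts from the earlier slide
data = np.array([
    [146,   81,  36,   9,   6],
    [493, 1432, 384,  48,  31],
    [ 99,  306, 376,  52,  54],
    [ 29,   83, 119,  62,  45],
    [ 75,  157, 312, 113, 298],
], dtype=float)

std = data.copy()
for _ in range(100):                      # rake all margins to 100
    std = std / std.sum(axis=1, keepdims=True) * 100
    std = std / std.sum(axis=0, keepdims=True) * 100

def odds_ratio(t, r1, r2, c1, c2):
    return (t[r1, c1] * t[r2, c2]) / (t[r1, c2] * t[r2, c1])

# the odds ratio is identical before and after standardizing
print(odds_ratio(data, 0, 1, 0, 1))
print(odds_ratio(std, 0, 1, 0, 1))
```

This is why the standardized table still describes the same pattern of homogamy as the raw counts: only the margins have changed.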

-------------------------------------------------------------------------------

<< index

-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
ancillary
-------------------------------------------------------------------------------

Can all tables be standardized?

Consider the following table

    0  0  2
    1  5  2
    8  7  0

In order to make the first row total 100, the top-right cell must be 100.

In order to make the last column total 100, the top-right cell cannot be 100.

This is an example of a table that cannot be standardized. The Mata program we created above will stop after 30 iterations, but the relative change mreldif(muhat2,muhat) will still be larger than 1e-8. In other words, the algorithm has not converged. The stdtable command gives a more explicit warning when that happens.
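The non-convergence is easy to reproduce. Below is a quick Python/numpy sketch of the same 30-iteration loop (mirroring the Mata code, not part of the course materials):

```python
import numpy as np

# the table from this slide: the zeros make the margins unattainable
tab = np.array([[0., 0., 2.],
                [1., 5., 2.],
                [8., 7., 0.]])

muhat = tab.copy()
change = np.inf
i = 1
while i < 30 and change > 1e-8:
    old = muhat.copy()
    muhat = muhat / muhat.sum(axis=1, keepdims=True) * 100  # rows to 100
    muhat = muhat / muhat.sum(axis=0, keepdims=True) * 100  # columns to 100
    change = np.max(np.abs(muhat - old) / (np.abs(old) + 1))
    i += 1

# the loop stops because i hit 30, not because the change dropped below 1e-8
print(i, change)
```

The problematic cell shrinks toward zero only very slowly, so the relative change between iterations never falls below the tolerance.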

-------------------------------------------------------------------------------

index

-------------------------------------------------------------------------------

raking_sol1.do



use place.dta, clear
tab place15 place30, matcell(data)       // store the observed counts
mata
data = st_matrix("data")
muhat = data                             // starting values
muhat2 = 0:*data                         // previous iteration, starts at 0
i = 1
while(i<30 & mreldif(muhat2,muhat)>1e-8) {
	muhat2 = muhat
	muhat = muhat:/rowsum(muhat):*100    // make the rows sum to 100
	muhat = muhat:/colsum(muhat):*100    // make the columns sum to 100
	printf("{txt}iteration {res}%f {txt}relative change {res}%f\n", i, mreldif(muhat2,muhat))
	i = i + 1
}
muhat                                    // the standardized table
data                                     // the raw counts
end
stdtable place15 place30                 // the same standardization with stdtable

-------------------------------------------------------------------------------

<<

-------------------------------------------------------------------------------

raking_sol2.do



use place.dta, clear
tab place15 place30 if coh==30, matcell(data30)
tab place15 place30 if coh==40, matcell(data40)
tab place15 place30 if coh==50, matcell(data50)
mata
data30 = st_matrix("data30")
data40 = st_matrix("data40")
data50 = st_matrix("data50")
row = rowsum(data50)                     // the desired margins come from the coh==50 table
col = colsum(data50)
muhat = data30
muhat2 = 0:*data30
i = 1
while(i<30 & mreldif(muhat2,muhat)>1e-8) {
	muhat2 = muhat
	muhat = muhat:/rowsum(muhat):*row
	muhat = muhat:/colsum(muhat):*col
	printf("{txt}iteration {res}%f {txt}relative change {res}%f\n", i, mreldif(muhat2,muhat))
	i = i + 1
}
muhat30 = muhat
muhat = data40
muhat2 = 0:*data40
i = 1
while(i<30 & mreldif(muhat2,muhat)>1e-8) {
	muhat2 = muhat
	muhat = muhat:/rowsum(muhat):*row
	muhat = muhat:/colsum(muhat):*col
	printf("{txt}iteration {res}%f {txt}relative change {res}%f\n", i, mreldif(muhat2,muhat))
	i = i + 1
}
muhat40 = muhat
muhat30
muhat40
data50
end
stdtable place15 place30, by(coh,baseline(50))

-------------------------------------------------------------------------------

<<

-------------------------------------------------------------------------------