Hobson's Choice
Comment & Analysis from a Passionate Amateur


Income Inequality-1: Modeling Income Distribution

January 24, 2005

[ 1 | 2 | 3 | 4 | 5 | 6 ]

Now that I've attracted an impressive commentariat again, it's time to return to the business of writing about economics. For years researchers interested in the matter have been arguing that income distribution, both within countries and among them, has suffered a notable increase in concentration. This post will focus on intra-national income inequality; later I'd like to address inter-national income inequality and discuss whether the two are linked.

First, how is income distributed? Analysts usually assume that the logarithm of income is normally distributed. Here is an illustration of a normal distribution. Notice the x-axis is the "distance" away from the median, where "distance" is measured in standard deviations σ (σ is a measure of how much the population varies from the median. If we're speaking of the length of platinum bars tooled to be precisely one meter in length, then the standard deviation may well be less than one micrometer. If we're talking about the mass of bodies in the solar system, then σ will be immense because there is so much variation).

This is a normal curve, commonly referred to as a "bell curve"; and indeed, there is a very strong tendency for things like mass, height, and so forth to have a normal distribution. The y-axis, incidentally, represents the probability density, so we establish the probability that something is between x₀σ and x₁σ by the area under the curve between those two points (as illustrated, the magenta area).
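
For readers who want to check the "area under the curve" idea numerically, here is a minimal sketch in Python. The function names (`normal_cdf`, `share_between`) are my own labels, not anything from the spreadsheet behind this post; the only ingredient is the standard error function.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Share of a normal population lying below z standard deviations."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def share_between(x0: float, x1: float) -> float:
    """Area under the standard normal curve between x0 and x1 (in units of sigma)."""
    return normal_cdf(x1) - normal_cdf(x0)


# Hypothetical end points, chosen only for illustration.
within_one_sigma = share_between(-1.0, 1.0)
within_two_sigma = share_between(-2.0, 2.0)

print(f"within 1 sigma of the median: {within_one_sigma:.4f}")
print(f"within 2 sigma of the median: {within_two_sigma:.4f}")
```

Any other pair of end points can be passed to `share_between` in the same way.
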
The next figure (above) is a graph of the inverse function. Here, the x-axis is the share of the population; the y-axis is the number of standard deviations σ corresponding to that share of the population. So, for example, the 2nd percentile of the population is about 2σ below the median; the 50th percentile will be at the median, by definition. When dealing with income, the y-axis represents the natural log of income, and in the United States σ = 0.795 (that means about 68% of the US population has an income between 46% and 216% of the median).
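
The inverse function can be sketched with Python's standard library, which plays the role that NORMSINV plays in Excel. This is an illustration under the post's assumption that log income is normal with σ = 0.795; the helper names are hypothetical.

```python
from math import exp
from statistics import NormalDist

SIGMA_US = 0.795           # sigma of log income quoted in the post
std_normal = NormalDist()  # mean 0, standard deviation 1


def z_at_percentile(p: float) -> float:
    """Number of standard deviations from the median at population share p (0 < p < 1)."""
    return std_normal.inv_cdf(p)


def income_multiple(p: float, sigma: float = SIGMA_US) -> float:
    """Income at population share p, expressed as a multiple of the median income."""
    return exp(sigma * z_at_percentile(p))


for p in (0.02, 0.16, 0.50, 0.84, 0.98):
    print(f"percentile {p:4.0%}: z = {z_at_percentile(p):+.2f}, "
          f"income = {income_multiple(p):.2f} x median")
```

The end points 0 and 1 are excluded because the inverse is undefined there.
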

For most readers, the idea of measuring something by multiples of the median is a new concept, but when measuring inequality and comparing it to other countries, this is really the only way. In terms of purchasing power parity (PPP), the USA enjoys a median income greater than that of every other major nation, yet our σ of income is unusually high for an industrial nation; by way of comparison, the σ for France is 0.65 (53%-187%); for Germany, 0.55 (59%-171%); for Japan, 0.42 (67%-150%).
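
The percentage bands quoted for each country follow from the σ values: under the log-normal assumption, one standard deviation either side of the median corresponds to incomes between exp(-σ) and exp(+σ) times the median. A small sketch, using only the σ figures given above (the function name is mine):

```python
from math import exp

# Sigma of log income, as quoted in the post.
SIGMAS = {"USA": 0.795, "France": 0.65, "Germany": 0.55, "Japan": 0.42}


def one_sigma_band(sigma: float) -> tuple[float, float]:
    """Lower and upper income, as multiples of the median, one sigma either side."""
    return exp(-sigma), exp(sigma)


bands = {country: one_sigma_band(s) for country, s in SIGMAS.items()}

for country, (low, high) in bands.items():
    print(f"{country:8s} sigma = {SIGMAS[country]:.3f}: "
          f"{low:.0%} to {high:.0%} of the median")
```

Run as written, the bands come out close to the ranges quoted above, though not digit for digit; the ordering from widest (USA) to narrowest (Japan) is the point of the comparison.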

We can replace the curve shown above with the one below. The one above is the natural log of income ratios to the median; the one below is income by percentile (where 1 equals the median income, about $29,700 for the USA). Notice I chose to save space by making the top value 6 (times the median income, or $178,200 per year).

As we all know, incomes in the USA are not normally distributed at the high end: if they were, only one person in 1.29×10¹⁵, or 1 person in 4 million USAs, would make so much as $15 million per year. I chose that number arbitrarily; Excel cannot handle higher values.
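
The tail calculation can be reproduced outside Excel. The sketch below assumes the median ($29,700) and σ (0.795) quoted in this post and uses the complementary error function, which stays accurate far out in the tail. Results this deep in the tail are extremely sensitive to σ, so only the order of magnitude should be read from it; it need not reproduce the spreadsheet figure exactly.

```python
from math import erfc, log, sqrt

MEDIAN_US = 29_700.0  # median income quoted in the post
SIGMA_US = 0.795      # sigma of log income quoted in the post


def share_above(income: float, median: float = MEDIAN_US,
                sigma: float = SIGMA_US) -> float:
    """Share of a log-normal population earning more than `income`."""
    z = log(income / median) / sigma    # distance from the median in sigmas
    return 0.5 * erfc(z / sqrt(2.0))    # upper-tail area of the normal curve


p_tail = share_above(15_000_000.0)
one_in = 1.0 / p_tail

print(f"share above $15 million: {p_tail:.3e}")
print(f"that is one person in {one_in:.3e}")
```
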
I plotted income distribution for a log-normal distribution to get the Lorenz curve below. The Lorenz curve relates the cumulative percentage of income to the cumulative percentage of population. As we can see, in the graph of a country with a US-style distribution of income, the bottom 50% of the population has 20% of income; the top 20% of the population accrues 50% of the total income. The straight diagonal line reflects a perfectly flat distribution of income. We take the area between the two curves (A) and divide it by the area under the diagonal (A+B) to get the Gini Coefficient (A/[A+B]).
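
For a log-normal distribution the Lorenz curve and the Gini Coefficient have standard closed forms: L(p) = Φ(Φ⁻¹(p) − σ) and G = 2Φ(σ/√2) − 1, where Φ is the normal cumulative distribution. The sketch below evaluates them for σ = 0.795; it is the textbook formula, not the spreadsheet method behind the graph, so the analytic Gini need not match a coarse spreadsheet approximation.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()


def lorenz(p: float, sigma: float) -> float:
    """Share of total income held by the poorest fraction p of a log-normal population."""
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    return std_normal.cdf(std_normal.inv_cdf(p) - sigma)


def gini_lognormal(sigma: float) -> float:
    """Gini Coefficient A/(A+B) of a log-normal distribution, in closed form."""
    return 2.0 * std_normal.cdf(sigma / sqrt(2.0)) - 1.0


SIGMA_US = 0.795
bottom_half_share = lorenz(0.5, SIGMA_US)
top_fifth_share = 1.0 - lorenz(0.8, SIGMA_US)
gini_us = gini_lognormal(SIGMA_US)

print(f"bottom 50% of population holds {bottom_half_share:.1%} of income")
print(f"top 20% of population holds {top_fifth_share:.1%} of income")
print(f"Gini Coefficient: {gini_us:.3f}")
```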

Here are the Gini Coefficients (CG) for selected countries (Wiki).

These graphs and the ratios deduced from them reflect an idealized version of the country, very close to reality for the majority but a bit skewed for the extremes.¹ In the USA, for example, the Lorenz curve is distorted from the one illustrated by an extreme concentration of income at the 98th percentile, offset by a slightly lower share of income held by the 75th-97th percentiles. In fact, income levels for the 98th percentile are about $150 K per annum, but at higher levels this rapidly climbs far above any normal concentration.

This invites the question: suppose income is not normally distributed? Naturally, I've attempted to test other mathematical models. One of these was the idea that there are two groups in the USA, each of which has a different median income (Group B, representing 12.5% of the population, with a median and mean income 55% that of Group A, which represents 87.5% of the population). One rule of thumb was that 1.5% of Group B had to be in the 95th percentile of income.² As we shall see later, attempting to model national income distribution along racial lines has significant shortcomings.

The main shortcoming of this is that, if the population really were composed of two—or three, or more—disparate groups, all with normally distributed income (but with different medians), then this would have the same CG as the constituent populations. In other words, if the African American population has a median income 60% that of European Americans, but the σ income is the same, then this is obviously an injustice; yet it is an injustice that the CG will not show. Ironically, if the σ of income for African Americans is smaller, perhaps because their population is crowded into a narrower range of occupations, then CG will be lower (i.e., reflecting more equality) than the CG for the white population alone.

One additional quirk of this income distribution model: notice I keep using the median, rather than the mean (the way you were probably taught). The reason is that I'm working with non-negative numbers that vary greatly in size; only a minority will have an income above the mean, and that minority will be significantly smaller than 50%. If I use the mean as the basis upon which to calculate the standard deviation, then the curve has poorer predictive power than if I use the median; the variance estimates generated will be excessively large, while the coefficient of variation, i.e., the ratio of the standard deviation to the mean, will be excessively small (because the denominator will be too big). A corollary to this is that, if we hold the median income for each ethnic cohort constant and reduce the σ of income, then the average (mean) income of that cohort will decrease sharply; obviously, income within that cohort will become more equal. So if the σ of income for an ethnic group is reduced, reducing that cohort's share of the national income, then the Gini Coefficient will go down, suggesting greater equality.
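
The gap between median and mean can be made concrete. For a log-normal distribution the mean equals the median times exp(σ²/2), and the share of the population above the mean is 1 − Φ(σ/2). A brief sketch under those standard formulas (function names are mine), showing how both move when σ is reduced with the median held fixed:

```python
from math import exp
from statistics import NormalDist

std_normal = NormalDist()


def mean_over_median(sigma: float) -> float:
    """Ratio of mean income to median income for a log-normal distribution."""
    return exp(sigma ** 2 / 2.0)


def share_above_mean(sigma: float) -> float:
    """Fraction of the population whose income exceeds the mean."""
    return 1.0 - std_normal.cdf(sigma / 2.0)


# Sigma values quoted earlier in the post, from widest to narrowest.
for sigma in (0.795, 0.65, 0.55, 0.42):
    print(f"sigma = {sigma:.3f}: mean = {mean_over_median(sigma):.3f} x median, "
          f"{share_above_mean(sigma):.1%} of people earn more than the mean")
```

With the median fixed, a smaller σ pulls the mean down toward the median, which is the corollary described above.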

(Part 2)


NOTE: 1 In order to "test" or "model" these, I used an Excel spreadsheet. Column A had a running count from 0 to 1, with increments of 0.05; Column B had the NORMSINV of Column A. Column C had a function "=0.72*B" for the income, and the mathematically-inclined reader can easily imagine how I computed the Gini coefficient (CG) from that. Naturally, I chose the ratio "0.795" based on the fact that it yielded a CG of 0.404.

2 In order to model this, I repeated the process described in n1 above, this time with a median of 0.55 and a σ of 0.87. This implies that Group B has greater variance in income distribution than Group A. Then I used a weighted average of income for the two cohorts. After tweaking, the CG for Group A was 0.393, while that for Groups A and B was 0.404. Income distribution by percentage cohort matched that of African Americans (Group B) and non-AA (Group A) very closely.
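
The two-group procedure in note 2 can be approximated numerically. The sketch below is not the original spreadsheet: it builds a fine grid of quantiles for each group, weights each group by its population share, and computes the Gini Coefficient from the combined Lorenz curve. Group B's median (0.55) and σ (0.87) and the 12.5%/87.5% split come from the text; Group A's σ is not stated, so the 0.72 used here is an assumption borrowed from the "=0.72*B" formula in note 1.

```python
from math import exp
from statistics import NormalDist

std_normal = NormalDist()


def lognormal_grid(median, sigma, share, n=5000):
    """(income, weight) pairs at n evenly spaced quantiles of a log-normal group."""
    weight = share / n
    return [(median * exp(sigma * std_normal.inv_cdf((i + 0.5) / n)), weight)
            for i in range(n)]


def gini(weighted_incomes):
    """Gini Coefficient: 1 minus twice the area under the Lorenz curve."""
    points = sorted(weighted_incomes)
    total_weight = sum(w for _, w in points)
    total_income = sum(y * w for y, w in points)
    cum_income = 0.0
    area = 0.0
    for y, w in points:
        previous = cum_income / total_income
        cum_income += y * w
        # Trapezoid under the Lorenz curve for this slice of the population.
        area += (w / total_weight) * (previous + cum_income / total_income) / 2.0
    return 1.0 - 2.0 * area


SIGMA_A = 0.72  # assumption: not given in note 2
group_a = lognormal_grid(median=1.00, sigma=SIGMA_A, share=0.875)
group_b = lognormal_grid(median=0.55, sigma=0.87, share=0.125)

gini_a = gini(group_a)
gini_combined = gini(group_a + group_b)

print(f"Gini, Group A alone:      {gini_a:.3f}")
print(f"Gini, Groups A and B:     {gini_combined:.3f}")
```

Under these assumptions the two figures should land in the neighbourhood of the values reported in the note, with the combined population slightly less equal than Group A alone.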