An almost-as-famous alternative to the famous Maximum Likelihood Estimation is the Method of Moments. MM has always been a favorite of mine because it often requires fewer distributional assumptions than MLE, and also because MM is much easier to explain than MLE to students and consulting clients. CRAN has a package **gmm** that does MM, actually the Generalized Method of Moments, and in this post I’ll explain how to use it (on the elementary level, at least).

Our data set will be **bodyfat**, which is included in the **mfp** package, with measurements on 252 men. The first column of this data frame, **brozek**, is the percentage of body fat, which when converted to a proportion is in (0,1). That makes it a candidate for fitting a beta distribution.

Denoting the parameters of that family by α and β, the mean and variance are α/(α+β) and αβ/((α+β)²(α+β+1)), respectively. MM is a model of simplicity: we match these to the sample mean and variance of our data, and then solve for α and β. Of course, the solution part may not be so simple, due to nonlinearities, but **gmm** worries about all that for us. Yet another tribute to the vast variety of packages available on CRAN!
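For the beta family, the moment-matching step actually has a closed-form solution, so we can sketch the idea directly before bringing in **gmm**. This is a minimal illustration; **beta.mm** is just a name I'm using here, not part of any package:

```r
# Solving  m = a/(a+b)  and  v = a*b/((a+b)^2 (a+b+1))
# for (a,b), given sample mean m and variance v, gives the
# closed-form MM estimates below.  (Note var() uses the n-1
# divisor; MM purists would use n, but the difference is tiny.)
beta.mm <- function(x) {
  m <- mean(x)
  v <- var(x)
  s <- m * (1 - m) / v - 1   # this quantity is a+b
  c(alpha = m * s, beta = (1 - m) * s)
}

# sanity check on simulated beta data
set.seed(1)
x <- rbeta(10000, 4.67, 20.00)
beta.mm(x)   # should come out near (4.67, 20.00)
```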

In our elementary usage here, the call form is

```
gmm(momentftn,data,start)
```

where **momentftn** specifies the moments, **data** is our data in matrix or vector form, and **start** is our initial guess for the iterative solution process. Let's specify **momentftn**:

```
g <- function(th,x) {
   # th: current iterate of the parameter estimates (alpha, beta)
   # x:  the data vector
   t1 <- th[1]            # alpha
   t2 <- th[2]            # beta
   t12 <- t1 + t2
   meanb <- t1 / t12      # beta-family mean
   m1 <- meanb - x        # first moment condition
   m2 <- t1*t2 / (t12^2 * (t12+1)) - (x - meanb)^2  # second moment condition
   f <- cbind(m1,m2)      # one row per observation, one column per moment
   return(f)
}
```

This function equates population moments to sample ones, by specifying expressions that **gmm()** is to set to 0. The argument **th** here (“theta”) will be the MM estimates (at any given iteration) of the population parameters, in this case of α and β.

The function is required to specify quantities whose averages are to be set to 0. So, in the line

```
m1 <- meanb - x
```

we are saying that we want the average of the right-hand side to be 0, where **x** is our data. That has the effect of setting our current iteration’s estimate of α / (α + β) to our sample mean — exactly what MM is supposed to do. The line assigning to **m2** then does the same thing for variance.
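As a hand check of this logic, here is a sketch on simulated data rather than the body fat set, with **g()** repeated for completeness: at the MM solution, the column averages of what **g()** returns should be essentially zero.

```r
g <- function(th,x) {
  t1 <- th[1]; t2 <- th[2]
  t12 <- t1 + t2
  meanb <- t1 / t12
  m1 <- meanb - x
  m2 <- t1*t2 / (t12^2 * (t12+1)) - (x - meanb)^2
  cbind(m1,m2)
}

set.seed(2)
x <- rbeta(5000, 3, 7)

# closed-form MM estimates for the beta family
m <- mean(x); v <- var(x)
s <- m * (1 - m) / v - 1
th <- c(m * s, (1 - m) * s)

colMeans(g(th, x))   # both entries essentially zero
```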

So, let’s try all this out on the body fat data:

```
> library(mfp)
> data(bodyfat)
> x <- bodyfat$brozek/100   # body fat as a proportion
> gmm(g,x,c(alpha=0.1,beta=0.1))
Method
 twoStep

Objective function value:  2.559645e-10

  alpha    beta
 4.6714 19.9969

Convergence code =  0
> hist(bodyfat$brozek/100,xlim=c(0,1),
+    probability=TRUE)
> curve(dbeta(x,4.67,20.00),add=TRUE)
```

The result looks like this (apologies for the lack of refinement in this quick graph, cutting off part of the top):

[Histogram of brozek/100 with the fitted beta(4.67, 20.00) density curve overlaid]

At least visually, it seems to be a pretty good fit.

For standard errors etc., a method for the generic function **vcov()** is provided:

```
> gmmout <- gmm(g,x,c(alpha=0.1,beta=0.1))
> vcov(gmmout)
          alpha      beta
alpha 0.2809361 0.9606354
beta  0.9606354 3.9266874
```
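The standard errors themselves are the square roots of the diagonal of that matrix. A quick sketch, with the values hardcoded from the output above:

```r
# covariance matrix of the estimates, copied from vcov() output
V <- matrix(c(0.2809361, 0.9606354,
              0.9606354, 3.9266874), nrow = 2,
            dimnames = list(c("alpha","beta"), c("alpha","beta")))
se <- sqrt(diag(V))
se   # alpha about 0.53, beta about 1.98

# e.g. an approximate 95% confidence interval for alpha:
4.6714 + c(-1.96, 1.96) * se["alpha"]
```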

Happy GMM-ing!

Lots more posts coming, when I have time.

I'm pretty sure you should be using more moments and a weighting matrix for your objective function. What you describe is not GMM, since there is no estimate of goodness of fit and no means of testing your parameters. If the model works, you should be able to fit any number of moments, and GMM requires at least one more than the number of parameters. The objective function is quadratic in these moments (or Euler equations/first-order conditions), in large samples is normal, and the weighting matrix is chosen to approximate the covariance matrix.

GMM depends on large samples and the central limit theorem to make the machinery work, but is very general besides. I don't much care for it, since in MLE you have a sense of the geometry of the likelihood, and even Bayesians can agree that the MLE and the asymptotic parameter var-covar matrix matter, while GMM seems unrooted from all this. Easy to fit, but only making sense in large samples.

I believe that for ordinary MM, not GMM, all this should be fine. (I tried it on simulated data as a check, and it was fine.) GMM is much more complex.

Your comments on MM vs. MLE are interesting, but my viewpoint is different. Geometric considerations are aesthetically pleasing in a theoretical context, but I tend to look at things more pragmatically.

Again, I can’t say much about GMM, but the asymptotics of MLE stem from the same considerations as those of MM, i.e. Taylor expansion and the CLT. I may be misunderstanding your point, though, in which case please rephrase it.
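To make that concrete, here's a small simulation sketch (my own illustration, not part of the package): across many samples, the MM estimate of α is centered near the truth and roughly bell-shaped, which is the Taylor-expansion-plus-CLT story at work.

```r
set.seed(3)
mm.alpha <- replicate(2000, {
  x <- rbeta(500, 4.67, 20.00)   # samples of size 500 from the fitted beta
  m <- mean(x); v <- var(x)
  m * (m * (1 - m) / v - 1)      # closed-form MM estimate of alpha
})
mean(mm.alpha)    # near 4.67
hist(mm.alpha)    # roughly normal in shape
```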

How exactly do you compute `x`? Would it be possible for you to add that part of the code?

It is your data; you supply it to gmm(). Here it is bodyfat$brozek/100, the body fat measurements converted to proportions.

One weird thing about this package is that it assumes the data are time series: it allows correlation across observations and estimates the weight matrix accordingly. But in your case, and for others working with micro-data, we can assume independence across observations unless clusters are specified. Their method then seems to waste time accounting for correlations in nearby observations, and may make the estimator less efficient. Do you have a remedy for that?

I have not looked at the package lately, so I can’t say much, but I suspect it’s not an issue.