The Generalized Method of Moments and the gmm package

An almost-as-famous alternative to the famous Maximum Likelihood Estimation is the Method of Moments. MM has always been a favorite of mine because it often requires fewer distributional assumptions than MLE, and also because MM is much easier to explain than MLE to students and consulting clients. CRAN has a package gmm that does MM, actually the Generalized Method of Moments, and in this post I’ll explain how to use it (on the elementary level, at least).

Our data set will be bodyfat, which is included in the mfp package, with measurements on 252 men. The first column of this data frame, brozek, is the percentage of body fat, which when converted to a proportion is in (0,1). That makes it a candidate for fitting a beta distribution.

Denoting the parameters of that family by α and β, the mean and variance are α / (α + β) and α β / ((α + β)2 (α + β + 1)), respectively. MM is a model of simplicity — we match these to the sample mean and variance of our data, and then solve for α and β. Of course, the solution part may not be so simple, due to nonlinearities, but gmm worries about all that for us. Yet another tribute to the vast variety of packages available on CRAN!

In our elementary usage here, the call form is


where data is our data in matrix or vector form, momentftn specifies the moments, and start is our initial guess for the iterative solution process. Let’s specify momentftn:

g <- function(th,x) {
  t1 <- th[1]
  t2 <- th[2]
  t12 <- t1 + t2
  meanb <- t1 / t12
  m1 <- meanb - x
  m2 <- t1*t2 / (t12^2 * (t12+1)) - (x - meanb)^2
  f <- cbind(m1,m2)

This function equates population moments to sample ones, by specifying expressions that gmm() is to set to 0. The argument th here (“theta”) will be the MM estimates (at any given iteration) of the population parameters, in this case of α and β.

The function is required to specify quantities whose averages are to be set to 0. So, in the line

m1 <- meanb – x

we are saying that we want the average of the right-hand side to be 0, where x is our data. That has the effect of setting our current iteration’s estimate of α / (α + β) to our sample mean — exactly what MM is supposed to do. The line assigning to m2 then does the same thing for variance.

So, let’s try all this out on the body fat data:

> gmm(g,x,c(alpha=0.1,beta=0.1))

Objective function value: 2.559645e-10 

 alpha beta 
 4.6714 19.9969 

Convergence code = 0 
> hist(bodyfat$brozek/100,xlim=c(0,1),
> curve(dbeta(x,4.67,20.00),add=TRUE)

The result looks like this (apologies for the lack of refinement in this quick graph, cutting off part of the top):

At least visually, it seems to be a pretty good fit.

For standard errors etc., a method for the generic function vcov() is provided:

> gmmout <- gmm(g,x,c(alpha=0.1,beta=0.1))
> vcov(gmmout)
          alpha beta
alpha 0.2809361 0.9606354
beta 0.9606354 3.9266874

Happy GMM-ing!

Lots more posts coming, when I have time.


10 thoughts on “The Generalized Method of Moments and the gmm package”

  1. I’m pretty sure you should be using more moments and a weighting matrix for your objective function. What you describe is not GMM since there is no estimate of goodness of fit and no means of testing your parameters. If the model works you should be able to fit any number of moments and GMM requires at least one more than the number of parameters. The objective function is quadratic in these moments (or Euler equations/first order conditions) and in large sample is normal and the weighting function is chosen to approximate the covar matrix.).

    GMM depends on large sample and central limit theorem to make the machinery work but is very general bedside. I don’t much care for it since in MLE you have a sense of the geometry of the likelihood and even Bayesians can agree that the MLE and asymptotic parameter var-covar matters while GMM seems unrooted from this all. Easy to fit,but only making sense in large sample.

    1. I believe that for ordinary MM, not GMM, all this should be fine. (I tried it on simulated data as a check, and it was fine.) GMM is much more complex.

      Your comments on MM vs. MLE are interesting, but my viewpoint is different. Geometric considerations are aesthetically pleasing in a theoretical context, but I tend to look at things more pragmatically.

      Again, I can’t say much about GMM, but the asymptotics of MLE stem from the same considerations as those of MM, i.e. Taylor expansion and the CLT. I may be misunderstanding your point, though, in which case please rephrase it.

  2. One weird thing about this package is that it assumes the data are time series and allow correlation across observations and estimate the weight matrix allowing for that. But in your case and others working with micro-data, we can assume independence across observations unless clusters are specified. Then their method seems to have wasted time on accounting for correlations in nearby observations and may make the estimator less efficient. Do you have remedy about that?

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.