Many of you may have heard of ChatGPT, a dazzling new AI tool, and we are hearing lots of gushing praise for it. But how well does it do in data science contexts? I tried a few queries and found some interesting results.

I first asked, “Write an R function that returns every other element of a vector x, starting with the third.” I won’t show ChatGPT’s code here, but suffice it to say that it worked: the code was correct.
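For readers who want to try the same task, here is one way to do it in base R. This is a minimal sketch, not ChatGPT’s output, and the function name `every_other_from_third` is hypothetical, chosen only for illustration. The guard for short vectors is needed because `seq(3, 2, by = 2)` is an error in R.

```r
# Return every other element of x, starting with the third.
# every_other_from_third is a hypothetical name used for illustration.
every_other_from_third <- function(x) {
  # With fewer than 3 elements there is nothing to return;
  # x[0] gives an empty vector of the same type as x.
  if (length(x) < 3) return(x[0])
  x[seq(3, length(x), by = 2)]
}

every_other_from_third(10:20)
# [1] 12 14 16 18 20
```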

Next, “In R’s built-in dataset CO2, fit a linear regression with ‘uptake’ as the outcome variable, and find the standard error of the intercept term.” Alas, it pleaded ignorance: “…I do not have the ability to perform calculations or access specific datasets…” Not really true, since the data come with R itself; unfortunately, ChatGPT did not pick up on the key phrase “built-in.”

I then gave it what turned out to be an easier question: “In the output of R’s ‘lm’ function, how do I find the standard error of the intercept term?” It returned a correct and useful answer, stating that ‘lm’ returns lots of material, “…including the standard errors of the model coefficients. The standard error of the intercept term can be accessed by calling the summary function on the model object and looking at the output…”
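Here is a sketch of what the CO2 query was asking for. The query named only the outcome variable, so regressing `uptake` on `conc` alone is an assumption made purely for illustration. It follows the `summary()` route from ChatGPT’s answer and cross-checks the result against `vcov()`, which returns the estimated covariance matrix of the coefficients.

```r
# CO2 ships with R (package datasets), so no download is needed.
# Using conc as the sole predictor is an illustrative assumption.
fit <- lm(uptake ~ conc, data = CO2)

# summary() gives a coefficient table with a "Std. Error" column
coefs <- summary(fit)$coefficients
se_intercept <- coefs["(Intercept)", "Std. Error"]

# Cross-check: standard errors are the square roots of the diagonal
# of the coefficient covariance matrix
se_check <- sqrt(diag(vcov(fit)))[["(Intercept)"]]

all.equal(se_intercept, se_check)
# [1] TRUE
```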

Next: “In R, what are the advantages of using the density() function in lieu of hist()?” The reply had some correct ingredients, but had problems:

“…the resulting plot [from density()] is smoother and more continuous, which can make it easier to see the underlying distribution of the data. This can be especially useful when the data is noisy or when the number of data points is small…The density function allows you to specify the bandwidth, or the smoothness, of the curve, which can be useful for highlighting different features of the data. In contrast, the hist function does not allow you to control the smoothness of the resulting plot.”

Putting aside the fact that the phrase “more continuous” is nonsense, there are other issues. It is not necessarily true that density() will do better on small/noisy data. And the app misses the point that hist() does have an analog of the bandwidth, namely the bin width.
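To see that both functions have a smoothness knob, here is a short sketch using the built-in `faithful` data (an arbitrary choice for illustration). In `density()` the knob is the bandwidth `bw`; in `hist()` it is the bin width, set here through explicit `breaks`. The particular values are arbitrary.

```r
# Eruption durations (minutes) from the built-in faithful dataset
x <- faithful$eruptions

# density(): smoothness is controlled by the bandwidth
d_narrow <- density(x, bw = 0.1)  # wiggly curve
d_wide   <- density(x, bw = 0.5)  # smoother curve

# hist(): smoothness is controlled by the bin width, via breaks.
# The breaks must cover the data range (about 1.6 to 5.1 here).
h_narrow <- hist(x, breaks = seq(1.5, 5.5, by = 0.1), plot = FALSE)  # 40 bins
h_wide   <- hist(x, breaks = seq(1.5, 5.5, by = 0.5), plot = FALSE)  # 8 bins
```

To compare visually, one could draw `hist(x, breaks = seq(1.5, 5.5, by = 0.5), freq = FALSE)` and overlay `lines(d_wide)`. Each function also has a default for its knob: `bw.nrd0` for `density()` and Sturges’ rule for `hist()`.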

Finally, I asked a question that every statistician is asked by non-stat people: “How can public opinion polls be accurate if they are based on only a small fraction of the population?” The response (not shown here) notes some standard practices such as stratified sampling, but basically begs the question.

Say, for example, that 1200 people are sampled, as is common, and we stratify by race. That would mean we design the sample to include about 160 Black people. But anyone who wondered about the small sample overall would be even more puzzled as to why 160 African-Americans is “representative.”
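Some rough arithmetic shows why the subgroup figure should puzzle such a person even more. This sketch uses the textbook 95% margin of error for a proportion under simple random sampling, with the worst case p = 0.5; it ignores the design effects of stratification and weighting, so treat the numbers as a ballpark only.

```r
# Approximate 95% margin of error for a sample proportion,
# assuming simple random sampling (no design effects).
moe <- function(n, p = 0.5) 1.96 * sqrt(p * (1 - p) / n)

round(100 * moe(1200), 1)  # whole sample: about 2.8 percentage points
round(100 * moe(160), 1)   # subgroup of 160: about 7.7 percentage points
```

The subgroup margin is roughly sqrt(1200/160), or about 2.7, times as wide. Note also that the population size does not appear in this simple formula at all.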

So in this case, ChatGPT would give a very misleading answer to an important, common question.

And we see that machines can fail Statistics, just like college students. 🙂