Who invented the concept of over-fitting?



Below are the references I have found so far. In short, the term first appears in 1670, first appears with a close meaning in 1827, first appears in a biological paper in 1923, and first appears in statistics in 1935. However, the references indicate that there are gaps in this chronology.

The earliest reference I found is The flying pen-man; or, The art of short-writing by William Hopkins (teacher of stenography), 1670. However, it is a table of words with "overfit" as one of them, and the context or meaning is not clear.

I checked the other references before 1800, and those were transcription errors of "over, fit" or "overfet".

The earliest appearance in context that I found is in "The Enigmatical Entertainer and Mathematical Associate ..., Issue 1" from 1827, which says: "I , though for poetry not overfit , 7 Will boldly dare to lay down rules for wit ;". Here "overfit" means "fit too well", but in a positive sense.

It seems that Darwin's "On the Origin of Species" in 1859 made the concept of fitting popular.

Bray, Charles I. "Fitting livestock for show." Bulletin (Colorado Agricultural College. Extension Service); 171A (1923): "They believe that these are almost equally valuable, and that there is less danger of over-fitting the pigs than with whole milk."

The 1934 paper "Twenty-Two Years of the Eastern Percheron Futurity" mentions the penalty of over-fitting, though with regard to stallions.

The Quarterly Review of Biology, Sep 1935, Volume 10, Number 3, pp. 341–377, says: "Perhaps we are old fashioned but to us a six-variate analysis based on thirteen observations seems rather like overfitting". So the concept was already well known, and maybe even a bit old-fashioned, by 1935.

The first reference to overfitting that I found in a statistics publication appears in "Tests of Fit in Time Series", 1952. Yet it says: "Such a graduation implies a gross overfitting, but this can be allowed for.", treating overfitting as a well-known concept.

The Wikipedia article on overfitting references an Oxford dictionary entry that claims: "Origin 1930s; earliest use found in Quarterly Review of Biology. From over- + fitting."

It is possible that the reference is The Quarterly Review of Biology, Sep 1935, Volume 10, Number 3, pp. 341–377, yet I couldn't verify it.

I couldn't find the paper in Google Scholar. According to the Google Ngram Viewer, the usage seems to have started around 1885. Google Books results for that period show references about sheep.

I'd be happy to learn that sheep are the origin of a central concept in AI, but I'd like to see references. References not related to sheep are welcome too.


Posted 2020-11-24T06:30:54.983

Reputation: 2 463


I am not familiar with the Google Ngram Viewer, but does this graph show some usage of this word in the 18th century? https://books.google.com/ngrams/graph?content=overfitting&year_start=1700&year_end=1800&corpus=26&smoothing=0&direct_url=t1%3B%2Coverfitting%3B%2Cc0#t1%3B%2Coverfitting%3B%2Cc0

– Vladislav Gladkikh – 2020-11-30T01:21:34.227

Nice! I updated the question with that. – DaL – 2020-11-30T07:16:33.557



Yes! It seems that the first statistics reference to overfitting appears in The Quarterly Review of Biology. It says: "Perhaps we are old fashioned but to us a six-variate analysis based on thirteen observations seems rather like overfitting."

I am attaching a screenshot of that particular page for your reference.

Vidya Ganesh

Posted 2020-11-24T06:30:54.983

Reputation: 176

Cool! The range is smaller now. However, they also refer to overfitting as a well-known concept, so it probably appeared earlier. Thanks! – DaL – 2020-12-02T05:46:23.683