Lots of people use the term *big data* in a rather *commercial* way, to indicate that large datasets are involved in the computation and that potential solutions must therefore perform well. Of course, *big data* always carries associated terms, like scalability and efficiency, but what exactly defines a problem as a *big data* problem?

Does the computation have to serve a specific purpose, such as data mining or information retrieval, or could an algorithm for general graph problems be labeled *big data* if the dataset were *big enough*? Also, how *big* is *big enough* (if that can even be defined)?

"Anything too big to load into Excel" is the running joke. – Spacedman – 2014-06-11T12:07:51.477

It's precisely 1 GB. That's the cutoff in the rule book. There is no room for ambiguity. – Hack-R – 2016-06-18T02:14:37.553

This is an excellent question. As the variety of answers shows, the definition is... undefined. – Manu H – 2016-08-09T10:13:15.123

A nice article about when your data starts to be too big for normal usage: chrisstucchio.com/blog/2013/hadoop_hatred.html – Johnny000 – 2014-05-14T07:48:10.370

That depends on whether it is just being thrown in as a buzzword. – John Robertson – 2015-04-21T20:52:25.963