Tag: bigdata

91 How big is big data? 2014-05-14T03:56:20.963

51 Is the R language suitable for Big Data 2014-05-14T11:15:40.907

49 How to deal with version control of large amounts of (binary) data 2015-02-13T10:09:25.177

45 Data Science in C (or C++) 2015-03-20T14:56:23.420

41 Opening a 20GB file for analysis with pandas 2018-02-13T14:03:39.623

37 Do I need to learn Hadoop to be a Data Scientist? 2014-06-10T06:20:20.817

33 How to do SVD and PCA with big data? 2014-09-25T08:40:59.467

25 Data Science Project Ideas 2014-07-25T18:36:31.340

23 Improve the speed of t-SNE implementation in Python for huge data 2016-02-06T14:19:10.243

18 Uses of NoSQL database in data science 2014-07-21T13:41:13.427

17 Use liblinear on big data for semantic analysis 2014-05-14T01:57:56.880

14 When are p-values deceptive? 2014-05-14T22:12:37.203

14 Big data case study or use case example 2014-06-11T06:07:45.767

14 Looking for example infrastructure stacks/workflows/pipelines 2014-06-17T10:37:22.987

14 Is Python suitable for big data 2014-07-18T22:34:48.080

13 When does a relational database have better performance than a non-relational one? 2014-05-17T04:53:03.913

13 Is FPGrowth still considered "state of the art" in frequent pattern mining? 2014-07-12T17:25:52.907

13 Can we benefit from transfer learning while training word2vec models? 2016-03-10T21:01:20.697

12 How does a query into a huge database return with negligible latency? 2014-05-15T11:22:27.293

12 Tradeoffs between Storm and Hadoop (MapReduce) 2014-06-01T10:25:51.163

12 Preference Matching Algorithm 2014-06-18T22:10:58.497

12 What is an 'old name' of data scientist? 2015-02-28T22:10:58.473

11 Working with HPC clusters 2014-07-08T13:45:07.583

10 Why is it hard to grant efficiency while using libraries? 2014-05-18T14:02:51.350

10 Handling a regularly increasing feature set 2014-06-30T09:43:01.940

10 How do various statistical techniques (regression, PCA, etc) scale with sample size and dimension? 2014-08-05T18:36:12.753

10 Scalable Outlier/Anomaly Detection 2014-10-17T10:47:13.197

10 Which is faster: PostgreSQL vs MongoDB on large JSON datasets? 2015-06-03T20:29:40.490

10 Machine Learning Best Practices for Big Dataset 2016-09-07T22:40:00.723

10 Avoid reloading DataFrame between different python kernels 2017-01-17T23:08:13.620

10 Difference between interpolate() and fillna() in pandas 2017-12-23T08:03:03.610

9 Human activity recognition using smartphone data set problem 2014-05-27T10:41:33.220

8 Cascaded Error in Apache Storm 2014-06-01T12:51:25.040

8 How to compare experiments run over different infrastructures 2014-06-15T00:00:51.657

8 Filtering spam from retrieved data 2014-06-15T15:11:29.970

8 Original Meaning of "Intelligence" in "Business Intelligence" 2015-09-05T16:42:25.473

8 Understanding how distributed PCA works 2017-04-19T08:58:18.707

7 Lambda Architecture - How to implement the Merge Layer / Query Layer 2015-01-02T20:03:59.950

7 How will AdaBoost be used for solving regression problems? 2015-08-31T05:45:00.513

7 How to deal with large training data? 2016-11-28T05:47:53.123

7 Can one build linear models on "chunks" of the data set, if one can't build them on the entire data set? 2018-05-10T15:23:52.593

7 Computational aspects are typically ignored by statisticians 2018-07-19T08:46:10.267

6 Which Big Data technology stack is most suitable for processing tweets, extracting/expanding URLs and pushing (only) new links into 3rd party system? 2014-05-15T00:39:33.433

6 Is Data Science just a trend or is it a long-term concept? 2014-05-18T19:46:44.653

6 Looking for a strong PhD Topic in Predictive Analytics in the context of Big Data 2014-09-25T20:18:46.880

6 How to detect overfitting of a stock screener 2015-03-02T23:02:45.583

6 What's an efficient way to compare and group millions of store names? 2015-08-20T20:46:07.740

6 Classifying transactions as malicious 2015-09-15T03:15:28.163

6 Classifier and Technique to use for large number of categories 2015-09-26T11:58:37.963

6 Random Forests with Big Data - number of trees v. number of observations 2015-11-02T15:42:45.377

6 Database options for JSON storage, queried with Apache Drill 2015-12-31T19:18:24.280

6 Git for Deep Learning - what are the best tools for versioning/tracking machine learning experiments? 2018-08-02T06:51:32.400

6 Differences between big data, data warehousing, business intelligence and data science? 2018-10-01T17:21:42.240

6 XGBoost Huge Dataset ~1TB 2019-06-15T08:05:34.913

5 Efficient solution of fmincg without providing gradient? 2014-06-21T04:59:06.620

5 What technologies are fastest at performing joins on large datasets? 2014-11-09T14:11:18.350

5 Need help with LDA for selecting features 2015-05-28T22:07:35.367

5 What are the most concrete and easiest to understand applications of deep learning in the industry? 2016-03-05T11:18:28.513

5 Machine Learning in Spark 2016-06-21T09:40:45.333

5 Array of categorical variables vs one-hot encoding 2017-05-23T22:33:11.657

5 Dealing with population instability 2018-02-19T17:20:47.160

5 SGDClassifier fit and partial_fit functions 2018-04-11T15:38:16.330

4 Is there a replacement for small p-values in big data? 2014-05-15T00:26:11.387

4 Amazon S3 vs Google Drive 2014-06-14T23:52:10.490

4 How to measure execution time on distributed system 2014-06-17T05:55:04.710

4 Pig script code error? 2014-07-24T06:26:07.290

4 Database for a trie, or other appropriate structure for recommendation engine 2014-08-07T22:30:52.913

4 Distributed Scalable Decision Trees 2014-10-20T22:22:09.660

4 Anomaly detection in multiple parameters 2014-11-02T07:20:32.603

4 How to set up multi-cluster Spark without Hadoop on Google Compute Engine 2014-12-07T16:31:57.913

4 How to apply AdaBoost to more "complex" (non-binary) classifications/data fitting? 2014-12-26T06:53:29.670

4 Can we access HDFS file system and YARN scheduler in Apache Spark? 2015-01-30T18:55:46.173

4 Machine Learning on financial big data 2015-02-11T10:48:51.903

4 Learning resources for data science to win political campaigns? 2015-04-03T03:07:50.493

4 What types of features are used in a large-scale click-through rate prediction problem? 2015-04-30T12:26:47.637

4 Reference about social network data-mining 2015-05-01T15:19:17.943

4 How does MLlib in Spark select variables in logistic regression? 2015-05-04T13:26:04.767

4 NoSQL engine/service recommendation for geolocation data 2015-05-05T14:37:25.377

4 How to explain decision tree algorithm in layman's terms? 2015-08-11T02:15:24.130

4 Tools to perform SQL analytics on 350TB of csv data 2016-01-07T02:33:51.253

4 Simple Explanation of Apache Flume 2016-01-11T12:46:06.030

4 Fixed-radius range search in non-Euclidean space 2017-02-16T12:18:27.260

4 Is MLlib compulsory to work with distributed data? 2017-10-11T22:45:47.400

4 How to get Big Data Sets? 2018-07-06T17:25:18.523

4 Why will an imbalanced data set bias the prediction model towards the more common class? 2018-09-09T03:41:29.727

4 Pandas vs Linux Datascience 2018-09-11T21:18:12.640

4 Merging dataframes in Pandas is taking a surprisingly long time 2019-01-24T04:59:16.090

4 How can one quickly look up people from a large database? 2019-04-19T10:16:45.770

3 HBase connector - Thrift or REST 2014-06-10T06:19:46.510

3 Data preparation and machine learning algorithm for click prediction 2014-06-30T12:05:38.597

3 Pig Latin code error 2014-07-24T06:34:50.083

3 SAP HANA vs Exasol 2014-09-02T08:47:38.737

3 Prerequisites for Data Science 2014-09-23T03:34:35.750

3 What is "data science"? 2014-12-06T06:53:14.617

3 What are the differences between Apache Spark and Apache Flink? 2015-01-28T17:40:32.450

3 What is the best Big-Data framework for stream processing? 2015-01-29T07:32:19.207

3 sk-learn - ValueError: array is too big. 2015-03-23T14:00:02.720

3 Application of Control Theory in Data Science 2015-05-16T18:09:15.630

3 Data produced as an output to Dumbo API of Python not getting distributed to all the nodes of cluster 2015-06-27T06:34:46.957