Tag: bigdata

67 How big is big data? 2014-05-14T03:56:20.963

39 Is the R language suitable for Big Data? 2014-05-14T11:15:40.907

33 Do I need to learn Hadoop to be a Data Scientist? 2014-06-10T06:20:20.817

32 Data Science in C (or C++) 2015-03-20T14:56:23.420

30 How to deal with version control of large amounts of (binary) data 2015-02-13T10:09:25.177

22 Data Science Project Ideas 2014-07-25T18:36:31.340

20 How to do SVD and PCA with big data? 2014-09-25T08:40:59.467

17 Use liblinear on big data for semantic analysis 2014-05-14T01:57:56.880

14 Uses of NoSQL databases in data science 2014-07-21T13:41:13.427

13 When does a relational database have better performance than a non-relational one? 2014-05-17T04:53:03.913

13 Big data case study or use case example 2014-06-11T06:07:45.767

13 Looking for example infrastructure stacks/workflows/pipelines 2014-06-17T10:37:22.987

12 What is an 'old name' of data scientist? 2015-02-28T22:10:58.473

11 When are p-values deceptive? 2014-05-14T22:12:37.203

10 Tradeoffs between Storm and Hadoop (MapReduce) 2014-06-01T10:25:51.163

10 Is Python suitable for big data? 2014-07-18T22:34:48.080

9 How does a query into a huge database return with negligible latency? 2014-05-15T11:22:27.293

9 Why is it hard to guarantee efficiency while using libraries? 2014-05-18T14:02:51.350

9 Preference Matching Algorithm 2014-06-18T22:10:58.497

9 Working with HPC clusters 2014-07-08T13:45:07.583

9 Scalable Outlier/Anomaly Detection 2014-10-17T10:47:13.197

9 Which is faster: PostgreSQL vs MongoDB on large JSON datasets? 2015-06-03T20:29:40.490

9 Can we take advantage of transfer learning while training word2vec models? 2016-03-10T21:01:20.697

8 Handling a regularly increasing feature set 2014-06-30T09:43:01.940

8 Is FPGrowth still considered "state of the art" in frequent pattern mining? 2014-07-12T17:25:52.907

8 How do various statistical techniques (regression, PCA, etc) scale with sample size and dimension? 2014-08-05T18:36:12.753

8 Improve the speed of a t-SNE implementation in Python for huge data 2016-02-06T14:19:10.243

7 Human activity recognition using smartphone data set problem 2014-05-27T10:41:33.220

7 Cascaded Error in Apache Storm 2014-06-01T12:51:25.040

7 Original Meaning of "Intelligence" in "Business Intelligence" 2015-09-05T16:42:25.473

7 Avoid reloading DataFrame between different Python kernels 2017-01-17T23:08:13.620

6 Which Big Data technology stack is most suitable for processing tweets, extracting/expanding URLs and pushing (only) new links into 3rd party system? 2014-05-15T00:39:33.433

6 Is Data Science just a trend or is it a long-term concept? 2014-05-18T19:46:44.653

6 How to compare experiments run over different infrastructures 2014-06-15T00:00:51.657

6 Filtering spam from retrieved data 2014-06-15T15:11:29.970

6 Lambda Architecture - How to implement the Merge Layer / Query Layer 2015-01-02T20:03:59.950

6 What's an efficient way to compare and group millions of store names? 2015-08-20T20:46:07.740

6 Classifier and technique to use for a large number of categories 2015-09-26T11:58:37.963

6 Machine Learning Best Practices for Big Dataset 2016-09-07T22:40:00.723

5 How to measure execution time on distributed system 2014-06-17T05:55:04.710

5 Efficient solution of fmincg without providing gradient? 2014-06-21T04:59:06.620

5 Looking for a strong PhD topic in Predictive Analytics in the context of Big Data 2014-09-25T20:18:46.880

5 Random Forests with Big Data - number of trees v. number of observations 2015-11-02T15:42:45.377

5 Database options for JSON storage, queried with Apache Drill 2015-12-31T19:18:24.280

5 What are the most concrete and easiest to understand applications of deep learning in the industry? 2016-03-05T11:18:28.513

5 How to deal with large training data? 2016-11-28T05:47:53.123

5 Understanding how distributed PCA works 2017-04-19T08:58:18.707

5 Array of categorical variables vs one-hot encoding 2017-05-23T22:33:11.657

5 Opening a 20GB file for analysis with pandas 2018-02-13T14:03:39.623

4 HBase connector - Thrift or REST 2014-06-10T06:19:46.510

4 Distributed Scalable Decision Trees 2014-10-20T22:22:09.660

4 What technologies are fastest at performing joins on large datasets? 2014-11-09T14:11:18.350

4 How to set up multi-cluster Spark without Hadoop on Google Compute Engine 2014-12-07T16:31:57.913

4 How to apply AdaBoost to more "complex" (non-binary) classifications/data fitting? 2014-12-26T06:53:29.670

4 Can we access HDFS file system and YARN scheduler in Apache Spark? 2015-01-30T18:55:46.173

4 Machine Learning on financial big data 2015-02-11T10:48:51.903

4 What types of features are used in a large-scale click-through rate prediction problem? 2015-04-30T12:26:47.637

4 Reference about social network data-mining 2015-05-01T15:19:17.943

4 NoSQL engine/service recommendation for geolocation data 2015-05-05T14:37:25.377

4 Need help with LDA for selecting features 2015-05-28T22:07:35.367

4 Decision trees on a mix of categorical and real-valued parameters 2016-04-19T12:37:05.593

4 Machine Learning in Spark 2016-06-21T09:40:45.333

4 Training data from different sources 2016-08-14T21:01:09.887

3 Amazon S3 vs Google Drive 2014-06-14T23:52:10.490

3 Prerequisites for Data Science 2014-09-23T03:34:35.750

3 Anomaly detection in multiple parameters 2014-11-02T07:20:32.603

3 What is "data science"? 2014-12-06T06:53:14.617

3 What are the differences between Apache Spark and Apache Flink? 2015-01-28T17:40:32.450

3 What is the best Big-Data framework for stream processing? 2015-01-29T07:32:19.207

3 How to detect overfitting of a stock screener 2015-03-02T23:02:45.583

3 Learning resources for data science to win political campaigns? 2015-04-03T03:07:50.493

3 How does MLlib in Spark select variables in logistic regression? 2015-05-04T13:26:04.767

3 Application of Control Theory in Data Science 2015-05-16T18:09:15.630

3 Data produced as output by the Dumbo Python API not getting distributed to all nodes of the cluster 2015-06-27T06:34:46.957

3 Is our data "Big Data" (Startup) 2015-07-28T14:55:13.340

3 How to explain the decision tree algorithm in layman's terms? 2015-08-11T02:15:24.130

3 Question on reservoir sampling 2015-09-11T18:02:44.743

3 Classifying transactions as malicious 2015-09-15T03:15:28.163

3 How to predict the duration of a burst given several series 2015-10-30T02:49:08.590

3 Tools to perform SQL analytics on 350TB of CSV data 2016-01-07T02:33:51.253

3 How to define user churn 2016-01-25T10:33:46.983

3 Sentiment Analysis of Movie Reviews using Python 2016-04-16T03:52:54.283

3 Fixed-radius range search in non-Euclidean space 2017-02-16T12:18:27.260

3 Deploying models on bigdata platforms like Hadoop and Spark 2017-03-09T12:32:53.170

3 Is MLlib compulsory to work with distributed data? 2017-10-11T22:45:47.400

2 Data preparation and machine learning algorithm for click prediction 2014-06-30T12:05:38.597

2 Cannot make user directory on a new CDH5 installation (Hadoop) 2014-07-03T14:18:14.387

2 Pig script code error? 2014-07-24T06:26:07.290

2 Pig Latin code error 2014-07-24T06:34:50.083

2 Database for a trie, or other appropriate structure for recommendation engine 2014-08-07T22:30:52.913

2 Pig Rank function not generating rank in output 2014-08-08T17:32:48.377

2 SAP HANA vs Exasol 2014-09-02T08:47:38.737

2 Creating a bag of words 2014-09-08T19:33:00.253

2 General approaches for grouping a continuous variable based on text data? 2014-09-13T17:13:23.373

2 Could someone please offer me some guidance on a particular, SPECIFIC project that I could attempt to "get my feet wet," so to speak? 2015-01-22T00:52:51.710

2 Which Big-Data frameworks have the simplest interfaces? 2015-01-30T21:04:30.173

2 How does append work in HDFS? Where is the newly created instance of the file placed? 2015-03-05T18:28:43.367

2 sk-learn - ValueError: array is too big. 2015-03-23T14:00:02.720

2 Optimizing Weka for large data sets 2015-04-22T12:32:10.243