Tag: apache-hadoop

37 Do I need to learn Hadoop to be a Data Scientist? 2014-06-10T06:20:20.817

31 What are the use cases for Apache Spark vs Hadoop 2014-06-17T20:48:35.267

15 What is the difference between Hadoop and noSQL 2014-05-14T10:44:58.933

12 Tradeoffs between Storm and Hadoop (MapReduce) 2014-06-01T10:25:51.163

12 Does Amazon RedShift replace Hadoop for ~1XTB data? 2014-06-11T04:24:04.183

11 Can map-reduce algorithms written for MongoDB be ported to Hadoop later? 2014-05-18T12:03:21.650

10 What are R's memory constraints? 2014-05-14T17:48:21.240

8 Cascaded Error in Apache Storm 2014-06-01T12:51:25.040

8 Data science and MapReduce programming model of Hadoop 2014-07-28T16:17:49.823

8 Good books for Hadoop, Spark, and Spark Streaming 2014-12-05T05:50:29.903

7 Linear Regression in R Mapreduce(RHadoop) 2014-07-03T10:49:50.993

7 Lambda Architecture - How to implement the Merge Layer / Query Layer 2015-01-02T20:03:59.950

6 Processing data stored in Redshift 2014-11-12T17:27:57.850

6 Is there a benefit to using hadoop with only one node? 2015-10-11T01:05:04.113

6 Improve k-means accuracy 2016-02-02T01:42:38.053

6 How to make k-means distributed? 2016-02-06T02:38:20.750

5 Storing Sensor Data for Analysis of the Office 2015-07-03T08:52:36.247

5 Can all statistical algorithms be parallelized using a Map Reduce framework 2015-08-26T20:44:06.540

5 Skills that school doesn't teach you 2016-08-17T19:08:17.143

5 Saving Large Spark ML Pipeline to HDFS 2018-01-08T16:19:33.187

4 Pig script code error? 2014-07-24T06:26:07.290

4 Hadoop for grid computing 2014-09-04T18:13:57.343

4 Hive: How to calculate the Kendall coefficient of correlation of a pair of a numeric columns in the group? 2014-12-01T14:52:31.827

4 How to set up multi cluster spark without hadoop on Google Compute engine 2014-12-07T16:31:57.913

4 Can we access HDFS file system and YARN scheduler in Apache Spark? 2015-01-30T18:55:46.173

4 Is there any point in learning Hadoop in 2018? 2018-12-23T15:19:13.280

3 HBase connector - Thrift or REST 2014-06-10T06:19:46.510

3 Cloudera QuickStart VM Error 2014-07-09T17:51:40.583

3 Hadoop Resource Manager Won't Start 2014-07-19T20:51:58.527

3 Pig latin code error 2014-07-24T06:34:50.083

3 java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream 2014-12-09T00:58:03.057

3 Data produced as an output to Dumbo API of Python not getting distributed to all the nodes of cluster 2015-06-27T06:34:46.957

3 Is our data "Big Data" (Startup) 2015-07-28T14:55:13.340

3 Deploying models on bigdata platforms like Hadoop and Spark 2017-03-09T12:32:53.170

3 How to setup a home-laptop cluster to 'practice' elasticsearch, hadoop, mesos and spark 2017-07-06T18:02:16.953

3 Communication between name node, data node and client in hadoop by analysing Packet Capturing 2018-05-20T21:34:09.857

3 BERT in production 2020-02-27T17:16:12.383

3 What is the main difference between Hadoop and Spark? 2020-09-05T11:28:44.113

2 Difference Between Hadoop Mapreduce(Java) and RHadoop mapreduce 2014-06-27T12:03:53.357

2 Cannot make user directory on a new CDH5 installation (Hadoop) 2014-07-03T14:18:14.387

2 Pig Rank function not generating rank in output 2014-08-08T17:32:48.377

2 Using Shark with Apache Spark 2014-08-26T21:37:12.107

2 Differences in scoring from PMML model on different platforms 2014-10-17T13:58:39.353

2 Error when using MAX in Apache Pig (Hadoop) 2015-02-09T00:18:46.427

2 How append works in hdfs? Where the newly created instance of file is placed? 2015-03-05T18:28:43.367

2 Can Hadoop be beneficial when data is in database tables and not in a file system 2015-08-26T20:49:40.957

2 How to read contents of a CSV file inside zip file using spark (python) 2016-05-05T23:43:27.647

2 unable to parse XML in pig 2016-05-10T16:35:06.310

2 spark item similarity recommendation 2016-11-01T09:20:22.320

2 Mahout Spark shell not working 2016-11-02T09:44:46.437

2 How many people can use a single Hadoop cluster at one time? 2016-11-13T20:25:56.503

2 K-means clustering on big data stored on multiple nodes on HDFS 2017-02-10T06:02:39.710

2 Ingestion of periodic REST API Calls into Hadoop 2017-03-07T14:06:42.100

2 Are jobs the only way out for data scientists? 2017-05-09T08:57:50.920

2 Why has Hadoop failed to become popular? 2017-05-29T18:21:27.990

2 Why does my master node get heap memory full for inbuilt SVD API in Apache Spark during calculation of inverse of a square matrix? 2017-11-15T11:00:26.433

2 Yarn service parameter for pseudodistributed 2017-12-31T11:39:33.533

2 Best practice for developing using Spark 2018-02-09T12:59:34.347

2 does storing file in hdfs parallelize it for Spark? 2018-04-28T18:51:11.863

2 Are parquet files compressed? 2019-12-22T05:00:41.837

1 Masters thesis topics in big data 2014-10-19T11:02:44.397

1 Hadoop/Pig Aggregate Data 2014-12-23T19:46:57.267

1 Can we use HDFS and big data Analytics for processing huge log files being processed through some application on some central server? 2015-06-18T07:09:08.140

1 How to use REST API to execute Map-Reduce Task? 2015-07-07T09:23:45.250

1 how to disable query from beeline results 2015-11-03T13:09:48.553

1 Pig is not able to read the complete data 2015-12-17T07:58:13.057

1 A Simple Explanation of ZooKeeper in Hadoop 2016-01-11T12:19:27.103

1 freebcp getting stalled for huge data 2016-02-11T13:36:17.973

1 Is there a text on Apache Spark that attempts to be as comprehensive as White's Hadoop: The Definitive Guide'? 2016-06-04T11:37:38.327

1 Suggestions on what patterns/analysis to derive from Airlines Big Data 2016-06-22T17:26:53.710

1 How to multiply a "fat and short" matrix with a "tall and thin" matrix using MapReduce? 2016-07-01T09:14:41.140

1 How to Scaling Out Artifical Neural Networks? 2016-10-31T07:33:34.623

1 Can R + Hadoop overcome R's memory constraints in any case? 2017-03-24T20:19:21.747

1 what ETL technique should i use for text documents using Hadoop? 2017-04-16T17:40:16.257

1 Hadoop Cluster Capacity Planning 2017-08-15T10:20:16.817

1 Hadoop and input informations divided in splits 2017-12-26T18:42:37.020

1 Custom Writable Serialization in Hadoop 2018-01-03T10:28:22.260

1 Hadoop - checksum while reading file from client 2018-01-05T12:54:06.900

1 Hdfs Data Balance on Cluster 2018-01-06T12:17:45.160

1 Tasks of a Yarn Process in Hadoop 2018-01-06T12:58:10.217

1 Change log level of yarn? 2018-05-19T19:07:07.987

1 Unable to open application master UI in spark1.6.1 in cluster mode 2018-08-28T02:22:02.673

1 Remote sensing image data storage in hadoop hdfs 2018-09-19T07:36:47.567

1 Accumulators in Spark (PySpark) without global variables? 2018-10-29T15:46:15.243

1 Datanode not starting on Slave Nodes for Apache Hadoop+Spark Setup 2019-03-31T11:33:53.407

1 Mapreduce jobs not working in hive 2019-07-31T16:09:49.667

1 How to create tensors in spark? 2019-08-06T23:46:29.600

1 How to push tasks to remote Hadoop/Spark cluster 2019-12-20T14:24:52.087

1 Loading file into and out of HDFS via system call/cmd line vs using libhdfs 2020-03-27T16:48:36.253

1 cannot access hive from spark 2020-03-30T22:44:33.393

0 Can hadoop with Spark be configured with 1GB RAM 2014-12-07T04:40:53.677

0 Questions on "Active Archive" 2015-01-24T15:15:27.777

0 Extract company names/job titles from free text 2015-02-09T17:28:54.390

0 When it is time to use Hadoop? 2015-02-17T19:10:05.547

0 Accessing directory of small files as one file 2015-07-28T12:40:39.007

0 Yarn timeline recovery not enabled error upgrading via ambari 2015-10-25T23:37:21.817

0 Predictive Analytics on distributed systems vs standalone system 2016-07-01T18:51:04.507

0 Spark algorithm to make a link analysis 2016-08-26T13:13:06.470

0 Machine Learning model to find items that are frequently bought together using Hadoop Spark 2016-08-29T14:15:22.463