Tag: apache-spark

10 Issue with IPython/Jupyter on Spark (Unrecognized alias) 2015-07-23T03:45:36.867

9 How to calculate the mean of a dataframe column and find the top 10% 2015-07-22T14:16:22.823

9 Merging multiple data frames row-wise in PySpark 2016-04-22T04:27:45.507

9 Spark ALS: recommending for new users 2016-10-24T21:13:33.707

7 Spark, optimally splitting a single RDD into two 2015-05-01T20:32:51.900

7 Server log analysis using machine learning 2015-11-27T18:11:03.323

6 Using Apache Spark to do ML. Keep getting serializing errors 2014-07-25T21:03:44.663

6 How to run a pyspark application in windows 8 command prompt 2015-06-21T17:31:05.457

6 How to convert categorical data to numerical data in Pyspark 2015-06-29T22:55:28.100

6 SPARK Mllib: Multiclass logistic regression, how to get the probabilities of all classes rather than the top one? 2015-12-17T10:52:10.013

6 Extracting individual emails from an email thread 2017-06-01T13:02:23.683

5 Local Development for Apache Spark 2015-02-15T04:51:21.167

5 Why does logistic regression in Spark and R return different models for the same data? 2015-05-07T13:23:47.440

5 Item-Item similarity based on text 2015-07-28T16:15:43.783

5 Random Forest Regression. How to represent really long list of categories for processing 2015-12-14T16:58:41.163

5 SPARK, ML: Naive Bayes classifier often assigns 1 as probability prediction 2015-12-16T14:55:27.443

5 Reading CSVs with new lines in fields with Spark 2016-07-11T21:02:40.633

5 Calculate cosine similarity in Apache Spark 2016-08-10T05:43:41.613

5 Understanding how distributed PCA works 2017-04-19T08:58:18.707

4 Choosing between Storm+Trident-ML, Storm+SAMOA or Spark Streaming+MLlib 2015-03-30T04:35:58.667

4 Performance profiling and tuning in Apache Spark 2015-05-07T20:08:05.440

4 Can theano work on mapreduce or on spark 2015-07-09T21:29:17.050

4 Scan-based operations Apache Spark 2015-10-12T15:23:01.260

4 Distributed k-means in Spark 2016-02-10T22:53:49.620

4 When does cache get expired for a RDD in pyspark? 2016-05-10T12:38:18.240

4 How to start prediction from dataset? 2016-06-09T00:02:39.277

4 Machine Learning in Spark 2016-06-21T09:40:45.333

4 Replace all numeric values in a pyspark dataframe by a constant value 2016-10-19T23:22:22.527

4 RDD of gzipped files to "uncompressed" Dataframe 2016-11-10T23:50:52.223

4 Using Spark for finding similar users to a user? 2017-07-04T12:35:41.023

4 Saving Large Spark ML Pipeline to HDFS 2018-01-08T16:19:33.187

3 How does MLlib in Spark select variables in logistic regression 2015-05-04T13:26:04.767

3 Which Spark MLlib regression algorithm is suitable for numeric predictions based on non-numeric features? 2015-11-27T02:54:52.780

3 Sampling with replacement, specify the probabilities 2015-12-18T16:19:00.960

3 How to select particular column in Spark(pyspark)? 2016-01-03T02:10:10.643

3 Why is Spark's LinearRegressionWithSGD very slow locally? 2016-02-28T17:25:28.147

3 Algorithm Suggestion For a Specific Problem 2016-04-12T12:56:51.450

3 Unbalanced class: class_weight for ML algorithms in Spark MLLib 2016-12-07T00:08:48.120

3 Order SparseVectors by the closest distance to given SparseVector 2017-03-03T12:45:02.607

3 Deploying models on bigdata platforms like Hadoop and Spark 2017-03-09T12:32:53.170

3 Clustering a very large number of very small clusters with most data unrelated 2017-06-12T16:40:59.607

3 What are the tools to speed up the running time of machine learning algorithms? 2018-02-28T16:55:43.977

2 Scalable open source machine learning library written in python 2015-07-09T20:38:22.933

2 What makes a graph algorithm a good candidate for concurrency? 2015-07-28T22:14:31.177

2 How to determine Nonnegativity in Matrix Factorization? 2015-12-10T20:22:54.097

2 ARIMAX with spark-timeseries 2016-01-20T19:03:33.203

2 How to decide the number of trees parameter for Random Forest algorithm in PySpark MLlib? 2016-01-21T22:51:03.573

2 How to predict an approximate weekly/monthly number, when the Unique Daily Visitors for that week/month are already known 2016-01-25T11:03:17.520

2 Use spark_csv inside Jupyter and using Python 2016-01-25T13:57:24.367

2 Spark ALS-WR giving the same recommended items for all users 2016-02-10T14:50:49.610

2 Which is the most appropriate algorithm to use with MLlib for predicting prices 2016-02-16T08:50:44.807

2 How to read contents of a CSV file inside zip file using spark (python) 2016-05-05T23:43:27.647

2 How to interpret upper-triangular matrix of cosine similarities 2016-06-20T14:02:37.407

2 Solution for Time/Space Complexity challenge in Recommendation System? 2016-08-08T05:45:31.677

2 value saveAsTextFile is not a member of org.apache.spark.sql.DataFrame 2016-09-02T11:05:02.417

2 Do categorical features always need to be encoded? 2016-09-13T13:15:01.727

2 Task not serializable Error 2016-09-14T12:56:58.240

2 ARIMA(X) Validation 2016-09-14T19:10:31.557

2 Hashing trick with random forest in scala 2016-09-22T08:34:17.947

2 Mahout Spark shell not working 2016-11-02T09:44:46.437

2 Spark 1.6.1 - Determining the number of clusters in a data set 2016-11-21T18:46:06.700

2 Model params tuning 2017-03-07T16:08:36.270

2 ALS in Spark: what loss function is it minimizing? 2017-07-03T15:14:24.617

2 How to setup a home-laptop cluster to 'practice' elasticsearch, hadoop, mesos and spark 2017-07-06T18:02:16.953

2 PySpark dataframe repartition 2018-02-22T10:19:01.260

1 How to convert a SQLContext Dataframe to RDD of vectors in Python? 2015-07-01T21:12:52.027

1 Reference of SVM Using Spark 2015-09-22T19:53:12.657

1 Implicit Training Models in Spark MLlib? 2016-03-23T14:18:50.757

1 What version of spark in latest Cloudera QuickStart VirtualBox? 2016-05-11T16:53:49.887

1 Unknown program 'spark-itemsimilarity' chosen 2016-05-13T18:29:47.550

1 Unable to load NLTK in spark using PySpark 2016-05-18T03:19:58.333

1 Spark Scala alternative Machine Learning Library? 2016-05-27T09:57:50.057

1 Is there a text on Apache Spark that attempts to be as comprehensive as White's 'Hadoop: The Definitive Guide'? 2016-06-04T11:37:38.327

1 Spark DataFrame courses 2016-06-05T03:01:24.253

1 Spark MLlib recommendation - restaurant/ item similarity - issues/improvement 2016-06-08T12:58:38.227

1 How to multiply a "fat and short" matrix with a "tall and thin" matrix using MapReduce? 2016-07-01T09:14:41.140

1 How to construct the document-topic matrix using the word-topic and topic-word matrix calculated using Latent Dirichlet Allocation? 2016-07-15T17:46:44.880

1 Loading and querying a Spark machine learning model outside of Spark 2016-07-27T14:21:52.663

1 Market Basket Analysis - Data Modelling 2016-08-29T12:49:26.950

1 Scalable training/updating of many small LSTM models 2016-08-31T11:15:32.410

1 Why does logistic regression in Spark MLlib not use maximum likelihood estimation? 2016-09-04T19:01:31.330

1 PrefixSpan model input RDD format 2016-09-26T23:35:01.490

1 spark item similarity recommendation 2016-11-01T09:20:22.320

1 Spark MLLib - how to re-use TF-IDF model 2016-11-01T19:07:37.377

1 Dataframe request with groupBy 2016-11-23T21:01:01.640

1 Filtering outliers in Apache Spark based on calculations of previous values 2016-12-07T06:20:35.227

1 Supervised Recommendation System trained on labeled phrase segments 2016-12-18T15:42:43.527

1 Spark Deeplearning4j Training Problem 2017-01-10T07:37:05.683

1 Spark SQL Pivot CrossTab functionality 2017-02-23T18:30:22.867

1 HTML Words Remover? 2017-03-14T19:00:23.950

1 Issue with Spark SVD 2017-03-15T05:37:14.407

1 Is there a way to find the weights of every feature in spark ml model? 2017-06-22T12:15:25.957

1 How to improve naive Bayes multiclass classification accuracy? 2017-06-27T07:08:55.867

1 Plot RDD data using a pyspark dataframe from csv file 2017-06-28T08:46:51.793

1 Finding lookalike for large number of users 2017-08-04T07:36:29.367

1 How to adapt the LBFGS algorithm to accept a different data input parameter in Spark? 2017-08-06T08:11:00.520

1 Creating a user-based recommendation engine? 2017-08-22T12:35:58.567

1 Unable to use the Python Data Frame method "iloc" on a Data Frame created in pyspark's SQLContext 2017-09-06T06:13:58.977

1 Not enough replicas available for query at consistency all (2 required but only 1 alive) 2017-11-30T15:29:35.163