How to download and combine S&P 500 stock prices?



Updated: I want to download historical S&P 500 stock prices and combined them into one big file, having only one column of date referencing all stocks. Here is what I have do so far.

dateStart = {2009, 1, 1};
dateEnd = {2018, 8, 1};
period = "Day";(*interval could be "Week","Day","Week","Month","Year"*)
snpstocks = FinancialData["SP500", "Members"];
snpprices = FinancialData[snpstocks, {dateStart, dateEnd, period}];

By doing this, I have all the historical stock prices as an individual time series. Series are of different length. I am trying to find union of all these series keeping first column as date reference for all series. If any series is starting latter than my start date, it will have missing values for earlier dates. Similarly, if any series is missing for latter dates, it will be missing in combined file.

As always, any help is greatly appreciated.


Posted 2014-11-11T01:03:38.380

Reputation: 2 167

Maybe use quandl ? See my answer here

– faysou – 2014-11-11T07:43:17.157

@ faysou, thanks for your comment. I have seen your post. Its pretty cool. I put this question to have more options in downloading data using built-in Mma function. At the same time, it also helps to learn how to merge unequal time series data. – ramesh – 2014-11-11T15:04:04.973

@rka I don't know what you are using this for but keep in mind that the members of the S&P500 have changed over time. So I'm not certain if you're going to get what you think you have gotten at the end of this exercise. – Edmund – 2014-11-11T16:12:09.970

@ Edmund, you are absolutely right. The members of S&P500 keep on changing. But this is an example. If we can make it work, we can change the symbols or even add more symbols and have all prices in one file. Then we can analyze stocks using other powerful Mma functions. – ramesh – 2014-11-11T18:26:55.087

@ramesh, ironically, my concern coincided with yours but I never thought of to deal with S&P data. I cannot download it with your code to check but some useful methods you may find in the question. If you will make it successfully, you may post your anwer here - it would be nice to read.

– garej – 2016-01-25T18:05:17.597

@ garej, I have done my work but the code is not neat. I will try to post later. Thanks – ramesh – 2016-01-26T19:37:18.143

@ramesh, that is fine: note that if you make a space between @ and nickmane in comment or post, you adressee will not get any notification. they should be stick together. I came here just occasionally, not knowing that you post something... – garej – 2016-01-26T20:54:41.677



From the question formulation I am not sure what is the desired end result: a time series or a table. It seems to be the latter but I give solutions for both.

I am using a sample of the stocks for clarity.

stdate = "04/21/1982";
enddate = "10/31/2014";

rSP = {"ADP", "ALL", "CNP", "ED", "EMR", "EXPD", "FB", "FLIR", "HAR", 
   "NEE", "OKE", "PHM", "PLD", "PX", "WAT", "XYL"};

Read the stocks time series:

mystock = 
  Table[TimeSeries[FinancialData[rSP[[i]], {stdate, enddate}]], {i, 

Merge into a single time series object

Get the time paths of each time series and covert each path into a list of date->value rules:

dateValRules = 
    Append[Rule @@@ mystock[[i]]["Path"], _?NumberQ -> 0]], {i, 

Create the full set of dates:

dr = DateRange[stdate, enddate];
drt = AbsoluteTime /@ dr;

Create the combined time series object:

combTS = TimeSeries[Transpose@Map[drt /. # &, dateValRules], {drt}];


DateListPlot[combTS, PlotRange -> All, Joined -> False, 
 PlotTheme -> "Detailed"]

enter image description here

Table of dates vs stock values

The code below makes table with rows corresponding to the dates and columns corresponding to the stock IDs. Put the stocks data into a large list of triplets {id, date, value}:

allStock = 
  Join @@ Table[
    Flatten /@ Thread[{rSP[[i]], mystock[[i]]["Path"]}], {i, 

Next we convert into a sparse matrix. For this we corresponding mapping to indices.

dr = DateRange[stdate, enddate];
stockNameToIndexRules = Dispatch@Thread[rSP -> Range[Length[rSP]]];
dateToIndexRules = 
  Dispatch@Thread[(AbsoluteTime /@ dr) -> Range[Length[dr]]];

Verify that all dates are indexed:

Complement[allStock[[All, 2]], Normal[dateToIndexRules][[All, 1]]]

(* Out[444]= {} *)

Make the sparse matrix:

arules = Map[Most[#] -> Last[#] &, 
   allStock /. stockNameToIndexRules /. dateToIndexRules];
smat = Transpose@
               {Length[Normal@stockNameToIndexRules], Length[Normal@dateToIndexRules]}, 

(Note that I put "NA" as the implicit value.)

Tabulate a sample of the sparse matrix:

TableForm[Normal[smat[[2000 ;; 2030, All]]], 
 TableHeadings -> {DateString[#, {"Day", ".", "Month", ".", "Year"}] & /@ Normal[dateToIndexRules][[All, 1]], 
   Normal[stockNameToIndexRules][[All, 1]]}]

enter image description here

We can re-create the sparse matrix with the implicit value being 0 and plot the row sums:

smat0 = 
   arules, {Length[Normal@stockNameToIndexRules], Length[Normal@dateToIndexRules]}]

(* Out[448]= SparseArray[< 95387 >, {11882, 16}] *)

DateListPlot[Transpose[{dr, Total /@ smat0}], PlotTheme -> "Detailed", Joined -> False]

enter image description here

Anton Antonov

Posted 2014-11-11T01:03:38.380

Reputation: 32 565

Might there be any benefit to using Dataset[] here? – J. M.'s ennui – 2016-05-18T00:15:57.093

@J.M. I am almost embarrassed to say that did not think about Dataset, but I did think about solutions using R's xtabs, or melt\cast. Another point is that for me with a sparse matrix several numerical and structural manipulations are fairly easy to do.

– Anton Antonov – 2016-05-18T01:06:58.603

@Anton, thank you for taking your time to answer this question. I have been using R for this purpose and it is so easy. When it come to handle time series data in Ma, it is so hard at least for me. This provides me a very good starting point. Thanks a lot, really happy! – ramesh – 2016-05-18T17:04:21.677

@ramesh You are welcome. The thing about R is that there are a lot of cookbooks or question-answer posts that can help you when stuck. Unfortunately, many R libraries follow their own logic and that makes much harder the learning process of using R. – Anton Antonov – 2016-05-18T17:17:47.433


snpstocks = FinancialData["SP500", "Members"];
snpprices = FinancialData[snpstocks[[1 ;; 10]], {{2010, 1, 1}, {2019, 4, 1}}]; (*Selecting only 10 symbols but some are missing*)
snppricesfinal =TimeSeriesThread[# &, snpprices, ResamplingMethod ->Missing[]];
DateListPlot[snppricesfinal, PlotRange -> All]

enter image description here


Posted 2014-11-11T01:03:38.380

Reputation: 2 167