I have a bigdata CSV file as a single table of five columns and almost 10 million rows

3

3

I need to examine the bigdata file with a database/DBMS and have used MS Office 365/Access successfully so to do, import and query. I have also tried, many times, to open a MMA 10.4 database link connection to examine the CSV file (and also the .accdb file resulting from success with Access) under Windows 7.

I never had any success using MMA and choosing either SQLite or HSQL engines. OpenSQLConnection[] gets me into and out of a multistep dialog OK. I seem to have achieved a connection, but it's never one that ever satisfies MMA. For example, SQLTableNames and SQLTableInformation (to list column headings) always fail to produce any correct answers.

Do I need to import then export the .CSV file in a format the MMA is happier with during the multi-step dialog? Some file format other than .csv or other than .accdb? Do I need to specify some properties or identify a "catalog" during the OpenSQLConnection[] dialog? TIA for a reply, Lew R

Per Bill, I scraped the header row and first five data rows of the database. See attached screen shot:

ShangraLa test table

Since the way to upload 'live' small .csv and .xlsl files is not obvious to me (maybe any file uploading is not welcome on this site), I will instead ponder how best to provide them. Maybe uploading the files to my website is the best answer.

Link to Google Drive's publicly shared folder containing 3 small files (2 csv, 1 xlsx) and short MMA notebook. Try downloading small test files and short ShangraLa.nb code for testing.

Edit: Here's the mini CSV example as plain text:

Site_Name,Parameter,DateTime_LST,Value,Unit
Shangra-LA,AmbT,1/1/1990 0:00,4.7,degC
Shangra-LA,AmbT,1/1/1990 1:00,4.8,degC
Shangra-LA,AmbT,1/1/1990 2:00,4.9,degC
Shangra-LA,AmbT,1/1/1990 3:00,4,degC
Shangra-LA,AmbT,1/1/1990 4:00,3.3,degC

Lew R

Posted 2017-03-09T22:18:12.013

Reputation: 31

3First thing to try: Hop into Office, scrape the first ten lines of the database and save that into exactly the same kind of file(s). See if Mathematica can happily pull in that tiny database. If so then that points at the size of bigdata. But if Mathematica can't pull in the tiny database then you would be able to post tiny sanitized example files somewhere so someone can grab the copies and try to figure out what you need to tell Mathematica to successfully use the (small) database. – Bill – 2017-03-09T22:29:06.947

Bill, mini database image added, per your suggestion. I ran it as a .csv file – Lew R – 2017-03-10T04:23:10.550

Bill, a scraped, mini, database image added, per your suggestion. I entered it as a .csv file using OpenSQLConnection[ ], specified HSQL, and got a True connection Nothing good happened after that. SQLTableNames gave an empty string { }. SQLTableInformation output (wrong) header information for MMA's sample database 'demo'. I can post a screen shot of the notebook's execution if helpful, with a few comments at the end. – Lew R – 2017-03-10T04:43:17.730

For Bill and Nasser, OK, I will attempt to post three 'live' versions of the mini database, all created from MS Excel 2016, itself loaded with the original 'bigdata' .csv file. Their names will be; MiniLa.csv, ShangraLa.csv, MiniLa.xlsx. This may take me a while, stumbling to upload them. – Lew R – 2017-03-10T06:11:12.220

For Bill and Nasser; The last two red lines added above that begin – Lew R – 2017-03-10T18:09:09.637

For Bill and Nasser; The last two red lines added above that begin 'Link to Google Drive's ... ' are a clickable link that (hopefully) will find an appropriate folder that you can open, find, and download MiniLa and ShangraLa scraped files, suitable for testing. Also included is a short MMA notebook one might wish to use for testing, albeit with a minor change or two. – Lew R – 2017-03-10T18:15:39.540

your example is plenty small enough to post here (AS TEXT). Don't make people go chasing google drive links. – george2079 – 2017-03-10T18:24:12.380

For George, thanks for the text. Only trying to give everyone their option, download, TEXT, screen shot, or whatever. For anyone interested, the dozen line MMA notebook I used for testing is also available at Google drive, IF the link works without password blocking. Lew R – Lew R – 2017-03-10T21:11:36.007

When a novice is trying to get MMA to read non-MMA files I think it is better to have real files, otherwise we don't know if my file exactly equals your file. Next, Google for Mathematica OpenSQLConnection and skipping the Wolfram pages shows years of people not being able to figure out how to get this to work. One person wrote "I had endless problems connecting and getting things to work, but after a lot of back and forward between Wolfram support and myself I got the following to work..." but he is doing something different. Try that Google search. Try Wolfram support if you have that. – Bill – 2017-03-12T02:58:43.153

For Bill, Thanks for your very sage advice re both 'real' files, which I knew Google collaboration could provide (did a G download work?) and MMA OpenSQLConnection issues others had. I was surprised when Wolfram folks approached me--out of the blue--wondering if I had any MMA code projects with problems. I of course replied, yes bigdata/database problems. So we had an interesting dialog for several months that ended when no $$ support materialized--I won't go into details on that issue. Suffice to say that I then examined both MMA and non-MMA ways to deal with bigdata databases – Lew R – 2017-03-13T19:32:47.613

(1)Bill, Continued . . . I looked at both MS (Access and Excel) and MMA resources to handle bigdata, and quickly found I could get MS to work well with .csv files, no problems loading Access and Excel. To date I have never had any similar .csv file success writing MMA code. So why post the very simple real database example (6 lines, per your advice). What did I learn? (A) I cannot write MMA code that opens a very simple monotable database. (B)To follow your suggestions. Read more @(2), posted next,. . . . . . – Lew R – 2017-03-13T20:49:59.933

(2)Bill, Continued. . . So I will do a Google search with key word OpenSQLConnection and read about "years of people not being able . . .!!" Here I suggest a little humor may help. Take a look at all the quotes from the Dutchman Dijkstra (first Turing award winner) about coding and coders. Here is the URL, https://en.wikiquote.org/wiki/Edsger_W._Dijkstra. Bottom line, It's not worth the time and costs to de-mystify MMA 'OpenSQLConnection failed execution' code reports. Hence move on; Examine the bigdata with MS Excel and Access, and hope that someday can do the same with MMA.

– Lew R – 2017-03-13T20:56:59.113

@LewR I've used DatabaseLink` with SQLite databases. I'll see what of that can be transferred to CSV. – b3m2a1 – 2017-03-14T02:27:12.667

Thanks MB1965 for your investigation. Want to acquire a simple .csv test file? Click the link at the end of my initial post (two red lines at end of post) and download the file ShangraLa.csv. from the jump to Google Drive collaboration. – Lew R – 2017-03-15T20:18:04.777

For MB1965, More on how to acquire a simple.csv table file for testing, in case linking to Google Drive a problem. Just manually type, line-by-line in 5 separate columns, the 6 line screen shot. Type it into an MS Excel empty spreadsheet (appears when Excel first opened). Save the result as both .csv and .xls files. Find the 6 line screen shot to copy in my initial post. – Lew R – 2017-03-15T20:43:41.877

For MB1965, Just downloaded and installed SQLite and the SQLite Gui tool. On the SQLite website, one finds the claim that the 'Lite' engine will import and export .csv files (and several other formats as well.) – Lew R – 2017-03-16T17:58:59.540

Answers

3

So more and more I think what you're trying to do is simply infeasible via Mathematica.

The bundled Microsoft Access driver (UCanAccess) won't touch Excel and the old Microsoft Excel driver has potentially been removed. This is, I think, why we get errors when trying to build a JDBC link to the file using either of those drivers.

But one alternative, if, indeed, your data isn't too massive, is to simply use Mathematica to move your data over to SQLite:

I'm going to assume you already have a file, but for my use I'll build some random CSV file:

$dbfile = CreateFile[];

$data = RandomReal[1, {100, 100}];
$keys = ToUpperCase@
     StringJoin@{"key_", 
       Alphabet[][[If[# < 26, {0}, {0, 1}] + 
          IntegerDigits[#, 26]]]} & /@ Range[Length@$data[[1]]];
Export[$dbfile, Prepend[$keys]@Transpose@$data, "CSV"];

Then we simply dump this to a random SQL database:

In[414]:= $sqldb = CreateFile[];
$conn =
  OpenSQLConnection[
   JDBC[
    "SQLite",
    $sqldb
    ]
   ];

In[416]:= $dbdata = Transpose@Import[$dbfile];
$dbtable = FileBaseName@$dbfile;
sqlColFormat[{colName_, data___}] :=

  With[{heads = DeleteDuplicates@Map[Head, {data}]},
   SQLColumn[colName,
    "DataTypeName" -> 
     If[Length@heads > 1, "Expr", ToString@First@heads]
    ]
   ];

In[419]:= SQLCreateTable[$conn, SQLTable[$dbtable],
  sqlColFormat /@ $dbdata
  ];

In[421]:= SQLColumns[$conn, $dbtable] // Take[#, 10] &

Out[421]= {SQLColumn[{"m000068920271", "KEY_A"}, 
  "DataTypeName" -> "REAL", "DataLength" -> 2000000000, 
  "Default" -> Null, "Nullable" -> 0], 
 SQLColumn[{"m000068920271", "KEY_B"}, "DataTypeName" -> "REAL", 
  "DataLength" -> 2000000000, "Default" -> Null, "Nullable" -> 0], 
 SQLColumn[{"m000068920271", "KEY_C"}, "DataTypeName" -> "REAL", 
  "DataLength" -> 2000000000, "Default" -> Null, "Nullable" -> 0], 
 SQLColumn[{"m000068920271", "KEY_D"}, "DataTypeName" -> "REAL", 
  "DataLength" -> 2000000000, "Default" -> Null, "Nullable" -> 0], 
 SQLColumn[{"m000068920271", "KEY_E"}, "DataTypeName" -> "REAL", 
  "DataLength" -> 2000000000, "Default" -> Null, "Nullable" -> 0], 
 SQLColumn[{"m000068920271", "KEY_F"}, "DataTypeName" -> "REAL", 
  "DataLength" -> 2000000000, "Default" -> Null, "Nullable" -> 0], 
 SQLColumn[{"m000068920271", "KEY_G"}, "DataTypeName" -> "REAL", 
  "DataLength" -> 2000000000, "Default" -> Null, "Nullable" -> 0], 
 SQLColumn[{"m000068920271", "KEY_H"}, "DataTypeName" -> "REAL", 
  "DataLength" -> 2000000000, "Default" -> Null, "Nullable" -> 0], 
 SQLColumn[{"m000068920271", "KEY_I"}, "DataTypeName" -> "REAL", 
  "DataLength" -> 2000000000, "Default" -> Null, "Nullable" -> 0], 
 SQLColumn[{"m000068920271", "KEY_J"}, "DataTypeName" -> "REAL", 
  "DataLength" -> 2000000000, "Default" -> Null, "Nullable" -> 0]}

And now you can use all of the standard SQL stuff on this. At one point I wrote some stuff for manipulating SQL databases with DatabaseLink (using an OOP framework I've since abandoned) and if I remember correctly it's not bad but also not totally smooth.

b3m2a1

Posted 2017-03-09T22:18:12.013

Reputation: 42 610

0

The answer is to use MS Office 365 program Access. It imports the .csv file with no difficulty and answers key operational querys. MMA efforts to get a working connection to SQLite that will also import the bigdata file and answer same querys to continue, with minimal priority.

Lew R

Posted 2017-03-09T22:18:12.013

Reputation: 31