How to get data from DB into dataset with keys

4

I'm a Mathematica newbie and I have the following problem: I connect to a Oracle DB and read out a set of variables, further on I want to histogram some of the variables. A Dataset seems to be the right tool, but I produce only datasets without a key via:

DBtest = SQLSelect[db, "TABLE1", {"ID", "VAR_1"}]

Dataset[DBtest]

Result is a Dataset with one column containing ID and the other containing VAR_1 (sorry for the generic names - I can't copy my real work, since it contains sensitive data).

Then I can't use the DBtest[Histogram, "key"] functionality, since there is no "key" available.

Is there anyway to add keys after filling the dataset from the DB? Or shouldn't I use the Dataset class?

Lea

Posted 2014-10-29T11:02:40.047

Reputation: 900

2

You have to thread the keys "ID" and "VAR_1", like RunnyKine's answer here http://mathematica.stackexchange.com/questions/59684/incorrect-result-with-semanticimport/59687#59687 ... Dataset[AssociationThread[First @ data, #] & /@ Rest @ data]

– alancalvitti – 2014-10-29T19:07:47.773

Perfect! Now it works, many thanks! – Lea – 2014-10-30T12:22:04.807

Answers

1

For discussion purposes, let's use some data from Mathematica's example databases:

$connection = OpenSQLConnection["publisher"];

$dbTest = SQLSelect[$connection, "SALESDETAILS", {"TITLE_ID", "QTY_ORDERED"}]

(* {{"PY2002", 75}, {"PY2002", 10}, <<17>> , {"CK4017", 20}, {"CK4016", 20}} *)

We can create a histogram directly from this data without wrapping it in a Dataset:

Histogram[$dbTest[[All, 2]]]

histogram screenshot

As observed in the question, a Dataset with this data carries no column names since the original result set is comprised of lists instead of associations:

$dataset = Dataset[$dbTest]

dataset screenshot

However, the columns are still accessible by their indices:

$dataset[Histogram, 2]

histogram screenshot

When we retrieve the original SQL data, we have the option of including a header row by means of the "ShowColumnHeadings" option. We can use the headers from this row to create associations that provide names for our columns:

$dataset2 =
  SQLSelect[$connection, "SALESDETAILS", "ShowColumnHeadings" -> True] //
    Inner[#2->#1&, Rest@#, First@#, Association]& // Dataset

dataset screenshot

Now we can produce histograms using the column names:

$dataset2[Histogram, "QTY_ORDERED"]

histogram screenshot

WReach

Posted 2014-10-29T11:02:40.047

Reputation: 62 787

This solution works as well. And the definition of $dataset2, including the SQL connection, is quite elegant. The way to produce histograms without using Dataset at all, was new for me. Thanks! – Lea – 2014-11-17T09:24:43.690