What is an alternative name for "Unstructured Data"?

6

I'm writing my thesis at the moment, and for some time, for lack of a proper alternative, I've stuck with "unstructured data" to refer to natural, free-flowing text, e.g. Wikipedia articles.

This nomenclature has bothered me from the very beginning, because it opens a debate that I don't want to get into: "unstructured" implies that natural language lacks structure, which it does not (syntax being the most obvious example). The term also carries a negative connotation, since it is the opposite of "structured", which is generally regarded as positive. That debate is not the focus of my thesis, although the "unstructured" aspect itself plays an important role.

I completely agree with the writer of this article, but he proposes no alternative except "rich data", which doesn't capture my point. The point I'm trying to make is that the text lacks a traditional database-like (e.g. tabular) structure, in which every piece of data has a clear data type and semantics that are easy for computer programs to interpret. Of course I'd like to condense this definition into a single term, but so far I've been unsuccessful in coming up with one or in finding an acceptable taxonomy in the literature.

Benjamin B.

Posted 2015-06-23T12:59:13.607

Reputation: 235

In the context of text mining I've seen "annotated data" vs. "unannotated data" or "raw data". – Suzana – 2015-06-23T13:55:09.790

@Suzana_K, that already seems a bit better. Thanks for the suggestion! More suggestions are very welcome. – Benjamin B. – 2015-06-23T13:56:38.377

Answers

4

It is a bad idea to contrast "unstructured data" with, say, tabular data (as in "non-tabular data"), because you would then have to eliminate the other alternatives as well (e.g., "non-tabular and non-graph and ... data"). "Plain text" (my choice), "raw text", or "raw data" all sound fine.

victor

Reputation: 288

7

"Raw data" is what we say in NLP.

L. Amber O'Hearn

Reputation: 81

0

I would just say "text" or "textual data", and "mixed data" for a combination of fixed fields and blobs of text in a database.

Peter De Bie

Reputation: 1