How to store datasets of lexical connections?

I'm investigating how to store semantic-lexical connections between words (relationships to other words and phrases, dependency type, connection strength, part of speech, language, etc.) in order to analyze input text.

I assume this has already been done. If so, to avoid reinventing the wheel: is there an efficient method for storing and managing such data in some common format that has already been researched and tested?

kenorb

Posted 2016-08-03T12:41:06.953

Reputation: 9 163

Answers

I would think you could use a graph database, perhaps Neo4j or Titan or something of that nature. Or, if you want a simple file format, you could use one of the many formats that exist for representing graphs. You can find a list and overview of some of them here.
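As a concrete illustration, here is a minimal sketch (plain Python, no graph library) of modeling lexical connections as a labeled, weighted graph and serializing it to a simple tab-separated edge list, one of the simplest of the graph file formats mentioned above. The words, relation names, and strengths are made-up examples, not from any standard lexicon:

```python
from collections import defaultdict

class LexicalGraph:
    """Directed graph: word -> list of (related word, relation, strength)."""

    def __init__(self):
        self.edges = defaultdict(list)
        self.attrs = {}  # per-word attributes, e.g. part of speech, language

    def add_word(self, word, pos=None, lang=None):
        self.attrs[word] = {"pos": pos, "lang": lang}

    def add_relation(self, src, dst, relation, strength):
        self.edges[src].append((dst, relation, strength))

    def neighbors(self, word):
        return self.edges.get(word, [])

    def to_edge_list(self):
        """Serialize as tab-separated lines: src, dst, relation, strength."""
        lines = []
        for src, targets in self.edges.items():
            for dst, relation, strength in targets:
                lines.append(f"{src}\t{dst}\t{relation}\t{strength}")
        return "\n".join(lines)

g = LexicalGraph()
g.add_word("dog", pos="noun", lang="en")
g.add_word("canine", pos="noun", lang="en")
g.add_relation("dog", "canine", "synonym", 0.9)
g.add_relation("dog", "bark", "collocation", 0.7)
print(g.to_edge_list())
```

A graph database would give you the same node/edge/property model with indexing and query support (e.g. Cypher in Neo4j) on top.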

Another option would be to store them as RDF triples using a triplestore such as Apache Jena.
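To show what that looks like, here is a sketch that formats the same kind of lexical relations as RDF triples in N-Triples syntax, which a triplestore can load directly. The `example.org` namespace and predicate names are invented for illustration; a real project would likely reuse an established vocabulary (e.g. SKOS or Lemon/OntoLex):

```python
# Hypothetical namespace for this example; not a standard vocabulary.
EX = "http://example.org/lex/"

def triple(subject, predicate, obj, literal=False):
    """Format one RDF triple as an N-Triples line."""
    o = f'"{obj}"' if literal else f"<{EX}{obj}>"
    return f"<{EX}{subject}> <{EX}{predicate}> {o} ."

triples = [
    triple("dog", "synonym", "canine"),
    triple("dog", "partOfSpeech", "noun", literal=True),
    triple("dog", "language", "en", literal=True),
]
print("\n".join(triples))
```

The appeal of RDF here is that every fact, including word attributes like part of speech, is a uniform subject-predicate-object statement, and you get SPARQL querying for free from the triplestore.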

mindcrime


Reputation: 3 471

If I understand you correctly, you should check out Word2Vec. From Wikipedia:

Word2vec is a group of related models that are used to produce word embeddings. These models are shallow, two-layer neural networks that are trained to reconstruct linguistic contexts of words. Word2vec takes as its input a large corpus of text and produces a high-dimensional space (typically of several hundred dimensions), with each unique word in the corpus being assigned a corresponding vector in the space. Word vectors are positioned in the vector space such that words that share common contexts in the corpus are located in close proximity to one another in the space.
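A toy illustration of that last point: words are points in a vector space, and "close proximity" is typically measured with cosine similarity. The 3-dimensional vectors below are invented for the example; real Word2vec vectors have hundreds of dimensions and are learned from a corpus:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up embeddings for illustration only.
vectors = {
    "dog": [0.9, 0.1, 0.2],
    "cat": [0.8, 0.2, 0.3],
    "car": [0.1, 0.9, 0.7],
}

# Words that share contexts get similar vectors, so "dog" should score
# higher against "cat" than against "car".
print(cosine_similarity(vectors["dog"], vectors["cat"]))
print(cosine_similarity(vectors["dog"], vectors["car"]))
```

In practice you would train or download pretrained embeddings (e.g. with the Gensim library) rather than hand-crafting vectors, and the relation strengths you mention would fall out of the geometry instead of being stored explicitly.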

Doxosophoi


Reputation: 1 719