Having a lot of text documents (in natural language, unstructured), what are the possible ways of annotating them with some semantic meta-data? For example, consider a short document:
I saw the company's manager last day.
To be able to extract information from it, it must be annotated with additional data to be less ambiguous. The process of finding such meta-data is not in question, so assume it is done manually. The question is how to store these data in a way that further analysis on it can be done more conveniently/efficiently?
A possible approach is to use XML tags (see below), but it seems too verbose, and maybe there are better approaches/guidelines for storing such meta-data on text documents.
<Person name="John">I</Person> saw the <Organization name="ACME">company</Organization>'s manager <Time value="2014-5-29">last day</Time>.