Here are the basic Natural Language Processing capabilities (or annotators) that are usually necessary to extract language units from textual data for sake of search and other applications:
Sentence breaker - to split text (usually, text paragraphs) to sentences. Even in English it can be hard for some cases like "Mr. and Mrs. Brown stay in room no. 20."
Tokenizer - to split text or sentences to words or word-level units, including punctuation. This task is not trivial for languages with no spaces and no stable understanding of word boundaries (e.g. Chinese, Japanese)
Part-of-speech Tagger - to guess part of speech of each word in the context of sentence; usually each word is assigned a so-called POS-tag from a tagset developed in advance to serve your final task (for example, parsing).
Lemmatizer - to convert a given word into its canonical form (lemma). Usually you need to know the word's POS-tag. For example, word "heating" as gerund must be converted to "heat", but as noun it must be left unchanged.
Parser - to perform syntactic analysis of the sentence and build a syntactic tree or graph. There're two main ways to represent syntactic structure of sentence: via constituency or dependency.
Summarizer - to generate a short summary of the text by selecting a set of top informative sentences of the document, representing its main idea. However can be done in more intelligent manner than just selecting the sentences from existing ones.
Named Entity Recognition - to extract so-called named entities from the text. Named entities are the chunks of words from text, which refer to an entity of certain type. The types may include: geographic locations (countries, cities, rivers, ...), person names, organization names etc. Before going into NER task you must understand what do you want to get and, possible, predefine a taxonomy of named entity types to resolve.
Coreference Resolution - to group named entities (or, depending on your task, any other text units) into clusters corresponding to a single real object/meaning. For example, "B. Gates", "William Gates", "Founder of Microsoft" etc. in one text may mean the same person, referenced by using different expressions.
There're many other interesting NLP applications/annotators (see NLP tasks category), sentiment analysis, machine translation etc.). There're many books on this, the classical book: "Speech and Language Processing" by Daniel Jurafsky and James H. Martin., but it can be too detailed for you.