What is ChunkParserI in nltk.chunk? What exactly is it used for?

1

from nltk.chunk import ChunkParserI
from nltk.chunk.util import conlltags2tree
from nltk.corpus import gazetteers

class LocationChunker(ChunkParserI):
    def __init__(self):
        # All location names found in the gazetteers corpus
        self.locations = set(gazetteers.words())
        # Largest number of spaces in any single location name
        self.lookahead = 0
        for loc in self.locations:
            nwords = loc.count(' ')
            if nwords > self.lookahead:
                self.lookahead = nwords

What is ChunkParserI in nltk.chunk? What exactly is it used for? Also, please explain the code. What is the difference between chunking and parsing?

Payal Bhatia

Posted 2019-08-13T11:23:19.497

Reputation: 99

Answers

1

Parsing is the process of decomposing a string into its constituent symbols (if the string is a word or a sequence of characters) or syntactic components (if the string is a meaningful textual entity like a short story, a scientific abstract or a sentence). In an NLP context, "parsing" usually refers to the latter interpretation.

Chunking (in an NLP context) is a specific form of parsing that extracts groups of words, so-called 'chunks'. These chunks are "meaningful short phrases from the sentence (tagged with Part-of-Speech). Chunks are thus made up of words and the kinds of words are defined using the part-of-speech tags. One can even define a pattern or words that can't be a part of chunk and such words are known as 'chinks'" [1]. The latter (chinks) can be defined with chunking rules.
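To make the chunk/chink distinction concrete, here is a minimal pure-Python sketch. It is not NLTK's API: the function name `chunk_noun_phrases` and the chosen tag set are assumptions made only for illustration. Consecutive tokens whose part-of-speech tag is in the allowed set form a chunk; every other token is treated as a chink and ends the current chunk.

```python
def chunk_noun_phrases(tagged, chunk_tags=("DT", "JJ", "NN", "NNS", "NNP")):
    """Group consecutive (word, tag) pairs whose tag is in chunk_tags.

    Tokens with any other tag are 'chinks': they are left out and
    close the chunk that is currently being built.
    """
    chunks, current = [], []
    for word, tag in tagged:
        if tag in chunk_tags:
            current.append(word)      # still inside a chunk
        elif current:
            chunks.append(current)    # a chink closes the open chunk
            current = []
    if current:                       # flush a chunk that ends the sentence
        chunks.append(current)
    return chunks


# Hand-tagged example sentence (tags are Penn Treebank style).
tagged = [("the", "DT"), ("little", "JJ"), ("dog", "NN"),
          ("barked", "VBD"), ("at", "IN"),
          ("the", "DT"), ("cat", "NN")]

chunks = chunk_noun_phrases(tagged)
print(chunks)  # [['the', 'little', 'dog'], ['the', 'cat']]
```

Here "barked" and "at" are the chinks. A full parser would instead build a complete syntax tree for the whole sentence, which is the practical difference between chunking and parsing.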

I assume the code you posted comes from "Natural Language Processing: Python and NLTK" by Hardeniya et al. [2]? There I find that the LocationChunker class 'starts by constructing a set of all locations in the gazetteers corpus. Then, it finds the maximum number of words in a single location string so it knows how many words it must look ahead when parsing a tagged sentence.' (cf. Chapter 5, p. 319)
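The look-ahead idea from that description can be sketched without NLTK. This is not the book's implementation: `LOCATIONS` is a small hypothetical stand-in for `set(gazetteers.words())`, and `tag_locations` is a name invented for this example. It computes the look-ahead the same way as the code in the question (by counting spaces) and then marks the longest gazetteer match at each position with IOB tags.

```python
# Hypothetical stand-in for set(gazetteers.words()); the real corpus is far larger.
LOCATIONS = {"Paris", "New York", "San Francisco", "Rio de Janeiro"}

# Number of extra words (beyond the first) in the longest entry,
# computed by counting spaces as in the question's __init__.
LOOKAHEAD = max(loc.count(" ") for loc in LOCATIONS)


def tag_locations(tagged):
    """Return (word, pos, iob) triples for a POS-tagged sentence."""
    words = [word for word, _ in tagged]
    iob = ["O"] * len(words)
    i = 0
    while i < len(words):
        # Try the longest candidate first, then shorter ones.
        for extra in range(LOOKAHEAD, -1, -1):
            end = i + extra + 1
            if end <= len(words) and " ".join(words[i:end]) in LOCATIONS:
                iob[i] = "B-LOCATION"
                for j in range(i + 1, end):
                    iob[j] = "I-LOCATION"
                i = end - 1           # skip past the matched words
                break
        i += 1
    return [(word, pos, tag) for (word, pos), tag in zip(tagged, iob)]


sentence = [("She", "PRP"), ("moved", "VBD"), ("to", "TO"),
            ("New", "NNP"), ("York", "NNP"), ("from", "IN"),
            ("Rio", "NNP"), ("de", "FW"), ("Janeiro", "NNP")]

triples = tag_locations(sentence)
for triple in triples:
    print(triple)
```

In NLTK itself, `ChunkParserI` is the interface that chunkers implement: a subclass provides a `parse()` method that takes a tagged sentence and returns a chunk tree. Triples in the (word, pos, iob) form above are the kind of input `conlltags2tree` converts into such a tree, which is presumably why the question's code imports it.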

Joshua1990

Posted 2019-08-13T11:23:19.497

Reputation: 51

Thanks @Joshua1990. I am fine with the difference. This was not my source, but I am still not clear about the code. – Payal Bhatia – 2019-08-14T07:37:00.443