Stemming and Lemmatization
In my last blog we discussed the bag of words method for extracting features from text [see: Feature Extraction - Text - Bag of Words]. The drawback of the bag of words method is the size of the BoW matrix, which grows because of redundant tokens. If we use these redundant tokens to build a machine learning model, the model will be inefficient or will perform poorly. To solve the redundant token problem we can use "Stemming" and "Lemmatization".

Stemming

The stemming technique makes sure that different variations of a word are represented by a single word. E.g. run, ran, running are represented by the single word "run". So the process of reducing the inflected forms of a word to its root or stem word is called stemming.

Root/Stem Word    Variations of root word
Gain              Gained, gaining, gainful
Do                Do, done, doing, does

Martin Porter developed an algorithm for performing the stemming process to...