Creating Bag-of-Words using Python
In my last two posts we discussed the bag-of-words method for extracting features from text [refer this link Feature Extraction - Text- Bag of word ] and the stemming and lemmatization techniques used to avoid the redundant token problem [ refer this link Stemming and Lemmatization ]. Now it's time to apply those concepts in Python and see them in action. We are going to explore three options.

Let's consider the following three sentences:

Kesri is a good movie to watch with family as it has good stories about India freedom fight.
The success of a movie depends on the performance of the actors and story it has.
There are no new movies releasing this month due to corona virus.

Using these three sentences, we will extract the bag-of-words by applying the concepts of tokenization, stemming and lemmatization. So let's get started.

Step 1: Import the libraries
We need word_tokenize for tokenization, stopwords for the stop-word list, and CountVectorizer for creating the bag-of-words.
#...
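Before relying on the libraries, it can help to see what a bag-of-words looks like for the three sentences above. The following is only a rough sketch using the Python standard library in place of word_tokenize, stopwords and CountVectorizer. The STOP_WORDS set and the helper names tokenize and bag_of_words are hypothetical, chosen for illustration; NLTK's real English stop-word list is much longer.

```python
import re
from collections import Counter

# The three example sentences from this post.
sentences = [
    "Kesri is a good movie to watch with family as it has good stories about India freedom fight.",
    "The success of a movie depends on the performance of the actors and story it has.",
    "There are no new movies releasing this month due to corona virus.",
]

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {
    "is", "a", "to", "with", "as", "it", "has", "about", "the",
    "of", "on", "and", "there", "are", "no", "this", "due",
}

def tokenize(text):
    """Lowercase the text and return its alphabetic tokens, minus stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def bag_of_words(docs):
    """Return (vocabulary, matrix); matrix[i][j] counts vocabulary[j] in docs[i]."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, matrix

vocabulary, matrix = bag_of_words(sentences)
print(vocabulary)
for row in matrix:
    print(row)
```

Note that "movie" and "movies" end up as separate columns here. That is the redundant token problem that stemming and lemmatization are meant to reduce.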