Power Law or Zipf’s Law: Word Frequency Distribution
This article introduces the basic concepts of lexical processing, which is applied to text data before it is used in any machine learning model.

Working with any type of data, be it numeric, textual or images, involves the following steps:

Explore: perform pre-processing of the data
Understand the data

Since text is made up of words, sentences and paragraphs, exploring text data can start with analyzing the word frequency distribution. The famous linguist George Zipf carried out a simple exercise:

1. Count the number of times each word appears in the document.
2. Rank the words by frequency. The most frequent word is given rank 1, the second most frequent word rank 2, and so on.

He repeated this exercise on many documents and found a specific pattern in how words are distributed in a document. Based on the pattern he observed, he proposed a principle known as “Zipf’s Law” or the “Power Law”.

Let’s analyze the word freq...
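
As a rough illustration of the counting and ranking exercise described above, here is a minimal Python sketch. The function name `rank_word_frequencies`, the simple regex tokenizer and the sample sentence are all assumptions made for this example, not part of Zipf's original procedure; a real analysis would use a much larger corpus and possibly a proper tokenizer.

```python
import re
from collections import Counter


def rank_word_frequencies(text):
    """Return (rank, word, count) tuples, most frequent word first.

    Hypothetical helper: tokenization is a plain lowercase regex split,
    which is an assumption chosen to keep the sketch short.
    """
    # Step 1: count how many times each word appears.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)

    # Step 2: order words by frequency and assign ranks starting at 1.
    return [
        (rank, word, count)
        for rank, (word, count) in enumerate(counts.most_common(), start=1)
    ]


# Made-up sample text, far too small to show any real pattern.
sample = "the cat sat on the mat and the dog sat on the rug the end"

ranked = rank_word_frequencies(sample)
for rank, word, count in ranked:
    print(rank, word, count)
```

Words with equal counts receive consecutive ranks in the order they are returned by `Counter.most_common()`; how ties should be ranked is a design choice that this sketch does not try to settle.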