We can use text data to extract a number of features even if we don't have sufficient knowledge of Natural Language Processing. So let's discuss some of them in this section.

Before starting, let's quickly read the training file from the dataset in order to perform different tasks on it. Throughout the article, we will use the Twitter sentiment dataset from the datahack platform. Note that here we are only working with textual data, but the methods below can also be used when numerical features are present along with the text.

One of the most basic features we can extract is the number of words in each tweet. The basic intuition behind this is that negative sentiments generally contain fewer words than positive ones. To do this, we simply use the split function in Python:

    train = train.apply(lambda x: len(str(x).split(" ")))

The next feature is based on the same intuition as the previous one. Here, we calculate the number of characters in each tweet, which is simply the length of the tweet. Note that the calculation also counts spaces, which you can remove if required.

We will also extract another feature, the average word length of each tweet. This can also potentially help us in improving our model. Here, we simply take the sum of the lengths of all the words and divide it by the number of words in the tweet:

    def avg_word(sentence):
        words = sentence.split()
        return (sum(len(word) for word in words)/len(words))

    train = train.apply(lambda x: avg_word(x))

Generally, while solving an NLP problem, the first thing we do is remove the stopwords. But sometimes counting the stopwords can also give us some extra information which we might otherwise lose. Here, we have imported stopwords from NLTK, which is a basic NLP library in Python.

One more interesting feature which we can extract from a tweet is the number of hashtags or mentions present in it. This also helps in extracting extra information from our text data. Here, we make use of the startswith string method, because a hashtag (or mention) marker always appears at the beginning of a word.
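The features described in this section can be sketched together on a single string. This is only a minimal illustration under a few assumptions: the function names, the sample tweet, and the small STOPWORDS set are made up for the example (the set is a stand-in for the NLTK stopword list, so that the snippet runs without downloading anything), and stopwords are matched case-insensitively, which is a choice made here rather than something stated in the text.

```python
# Illustrative sketch: the basic text features, computed for one tweet.
# STOPWORDS is a tiny hand-picked stand-in for NLTK's English stopword list.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "for"}


def word_count(text):
    # Split on single spaces, mirroring split(" ") used in the article.
    return len(str(text).split(" "))


def char_count(text):
    # Length of the raw string, spaces included.
    return len(str(text))


def avg_word(text):
    # Mean word length; returns 0.0 for empty text to avoid dividing by zero.
    words = str(text).split()
    if not words:
        return 0.0
    return sum(len(word) for word in words) / len(words)


def stopword_count(text):
    # Count words that appear in the stopword set (lowercased before lookup).
    return len([w for w in str(text).split() if w.lower() in STOPWORDS])


def hashtag_count(text):
    # Hashtags start with '#', so startswith is enough.
    return len([w for w in str(text).split() if w.startswith("#")])


def mention_count(text):
    # Mentions start with '@'.
    return len([w for w in str(text).split() if w.startswith("@")])


# Hypothetical sample tweet, not taken from the dataset.
sample = "the model is #great thanks to @user"

features = {
    "word_count": word_count(sample),
    "char_count": char_count(sample),
    "avg_word": avg_word(sample),
    "stopwords": stopword_count(sample),
    "hashtags": hashtag_count(sample),
    "mentions": mention_count(sample),
}
print(features)
```

If the tweets live in a pandas DataFrame, each of these functions could be passed to apply on the text column, for example train['tweet'].apply(word_count), where the column name tweet is an assumption about how the training file is laid out.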