Sentiment Analysis using VADER
Based on: Social Media Sentiment Analysis in Python with VADER
VADER is a lexicon- and rule-based model for sentiment analysis. The result generated by VADER is a dictionary with four keys:
- neg (negative)
- neu (neutral)
- pos (positive)
- compound (an aggregate score that indicates the overall degree of the sentiment)
The neg, neu and pos values are proportions that add up to 1. The compound is a value between -1 and +1. A compound greater than or equal to 0.05 indicates a positive sentiment, and a compound less than or equal to -0.05 indicates a negative sentiment. Otherwise, the text is considered neutral.
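The compound score is obtained by summing the valences of the individual words (after VADER's rule-based adjustments) and squashing that unbounded sum into the range from -1 to +1. The sketch below illustrates only this normalization step, assuming the formula x / sqrt(x² + α) with α = 15 from VADER's reference implementation; the function name `normalize` and the sample sums are illustrative.

```python
import math


def normalize(score: float, alpha: float = 15.0) -> float:
    """Map an unbounded sum of word valences into the range [-1, 1].

    Assumes the x / sqrt(x**2 + alpha) formula described above. The clamp
    only guards against floating-point rounding, because
    |score| / sqrt(score**2 + alpha) stays below 1 whenever alpha > 0.
    """
    norm_score = score / math.sqrt(score * score + alpha)
    return max(-1.0, min(1.0, norm_score))


# Hypothetical valence sums: neutral, mildly positive, strongly negative
for valence_sum in (0.0, 2.0, -10.0):
    print(valence_sum, round(normalize(valence_sum), 4))
```

A valence sum of 2.0, for instance, maps to about 0.46, which lies above the 0.05 threshold and is therefore labeled positive. Large sums approach, but never exceed, ±1.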
To use the VADER library, we need to install nltk. We will also install pandas for our example data:
!pip install nltk pandas
To use VADER for our analysis, we need to import the nltk package, download the VADER lexicon and create a sentiment analyzer:
import nltk
# Download the lexicon
nltk.download("vader_lexicon")
# Import the VADER sentiment analyzer
from nltk.sentiment.vader import SentimentIntensityAnalyzer
# Create an instance of SentimentIntensityAnalyzer
sent_analyzer = SentimentIntensityAnalyzer()
To compare our results with FINBert, we analyze the same headline as in the FINBert notebook:
headline = "Microsoft fails to hit profit expectations"
print(sent_analyzer.polarity_scores(headline))
Since compound = 0.0258, this first headline is considered neutral. Now we create the same test data as in the FINBert notebook. As VADER only works with English texts, we directly use the English translations:
import pandas as pd

text_df = pd.DataFrame(
    [
        {"text": "Microsoft fails to hit profit expectations."},
        {
            "text": "Confidence continues to prevail on the stock market, as the performance of the DAX shows."
        },
        {"text": "Stocks rallied and the British pound gained."},
        {
            "text": "Meyer Burger now serves Australian market and presents itself at Smart Energy Expo in Sydney."
        },
        {
            "text": "Meyer Burger enters Australian market and exhibits at Smart Energy Expo in Sydney."
        },
        {
            "text": "J&T Express Vietnam helps local craft villages increase their reach."
        },
        {
            "text": "7 experts recommend the stock for purchase, 1 expert recommends holding the stock."
        },
        {"text": "Microsoft share falls."},
        {"text": "Microsoft share is rising."},
    ]
)
Analysis of the sample data:
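def format_output(output_dict):
    # Map the compound score to a label using the +/-0.05 thresholds
    polarity = "neutral"
    if output_dict["compound"] >= 0.05:
        polarity = "positive"
    elif output_dict["compound"] <= -0.05:
        polarity = "negative"
    return polarity


def predict_sentiment(text):
    # Return VADER's scores (neg, neu, pos, compound) for one text
    output_dict = sent_analyzer.polarity_scores(text)
    return output_dict


# Run the predictions
text_df["vader_prediction"] = text_df["text"].apply(predict_sentiment)
# Derive the polarity label from the compound score
text_df["vader_polarity"] = text_df["vader_prediction"].apply(format_output)
# Show results
text_df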
Conclusion
Since VADER scores a text mainly through the sentiment of its individual words, it does not seem suitable for our financial example texts. In particular, the two examples "Microsoft share falls" and "Microsoft share is rising" should yield different sentiments.
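To make the word-level argument concrete, here is a minimal sketch of a purely lexicon-based scorer. The lexicon `TOY_LEXICON` and its valences are hypothetical and are NOT VADER's real lexicon; VADER also applies additional heuristics. The sketch only shows the mechanism: a word without a lexicon entry contributes nothing, so two headlines that differ only in such words receive identical scores.

```python
import re

# Hypothetical toy lexicon (word -> valence); NOT VADER's real lexicon.
# Domain verbs such as "falls" or "rising" deliberately have no entry here.
TOY_LEXICON = {"fails": -2.0, "profit": 1.5, "gained": 1.0}


def toy_valence_sum(text: str) -> float:
    """Sum the valences of known words; unknown words contribute 0."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(TOY_LEXICON.get(word, 0.0) for word in words)


for headline in ["Microsoft share falls.", "Microsoft share is rising."]:
    # Both headlines score 0.0 because none of their words are in the lexicon
    print(headline, toy_valence_sum(headline))
```

Under this assumption, the direction of the price movement is invisible to the scorer, which is the kind of financial nuance a word-level approach can miss.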