mirror of https://github.com/fhswf/aki_prj23_transparenzregister.git synced 2025-09-13 22:11:21 +02:00

Philipp Horstenkamp 41f2c9f995 Executing black over all jupyter notebook (#190)

Reverting black for the jupyter notebooks gets old. Can we just run
black over all of them?

2023-10-04 20:03:47 +02:00

Sentiment Analysis using VADER

Based on: Social Media Sentiment Analysis in Python with VADER

VADER is a lexicon and rule-based model for sentiment analysis. The result generated by VADER is a dictionary of four keys:

neg (negative)
neu (neutral)
pos (positive)
compound (determines the degree of the sentiment)

The neg, neu and pos values add up to 1. The compound score lies between -1 and +1. A compound score greater than or equal to 0.05 indicates a positive sentiment, and a compound score less than or equal to -0.05 indicates a negative sentiment. Anything in between is considered neutral.
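The thresholds described above can be sketched as a small helper. This is a minimal illustration and not part of the VADER library: the function name `label_from_compound` and its `threshold` parameter are assumptions introduced here, and the score dictionary is the one this notebook reports further down for the headline "Microsoft fails to hit profit expectations".

```python
def label_from_compound(compound, threshold=0.05):
    """Map a compound score in [-1, 1] to a polarity label (hypothetical helper)."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Score dictionary as printed later in this notebook for its first headline
scores = {"neg": 0.289, "neu": 0.412, "pos": 0.299, "compound": 0.0258}

# neg, neu and pos are proportions, so they should sum to roughly 1
proportion_sum = scores["neg"] + scores["neu"] + scores["pos"]
print(round(proportion_sum, 3))

# The compound score decides the label
print(label_from_compound(scores["compound"]))  # neutral
```

Keeping the threshold as a parameter makes it easy to experiment with a wider or narrower neutral band, although 0.05 is the value used throughout this notebook.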

To use the VADER library, we need to install nltk. We will also install pandas for our example data:

In [48]:

!pip install nltk pandas

Requirement already satisfied: nltk in /Users/kim/opt/anaconda3/lib/python3.9/site-packages (3.6.5)
Requirement already satisfied: pandas in /Users/kim/opt/anaconda3/lib/python3.9/site-packages (1.3.4)
Requirement already satisfied: click in /Users/kim/opt/anaconda3/lib/python3.9/site-packages (from nltk) (8.0.3)
Requirement already satisfied: joblib in /Users/kim/opt/anaconda3/lib/python3.9/site-packages (from nltk) (1.1.0)
Requirement already satisfied: regex>=2021.8.3 in /Users/kim/opt/anaconda3/lib/python3.9/site-packages (from nltk) (2021.8.3)
Requirement already satisfied: tqdm in /Users/kim/opt/anaconda3/lib/python3.9/site-packages (from nltk) (4.62.3)
Requirement already satisfied: python-dateutil>=2.7.3 in /Users/kim/opt/anaconda3/lib/python3.9/site-packages (from pandas) (2.8.2)
Requirement already satisfied: pytz>=2017.3 in /Users/kim/opt/anaconda3/lib/python3.9/site-packages (from pandas) (2021.3)
Requirement already satisfied: numpy>=1.17.3 in /Users/kim/opt/anaconda3/lib/python3.9/site-packages (from pandas) (1.20.3)
Requirement already satisfied: six>=1.5 in /Users/kim/opt/anaconda3/lib/python3.9/site-packages (from python-dateutil>=2.7.3->pandas) (1.16.0)

To use VADER for our analysis, we need to import the nltk package and the VADER lexicon:

In [49]:

import nltk

# Download the lexicon
nltk.download("vader_lexicon")

# Import the VADER analyzer class
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Create an instance of SentimentIntensityAnalyzer
sent_analyzer = SentimentIntensityAnalyzer()

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     /Users/kim/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!

To compare our results with FinBERT, we analyze the same headline as in the FinBERT notebook:

In [50]:

headline = "Microsoft fails to hit profit expectations"
print(sent_analyzer.polarity_scores(headline))

{'neg': 0.289, 'neu': 0.412, 'pos': 0.299, 'compound': 0.0258}

Since compound = 0.0258 lies between -0.05 and 0.05, this first headline is classified as neutral. Next, we create the same test data as in the FinBERT notebook. Because VADER only works with English text, we use the English translations directly:

In [51]:

import pandas as pd

text_df = pd.DataFrame(
    [
        {"text": "Microsoft fails to hit profit expectations."},
        {
            "text": "Confidence continues to prevail on the stock market, as the performance of the DAX shows."
        },
        {"text": "Stocks rallied and the British pound gained."},
        {
            "text": "Meyer Burger now serves Australian market and presents itself at Smart Energy Expo in Sydney."
        },
        {
            "text": "Meyer Burger enters Australian market and exhibits at Smart Energy Expo in Sydney."
        },
        {
            "text": "J&T Express Vietnam helps local craft villages increase their reach."
        },
        {
            "text": "7 experts recommend the stock for purchase, 1 expert recommends holding the stock."
        },
        {"text": "Microsoft share falls."},
        {"text": "Microsoft share is rising."},
    ]
)

Analysis of the sample data:

In [52]:

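def format_output(output_dict):
    """Map a VADER score dictionary to a polarity label via its compound score."""
    polarity = "neutral"

    if output_dict["compound"] >= 0.05:
        polarity = "positive"

    elif output_dict["compound"] <= -0.05:
        polarity = "negative"

    return polarity


def predict_sentiment(text):
    """Return the raw VADER score dictionary for the given text."""
    output_dict = sent_analyzer.polarity_scores(text)
    return output_dict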


# Run the predictions
text_df["vader_prediction"] = text_df["text"].apply(predict_sentiment)

# Show results
text_df

Out[52]:

	text	vader_prediction
0	Microsoft fails to hit profit expectations.	{'neg': 0.289, 'neu': 0.412, 'pos': 0.299, 'co...
1	Confidence continues to prevail on the stock m...	{'neg': 0.0, 'neu': 0.809, 'pos': 0.191, 'comp...
2	Stocks rallied and the British pound gained.	{'neg': 0.0, 'neu': 0.698, 'pos': 0.302, 'comp...
3	Meyer Burger now serves Australian market and ...	{'neg': 0.0, 'neu': 0.73, 'pos': 0.27, 'compou...
4	Meyer Burger enters Australian market and exhi...	{'neg': 0.0, 'neu': 0.696, 'pos': 0.304, 'comp...
5	J&T Express Vietnam helps local craft villages...	{'neg': 0.0, 'neu': 0.538, 'pos': 0.462, 'comp...
6	7 experts recommend the stock for purchase, 1 ...	{'neg': 0.0, 'neu': 0.672, 'pos': 0.328, 'comp...
7	Microsoft share falls.	{'neg': 0.0, 'neu': 0.476, 'pos': 0.524, 'comp...
8	Microsoft share is rising.	{'neg': 0.0, 'neu': 0.577, 'pos': 0.423, 'comp...
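The table above stores the raw score dictionaries, and `format_output` is defined but never applied. The following sketch shows one way the dictionaries could be expanded into columns and turned into labels. It is an illustration under assumptions: the column name `vader_label` is introduced here, only the first row uses scores reported in this notebook, and the other two rows are invented placeholders that merely exercise the positive and negative branches.

```python
import pandas as pd


def format_output(output_dict):
    """Map a VADER score dictionary to a polarity label via its compound score."""
    if output_dict["compound"] >= 0.05:
        return "positive"
    if output_dict["compound"] <= -0.05:
        return "negative"
    return "neutral"


scores_df = pd.DataFrame(
    {
        "text": [
            "Microsoft fails to hit profit expectations",
            "hypothetical positive text",
            "hypothetical negative text",
        ],
        "vader_prediction": [
            # Scores reported in this notebook for the first headline
            {"neg": 0.289, "neu": 0.412, "pos": 0.299, "compound": 0.0258},
            # Invented placeholder scores, NOT produced by VADER
            {"neg": 0.0, "neu": 0.6, "pos": 0.4, "compound": 0.5},
            {"neg": 0.5, "neu": 0.5, "pos": 0.0, "compound": -0.5},
        ],
    }
)

# Expand each score dictionary into its own columns
expanded = pd.DataFrame(scores_df["vader_prediction"].tolist())
result = pd.concat([scores_df[["text"]], expanded], axis=1)

# Derive the label from the compound score
result["vader_label"] = scores_df["vader_prediction"].apply(format_output)
print(result)
```

With separate numeric columns, the compound values are no longer truncated in the display, which makes a side-by-side comparison with another model's labels easier.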

Conclusion

Because VADER scores a text mainly from the lexicon values of its individual words, it does not seem suitable for our financial texts. In particular, the two examples "Microsoft share falls" and "Microsoft share is rising" should yield opposite sentiments, yet both receive a neg score of 0.0 and a pos score above 0.4.
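The limitation can be illustrated with a deliberately simplified word-level scorer. This is a toy sketch, not VADER's implementation: `TOY_LEXICON` and its values are invented for illustration, and it assumes that "share" carries a positive value while "falls" and "rising" are absent from the lexicon.

```python
# Invented valences for illustration only; NOT VADER's actual lexicon
TOY_LEXICON = {"share": 1.0, "gained": 1.5, "fails": -2.0}


def toy_word_score(text, lexicon=TOY_LEXICON):
    """Sum per-word valences; words missing from the lexicon contribute 0."""
    tokens = [token.strip(".,!?") for token in text.lower().split()]
    return sum(lexicon.get(token, 0.0) for token in tokens)


falls = toy_word_score("Microsoft share falls.")
rising = toy_word_score("Microsoft share is rising.")

# Both headlines get the same score, because the only word found in the
# toy lexicon is "share"; the direction words are simply ignored.
print(falls, rising)
```

Under these assumptions, a purely word-level approach cannot separate the two headlines, since the words that carry the financial meaning never enter the score.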
