# FinBERT

FinBERT is a sentiment-analysis model for financial text.
Since we want to evaluate news articles, we need a way to analyse the sentiment of such texts.
This document shows a first use of the tool on a handful of example texts.
In particular, it tests whether the model can handle German text.

## Sources

* [Hugging Face: ProsusAI/finbert](https://huggingface.co/ProsusAI/finbert)
* [Tutorial](https://medium.com/codex/stocks-news-sentiment-analysis-with-deep-learning-transformers-and-machine-learning-cdcdb827fc06)

## Libraries

* transformers
* tqdm
* pandas
* numpy
* torch
* torchvision
* torchaudio
* sentencepiece
* sacremoses

In [25]:
!pip install transformers tqdm pandas numpy torch torchvision torchaudio sentencepiece sacremoses -U






### Importing libraries and creating the model and tokenizer

In [26]:
import pandas as pd
import torch

from transformers import AutoTokenizer, AutoModelForSequenceClassification

# create a tokenizer object
tokenizer = AutoTokenizer.from_pretrained("ProsusAI/finbert")

# fetch the pretrained model
model = AutoModelForSequenceClassification.from_pretrained("ProsusAI/finbert")

### Analyze the sentiment of a single text

In [27]:
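def analyze_sentiment(text: str) -> pd.Series:
    """Return the FinBERT class probabilities for a single text."""
    input_tokens = tokenizer(text, padding=True, truncation=True, return_tensors="pt")
    # inference only, so no gradients need to be tracked
    with torch.no_grad():
        output = model(**input_tokens)
        probabilities = torch.nn.functional.softmax(output.logits, dim=-1)[0]
    # NOTE: the labels are hard-coded and assigned by position; they are only
    # correct if they match the class order in model.config.id2label.
    return pd.Series(probabilities.numpy(), index=["+", "0", "-"])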


headline = "Microsoft fails to hit profit expectations"
tf = analyze_sentiment(headline)
tf

+    0.034084
0    0.932933
-    0.032982
dtype: float32
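
The index labels in `analyze_sentiment` are assigned by position, so they are only meaningful if they follow the class order the checkpoint was trained with. Transformers models record that order in `model.config.id2label`. As far as I know, the `ProsusAI/finbert` config lists the classes as positive, negative, neutral. If that holds, the `0` and `-` labels in this notebook are swapped, which is worth confirming by printing `model.config.id2label` before interpreting the numbers. Below is a minimal sketch of deriving the labels from the mapping instead of hard-coding them. The `SYMBOLS` table and the helper name are illustrative, and the `assumed_id2label` dictionary is an assumed example rather than a value read from the model.

```python
from typing import Dict, Sequence

# Short symbols used in this notebook for each class name (illustrative).
SYMBOLS = {"positive": "+", "neutral": "0", "negative": "-"}


def label_probabilities(
    probabilities: Sequence[float], id2label: Dict[int, str]
) -> Dict[str, float]:
    """Pair each probability with the symbol of the class at the same index."""
    return {
        SYMBOLS[id2label[i].lower()]: float(p) for i, p in enumerate(probabilities)
    }


# Assumed class order; in the notebook this would be model.config.id2label.
assumed_id2label = {0: "positive", 1: "negative", 2: "neutral"}
scores = label_probabilities([0.034084, 0.932933, 0.032982], assumed_id2label)
print(scores)
```

In the notebook, the resulting dictionary could be wrapped in `pd.Series(...)` and reindexed to `["+", "0", "-"]`, so that the rest of the code keeps working unchanged.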

### Creating test data

In [None]:
text_df = pd.DataFrame(
    [
        {"text": "Microsoft fails to hit profit expectations", "lan": "en"},
        {
            "text": "Am Aktienmarkt überwieg weiter die Zuversicht, wie der Kursverlauf des DAX zeigt.",
            "lan": "de",
        },
        {"text": "Stocks rallied and the British pound gained.", "lan": "en"},
        {
            "text": "Meyer Burger bedient ab sofort australischen Markt und präsentiert sich auf Smart Energy Expo in Sydney.",
            "lan": "de",
        },
        {
            "text": "Meyer Burger enters Australian market and exhibits at Smart Energy Expo in Sydney.",
            "lan": "en",
        },
        {
            "text": "J&T Express Vietnam hilft lokalen Handwerksdörfern, ihre Reichweite zu vergrößern.",
            "lan": "de",
        },
        {
            "text": "7 Experten empfehlen die Aktie zum Kauf, 1 Experte empfiehlt, die Aktie zu halten.",
            "lan": "de",
        },
        {"text": "Microsoft aktie fällt.", "lan": "de"},
        {"text": "Microsoft aktie steigt.", "lan": "de"},
    ]
)

In [28]:
text_df

| | text | lan |
| --- | --- | --- |
| 0 | Microsoft fails to hit profit expectations | en |
| 1 | Am Aktienmarkt überwieg weiter die Zuversicht,... | de |
| 2 | Stocks rallied and the British pound gained. | en |
| 3 | Meyer Burger bedient ab sofort australischen M... | de |
| 4 | Meyer Burger enters Australian market and exhi... | en |
| 5 | J&T Express Vietnam hilft lokalen Handwerksdör... | de |
| 6 | 7 Experten empfehlen die Aktie zum Kauf, 1 Exp... | de |
| 7 | Microsoft aktie fällt. | de |
| 8 | Microsoft aktie steigt. | de |


### Analyze the sentiment of multiple texts

In [29]:
def analyse_sentiments(texts: pd.DataFrame) -> pd.DataFrame:
    values = texts["text"].apply(analyze_sentiment)
    texts[["+", "0", "-"]] = values
    return texts


analyse_sentiments(text_df.copy())

| | text | lan | + | 0 | - |
| --- | --- | --- | --- | --- | --- |
| 0 | Microsoft fails to hit profit expectations | en | 0.034084 | 0.932933 | 0.032982 |
| 1 | Am Aktienmarkt überwieg weiter die Zuversicht,... | de | 0.053528 | 0.02795 | 0.918522 |
| 2 | Stocks rallied and the British pound gained. | en | 0.898361 | 0.034474 | 0.067165 |
| 3 | Meyer Burger bedient ab sofort australischen M... | de | 0.116597 | 0.01279 | 0.870613 |
| 4 | Meyer Burger enters Australian market and exhi... | en | 0.187527 | 0.008846 | 0.803627 |
| 5 | J&T Express Vietnam hilft lokalen Handwerksdör... | de | 0.066277 | 0.020608 | 0.913115 |
| 6 | 7 Experten empfehlen die Aktie zum Kauf, 1 Exp... | de | 0.050346 | 0.022004 | 0.92765 |
| 7 | Microsoft aktie fällt. | de | 0.066061 | 0.01644 | 0.917498 |
| 8 | Microsoft aktie steigt. | de | 0.041449 | 0.018471 | 0.94008 |
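
`analyse_sentiments` calls the model once per row. Since the tokenizer call in `analyze_sentiment` already pads its input, it can also take a list of texts, so several headlines could be scored in one forward pass. Below is a rough sketch of the chunking part only. `classify_batch` is a hypothetical callable standing in for a function that tokenizes a list of texts, runs the model, and returns one row of probabilities per text. The stand-in classifier exists only to make the example runnable.

```python
from typing import Callable, Iterator, List, Sequence

Probabilities = List[float]


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start : start + size])


def classify_in_batches(
    texts: Sequence[str],
    classify_batch: Callable[[List[str]], List[Probabilities]],
    batch_size: int = 8,
) -> List[Probabilities]:
    """Score all texts batch by batch, preserving the input order."""
    results: List[Probabilities] = []
    for batch in chunked(texts, batch_size):
        results.extend(classify_batch(batch))
    return results


# Stand-in classifier for demonstration only: returns a uniform distribution.
def fake_classifier(batch: List[str]) -> List[Probabilities]:
    return [[1 / 3, 1 / 3, 1 / 3] for _ in batch]


headlines = ["a", "b", "c", "d", "e"]
rows = classify_in_batches(headlines, fake_classifier, batch_size=2)
print(len(rows))
```

Whether batching actually helps on CPU depends on text lengths and batch size, so it would need to be measured on real data.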


## Conclusion about FinBERT

In its current form, the model cannot be used for German text: every German example above receives almost the same classification, regardless of its content.
It could be used if the text is translated beforehand, but it is questionable whether that will work well.
Another option would be to retrain the same model on a translated version of its training data, but I do not believe this to be feasible.

# Translating texts before analysing them with FinBERT

The problem with the FinBERT model can be addressed by translating the input into English before passing it to FinBERT.
The functions below explore this.

* [Translator: Helsinki-NLP/opus-mt-de-en](https://huggingface.co/Helsinki-NLP/opus-mt-de-en)
* [MarianMTModel documentation](https://huggingface.co/docs/transformers/main/en/model_doc/marian#transformers.MarianMTModel)



In [30]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

translation_tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-de-en")

translation_model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-de-en")

In [31]:
def translate_sentiment(text: str) -> str:
    input_tokens = translation_tokenizer([text], return_tensors="pt")
    generated_ids = translation_model.generate(**input_tokens)
    return translation_tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[
        0
    ]


headline = (
    "J&T Express Vietnam hilft lokalen Handwerksdörfern, ihre Reichweite zu vergrößern."
)
tf = translate_sentiment(headline)
tf



'J&T Express Vietnam helps local craft villages increase their reach.'

In [32]:
def translate_sentiment_series(series: pd.Series) -> pd.Series:
    if series["lan"] == "en":
        return series
    elif series["lan"] == "de":
        print(series["text"])
        return pd.Series(
            {
                "text": translate_sentiment(series["text"]),
                "lan": "de_translated",
                "orig": series["text"],
            }
        )
    raise ValueError(f"Language {series['lan']} is not known.")


def translate_sentiments(texts: pd.DataFrame) -> pd.DataFrame:
    texts = texts.apply(translate_sentiment_series, axis=1)
    return texts


translated_df = translate_sentiments(text_df.copy())
translated_df

Am Aktienmarkt überwieg weiter die Zuversicht, wie der Kursverlauf des DAX zeigt.
Meyer Burger bedient ab sofort australischen Markt und präsentiert sich auf Smart Energy Expo in Sydney.
J&T Express Vietnam hilft lokalen Handwerksdörfern, ihre Reichweite zu vergrößern.
7 Experten empfehlen die Aktie zum Kauf, 1 Experte empfiehlt, die Aktie zu halten.
Microsoft aktie fällt.
Microsoft aktie steigt.


| | lan | orig | text |
| --- | --- | --- | --- |
| 0 | en | | Microsoft fails to hit profit expectations |
| 1 | de_translated | Am Aktienmarkt überwieg weiter die Zuversicht,... | On the stock market, confidence continued to p... |
| 2 | en | | Stocks rallied and the British pound gained. |
| 3 | de_translated | Meyer Burger bedient ab sofort australischen M... | Meyer Burger is now serving the Australian mar... |
| 4 | en | | Meyer Burger enters Australian market and exhi... |
| 5 | de_translated | J&T Express Vietnam hilft lokalen Handwerksdör... | J&T Express Vietnam helps local craft villages... |
| 6 | de_translated | 7 Experten empfehlen die Aktie zum Kauf, 1 Exp... | 7 experts recommend the stock for purchase, 1 ... |
| 7 | de_translated | Microsoft aktie fällt. | Microsoft Aktie falls. |
| 8 | de_translated | Microsoft aktie steigt. | Microsoft share is rising. |
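
Translation adds a second model call for every German text. If the same headline can appear more than once, for example when several feeds carry the same news item, memoizing the translation avoids repeating that work. Below is a small sketch using `functools.lru_cache`. `slow_translate` is a stand-in for `translate_sentiment`, and the repeated-headline scenario is an assumption about the data, not something measured here.

```python
from functools import lru_cache

call_count = 0


def slow_translate(text: str) -> str:
    """Stand-in for translate_sentiment; counts how often it really runs."""
    global call_count
    call_count += 1
    return text.upper()  # placeholder for the actual translation


@lru_cache(maxsize=4096)
def cached_translate(text: str) -> str:
    """Translate each distinct text once and reuse the result afterwards."""
    return slow_translate(text)


for headline in ["Aktie steigt.", "Aktie fällt.", "Aktie steigt."]:
    cached_translate(headline)

print(call_count)
```

The cache is keyed on the exact string, so headlines that differ only in whitespace or punctuation would still be translated separately.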


In [33]:
sentiments = analyse_sentiments(translated_df)
sentiments

| | lan | orig | text | + | 0 | - |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | en | | Microsoft fails to hit profit expectations | 0.034084 | 0.932933 | 0.032982 |
| 1 | de_translated | Am Aktienmarkt überwieg weiter die Zuversicht,... | On the stock market, confidence continued to p... | 0.919673 | 0.018426 | 0.061901 |
| 2 | en | | Stocks rallied and the British pound gained. | 0.898361 | 0.034474 | 0.067165 |
| 3 | de_translated | Meyer Burger bedient ab sofort australischen M... | Meyer Burger is now serving the Australian mar... | 0.221019 | 0.006844 | 0.772137 |
| 4 | en | | Meyer Burger enters Australian market and exhi... | 0.187527 | 0.008846 | 0.803627 |
| 5 | de_translated | J&T Express Vietnam hilft lokalen Handwerksdör... | J&T Express Vietnam helps local craft villages... | 0.891114 | 0.007633 | 0.101254 |
| 6 | de_translated | 7 Experten empfehlen die Aktie zum Kauf, 1 Exp... | 7 experts recommend the stock for purchase, 1 ... | 0.04085 | 0.016722 | 0.942427 |
| 7 | de_translated | Microsoft aktie fällt. | Microsoft Aktie falls. | 0.027456 | 0.88916 | 0.083384 |
| 8 | de_translated | Microsoft aktie steigt. | Microsoft share is rising. | 0.952216 | 0.019054 | 0.02873 |


## Conclusion about a translated FinBERT

Translating German text into English before passing it to FinBERT gives much better results, which could be used for our project.
The main drawback is the additional CPU cost, since a second model has to run for every non-English text.
The translation step should probably be combined with language detection. Since many variants of this translation model exist for other language pairs, the same approach could then accept input in multiple languages.
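
The Helsinki-NLP translation checkpoints on Hugging Face follow the naming pattern `Helsinki-NLP/opus-mt-{source}-{target}`, which makes it straightforward to pick a translator from a detected language code. Below is a minimal sketch of that routing step. The function name is hypothetical and language detection itself is left out. Not every language pair is guaranteed to have a published checkpoint, so the returned name should be checked before loading.

```python
from typing import Optional


def translation_model_name(language: str, target: str = "en") -> Optional[str]:
    """Return the assumed checkpoint name for translating into `target`.

    Returns None when the text is already in the target language.
    """
    language = language.strip().lower()
    if not language:
        raise ValueError("language code must not be empty")
    if language == target:
        return None
    return f"Helsinki-NLP/opus-mt-{language}-{target}"


print(translation_model_name("de"))
print(translation_model_name("en"))
```

The returned name could be passed to `AutoTokenizer.from_pretrained` and `AutoModelForSeq2SeqLM.from_pretrained` in the same way as the German model above, ideally loading each translator only once per language.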