mirror of
https://github.com/fhswf/aki_prj23_transparenzregister.git
synced 2025-04-22 22:32:54 +02:00
1001 lines
37 KiB
Plaintext
{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"slideshow": {
|
|
"slide_type": "slide"
|
|
}
|
|
},
|
|
"source": [
"# FinBert\n",
"\n",
"FinBert is a sentiment analysis model for financial text.\n",
"Since we want to evaluate news articles, sentiment analysis is a necessary feature for analysing those texts.\n",
"This document shows a first use of the tool.\n",
"Some texts are analysed, with a particular focus on German texts.\n",
"\n",
"## Sources\n",
"\n",
"* [Hugging Face](https://huggingface.co/ProsusAI/finbert)\n",
"* [Tutorial](https://medium.com/codex/stocks-news-sentiment-analysis-with-deep-learning-transformers-and-machine-learning-cdcdb827fc06)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"slideshow": {
|
|
"slide_type": "slide"
|
|
}
|
|
},
|
|
"source": [
|
|
"## Libraries\n",
|
|
"\n",
|
|
"* transformers\n",
|
|
"* tqdm\n",
|
|
"* pandas\n",
|
|
"* numpy\n",
|
|
"* torch\n",
|
|
"* torchvision\n",
|
|
"* torchaudio\n",
|
|
"* sentencepiece\n",
|
|
"* sacremoses"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 25,
|
|
"metadata": {
|
|
"ExecuteTime": {
|
|
"end_time": "2023-05-01T13:16:13.740927Z",
|
|
"start_time": "2023-05-01T13:16:08.554998Z"
|
|
},
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
},
|
|
"slideshow": {
|
|
"slide_type": "skip"
|
|
},
|
|
"tags": []
|
|
},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"Requirement already satisfied: transformers in c:\\users\\phhor\\pycharmprojects\\aki_prj23_transparenzregister\\venv\\lib\\site-packages (4.28.1)\n",
|
|
"Requirement already satisfied: tqdm in c:\\users\\phhor\\pycharmprojects\\aki_prj23_transparenzregister\\venv\\lib\\site-packages (4.65.0)\n",
|
|
"Requirement already satisfied: pandas in c:\\users\\phhor\\pycharmprojects\\aki_prj23_transparenzregister\\venv\\lib\\site-packages (2.0.1)\n",
|
|
"Requirement already satisfied: numpy in c:\\users\\phhor\\pycharmprojects\\aki_prj23_transparenzregister\\venv\\lib\\site-packages (1.24.3)\n",
|
|
"Requirement already satisfied: torch in c:\\users\\phhor\\pycharmprojects\\aki_prj23_transparenzregister\\venv\\lib\\site-packages (2.0.0)\n",
|
|
"Requirement already satisfied: torchvision in c:\\users\\phhor\\pycharmprojects\\aki_prj23_transparenzregister\\venv\\lib\\site-packages (0.15.1)\n",
|
|
"Requirement already satisfied: torchaudio in c:\\users\\phhor\\pycharmprojects\\aki_prj23_transparenzregister\\venv\\lib\\site-packages (2.0.1)\n",
|
|
"Requirement already satisfied: sentencepiece in c:\\users\\phhor\\pycharmprojects\\aki_prj23_transparenzregister\\venv\\lib\\site-packages (0.1.98)\n",
|
|
"Requirement already satisfied: sacremoses in c:\\users\\phhor\\pycharmprojects\\aki_prj23_transparenzregister\\venv\\lib\\site-packages (0.0.53)\n",
|
|
"Requirement already satisfied: filelock in c:\\users\\phhor\\appdata\\roaming\\python\\python311\\site-packages (from transformers) (3.8.0)\n",
|
|
"Requirement already satisfied: huggingface-hub<1.0,>=0.11.0 in c:\\users\\phhor\\pycharmprojects\\aki_prj23_transparenzregister\\venv\\lib\\site-packages (from transformers) (0.14.1)\n",
|
|
"Requirement already satisfied: packaging>=20.0 in c:\\users\\phhor\\pycharmprojects\\aki_prj23_transparenzregister\\venv\\lib\\site-packages (from transformers) (23.1)\n",
|
|
"Requirement already satisfied: pyyaml>=5.1 in c:\\users\\phhor\\appdata\\roaming\\python\\python311\\site-packages (from transformers) (6.0)\n",
|
|
"Requirement already satisfied: regex!=2019.12.17 in c:\\users\\phhor\\pycharmprojects\\aki_prj23_transparenzregister\\venv\\lib\\site-packages (from transformers) (2023.3.23)\n",
|
|
"Requirement already satisfied: requests in c:\\users\\phhor\\appdata\\roaming\\python\\python311\\site-packages (from transformers) (2.28.1)\n",
|
|
"Requirement already satisfied: tokenizers!=0.11.3,<0.14,>=0.11.1 in c:\\users\\phhor\\pycharmprojects\\aki_prj23_transparenzregister\\venv\\lib\\site-packages (from transformers) (0.13.3)\n",
|
|
"Requirement already satisfied: colorama in c:\\users\\phhor\\appdata\\roaming\\python\\python311\\site-packages (from tqdm) (0.4.6)\n",
|
|
"Requirement already satisfied: python-dateutil>=2.8.2 in c:\\users\\phhor\\appdata\\roaming\\python\\python311\\site-packages (from pandas) (2.8.2)\n",
|
|
"Requirement already satisfied: pytz>=2020.1 in c:\\users\\phhor\\appdata\\roaming\\python\\python311\\site-packages (from pandas) (2022.7)\n",
|
|
"Requirement already satisfied: tzdata>=2022.1 in c:\\users\\phhor\\pycharmprojects\\aki_prj23_transparenzregister\\venv\\lib\\site-packages (from pandas) (2023.3)\n",
|
|
"Requirement already satisfied: typing-extensions in c:\\users\\phhor\\pycharmprojects\\aki_prj23_transparenzregister\\venv\\lib\\site-packages (from torch) (4.5.0)\n",
|
|
"Requirement already satisfied: sympy in c:\\users\\phhor\\pycharmprojects\\aki_prj23_transparenzregister\\venv\\lib\\site-packages (from torch) (1.11.1)\n",
|
|
"Requirement already satisfied: networkx in c:\\users\\phhor\\pycharmprojects\\aki_prj23_transparenzregister\\venv\\lib\\site-packages (from torch) (3.1)\n",
|
|
"Requirement already satisfied: jinja2 in c:\\users\\phhor\\pycharmprojects\\aki_prj23_transparenzregister\\venv\\lib\\site-packages (from torch) (3.1.2)\n",
|
|
"Requirement already satisfied: pillow!=8.3.*,>=5.3.0 in c:\\users\\phhor\\appdata\\roaming\\python\\python311\\site-packages (from torchvision) (9.4.0)\n",
|
|
"Requirement already satisfied: six in c:\\users\\phhor\\appdata\\roaming\\python\\python311\\site-packages (from sacremoses) (1.16.0)\n",
|
|
"Requirement already satisfied: click in c:\\users\\phhor\\pycharmprojects\\aki_prj23_transparenzregister\\venv\\lib\\site-packages (from sacremoses) (8.1.3)\n",
|
|
"Requirement already satisfied: joblib in c:\\users\\phhor\\pycharmprojects\\aki_prj23_transparenzregister\\venv\\lib\\site-packages (from sacremoses) (1.2.0)\n",
|
|
"Requirement already satisfied: fsspec in c:\\users\\phhor\\pycharmprojects\\aki_prj23_transparenzregister\\venv\\lib\\site-packages (from huggingface-hub<1.0,>=0.11.0->transformers) (2023.4.0)\n",
|
|
"Requirement already satisfied: MarkupSafe>=2.0 in c:\\users\\phhor\\pycharmprojects\\aki_prj23_transparenzregister\\venv\\lib\\site-packages (from jinja2->torch) (2.1.2)\n",
|
|
"Requirement already satisfied: charset-normalizer<3,>=2 in c:\\users\\phhor\\appdata\\roaming\\python\\python311\\site-packages (from requests->transformers) (2.1.1)\n",
|
|
"Requirement already satisfied: idna<4,>=2.5 in c:\\users\\phhor\\appdata\\roaming\\python\\python311\\site-packages (from requests->transformers) (3.4)\n",
|
|
"Requirement already satisfied: urllib3<1.27,>=1.21.1 in c:\\users\\phhor\\appdata\\roaming\\python\\python311\\site-packages (from requests->transformers) (1.26.12)\n",
|
|
"Requirement already satisfied: certifi>=2017.4.17 in c:\\users\\phhor\\appdata\\roaming\\python\\python311\\site-packages (from requests->transformers) (2022.9.24)\n",
|
|
"Requirement already satisfied: mpmath>=0.19 in c:\\users\\phhor\\pycharmprojects\\aki_prj23_transparenzregister\\venv\\lib\\site-packages (from sympy->torch) (1.3.0)\n"
|
|
]
|
|
},
|
|
{
|
|
"name": "stderr",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"\n",
|
|
"[notice] A new release of pip is available: 23.0.1 -> 23.1.2\n",
|
|
"[notice] To update, run: python.exe -m pip install --upgrade pip\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"!pip install transformers tqdm pandas numpy torch torchvision torchaudio sentencepiece sacremoses -U"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"slideshow": {
|
|
"slide_type": "slide"
|
|
}
|
|
},
|
|
"source": [
"### Importing the libraries and creating the model and tokenizer"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 26,
|
|
"metadata": {
|
|
"ExecuteTime": {
|
|
"end_time": "2023-05-01T13:16:15.121662Z",
|
|
"start_time": "2023-05-01T13:16:13.743921Z"
|
|
},
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
},
|
|
"slideshow": {
|
|
"slide_type": "subslide"
|
|
},
|
|
"tags": []
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"import pandas as pd\n",
|
|
"import torch\n",
|
|
"\n",
|
|
"from transformers import AutoTokenizer, AutoModelForSequenceClassification\n",
|
|
"\n",
|
|
"# create a tokenizer object\n",
|
|
"tokenizer = AutoTokenizer.from_pretrained(\"ProsusAI/finbert\")\n",
|
|
"\n",
|
|
"# fetch the pretrained model\n",
|
|
"model = AutoModelForSequenceClassification.from_pretrained(\"ProsusAI/finbert\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"slideshow": {
|
|
"slide_type": "slide"
|
|
}
|
|
},
|
|
"source": [
"### Analyze the sentiment of a single text"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 27,
|
|
"metadata": {
|
|
"ExecuteTime": {
|
|
"end_time": "2023-05-01T13:16:15.194193Z",
|
|
"start_time": "2023-05-01T13:16:15.122665Z"
|
|
},
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
},
|
|
"slideshow": {
|
|
"slide_type": "-"
|
|
}
|
|
},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"+ 0.034084\n",
|
|
"0 0.932933\n",
|
|
"- 0.032982\n",
|
|
"dtype: float32"
|
|
]
|
|
},
|
|
"execution_count": 27,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
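"def analyze_sentiment(text: str) -> pd.Series:\n",
"    \"\"\"Return the FinBert class probabilities for a single text.\"\"\"\n",
"    input_tokens = tokenizer(text, padding=True, truncation=True, return_tensors=\"pt\")\n",
"    # Inference only, so gradient tracking is switched off.\n",
"    with torch.no_grad():\n",
"        output = model(**input_tokens)\n",
"    # NOTE: the index below assumes a fixed class order for the checkpoint;\n",
"    # compare it with model.config.id2label before interpreting the columns.\n",
"    return pd.Series(\n",
"        torch.nn.functional.softmax(output.logits, dim=-1)[0].numpy(),\n",
"        index=[\"+\", \"0\", \"-\"],\n",
"    )\n",
"\n",
"\n",
"headline = \"Microsoft fails to hit profit expectations\"\n",
"tf = analyze_sentiment(headline)\n",
"tf"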
|
|
]
|
|
},
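{
"cell_type": "markdown",
"metadata": {},
"source": [
"A hedged sketch that is not part of the original experiment: instead of hard-coding the index, the class names can be read from `model.config.id2label`, so that each probability is paired with the label the checkpoint itself declares.\n",
"The function name `analyze_sentiment_labeled` is hypothetical, and the cell assumes that `tokenizer`, `model`, `torch` and `pd` from the cells above are available."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def analyze_sentiment_labeled(text: str) -> pd.Series:\n",
"    # Hypothetical variant of analyze_sentiment with labels taken from the model config.\n",
"    input_tokens = tokenizer(text, truncation=True, return_tensors=\"pt\")\n",
"    with torch.no_grad():  # inference only, no gradients needed\n",
"        logits = model(**input_tokens).logits\n",
"    probabilities = torch.nn.functional.softmax(logits, dim=-1)[0]\n",
"    # id2label maps each class index to the class name stored with the checkpoint.\n",
"    labels = [model.config.id2label[i] for i in range(model.config.num_labels)]\n",
"    return pd.Series(probabilities.numpy(), index=labels)\n",
"\n",
"\n",
"analyze_sentiment_labeled(\"Microsoft fails to hit profit expectations\")"
]
},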
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"slideshow": {
|
|
"slide_type": "slide"
|
|
}
|
|
},
|
|
"source": [
|
|
"### Creating test data"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"ExecuteTime": {
|
|
"end_time": "2023-05-01T13:16:15.208856Z",
|
|
"start_time": "2023-05-01T13:16:15.198186Z"
|
|
},
|
|
"slideshow": {
|
|
"slide_type": "skip"
|
|
},
|
|
"tags": []
|
|
},
|
|
"outputs": [],
|
|
"source": [
"text_df = pd.DataFrame(\n",
"    [\n",
"        {\"text\": \"Microsoft fails to hit profit expectations\", \"lan\": \"en\"},\n",
"        {\n",
"            \"text\": \"Am Aktienmarkt überwieg weiter die Zuversicht, wie der Kursverlauf des DAX zeigt.\",\n",
"            \"lan\": \"de\",\n",
"        },\n",
"        {\"text\": \"Stocks rallied and the British pound gained.\", \"lan\": \"en\"},\n",
"        {\n",
"            \"text\": \"Meyer Burger bedient ab sofort australischen Markt und präsentiert sich auf Smart Energy Expo in Sydney.\",\n",
"            \"lan\": \"de\",\n",
"        },\n",
"        {\n",
"            \"text\": \"Meyer Burger enters Australian market and exhibits at Smart Energy Expo in Sydney.\",\n",
"            \"lan\": \"en\",\n",
"        },\n",
"        {\n",
"            \"text\": \"J&T Express Vietnam hilft lokalen Handwerksdörfern, ihre Reichweite zu vergrößern.\",\n",
"            \"lan\": \"de\",\n",
"        },\n",
"        {\n",
"            \"text\": \"7 Experten empfehlen die Aktie zum Kauf, 1 Experte empfiehlt, die Aktie zu halten.\",\n",
"            \"lan\": \"de\",\n",
"        },\n",
"        {\"text\": \"Microsoft aktie fällt.\", \"lan\": \"de\"},\n",
"        {\"text\": \"Microsoft aktie steigt.\", \"lan\": \"de\"},\n",
"    ]\n",
")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 28,
|
|
"metadata": {
|
|
"ExecuteTime": {
|
|
"end_time": "2023-05-01T13:16:15.208856Z",
|
|
"start_time": "2023-05-01T13:16:15.198186Z"
|
|
},
|
|
"tags": []
|
|
},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div>\n",
|
|
"<style scoped>\n",
|
|
" .dataframe tbody tr th:only-of-type {\n",
|
|
" vertical-align: middle;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe tbody tr th {\n",
|
|
" vertical-align: top;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe thead th {\n",
|
|
" text-align: right;\n",
|
|
" }\n",
|
|
"</style>\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>text</th>\n",
|
|
" <th>lan</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>0</th>\n",
|
|
" <td>Microsoft fails to hit profit expectations</td>\n",
|
|
" <td>en</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>1</th>\n",
|
|
" <td>Am Aktienmarkt überwieg weiter die Zuversicht,...</td>\n",
|
|
" <td>de</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>2</th>\n",
|
|
" <td>Stocks rallied and the British pound gained.</td>\n",
|
|
" <td>en</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>3</th>\n",
|
|
" <td>Meyer Burger bedient ab sofort australischen M...</td>\n",
|
|
" <td>de</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>4</th>\n",
|
|
" <td>Meyer Burger enters Australian market and exhi...</td>\n",
|
|
" <td>en</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>5</th>\n",
|
|
" <td>J&T Express Vietnam hilft lokalen Handwerksdör...</td>\n",
|
|
" <td>de</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>6</th>\n",
|
|
" <td>7 Experten empfehlen die Aktie zum Kauf, 1 Exp...</td>\n",
|
|
" <td>de</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>7</th>\n",
|
|
" <td>Microsoft aktie fällt.</td>\n",
|
|
" <td>de</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>8</th>\n",
|
|
" <td>Microsoft aktie steigt.</td>\n",
|
|
" <td>de</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" text lan\n",
|
|
"0 Microsoft fails to hit profit expectations en\n",
|
|
"1 Am Aktienmarkt überwieg weiter die Zuversicht,... de\n",
|
|
"2 Stocks rallied and the British pound gained. en\n",
|
|
"3 Meyer Burger bedient ab sofort australischen M... de\n",
|
|
"4 Meyer Burger enters Australian market and exhi... en\n",
|
|
"5 J&T Express Vietnam hilft lokalen Handwerksdör... de\n",
|
|
"6 7 Experten empfehlen die Aktie zum Kauf, 1 Exp... de\n",
|
|
"7 Microsoft aktie fällt. de\n",
|
|
"8 Microsoft aktie steigt. de"
|
|
]
|
|
},
|
|
"execution_count": 28,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"text_df"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
"### Analyze the sentiment of multiple texts"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 29,
|
|
"metadata": {
|
|
"ExecuteTime": {
|
|
"end_time": "2023-05-01T13:16:16.132009Z",
|
|
"start_time": "2023-05-01T13:16:15.211858Z"
|
|
},
|
|
"jupyter": {
|
|
"outputs_hidden": false
|
|
}
|
|
},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div>\n",
|
|
"<style scoped>\n",
|
|
" .dataframe tbody tr th:only-of-type {\n",
|
|
" vertical-align: middle;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe tbody tr th {\n",
|
|
" vertical-align: top;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe thead th {\n",
|
|
" text-align: right;\n",
|
|
" }\n",
|
|
"</style>\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>text</th>\n",
|
|
" <th>lan</th>\n",
|
|
" <th>+</th>\n",
|
|
" <th>0</th>\n",
|
|
" <th>-</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>0</th>\n",
|
|
" <td>Microsoft fails to hit profit expectations</td>\n",
|
|
" <td>en</td>\n",
|
|
" <td>0.034084</td>\n",
|
|
" <td>0.932933</td>\n",
|
|
" <td>0.032982</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>1</th>\n",
|
|
" <td>Am Aktienmarkt überwieg weiter die Zuversicht,...</td>\n",
|
|
" <td>de</td>\n",
|
|
" <td>0.053528</td>\n",
|
|
" <td>0.027950</td>\n",
|
|
" <td>0.918522</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>2</th>\n",
|
|
" <td>Stocks rallied and the British pound gained.</td>\n",
|
|
" <td>en</td>\n",
|
|
" <td>0.898361</td>\n",
|
|
" <td>0.034474</td>\n",
|
|
" <td>0.067165</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>3</th>\n",
|
|
" <td>Meyer Burger bedient ab sofort australischen M...</td>\n",
|
|
" <td>de</td>\n",
|
|
" <td>0.116597</td>\n",
|
|
" <td>0.012790</td>\n",
|
|
" <td>0.870613</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>4</th>\n",
|
|
" <td>Meyer Burger enters Australian market and exhi...</td>\n",
|
|
" <td>en</td>\n",
|
|
" <td>0.187527</td>\n",
|
|
" <td>0.008846</td>\n",
|
|
" <td>0.803627</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>5</th>\n",
|
|
" <td>J&T Express Vietnam hilft lokalen Handwerksdör...</td>\n",
|
|
" <td>de</td>\n",
|
|
" <td>0.066277</td>\n",
|
|
" <td>0.020608</td>\n",
|
|
" <td>0.913115</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>6</th>\n",
|
|
" <td>7 Experten empfehlen die Aktie zum Kauf, 1 Exp...</td>\n",
|
|
" <td>de</td>\n",
|
|
" <td>0.050346</td>\n",
|
|
" <td>0.022004</td>\n",
|
|
" <td>0.927650</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>7</th>\n",
|
|
" <td>Microsoft aktie fällt.</td>\n",
|
|
" <td>de</td>\n",
|
|
" <td>0.066061</td>\n",
|
|
" <td>0.016440</td>\n",
|
|
" <td>0.917498</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>8</th>\n",
|
|
" <td>Microsoft aktie steigt.</td>\n",
|
|
" <td>de</td>\n",
|
|
" <td>0.041449</td>\n",
|
|
" <td>0.018471</td>\n",
|
|
" <td>0.940080</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" text lan + 0 \n",
|
|
"0 Microsoft fails to hit profit expectations en 0.034084 0.932933 \\\n",
|
|
"1 Am Aktienmarkt überwieg weiter die Zuversicht,... de 0.053528 0.027950 \n",
|
|
"2 Stocks rallied and the British pound gained. en 0.898361 0.034474 \n",
|
|
"3 Meyer Burger bedient ab sofort australischen M... de 0.116597 0.012790 \n",
|
|
"4 Meyer Burger enters Australian market and exhi... en 0.187527 0.008846 \n",
|
|
"5 J&T Express Vietnam hilft lokalen Handwerksdör... de 0.066277 0.020608 \n",
|
|
"6 7 Experten empfehlen die Aktie zum Kauf, 1 Exp... de 0.050346 0.022004 \n",
|
|
"7 Microsoft aktie fällt. de 0.066061 0.016440 \n",
|
|
"8 Microsoft aktie steigt. de 0.041449 0.018471 \n",
|
|
"\n",
|
|
" - \n",
|
|
"0 0.032982 \n",
|
|
"1 0.918522 \n",
|
|
"2 0.067165 \n",
|
|
"3 0.870613 \n",
|
|
"4 0.803627 \n",
|
|
"5 0.913115 \n",
|
|
"6 0.927650 \n",
|
|
"7 0.917498 \n",
|
|
"8 0.940080 "
|
|
]
|
|
},
|
|
"execution_count": 29,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
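"def analyse_sentiments(texts: pd.DataFrame) -> pd.DataFrame:\n",
"    \"\"\"Add the three FinBert probability columns to a frame that has a `text` column.\"\"\"\n",
"    values = texts[\"text\"].apply(analyze_sentiment)\n",
"    texts[[\"+\", \"0\", \"-\"]] = values\n",
"    return texts\n",
"\n",
"\n",
"# Work on a copy so that text_df itself stays unchanged.\n",
"analyse_sentiments(text_df.copy())"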
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
"## Conclusion about FinBert\n",
"\n",
"In its current form, this model cannot be used for German text.\n",
"It could be used if the text is translated to English beforehand, although it is questionable whether that works well.\n",
"Another option would be to retrain the same model on a translated version of its training data, but I do not believe this to be feasible."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
"# Translating texts before analysing them with FinBert\n",
"\n",
"FinBert's limitation to English input can be worked around by translating the text to English before passing it to FinBert.\n",
"The functions below explore this.\n",
"\n",
"* [Translator: Helsinki-NLP/opus-mt-de-en](https://huggingface.co/Helsinki-NLP/opus-mt-de-en)\n",
"* [MarianMTModel documentation](https://huggingface.co/docs/transformers/main/en/model_doc/marian#transformers.MarianMTModel)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 30,
|
|
"metadata": {
|
|
"ExecuteTime": {
|
|
"end_time": "2023-05-01T13:16:19.308043Z",
|
|
"start_time": "2023-05-01T13:16:16.135009Z"
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"from transformers import AutoTokenizer, AutoModelForSeq2SeqLM\n",
|
|
"\n",
|
|
"translation_tokenizer = AutoTokenizer.from_pretrained(\"Helsinki-NLP/opus-mt-de-en\")\n",
|
|
"\n",
|
|
"translation_model = AutoModelForSeq2SeqLM.from_pretrained(\"Helsinki-NLP/opus-mt-de-en\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 31,
|
|
"metadata": {
|
|
"ExecuteTime": {
|
|
"end_time": "2023-05-01T13:16:19.928232Z",
|
|
"start_time": "2023-05-01T13:16:19.310046Z"
|
|
}
|
|
},
|
|
"outputs": [
|
|
{
|
|
"name": "stderr",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"C:\\Users\\phhor\\PycharmProjects\\aki_prj23_transparenzregister\\venv\\Lib\\site-packages\\transformers\\generation\\utils.py:1313: UserWarning: Using `max_length`'s default (512) to control the generation length. This behaviour is deprecated and will be removed from the config in v5 of Transformers -- we recommend using `max_new_tokens` to control the maximum length of the generation.\n",
|
|
" warnings.warn(\n"
|
|
]
|
|
},
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"'J&T Express Vietnam helps local craft villages increase their reach.'"
|
|
]
|
|
},
|
|
"execution_count": 31,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
"def translate_sentiment(text: str) -> str:\n",
"    \"\"\"Translate a single German text to English.\"\"\"\n",
"    input_tokens = translation_tokenizer([text], return_tensors=\"pt\")\n",
"    generated_ids = translation_model.generate(**input_tokens)\n",
"    # batch_decode returns one string per input; only one text was passed in.\n",
"    return translation_tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[\n",
"        0\n",
"    ]\n",
"\n",
"\n",
"headline = (\n",
"    \"J&T Express Vietnam hilft lokalen Handwerksdörfern, ihre Reichweite zu vergrößern.\"\n",
")\n",
"tf = translate_sentiment(headline)\n",
"tf"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 32,
|
|
"metadata": {
|
|
"ExecuteTime": {
|
|
"end_time": "2023-05-01T13:16:23.381261Z",
|
|
"start_time": "2023-05-01T13:16:19.933234Z"
|
|
}
|
|
},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"Am Aktienmarkt überwieg weiter die Zuversicht, wie der Kursverlauf des DAX zeigt.\n",
|
|
"Meyer Burger bedient ab sofort australischen Markt und präsentiert sich auf Smart Energy Expo in Sydney.\n",
|
|
"J&T Express Vietnam hilft lokalen Handwerksdörfern, ihre Reichweite zu vergrößern.\n",
|
|
"7 Experten empfehlen die Aktie zum Kauf, 1 Experte empfiehlt, die Aktie zu halten.\n",
|
|
"Microsoft aktie fällt.\n",
|
|
"Microsoft aktie steigt.\n"
|
|
]
|
|
},
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div>\n",
|
|
"<style scoped>\n",
|
|
" .dataframe tbody tr th:only-of-type {\n",
|
|
" vertical-align: middle;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe tbody tr th {\n",
|
|
" vertical-align: top;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe thead th {\n",
|
|
" text-align: right;\n",
|
|
" }\n",
|
|
"</style>\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>lan</th>\n",
|
|
" <th>orig</th>\n",
|
|
" <th>text</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>0</th>\n",
|
|
" <td>en</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>Microsoft fails to hit profit expectations</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>1</th>\n",
|
|
" <td>de_translated</td>\n",
|
|
" <td>Am Aktienmarkt überwieg weiter die Zuversicht,...</td>\n",
|
|
" <td>On the stock market, confidence continued to p...</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>2</th>\n",
|
|
" <td>en</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>Stocks rallied and the British pound gained.</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>3</th>\n",
|
|
" <td>de_translated</td>\n",
|
|
" <td>Meyer Burger bedient ab sofort australischen M...</td>\n",
|
|
" <td>Meyer Burger is now serving the Australian mar...</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>4</th>\n",
|
|
" <td>en</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>Meyer Burger enters Australian market and exhi...</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>5</th>\n",
|
|
" <td>de_translated</td>\n",
|
|
" <td>J&T Express Vietnam hilft lokalen Handwerksdör...</td>\n",
|
|
" <td>J&T Express Vietnam helps local craft villages...</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>6</th>\n",
|
|
" <td>de_translated</td>\n",
|
|
" <td>7 Experten empfehlen die Aktie zum Kauf, 1 Exp...</td>\n",
|
|
" <td>7 experts recommend the stock for purchase, 1 ...</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>7</th>\n",
|
|
" <td>de_translated</td>\n",
|
|
" <td>Microsoft aktie fällt.</td>\n",
|
|
" <td>Microsoft Aktie falls.</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>8</th>\n",
|
|
" <td>de_translated</td>\n",
|
|
" <td>Microsoft aktie steigt.</td>\n",
|
|
" <td>Microsoft share is rising.</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" lan orig \n",
|
|
"0 en NaN \\\n",
|
|
"1 de_translated Am Aktienmarkt überwieg weiter die Zuversicht,... \n",
|
|
"2 en NaN \n",
|
|
"3 de_translated Meyer Burger bedient ab sofort australischen M... \n",
|
|
"4 en NaN \n",
|
|
"5 de_translated J&T Express Vietnam hilft lokalen Handwerksdör... \n",
|
|
"6 de_translated 7 Experten empfehlen die Aktie zum Kauf, 1 Exp... \n",
|
|
"7 de_translated Microsoft aktie fällt. \n",
|
|
"8 de_translated Microsoft aktie steigt. \n",
|
|
"\n",
|
|
" text \n",
|
|
"0 Microsoft fails to hit profit expectations \n",
|
|
"1 On the stock market, confidence continued to p... \n",
|
|
"2 Stocks rallied and the British pound gained. \n",
|
|
"3 Meyer Burger is now serving the Australian mar... \n",
|
|
"4 Meyer Burger enters Australian market and exhi... \n",
|
|
"5 J&T Express Vietnam helps local craft villages... \n",
|
|
"6 7 experts recommend the stock for purchase, 1 ... \n",
|
|
"7 Microsoft Aktie falls. \n",
|
|
"8 Microsoft share is rising. "
|
|
]
|
|
},
|
|
"execution_count": 32,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"def translate_sentiment_series(series: pd.Series) -> pd.Series:\n",
|
|
" if series[\"lan\"] == \"en\":\n",
|
|
" return series\n",
|
|
" elif series[\"lan\"] == \"de\":\n",
|
|
" print(series[\"text\"])\n",
|
|
" return pd.Series(\n",
|
|
" {\n",
|
|
" \"text\": translate_sentiment(series[\"text\"]),\n",
|
|
" \"lan\": \"de_translated\",\n",
|
|
" \"orig\": series[\"text\"],\n",
|
|
" }\n",
|
|
" )\n",
|
|
" raise ValueError(f\"Language {series['lan']} is not known.\")\n",
|
|
"\n",
|
|
"\n",
|
|
"def translate_sentiments(texts: pd.DataFrame) -> pd.DataFrame:\n",
|
|
" texts = texts.apply(translate_sentiment_series, axis=1)\n",
|
|
" return texts\n",
|
|
"\n",
|
|
"\n",
|
|
"translated_df = translate_sentiments(text_df.copy())\n",
|
|
"translated_df"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 33,
|
|
"metadata": {
|
|
"ExecuteTime": {
|
|
"end_time": "2023-05-01T13:16:24.076261Z",
|
|
"start_time": "2023-05-01T13:16:23.383269Z"
|
|
}
|
|
},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div>\n",
|
|
"<style scoped>\n",
|
|
" .dataframe tbody tr th:only-of-type {\n",
|
|
" vertical-align: middle;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe tbody tr th {\n",
|
|
" vertical-align: top;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe thead th {\n",
|
|
" text-align: right;\n",
|
|
" }\n",
|
|
"</style>\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>lan</th>\n",
|
|
" <th>orig</th>\n",
|
|
" <th>text</th>\n",
|
|
" <th>+</th>\n",
|
|
" <th>0</th>\n",
|
|
" <th>-</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>0</th>\n",
|
|
" <td>en</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>Microsoft fails to hit profit expectations</td>\n",
|
|
" <td>0.034084</td>\n",
|
|
" <td>0.932933</td>\n",
|
|
" <td>0.032982</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>1</th>\n",
|
|
" <td>de_translated</td>\n",
|
|
" <td>Am Aktienmarkt überwieg weiter die Zuversicht,...</td>\n",
|
|
" <td>On the stock market, confidence continued to p...</td>\n",
|
|
" <td>0.919673</td>\n",
|
|
" <td>0.018426</td>\n",
|
|
" <td>0.061901</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>2</th>\n",
|
|
" <td>en</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>Stocks rallied and the British pound gained.</td>\n",
|
|
" <td>0.898361</td>\n",
|
|
" <td>0.034474</td>\n",
|
|
" <td>0.067165</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>3</th>\n",
|
|
" <td>de_translated</td>\n",
|
|
" <td>Meyer Burger bedient ab sofort australischen M...</td>\n",
|
|
" <td>Meyer Burger is now serving the Australian mar...</td>\n",
|
|
" <td>0.221019</td>\n",
|
|
" <td>0.006844</td>\n",
|
|
" <td>0.772137</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>4</th>\n",
|
|
" <td>en</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>Meyer Burger enters Australian market and exhi...</td>\n",
|
|
" <td>0.187527</td>\n",
|
|
" <td>0.008846</td>\n",
|
|
" <td>0.803627</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>5</th>\n",
|
|
" <td>de_translated</td>\n",
|
|
" <td>J&T Express Vietnam hilft lokalen Handwerksdör...</td>\n",
|
|
" <td>J&T Express Vietnam helps local craft villages...</td>\n",
|
|
" <td>0.891114</td>\n",
|
|
" <td>0.007633</td>\n",
|
|
" <td>0.101254</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>6</th>\n",
|
|
" <td>de_translated</td>\n",
|
|
" <td>7 Experten empfehlen die Aktie zum Kauf, 1 Exp...</td>\n",
|
|
" <td>7 experts recommend the stock for purchase, 1 ...</td>\n",
|
|
" <td>0.040850</td>\n",
|
|
" <td>0.016722</td>\n",
|
|
" <td>0.942427</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>7</th>\n",
|
|
" <td>de_translated</td>\n",
|
|
" <td>Microsoft aktie fällt.</td>\n",
|
|
" <td>Microsoft Aktie falls.</td>\n",
|
|
" <td>0.027456</td>\n",
|
|
" <td>0.889160</td>\n",
|
|
" <td>0.083384</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>8</th>\n",
|
|
" <td>de_translated</td>\n",
|
|
" <td>Microsoft aktie steigt.</td>\n",
|
|
" <td>Microsoft share is rising.</td>\n",
|
|
" <td>0.952216</td>\n",
|
|
" <td>0.019054</td>\n",
|
|
" <td>0.028730</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" lan orig \n",
|
|
"0 en NaN \\\n",
|
|
"1 de_translated Am Aktienmarkt überwieg weiter die Zuversicht,... \n",
|
|
"2 en NaN \n",
|
|
"3 de_translated Meyer Burger bedient ab sofort australischen M... \n",
|
|
"4 en NaN \n",
|
|
"5 de_translated J&T Express Vietnam hilft lokalen Handwerksdör... \n",
|
|
"6 de_translated 7 Experten empfehlen die Aktie zum Kauf, 1 Exp... \n",
|
|
"7 de_translated Microsoft aktie fällt. \n",
|
|
"8 de_translated Microsoft aktie steigt. \n",
|
|
"\n",
|
|
" text + 0 \n",
|
|
"0 Microsoft fails to hit profit expectations 0.034084 0.932933 \\\n",
|
|
"1 On the stock market, confidence continued to p... 0.919673 0.018426 \n",
|
|
"2 Stocks rallied and the British pound gained. 0.898361 0.034474 \n",
|
|
"3 Meyer Burger is now serving the Australian mar... 0.221019 0.006844 \n",
|
|
"4 Meyer Burger enters Australian market and exhi... 0.187527 0.008846 \n",
|
|
"5 J&T Express Vietnam helps local craft villages... 0.891114 0.007633 \n",
|
|
"6 7 experts recommend the stock for purchase, 1 ... 0.040850 0.016722 \n",
|
|
"7 Microsoft Aktie falls. 0.027456 0.889160 \n",
|
|
"8 Microsoft share is rising. 0.952216 0.019054 \n",
|
|
"\n",
|
|
" - \n",
|
|
"0 0.032982 \n",
|
|
"1 0.061901 \n",
|
|
"2 0.067165 \n",
|
|
"3 0.772137 \n",
|
|
"4 0.803627 \n",
|
|
"5 0.101254 \n",
|
|
"6 0.942427 \n",
|
|
"7 0.083384 \n",
|
|
"8 0.028730 "
|
|
]
|
|
},
|
|
"execution_count": 33,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"sentiments = analyse_sentiments(translated_df)\n",
|
|
"sentiments"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Conclusion about a translated FinBert\n",
|
|
"\n",
|
|
"When translating a german text to english before using FinBert the results look much better and could be used for our project.\n",
|
|
"The big problem is that it will take even more CPU.\n",
|
|
"It should probably be combined with a language recognition and could be used to take multiple languages in since there are many variances of this translation model."
|
|
]
|
|
}
|
|
],
|
|
"metadata": {
|
|
"celltoolbar": "Slideshow",
|
|
"kernelspec": {
|
|
"display_name": "Python 3 (ipykernel)",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.11.0"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 4
|
|
}
|
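The notebook's conclusion suggests putting language detection in front of the translation step. The following is a minimal sketch of that idea, not the project's implementation. The stopword lists, `detect_language`, and `translate_record` are hypothetical names introduced only for illustration, and the stopword count is a deliberately crude stand-in for a proper language-detection library. Translators are passed in as plain callables, so any model (for example the `translate_sentiment` function used in the notebook) could be plugged in for `"de"`.

```python
from typing import Callable, Dict, Optional

# Hypothetical, tiny stopword lists. This is an assumption for illustration;
# a real pipeline would rely on a dedicated language-detection library.
STOPWORDS: Dict[str, set] = {
    "de": {"der", "die", "das", "und", "ist", "zum", "zu", "auf", "ab", "wie"},
    "en": {"the", "and", "is", "to", "of", "on", "for", "in", "a"},
}


def detect_language(text: str) -> str:
    """Guess the language by counting known stopwords per language."""
    tokens = text.lower().replace(",", " ").replace(".", " ").split()
    counts = {
        lan: sum(token in words for token in tokens)
        for lan, words in STOPWORDS.items()
    }
    best = max(counts, key=counts.get)
    if counts[best] == 0:
        raise ValueError(f"Could not detect the language of: {text!r}")
    return best


def translate_record(
    text: str, translators: Dict[str, Callable[[str], str]]
) -> Dict[str, Optional[str]]:
    """Return a record with the same `lan`/`orig`/`text` keys as the notebook."""
    lan = detect_language(text)
    if lan == "en":
        # English text needs no translation before sentiment analysis.
        return {"lan": "en", "orig": None, "text": text}
    if lan not in translators:
        raise ValueError(f"Language {lan} is not known.")
    return {
        "lan": f"{lan}_translated",
        "orig": text,
        "text": translators[lan](text),
    }
```

Supporting another language in this sketch would mean adding one stopword list and one translator callable. The cost noted in the conclusion remains: every non-English text still needs an extra model call before the sentiment analysis.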