Added a text in research central.

This commit is contained in:
Philipp Horstenkamp 2023-05-01 12:20:56 +02:00
parent 141070ea16
commit 29347ce7fb
Signed by: Philipp
GPG Key ID: DD53EAC36AFB61B4
2 changed files with 180 additions and 289 deletions


@ -6,19 +6,41 @@
"source": [ "source": [
"# FinBert\n", "# FinBert\n",
"\n", "\n",
"FinBert is a sentiment Analysis AI for Financial text.\n",
"Since we want to evaluate news article this is a necessary feature to analyse those texts.\n",
"In this document a first use of this tool will be shown.\n",
"Some texts will be analysed. Especially the analysis of german texts will be tried.\n",
"\n",
"## Sources\n", "## Sources\n",
"\n", "\n",
"[HugginFace](https://huggingface.co/ProsusAI/finbert)\n", "[HugginFace](https://huggingface.co/ProsusAI/finbert)\n",
"[Tutorial](https://medium.com/codex/stocks-news-sentiment-analysis-with-deep-learning-transformers-and-machine-learning-cdcdb827fc06)" "[Tutorial](https://medium.com/codex/stocks-news-sentiment-analysis-with-deep-learning-transformers-and-machine-learning-cdcdb827fc06)"
] ]
}, },
{
"cell_type": "markdown",
"source": [
"## Libraries\n",
"\n",
"* transformers\n",
"* tqdm\n",
"* pandas\n",
"* numpy\n",
"* torch\n",
"* torchvision\n",
"* torchaudio"
],
"metadata": {
"collapsed": false
}
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 13, "execution_count": 1,
"metadata": { "metadata": {
"ExecuteTime": { "ExecuteTime": {
"end_time": "2023-04-30T21:54:44.056694Z", "start_time": "2023-05-01T12:07:42.348877Z",
"start_time": "2023-04-30T21:53:45.027971Z" "end_time": "2023-05-01T12:07:46.944359Z"
}, },
"collapsed": false, "collapsed": false,
"jupyter": { "jupyter": {
@ -27,12 +49,47 @@
"tags": [] "tags": []
}, },
"outputs": [ "outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: transformers in c:\\users\\phhor\\pycharmprojects\\aki_prj23_transparenzregister\\venv\\lib\\site-packages (4.28.1)\n",
"Requirement already satisfied: tqdm in c:\\users\\phhor\\pycharmprojects\\aki_prj23_transparenzregister\\venv\\lib\\site-packages (4.65.0)\n",
"Requirement already satisfied: pandas in c:\\users\\phhor\\pycharmprojects\\aki_prj23_transparenzregister\\venv\\lib\\site-packages (2.0.1)\n",
"Requirement already satisfied: numpy in c:\\users\\phhor\\pycharmprojects\\aki_prj23_transparenzregister\\venv\\lib\\site-packages (1.24.3)\n",
"Requirement already satisfied: torch in c:\\users\\phhor\\pycharmprojects\\aki_prj23_transparenzregister\\venv\\lib\\site-packages (2.0.0)\n",
"Requirement already satisfied: torchvision in c:\\users\\phhor\\pycharmprojects\\aki_prj23_transparenzregister\\venv\\lib\\site-packages (0.15.1)\n",
"Requirement already satisfied: torchaudio in c:\\users\\phhor\\pycharmprojects\\aki_prj23_transparenzregister\\venv\\lib\\site-packages (2.0.1)\n",
"Requirement already satisfied: filelock in c:\\users\\phhor\\appdata\\roaming\\python\\python311\\site-packages (from transformers) (3.8.0)\n",
"Requirement already satisfied: huggingface-hub<1.0,>=0.11.0 in c:\\users\\phhor\\pycharmprojects\\aki_prj23_transparenzregister\\venv\\lib\\site-packages (from transformers) (0.14.1)\n",
"Requirement already satisfied: packaging>=20.0 in c:\\users\\phhor\\pycharmprojects\\aki_prj23_transparenzregister\\venv\\lib\\site-packages (from transformers) (23.1)\n",
"Requirement already satisfied: pyyaml>=5.1 in c:\\users\\phhor\\appdata\\roaming\\python\\python311\\site-packages (from transformers) (6.0)\n",
"Requirement already satisfied: regex!=2019.12.17 in c:\\users\\phhor\\pycharmprojects\\aki_prj23_transparenzregister\\venv\\lib\\site-packages (from transformers) (2023.3.23)\n",
"Requirement already satisfied: requests in c:\\users\\phhor\\appdata\\roaming\\python\\python311\\site-packages (from transformers) (2.28.1)\n",
"Requirement already satisfied: tokenizers!=0.11.3,<0.14,>=0.11.1 in c:\\users\\phhor\\pycharmprojects\\aki_prj23_transparenzregister\\venv\\lib\\site-packages (from transformers) (0.13.3)\n",
"Requirement already satisfied: colorama in c:\\users\\phhor\\appdata\\roaming\\python\\python311\\site-packages (from tqdm) (0.4.6)\n",
"Requirement already satisfied: python-dateutil>=2.8.2 in c:\\users\\phhor\\appdata\\roaming\\python\\python311\\site-packages (from pandas) (2.8.2)\n",
"Requirement already satisfied: pytz>=2020.1 in c:\\users\\phhor\\appdata\\roaming\\python\\python311\\site-packages (from pandas) (2022.7)\n",
"Requirement already satisfied: tzdata>=2022.1 in c:\\users\\phhor\\pycharmprojects\\aki_prj23_transparenzregister\\venv\\lib\\site-packages (from pandas) (2023.3)\n",
"Requirement already satisfied: typing-extensions in c:\\users\\phhor\\pycharmprojects\\aki_prj23_transparenzregister\\venv\\lib\\site-packages (from torch) (4.5.0)\n",
"Requirement already satisfied: sympy in c:\\users\\phhor\\pycharmprojects\\aki_prj23_transparenzregister\\venv\\lib\\site-packages (from torch) (1.11.1)\n",
"Requirement already satisfied: networkx in c:\\users\\phhor\\pycharmprojects\\aki_prj23_transparenzregister\\venv\\lib\\site-packages (from torch) (3.1)\n",
"Requirement already satisfied: jinja2 in c:\\users\\phhor\\pycharmprojects\\aki_prj23_transparenzregister\\venv\\lib\\site-packages (from torch) (3.1.2)\n",
"Requirement already satisfied: pillow!=8.3.*,>=5.3.0 in c:\\users\\phhor\\appdata\\roaming\\python\\python311\\site-packages (from torchvision) (9.4.0)\n",
"Requirement already satisfied: fsspec in c:\\users\\phhor\\pycharmprojects\\aki_prj23_transparenzregister\\venv\\lib\\site-packages (from huggingface-hub<1.0,>=0.11.0->transformers) (2023.4.0)\n",
"Requirement already satisfied: six>=1.5 in c:\\users\\phhor\\appdata\\roaming\\python\\python311\\site-packages (from python-dateutil>=2.8.2->pandas) (1.16.0)\n",
"Requirement already satisfied: MarkupSafe>=2.0 in c:\\users\\phhor\\pycharmprojects\\aki_prj23_transparenzregister\\venv\\lib\\site-packages (from jinja2->torch) (2.1.2)\n",
"Requirement already satisfied: charset-normalizer<3,>=2 in c:\\users\\phhor\\appdata\\roaming\\python\\python311\\site-packages (from requests->transformers) (2.1.1)\n",
"Requirement already satisfied: idna<4,>=2.5 in c:\\users\\phhor\\appdata\\roaming\\python\\python311\\site-packages (from requests->transformers) (3.4)\n",
"Requirement already satisfied: urllib3<1.27,>=1.21.1 in c:\\users\\phhor\\appdata\\roaming\\python\\python311\\site-packages (from requests->transformers) (1.26.12)\n",
"Requirement already satisfied: certifi>=2017.4.17 in c:\\users\\phhor\\appdata\\roaming\\python\\python311\\site-packages (from requests->transformers) (2022.9.24)\n",
"Requirement already satisfied: mpmath>=0.19 in c:\\users\\phhor\\pycharmprojects\\aki_prj23_transparenzregister\\venv\\lib\\site-packages (from sympy->torch) (1.3.0)\n"
]
},
{ {
"name": "stderr", "name": "stderr",
"output_type": "stream", "output_type": "stream",
"text": [ "text": [
"ERROR: To modify pip, please run the following command:\n",
"C:\\Users\\phhor\\PycharmProjects\\aki_prj23_transparenzregister\\venv\\Scripts\\python.exe -m pip install transformers tqdm pandas numpy torch torchvision torchaudio pip -Uq\n",
"\n", "\n",
"[notice] A new release of pip is available: 23.0.1 -> 23.1.2\n", "[notice] A new release of pip is available: 23.0.1 -> 23.1.2\n",
"[notice] To update, run: python.exe -m pip install --upgrade pip\n" "[notice] To update, run: python.exe -m pip install --upgrade pip\n"
@ -40,22 +97,46 @@
} }
], ],
"source": [ "source": [
"!pip install transformers tqdm pandas numpy torch torchvision torchaudio pip -Uq" "!pip install transformers tqdm pandas numpy torch torchvision torchaudio -U"
] ]
}, },
{
"cell_type": "markdown",
"source": [
"### Importing and creation of models and tokenizer"
],
"metadata": {
"collapsed": false
}
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 16, "execution_count": 2,
"metadata": { "metadata": {
"collapsed": false, "collapsed": false,
"jupyter": { "jupyter": {
"outputs_hidden": false "outputs_hidden": false
}, },
"tags": [] "tags": [],
"ExecuteTime": {
"start_time": "2023-05-01T12:07:46.944359Z",
"end_time": "2023-05-01T12:07:51.049695Z"
}
}, },
"outputs": [], "outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"C:\\Users\\phhor\\PycharmProjects\\aki_prj23_transparenzregister\\venv\\Lib\\site-packages\\tqdm\\auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
" from .autonotebook import tqdm as notebook_tqdm\n"
]
}
],
"source": [ "source": [
"import pandas as pd\n", "import pandas as pd\n",
"import torch\n",
"\n",
"from transformers import AutoTokenizer, AutoModelForSequenceClassification\n", "from transformers import AutoTokenizer, AutoModelForSequenceClassification\n",
"\n", "\n",
"# create a tokenizer object\n", "# create a tokenizer object\n",
@ -65,54 +146,26 @@
"model = AutoModelForSequenceClassification.from_pretrained(\"ProsusAI/finbert\")" "model = AutoModelForSequenceClassification.from_pretrained(\"ProsusAI/finbert\")"
] ]
}, },
{
"cell_type": "markdown",
"source": [
"### Analyze a single sentiment"
],
"metadata": {
"collapsed": false
}
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 12, "execution_count": 3,
"metadata": { "metadata": {
"collapsed": false, "collapsed": false,
"jupyter": { "jupyter": {
"outputs_hidden": false "outputs_hidden": false
}, },
"tags": [] "ExecuteTime": {
}, "start_time": "2023-05-01T12:07:51.050690Z",
"outputs": [ "end_time": "2023-05-01T12:07:51.129723Z"
{
"data": {
"text/plain": [
"tensor([[0.0535, 0.0279, 0.9185]], grad_fn=<SoftmaxBackward0>)"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# A headline to be used as input\n",
"import torch\n",
"\n",
"headline = \"Microsoft fails to hit profit expectations\"\n",
"headline2 = (\n",
" \"Am Aktienmarkt überwieg weiter die Zuversicht, wie der Kursverlauf des DAX zeigt.\"\n",
")\n",
"\n",
"# Pre-process input phrase\n",
"input_tokens = tokenizer(headline2, padding=True, truncation=True, return_tensors=\"pt\")\n",
"# Run inference on the tokenized phrase\n",
"output = model(**input_tokens)\n",
"\n",
"# Pass model output logits through a softmax layer.\n",
"sentim_scores = torch.nn.functional.softmax(output.logits, dim=-1)\n",
"sentim_scores"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
} }
}, },
"outputs": [ "outputs": [
@ -125,20 +178,15 @@
}, },
{ {
"data": { "data": {
"text/plain": [ "text/plain": "+ 0.034084\n0 0.932933\n- 0.032982\ndtype: float32"
"+ 0.034084\n",
"0 0.932933\n",
"- 0.032982\n",
"dtype: float32"
]
}, },
"execution_count": 56, "execution_count": 3,
"metadata": {}, "metadata": {},
"output_type": "execute_result" "output_type": "execute_result"
} }
], ],
"source": [ "source": [
"def analyze_sentiment(text: str):\n", "def analyze_sentiment(text: str) -> pd.Series:\n",
" print(text)\n", " print(text)\n",
" input_tokens = tokenizer(text, padding=True, truncation=True, return_tensors=\"pt\")\n", " input_tokens = tokenizer(text, padding=True, truncation=True, return_tensors=\"pt\")\n",
" output = model(**input_tokens)\n", " output = model(**input_tokens)\n",
@ -148,106 +196,37 @@
" )\n", " )\n",
"\n", "\n",
"\n", "\n",
"headline = \"Microsoft fails to hit profit expectations\"\n",
"tf = analyze_sentiment(headline)\n", "tf = analyze_sentiment(headline)\n",
"tf" "tf"
] ]
}, },
{ {
"cell_type": "code", "cell_type": "markdown",
"execution_count": 80, "source": [
"### Creating test data"
],
"metadata": { "metadata": {
"tags": [] "collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"tags": [],
"ExecuteTime": {
"start_time": "2023-05-01T12:07:51.130725Z",
"end_time": "2023-05-01T12:07:51.193306Z"
}
}, },
"outputs": [ "outputs": [
{ {
"data": { "data": {
"text/html": [ "text/plain": " text lan\n0 Microsoft fails to hit profit expectations en\n1 Am Aktienmarkt überwieg weiter die Zuversicht,... de\n2 Stocks rallied and the British pound gained. en\n3 Meyer Burger bedient ab sofort australischen M... de\n4 Meyer Burger enters Australian market and exhi... en\n5 J&T Express Vietnam hilft lokalen Handwerksdör... en\n6 7 Experten empfehlen die Aktie zum Kauf, 1 Exp... de\n7 Microsoft aktie fällt. de\n8 Microsoft aktie steigt. de",
"<div>\n", "text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>text</th>\n <th>lan</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>Microsoft fails to hit profit expectations</td>\n <td>en</td>\n </tr>\n <tr>\n <th>1</th>\n <td>Am Aktienmarkt überwieg weiter die Zuversicht,...</td>\n <td>de</td>\n </tr>\n <tr>\n <th>2</th>\n <td>Stocks rallied and the British pound gained.</td>\n <td>en</td>\n </tr>\n <tr>\n <th>3</th>\n <td>Meyer Burger bedient ab sofort australischen M...</td>\n <td>de</td>\n </tr>\n <tr>\n <th>4</th>\n <td>Meyer Burger enters Australian market and exhi...</td>\n <td>en</td>\n </tr>\n <tr>\n <th>5</th>\n <td>J&amp;T Express Vietnam hilft lokalen Handwerksdör...</td>\n <td>en</td>\n </tr>\n <tr>\n <th>6</th>\n <td>7 Experten empfehlen die Aktie zum Kauf, 1 Exp...</td>\n <td>de</td>\n </tr>\n <tr>\n <th>7</th>\n <td>Microsoft aktie fällt.</td>\n <td>de</td>\n </tr>\n <tr>\n <th>8</th>\n <td>Microsoft aktie steigt.</td>\n <td>de</td>\n </tr>\n </tbody>\n</table>\n</div>"
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>text</th>\n",
" <th>lan</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Microsoft fails to hit profit expectations</td>\n",
" <td>en</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Am Aktienmarkt überwieg weiter die Zuversicht,...</td>\n",
" <td>de</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Stocks rallied and the British pound gained.</td>\n",
" <td>en</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Meyer Burger bedient ab sofort australischen M...</td>\n",
" <td>de</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Meyer Burger enters Australian market and exhi...</td>\n",
" <td>en</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>J&amp;T Express Vietnam hilft lokalen Handwerksdör...</td>\n",
" <td>en</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>7 Experten empfehlen die Aktie zum Kauf, 1 Exp...</td>\n",
" <td>de</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Microsoft aktie fällt.</td>\n",
" <td>de</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>Microsoft aktie steigt.</td>\n",
" <td>de</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" text lan\n",
"0 Microsoft fails to hit profit expectations en\n",
"1 Am Aktienmarkt überwieg weiter die Zuversicht,... de\n",
"2 Stocks rallied and the British pound gained. en\n",
"3 Meyer Burger bedient ab sofort australischen M... de\n",
"4 Meyer Burger enters Australian market and exhi... en\n",
"5 J&T Express Vietnam hilft lokalen Handwerksdör... en\n",
"6 7 Experten empfehlen die Aktie zum Kauf, 1 Exp... de\n",
"7 Microsoft aktie fällt. de\n",
"8 Microsoft aktie steigt. de"
]
}, },
"execution_count": 80, "execution_count": 4,
"metadata": {}, "metadata": {},
"output_type": "execute_result" "output_type": "execute_result"
} }
@ -284,13 +263,39 @@
"text_df" "text_df"
] ]
}, },
{
"cell_type": "markdown",
"source": [],
"metadata": {
"collapsed": false
}
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 81, "execution_count": 6,
"outputs": [],
"source": [
"### Analyze multiple Sentiments"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"start_time": "2023-05-01T12:10:36.719000Z",
"end_time": "2023-05-01T12:10:36.725700Z"
}
}
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": { "metadata": {
"collapsed": false, "collapsed": false,
"jupyter": { "jupyter": {
"outputs_hidden": false "outputs_hidden": false
},
"ExecuteTime": {
"start_time": "2023-05-01T12:07:51.151294Z",
"end_time": "2023-05-01T12:07:51.992442Z"
} }
}, },
"outputs": [ "outputs": [
@ -311,142 +316,17 @@
}, },
{ {
"data": { "data": {
"text/html": [ "text/plain": " text lan + 0 \n0 Microsoft fails to hit profit expectations en 0.034084 0.932933 \\\n1 Am Aktienmarkt überwieg weiter die Zuversicht,... de 0.053528 0.027950 \n2 Stocks rallied and the British pound gained. en 0.898361 0.034474 \n3 Meyer Burger bedient ab sofort australischen M... de 0.116597 0.012790 \n4 Meyer Burger enters Australian market and exhi... en 0.187527 0.008846 \n5 J&T Express Vietnam hilft lokalen Handwerksdör... en 0.066277 0.020608 \n6 7 Experten empfehlen die Aktie zum Kauf, 1 Exp... de 0.050346 0.022004 \n7 Microsoft aktie fällt. de 0.066061 0.016440 \n8 Microsoft aktie steigt. de 0.041449 0.018471 \n\n - \n0 0.032982 \n1 0.918522 \n2 0.067165 \n3 0.870613 \n4 0.803627 \n5 0.913115 \n6 0.927650 \n7 0.917498 \n8 0.940080 ",
"<div>\n", "text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>text</th>\n <th>lan</th>\n <th>+</th>\n <th>0</th>\n <th>-</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>Microsoft fails to hit profit expectations</td>\n <td>en</td>\n <td>0.034084</td>\n <td>0.932933</td>\n <td>0.032982</td>\n </tr>\n <tr>\n <th>1</th>\n <td>Am Aktienmarkt überwieg weiter die Zuversicht,...</td>\n <td>de</td>\n <td>0.053528</td>\n <td>0.027950</td>\n <td>0.918522</td>\n </tr>\n <tr>\n <th>2</th>\n <td>Stocks rallied and the British pound gained.</td>\n <td>en</td>\n <td>0.898361</td>\n <td>0.034474</td>\n <td>0.067165</td>\n </tr>\n <tr>\n <th>3</th>\n <td>Meyer Burger bedient ab sofort australischen M...</td>\n <td>de</td>\n <td>0.116597</td>\n <td>0.012790</td>\n <td>0.870613</td>\n </tr>\n <tr>\n <th>4</th>\n <td>Meyer Burger enters Australian market and exhi...</td>\n <td>en</td>\n <td>0.187527</td>\n <td>0.008846</td>\n <td>0.803627</td>\n </tr>\n <tr>\n <th>5</th>\n <td>J&amp;T Express Vietnam hilft lokalen Handwerksdör...</td>\n <td>en</td>\n <td>0.066277</td>\n <td>0.020608</td>\n <td>0.913115</td>\n </tr>\n <tr>\n <th>6</th>\n <td>7 Experten empfehlen die Aktie zum Kauf, 1 Exp...</td>\n <td>de</td>\n <td>0.050346</td>\n <td>0.022004</td>\n <td>0.927650</td>\n </tr>\n <tr>\n <th>7</th>\n <td>Microsoft aktie fällt.</td>\n <td>de</td>\n <td>0.066061</td>\n <td>0.016440</td>\n <td>0.917498</td>\n </tr>\n <tr>\n <th>8</th>\n <td>Microsoft aktie steigt.</td>\n <td>de</td>\n <td>0.041449</td>\n <td>0.018471</td>\n <td>0.940080</td>\n </tr>\n </tbody>\n</table>\n</div>"
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>text</th>\n",
" <th>lan</th>\n",
" <th>+</th>\n",
" <th>0</th>\n",
" <th>-</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Microsoft fails to hit profit expectations</td>\n",
" <td>en</td>\n",
" <td>0.034084</td>\n",
" <td>0.932933</td>\n",
" <td>0.032982</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Am Aktienmarkt überwieg weiter die Zuversicht,...</td>\n",
" <td>de</td>\n",
" <td>0.053528</td>\n",
" <td>0.027950</td>\n",
" <td>0.918522</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Stocks rallied and the British pound gained.</td>\n",
" <td>en</td>\n",
" <td>0.898361</td>\n",
" <td>0.034474</td>\n",
" <td>0.067165</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Meyer Burger bedient ab sofort australischen M...</td>\n",
" <td>de</td>\n",
" <td>0.116597</td>\n",
" <td>0.012790</td>\n",
" <td>0.870613</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Meyer Burger enters Australian market and exhi...</td>\n",
" <td>en</td>\n",
" <td>0.187527</td>\n",
" <td>0.008846</td>\n",
" <td>0.803627</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>J&amp;T Express Vietnam hilft lokalen Handwerksdör...</td>\n",
" <td>en</td>\n",
" <td>0.066277</td>\n",
" <td>0.020608</td>\n",
" <td>0.913115</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>7 Experten empfehlen die Aktie zum Kauf, 1 Exp...</td>\n",
" <td>de</td>\n",
" <td>0.050346</td>\n",
" <td>0.022004</td>\n",
" <td>0.927650</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Microsoft aktie fällt.</td>\n",
" <td>de</td>\n",
" <td>0.066061</td>\n",
" <td>0.016440</td>\n",
" <td>0.917498</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>Microsoft aktie steigt.</td>\n",
" <td>de</td>\n",
" <td>0.041449</td>\n",
" <td>0.018471</td>\n",
" <td>0.940080</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" text lan + 0 \n",
"0 Microsoft fails to hit profit expectations en 0.034084 0.932933 \\\n",
"1 Am Aktienmarkt überwieg weiter die Zuversicht,... de 0.053528 0.027950 \n",
"2 Stocks rallied and the British pound gained. en 0.898361 0.034474 \n",
"3 Meyer Burger bedient ab sofort australischen M... de 0.116597 0.012790 \n",
"4 Meyer Burger enters Australian market and exhi... en 0.187527 0.008846 \n",
"5 J&T Express Vietnam hilft lokalen Handwerksdör... en 0.066277 0.020608 \n",
"6 7 Experten empfehlen die Aktie zum Kauf, 1 Exp... de 0.050346 0.022004 \n",
"7 Microsoft aktie fällt. de 0.066061 0.016440 \n",
"8 Microsoft aktie steigt. de 0.041449 0.018471 \n",
"\n",
" - \n",
"0 0.032982 \n",
"1 0.918522 \n",
"2 0.067165 \n",
"3 0.870613 \n",
"4 0.803627 \n",
"5 0.913115 \n",
"6 0.927650 \n",
"7 0.917498 \n",
"8 0.940080 "
]
}, },
"execution_count": 81, "execution_count": 5,
"metadata": {}, "metadata": {},
"output_type": "execute_result" "output_type": "execute_result"
} }
], ],
"source": [ "source": [
"def analyse_sentiments(texts: pd.Series) -> pd.DataFrame:\n", "def analyse_sentiments(texts: pd.DataFrame) -> pd.DataFrame:\n",
" values = texts[\"text\"].apply(analyze_sentiment)\n", " values = texts[\"text\"].apply(analyze_sentiment)\n",
" # print(values)\n",
" texts[[\"+\", \"0\", \"-\"]] = values\n", " texts[[\"+\", \"0\", \"-\"]] = values\n",
" return texts\n", " return texts\n",
"\n", "\n",
@ -455,18 +335,17 @@
] ]
}, },
{ {
"cell_type": "code", "cell_type": "markdown",
"execution_count": null, "source": [
"metadata": {}, "## Conclusion\n",
"outputs": [], "\n",
"source": [] "The current form of this model can't be used for the german language.\n",
}, "It could be used if the text is translated into English beforehand, but it is questionable whether that would work well.\n",
{ "Another way would be to retrain the same model on translated versions of this model's training data. But I do not believe this to be feasible."
"cell_type": "code", ],
"execution_count": null, "metadata": {
"metadata": {}, "collapsed": false
"outputs": [], }
"source": []
} }
], ],
"metadata": { "metadata": {


@ -0,0 +1,12 @@
# Research Central
## Sentiment Analysis
### FinBert
FinBert is a specialised sentiment analysis model for financial data.
Sadly, it did not perform well in our tests, and it does not work at all for German texts.
Experiments can be found here:
* [FinBert Jupyter](../../Jupyter/AI-models/"Sentiment Analysis"/FinBert.ipynb)
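
When reading the experiment results, it helps to attach the class names to the softmax scores explicitly instead of relying on column position. The following is a minimal sketch, not the notebook's code: the function names `softmax` and `label_scores`, the example logits, and the label order in `ID2LABEL` are all assumptions made for illustration. With the real model, the order should be read from `model.config.id2label`, because a mismatch would swap two of the sentiment columns.

```python
import math

# Assumed label order for illustration only; with the real model,
# read it from `model.config.id2label` instead of hard-coding it.
ID2LABEL = {0: "positive", 1: "negative", 2: "neutral"}


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    largest = max(logits)
    # Subtract the maximum before exponentiating for numerical stability.
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_scores(logits, id2label=ID2LABEL):
    """Pair each class probability with its label name."""
    probs = softmax(logits)
    return {id2label[i]: p for i, p in enumerate(probs)}


# Hypothetical logits for a single headline (not taken from the notebook).
scores = label_scores([-1.2, 2.1, -1.3])
top_label = max(scores, key=scores.get)
print(top_label, round(scores[top_label], 3))
```

Keeping the label names next to the scores makes it harder to misread a result table, which matters when judging whether the model really fails on German input.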