(chore): Initialised devops tools (#29)

* Added a first action
* Repaired a typo
* Repaired a typo2
* Added flake8 action
* Repaired a typo in the flake8 action.
* Added a first bandit action
* Added a first batch
* Added flake8-prebuild to the needs of flake8
* Added the Docker socket to the volume.
* Added flake8-prebuild to the needs of flake8
* Removed the latest part from the container.
* Reworked flake8
* Reworked flake8 poetry
* Changed to 64-bit
* Some edits to the runner
* Added Python setup
* Added python -m to the Python Docker image.
* Added a run linter
* Removed redundant version
* Added isort
* Added poetry install
* Added flake8 as lint.
* Uses Node.js and Python image
* Added flake8 as lint.
* Removed self-hosted runner
* Added black and flake8 tests
* Removed self-hosted runner
* Removed unneeded actions
* Added a mypy error.
* Removed poetry call before poetry setup
* Added a test to understand the poetry action better
* Added the snok poetry builder
* Reworked the repo a bit
* Removed unneeded poetry installation
* Added the isort action
* Added isort test
* Added ruff
* Added full ruff configuration
* Added full ruff configuration2
* Removed duplicate configurations
* Removed some redundant pre-commit hooks
* Removed unneeded actions.
* Repaired ruff
* Added tests.
* Removed
* Removed a missing file
* Added reports as artifacts
* Removed the unneeded poetry test
* Added a license checker.
* Removed some unneeded configuration.
* Removed the import reformatter.
* Added doc generation.
* Added license summary.
* Add
* Add lint
* Switched pip-licenses to poetry.
* Remove some more packages.
* Added a Makefile
* Added version codes to the main package
* Changed the format of the md files
* Presentation first draft
* Version up and added extensions
* Removed the venv path from docbuild
* Actions version up
* Experiments with Sphinx
* First draft of the Sphinx documentation.
* Added the protocol to the time series.
* First draft of a first build pipeline
* Added Mermaid version support
* Added documentation pull request and branch requirements.
* Tests should now be passing
* Add safety
* Added the action on pull_request_target
* Added a pytest coverage report
* Added a build step
* Changed the lint action to work only on Python changes.
* Added the ability to compile an HTML report
* Coverage
* Finished test and build workflow
* Repaired a bug.
* Added a github branch.ref
* Removed a poetry install
* Docbuild now excludes templates
* Added the seminar presentation to the documentation build
* Added a few images
* Changed the pre-commit image
* Presentation done
* Never executing Jupyter for Sphinx
2026-02-13 22:17:38 +01:00 · 2023-06-23 18:47:04 +02:00
parent f691cac9e8
commit a9304201af
35 changed files with 5940 additions and 336 deletions
@@ -1,3 +0,0 @@
 [flake8]
 max-line-length = 88
 extend-ignore = E203
@@ -0,0 +1,61 @@
 name: Documentation-Action
 on:
  push:
    branches:
    - main
  pull_request:
    branches:
    - '*'
 jobs:
  doc-build:
    name: Build
    runs-on: ubuntu-latest
    steps:
    - run: sudo apt install pandoc -y
    - uses: actions/checkout@v3
    - name: Setup Python
      uses: actions/setup-python@v4
      with:
        python-version: 3.11
    - name: Install and configure Poetry
      uses: snok/install-poetry@v1
      with:
        version: 1.4.2
        virtualenvs-create: false
    - run: poetry install --only doc,root,develop
    - name: Doc-Build
      run: |
        cd documentations
        sphinx-apidoc -o . ../src/aki_prj23_transparenzregister -feP
        make html
    - name: Package artifact
      uses: actions/upload-pages-artifact@v1
      with:
        path: documentations/_build/html/
  doc-deploy:
    name: Deployment
    runs-on: ubuntu-latest
    needs: doc-build
    permissions:
      pages: write
      id-token: write
    concurrency:
      group: pages
      cancel-in-progress: false
    if: github.ref == 'refs/heads/main'
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    steps:
    - run: echo "Deployment URL = ${{ steps.deployment.outputs.page_url }}"
    - uses: actions/download-artifact@v3
      with:
        name: github-pages
    - name: Deploy to GitHub Pages
      id: deployment
      uses: actions/deploy-pages@v2
      with:
        artifact_name: github-pages
@@ -0,0 +1,80 @@
 name: Python-Lint
 on:
  push:
    paths:
    - '**.py'
    - poetry.lock
    - pyproject.toml
  pull_request:
 jobs:
  run-linters:
    name: Black & mypy
    runs-on: ubuntu-latest
    steps:
    - name: Set up python
      id: setup-python
      uses: actions/setup-python@v4
      with:
        python-version: '3.11'
    - name: Check out Git repository
      uses: actions/checkout@v3
    - name: Install and configure Poetry
      uses: snok/install-poetry@v1
      with:
        version: 1.4.2
        virtualenvs-create: false
        virtualenvs-path: ~/local/share/virtualenvs
    - run: poetry install --without develop,doc,test
    - name: Run linters
      uses: wearerequired/lint-action@v2
      with:
        black: true
        mypy: true
  ruff:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3
    - uses: chartboost/ruff-action@v1
  python-requirements:
    name: Check Python Requirements
    runs-on: ubuntu-latest
    steps:
    - name: Set up python
      id: setup-python
      uses: actions/setup-python@v4
      with:
        python-version: '3.11'
    - name: Install and configure Poetry
      uses: snok/install-poetry@v1
      with:
        version: 1.4.2
        virtualenvs-path: ~/.local/share/virtualenvs
    - name: Cache pipenv
      id: cache-pipenv
      uses: actions/cache@v3
      with:
        path: ~/.local/share/virtualenvs
        key: venv
    - name: Check out Git repository
      uses: actions/checkout@v3
    - name: Poetry export
      run: poetry export -f requirements.txt --output requirements.txt
    - name: Check license
      run: |
        poetry run pip install pip-licenses
        poetry run pip-licenses --format=markdown --output-file=license-summary.md
    - name: Archive license summary
      uses: actions/upload-artifact@v3
      with:
        name: license-summary
        path: |
          license-summary.md
          requirements.txt
    - name: Check requirements security with pip-audit
      uses: pypa/gh-action-pip-audit@v1.0.0
      with:
        inputs: requirements.txt
@@ -0,0 +1,131 @@
 name: Test & Build
 on:
  pull_request:
  pull_request_target:
  push:
    paths:
    - '**.py'
    - poetry.lock
    - pyproject.toml
 jobs:
  test:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    steps:
    - name: Check out repository code
      uses: actions/checkout@v3
    - name: Setup Python
      uses: actions/setup-python@v4
      with:
        python-version: 3.11
    - name: Install and configure Poetry
      uses: snok/install-poetry@v1
      with:
        version: 1.4.2
        virtualenvs-path: ~/.local/share/virtualenvs
    - id: cache-pipenv
      uses: actions/cache@v3
      with:
        path: ~/.local/share/virtualenvs
        key: venv
    - run: poetry install --without develop,doc,lint
    - name: Run test suite
      run: |
        poetry run pytest --junit-xml=unit-test-results.xml --cov-report "xml:coverage.xml" --cov=src tests/
    - name: Archive code coverage results
      uses: actions/upload-artifact@v3
      with:
        name: code-coverage-report
        path: |
          coverage.xml
          .coverage
    - name: Archive test results
      uses: actions/upload-artifact@v3
      with:
        name: test-report
        path: |
          unit-test-results.xml
        if-no-files-found: error
  coverage_pull_request:
    if: ${{ github.event_name == 'pull_request' }}
    runs-on: ubuntu-latest
    needs: test
    steps:
    - uses: actions/download-artifact@v3
      with:
        name: code-coverage-report
    - name: Get Cover
      uses: orgoro/coverage@v3.1
      with:
        coverageFile: coverage.xml
        token: ${{ secrets.GITHUB_TOKEN }}
        thresholdAll: 0.8
        thresholdNew: 0.8
        thresholdModified: 0.8
  coverage_report:
    runs-on: ubuntu-latest
    needs: test
    steps:
    - name: Check out repository code
      uses: actions/checkout@v3
    - name: Setup Python
      uses: actions/setup-python@v4
      with:
        python-version: 3.11
    - id: cache-pipenv
      uses: actions/cache@v3
      with:
        path: ~/.local/share/virtualenvs
        key: venv
    - name: Install and configure Poetry
      uses: snok/install-poetry@v1
      with:
        version: 1.4.2
        virtualenvs-path: ~/.local/share/virtualenvs
    - run: |
        poetry install --only test
    - uses: actions/download-artifact@v3
      with:
        name: code-coverage-report
    - name: Make Coverage Report
      run: |
        poetry run coverage html
    - name: Archive coverage report
      uses: actions/upload-artifact@v3
      with:
        name: Coverage Report HTML
        path: htmlcov/
  build:
    runs-on: ubuntu-latest
    needs: test
    steps:
    - name: Setup Python
      uses: actions/setup-python@v4
      with:
        python-version: 3.11
    - name: Install and configure Poetry
      uses: snok/install-poetry@v1
      with:
        version: 1.4.2
        virtualenvs-path: ~/.local/share/virtualenvs
    - id: cache-pipenv
      uses: actions/cache@v3
      with:
        path: ~/.local/share/virtualenvs
        key: venv
    - name: Check out repository code
      uses: actions/checkout@v3
    - run: |
        poetry install --without develop,doc,lint,test
        poetry build
    - name: Archive builds
      uses: actions/upload-artifact@v3
      with:
        name: builds
        path: dist/
@@ -209,3 +209,7 @@ replay_pid*
 /handelsregister.db
 /handelsregister.png
 /documentations/_build/
 /documentations/aki_prj23_transparenzregister.*
 /documentations/modules.rst
 /unit-test-results.xml
@@ -23,6 +23,13 @@ repos:
  - id: debug-statements
  - id: pretty-format-json
 - repo: https://github.com/astral-sh/ruff-pre-commit
  # Ruff version.
  rev: v0.0.270
  hooks:
  - id: ruff
    args: [--fix, --exit-non-zero-on-fix]
 - repo: https://github.com/psf/black
  rev: 23.3.0
  hooks:
@@ -33,7 +40,7 @@ repos:
 - repo: https://github.com/macisamuele/language-formatters-pre-commit-hooks
-  rev: v2.8.0
+  rev: v2.9.0
  hooks:
  - id: pretty-format-ini
    args: [--autofix]
@@ -44,56 +51,26 @@ repos:
    exclude: (^poetry.lock$)
 - repo: https://github.com/domdfcoding/flake2lint
  rev: v0.4.2
  hooks:
  - id: flake2lint
 - repo: https://github.com/PyCQA/flake8
  rev: 6.0.0
  hooks:
  - id: flake8
    args: [--config=tox.ini]
 - repo: https://github.com/pre-commit/mirrors-mypy
-  rev: v1.2.0
+  rev: v1.3.0
  hooks:
  - id: mypy
    additional_dependencies:
    - pandas==2.*
    - pandas-stubs==2.0.*
    - types-requests
 - repo: https://github.com/frnmst/md-toc
  rev: 8.1.9
  hooks:
  - id: md-toc
- repo: https://gitlab.com/smop/pre-commit-hooks
+- repo: https://github.com/python-poetry/poetry
-  rev: v1.0.0
+  rev: '1.4'
-  hooks: []
+  hooks:
-  # - id: check-poetry
+  - id: poetry-check
 - repo: https://github.com/Lucas-C/pre-commit-hooks-java
  rev: 1.3.10
  hooks: []
  # - id: validate-html
 - repo: https://github.com/asottile/pyupgrade
  rev: v3.3.2
  hooks:
-  - id: pyupgrade
+  - id: validate-html
    args: [--py311-plus]
 - repo: https://github.com/pylint-dev/pylint
  rev: v3.0.0a6
  hooks: []
  # - id: pylint
  #  args: [--disable=import-error]
 - repo: https://github.com/MarcoGorelli/absolufy-imports
  rev: v0.3.1
  hooks:
  - id: absolufy-imports
 - repo: https://github.com/pycqa/isort
  rev: 5.12.0
  hooks:
  - id: isort
    name: isort (python)
@@ -2,7 +2,11 @@
 "cells": [
  {
   "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# FinBert\n",
    "\n",
@@ -19,6 +23,11 @@
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Libraries\n",
    "\n",
@@ -31,23 +40,22 @@
    "* torchaudio\n",
    "* sentencepiece\n",
    "* sacremoses"
-   ],
+   ]
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "ExecuteTime": {
-     "start_time": "2023-05-01T13:16:08.554998Z",
+     "end_time": "2023-05-01T13:16:13.740927Z",
-     "end_time": "2023-05-01T13:16:13.740927Z"
+     "start_time": "2023-05-01T13:16:08.554998Z"
    },
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    },
    "slideshow": {
     "slide_type": "skip"
    },
    "tags": []
   },
   "outputs": [
@@ -108,26 +116,30 @@
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Importing and creation of models and tokenizer"
-   ],
+   ]
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
-    "collapsed": false,
+    "ExecuteTime": {
     "end_time": "2023-05-01T13:16:15.121662Z",
     "start_time": "2023-05-01T13:16:13.743921Z"
    },
    "jupyter": {
     "outputs_hidden": false
    },
-    "tags": [],
+    "slideshow": {
-    "ExecuteTime": {
+     "slide_type": "subslide"
-     "start_time": "2023-05-01T13:16:13.743921Z",
+    },
-     "end_time": "2023-05-01T13:16:15.121662Z"
+    "tags": []
    }
   },
   "outputs": [],
   "source": [
@@ -145,30 +157,39 @@
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Analyze a single sentiment"
-   ],
+   ]
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
-    "collapsed": false,
+    "ExecuteTime": {
     "end_time": "2023-05-01T13:16:15.194193Z",
     "start_time": "2023-05-01T13:16:15.122665Z"
    },
    "jupyter": {
     "outputs_hidden": false
    },
-    "ExecuteTime": {
+    "slideshow": {
-     "start_time": "2023-05-01T13:16:15.122665Z",
+     "slide_type": "-"
     "end_time": "2023-05-01T13:16:15.194193Z"
    }
   },
   "outputs": [
    {
     "data": {
-      "text/plain": "+    0.034084\n0    0.932933\n-    0.032982\ndtype: float32"
+      "text/plain": [
       "+    0.034084\n",
       "0    0.932933\n",
       "-    0.032982\n",
       "dtype: float32"
      ]
     },
     "execution_count": 27,
     "metadata": {},
@@ -192,34 +213,29 @@
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Creating test data"
-   ],
+   ]
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
-   "execution_count": 28,
+   "execution_count": null,
   "metadata": {
    "tags": [],
    "ExecuteTime": {
-     "start_time": "2023-05-01T13:16:15.198186Z",
+     "end_time": "2023-05-01T13:16:15.208856Z",
-     "end_time": "2023-05-01T13:16:15.208856Z"
+     "start_time": "2023-05-01T13:16:15.198186Z"
    }
    },
-   "outputs": [
+    "slideshow": {
-    {
+     "slide_type": "skip"
     "data": {
      "text/plain": "                                                text lan\n0         Microsoft fails to hit profit expectations  en\n1  Am Aktienmarkt überwieg weiter die Zuversicht,...  de\n2       Stocks rallied and the British pound gained.  en\n3  Meyer Burger bedient ab sofort australischen M...  de\n4  Meyer Burger enters Australian market and exhi...  en\n5  J&T Express Vietnam hilft lokalen Handwerksdör...  de\n6  7 Experten empfehlen die Aktie zum Kauf, 1 Exp...  de\n7                             Microsoft aktie fällt.  de\n8                            Microsoft aktie steigt.  de",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>text</th>\n      <th>lan</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>Microsoft fails to hit profit expectations</td>\n      <td>en</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>Am Aktienmarkt überwieg weiter die Zuversicht,...</td>\n      <td>de</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>Stocks rallied and the British pound gained.</td>\n      <td>en</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>Meyer Burger bedient ab sofort australischen M...</td>\n      <td>de</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>Meyer Burger enters Australian market and exhi...</td>\n      <td>en</td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>J&amp;T Express Vietnam hilft lokalen Handwerksdör...</td>\n      <td>de</td>\n    </tr>\n    <tr>\n      <th>6</th>\n      <td>7 Experten empfehlen die Aktie zum Kauf, 1 Exp...</td>\n      <td>de</td>\n    </tr>\n    <tr>\n      <th>7</th>\n      <td>Microsoft aktie fällt.</td>\n      <td>de</td>\n    </tr>\n    <tr>\n      <th>8</th>\n      <td>Microsoft aktie steigt.</td>\n      <td>de</td>\n    </tr>\n  </tbody>\n</table>\n</div>"
    },
-     "execution_count": 28,
+    "tags": []
-     "metadata": {},
+   },
-     "output_type": "execute_result"
+   "outputs": [],
    }
   ],
   "source": [
    "text_df = pd.DataFrame(\n",
    "    [\n",
@@ -248,44 +264,270 @@
    "        {\"text\": \"Microsoft aktie fällt.\", \"lan\": \"de\"},\n",
    "        {\"text\": \"Microsoft aktie steigt.\", \"lan\": \"de\"},\n",
    "    ]\n",
-    ")\n",
+    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-05-01T13:16:15.208856Z",
     "start_time": "2023-05-01T13:16:15.198186Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>text</th>\n",
       "      <th>lan</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Microsoft fails to hit profit expectations</td>\n",
       "      <td>en</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Am Aktienmarkt überwieg weiter die Zuversicht,...</td>\n",
       "      <td>de</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Stocks rallied and the British pound gained.</td>\n",
       "      <td>en</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Meyer Burger bedient ab sofort australischen M...</td>\n",
       "      <td>de</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Meyer Burger enters Australian market and exhi...</td>\n",
       "      <td>en</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>J&amp;T Express Vietnam hilft lokalen Handwerksdör...</td>\n",
       "      <td>de</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>7 Experten empfehlen die Aktie zum Kauf, 1 Exp...</td>\n",
       "      <td>de</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Microsoft aktie fällt.</td>\n",
       "      <td>de</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Microsoft aktie steigt.</td>\n",
       "      <td>de</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                text lan\n",
       "0         Microsoft fails to hit profit expectations  en\n",
       "1  Am Aktienmarkt überwieg weiter die Zuversicht,...  de\n",
       "2       Stocks rallied and the British pound gained.  en\n",
       "3  Meyer Burger bedient ab sofort australischen M...  de\n",
       "4  Meyer Burger enters Australian market and exhi...  en\n",
       "5  J&T Express Vietnam hilft lokalen Handwerksdör...  de\n",
       "6  7 Experten empfehlen die Aktie zum Kauf, 1 Exp...  de\n",
       "7                             Microsoft aktie fällt.  de\n",
       "8                            Microsoft aktie steigt.  de"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "text_df"
   ]
  },
  {
   "cell_type": "markdown",
-   "source": [],
+   "metadata": {},
-   "metadata": {
+   "source": []
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Analyze multiple Sentiments"
-   ],
+   ]
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
-    "collapsed": false,
+    "ExecuteTime": {
     "end_time": "2023-05-01T13:16:16.132009Z",
     "start_time": "2023-05-01T13:16:15.211858Z"
    },
    "jupyter": {
     "outputs_hidden": false
    },
    "ExecuteTime": {
     "start_time": "2023-05-01T13:16:15.211858Z",
     "end_time": "2023-05-01T13:16:16.132009Z"
    }
   },
   "outputs": [
    {
     "data": {
-      "text/plain": "                                                text lan         +         0   \n0         Microsoft fails to hit profit expectations  en  0.034084  0.932933  \\\n1  Am Aktienmarkt überwieg weiter die Zuversicht,...  de  0.053528  0.027950   \n2       Stocks rallied and the British pound gained.  en  0.898361  0.034474   \n3  Meyer Burger bedient ab sofort australischen M...  de  0.116597  0.012790   \n4  Meyer Burger enters Australian market and exhi...  en  0.187527  0.008846   \n5  J&T Express Vietnam hilft lokalen Handwerksdör...  de  0.066277  0.020608   \n6  7 Experten empfehlen die Aktie zum Kauf, 1 Exp...  de  0.050346  0.022004   \n7                             Microsoft aktie fällt.  de  0.066061  0.016440   \n8                            Microsoft aktie steigt.  de  0.041449  0.018471   \n\n          -  \n0  0.032982  \n1  0.918522  \n2  0.067165  \n3  0.870613  \n4  0.803627  \n5  0.913115  \n6  0.927650  \n7  0.917498  \n8  0.940080  ",
+      "text/html": [
-      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>text</th>\n      <th>lan</th>\n      <th>+</th>\n      <th>0</th>\n      <th>-</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>Microsoft fails to hit profit expectations</td>\n      <td>en</td>\n      <td>0.034084</td>\n      <td>0.932933</td>\n      <td>0.032982</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>Am Aktienmarkt überwieg weiter die Zuversicht,...</td>\n      <td>de</td>\n      <td>0.053528</td>\n      <td>0.027950</td>\n      <td>0.918522</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>Stocks rallied and the British pound gained.</td>\n      <td>en</td>\n      <td>0.898361</td>\n      <td>0.034474</td>\n      <td>0.067165</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>Meyer Burger bedient ab sofort australischen M...</td>\n      <td>de</td>\n      <td>0.116597</td>\n      <td>0.012790</td>\n      <td>0.870613</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>Meyer Burger enters Australian market and exhi...</td>\n      <td>en</td>\n      <td>0.187527</td>\n      <td>0.008846</td>\n      <td>0.803627</td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>J&amp;T Express Vietnam hilft lokalen Handwerksdör...</td>\n      <td>de</td>\n      <td>0.066277</td>\n      <td>0.020608</td>\n      <td>0.913115</td>\n    </tr>\n    <tr>\n      <th>6</th>\n      <td>7 Experten empfehlen die Aktie zum Kauf, 1 Exp...</td>\n      <td>de</td>\n      <td>0.050346</td>\n      <td>0.022004</td>\n      <td>0.927650</td>\n    </tr>\n    <tr>\n      <th>7</th>\n      <td>Microsoft aktie fällt.</td>\n      <td>de</td>\n      <td>0.066061</td>\n      <td>0.016440</td>\n      <td>0.917498</td>\n    </tr>\n    <tr>\n      <th>8</th>\n      <td>Microsoft aktie steigt.</td>\n      <td>de</td>\n      <td>0.041449</td>\n      <td>0.018471</td>\n      <td>0.940080</td>\n    </tr>\n  </tbody>\n</table>\n</div>"
+       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>text</th>\n",
       "      <th>lan</th>\n",
       "      <th>+</th>\n",
       "      <th>0</th>\n",
       "      <th>-</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Microsoft fails to hit profit expectations</td>\n",
       "      <td>en</td>\n",
       "      <td>0.034084</td>\n",
       "      <td>0.932933</td>\n",
       "      <td>0.032982</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Am Aktienmarkt überwieg weiter die Zuversicht,...</td>\n",
       "      <td>de</td>\n",
       "      <td>0.053528</td>\n",
       "      <td>0.027950</td>\n",
       "      <td>0.918522</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Stocks rallied and the British pound gained.</td>\n",
       "      <td>en</td>\n",
       "      <td>0.898361</td>\n",
       "      <td>0.034474</td>\n",
       "      <td>0.067165</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Meyer Burger bedient ab sofort australischen M...</td>\n",
       "      <td>de</td>\n",
       "      <td>0.116597</td>\n",
       "      <td>0.012790</td>\n",
       "      <td>0.870613</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Meyer Burger enters Australian market and exhi...</td>\n",
       "      <td>en</td>\n",
       "      <td>0.187527</td>\n",
       "      <td>0.008846</td>\n",
       "      <td>0.803627</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>J&amp;T Express Vietnam hilft lokalen Handwerksdör...</td>\n",
       "      <td>de</td>\n",
       "      <td>0.066277</td>\n",
       "      <td>0.020608</td>\n",
       "      <td>0.913115</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>7 Experten empfehlen die Aktie zum Kauf, 1 Exp...</td>\n",
       "      <td>de</td>\n",
       "      <td>0.050346</td>\n",
       "      <td>0.022004</td>\n",
       "      <td>0.927650</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Microsoft aktie fällt.</td>\n",
       "      <td>de</td>\n",
       "      <td>0.066061</td>\n",
       "      <td>0.016440</td>\n",
       "      <td>0.917498</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Microsoft aktie steigt.</td>\n",
       "      <td>de</td>\n",
       "      <td>0.041449</td>\n",
       "      <td>0.018471</td>\n",
       "      <td>0.940080</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                text lan         +         0   \n",
       "0         Microsoft fails to hit profit expectations  en  0.034084  0.932933  \\\n",
       "1  Am Aktienmarkt überwieg weiter die Zuversicht,...  de  0.053528  0.027950   \n",
       "2       Stocks rallied and the British pound gained.  en  0.898361  0.034474   \n",
       "3  Meyer Burger bedient ab sofort australischen M...  de  0.116597  0.012790   \n",
       "4  Meyer Burger enters Australian market and exhi...  en  0.187527  0.008846   \n",
       "5  J&T Express Vietnam hilft lokalen Handwerksdör...  de  0.066277  0.020608   \n",
       "6  7 Experten empfehlen die Aktie zum Kauf, 1 Exp...  de  0.050346  0.022004   \n",
       "7                             Microsoft aktie fällt.  de  0.066061  0.016440   \n",
       "8                            Microsoft aktie steigt.  de  0.041449  0.018471   \n",
       "\n",
       "          -  \n",
       "0  0.032982  \n",
       "1  0.918522  \n",
       "2  0.067165  \n",
       "3  0.870613  \n",
       "4  0.803627  \n",
       "5  0.913115  \n",
       "6  0.927650  \n",
       "7  0.917498  \n",
       "8  0.940080  "
      ]
     },
     "execution_count": 29,
     "metadata": {},
@@ -304,19 +546,18 @@
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Conclusion about FinBert\n",
    "\n",
    "The current form of this model can't be used for the german language.\n",
    "It could be used if the text is translated beforehand. But it is questionable if that will work well.\n",
    "Another way would be to retrain the same model with translated text from this models' data. But I do not believe this to be feasible."
-   ],
+   ]
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Translating sentiments before analysing them with FinBert\n",
    "\n",
@@ -326,14 +567,17 @@
    "[Translator: Helsinki-NLP/opus-mt-de-en](https://huggingface.co/Helsinki-NLP/opus-mt-de-en)\n",
    "https://huggingface.co/docs/transformers/main/en/model_doc/marian#transformers.MarianMTModel\n",
    "\n"
-   ],
+   ]
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-05-01T13:16:19.308043Z",
     "start_time": "2023-05-01T13:16:16.135009Z"
    }
   },
   "outputs": [],
   "source": [
    "from transformers import AutoTokenizer, AutoModelForSeq2SeqLM\n",
@@ -341,18 +585,17 @@
    "translation_tokenizer = AutoTokenizer.from_pretrained(\"Helsinki-NLP/opus-mt-de-en\")\n",
    "\n",
    "translation_model = AutoModelForSeq2SeqLM.from_pretrained(\"Helsinki-NLP/opus-mt-de-en\")"
-   ],
+   ]
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "start_time": "2023-05-01T13:16:16.135009Z",
     "end_time": "2023-05-01T13:16:19.308043Z"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-05-01T13:16:19.928232Z",
     "start_time": "2023-05-01T13:16:19.310046Z"
    }
   },
   "outputs": [
    {
     "name": "stderr",
@@ -364,7 +607,9 @@
    },
    {
     "data": {
-      "text/plain": "'J&T Express Vietnam helps local craft villages increase their reach.'"
+      "text/plain": [
       "'J&T Express Vietnam helps local craft villages increase their reach.'"
      ]
     },
     "execution_count": 31,
     "metadata": {},
@@ -385,18 +630,17 @@
    ")\n",
    "tf = translate_sentiment(headline)\n",
    "tf"
-   ],
+   ]
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "start_time": "2023-05-01T13:16:19.310046Z",
     "end_time": "2023-05-01T13:16:19.928232Z"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-05-01T13:16:23.381261Z",
     "start_time": "2023-05-01T13:16:19.933234Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
@@ -412,8 +656,112 @@
    },
    {
     "data": {
-      "text/plain": "             lan                                               orig   \n0             en                                                NaN  \\\n1  de_translated  Am Aktienmarkt überwieg weiter die Zuversicht,...   \n2             en                                                NaN   \n3  de_translated  Meyer Burger bedient ab sofort australischen M...   \n4             en                                                NaN   \n5  de_translated  J&T Express Vietnam hilft lokalen Handwerksdör...   \n6  de_translated  7 Experten empfehlen die Aktie zum Kauf, 1 Exp...   \n7  de_translated                             Microsoft aktie fällt.   \n8  de_translated                            Microsoft aktie steigt.   \n\n                                                text  \n0         Microsoft fails to hit profit expectations  \n1  On the stock market, confidence continued to p...  \n2       Stocks rallied and the British pound gained.  \n3  Meyer Burger is now serving the Australian mar...  \n4  Meyer Burger enters Australian market and exhi...  \n5  J&T Express Vietnam helps local craft villages...  \n6  7 experts recommend the stock for purchase, 1 ...  \n7                             Microsoft Aktie falls.  \n8                         Microsoft share is rising.  ",
+      "text/html": [
-      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>lan</th>\n      <th>orig</th>\n      <th>text</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>en</td>\n      <td>NaN</td>\n      <td>Microsoft fails to hit profit expectations</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>de_translated</td>\n      <td>Am Aktienmarkt überwieg weiter die Zuversicht,...</td>\n      <td>On the stock market, confidence continued to p...</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>en</td>\n      <td>NaN</td>\n      <td>Stocks rallied and the British pound gained.</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>de_translated</td>\n      <td>Meyer Burger bedient ab sofort australischen M...</td>\n      <td>Meyer Burger is now serving the Australian mar...</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>en</td>\n      <td>NaN</td>\n      <td>Meyer Burger enters Australian market and exhi...</td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>de_translated</td>\n      <td>J&amp;T Express Vietnam hilft lokalen Handwerksdör...</td>\n      <td>J&amp;T Express Vietnam helps local craft villages...</td>\n    </tr>\n    <tr>\n      <th>6</th>\n      <td>de_translated</td>\n      <td>7 Experten empfehlen die Aktie zum Kauf, 1 Exp...</td>\n      <td>7 experts recommend the stock for purchase, 1 ...</td>\n    </tr>\n    <tr>\n      <th>7</th>\n      <td>de_translated</td>\n      <td>Microsoft aktie fällt.</td>\n      <td>Microsoft Aktie falls.</td>\n    </tr>\n    <tr>\n      <th>8</th>\n      <td>de_translated</td>\n      <td>Microsoft aktie steigt.</td>\n      <td>Microsoft share is rising.</td>\n    
</tr>\n  </tbody>\n</table>\n</div>"
+       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>lan</th>\n",
       "      <th>orig</th>\n",
       "      <th>text</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>en</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Microsoft fails to hit profit expectations</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>de_translated</td>\n",
       "      <td>Am Aktienmarkt überwieg weiter die Zuversicht,...</td>\n",
       "      <td>On the stock market, confidence continued to p...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>en</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Stocks rallied and the British pound gained.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>de_translated</td>\n",
       "      <td>Meyer Burger bedient ab sofort australischen M...</td>\n",
       "      <td>Meyer Burger is now serving the Australian mar...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>en</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Meyer Burger enters Australian market and exhi...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>de_translated</td>\n",
       "      <td>J&amp;T Express Vietnam hilft lokalen Handwerksdör...</td>\n",
       "      <td>J&amp;T Express Vietnam helps local craft villages...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>de_translated</td>\n",
       "      <td>7 Experten empfehlen die Aktie zum Kauf, 1 Exp...</td>\n",
       "      <td>7 experts recommend the stock for purchase, 1 ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>de_translated</td>\n",
       "      <td>Microsoft aktie fällt.</td>\n",
       "      <td>Microsoft Aktie falls.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>de_translated</td>\n",
       "      <td>Microsoft aktie steigt.</td>\n",
       "      <td>Microsoft share is rising.</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             lan                                               orig   \n",
       "0             en                                                NaN  \\\n",
       "1  de_translated  Am Aktienmarkt überwieg weiter die Zuversicht,...   \n",
       "2             en                                                NaN   \n",
       "3  de_translated  Meyer Burger bedient ab sofort australischen M...   \n",
       "4             en                                                NaN   \n",
       "5  de_translated  J&T Express Vietnam hilft lokalen Handwerksdör...   \n",
       "6  de_translated  7 Experten empfehlen die Aktie zum Kauf, 1 Exp...   \n",
       "7  de_translated                             Microsoft aktie fällt.   \n",
       "8  de_translated                            Microsoft aktie steigt.   \n",
       "\n",
       "                                                text  \n",
       "0         Microsoft fails to hit profit expectations  \n",
       "1  On the stock market, confidence continued to p...  \n",
       "2       Stocks rallied and the British pound gained.  \n",
       "3  Meyer Burger is now serving the Australian mar...  \n",
       "4  Meyer Burger enters Australian market and exhi...  \n",
       "5  J&T Express Vietnam helps local craft villages...  \n",
       "6  7 experts recommend the stock for purchase, 1 ...  \n",
       "7                             Microsoft Aktie falls.  \n",
       "8                         Microsoft share is rising.  "
      ]
     },
     "execution_count": 32,
     "metadata": {},
@@ -443,23 +791,167 @@
    "\n",
    "translated_df = translate_sentiments(text_df.copy())\n",
    "translated_df"
-   ],
+   ]
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "start_time": "2023-05-01T13:16:19.933234Z",
     "end_time": "2023-05-01T13:16:23.381261Z"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-05-01T13:16:24.076261Z",
     "start_time": "2023-05-01T13:16:23.383269Z"
    }
   },
   "outputs": [
    {
     "data": {
-      "text/plain": "             lan                                               orig   \n0             en                                                NaN  \\\n1  de_translated  Am Aktienmarkt überwieg weiter die Zuversicht,...   \n2             en                                                NaN   \n3  de_translated  Meyer Burger bedient ab sofort australischen M...   \n4             en                                                NaN   \n5  de_translated  J&T Express Vietnam hilft lokalen Handwerksdör...   \n6  de_translated  7 Experten empfehlen die Aktie zum Kauf, 1 Exp...   \n7  de_translated                             Microsoft aktie fällt.   \n8  de_translated                            Microsoft aktie steigt.   \n\n                                                text         +         0   \n0         Microsoft fails to hit profit expectations  0.034084  0.932933  \\\n1  On the stock market, confidence continued to p...  0.919673  0.018426   \n2       Stocks rallied and the British pound gained.  0.898361  0.034474   \n3  Meyer Burger is now serving the Australian mar...  0.221019  0.006844   \n4  Meyer Burger enters Australian market and exhi...  0.187527  0.008846   \n5  J&T Express Vietnam helps local craft villages...  0.891114  0.007633   \n6  7 experts recommend the stock for purchase, 1 ...  0.040850  0.016722   \n7                             Microsoft Aktie falls.  0.027456  0.889160   \n8                         Microsoft share is rising.  0.952216  0.019054   \n\n          -  \n0  0.032982  \n1  0.061901  \n2  0.067165  \n3  0.772137  \n4  0.803627  \n5  0.101254  \n6  0.942427  \n7  0.083384  \n8  0.028730  ",
+      "text/html": [
-      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>lan</th>\n      <th>orig</th>\n      <th>text</th>\n      <th>+</th>\n      <th>0</th>\n      <th>-</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>en</td>\n      <td>NaN</td>\n      <td>Microsoft fails to hit profit expectations</td>\n      <td>0.034084</td>\n      <td>0.932933</td>\n      <td>0.032982</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>de_translated</td>\n      <td>Am Aktienmarkt überwieg weiter die Zuversicht,...</td>\n      <td>On the stock market, confidence continued to p...</td>\n      <td>0.919673</td>\n      <td>0.018426</td>\n      <td>0.061901</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>en</td>\n      <td>NaN</td>\n      <td>Stocks rallied and the British pound gained.</td>\n      <td>0.898361</td>\n      <td>0.034474</td>\n      <td>0.067165</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>de_translated</td>\n      <td>Meyer Burger bedient ab sofort australischen M...</td>\n      <td>Meyer Burger is now serving the Australian mar...</td>\n      <td>0.221019</td>\n      <td>0.006844</td>\n      <td>0.772137</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>en</td>\n      <td>NaN</td>\n      <td>Meyer Burger enters Australian market and exhi...</td>\n      <td>0.187527</td>\n      <td>0.008846</td>\n      <td>0.803627</td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>de_translated</td>\n      <td>J&amp;T Express Vietnam hilft lokalen Handwerksdör...</td>\n      <td>J&amp;T Express Vietnam helps local craft villages...</td>\n      <td>0.891114</td>\n      <td>0.007633</td>\n      <td>0.101254</td>\n 
   </tr>\n    <tr>\n      <th>6</th>\n      <td>de_translated</td>\n      <td>7 Experten empfehlen die Aktie zum Kauf, 1 Exp...</td>\n      <td>7 experts recommend the stock for purchase, 1 ...</td>\n      <td>0.040850</td>\n      <td>0.016722</td>\n      <td>0.942427</td>\n    </tr>\n    <tr>\n      <th>7</th>\n      <td>de_translated</td>\n      <td>Microsoft aktie fällt.</td>\n      <td>Microsoft Aktie falls.</td>\n      <td>0.027456</td>\n      <td>0.889160</td>\n      <td>0.083384</td>\n    </tr>\n    <tr>\n      <th>8</th>\n      <td>de_translated</td>\n      <td>Microsoft aktie steigt.</td>\n      <td>Microsoft share is rising.</td>\n      <td>0.952216</td>\n      <td>0.019054</td>\n      <td>0.028730</td>\n    </tr>\n  </tbody>\n</table>\n</div>"
+       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>lan</th>\n",
       "      <th>orig</th>\n",
       "      <th>text</th>\n",
       "      <th>+</th>\n",
       "      <th>0</th>\n",
       "      <th>-</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>en</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Microsoft fails to hit profit expectations</td>\n",
       "      <td>0.034084</td>\n",
       "      <td>0.932933</td>\n",
       "      <td>0.032982</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>de_translated</td>\n",
       "      <td>Am Aktienmarkt überwieg weiter die Zuversicht,...</td>\n",
       "      <td>On the stock market, confidence continued to p...</td>\n",
       "      <td>0.919673</td>\n",
       "      <td>0.018426</td>\n",
       "      <td>0.061901</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>en</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Stocks rallied and the British pound gained.</td>\n",
       "      <td>0.898361</td>\n",
       "      <td>0.034474</td>\n",
       "      <td>0.067165</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>de_translated</td>\n",
       "      <td>Meyer Burger bedient ab sofort australischen M...</td>\n",
       "      <td>Meyer Burger is now serving the Australian mar...</td>\n",
       "      <td>0.221019</td>\n",
       "      <td>0.006844</td>\n",
       "      <td>0.772137</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>en</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Meyer Burger enters Australian market and exhi...</td>\n",
       "      <td>0.187527</td>\n",
       "      <td>0.008846</td>\n",
       "      <td>0.803627</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>de_translated</td>\n",
       "      <td>J&amp;T Express Vietnam hilft lokalen Handwerksdör...</td>\n",
       "      <td>J&amp;T Express Vietnam helps local craft villages...</td>\n",
       "      <td>0.891114</td>\n",
       "      <td>0.007633</td>\n",
       "      <td>0.101254</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>de_translated</td>\n",
       "      <td>7 Experten empfehlen die Aktie zum Kauf, 1 Exp...</td>\n",
       "      <td>7 experts recommend the stock for purchase, 1 ...</td>\n",
       "      <td>0.040850</td>\n",
       "      <td>0.016722</td>\n",
       "      <td>0.942427</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>de_translated</td>\n",
       "      <td>Microsoft aktie fällt.</td>\n",
       "      <td>Microsoft Aktie falls.</td>\n",
       "      <td>0.027456</td>\n",
       "      <td>0.889160</td>\n",
       "      <td>0.083384</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>de_translated</td>\n",
       "      <td>Microsoft aktie steigt.</td>\n",
       "      <td>Microsoft share is rising.</td>\n",
       "      <td>0.952216</td>\n",
       "      <td>0.019054</td>\n",
       "      <td>0.028730</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             lan                                               orig   \n",
       "0             en                                                NaN  \\\n",
       "1  de_translated  Am Aktienmarkt überwieg weiter die Zuversicht,...   \n",
       "2             en                                                NaN   \n",
       "3  de_translated  Meyer Burger bedient ab sofort australischen M...   \n",
       "4             en                                                NaN   \n",
       "5  de_translated  J&T Express Vietnam hilft lokalen Handwerksdör...   \n",
       "6  de_translated  7 Experten empfehlen die Aktie zum Kauf, 1 Exp...   \n",
       "7  de_translated                             Microsoft aktie fällt.   \n",
       "8  de_translated                            Microsoft aktie steigt.   \n",
       "\n",
       "                                                text         +         0   \n",
       "0         Microsoft fails to hit profit expectations  0.034084  0.932933  \\\n",
       "1  On the stock market, confidence continued to p...  0.919673  0.018426   \n",
       "2       Stocks rallied and the British pound gained.  0.898361  0.034474   \n",
       "3  Meyer Burger is now serving the Australian mar...  0.221019  0.006844   \n",
       "4  Meyer Burger enters Australian market and exhi...  0.187527  0.008846   \n",
       "5  J&T Express Vietnam helps local craft villages...  0.891114  0.007633   \n",
       "6  7 experts recommend the stock for purchase, 1 ...  0.040850  0.016722   \n",
       "7                             Microsoft Aktie falls.  0.027456  0.889160   \n",
       "8                         Microsoft share is rising.  0.952216  0.019054   \n",
       "\n",
       "          -  \n",
       "0  0.032982  \n",
       "1  0.061901  \n",
       "2  0.067165  \n",
       "3  0.772137  \n",
       "4  0.803627  \n",
       "5  0.101254  \n",
       "6  0.942427  \n",
       "7  0.083384  \n",
       "8  0.028730  "
      ]
     },
     "execution_count": 33,
     "metadata": {},
@@ -469,30 +961,22 @@
   "source": [
    "sentiments = analyse_sentiments(translated_df)\n",
    "sentiments"
-   ],
+   ]
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "start_time": "2023-05-01T13:16:23.383269Z",
     "end_time": "2023-05-01T13:16:24.076261Z"
    }
   }
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Conclusion about a translated FinBert\n",
    "\n",
    "When translating a german text to english before using FinBert the results look much better and could be used for our project.\n",
    "The big problem is that it will take even more CPU.\n",
    "It should probably be combined with a language recognition and could be used to take multiple languages in since there are many variances of this translation model."
-   ],
+   ]
   "metadata": {
    "collapsed": false
   }
  }
 ],
 "metadata": {
  "celltoolbar": "Slideshow",
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
@@ -0,0 +1,236 @@
 {
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {
    "collapsed": true,
    "ExecuteTime": {
     "end_time": "2023-06-03T01:36:32.345509400Z",
     "start_time": "2023-06-03T01:36:32.332130700Z"
    }
   },
   "outputs": [],
   "source": [
    "from typing import Final\n",
    "\n",
    "import numpy as np\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "outputs": [
    {
     "data": {
      "text/plain": "     Company 1  Connection Weight  Company 2\n0           21                 83         58\n1           37                 88         86\n2           40                  6         83\n3           60                 35          2\n4           11                 22         10\n..         ...                ...        ...\n695         62                 37         11\n696         10                 24         27\n697         97                 40         55\n698         14                 87         66\n699         50                 55         82\n\n[693 rows x 3 columns]",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>Company 1</th>\n      <th>Connection Weight</th>\n      <th>Company 2</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>21</td>\n      <td>83</td>\n      <td>58</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>37</td>\n      <td>88</td>\n      <td>86</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>40</td>\n      <td>6</td>\n      <td>83</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>60</td>\n      <td>35</td>\n      <td>2</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>11</td>\n      <td>22</td>\n      <td>10</td>\n    </tr>\n    <tr>\n      <th>...</th>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n    </tr>\n    <tr>\n      <th>695</th>\n      <td>62</td>\n      <td>37</td>\n      <td>11</td>\n    </tr>\n    <tr>\n      <th>696</th>\n      <td>10</td>\n      <td>24</td>\n      <td>27</td>\n    </tr>\n    <tr>\n      <th>697</th>\n      <td>97</td>\n      <td>40</td>\n      <td>55</td>\n    </tr>\n    <tr>\n      <th>698</th>\n      <td>14</td>\n      <td>87</td>\n      <td>66</td>\n    </tr>\n    <tr>\n      <th>699</th>\n      <td>50</td>\n      <td>55</td>\n      <td>82</td>\n    </tr>\n  </tbody>\n</table>\n<p>693 rows × 3 columns</p>\n</div>"
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from typing import Final\n",
    "\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "number_of_entries = 100\n",
    "number_of_contacts = 10\n",
    "ids: Final = [_ for _ in range(number_of_entries)]\n",
    "companies = pd.DataFrame(columns=[], index=pd.Index(ids, name=\"company_id\"))\n",
    "companies\n",
    "\n",
    "\n",
    "id1 = (\n",
    "    pd.Series(ids * number_of_contacts, name=\"Company 1\")\n",
    "    .sample(frac=0.7, random_state=42)\n",
    "    .reset_index(drop=True)\n",
    ")\n",
    "id2 = (\n",
    "    pd.Series(ids * number_of_contacts, name=\"Company 2\")\n",
    "    .sample(frac=0.7, random_state=43)\n",
    "    .reset_index(drop=True)\n",
    ")\n",
    "connections = (\n",
    "    pd.DataFrame(\n",
    "        [\n",
    "            id1,\n",
    "            pd.Series(\n",
    "                np.random.randint(0, 100, size=(max(len(id1), len(id2)))),\n",
    "                name=\"Connection Weight\",\n",
    "            ),\n",
    "            id2,\n",
    "        ]\n",
    "    )\n",
    "    .T.dropna()\n",
    "    .astype(int)\n",
    ")\n",
    "connections = connections.loc[(connections[\"Company 1\"] != connections[\"Company 2\"])]\n",
    "connections"
   ],
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2023-06-03T10:15:42.647508100Z",
     "start_time": "2023-06-03T10:15:40.656713900Z"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "outputs": [
    {
     "data": {
      "text/plain": "     Company 1  Connection Weight  Company 2\n0           21                 36         58\n1           37                 59         86\n2           40                 26         83\n3           60                 21          2\n4           11                  2         10\n..         ...                ...        ...\n695         62                 45         11\n696         10                 64         27\n697         97                 24         55\n698         14                 51         66\n699         50                 93         82\n\n[693 rows x 3 columns]",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>Company 1</th>\n      <th>Connection Weight</th>\n      <th>Company 2</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>21</td>\n      <td>36</td>\n      <td>58</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>37</td>\n      <td>59</td>\n      <td>86</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>40</td>\n      <td>26</td>\n      <td>83</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>60</td>\n      <td>21</td>\n      <td>2</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>11</td>\n      <td>2</td>\n      <td>10</td>\n    </tr>\n    <tr>\n      <th>...</th>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n    </tr>\n    <tr>\n      <th>695</th>\n      <td>62</td>\n      <td>45</td>\n      <td>11</td>\n    </tr>\n    <tr>\n      <th>696</th>\n      <td>10</td>\n      <td>64</td>\n      <td>27</td>\n    </tr>\n    <tr>\n      <th>697</th>\n      <td>97</td>\n      <td>24</td>\n      <td>55</td>\n    </tr>\n    <tr>\n      <th>698</th>\n      <td>14</td>\n      <td>51</td>\n      <td>66</td>\n    </tr>\n    <tr>\n      <th>699</th>\n      <td>50</td>\n      <td>93</td>\n      <td>82</td>\n    </tr>\n  </tbody>\n</table>\n<p>693 rows × 3 columns</p>\n</div>"
     },
     "execution_count": 69,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "id1 = (\n",
    "    pd.Series(ids * number_of_contacts, name=\"Company 1\")\n",
    "    .sample(frac=0.7, random_state=42)\n",
    "    .reset_index(drop=True)\n",
    ")\n",
    "id2 = (\n",
    "    pd.Series(ids * number_of_contacts, name=\"Company 2\")\n",
    "    .sample(frac=0.7, random_state=43)\n",
    "    .reset_index(drop=True)\n",
    ")\n",
    "connections = (\n",
    "    pd.DataFrame(\n",
    "        [\n",
    "            id1,\n",
    "            pd.Series(\n",
    "                np.random.randint(0, 100, size=(max(len(id1), len(id2)))),\n",
    "                name=\"Connection Weight\",\n",
    "            ),\n",
    "            id2,\n",
    "        ]\n",
    "    )\n",
    "    .T.dropna()\n",
    "    .astype(int)\n",
    ")\n",
    "connections = connections.loc[(connections[\"Company 1\"] != connections[\"Company 2\"])]\n",
    "connections"
   ],
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2023-06-03T01:40:08.441882700Z",
     "start_time": "2023-06-03T01:40:08.406876900Z"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 73,
   "outputs": [
    {
     "data": {
      "text/plain": "           Company 2\nCompany 1           \n0                  6\n1                  6\n2                  5\n3                  9\n4                  7\n...              ...\n95                 7\n96                 8\n97                 7\n98                 6\n99                 8\n\n[100 rows x 1 columns]",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>Company 2</th>\n    </tr>\n    <tr>\n      <th>Company 1</th>\n      <th></th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>6</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>6</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>5</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>9</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>7</td>\n    </tr>\n    <tr>\n      <th>...</th>\n      <td>...</td>\n    </tr>\n    <tr>\n      <th>95</th>\n      <td>7</td>\n    </tr>\n    <tr>\n      <th>96</th>\n      <td>8</td>\n    </tr>\n    <tr>\n      <th>97</th>\n      <td>7</td>\n    </tr>\n    <tr>\n      <th>98</th>\n      <td>6</td>\n    </tr>\n    <tr>\n      <th>99</th>\n      <td>8</td>\n    </tr>\n  </tbody>\n</table>\n<p>100 rows × 1 columns</p>\n</div>"
     },
     "execution_count": 73,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "connections[[\"Company 1\", \"Company 2\"]].groupby(\"Company 1\").count()"
   ],
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2023-06-03T01:44:23.433333600Z",
     "start_time": "2023-06-03T01:44:23.424841700Z"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "outputs": [
    {
     "data": {
      "text/plain": "            Analysis-d0  Analysis-d1\ncompany_id                          \n0                     1            6\n1                     1            6\n2                     1            5\n3                     1            9\n4                     1            7\n...                 ...          ...\n95                    1            7\n96                    1            8\n97                    1            7\n98                    1            6\n99                    1            8\n\n[100 rows x 2 columns]",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>Analysis-d0</th>\n      <th>Analysis-d1</th>\n    </tr>\n    <tr>\n      <th>company_id</th>\n      <th></th>\n      <th></th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>1</td>\n      <td>6</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>1</td>\n      <td>6</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>1</td>\n      <td>5</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>1</td>\n      <td>9</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>1</td>\n      <td>7</td>\n    </tr>\n    <tr>\n      <th>...</th>\n      <td>...</td>\n      <td>...</td>\n    </tr>\n    <tr>\n      <th>95</th>\n      <td>1</td>\n      <td>7</td>\n    </tr>\n    <tr>\n      <th>96</th>\n      <td>1</td>\n      <td>8</td>\n    </tr>\n    <tr>\n      <th>97</th>\n      <td>1</td>\n      <td>7</td>\n    </tr>\n    <tr>\n      <th>98</th>\n      <td>1</td>\n      <td>6</td>\n    </tr>\n    <tr>\n      <th>99</th>\n      <td>1</td>\n      <td>8</td>\n    </tr>\n  </tbody>\n</table>\n<p>100 rows × 2 columns</p>\n</div>"
     },
     "execution_count": 72,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "companies[\"Analysis-d0\"] = 1\n",
    "companies[\"Analysis-d1\"] = connections[[\"Company 1\", \"Company 2\"]].groupby(\"Company 1\").count()\n",
    "connection_sum = connections.join(connections.set_index(\"Company 2\"), on=)\n",
    "companies[\"Analysis-d1\"] = connections[[\"Company 1\", \"Company 2\"]].groupby(\"Company 1\").count()\n",
    "# for tiers in range(5):\n",
    "companies"
   ],
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2023-06-03T01:43:25.341850700Z",
     "start_time": "2023-06-03T01:43:25.318015500Z"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "companies"
   ],
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "start_time": "2023-06-03T01:36:32.382091200Z"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [],
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "start_time": "2023-06-03T01:36:32.385093700Z"
    }
   }
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
 }
@@ -1,5 +1,13 @@
 # aki_prj23_transparenzregister
 [![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/charliermarsh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
 [![Actions status](https://github.com/astral-sh/ruff/workflows/CI/badge.svg)](https://github.com/astral-sh/ruff/actions)
 [![Pytest](https://github.com/fhswf/aki_prj23_transparenzregister/actions/workflows/test-action.yaml/badge.svg?branch=main)](https://github.com/fhswf/aki_prj23_transparenzregister/actions/workflows/test-action.yaml)
 [![Python-Lint-Push-Action](https://github.com/fhswf/aki_prj23_transparenzregister/actions/workflows/lint-actions.yaml/badge.svg)](https://github.com/fhswf/aki_prj23_transparenzregister/actions/workflows/lint-actions.yaml)
 [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
 ## Contributions
 See [CONTRIBUTING.md](CONTRIBUTING.md) for how code should be formatted and which rules we have set ourselves.
 [![bandit](https://github.com/fhswf/aki_prj23_transparenzregister/actions/workflows/bandit-action.yaml/badge.svg)](https://github.com/fhswf/aki_prj23_transparenzregister/actions/workflows/bandit-action.yaml)
@@ -0,0 +1,20 @@
 # Minimal makefile for Sphinx documentation
 #
 # You can set these variables from the command line, and also
 # from the environment for the first two.
 SPHINXOPTS    ?=
 SPHINXBUILD   ?= sphinx-build
 SOURCEDIR     = .
 BUILDDIR      = _build
 # Put it first so that "make" without argument is like "make help".
 help:
 	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
 .PHONY: help Makefile
 # Catch-all target: route all unknown targets to Sphinx using the new
 # "make mode" option.  $(O) is meant as a shortcut for $(SPHINXOPTS).
 %: Makefile
 	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
@@ -2,8 +2,8 @@
 Version 0.1 Erstellt am 07.04.2023
-|Autoren | Matrikelnummer |
+| Autoren            | Matrikelnummer |
-|----------|---------| 
+|--------------------|----------------|
 | Kim Mesewinkel     | 000            |
 | Tristan Nolde      | 000            |
 | Sebastian Zelenie  | 000            |
@@ -16,12 +16,12 @@ Version 0.1             Erstellt am  07.04.2023
 ## Historie der Dokumentenversion <a name="historie"></a>
-|Version | Datum | Autor | Änderungsgrund / Bemerkung |
+| Version   | Datum      | Autor         | Änderungsgrund / Bemerkung             |
-|----------|---------| ---------| ---------| 
+|-----------|------------|---------------|----------------------------------------|
-| 0.1 |  07.04.2023 | Tim Ronneburg | Intialaufsetzen des Pflichtenhefts |
+| 0.1       | 07.04.2023 | Tim Ronneburg | Initiales Aufsetzen des Pflichtenhefts |
-| 0.2 | 000 |
+| 0.2       | 000        |               |                                        |
-| ... | 000 |
+| ...       | 000        |               |                                        |
-| 1.0 | 000 |
+| 1.0       | 000        |               |                                        |
 ## Inhaltsverzeichnis <a name="inhaltsverzeichnis"></a>
 [Historie der Dokumentenversion](#historie)
@@ -78,7 +78,7 @@ Test
-## Funktionale Anforderungenn <a name="f_anforderung"></a>
+## Funktionale Anforderungen <a name="f_anforderung"></a>
 ### **Muss Ziele**
@@ -0,0 +1,88 @@
 """Python sphinx documentation build configuration."""
 # Configuration file for the Sphinx documentation builder.
 #
 # For the full list of built-in configuration values, see the documentation:
 # https://www.sphinx-doc.org/en/master/usage/configuration.html
 import os
 import sys
 from importlib.metadata import metadata
 from typing import Final
 # -- Project information -----------------------------------------------------
 # https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information
 _DISTRIBUTION_METADATA = metadata("aki-prj23-transparenzregister")
 __author__: Final[str] = _DISTRIBUTION_METADATA["Author"]
 __email__: Final[str] = _DISTRIBUTION_METADATA["Author-email"]
 __version__: Final[str] = _DISTRIBUTION_METADATA["Version"]
 project: Final[str] = "transparenzregister"
 copyright: Final[str] = "2023, AKI PRJ23"  # noqa: A001
 author: Final[str] = __author__
 version: Final[str] = __version__
 release: Final[str] = __version__
 sys.path.insert(0, os.path.abspath("../src"))  # Add the path to your Python package
 sys.path.insert(0, os.path.abspath("../src/aki_prj23_transparenzregister"))
 # -- General configuration ---------------------------------------------------
 # https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
 extensions: Final[list[str]] = [
    "sphinx.ext.autodoc",
    "nbsphinx",
    "myst_parser",
    "sphinx.ext.napoleon",
    "sphinx_copybutton",
    "sphinx_autodoc_typehints",
    "sphinx.ext.intersphinx",
    "sphinx.ext.autosectionlabel",
    "sphinx.ext.viewcode",
    "IPython.sphinxext.ipython_console_highlighting",
    "sphinxcontrib.mermaid",
 ]
 # templates_path : Final[list[str]] = ["_templates"]
 exclude_patterns: Final[list[str]] = ["_build", "Thumbs.db", ".DS_Store", "templates"]
 root_doc: Final[str] = "index"
 # master_doc = "index"
 autodoc_default_flags: Final[list[str]] = [
    "members",
    "inherited-members",
    "show-inheritance",
 ]
 autodoc_class_signature: Final[str] = "separated"
 autodoc_default_options: Final[dict[str, bool]] = {
    _: True for _ in autodoc_default_flags
 }
 autodoc_typehints: Final[str] = "signature"
 simplify_optional_unions: Final[bool] = True
 typehint_defaults: Final[str] = "comma"
 source_suffix: Final[list[str]] = [".rst", ".md"]
 mermaid_output_format: Final[str] = "raw"
 # -- Options for HTML output -------------------------------------------------
 # https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output
 html_theme: Final[str] = "sphinx_rtd_theme"
 html_static_path: Final[list[str]] = ["_static"]
 napoleon_google_docstring: Final[bool] = True
 napoleon_numpy_docstring: Final[bool] = False
 nbsphinx_execute = "never"
 intersphinx_mapping: Final[dict[str, tuple[str, None]]] = {
    "python": ("https://docs.python.org/3", None),
    "pandas": ("https://pandas.pydata.org/docs/", None),
    "numpy": ("https://numpy.org/doc/stable/", None),
    "matplotlib": ("https://matplotlib.org/stable/", None),
    "scikit-learn": ("https://scikit-learn.org/stable/", None),
    "sphinx": ("https://docs.sympy.org/latest/", None),
 }
@@ -0,0 +1,53 @@
 .. Transparenzregister documentation master file, created by Sphinx
 Transparenzregister Dokumentation
 =================================
 This is the documentation of the AKI project group on the German Transparenzregister and an analysis thereof.
 .. toctree::
   :maxdepth: 3
   :caption: Project planning
   Pflichtenheft
   timeline.md
 .. toctree::
   :glob:
   :maxdepth: 1
   :caption: Meeting Notes:
   meeting-notes/*
 .. toctree::
   :glob:
   :maxdepth: 3
   :caption: Research
   research/*
   research/*.ipynb
 .. toctree::
   :glob:
   :maxdepth: 0
   :caption: Seminararbeiten
   seminararbeiten/DevOps/Seminarpräsentation.ipynb
 .. toctree::
   :glob:
   :maxdepth: 0
   :caption: Modules
   modules
 .. automodule:: aki_prj23_transparenzregister
   :members:
   :undoc-members:
   :show-inheritance:
   :inherited-members:
   :member-order: bysource
 Indices and tables
 ==================
 * :ref:`genindex`
 * :ref:`modindex`
@@ -0,0 +1,35 @@
@ECHO OFF
 pushd %~dp0
 REM Command file for Sphinx documentation
 if "%SPHINXBUILD%" == "" (
 	set SPHINXBUILD=sphinx-build
 )
 set SOURCEDIR=.
 set BUILDDIR=_build
 %SPHINXBUILD% >NUL 2>NUL
 if errorlevel 9009 (
 	echo.
 	echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
 	echo.installed, then set the SPHINXBUILD environment variable to point
 	echo.to the full path of the 'sphinx-build' executable. Alternatively you
 	echo.may add the Sphinx directory to PATH.
 	echo.
 	echo.If you don't have Sphinx installed, grab it from
 	echo.https://www.sphinx-doc.org/
 	exit /b 1
 )
 if "%1" == "" goto help
 %SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
 goto end
 :help
 %SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
 :end
 popd
@@ -0,0 +1,698 @@
 {
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# DevOps for the Transparenzregister analysis"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Dependency management in Python\n",
    "### Tools\n",
    "\n",
    "- `The requirements.txt` lists all the dependencies of a project with version number and optionally with hashes and additional indexes and conditions for system specific differences.\n",
    "    - Changes are difficult because auf interdependency. \n",
    "    - Sync with requirements.txt is impossible via pip.\n",
    "    - All indirekt requirements need to be changed manually. \n",
    "    - Security and other routine upgrades for bugfixes are annoying and difficult to solve.\n",
    "    - Adding new requirements is complex."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    },
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "- `pip-tools` is the next level up.\n",
    "    - Generates `requirements.txt` from `requirements.ini`\n",
    "    - Allows for sync with ``requirements.txt`\n",
    "    - No solution to manage multiple combinations of requirements for multiple problems.\n",
    "        -  Applications or packages with dev and build tools\n",
    "        -  Applications or packages with test and lint tools\n",
    "        -  packages with additional typing packages\n",
    "        -  A combination there of\n",
    "- `pip-compile-multi` is an extension of `pip-tools` and allows for the generation of multiple requirements files.\n",
    "  - Only configured combinations of dependency groups are allowed.\n",
    "  - Different configurations may find different solutions."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    },
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "- `poetry` is the most advanced tool to solve python dependencies\n",
    "    - Comparable to Javas `maven`\n",
    "    - Finds a complete solution for all requirement groups and installed groups as defined\n",
    "    - Allows for upgradable packages in defined bounds.\n",
    "    - Exports a solution that can be used on multiple machines to guarantee the same environment\n",
    "    - Handling of Virtual environments\n",
    "    - Automatically includes requirements in metadata and other entries for wheel when building\n",
    "    - Build and publication management\n",
    "    - Complete packaging configuration in `pyproject.toml` as required in **PEP 621**\n",
    "    - Supports plugins"
   ]
  },
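  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "To illustrate version bounds: Poetry's caret constraint `^1.2.3` allows any version `>=1.2.3,<2.0.0`. The following is a minimal sketch of that rule as described in the Poetry documentation, not Poetry's own implementation; the helper `caret_range` is hypothetical and only handles plain numeric versions.\n",
    "\n",
    "```python\n",
    "def caret_range(version: str) -> tuple[str, str]:\n",
    "    \"\"\"Return the inclusive lower and exclusive upper bound of ``^version``.\"\"\"\n",
    "    parts = [int(part) for part in version.split(\".\")]\n",
    "    # Bump the left-most non-zero component; if every given component is zero,\n",
    "    # the last given one is bumped instead (e.g. ^0.0 -> <0.1.0).\n",
    "    bump = next((i for i, part in enumerate(parts) if part != 0), len(parts) - 1)\n",
    "    upper = parts[:bump] + [parts[bump] + 1]\n",
    "    # Pad both bounds to three components (major.minor.patch).\n",
    "    lower = parts + [0] * (3 - len(parts))\n",
    "    upper += [0] * (3 - len(upper))\n",
    "    return \".\".join(map(str, lower)), \".\".join(map(str, upper))\n",
    "\n",
    "\n",
    "print(caret_range(\"1.2.3\"))  # ('1.2.3', '2.0.0')\n",
    "print(caret_range(\"0.2.3\"))  # ('0.2.3', '0.3.0')\n",
    "```\n",
    "\n",
    "For versions below `1.0.0` the bound is therefore much tighter, because the left-most non-zero component is treated as the one that may contain breaking changes."
   ]
  },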
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Defined Poetry dependency groups in our project\n",
    "| Group   | Contents & Purpose                                                  |\n",
    "|:--------|:--------------------------------------------------------------------|\n",
    "| root    | The packages needed for the package itself                          |\n",
    "| develop | Packages needed to support the development such as `jupyter`        |\n",
    "| lint    | Packages needed for linting such as `mypy`, `pandas-stubs` & `ruff` |\n",
    "| test    | Packages needed for testing such as `pytest`                        |\n",
    "| doc     | Packages needed for the documentation build such as `sphinx`        |"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "### How to use poetry\n",
    "- `poetry new <project-name>` command creates a new poject with folder structure and everything.\n",
    "- `poetry init` adds a poetry configuration to an existing project.\n",
    "\n",
    "- `poetry install` If the project is already configured will install the dependencies.\n",
    "    - kwarg `--with dev` force it to install the dependencies to develop with. In our case that would be a jupyter setup.\n",
    "    - kwarg `--without lint,test` forces poetry to not install the dependencies for the groups lint and test. For our case that would include pytest, mypy and typing packages."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "- `poetry add pandas_typing<=2` would add pandas with a versions smaller than `2.0.0` as a dependency.\n",
    "    - kwarg `--group lint` would configure it as part of the dependency group typing.\n",
    "    - A package can be part of multiple groups.\n",
    "    - By default, it is part of the package requirements that are part of the requirements if a build wheel is installed.\n",
    "    - Only direct requirements are configured! Indirect requirements are solved.\n",
    "- `poetry update` updates the dependency solution and syncs the new dependencies.\n",
    "- Requirement files can be exported.\n",
    "\n",
    "The full documentation can be found [here](https://python-poetry.org/)!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Linter\n",
    "\n",
    "Python is an\n",
    "\n",
    "- interpreted\n",
    "- weak typed\n",
    "\n",
    "programing language.\n",
    "Without validation of types and other compile mechanisms that highlight some errors.\n",
    "\n",
    "Lint stands for *lint is not a software testing tool* and analyses the code for known patterns that can lead to different kinds of problems."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Why lint: Application perspective:\n",
    "- In Compiled programming languages many errors a thrown when a software is build. This is a first minimum quality gate for code.\n",
    "- Hard typing also enforces a certain explicit expectation on arguments are expected. This is a secondary quality gate for code python does not share.\n",
    "- This allows for a certain flexibility but allows for careless mistakes.\n",
    "- Helps to find inconsistencies\n",
    "- Helps to find security vulnerabilities"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Why lint: Human perspective:\n",
    "- Certainty that naming conventions are followed allows for an easier code understanding.\n",
    "- Auto whitespace formatting (Black) \n",
    "    - Absolut whitespace formatting allows for a clean differentials when versioning with git.\n",
    "    - The brain does not need to adapt on how somebody else formats his code\n",
    "    - No time wasted on beatification of code through whitespace\n",
    "- Classic linter\n",
    "    - Faster increas in abilities\n",
    "    - Nobody needs to read a long styleguide\n",
    "    - Reminds the programmer of style rules when they are violated\n",
    "    - Contributers from otside the project can contribute easier\n",
    "    - Code simplifications are pointed out\n",
    "    - Reduces the number of variances for the same functionality"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Collection of Recommended linter\n",
    "\n",
    "- Black for an automatic and absolut whitespace formatting. (No configuration options)\n",
    "- Ruff faster rust implementation of many commonly used linters.\n",
    "    - Reimplementation of the following tools:\n",
    "        - flake8 (Classic python linter, unused imports, pep8)\n",
    "        - isort Automatic import sorting (Vanilla python, third party, your package)\n",
    "        - bandit (Static code analysis for security problems)\n",
    "        - pylint (General static code analysis)\n",
    "        - many more\n",
    "    - Fixes many things that have `simple` fixes\n",
    "    - Relatively new\n",
    "    - Endorsed from project like pandas, FastAPI, Hugging Face, SciPy"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "- mypy\n",
    "    - Checks typing for python\n",
    "    - Commonly used linter for typing\n",
    "    - Often needs support of typing tools\n",
    "    - Sometimes additional typing information is needed from packages such as `pandas_stubs`."
   ]
  },
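  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "A minimal example of an error that mypy reports but the interpreter does not: type hints are not checked at runtime, so `describe` runs and returns a `float` although it is annotated to return a `str`. Running mypy on such a file (for example with `poetry run mypy <file>`) reports an incompatible return value type. The functions `average` and `describe` are hypothetical.\n",
    "\n",
    "```python\n",
    "def average(values: list[float]) -> float:\n",
    "    \"\"\"Return the arithmetic mean of ``values``.\"\"\"\n",
    "    return sum(values) / len(values)\n",
    "\n",
    "\n",
    "def describe(values: list[float]) -> str:\n",
    "    # Wrong on purpose: a float is returned where a str is annotated.\n",
    "    # The interpreter ignores the annotation, only mypy flags this line.\n",
    "    return average(values)\n",
    "\n",
    "\n",
    "result = describe([1.0, 2.0, 3.0])\n",
    "print(type(result).__name__, result)  # float 2.0\n",
    "```"
   ]
  },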
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "- pip-audit checks dependencies against vulnarability db\n",
    "- pip-license checks if a dependency has an allowed license"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Testing with Pytest\n",
    "\n",
    "Even tough python comes with its own testing framework a much more lightweight and more commonly used testing framework is `pytest`\n",
    "\n",
    "``tests/basic_test.py``\n",
    "\n",
    "```python\n",
    "from ... import add\n",
    "\n",
    "def test_addition():\n",
    "    assert add(4, 3) == 7, \"The addtion did not result into the correct result\"\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Parametizeed Test\n",
    "\n",
    "In addition, pytest contains the functionality to parameter its inputs\n",
    "\n",
    "``tests/parametriesed_test.py``\n",
    "\n",
    "```python\n",
    "import pytest\n",
    "\n",
    "from ... import add\n",
    "\n",
    "@pytest.mark.parametize(\"inputvalues,output_value\", [[(1,2,3), 6], [(21, 21), 42]])\n",
    "def test_addition(inputvalues: tuple[float, ...], output_value: [float]):\n",
    "    assert add(*inputvalues) == output_value\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Tests with setup and teardown\n",
    "\n",
    "Setting up an enviroment and cleaning it up afterwords is possible with `pytest`'s `fixture`\n",
    "\n",
    "``tests/setup_and_teardown_test.py`` \n",
    "\n",
    "```python\n",
    "import pytest\n",
    "\n",
    "from sqlalchemy.orm import Session\n",
    "\n",
    "@pytest.fixture()\n",
    "def create_test_sql() -> Generator[Session, None, None]:\n",
    "    # create_test_sql_table\n",
    "    # create sql connection\n",
    "    yield sql_session\n",
    "    # delete sql connection\n",
    "    # delete sql tables\n",
    "    \n",
    "def test_sql_table(create_test_sql) -> None:\n",
    "    assert sql_engine.query(HelloWorldTable).get(\"hello\") == \"world\"\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "Tests are run with the following command\n",
    "\n",
    "```bash\n",
    "poetry run pytest tests/\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Code Coverage\n",
    "\n",
    "Code coverage reports count how many times a line was executed and therfore tested.\n",
    "\n",
    "They can eiter be integrated into an IDE for higliting of missing code or reviewed directly.\n",
    "\n",
    "Either over third party software or by the html version that can be found with the build artifacts."
   ]
  },
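  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "A minimal sketch of how a line-coverage tool can record which lines ran, using the standard library's `sys.settrace`. Tools such as coverage.py are far more complete (for example branch coverage and HTML or XML reports); `executed_lines` and `sign` are hypothetical names for this illustration.\n",
    "\n",
    "```python\n",
    "import sys\n",
    "from collections.abc import Callable\n",
    "from types import FrameType\n",
    "from typing import Any\n",
    "\n",
    "\n",
    "def executed_lines(func: Callable[..., Any], *args: Any) -> set[int]:\n",
    "    \"\"\"Run ``func(*args)`` and return its executed lines relative to the ``def`` line.\"\"\"\n",
    "    code = func.__code__\n",
    "    lines: set[int] = set()\n",
    "\n",
    "    def tracer(frame: FrameType, event: str, arg: Any) -> Any:\n",
    "        if frame.f_code is not code:\n",
    "            return None  # ignore frames of other functions\n",
    "        if event == \"line\":\n",
    "            lines.add(frame.f_lineno - code.co_firstlineno)\n",
    "        return tracer  # keep receiving events for this frame\n",
    "\n",
    "    sys.settrace(tracer)\n",
    "    try:\n",
    "        func(*args)\n",
    "    finally:\n",
    "        sys.settrace(None)  # always remove the tracer again\n",
    "    return lines\n",
    "\n",
    "\n",
    "def sign(x: int) -> str:\n",
    "    if x >= 0:\n",
    "        return \"non-negative\"\n",
    "    return \"negative\"\n",
    "\n",
    "\n",
    "print(sorted(executed_lines(sign, 5)))  # the last line of sign is missing\n",
    "print(sorted(executed_lines(sign, -1)))  # the first return line is missing\n",
    "```\n",
    "\n",
    "A test suite that only calls `sign(5)` would therefore leave the `return \"negative\"` line uncovered, which is exactly what a coverage report highlights."
   ]
  },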
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## pre-commits\n",
    "\n",
    "Git is a filesystem based versioning application.\n",
    "That includes parts of its code are accessible and ment to be manipulated.\n",
    "At different times of the application a manipulate script can be executed.\n",
    "Typicle moments are on:\n",
    "- pull\n",
    "- push / push received\n",
    "- pre-commit / pre-merge / pre-rebase\n",
    "\n",
    "The `pre-commit` package hooks into the commit and implements a set of programms before committing\n",
    "Files can be **edited** or **validated**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "`pre-commit` execute fast tests on changed files to ensure quality of code.\n",
    "\n",
    "**Bohems Law**\n",
    "\n",
    "![Bohems Law](bohems-law.png)\n",
    "\n",
    "Since they are executed on commit on only the newly committed files a response is much faster.\n",
    "The normally only include linting and format validation tools no testing.\n",
    "Sometimes autofixer such as black, isort and ruff."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "![Pre-commit.PNG](Pre-commit.PNG)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "Configured pre-commit hooks:\n",
    "\n",
    "- format checker + pretty formatter (xml,json,ini,yaml,toml)\n",
    "- secret checker => No passwords or private keys\n",
    "- file naming convention checker for tests\n",
    "- syntax checker\n",
    "- ruff => Linter\n",
    "- black => Whitespace formatter\n",
    "- poetry checker\n",
    "- mypy => typing checker\n",
    "- md-toc => Adds a table oc contents to an *.md  where `<!--TOC-->` is placed"
   ]
  },
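  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "Under the hood, a validating hook is simply a program that Git runs before the commit; a non-zero exit code aborts the commit. The following is a minimal, hypothetical sketch of a secret checker in Python. It is not the implementation of the configured hook, and the two patterns are illustrative assumptions; dedicated secret scanners use many more rules.\n",
    "\n",
    "```python\n",
    "import re\n",
    "import subprocess\n",
    "\n",
    "# Illustrative patterns only.\n",
    "SECRET_PATTERNS = [\n",
    "    re.compile(r\"-----BEGIN [A-Z ]*PRIVATE KEY-----\"),\n",
    "    re.compile(r\"(password|secret)\\s*=\\s*\\S+\", re.IGNORECASE),\n",
    "]\n",
    "\n",
    "\n",
    "def find_secrets(text: str) -> list[int]:\n",
    "    \"\"\"Return the 1-based numbers of lines that look like they contain a secret.\"\"\"\n",
    "    return [\n",
    "        number\n",
    "        for number, line in enumerate(text.splitlines(), start=1)\n",
    "        if any(pattern.search(line) for pattern in SECRET_PATTERNS)\n",
    "    ]\n",
    "\n",
    "\n",
    "def staged_files() -> list[str]:\n",
    "    \"\"\"List the added, copied or modified files staged for the commit.\"\"\"\n",
    "    result = subprocess.run(\n",
    "        [\"git\", \"diff\", \"--cached\", \"--name-only\", \"--diff-filter=ACM\"],\n",
    "        capture_output=True,\n",
    "        text=True,\n",
    "        check=True,\n",
    "    )\n",
    "    return result.stdout.splitlines()\n",
    "\n",
    "\n",
    "def main() -> int:\n",
    "    found = False\n",
    "    for path in staged_files():\n",
    "        try:\n",
    "            # Simplification: reads the working-tree version, not the staged one.\n",
    "            with open(path, encoding=\"utf-8\") as file:\n",
    "                line_numbers = find_secrets(file.read())\n",
    "        except (OSError, UnicodeDecodeError):\n",
    "            continue  # skip unreadable or binary files\n",
    "        for number in line_numbers:\n",
    "            print(f\"{path}:{number}: possible secret\")\n",
    "            found = True\n",
    "    return 1 if found else 0  # a non-zero exit code aborts the commit\n",
    "```\n",
    "\n",
    "Saved as an executable `.git/hooks/pre-commit` with a `#!/usr/bin/env python3` shebang, the script would end with `raise SystemExit(main())`. The `pre-commit` package takes care of installing and running such hooks, so they do not have to be written by hand."
   ]
  },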
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "Pre commits are installed with the command\n",
    "```bash\n",
    "pre-commit install\n",
    "```\n",
    "The pre commits after that executed on each commit.\n",
    "\n",
    "If the pre-commits need to be skipped the -n option skips them on commit.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Documentation build with sphinx\n",
    "\n",
    "There is no single way to use to build a python documentation.\n",
    "Sphinx is a commonly used libarary.\n",
    "\n",
    "- Builds a package documentation from code\n",
    "- Native in rest\n",
    "- Capable of importing *.md, *.ipynb\n",
    "- Commonly used read the docs theme\n",
    "- Allows links to third party documentations via inter-sphinx (pandas, numpy, etc.)\n",
    "\n",
    "Currently implemented to build a documentation on pull_requests and the main branch.\n",
    "\n",
    "Automatically deployed from the main branch."
   ]
  },
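  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "Because `sphinx.ext.napoleon` is configured for Google-style docstrings and `sphinx_autodoc_typehints` takes the types from the annotations, a documented function only needs sections like the ones below. The function `total_weight` is a hypothetical example and not part of the package.\n",
    "\n",
    "```python\n",
    "def total_weight(connections: list[tuple[int, int, int]], company_id: int) -> int:\n",
    "    \"\"\"Sum the weights of all connections that start at a company.\n",
    "\n",
    "    Args:\n",
    "        connections: Tuples of ``(company_1, weight, company_2)``.\n",
    "        company_id: The company whose outgoing connections are summed.\n",
    "\n",
    "    Returns:\n",
    "        The summed connection weight.\n",
    "\n",
    "    Example:\n",
    "        >>> total_weight([(1, 5, 2), (1, 3, 4), (2, 7, 1)], 1)\n",
    "        8\n",
    "    \"\"\"\n",
    "    return sum(weight for source, weight, _ in connections if source == company_id)\n",
    "```\n",
    "\n",
    "Running `python -m doctest` on the module would additionally check the example in the docstring."
   ]
  },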
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## GitHub\n",
    "\n",
    "GitHub is a central hub for git repositories to be stored and manged.\n",
    "\n",
    "In addition, it hosts project management tools and devops tools for:\n",
    "- testing\n",
    "- linting\n",
    "- analysing\n",
    "- building\n",
    "- deploying code\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Example GitHub action workflow\n",
    "\n",
    "Workflows are defined in `.github/workflows/some-workflow.yaml`\n",
    "```yaml\n",
    "name: Build\n",
    "\n",
    "on: # when to run the action\n",
    "  pull_request:\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "A single action of a workflow.\n",
    "\n",
    "```yaml\n",
    "jobs:\n",
    "  build:\n",
    "    runs-on: ubuntu-latest # on what kind of runner to run an action\n",
    "    steps:\n",
    "    - uses: actions/setup-python@v4 # setup python\n",
    "      with:\n",
    "        python-version: 3.11\n",
    "    - uses: snok/install-poetry@v1 # setup poetry\n",
    "      with:\n",
    "        version: 1.4.2\n",
    "        virtualenvs-path: ~/local/share/virtualenvs\n",
    "    - uses: actions/checkout@v3\n",
    "    - run: |\n",
    "        poetry install --without develop,doc,lint,test\n",
    "        poetry build\n",
    "    - uses: actions/upload-artifact@v3\n",
    "      with:\n",
    "        name: builds\n",
    "        path: dist/\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Test and build pipeline with GitHub Actions\n",
    "On push and pull request:\n",
    "- Lint + license check + dependency security audit\n",
    "    - Problem summaries in GitHub Actions + problem notification via email\n",
    "- Test with pytest + coverage reports + coverage comment on pull request\n",
    "- Python build\n",
    "- Documentation build\n",
    "- Documentation deployment to GitHub Pages (on push to main)\n",
    "\n",
    "On tag:\n",
    "- Push: still open, since the Docker architecture and CD context are unclear"
   ]
  },
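  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "A minimal sketch of triggers matching this pipeline. The `v*` tag pattern and the `deploy-docs` job name are assumptions for illustration, not taken from the project's workflow files. A `push` filter that lists only `tags` would no longer trigger on branch pushes, so `branches` is listed as well.\n",
    "\n",
    "```yaml\n",
    "on:\n",
    "  push:\n",
    "    branches: [\"**\"] # every branch\n",
    "    tags: [\"v*\"]     # assumed release tag pattern\n",
    "  pull_request:\n",
    "\n",
    "jobs:\n",
    "  deploy-docs: # hypothetical job name\n",
    "    # deploy the documentation only for pushes to main\n",
    "    if: github.event_name == 'push' && github.ref == 'refs/heads/main'\n",
    "    runs-on: ubuntu-latest\n",
    "    steps:\n",
    "    - run: echo \"placeholder for the actual deployment steps\"\n",
    "```"
   ]
  },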
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Build artifacts\n",
    "\n",
    "- Dependencies / versions and licenses\n",
    "- Security report\n",
    "- Unit test reports and coverage report as `.coverage` / `coverage.xml` / HTML\n",
    "- Built wheel\n",
    "- Built documentation\n",
    "- probably one or more containers\n",
    "- if needed, documentation as PDF"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "![Action Snapshot](Action.PNG)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "![Action Summary](Action-Summary.PNG)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "![Lint error](Lint-error.PNG)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "![Coverage](Coverage.PNG)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "![Pull Request](Pull_request.PNG)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Dependabot\n",
    "\n",
    "Dependabot is a GitHub tool that updates dependencies when newer versions become available or when the versions in use have known security vulnerabilities.\n",
    "Dependabot is currently not Python-compatible.\n",
    "It allows passive maintenance of a project with little human oversight."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### GitHub runner configuration and what does not work\n",
    "\n",
    "Most GitHub Actions workflows for Python rely on the `actions/setup-python` action.\n",
    "This action is not available for Linux on ARM.\n",
    "Workarounds, such as a Python Docker container, a Python installation on the runner, or other tools, do not work well."
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Slideshow",
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.0"
  },
  "rise": {
   "scroll": true
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
 }
@@ -1,12 +1,8 @@
-```mermaid
+# Timeline
-%%{init: 
+```{mermaid}
-{
+
-  "theme": "neutral"
+    gantt
-}
+
 }%%
 gantt
 %%Timeline created 11-04-2023
 %%use Mermaid.js for visualization
        title Timeline PG Transparenzregister
        dateFormat  YYYY-MM-DD
        section Organisation
@@ -1,5 +1,107 @@
-[tool.isort]
+[build-system]
-profile = "black"
+build-backend = "poetry.core.masonry.api"
 requires = ["poetry-core"]
-[tool.pylint.format]
+[tool.mypy]
-max-line-length = "88"
+disallow_untyped_defs = true
 follow_imports = "silent"
 python_version = "3.11"
 warn_redundant_casts = true
 warn_unused_ignores = true
 [tool.black]
 target-version = ["py311"]
 [tool.coverage.run]
 branch = true
 dynamic_context = "test_function"
 relative_files = true
 source = ["src"]
 [tool.poetry]
 authors = ["AKI Projektgruppe 23"]
 description = "A project analysing the German Transparenzregister and other data sources to find shared business interests and personal and other links between companies."
 name = "aki-prj23-transparenzregister"
 packages = [{include = "aki_prj23_transparenzregister", from = "src"}]
 readme = "README.md"
 version = "0.1.0"
 [tool.poetry.dependencies]
 loguru = "^0.7.0"
 matplotlib = "^3.7.1"
 plotly = "^5.14.1"
 python = "^3.11"
 seaborn = "^0.12.2"
 tqdm = "^4.65.0"
 [tool.poetry.group.develop.dependencies]
 black = {extras = ["jupyter"], version = "^23.3.0"}
 jupyterlab = "^4.0.0"
 nbconvert = "^7.4.0"
 pre-commit = "^3.3.2"
 rise = "^5.7.1"
 [tool.poetry.group.doc.dependencies]
 jupyter = "^1.0.0"
 myst-parser = "^1.0.0"
 nbsphinx = "^0.9.2"
 sphinx = "^6.0.0"
 sphinx-copybutton = "^0.5.2"
 sphinx-rtd-theme = "^1.2.1"
 sphinx_autodoc_typehints = "*"
 sphinxcontrib-mermaid = "^0.9.2"
 sphinxcontrib-napoleon = "^0.7"
 [tool.poetry.group.lint.dependencies]
 black = "^23.3.0"
 mypy = "^1.3.0"
 pandas-stubs = "^2.0.1.230501"
 ruff = "^0.0.270"
 types-requests = "^2.31.0.1"
 [tool.poetry.group.test.dependencies]
 pytest = "^7.3.1"
 pytest-clarity = "^1.0.1"
 pytest-cov = "^4.1.0"
 pytest-mock = "^3.10.0"
 pytest-repeat = "^0.9.1"
 [tool.ruff]
 exclude = [
  ".bzr",
  ".direnv",
  ".eggs",
  ".git",
  ".git-rewrite",
  ".hg",
  ".mypy_cache",
  ".nox",
  ".pants.d",
  ".pytype",
  ".ruff_cache",
  ".svn",
  ".tox",
  ".venv",
  "__pypackages__",
  "_build",
  "buck-out",
  "build",
  "dist",
  "node_modules",
  "venv"
 ]
 # Never enforce `E501` (line length violations).
 ignore = ["E501"]
 line-length = 88
 # Enable pycodestyle (`E`), Pyflakes (`F`), flake8-bugbear (`B`), isort (`I`) and further rule sets.
 select = ["E", "F", "B", "I", "S", "RSE", "RET", "SLF", "SIM", "TID", "PD", "PL", "PLE", "PLR", "PLW", "NPY", "UP", "D", "N", "A", "C4", "T20", "PT"]
 src = ["src"]
 target-version = "py311"
 # Avoid trying to fix flake8-bugbear (`B`) violations.
 unfixable = ["B"]
 [tool.ruff.per-file-ignores]
 "tests/*.py" = ["S101"]
 [tool.ruff.pydocstyle]
 convention = "google"
@@ -0,0 +1,21 @@
 colorama==0.4.6 ; python_version >= "3.11" and python_version < "4.0" and sys_platform == "win32" or python_version >= "3.11" and python_version < "4.0" and platform_system == "Windows"
 contourpy==1.1.0 ; python_version >= "3.11" and python_version < "4.0"
 cycler==0.11.0 ; python_version >= "3.11" and python_version < "4.0"
 fonttools==4.40.0 ; python_version >= "3.11" and python_version < "4.0"
 kiwisolver==1.4.4 ; python_version >= "3.11" and python_version < "4.0"
 loguru==0.7.0 ; python_version >= "3.11" and python_version < "4.0"
 matplotlib==3.7.1 ; python_version >= "3.11" and python_version < "4.0"
 numpy==1.25.0 ; python_version >= "3.11" and python_version < "4.0"
 packaging==23.1 ; python_version >= "3.11" and python_version < "4.0"
 pandas==2.0.2 ; python_version >= "3.11" and python_version < "4.0"
 pillow==9.5.0 ; python_version >= "3.11" and python_version < "4.0"
 plotly==5.15.0 ; python_version >= "3.11" and python_version < "4.0"
 pyparsing==3.0.9 ; python_version >= "3.11" and python_version < "4.0"
 python-dateutil==2.8.2 ; python_version >= "3.11" and python_version < "4.0"
 pytz==2023.3 ; python_version >= "3.11" and python_version < "4.0"
 seaborn==0.12.2 ; python_version >= "3.11" and python_version < "4.0"
 six==1.16.0 ; python_version >= "3.11" and python_version < "4.0"
 tenacity==8.2.2 ; python_version >= "3.11" and python_version < "4.0"
 tqdm==4.65.0 ; python_version >= "3.11" and python_version < "4.0"
 tzdata==2023.3 ; python_version >= "3.11" and python_version < "4.0"
 win32-setctime==1.1.0 ; python_version >= "3.11" and python_version < "4.0" and sys_platform == "win32"
@@ -1 +0,0 @@
 .env
@@ -1,36 +0,0 @@
 # base
 FROM ubuntu:latest
 # set the github runner version
 ARG RUNNER_VERSION="2.304.0"
 ARG ARCHITECTURE=arm
 # update the base packages and add a non-sudo user
 RUN apt-get update -y && apt-get upgrade -y && useradd -m docker
 # install python and the packages your code depends on, along with jq to parse JSON
 # add additional packages as necessary
 RUN DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
    curl jq build-essential libssl-dev libffi-dev python3 python3-venv python3-dev python3-pip nano vim
 # cd into the user directory, download and unzip the github actions runner
 RUN cd /home/docker && mkdir actions-runner && cd actions-runner \
    && curl -O -L https://github.com/actions/runner/releases/download/v${RUNNER_VERSION}/actions-runner-linux-${ARCHITECTURE}-${RUNNER_VERSION}.tar.gz \
    && tar xzf ./actions-runner-linux-${ARCHITECTURE}-${RUNNER_VERSION}.tar.gz \
    && rm *.tar.gz
 # install some additional dependencies
 RUN chown -R docker ~docker && /home/docker/actions-runner/bin/installdependencies.sh
 # copy over the start.sh script
 COPY start.sh .
 # make the script executable
 RUN chmod +x start.sh
 # since the config and run script for actions are not allowed to be run by root,
 # set the user to "docker" so all subsequent commands are run as the docker user
 USER docker
 # set the entrypoint to the start.sh script
 CMD ["./start.sh"]
@@ -1,24 +0,0 @@
 # Runner
 This subfolder of the project provides a GitHub Actions runner that is started with Docker Compose.
 GitHub Actions is the built-in CI/CD functionality of GitHub. Runners are the instances that execute the jobs of the CI/CD workflows.
 GitHub offers GitHub-hosted and self-hosted runners. Since we will not get access to the FH-SWF cloud resources, we will use a self-hosted runner.
 ## Configure & Startup
 To start a GitHub runner, add a `.env` config file with key-value pairs for the following variables:
 ```
 ORGANIZATION=fhswf/aki_prj23_transparenzregister
 ACCESS_TOKEN=                                    #
 ARCHITECTURE=arm                                 # Value such as x64, arm64, arm
 HOSTNAME=some-runner-name                        # Hostname of the Docker container. Will also be used as the runner name.
 DGID=                                            # Group ID of the docker group
 ```
 An access token can be obtained in the Actions section of the repository settings or [here](https://github.com/fhswf/aki_prj23_transparenzregister/settings/actions/runners/new).
 To startup the runner execute `docker compose up -d --build`.
 Docker needs to be installed. Additionally, you may need to execute the Docker commands as root.
 ## Sources
 This runner configuration is based on [this article](https://testdriven.io/blog/github-actions-docker/).
@@ -1,14 +0,0 @@
 version: '3'
 services:
  runner:
    build:
      context: .
      dockerfile: Dockerfile
      args:
        - ARCHITECTURE=${ARCHITECTURE}
    environment:
      - ORGANIZATION=${ORGANIZATION}
      - ACCESS_TOKEN=${ACCESS_TOKEN}
       - TZ=Europe/Berlin
    hostname: ${HOSTNAME}
    restart: unless-stopped
@@ -1,20 +0,0 @@
 #!/bin/bash
 ORGANIZATION=$ORGANIZATION
 ACCESS_TOKEN=$ACCESS_TOKEN
 REG_TOKEN=$(curl -sX POST -H "Authorization: token ${ACCESS_TOKEN}" https://api.github.com/orgs/${ORGANIZATION}/actions/runners/registration-token | jq .token --raw-output)
 cd /home/docker/actions-runner
 ./config.sh --url https://github.com/${ORGANIZATION} --token ${ACCESS_TOKEN}
 cleanup() {
    echo "Removing runner..."
    ./config.sh remove --unattended --token ${ACCESS_TOKEN}
 }
 trap 'cleanup; exit 130' INT
 trap 'cleanup; exit 143' TERM
 ./run.sh & wait $!
@@ -0,0 +1,12 @@
 """A project analysing the German Transparenzregister and other data sources.
 
 It aims to find shared business interests and shared personal and other links between companies.
 """
 from importlib.metadata import metadata
 from typing import Final
 _DISTRIBUTION_METADATA = metadata("aki-prj23-transparenzregister")
 __author__: Final[str] = _DISTRIBUTION_METADATA["Author"]
 __email__: Final[str] = _DISTRIBUTION_METADATA["Author-email"]
 __version__: Final[str] = _DISTRIBUTION_METADATA["Version"]
@@ -0,0 +1,8 @@
 """Tests for the package metadata."""
 import aki_prj23_transparenzregister
 def test_version() -> None:
    """Tests that the package version is set and is a string."""
    assert aki_prj23_transparenzregister.__version__
    assert isinstance(aki_prj23_transparenzregister.__version__, str)