60 Commits

Author SHA1 Message Date
Sascha Zhu
3c37d1aa65 Aspect-based Sentiment Analysis, Jupyter file (not fully commented yet) 2024-01-06 12:49:04 +01:00
Sebastian
35016ba5f3
Improved model for sentiment pipeline (#434)
Replacement of the sentiment model.
2023-11-27 19:32:49 +01:00
TrisNol
f8a0d58314 feat(data-extraction): Provide KPI table analysis in bundesanzeiger wrapper 2023-11-11 11:01:17 +01:00
TrisNol
815e08a8f1 checkpoint: Transform values to € and normalize column names 2023-11-11 11:01:17 +01:00
TrisNol
ec11ae13aa checkpoint: Parse table into dict of financial data 2023-11-11 11:01:17 +01:00
TrisNol
972fcd155e checkpoint: Normalize HTML tables fetched from Bundesanzeiger 2023-11-11 11:01:17 +01:00
Tim
e5769b3c25 Added Tests
Co-authored-by: Tristan Nolde <TrisNol@users.noreply.github.com>
2023-11-10 18:56:51 +01:00
Tim
410b690873 Added test 2023-11-10 18:56:51 +01:00
55ebb4c17d
Typo fixes (#249) 2023-10-21 10:58:54 +02:00
Sebastian
c680ac9759
Feature/ner (#103)
NER and sentiment pipeline with services for data extraction.

---------

Co-authored-by: Philipp Horstenkamp <philipp@horstenkamp.de>
Co-authored-by: TrisNol <tristan.nolde@yahoo.de>
2023-10-16 19:54:24 +02:00
41f2c9f995
Executing black over all jupyter notebooks (#190)
Reverting black for the jupyter notebooks gets old. Can we just run
black over all of them?
2023-10-04 20:03:47 +02:00
TrisNol
febcd59e39 test(data-extraction): Include first unit tests 2023-09-17 19:20:28 +02:00
TrisNol
bfe50ac76d checkpoint(data-ingestion): Move Unternehmensregister code to .py 2023-09-15 17:22:54 +02:00
TrisNol
8be192e1de checkpoint(data-ingestion): Include type in company relations, fix issue in capital for KGs 2023-09-15 15:39:42 +02:00
TrisNol
0c7216e105 checkpoint 2023-09-14 18:17:02 +02:00
TrisNol
413b43c615 checkpoint(data-ingestion): Unify date format in data 2023-09-14 16:47:11 +02:00
TrisNol
cf92cb61cc checkpoint(data-ingestion): Extract founding_date and other stats 2023-09-12 19:07:23 +02:00
TrisNol
1e15656028 refactor: Pull Auditor extraction into Bundesanzeiger utils 2023-08-18 16:21:52 +02:00
TrisNol
eb0962e1be refactor: Move Auditor dataclass to models 2023-08-18 14:20:34 +02:00
TrisNol
309755383e Install deutschland package 2023-08-18 14:15:05 +02:00
TrisNol
1ca1985f57 Merge branch 'main' into feature/refactor-utils 2023-08-11 15:40:33 +02:00
TrisNol
d565770b99 checkpoint(db): Refactor mongo utils, extract postgres entities from Jupyter 2023-08-11 15:12:18 +02:00
50bf7811ef
Various devops updates (#43)
* Pipeline rework to limit mypy and black to src and tests
* Poetry update
* pre-commit update
2023-08-10 19:20:12 +02:00
Sebastian
f7e845cb16
How to update data in StagingDB on K8 Cluster 2023-08-02 20:30:13 +02:00
Sebastian
d853f52850
How to connect to MongoDB / StagingDB 2023-07-31 20:04:37 +02:00
Sebastian
aac37ffdab
Create configuration.py 2023-07-31 20:03:42 +02:00
Sebastian
ccce24d85e
Delete MongoDB 2023-07-31 20:01:55 +02:00
Sebastian
1c891a5b58
Create directory 'MongoDB' for experiments 2023-07-31 19:56:59 +02:00
TrisNol
ed681d7c47 refactor: Implement linter feedback 2023-07-11 14:20:16 +02:00
TrisNol
4c95550dbf feat(data-extraction): MongoWrapper, DataClasses and services for News and Company data 2023-07-10 18:58:31 +02:00
TrisNol
4c65d37816 Merge main into feature/data-extraction 2023-07-10 17:15:43 +02:00
TrisNol
e44385ce3a style: Refactoring imports, adapting MongoConnector to different connection_strings 2023-06-30 20:36:03 +02:00
TrisNol
3cd8860312 adding district court location to export 2023-06-27 19:49:23 +02:00
TrisNol
421b1e8c87 Bundesanzeiger preparation, Handelsblatt RSS feed export 2023-06-27 19:17:54 +02:00
TrisNol
37fb1b1da3 multi-process scraping, transforming unternehmensregister output 2023-06-25 15:58:53 +02:00
a9304201af
(chore): Initialised devops tools (#29)
* Added a first action

* Repaired a typo

* Repaired a typo2

* Added flake8 action

* Repaired a typo in the flake8 action.

* Added a first bandit action

* Added a first batch

* Added the flake8-prebuild as a need to flake8

* Added the docker socket to the volume.

* Added the flake8-prebuild as a need to flake8

* Removed latest part from container.

* Reworked flake8

* Reworked flake8 poetry

* Changed to 64bit

* Some edits to the runner

* Added python setup

* Added python -m to python docker image.

* Added ra run linter

* Removed redundant version

* Added isort

* Added poetry install

* Added flake8 as lint.

* Uses nodejs and python image

* Added flake8 as lint.

* Removed selfhosted runner

* Removed self hosted runner

* Added black and flake8 tests

* Removed self hosted runner

* Removed unneeded actions

* Added a mypy error.

* Removed poetry call before poetry setup

* Added a test to understand the poetry action better

* Added the snook poetry builder

* Reworked the repo a bit

* Removed unneeded poetry installation

* Added the isort action

* Added isort test

* Added ruff

* Added full ruff configuration

* Added full ruff configuration2

* Removed duplicate configurations

* Removed some redundant pre-commit hooks

* Removed unneeded actions.

* Repaired ruff

* Added tests.

* Removed

* Removed a missing file

* Added reports as artifacts

* Removed the unneeded poetry test

* Added a license checker.

* Removed some unneeded configuration.

* Removed the import reformatted.

* Added doc generation.

* Added license summary.

* Add

* Add lint

* Switched pip-licenses to poetry.

* Remove some more packages.

* Added a make file

* Added version codes to the main package

* Changed the format of the md files

* Presentation first draft

* Version up and added extensions

* Removed the venv path from docbuild

* Actions version up

* Experiments with sphinx

* First draft of the sphinx documentation.

* Added the protocol to the time series.

* First draft of a first build pipeline

* Added mermaid version support

* Added documentations pull and branch request requirements.

* Tests should now be passing

* Add safety

* Added the action on pull_request_target

* Added a pytest coverage report

* Added a build step

* Changed the lint action to work only on python changes.

* Added the ability to compile a html report

* Coverage

* Finished test and build workflow

* Repaired a bug.

* Added a github branch.ref

* Removed a poetry install

* Docbuild now excludes templates

* Added the seminar presentation to the documentation build

* Added a few images

* Changed the pre-commit image

* Presentation done

* Never executing jupyter for sphinx
2023-06-23 18:47:04 +02:00
TrisNol
c9c7b0cf7a code cleanup, presentation on data extraction 2023-06-19 18:02:34 +02:00
TrisNol
6e31bc62bd mongodb wrapper for managing News objects 2023-06-16 18:50:19 +02:00
TrisNol
5b96bb7e3e adding company ID as well as compatible dataclasses 2023-06-16 18:00:11 +02:00
TrisNol
d3d8adabad dockerized mongodb as staging DB 2023-06-15 20:24:39 +02:00
TrisNol
3e737fbac5 first news article data extraction from tagesschau api 2023-06-15 18:04:23 +02:00
TrisNol
058c16b3ff Bulk process Unternehmensregister .xmls 2023-06-11 13:11:44 +02:00
TrisNol
1010b43a5f Extract first stakeholder information from Unternehmensregister export 2023-06-09 14:23:56 +02:00
TrisNol
e2ad2d475a Traverse all pages 2023-06-09 13:51:36 +02:00
TrisNol
d69368318f Download Unternehmensregister export via Selenium 2023-06-09 13:01:46 +02:00
SeZett
ba46532e0a Change on timeseries Notebook: filepath 2023-05-11 15:56:14 +02:00
RonnyFlex
6d509ee6ed News research + abstract for the Verflechtungsanalyse (interlinkage analysis) V1 2023-05-08 21:01:33 +02:00
SeZett
783891557b added ideas to DB scheme 2023-05-04 11:20:37 +02:00
KM-R
375d96b0ed
Example for sentiment analysis using VADER
To compare another sentiment analysis library with FinBERT, the same sample texts were used.
2023-05-03 00:06:09 +02:00
f1e1a05fe8
Removed empty cell. 2023-05-01 13:21:58 +02:00