94 Commits

Author SHA1 Message Date
27a95c1c23 Replaced the bind with the connection method 2024-01-15 20:24:38 +01:00
9ea3771f18 A lot of spelling (#512) 2024-01-04 18:01:59 +01:00
693532086d Update styling and language of home page (#484) 2023-12-28 17:15:10 +01:00
ce1598c42e Removed double execution of layouting in 2D and 3D. (#385)
Prior to layouting, the spring layout was always calculated and later
overwritten (double execution).
2023-11-16 17:25:15 +01:00
d0677287b6 Added a filter for financial reports. (#372)
Financial reports are now filtered before being added to the SQL
database so that only known keys are added.
Some matching is also done.
The most important missing reports are printed to be implemented later
on.
Rapidfuzz could be used.
2023-11-13 18:52:12 +01:00
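The filtering described in d0677287b6 could look roughly like the sketch below. It is an illustration only, not the project's code: `KNOWN_KEYS`, `filter_report` and the 0.8 cutoff are hypothetical, and the standard-library `difflib` is swapped in for Rapidfuzz, which the commit only mentions as an option.

```python
from difflib import get_close_matches

# Hypothetical set of known keys; the real list is not part of this log.
KNOWN_KEYS = {"revenue", "equity", "net_income"}


def filter_report(report: dict, cutoff: float = 0.8) -> tuple:
    """Keep known keys, map near misses to a known key, collect the rest."""
    kept = {}
    missing = []
    for key, value in report.items():
        normalized = key.strip().lower()
        if normalized in KNOWN_KEYS:
            kept[normalized] = value
            continue
        # Fuzzy match against the known keys; at most one candidate is returned.
        match = get_close_matches(normalized, sorted(KNOWN_KEYS), n=1, cutoff=cutoff)
        if match:
            kept[match[0]] = value
        else:
            # Unknown keys are reported so they can be implemented later on.
            missing.append(key)
    return kept, missing


kept, missing = filter_report({"Revenue": 1.0, "equty": 2.0, "foo": 3.0})
print(kept, missing)
```

Returning the unmatched keys separately mirrors the commit's note that missing reports are printed rather than silently dropped.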
af8a907cf9 Stop table reset for better persistent tables. (#373) 2023-11-12 14:27:44 +01:00
ac6ca3547b test: Add unit test for news api wrapper 2023-11-11 14:30:00 +01:00
066800123d Created pipeline to run NER, sentiment and SQL ingest (#314)
Created a data processing pipeline that enhances the raw mined data with
organisation extractions and sentiment analysis prior to moving the data
to the SQL db.
The transfer of matched data is done afterward.

---------

Co-authored-by: SeZett <zeleny.sebastian@fh-swf.de>
2023-11-11 13:28:12 +00:00
a6d486209a Introduce extended_financial_data code (#357)
Introducing the previously developed method to fetch the financial data
via table parsing (aka "data lake like solution") in a non-destructive
manner by defaulting to the current RegEx-based behaviour.
2023-11-11 14:10:20 +01:00
e5b61bc19c Added multi relation dropdowns to dashboard (#363)
This change allows a more complete set of relation combinations to be
filtered.
2023-11-11 13:47:46 +01:00
9edf5b1dce test: Increase coverage for multi-column headers 2023-11-11 11:03:36 +01:00
fecf42d75a test: Unit test new KPI extraction 2023-11-11 11:01:17 +01:00
Tim
e5769b3c25 Added Tests
Co-authored-by: Tristan Nolde <TrisNol@users.noreply.github.com>
2023-11-10 18:56:51 +01:00
Tim
410b690873 Added test 2023-11-10 18:56:51 +01:00
Tim
41af7e2d18 Added test behaviour 2023-11-10 18:56:51 +01:00
Tim
f38728450d now ruff conform 2023-11-10 18:53:47 +01:00
Tim
f2ac0eda91 Added Relation_count method 2023-11-10 18:53:47 +01:00
Tim
30f9e4506f solved errors 2023-11-10 18:50:38 +01:00
Tim
7e8adfafd5 Test Version 2023-11-10 18:50:11 +01:00
f7ec3eaf24 test: Increase test coverage and refactor v3 2023-11-05 12:55:47 +01:00
e8d1a37cff test: Extend unit tests 2023-11-04 14:19:41 +01:00
61f94fa3b9 test: Unit tests 2023-11-04 11:24:36 +01:00
d6b07431e7 test: Adapt existing unit tests to refactored imports 2023-11-04 11:24:36 +01:00
ad36c68993 Moved the AI tests into the AI folder. (#315) 2023-11-03 13:45:24 +01:00
8d9981d967 Moved AI files in the AI module. (#308) 2023-11-02 20:30:04 +01:00
7953ba9291 Mixed typo fixes (#270) 2023-10-26 19:06:45 +02:00
1eb972b7ff Adds the transfer of sentiments into the sql db (#253)
Transfers the sentiments from the MongoDB into the SQL db.
2023-10-24 17:50:40 +02:00
36a0bab6ff Add relations from financial reports to SQL (#216) 2023-10-19 19:21:33 +02:00
83d313150c test: Update to new functions 2023-10-17 18:47:25 +02:00
600039207d test(data-extraction): Adapt unit tests to new behaviour 2023-10-17 18:16:44 +02:00
c680ac9759 Feature/ner (#103)
NER and sentiment pipeline with services for data extraction.

---------

Co-authored-by: Philipp Horstenkamp <philipp@horstenkamp.de>
Co-authored-by: TrisNol <tristan.nolde@yahoo.de>
2023-10-16 19:54:24 +02:00
f1474feaf8 refactor: Adapt to extended unit tests 2023-10-15 13:21:41 +02:00
fd47487367 Update tests/utils/data_extraction/unternehmensregister/transform_test.py
Co-authored-by: Philipp Horstenkamp <philipp@horstenkamp.de>
2023-10-15 13:07:34 +02:00
8db04177be feat(data-extraction): Extract c/o relation from street in company relation 2023-10-15 13:06:32 +02:00
eba5235dff refactor: Implement PR feedback 2023-10-15 12:05:25 +02:00
39c13ac74a Update tests/utils/data_extraction/unternehmensregister/transform_test.py
Co-authored-by: Philipp Horstenkamp <philipp@horstenkamp.de>
2023-10-15 11:51:11 +02:00
b972acee7a fix(data-extraction): Parse date from Gesellschaftsvertrag entry 2023-10-14 18:22:41 +02:00
6365e252b9 Added location to person (#185) 2023-10-14 15:27:19 +00:00
f8c111d7e2 Resolve mismatch between staging and prod db data for financials (#211)
SQL creation is now done dynamically by the definition of the enumeration
type.
2023-10-14 17:16:14 +02:00
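One way to read f8c111d7e2 is that the table columns are derived from an enumeration, so staging and prod cannot drift apart. The sketch below shows that idea under stated assumptions: `FinancialKPI`, its members, `create_table_sql` and the table name are hypothetical and not taken from the repository.

```python
from enum import Enum


class FinancialKPI(Enum):
    # Hypothetical members; the project's real enumeration is not shown here.
    REVENUE = "revenue"
    EQUITY = "equity"


def create_table_sql(table: str, kpis) -> str:
    """Build a CREATE TABLE statement with one REAL column per enum member."""
    columns = ", ".join(f"{kpi.value} REAL" for kpi in kpis)
    return f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, {columns})"


print(create_table_sql("annual_financial_statement", FinancialKPI))
```

Adding a member to the enumeration then adds the column everywhere the statement is generated, which is the point of deriving the schema from a single definition.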
84d0139531 fix(data-extraction): Handle malformed date_of_birth fields 2023-10-07 17:01:34 +02:00
7500895982 fix: Add script to fix malformed yearly_result entries (#202) 2023-10-07 12:35:29 +02:00
9cc58ba8be fix: Add script to fix malformed yearly_result entries 2023-10-07 09:11:43 +02:00
63325e7faa Add constraints to the SQL entities (#186) 2023-10-06 18:48:58 +02:00
b1ca268a62 SQL fixes after new mongo ingest (#199) 2023-10-06 18:22:19 +02:00
09c36960e3 Add a list of missing relation partners to be searched (#171)
- [x] Add a new table
- [x] Add a field to the table that can register if the company was
already queried
- [x] Add a field to the table that counts how many times a relation
partner was missing
- [x] Add a function that resets the counter

Also:
- Reworked the get_company function to use the location dict as kwargs
2023-10-05 19:57:30 +02:00
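The checklist in 09c36960e3 (a table, an "already queried" flag, a miss counter and a reset function) can be sketched with the standard-library `sqlite3` module. All table, column and function names below are hypothetical; the project's actual schema and ORM layer are not visible in this log.

```python
import sqlite3


def create_table(conn: sqlite3.Connection) -> None:
    # One row per missing relation partner (hypothetical schema).
    conn.execute(
        "CREATE TABLE IF NOT EXISTS missing_company ("
        " name TEXT PRIMARY KEY,"
        " times_missing INTEGER NOT NULL DEFAULT 0,"
        " searched_for INTEGER NOT NULL DEFAULT 0)"
    )


def register_missing(conn: sqlite3.Connection, name: str) -> None:
    """Increment the miss counter, creating the row on first sight."""
    cur = conn.execute(
        "UPDATE missing_company SET times_missing = times_missing + 1 WHERE name = ?",
        (name,),
    )
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO missing_company (name, times_missing) VALUES (?, 1)",
            (name,),
        )


def reset_counters(conn: sqlite3.Connection) -> None:
    """Set every miss counter back to zero."""
    conn.execute("UPDATE missing_company SET times_missing = 0")


conn = sqlite3.connect(":memory:")
create_table(conn)
register_missing(conn, "Acme GmbH")
register_missing(conn, "Acme GmbH")
print(conn.execute("SELECT * FROM missing_company").fetchall())
```

Keeping the counter next to the `searched_for` flag would let a later job pick the most frequently missing, not yet queried partners first.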
c6f2c7467c Rework the transfer of company data to fit the new data in the mongodb (#188)
This adds the additional company data as proposed to the sql db.

- [x] @TrisNol Is everything included or did I miss a feature? Relations
are in another issue.
- [x] @KM-R New DB features for the Dashboard for your review.
2023-10-05 19:47:46 +02:00
259259953e refactor: Move quote removal function to string utils, adapt to requirements 2023-10-03 16:37:54 +02:00
2a446a9937 checkpoint: Remove quotes from company names in relations 2023-10-03 14:33:46 +02:00
49498ad7c0 checkpoint: Remove quotes from company name 2023-10-03 14:33:45 +02:00
7e9cff046a fix(data-extraction): Parse house-number from street field if possibl… (#179) 2023-10-03 14:26:21 +02:00
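As a rough illustration of what 7e9cff046a describes, a house number can be split off a street field with a regular expression. The pattern and the `split_street` helper below are assumptions made for this sketch, not the project's implementation, and they only cover a trailing number with an optional letter suffix.

```python
import re
from typing import Optional, Tuple

# Street text ending in a non-digit, whitespace, then digits with an optional letter.
_HOUSE_NUMBER = re.compile(r"^(?P<street>.*?\D)\s+(?P<number>\d+\s?[a-zA-Z]?)$")


def split_street(street: str) -> Tuple[str, Optional[str]]:
    """Return (street, house_number); house_number is None if none is found."""
    cleaned = street.strip()
    match = _HOUSE_NUMBER.match(cleaned)
    if match is None:
        # Leave the field untouched when no trailing number is present.
        return cleaned, None
    return match.group("street").strip(), match.group("number")


print(split_street("Musterstraße 12a"))
print(split_street("Hauptstr."))
```

Falling back to the unchanged street when nothing matches keeps the parsing non-destructive for entries without a house number.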