Philipp Horstenkamp 066800123d
Created pipeline to run ner sentiment and sql ingest (#314)
Created a dataprocessing pipline that enhances the raw mined data with
Organsiation extractions and sentiment analysis prio to moving the data
to the sql db.
The transfer of matched data is done afterword.

---------

Co-authored-by: SeZett <zeleny.sebastian@fh-swf.de>
2023-11-11 13:28:12 +00:00

81 lines
3.1 KiB
Markdown

# aki_prj23_transparenzregister
[![python](https://img.shields.io/badge/Python-3.11-3776AB.svg?style=flat&logo=python&logoColor=white)](https://www.python.org)
[![Actions status](https://github.com/astral-sh/ruff/workflows/CI/badge.svg)](https://github.com/astral-sh/ruff/actions)
[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/charliermarsh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white)](https://github.com/pre-commit/pre-commit)
[![Checked with mypy](http://www.mypy-lang.org/static/mypy_badge.svg)](http://mypy-lang.org/)
[![Documentation Status](https://readthedocs.org/projects/mypy/badge/?version=stable)](https://mypy.readthedocs.io/en/stable/?badge=stable)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
## Contributions
See the [CONTRIBUTING.md](CONTRIBUTING.md) about how code should be formatted and what kind of rules we set ourselves.
## Available entrypoints
The project has currently the following entrypoint available:
- **data-transformation** > Transfers all the data from the mongodb into the sql db to make it available as production data.
- **data-processing** > Processes the data using NLP methods and transfers matched data into the SQL table ready for use.
- **reset-sql** > Resets all sql tables in the connected db.
- **copy-sql** > Copys the content of a db to another db.
- **webserver** > Starts the webserver showing the analysis results.
## DB Connection settings
To connect to the SQL db see [sql/connector.py](./src/aki_prj23_transparenzregister/utils/sql/connector.py)
To connect to the Mongo db see [connect]
Create a `secrets.json` in the root of this repo with the following structure (values to be replaces by desired config):
The sqlite db is alternative to the postgres section.
```json
{
"sqlite": "path-to-sqlite.db",
"postgres": {
"username": "username",
"password": "password",
"host": "localhost",
"database": "db-name",
"port": 5432
},
"mongo": {
"username": "username",
"password": "password",
"host": "localhost",
"database": "transparenzregister",
"port": 27017
}
}
```
Alternatively, the secrets can be provided as environment variables. One option to do so is to add a `.env` file with
the following layout:
```
PYTHON_POSTGRES_USERNAME=postgres
PYTHON_POSTGRES_PASSWORD=postgres
PYTHON_POSTGRES_HOST=localhost
PYTHON_POSTGRES_DATABASE=postgres
PYTHON_POSTGRES_PORT=5432
PYTHON_MONGO_USERNAME=username
PYTHON_MONGO_HOST=localhost
PYTHON_MONGO_PASSWORD=password
PYTHON_MONGO_PORT=27017
PYTHON_MONGO_DATABASE=transparenzregister
PYTHON_SQLITE_PATH=PathToSQLite3.db # An overwrite path to an sqllite db
PYTHON_DASH_LOGIN_USERNAME=some-login-to-webgui
PYTHON_DASH_LOGIN_PW=some-pw-to-login-to-webgui
CR=ghcr.io/fhswf/aki_prj23_transparenzregister
TAG=latest
HTTP_PORT=80
```
The prefix `PYTHON_` can be customized by setting a different `prefix` when constructing the ConfigProvider.