mirror of
https://github.com/fhswf/aki_prj23_transparenzregister.git
synced 2025-04-25 01:42:33 +02:00
# README.md of the aki_prj23_transparenzregister

## Contributions

See [CONTRIBUTING.md](CONTRIBUTING.md) for how code should be formatted and which rules we set for ourselves.

## Defined entrypoints

The project currently provides the following entrypoints:

- **data-transformation** > Transfers all data from the MongoDB into the SQL db to make it available as production data.
- **data-processing** > Processes the data using NLP methods and transfers matched data into the SQL table ready for use.
- **reset-sql** > Resets all SQL tables in the connected db.
- **copy-sql** > Copies the content of one db to another db.
- **webserver** > Starts the webserver showing the analysis results.
- **find-missing-companies** >
- **ingest** >

All entrypoints support the `-h` argument, which shows a short help text.
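
A quick way to check which of the entrypoints above are installed in the current environment is to ask each one for its help text. The following is a minimal sketch, not part of the project: the helper names `help_command` and `show_help` are hypothetical, and it assumes the entrypoints are available on the `PATH` under the names listed above.

```python
from __future__ import annotations

import shutil
import subprocess

# Entrypoint names as listed in this README.
ENTRYPOINTS = [
    "data-transformation",
    "data-processing",
    "reset-sql",
    "copy-sql",
    "webserver",
    "find-missing-companies",
    "ingest",
]


def help_command(entrypoint: str) -> list[str]:
    """Build the argument list that asks an entrypoint for its help text."""
    return [entrypoint, "-h"]


def show_help(entrypoint: str) -> str | None:
    """Return the help text of an entrypoint, or None if it is not on PATH."""
    if shutil.which(entrypoint) is None:
        return None  # e.g. the package is not installed in this environment
    result = subprocess.run(
        help_command(entrypoint), capture_output=True, text=True, check=False
    )
    return result.stdout


if __name__ == "__main__":
    for name in ENTRYPOINTS:
        text = show_help(name)
        print(f"== {name} ==")
        print(text if text is not None else "(not installed in this environment)")
```

Entrypoints that are not installed are reported instead of raising an error, so the script can also be run outside the project environment.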

## Application startup

### Central Build

The packages / containers built by GitHub are accessible to users who are logged into the GitHub Container Registry (GHCR) with a personal access token via Docker.
Run `docker login ghcr.io` to start that process. [The complete docs on logging in can be found here.](https://docs.github.com/de/packages/working-with-a-github-packages-registry/working-with-the-container-registry)
The application can then be started with `docker compose up --pull always`.
Please note that some configuration with a `.env` file is necessary.

### Local Build

The application can be built locally by running `rebuild-and-start.bat`, provided `poetry` and `docker-compose` are installed.
This builds a current `*.whl` and builds the Docker container locally.
The configuration started this way is `local-docker-compose.yaml`.
Please note that some configuration with a `.env` file is necessary.

## Application Settings

### Docker configuration / Environmental-Variables

The current design of this application suggests that it is started inside a `docker-compose` configuration.
With `docker-compose`, the settings are commonly supplied through a `.env` file in the root folder.

To use the environmental configuration, start an application part with the `ENV` argument (`webserver ENV`).

```dotenv
# Defines the container registry used. Default: "ghcr.io/fhswf/aki_prj23_transparenzregister"
CR=ghcr.io/fhswf/aki_prj23_transparenzregister

# main is the tag the main branch is tagged with and is currently in use
TAG=latest

# Configures the access port for the webserver.
# Default: "80" (local only)
HTTP_PORT=8888

# Configures where the application root is based. Default: "/"
DASH_URL_BASE_PATHNAME=/transparenzregister/

# Enables basic auth for the application.
# Disabled when either value is empty. Default: Disabled
PYTHON_DASH_LOGIN_USERNAME=some-login-to-webgui
PYTHON_DASH_LOGIN_PW=some-pw-to-login-to-webgui

# How often data should be ingested, in hours. Default: "4"
PYTHON_INGEST_SCHEDULE=12

# Settings for NER Service
# possible values: "spacy", "company_list", "transformer", Default: "transformer"
PYTHON_NER_METHOD=transformer
# possible values: "text", "title", Default: "text"
PYTHON_NER_DOC=text

# Settings for Sentiment Service
# possible values: "spacy", "transformer", Default: "transformer"
PYTHON_SENTIMENT_METHOD=transformer
# possible values: "text", "title", Default: "text"
PYTHON_SENTIMENT_DOC=text

# Access to the mongo db
PYTHON_MONGO_USERNAME=username
PYTHON_MONGO_HOST=mongodb
PYTHON_MONGO_PASSWORD=password
PYTHON_MONGO_PORT=27017
PYTHON_MONGO_DATABASE=transparenzregister

# Access to the postgres SQL db
PYTHON_POSTGRES_USERNAME=username
PYTHON_POSTGRES_PASSWORD=password
PYTHON_POSTGRES_HOST=postgres-host
PYTHON_POSTGRES_DATABASE=db-name
PYTHON_POSTGRES_PORT=5432

# Optional path to an sqlite db; if set, it overrides the POSTGRES section
PYTHON_SQLITE_PATH=PathToSQLite3.db
```
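
The allowed values and defaults documented in the comments above can be checked before a service starts. The following is a minimal sketch of such a check, not the project's actual settings loader: the helper `read_service_settings` and the `ALLOWED` table are hypothetical, and only the variable names, allowed values, and defaults are taken from the `.env` example.

```python
import os
from typing import Mapping

# Allowed values and defaults, copied from the comments in the `.env` example.
ALLOWED = {
    "PYTHON_NER_METHOD": ({"spacy", "company_list", "transformer"}, "transformer"),
    "PYTHON_NER_DOC": ({"text", "title"}, "text"),
    "PYTHON_SENTIMENT_METHOD": ({"spacy", "transformer"}, "transformer"),
    "PYTHON_SENTIMENT_DOC": ({"text", "title"}, "text"),
}


def read_service_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Hypothetical helper: resolve NER, sentiment and ingest settings."""
    settings: dict = {}
    for key, (allowed, default) in ALLOWED.items():
        value = env.get(key, default)  # fall back to the documented default
        if value not in allowed:
            raise ValueError(f"{key}={value!r} is not one of {sorted(allowed)}")
        settings[key] = value
    # The ingest schedule is given in hours; the documented default is 4.
    settings["PYTHON_INGEST_SCHEDULE"] = int(env.get("PYTHON_INGEST_SCHEDULE", "4"))
    return settings


if __name__ == "__main__":
    print(read_service_settings())
```

Passing a plain dictionary instead of `os.environ` makes the check easy to try out without touching the real environment.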

### Local execution / config file

Create a `*.json` file in the root of this repo with the structure shown below
(values to be replaced by the desired config).
Please note that an `sqlite` entry overwrites the `postgres` entry.
To use the `*.json` file, pass its path as an argument when using an entrypoint (`webserver secrets.json`).

```json
{
  "sqlite": "path-to-sqlite.db",
  "postgres": {
    "username": "username",
    "password": "password",
    "host": "localhost",
    "database": "db-name",
    "port": 5432
  },
  "mongo": {
    "username": "username",
    "password": "password",
    "host": "localhost",
    "database": "transparenzregister",
    "port": 27017
  }
}
```
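
The precedence rule (an `sqlite` entry wins over the `postgres` section) can be sketched as follows. This is an illustration under assumptions, not the project's loader: the helper `sql_connection_url` is hypothetical, and the SQLAlchemy-style URL format is assumed rather than confirmed by this README.

```python
import json
from urllib.parse import quote_plus


def sql_connection_url(config: dict) -> str:
    """Hypothetical helper: pick the SQL target from a parsed config file."""
    if config.get("sqlite"):
        # An `sqlite` entry takes precedence over the `postgres` section.
        return f"sqlite:///{config['sqlite']}"
    pg = config["postgres"]
    # Credentials are URL-encoded so special characters do not break the URL.
    return (
        f"postgresql://{quote_plus(pg['username'])}:{quote_plus(pg['password'])}"
        f"@{pg['host']}:{pg['port']}/{pg['database']}"
    )


EXAMPLE = json.loads(
    """
    {
      "sqlite": "path-to-sqlite.db",
      "postgres": {
        "username": "username",
        "password": "password",
        "host": "localhost",
        "database": "db-name",
        "port": 5432
      }
    }
    """
)

if __name__ == "__main__":
    print(sql_connection_url(EXAMPLE))
```

Removing the `sqlite` key from the config makes the same function fall back to the `postgres` section.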

### sqlite vs. postgres

We support both sqlite and postgres because a local db can be filled in about 10% of the time it takes to fill the remote db.
Even though we use `sqlite` for testing, its connection can't handle multithreading or multiprocessing.
This clashes with the webserver. For production mode, use the `postgres` db.
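
One way to see this kind of limitation is with Python's standard `sqlite3` module, whose connections by default may only be used in the thread that created them. The sketch below only demonstrates that default behaviour of the standard library; it does not use the project's database layer, which may fail in a different way.

```python
import sqlite3
import threading


def query_from_other_thread(connection: sqlite3.Connection) -> str:
    """Run a query on `connection` from a new thread and report the outcome."""
    outcome = []

    def worker() -> None:
        try:
            connection.execute("SELECT 1").fetchone()
            outcome.append("ok")
        except sqlite3.ProgrammingError as error:
            # Raised because the connection belongs to another thread.
            outcome.append(f"error: {error}")

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return outcome[0]


connection = sqlite3.connect(":memory:")  # created in the main thread
result = query_from_other_thread(connection)
connection.close()
print(result)
```

A webserver typically serves requests from several threads or processes, which is where a connection restricted to a single thread gets in the way.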