Sebastian 3fa4667a24
addes NER/Sentiment Settings to README.md (#517)
Added settings for NER and sentiment services.

---------

Co-authored-by: Philipp Horstenkamp <philipp@horstenkamp.de>
2024-01-04 17:48:43 +01:00


# README.md of the aki_prj23_transparenzregister
## Contributions
See [CONTRIBUTING.md](CONTRIBUTING.md) for how code should be formatted and which rules we set ourselves.
## Defined entrypoints
The project currently provides the following entrypoints:
- **data-transformation** > Transfers all the data from the mongodb into the sql db to make it available as production data.
- **data-processing** > Processes the data using NLP methods and transfers matched data into the SQL table ready for use.
- **reset-sql** > Resets all sql tables in the connected db.
- **copy-sql** > Copies the content of a db to another db.
- **webserver** > Starts the webserver showing the analysis results.
- **find-missing-companies** >
- **ingest** >
All entrypoints support the `-h` argument and show a short help text.
## Application startup
### Central Build
The packages / container built by GitHub are accessible for users that are logged into the GitHub Container Registry (GHCR) with a Personal Access token via Docker.
Run `docker login ghcr.io` to start that process. [The complete docs on logging in can be found here.](https://docs.github.com/de/packages/working-with-a-github-packages-registry/working-with-the-container-registry)
The application can then be started with `docker compose up --pull always` (the `--pull` flag expects a policy value).
Please note that some configuration with a `.env` is necessary.
### Local Build
The application can be built locally by running `rebuild-and-start.bat`, provided `poetry` and `docker-compose` are installed.
This will build a current `*.whl` and build the Docker container locally.
The configuration started this way is `local-docker-compose.yaml`.
Please note that some configuration with a `.env` is necessary.
## Application Settings
### Docker configuration / Environment variables
The current design of this application assumes that it is started inside a `docker-compose` configuration.
With `docker-compose`, the settings are commonly supplied through a `.env` file in the root folder.
To use the environment configuration, start an application part with the `ENV` argument (`webserver ENV`).
```dotenv
# Defines the container registry used. Default: "ghcr.io/fhswf/aki_prj23_transparenzregister"
CR=ghcr.io/fhswf/aki_prj23_transparenzregister
# main is the tag the main branch is tagged with and is currently in use
TAG=latest
# Configures the access port for the webserver.
# Default: "80" (local only)
HTTP_PORT=8888
# configures where the application root is based. Default: "/"
DASH_URL_BASE_PATHNAME=/transparenzregister/
# Enables basic auth for the application.
# Disabled when one is empty. Default: Disabled
PYTHON_DASH_LOGIN_USERNAME=some-login-to-webgui
PYTHON_DASH_LOGIN_PW=some-pw-to-login-to-webgui
# How often data should be ingested, in hours. Default: "4"
PYTHON_INGEST_SCHEDULE=12
# Settings for NER Service
# possible values: "spacy", "company_list", "transformer", Default: "transformer"
PYTHON_NER_METHOD=transformer
# possible values: "text", "title", Default: "text"
PYTHON_NER_DOC=text
# Settings for Sentiment Service
# possible values: "spacy", "transformer", Default: "transformer"
PYTHON_SENTIMENT_METHOD=transformer
# possible values: "text", "title", Default: "text"
PYTHON_SENTIMENT_DOC=text
# Access to the mongo db
PYTHON_MONGO_USERNAME=username
PYTHON_MONGO_HOST=mongodb
PYTHON_MONGO_PASSWORD=password
PYTHON_MONGO_PORT=27017
PYTHON_MONGO_DATABASE=transparenzregister
# Access to the postgres sql db
PYTHON_POSTGRES_USERNAME=username
PYTHON_POSTGRES_PASSWORD=password
PYTHON_POSTGRES_HOST=postgres-host
PYTHON_POSTGRES_DATABASE=db-name
PYTHON_POSTGRES_PORT=5432
# Optional path to an sqlite db; when set, it overrides the POSTGRES section
PYTHON_SQLITE_PATH=PathToSQLite3.db
```
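As a rough illustration of how the `PYTHON_`-prefixed variables above could be consumed, here is a minimal sketch. The helper name `read_sql_settings`, the returned dictionary layout, and the fallback port are assumptions made for this example and are not taken from the project's code; only the variable names come from the `.env` example.

```python
import os


def read_sql_settings(env=None):
    """Collect SQL settings from environment variables (hypothetical helper)."""
    env = os.environ if env is None else env
    # A non-empty PYTHON_SQLITE_PATH overrides the whole POSTGRES section.
    sqlite_path = env.get("PYTHON_SQLITE_PATH")
    if sqlite_path:
        return {"sqlite": sqlite_path}
    return {
        "postgres": {
            "username": env["PYTHON_POSTGRES_USERNAME"],
            "password": env["PYTHON_POSTGRES_PASSWORD"],
            "host": env["PYTHON_POSTGRES_HOST"],
            "database": env["PYTHON_POSTGRES_DATABASE"],
            # Falling back to 5432 is an assumption (the standard PostgreSQL port).
            "port": int(env.get("PYTHON_POSTGRES_PORT", "5432")),
        }
    }
```

Passing a plain dictionary instead of `os.environ` makes the precedence rule easy to check without touching the real environment.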
### Local execution / config file
Create a `*.json` file in the root of this repo with the following structure
(values to be replaced by the desired config).
Please note that an `sqlite` entry overrides the `postgres` entry.
To use the `*.json`, pass its path as an argument when calling an entrypoint (`webserver secrets.json`).
```json
{
  "sqlite": "path-to-sqlite.db",
  "postgres": {
    "username": "username",
    "password": "password",
    "host": "localhost",
    "database": "db-name",
    "port": 5432
  },
  "mongo": {
    "username": "username",
    "password": "password",
    "host": "localhost",
    "database": "transparenzregister",
    "port": 27017
  }
}
```
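The precedence rule for the config file can be sketched as follows. The function names `load_config` and `sql_url_from_config` are hypothetical, and the SQLAlchemy-style URL strings are an assumption about how such a config might be turned into a connection target; the project itself may do this differently.

```python
import json
from pathlib import Path


def load_config(path):
    """Read the JSON config file whose path is passed to an entrypoint."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def sql_url_from_config(config):
    """Build a SQLAlchemy-style URL; an `sqlite` entry wins over `postgres`."""
    if config.get("sqlite"):
        return f"sqlite:///{config['sqlite']}"
    pg = config["postgres"]
    # Note: special characters in the password would need URL-quoting.
    return (
        f"postgresql://{pg['username']}:{pg['password']}"
        f"@{pg['host']}:{pg['port']}/{pg['database']}"
    )
```

With the example file above, `sql_url_from_config(load_config("secrets.json"))` would select the sqlite path; removing the `sqlite` key would fall back to the `postgres` section.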
### sqlite vs. postgres
We support both sqlite and postgres because a local sqlite db is filled in about 10% of the time the remote db needs.
Even though we use `sqlite` for testing, its connection can't handle multithreading or multiprocessing.
This clashes with the webserver, so use the `postgres` db for production.
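
One way the threading limitation can show up is through Python's standard `sqlite3` module, which by default refuses to use a connection outside the thread that created it. The sketch below only demonstrates that default behaviour; the function name is hypothetical, and the project may run into the limitation through a different layer (for example an ORM).

```python
import sqlite3
import threading


def use_connection_in_other_thread():
    """Return the errors raised when a sqlite3 connection crosses threads."""
    conn = sqlite3.connect(":memory:")  # created in the calling thread
    errors = []

    def worker():
        try:
            # With the default check_same_thread=True this call is rejected.
            conn.execute("SELECT 1")
        except sqlite3.ProgrammingError as exc:
            errors.append(exc)

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    conn.close()
    return errors
```

A webserver that serves requests from several threads or processes would trigger this kind of error, which is why a server-based database such as postgres fits production better.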