# README.md of the aki_prj23_transparenzregister

## Contributions

See the [CONTRIBUTING.md](CONTRIBUTING.md) for how code should be formatted and which rules we set ourselves.

## Defined entrypoints

The project currently provides the following entrypoints:

- **data-transformation**
  > Transfers all the data from the mongodb into the sql db to make it available as production data.
- **data-processing**
  > Processes the data using NLP methods and transfers matched data into the SQL table ready for use.
- **reset-sql**
  > Resets all sql tables in the connected db.
- **copy-sql**
  > Copies the content of a db to another db.
- **webserver**
  > Starts the webserver showing the analysis results.
- **find-missing-companies**
  >
- **ingest**
  >

All entrypoints support the `-h` argument, which shows a short help text.

## Application startup

### Central Build

The packages / containers built by GitHub are accessible to users who are logged into the GitHub Container Registry (GHCR) via Docker with a personal access token.
Run `docker login ghcr.io` to start that process.
[The complete docs on logging in can be found here.](https://docs.github.com/de/packages/working-with-a-github-packages-registry/working-with-the-container-registry)

The application can then be started with `docker compose up --pull always`.
Please note that some configuration with a `.env` is necessary.

### Local Build

The application can be built locally by running `rebuild-and-start.bat`, provided `poetry` and `docker-compose` are installed.
This builds a current `*.whl` and the Docker container locally.
The configuration started this way is `local-docker-compose.yaml`.
Please note that some configuration with a `.env` is necessary.

## Application Settings

### Docker configuration / Environment variables

The current design of this application suggests that it is started inside a `docker-compose` configuration.
For `docker-compose`, the configuration is commonly provided through a `.env` file in the root folder.
To use the environment configuration, start an application part with the `ENV` argument (`webserver ENV`).

```dotenv
# Defines the container registry used. Default: "ghcr.io/fhswf/aki_prj23_transparenzregister"
CR=ghcr.io/fhswf/aki_prj23_transparenzregister
# main is the tag the main branch is tagged with and is currently in use
TAG=latest

# Configures the access port for the webserver.
# Default: "80" (local only)
HTTP_PORT=8888
# Configures where the application root is based. Default: "/"
DASH_URL_BASE_PATHNAME=/transparenzregister/
# Enables basic auth for the application.
# Disabled when one is empty. Default: Disabled
PYTHON_DASH_LOGIN_USERNAME=some-login-to-webgui
PYTHON_DASH_LOGIN_PW=some-pw-to-login-to-webgui

# How often data should be ingested, in hours. Default: "4"
PYTHON_INGEST_SCHEDULE=12

# Settings for NER Service
# Possible values: "spacy", "company_list", "transformer". Default: "transformer"
PYTHON_NER_METHOD=transformer
# Possible values: "text", "title". Default: "text"
PYTHON_NER_DOC=text

# Settings for Sentiment Service
# Possible values: "spacy", "transformer". Default: "transformer"
PYTHON_SENTIMENT_METHOD=transformer
# Possible values: "text", "title". Default: "text"
PYTHON_SENTIMENT_DOC=text

# Access to the mongo db
PYTHON_MONGO_USERNAME=username
PYTHON_MONGO_HOST=mongodb
PYTHON_MONGO_PASSWORD=password
PYTHON_MONGO_PORT=27017
PYTHON_MONGO_DATABASE=transparenzregister

# Access to the postgres sql db
PYTHON_POSTGRES_USERNAME=username
PYTHON_POSTGRES_PASSWORD=password
PYTHON_POSTGRES_HOST=postgres-host
PYTHON_POSTGRES_DATABASE=db-name
PYTHON_POSTGRES_PORT=5432

# Optional path to an sqlite db; overrides the POSTGRES section
PYTHON_SQLITE_PATH=PathToSQLite3.db
```

### Local execution / config file

Create a `*.json` in the root of this repo with the structure shown in this section (values to be replaced by the desired config).
Please note that an `sqlite` entry
overrides the `postgres` entry.
To use the `*.json`, pass its path as an argument when using an entrypoint (`webserver secrets.json`).

```json
{
  "sqlite": "path-to-sqlite.db",
  "postgres": {
    "username": "username",
    "password": "password",
    "host": "localhost",
    "database": "db-name",
    "port": 5432
  },
  "mongo": {
    "username": "username",
    "password": "password",
    "host": "localhost",
    "database": "transparenzregister",
    "port": 27017
  }
}
```

### sqlite vs. postgres

We support both sqlite and postgres because a local db is filled in about 10% of the time the remote db needs.
Even though we use `sqlite` for testing, its connection cannot handle multithreading or multiprocessing, which clashes with the webserver.
For production mode, use the `postgres` db.
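
The precedence rule described above (an `sqlite` entry wins over the `postgres` entry) can be illustrated with a minimal sketch. The helper names `load_config` and `connection_url` are hypothetical and not part of this project, and the SQLAlchemy-style URL format is an assumption about how such a config might be consumed, not a description of the actual implementation.

```python
import json
from pathlib import Path


def load_config(path: str) -> dict:
    """Read a JSON config file such as `secrets.json` (hypothetical helper)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def connection_url(config: dict) -> str:
    """Build a SQLAlchemy-style database URL from a parsed config.

    Illustration only: the URL format is an assumption, not the
    project's actual connection logic.
    """
    sqlite_path = config.get("sqlite")
    if sqlite_path:
        # A non-empty sqlite entry takes precedence over the postgres entry.
        return f"sqlite:///{sqlite_path}"
    pg = config["postgres"]
    return (
        f"postgresql://{pg['username']}:{pg['password']}"
        f"@{pg['host']}:{pg['port']}/{pg['database']}"
    )
```

With the example config from this README, `connection_url` would pick the sqlite path; removing the `sqlite` key makes it fall back to the postgres settings. Credentials containing special characters would additionally need URL escaping, which this sketch leaves out.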