# README.md of the aki_prj23_transparenzregister

## Contributions

See [CONTRIBUTING.md](CONTRIBUTING.md) for how code should be formatted and which rules we set for ourselves.

## Defined entrypoints

The project currently provides the following entrypoints:

- **data-transformation**
  > Transfers all the data from the MongoDB into the SQL DB to make it available as production data.
- **data-processing**
  > Processes the data using NLP methods and transfers matched data into the SQL tables, ready for use.
- **reset-sql**
  > Resets all SQL tables in the connected DB.
- **copy-sql**
  > Copies the content of one DB to another DB.
- **webserver**
  > Starts the webserver showing the analysis results.
- **find-missing-companies**
  > Retrieves meta information on companies referenced by others but not yet part of the dataset.
- **ingest**
  > Scheduled data ingestion of news articles as well as missing companies and financial data.

All entrypoints support the `-h` argument, which shows a short help text.

## Application startup

### Central Build

The packages / containers built by GitHub are accessible to users who are logged into the GitHub Container Registry (GHCR) via Docker with a Personal Access Token. Run `docker login ghcr.io` to start that process. [The complete docs on logging in can be found here.](https://docs.github.com/de/packages/working-with-a-github-packages-registry/working-with-the-container-registry) The application can then simply be started with `docker compose up --pull`. Please note that some configuration via a `.env` file is necessary.

### Local Build

The application can be built locally by running `rebuild-and-start.bat`, provided `poetry` and `docker-compose` are installed. This builds a current `*.whl` and then builds the Docker containers locally. The configuration started this way is `local-docker-compose.yaml`. Please note that some configuration via a `.env` file is necessary.

## Application Settings

### Docker configuration / Environment Variables

The current design of this application assumes that it is started inside a `docker-compose` configuration. For `docker-compose` this is commonly done by providing a `.env` file in the root folder. To use the environment configuration, start an application part with the `ENV` argument (`webserver ENV`).

```dotenv
# Defines the container registry used. Default: "ghcr.io/fhswf/aki_prj23_transparenzregister"
CR=ghcr.io/fhswf/aki_prj23_transparenzregister
# main is the tag the main branch is tagged with and is currently in use
TAG=latest
# Configures the access port for the webserver.
# Default: "80" (local only)
HTTP_PORT=8888
# Configures where the application root is based. Default: "/"
DASH_URL_BASE_PATHNAME=/transparenzregister/
# Enables basic auth for the application.
# Disabled when one of them is empty. Default: Disabled
PYTHON_DASH_LOGIN_USERNAME=some-login-to-webgui
PYTHON_DASH_LOGIN_PW=some-pw-to-login-to-webgui
# How often data should be ingested, in hours. Default: "4"
PYTHON_INGEST_SCHEDULE=12
# Settings for the NER service
# Possible values: "spacy", "company_list", "transformer". Default: "transformer"
PYTHON_NER_METHOD=transformer
# Possible values: "text", "title". Default: "text"
PYTHON_NER_DOC=text
# Settings for the sentiment service
# Possible values: "spacy", "transformer". Default: "transformer"
PYTHON_SENTIMENT_METHOD=transformer
# Possible values: "text", "title". Default: "text"
PYTHON_SENTIMENT_DOC=text
# Access to the MongoDB
PYTHON_MONGO_USERNAME=username
PYTHON_MONGO_HOST=mongodb
PYTHON_MONGO_PASSWORD=password
PYTHON_MONGO_PORT=27017
PYTHON_MONGO_DATABASE=transparenzregister
# Access to the Postgres SQL DB
PYTHON_POSTGRES_USERNAME=username
PYTHON_POSTGRES_PASSWORD=password
PYTHON_POSTGRES_HOST=postgres-host
PYTHON_POSTGRES_DATABASE=db-name
PYTHON_POSTGRES_PORT=5432
# An override path to an SQLite DB; overrides the POSTGRES section
PYTHON_SQLITE_PATH=PathToSQLite3.db
```

### Local execution / config file

Create a `*.json` file in the root of this repo with the following structure (values to be replaced by the desired config). Please note that an `sqlite` entry overrides the `postgres` entry. To use the `*.json`, pass its path as an argument when calling an entrypoint (`webserver secrets.json`).

```json
{
  "sqlite": "path-to-sqlite.db",
  "postgres": {
    "username": "username",
    "password": "password",
    "host": "localhost",
    "database": "db-name",
    "port": 5432
  },
  "mongo": {
    "username": "username",
    "password": "password",
    "host": "localhost",
    "database": "transparenzregister",
    "port": 27017
  }
}
```
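The precedence rule can be illustrated with a small Python sketch. The helper `connection_url` below is hypothetical and not part of the project's code; it only mirrors the documented behaviour that a configured `sqlite` path wins over the `postgres` block.

```python
import json
from pathlib import Path


def connection_url(secrets_path: str) -> str:
    """Illustrative only: build a DB URL from a secrets file.

    Mirrors the documented rule that an "sqlite" entry takes
    precedence over the "postgres" entry.
    """
    config = json.loads(Path(secrets_path).read_text())
    if config.get("sqlite"):
        # SQLite wins whenever a path is configured.
        return f"sqlite:///{config['sqlite']}"
    pg = config["postgres"]
    return (
        f"postgresql://{pg['username']}:{pg['password']}"
        f"@{pg['host']}:{pg['port']}/{pg['database']}"
    )


print(connection_url("secrets.json"))
```

With the example `secrets.json` above this would print `sqlite:///path-to-sqlite.db`, because the `sqlite` key is present.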
### sqlite vs. postgres

We support both SQLite and Postgres because filling a local DB takes only about 10% of the time needed to fill the remote DB. Even though we use `sqlite` for testing, its connection cannot handle multithreading or multiprocessing, which clashes with the webserver. For production mode, use the `postgres` DB.

## Re-Enable Actions & Dependabot

After the project is over, all parts that consume computation should be turned off. To re-enable all features, please enable the GitHub Actions first. The following image shows where the buttons to enable the actions can be found.

![Actions](Actions-annotatet.PNG)

Additionally, it is recommended to enable Dependabot. Please note that patches are currently only requested for critical security fixes. Use `poetry update` prior to restarting the project to update all Python dependencies. Note that both security updates and alerts should be enabled.

![Dependabot](Dependabot-annotatet.PNG)