
README.md of the aki_prj23_transparenzregister

Contributions

See CONTRIBUTING.md for how code should be formatted and which rules we have set for ourselves.

Defined entrypoints

The project currently provides the following entrypoints:

  • data-transformation > Transfers all the data from the mongodb into the sql db to make it available as production data.
  • data-processing > Processes the data using NLP methods and transfers matched data into the SQL table ready for use.
  • reset-sql > Resets all sql tables in the connected db.
  • copy-sql > Copies the content of one db to another db.
  • webserver > Starts the webserver showing the analysis results.
  • find-missing-companies > Retrieves meta information of companies referenced by others but not yet part of the dataset.
  • ingest > Scheduled data ingestion of news articles as well as missing companies and financial data.

All entrypoints support the -h argument that shows a short help text.
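As a rough illustration of how such an entrypoint could expose its help text, the following sketch uses Python's standard argparse module, which adds -h automatically. The function name build_parser and the single positional config argument are assumptions made for this example (the argument mirrors the webserver ENV and webserver secrets.json calls described further down); the project's real entrypoints may be structured differently.

```python
import argparse


def build_parser(prog: str, description: str) -> argparse.ArgumentParser:
    """Build a parser for one entrypoint (hypothetical helper, not project code)."""
    parser = argparse.ArgumentParser(prog=prog, description=description)
    # Assumption: every entrypoint takes either the literal "ENV" or a path
    # to a *.json config file as its only positional argument.
    parser.add_argument(
        "config",
        help='Either "ENV" to read environment variables or a path to a *.json config file.',
    )
    return parser


parser = build_parser("webserver", "Starts the webserver showing the analysis results.")
args = parser.parse_args(["ENV"])
print(args.config)
```

Calling the same parser with -h prints the description and the argument help, then exits with status 0.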

Application startup

Central Build

The packages and containers built by GitHub are accessible to users who are logged in to the GitHub Container Registry (GHCR) via Docker with a personal access token. Run docker login ghcr.io to start that process. The complete docs on logging in can be found here. The application can then be started with docker compose up --pull. Please note that some configuration via a .env file is necessary.

Local Build

The application can be built locally by running rebuild-and-start.bat, provided poetry and docker-compose are installed. This builds a current *.whl and the Docker container locally. The configuration started this way is local-docker-compose.yaml. Please note that some configuration via a .env file is necessary.

Application Settings

Docker configuration / Environment Variables

The application is currently designed to be started from within a docker-compose configuration. With docker-compose, the settings are commonly supplied through a .env file in the root folder.

To use the environment configuration, start an application part with the ENV argument (webserver ENV).

# Defines the container registry used. Default: "ghcr.io/fhswf/aki_prj23_transparenzregister"
CR=ghcr.io/fhswf/aki_prj23_transparenzregister

# main is the tag the main branch is tagged with and is currently in use
TAG=latest

# Configures the access port for the webserver.
# Default: "80" (local only)
HTTP_PORT=8888

# configures where the application root is based. Default: "/"
DASH_URL_BASE_PATHNAME=/transparenzregister/

# Enables basic auth for the application.
# Disabled when one is empty. Default: Disabled
PYTHON_DASH_LOGIN_USERNAME=some-login-to-webgui
PYTHON_DASH_LOGIN_PW=some-pw-to-login-to-webgui

# How often data should be ingested, in hours. Default: "4"
PYTHON_INGEST_SCHEDULE=12

# Settings for NER Service
# possible values: "spacy", "company_list", "transformer", Default: "transformer"
PYTHON_NER_METHOD=transformer
# possible values: "text", "title", Default: "text"
PYTHON_NER_DOC=text

# Settings for Sentiment Service 
# possible values: "spacy", "transformer", Default: "transformer"
PYTHON_SENTIMENT_METHOD=transformer
# possible values: "text", "title", Default: "text"
PYTHON_SENTIMENT_DOC=text

# Access to the mongo db
PYTHON_MONGO_USERNAME=username
PYTHON_MONGO_HOST=mongodb
PYTHON_MONGO_PASSWORD=password
PYTHON_MONGO_PORT=27017
PYTHON_MONGO_DATABASE=transparenzregister

# Access to the postgres sql db
PYTHON_POSTGRES_USERNAME=username
PYTHON_POSTGRES_PASSWORD=password
PYTHON_POSTGRES_HOST=postgres-host
PYTHON_POSTGRES_DATABASE=db-name
PYTHON_POSTGRES_PORT=5432

# Optional path to an sqlite db; when set, it overrides the POSTGRES section
PYTHON_SQLITE_PATH=PathToSQLite3.db
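The allowed values and defaults listed in the comments above can be checked when the settings are read. The following is a minimal sketch of such a check, not the project's actual implementation: the helper names read_choice, read_ingest_schedule and basic_auth_enabled are hypothetical, and only the variable names, allowed values and defaults are taken from the listing above.

```python
from collections.abc import Mapping

# Allowed values and defaults as documented in the .env comments.
CHOICES = {
    "PYTHON_NER_METHOD": ({"spacy", "company_list", "transformer"}, "transformer"),
    "PYTHON_NER_DOC": ({"text", "title"}, "text"),
    "PYTHON_SENTIMENT_METHOD": ({"spacy", "transformer"}, "transformer"),
    "PYTHON_SENTIMENT_DOC": ({"text", "title"}, "text"),
}


def read_choice(env: Mapping[str, str], key: str) -> str:
    """Return the value for key, falling back to its documented default."""
    allowed, default = CHOICES[key]
    value = env.get(key, default)
    if value not in allowed:
        raise ValueError(f"{key}={value!r} is not one of {sorted(allowed)}")
    return value


def read_ingest_schedule(env: Mapping[str, str]) -> int:
    """Return the ingest interval in hours (documented default: 4)."""
    hours = int(env.get("PYTHON_INGEST_SCHEDULE", "4"))
    if hours <= 0:
        raise ValueError("PYTHON_INGEST_SCHEDULE must be a positive number of hours")
    return hours


def basic_auth_enabled(env: Mapping[str, str]) -> bool:
    """Basic auth is disabled when either credential is empty or missing."""
    return bool(env.get("PYTHON_DASH_LOGIN_USERNAME")) and bool(env.get("PYTHON_DASH_LOGIN_PW"))


# Example with a plain dict; os.environ can be passed in the same way.
example_env = {"PYTHON_INGEST_SCHEDULE": "12", "PYTHON_NER_DOC": "title"}
print(read_choice(example_env, "PYTHON_NER_METHOD"))
print(read_choice(example_env, "PYTHON_NER_DOC"))
print(read_ingest_schedule(example_env))
print(basic_auth_enabled(example_env))
```

Taking the environment as a mapping argument rather than reading os.environ directly keeps the helpers easy to test with plain dictionaries.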

Local execution / config file

Create a *.json file in the root of this repo with the following structure (replace the values with the desired config). Please note that an sqlite entry overrides the postgres entry. To use the *.json, pass its path as an argument when calling an entrypoint (webserver secrets.json).

{
  "sqlite": "path-to-sqlite.db",
  "postgres": {
    "username": "username",
    "password": "password",
    "host": "localhost",
    "database": "db-name",
    "port": 5432
  },
  "mongo": {
    "username": "username",
    "password": "password",
    "host": "localhost",
    "database": "transparenzregister",
    "port": 27017
  }
}
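The precedence rule (an sqlite entry wins over the postgres entry) can be sketched as follows. This is an illustration under assumptions, not the project's code: the function names load_config and sql_url are hypothetical, and the URL strings follow the format commonly used by SQLAlchemy, which this README does not confirm as the library in use.

```python
import json
from pathlib import Path
from urllib.parse import quote


def load_config(path: str) -> dict:
    """Read a *.json config file with the structure shown above."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def sql_url(config: dict) -> str:
    """Return a connection URL, preferring sqlite over postgres when both exist."""
    sqlite_path = config.get("sqlite")
    if sqlite_path:
        return f"sqlite:///{sqlite_path}"
    pg = config["postgres"]
    # Percent-encode credentials so characters such as "@" or ":" do not break the URL.
    user = quote(pg["username"], safe="")
    password = quote(pg["password"], safe="")
    return f"postgresql://{user}:{password}@{pg['host']}:{pg['port']}/{pg['database']}"


postgres_only = {
    "postgres": {
        "username": "username",
        "password": "password",
        "host": "localhost",
        "database": "db-name",
        "port": 5432,
    }
}
print(sql_url(postgres_only))
print(sql_url({"sqlite": "path-to-sqlite.db", **postgres_only}))
```

With a file on disk, sql_url(load_config("secrets.json")) would resolve the same way.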

sqlite vs. postgres

We support both sqlite and postgres because a local db is filled in about 10% of the time needed to fill the remote db. Even though we use sqlite for testing, its connection can't handle multithreading or multiprocessing, which clashes with the webserver. For production mode, use the postgres db.

Re-Enable Actions & Dependabot

Once the project is over, all parts that consume compute should be turned off.

To re-enable all features, enable GitHub Actions first. The following image shows where the buttons to enable the actions can be found.

Actions

Additionally, it is recommended to enable Dependabot. Please note that patches are currently only requested for critical security fixes. Run poetry update before restarting the project to update all Python dependencies. Note that both security updates and alerts should be enabled.

Dependabot
