Philipp Horstenkamp 066800123d
Created pipeline to run NER, sentiment and SQL ingest (#314)
Created a data-processing pipeline that enhances the raw mined data with
organisation extraction and sentiment analysis prior to moving the data
to the SQL db.
The transfer of matched data is done afterwards.

---------

Co-authored-by: SeZett <zeleny.sebastian@fh-swf.de>
2023-11-11 13:28:12 +00:00


aki_prj23_transparenzregister

python Actions status Ruff pre-commit Checked with mypy Documentation Status Code style: black

Contributions

See CONTRIBUTING.md for how code should be formatted and which rules we have set for ourselves.

Available entrypoints

The project currently provides the following entrypoints:

  • data-transformation > Transfers all the data from the Mongo db into the SQL db to make it available as production data.
  • data-processing > Processes the data using NLP methods and transfers matched data into the SQL table ready for use.
  • reset-sql > Resets all SQL tables in the connected db.
  • copy-sql > Copies the content of one db to another db.
  • webserver > Starts the webserver showing the analysis results.

DB Connection settings

To connect to the SQL db, see sql/connector.py. To connect to the Mongo db, see [connect].

Create a secrets.json in the root of this repo with the following structure (values to be replaced by the desired config):

The sqlite entry is an alternative to the postgres section.

{
  "sqlite": "path-to-sqlite.db",
  "postgres": {
    "username": "username",
    "password": "password",
    "host": "localhost",
    "database": "db-name",
    "port": 5432
  },
  "mongo": {
    "username": "username",
    "password": "password",
    "host": "localhost",
    "database": "transparenzregister",
    "port": 27017
  }
}
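
As a rough illustration of how a file with this structure could be consumed, the following sketch turns the parsed JSON into connection URLs. The helpers `sql_url` and `mongo_url` are hypothetical and not part of this project; the real logic lives in the project's own config and connector modules. The sketch also assumes that a `sqlite` entry, when present, is used instead of the `postgres` section.

```python
import json
from urllib.parse import quote_plus


def sql_url(secrets: dict) -> str:
    """Build a SQLAlchemy-style URL from a secrets dict (hypothetical helper)."""
    # Assumption: a "sqlite" entry, if present, replaces the "postgres" section.
    if secrets.get("sqlite"):
        return f"sqlite:///{secrets['sqlite']}"
    pg = secrets["postgres"]
    # Escape credentials so special characters do not break the URL.
    user = quote_plus(pg["username"])
    password = quote_plus(pg["password"])
    return f"postgresql://{user}:{password}@{pg['host']}:{pg['port']}/{pg['database']}"


def mongo_url(secrets: dict) -> str:
    """Build a MongoDB connection string from a secrets dict (hypothetical helper)."""
    mg = secrets["mongo"]
    user = quote_plus(mg["username"])
    password = quote_plus(mg["password"])
    return f"mongodb://{user}:{password}@{mg['host']}:{mg['port']}/{mg['database']}"


# In practice the dict would come from json.load() on secrets.json;
# an inline document keeps this example self-contained.
example = json.loads(
    """
    {
      "postgres": {"username": "username", "password": "password",
                   "host": "localhost", "database": "db-name", "port": 5432},
      "mongo": {"username": "username", "password": "password",
                "host": "localhost", "database": "transparenzregister", "port": 27017}
    }
    """
)
print(sql_url(example))
print(mongo_url(example))
```

Adding `"sqlite": "path-to-sqlite.db"` to the example makes `sql_url` return a SQLite URL instead, which mirrors the "alternative" relationship described above.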

Alternatively, the secrets can be provided as environment variables. One option to do so is to add a .env file with the following layout:

PYTHON_POSTGRES_USERNAME=postgres
PYTHON_POSTGRES_PASSWORD=postgres
PYTHON_POSTGRES_HOST=localhost
PYTHON_POSTGRES_DATABASE=postgres
PYTHON_POSTGRES_PORT=5432

PYTHON_MONGO_USERNAME=username
PYTHON_MONGO_HOST=localhost
PYTHON_MONGO_PASSWORD=password
PYTHON_MONGO_PORT=27017
PYTHON_MONGO_DATABASE=transparenzregister

PYTHON_SQLITE_PATH=PathToSQLite3.db # Optional path to an SQLite db that overrides the postgres settings

PYTHON_DASH_LOGIN_USERNAME=some-login-to-webgui
PYTHON_DASH_LOGIN_PW=some-pw-to-login-to-webgui

CR=ghcr.io/fhswf/aki_prj23_transparenzregister
TAG=latest

HTTP_PORT=80

The prefix PYTHON_ can be customized by setting a different prefix when constructing the ConfigProvider.