{ "cells": [ { "cell_type": "markdown", "metadata": { "jupyter": { "outputs_hidden": false }, "slideshow": { "slide_type": "slide" } }, "source": [ "# DevOps for the Transparenzregister analysis" ] }, { "cell_type": "markdown", "metadata": { "jupyter": { "outputs_hidden": false }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Dependency management in Python\n", "### Tools\n", "\n", "- The `requirements.txt` file lists all the dependencies of a project with version numbers and optionally with hashes, additional indexes and conditions for system-specific differences.\n", "  - Changes are difficult because of interdependencies.\n", "  - `pip` cannot sync an environment with `requirements.txt`.\n", "  - All indirect requirements need to be changed manually.\n", "  - Security upgrades and other routine upgrades for bugfixes are tedious and difficult to resolve.\n", "  - Adding new requirements is complex." ] }, { "cell_type": "markdown", "metadata": { "jupyter": { "outputs_hidden": false }, "slideshow": { "slide_type": "skip" } }, "source": [ "- `pip-tools` is the next level up.\n", "  - Generates `requirements.txt` from `requirements.in`\n", "  - Allows syncing an environment with `requirements.txt`\n", "  - No solution to manage multiple combinations of requirements for multiple use cases:\n", "    - Applications or packages with dev and build tools\n", "    - Applications or packages with test and lint tools\n", "    - Packages with additional typing packages\n", "    - A combination thereof\n", "- `pip-compile-multi` is an extension of `pip-tools` and allows for the generation of multiple requirements files.\n", "  - Only configured combinations of dependency groups are allowed.\n", "  - Different configurations may find different solutions." 
] }, { "cell_type": "markdown", "metadata": { "jupyter": { "outputs_hidden": false }, "slideshow": { "slide_type": "subslide" } }, "source": [ "- `poetry` is the most advanced of these tools for resolving Python dependencies\n", "  - Comparable to Java's Maven\n", "  - Finds one complete solution across all defined dependency groups and installs the selected groups\n", "  - Allows packages to be upgraded within defined bounds\n", "  - Exports a solution (the `poetry.lock` file) that can be used on multiple machines to guarantee the same environment\n", "  - Handles virtual environments\n", "  - Automatically includes the requirements in the metadata and other entries of the wheel when building\n", "  - Build and publication management\n", "  - Complete packaging configuration in `pyproject.toml`, the file introduced by **PEP 518** (Poetry 1.x keeps its metadata under `[tool.poetry]` rather than in the **PEP 621** `[project]` table)\n", "  - Supports plugins" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Defined Poetry dependency groups in our project\n", "| Group   | Contents & Purpose                                                  |\n", "|:--------|:--------------------------------------------------------------------|\n", "| root    | The packages needed for the package itself                          |\n", "| develop | Packages needed to support the development such as `jupyter`        |\n", "| lint    | Packages needed for linting such as `mypy`, `pandas-stubs` & `ruff` |\n", "| test    | Packages needed for testing such as `pytest`                        |\n", "| doc     | Packages needed for the documentation build such as `sphinx`        |" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "### How to use poetry\n", "- `poetry new <path>` creates a new project including the folder structure.\n", "- `poetry init` adds a poetry configuration to an existing project.\n", "\n", "- `poetry install` installs the dependencies of an already configured project.\n", "  - The option `--with develop` forces it to also install the dependencies needed for development. 
In our case that would be a Jupyter setup.\n", "  - The option `--without lint,test` forces poetry not to install the dependencies of the groups `lint` and `test`. In our case those groups contain pytest, mypy and the typing packages." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "- `poetry add \"pandas<2\"` would add pandas with a version below `2.0.0` as a dependency.\n", "  - The option `--group lint` would configure it as part of the dependency group `lint`.\n", "  - A package can be part of multiple groups.\n", "  - By default, a package is added to the main requirements, which are also installed when the built wheel is installed.\n", "  - Only direct requirements are configured! Indirect requirements are resolved automatically.\n", "- `poetry update` updates the dependency solution and syncs the environment with the new dependencies.\n", "- Requirement files can be exported.\n", "\n", "The full documentation can be found [here](https://python-poetry.org/)!" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Linter\n", "\n", "Python is an\n", "\n", "- interpreted\n", "- dynamically typed\n", "\n", "programming language.\n", "It therefore lacks the type validation and other compile-time mechanisms that would highlight some errors.\n", "\n", "The name lint goes back to the Unix tool `lint`, a static checker for C code. A linter is not a software testing tool: it analyses the code for known patterns that can lead to different kinds of problems." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Why lint: Application perspective\n", "- In compiled programming languages many errors are raised when the software is built. This is a first, minimal quality gate for code.\n", "- Static typing also enforces explicit expectations about which arguments are accepted. 
This is a second quality gate for code that Python does not have.\n", "- Python's approach allows for a certain flexibility but also for careless mistakes.\n", "- Linting helps to find inconsistencies\n", "- Linting helps to find security vulnerabilities" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Why lint: Human perspective\n", "- Certainty that naming conventions are followed makes the code easier to understand.\n", "- Auto whitespace formatting (Black)\n", "  - Uncompromising whitespace formatting allows for clean diffs when versioning with git.\n", "  - The brain does not need to adapt to how somebody else formats their code\n", "  - No time wasted on beautifying code through whitespace\n", "- Classic linter\n", "  - Programmers improve their skills faster\n", "  - Nobody needs to read a long style guide\n", "  - Reminds the programmer of style rules when they are violated\n", "  - Contributors from outside the project can contribute more easily\n", "  - Code simplifications are pointed out\n", "  - Reduces the number of variants of the same functionality" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Collection of recommended linters\n", "\n", "- Black for automatic and uncompromising whitespace formatting. 
(Almost no configuration options)\n", "- Ruff is a fast Rust reimplementation of many commonly used linters.\n", "  - Reimplementation of the following tools:\n", "    - flake8 (Classic Python linter, unused imports, PEP 8)\n", "    - isort (Automatic import sorting: standard library, third party, your package)\n", "    - bandit (Static code analysis for security problems)\n", "    - pylint (General static code analysis)\n", "    - many more\n", "  - Automatically fixes many findings that have simple fixes\n", "  - Relatively new\n", "  - Adopted by projects like pandas, FastAPI, Hugging Face and SciPy" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "- mypy\n", "  - Checks type annotations for Python\n", "  - Commonly used linter for typing\n", "  - Often needs the support of additional typing tools\n", "  - Sometimes additional typing information is needed from packages such as `pandas-stubs`." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "- pip-audit checks dependencies against a vulnerability database\n", "- pip-licenses checks if a dependency has an allowed license" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Testing with Pytest\n", "\n", "Even though Python comes with its own testing framework (`unittest`), `pytest` is a more lightweight and more commonly used alternative.\n", "\n", "``tests/basic_test.py``\n", "\n", "```python\n", "from ... import add\n", "\n", "\n", "def test_addition():\n", "    # the message is shown when the assertion fails\n", "    assert add(4, 3) == 7, \"The addition did not produce the correct result\"\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Parametrized tests\n", "\n", "In addition, pytest can parametrize the inputs of a test, so the same test runs once per parameter set.\n", "\n", "``tests/parametrized_test.py``\n", "\n", "```python\n", "import pytest\n", "\n", "from ... 
import add\n", "\n", "\n", "# each entry is one test case: (input values, expected output)\n", "@pytest.mark.parametrize(\"inputvalues,output_value\", [((1, 2, 3), 6), ((21, 21), 42)])\n", "def test_addition(inputvalues: tuple[float, ...], output_value: float):\n", "    assert add(*inputvalues) == output_value\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Tests with setup and teardown\n", "\n", "Setting up an environment and cleaning it up afterwards is possible with `pytest` fixtures.\n", "\n", "``tests/setup_and_teardown_test.py``\n", "\n", "```python\n", "from collections.abc import Generator\n", "\n", "import pytest\n", "from sqlalchemy.orm import Session\n", "\n", "\n", "@pytest.fixture()\n", "def create_test_sql() -> Generator[Session, None, None]:\n", "    # setup: create sql connection (bind it to a test database in practice)\n", "    sql_session = Session()\n", "    # setup: create_test_sql_table\n", "    yield sql_session  # the test runs here\n", "    # teardown: delete sql connection\n", "    sql_session.close()\n", "    # teardown: delete sql tables\n", "\n", "\n", "def test_sql_table(create_test_sql: Session) -> None:\n", "    # the yielded session is injected under the fixture's name;\n", "    # HelloWorldTable is a placeholder for a mapped table class\n", "    assert create_test_sql.query(HelloWorldTable).get(\"hello\") == \"world\"\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Tests are run with the following command:\n", "\n", "```bash\n", "poetry run pytest tests/\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Code Coverage\n", "\n", "Code coverage reports record which lines were executed during the test run and were therefore tested.\n", "\n", "They can either be integrated into an IDE to highlight untested code or be reviewed directly.\n", "\n", "Direct review is possible either via third-party software or via the HTML version that can be found with the build artifacts." 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## pre-commit hooks\n", "\n", "Git is a filesystem based versioning application.\n", "This means that parts of it, such as the hook scripts in `.git/hooks`, are accessible and meant to be customized.\n", "At different points of the Git workflow a custom script (a hook) can be executed.\n", "Typical moments are:\n", "- pull\n", "- push / push received\n", "- pre-commit / pre-merge / pre-rebase\n", "\n", "The `pre-commit` package hooks into the commit and runs a set of programs before each commit.\n", "Files can be **edited** or **validated**." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "`pre-commit` executes fast checks on changed files to ensure code quality.\n", "\n", "**Boehm's Law**\n", "\n", "![Boehm's Law](bohems-law.png)\n", "\n", "Since the hooks run at commit time and only on the files being committed, feedback arrives much faster.\n", "They normally include only linting and format validation tools, no tests.\n", "Sometimes autofixers such as black, isort and ruff are included." 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "![Pre-commit.PNG](Pre-commit.PNG)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Configured pre-commit hooks:\n", "\n", "- format checker + pretty formatter (xml, json, ini, yaml, toml)\n", "- secret checker => No passwords or private keys\n", "- file naming convention checker for tests\n", "- syntax checker\n", "- ruff => Linter\n", "- black => Whitespace formatter\n", "- poetry checker\n", "- mypy => typing checker\n", "- md-toc => Adds a table of contents to a `*.md` file at the position where the TOC marker is placed" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The pre-commit hooks are installed with the command\n", "```bash\n", "pre-commit install\n", "```\n", "After that, the hooks are executed on each commit.\n", "\n", "If the hooks need to be skipped, the `-n` (`--no-verify`) option of `git commit` skips them.\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Documentation build with sphinx\n", "\n", "There is no single way to build Python documentation.\n", "Sphinx is a commonly used library.\n", "\n", "- Builds a package documentation from code\n", "- Native format is reStructuredText (reST)\n", "- Capable of importing `*.md` and `*.ipynb` files\n", "- The Read the Docs theme is commonly used\n", "- Allows links to third party documentation via intersphinx (pandas, numpy, etc.)\n", "\n", "The documentation is currently built on pull requests and on the main branch.\n", "\n", "It is deployed automatically from the main branch." 
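] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [
"A minimal sketch of a Sphinx `conf.py` that would cover the points listed for the documentation build. The project name and file location are placeholders, and the choice of `myst_parser` (for `*.md`), `nbsphinx` (for `*.ipynb`) and `sphinx_rtd_theme` is an assumption about which extensions provide these features, not the project's actual configuration.\n",
"\n",
"```python\n",
"# docs/conf.py (hypothetical location)\n",
"project = \"transparenzregister\"  # placeholder project name\n",
"\n",
"extensions = [\n",
"    \"sphinx.ext.autodoc\",  # build the API documentation from docstrings\n",
"    \"sphinx.ext.intersphinx\",  # link to third party documentation\n",
"    \"myst_parser\",  # assumed choice for importing *.md files\n",
"    \"nbsphinx\",  # assumed choice for importing *.ipynb files\n",
"]\n",
"\n",
"html_theme = \"sphinx_rtd_theme\"  # Read the Docs theme\n",
"\n",
"# name -> (base URL of the documentation, inventory file or None for the default)\n",
"intersphinx_mapping = {\n",
"    \"pandas\": (\"https://pandas.pydata.org/docs\", None),\n",
"    \"numpy\": (\"https://numpy.org/doc/stable\", None),\n",
"}\n",
"```"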
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## GitHub\n", "\n", "GitHub is a central hub where git repositories are stored and managed.\n", "\n", "In addition, it hosts project management tools and devops tools for:\n", "- testing\n", "- linting\n", "- analysing\n", "- building\n", "- deploying code\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Example GitHub action workflow\n", "\n", "Workflows are defined in `.github/workflows/some-workflow.yaml`\n", "```yaml\n", "name: Build\n", "\n", "on: # when to run the action\n", "  pull_request:\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "A single job of a workflow.\n", "\n", "```yaml\n", "jobs:\n", "  build:\n", "    runs-on: ubuntu-latest # on what kind of runner to run the job\n", "    steps:\n", "      - uses: actions/setup-python@v4 # setup python\n", "        with:\n", "          python-version: \"3.11\" # quoted so YAML does not read it as a float\n", "      - uses: snok/install-poetry@v1 # setup poetry\n", "        with:\n", "          version: 1.4.2\n", "          virtualenvs-path: ~/local/share/virtualenvs\n", "      - uses: actions/checkout@v3 # check out the repository\n", "      - run: | # install only the root dependencies and build the package\n", "          poetry install --without develop,doc,lint,test\n", "          poetry build\n", "      - uses: actions/upload-artifact@v3 # store the built files\n", "        with:\n", "          name: builds\n", "          path: dist/\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Test and build pipeline with GitHub Actions\n", "On push and pull request:\n", "- Lint + license check + dependency security audit\n", "  - Problem summaries in GitHub Actions + problem notification via mail\n", "- Test with pytest + coverage reports + coverage comment on pull request\n", "- Python build\n", "- Documentation build\n", "- Documentation deployment to GitHub Pages (on push to main)\n", "\n", "On tag:\n", "- Push: Docker architecture and CD context still unclear" ] }, { "cell_type": 
"markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Build artifacts\n", "\n", "- Dependencies / versions and licenses\n", "- Security report\n", "- Unit test reports and coverage report as `.coverage` / `coverage.xml` / `html`!\n", "- Build wheel\n", "- Build documentation\n", "- probably. one or more container\n", "- if needed documentation as pdf" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "![Action Snapshot](Action.PNG)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "![Action Summary](Action-Summary.PNG)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "![Lint-error.PNG](Lint-error.PNG)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "![Coverage.PNG](Coverage.PNG)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "![Pull Request](Pull_request.PNG)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Dependabot\n", "\n", "Dependabot is a GitHub tool to refresh dependencies if newer ones come available or if the currently used ones develop security flaws.\n", "Dependabot is currently not python compatible.\n", "Dependabot is a tool for a passive maintenance of a project without the need for much human overside." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### GitHub Runner Configuration and what does not work\n", "\n", "Most GitHub actions for python reley on the `actions/python-setup` action.\n", "This action is not available for linux arm.\n", "Workarounds with a python docker container / an installation of python on the runner and other tools do not work well." 
] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.0" }, "rise": { "scroll": true } }, "nbformat": 4, "nbformat_minor": 4 }