mirror of
https://github.com/fhswf/aki_prj23_transparenzregister.git
synced 2025-04-21 21:12:54 +02:00
398 lines
18 KiB
Plaintext
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# NetworkX and Pyvis - Minimal Working Example"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "References:\n",
    "- [NetworkX documentation](https://networkx.org/documentation/stable/)\n",
    "- [Pyvis documentation](https://pyvis.readthedocs.io/en/latest/index.html)\n",
    "- [Introduction to Python for Humanists](https://python-textbook.pythonhumanities.com/06_sna/06_01_05_networkx_pyvis.html)\n",
    "\n",
    "\n",
    "NetworkX is a Python library for creating and analyzing networks. Pyvis is a Python library for the interactive visualization of network graphs. Both can be installed with `pip`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Requirement already satisfied: networkx in /Users/kim/opt/anaconda3/lib/python3.9/site-packages (2.6.3)\n",
      "Requirement already satisfied: pyvis in /Users/kim/opt/anaconda3/lib/python3.9/site-packages (0.3.2)\n",
      "Requirement already satisfied: ipython>=5.3.0 in /Users/kim/opt/anaconda3/lib/python3.9/site-packages (from pyvis) (7.29.0)\n",
      "Requirement already satisfied: jsonpickle>=1.4.1 in /Users/kim/opt/anaconda3/lib/python3.9/site-packages (from pyvis) (3.0.1)\n",
      "Requirement already satisfied: networkx>=1.11 in /Users/kim/opt/anaconda3/lib/python3.9/site-packages (from pyvis) (2.6.3)\n",
      "Requirement already satisfied: jinja2>=2.9.6 in /Users/kim/opt/anaconda3/lib/python3.9/site-packages (from pyvis) (2.11.3)\n",
      "Requirement already satisfied: prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0 in /Users/kim/opt/anaconda3/lib/python3.9/site-packages (from ipython>=5.3.0->pyvis) (3.0.20)\n",
      "Requirement already satisfied: jedi>=0.16 in /Users/kim/opt/anaconda3/lib/python3.9/site-packages (from ipython>=5.3.0->pyvis) (0.18.0)\n",
      "Requirement already satisfied: traitlets>=4.2 in /Users/kim/opt/anaconda3/lib/python3.9/site-packages (from ipython>=5.3.0->pyvis) (5.1.0)\n",
      "Requirement already satisfied: pexpect>4.3 in /Users/kim/opt/anaconda3/lib/python3.9/site-packages (from ipython>=5.3.0->pyvis) (4.8.0)\n",
      "Requirement already satisfied: setuptools>=18.5 in /Users/kim/opt/anaconda3/lib/python3.9/site-packages (from ipython>=5.3.0->pyvis) (58.0.4)\n",
      "Requirement already satisfied: matplotlib-inline in /Users/kim/opt/anaconda3/lib/python3.9/site-packages (from ipython>=5.3.0->pyvis) (0.1.2)\n",
      "Requirement already satisfied: decorator in /Users/kim/opt/anaconda3/lib/python3.9/site-packages (from ipython>=5.3.0->pyvis) (5.1.0)\n",
      "Requirement already satisfied: backcall in /Users/kim/opt/anaconda3/lib/python3.9/site-packages (from ipython>=5.3.0->pyvis) (0.2.0)\n",
      "Requirement already satisfied: pygments in /Users/kim/opt/anaconda3/lib/python3.9/site-packages (from ipython>=5.3.0->pyvis) (2.10.0)\n",
      "Requirement already satisfied: appnope in /Users/kim/opt/anaconda3/lib/python3.9/site-packages (from ipython>=5.3.0->pyvis) (0.1.2)\n",
      "Requirement already satisfied: pickleshare in /Users/kim/opt/anaconda3/lib/python3.9/site-packages (from ipython>=5.3.0->pyvis) (0.7.5)\n",
      "Requirement already satisfied: parso<0.9.0,>=0.8.0 in /Users/kim/opt/anaconda3/lib/python3.9/site-packages (from jedi>=0.16->ipython>=5.3.0->pyvis) (0.8.2)\n",
      "Requirement already satisfied: MarkupSafe>=0.23 in /Users/kim/opt/anaconda3/lib/python3.9/site-packages (from jinja2>=2.9.6->pyvis) (1.1.1)\n",
      "Requirement already satisfied: ptyprocess>=0.5 in /Users/kim/opt/anaconda3/lib/python3.9/site-packages (from pexpect>4.3->ipython>=5.3.0->pyvis) (0.7.0)\n",
      "Requirement already satisfied: wcwidth in /Users/kim/opt/anaconda3/lib/python3.9/site-packages (from prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0->ipython>=5.3.0->pyvis) (0.2.5)\n"
     ]
    }
   ],
   "source": [
    "# install networkx and pyvis using pip\n",
    "!pip install networkx\n",
    "!pip install pyvis"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Pandas DataFrame with sample data\n",
    "\n",
    "To build a network, we need data for the nodes and for the edges. We store each in a pandas DataFrame. Pandas can also be installed with `pip`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Requirement already satisfied: pandas in /Users/kim/opt/anaconda3/lib/python3.9/site-packages (1.3.4)\n",
      "Requirement already satisfied: python-dateutil>=2.7.3 in /Users/kim/opt/anaconda3/lib/python3.9/site-packages (from pandas) (2.8.2)\n",
      "Requirement already satisfied: pytz>=2017.3 in /Users/kim/opt/anaconda3/lib/python3.9/site-packages (from pandas) (2021.3)\n",
      "Requirement already satisfied: numpy>=1.17.3 in /Users/kim/opt/anaconda3/lib/python3.9/site-packages (from pandas) (1.20.3)\n",
      "Requirement already satisfied: six>=1.5 in /Users/kim/opt/anaconda3/lib/python3.9/site-packages (from python-dateutil>=2.7.3->pandas) (1.16.0)\n"
     ]
    }
   ],
   "source": [
    "# install pandas using pip\n",
    "!pip install pandas"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The nodes of our network represent the companies and persons. An `id` uniquely identifies each node and helps avoid duplicates. To distinguish companies from persons, the `type` field was added; in this example it determines the shape of the node. The `label` gives the node a name that is meaningful to the user. Further information, such as `branche` (the industry sector), can later be used for the mouse-over text or for the size or color of the nodes.\n",
    "\n",
    "To pass the node attributes to the network in a later step, we additionally generate a `shape` column, a `color` column, and a `title` column."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "   id    label     type    branche  shape      color               title\n",
      "0   1  Firma 1  Company  Branche 1    dot  #f3e8eeff  Firma 1\\nBranche 1\n",
      "1   2  Firma 2  Company  Branche 2    dot  #bacdb0ff  Firma 2\\nBranche 2\n",
      "2   3  Firma 3  Company  Branche 3    dot  #729b79ff  Firma 3\\nBranche 3\n",
      "3   4  Firma 4  Company  Branche 4    dot  #475b63ff  Firma 4\\nBranche 4\n",
      "4   5  Firma 5  Company  Branche 5    dot  #2e2c2fff  Firma 5\\nBranche 5\n"
     ]
    }
   ],
   "source": [
    "# import pandas\n",
    "import pandas as pd\n",
    "\n",
    "# create dataframe based on the sample data\n",
    "df_nodes = pd.read_csv(\"nodes.csv\", sep=\";\")\n",
    "\n",
    "# define shape based on the type\n",
    "node_shape = {\"Company\": \"dot\", \"Person\": \"triangle\"}\n",
    "df_nodes[\"shape\"] = df_nodes[\"type\"].map(node_shape)\n",
    "\n",
    "# define color based on branche (hex strings must not contain leading spaces)\n",
    "node_color = {\n",
    "    \"Branche 1\": \"#f3e8eeff\",\n",
    "    \"Branche 2\": \"#bacdb0ff\",\n",
    "    \"Branche 3\": \"#729b79ff\",\n",
    "    \"Branche 4\": \"#475b63ff\",\n",
    "    \"Branche 5\": \"#2e2c2fff\",\n",
    "}\n",
    "df_nodes[\"color\"] = df_nodes[\"branche\"].map(node_color)\n",
    "\n",
    "# add information column that can be used for the mouse over in the graph\n",
    "df_nodes = df_nodes.fillna(\"\")\n",
    "df_nodes[\"title\"] = df_nodes[\"label\"] + \"\\n\" + df_nodes[\"branche\"]\n",
    "\n",
    "# show first five entries of the dataframe\n",
    "print(df_nodes.head())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The edges visualize the relationships between the companies and persons. To draw an edge, Pyvis needs at minimum the two nodes that the edge connects. In the sample data these are `from` and `to`, each of which references the unique `id` of the respective node. Here, `label` denotes the type of relationship, e.g. AR = Aufsichtsrat (supervisory board)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "   from  to label  weight\n",
      "0     1  50    AR       2\n",
      "1     1  41     V       4\n",
      "2     1  46    WP       5\n",
      "3     1  48    AR       1\n",
      "4     1  40     V       4\n"
     ]
    }
   ],
   "source": [
    "# create dataframe based on the sample data\n",
    "df_edges = pd.read_csv(\"edges.csv\", sep=\";\")\n",
    "\n",
    "# show first five entries of the dataframe\n",
    "print(df_edges.head())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Creating a network with networkx\n",
    "\n",
    "We use `networkx` to create the network because this library offers better analysis capabilities than `pyvis`. The network created with `networkx` can later be passed to `pyvis` for interactive visualization.\n",
    "\n",
    "We create the nodes and edges from our two DataFrames."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "# import networkx\n",
    "import networkx as nx\n",
    "\n",
    "# create graph and edges from dataframe; from_pandas_edgelist returns a simple\n",
    "# nx.Graph by default, so parallel edges between the same two nodes collapse into one\n",
    "graph = nx.from_pandas_edgelist(\n",
    "    df_edges, source=\"from\", target=\"to\", edge_attr=[\"label\"]\n",
    ")  # add \"weight\" to edge_attr to keep the edge weights\n",
    "\n",
    "# pos = nx.spring_layout(graph, weight = 'weight')\n",
    "# df_nodes['x'] = df_nodes['id'].map(lambda x: pos[x][0])\n",
    "# df_nodes['y'] = df_nodes['id'].map(lambda x: pos[x][1])\n",
    "\n",
    "# update node attributes from dataframe\n",
    "nodes_attr = df_nodes.set_index(\"id\").to_dict(orient=\"index\")\n",
    "nx.set_node_attributes(graph, nodes_attr)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`single_source_shortest_path_length` can be used to count the neighbors at different levels. It returns every node reachable from the source node together with the number of steps needed to reach it; `cutoff` limits the maximum number of steps."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[{'id': 1, 'k=1': 9, 'k=2': 38, 'k=3': 49}, {'id': 50, 'k=1': 4, 'k=2': 19, 'k=3': 46}, {'id': 41, 'k=1': 8, 'k=2': 30, 'k=3': 48}, {'id': 46, 'k=1': 4, 'k=2': 20, 'k=3': 47}, {'id': 48, 'k=1': 5, 'k=2': 21, 'k=3': 46}]\n"
     ]
    }
   ],
   "source": [
    "# create empty list to save k-neighbours for each node\n",
    "k_neighbours = []\n",
    "\n",
    "# loop all nodes in the graph\n",
    "for node in graph.nodes:\n",
    "    # create empty dictionary (named `entry` to avoid shadowing the built-in `dict`)\n",
    "    entry = {}\n",
    "    # get node id\n",
    "    entry[\"id\"] = node\n",
    "    # get k-neighbours for k=1,2,3; subtract 1 since the output of single_source_shortest_path_length contains the node itself\n",
    "    entry[\"k=1\"] = len(nx.single_source_shortest_path_length(graph, node, cutoff=1)) - 1\n",
    "    entry[\"k=2\"] = len(nx.single_source_shortest_path_length(graph, node, cutoff=2)) - 1\n",
    "    entry[\"k=3\"] = len(nx.single_source_shortest_path_length(graph, node, cutoff=3)) - 1\n",
    "    # append the entry for each node\n",
    "    k_neighbours.append(entry)\n",
    "\n",
    "print(k_neighbours[:5])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Visualizing the network with pyvis\n",
    "\n",
    "For the visualization, we import `Network` from `pyvis.network` and initialize the `pyvis` network. The `from_nx` method lets us pass in the `networkx` network.\n",
    "\n",
    "Depending on the selection, the node size is based either on the number of connections to other nodes or on the eigenvector centrality. Nodes with many connections or a higher centrality are drawn larger."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "# visualize using pyvis\n",
    "from pyvis.network import Network\n",
    "\n",
    "# initiate network\n",
    "net = Network(\n",
    "    directed=False, neighborhood_highlight=True, bgcolor=\"white\", font_color=\"black\"\n",
    ")\n",
    "\n",
    "# pass networkx graph to pyvis\n",
    "net.from_nx(graph)\n",
    "\n",
    "# set edge options\n",
    "net.inherit_edge_colors(False)\n",
    "net.set_edge_smooth(\"dynamic\")\n",
    "\n",
    "# choose size format\n",
    "size_type = \"edges\"  # select 'edges' or 'eigen'\n",
    "\n",
    "adj_list = net.get_adj_list()\n",
    "\n",
    "if size_type == \"eigen\":\n",
    "    eigenvector = nx.eigenvector_centrality(graph)\n",
    "\n",
    "# calculate and update the size of each node depending on its number of\n",
    "# neighbors ('edges') or its eigenvector centrality ('eigen')\n",
    "for node in net.nodes:\n",
    "    if size_type == \"edges\":\n",
    "        size = len(adj_list[node[\"id\"]]) * 5\n",
    "    else:\n",
    "        size = eigenvector[node[\"id\"]] * 900\n",
    "    node[\"value\"] = size\n",
    "    node[\"size\"] = size\n",
    "\n",
    "# set the node distance and spring length using repulsion\n",
    "net.repulsion(node_distance=250, spring_length=150)\n",
    "\n",
    "# activate physics buttons to further explore the available solvers:\n",
    "# barnesHut, forceAtlas2Based, repulsion, hierarchicalRepulsion\n",
    "net.show_buttons(filter_=[\"physics\"])\n",
    "\n",
    "# save graph as HTML\n",
    "net.save_graph(\"networkx_pyvis.html\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{1: 0.2590276672203281, 50: 0.11573458719186203, 41: 0.23089631495685015, 46: 0.09723686259252076, 48: 0.11963178876421071, 40: 0.19182246741215414, 37: 0.11584617541421141, 2: 0.2156948179967803, 44: 0.09069008350377152, 36: 0.12164282800223333, 49: 0.07333232608496432, 14: 0.05222928767261443, 15: 0.09120494717875655, 16: 0.07647921544493948, 17: 0.08405528712304138, 18: 0.07986204168307376, 19: 0.1323918034494897, 31: 0.2928131232735568, 20: 0.10812229868312798, 21: 0.08788249236713751, 22: 0.1186928914916743, 23: 0.11742853245579209, 6: 0.2077969102871851, 25: 0.0594562062424749, 39: 0.05827758860927808, 26: 0.12023885869170473, 27: 0.07040921471375026, 28: 0.04832885692517859, 29: 0.1266153741460298, 3: 0.19455723550826837, 35: 0.3112735938438682, 4: 0.1867770233044344, 5: 0.18288061376562756, 43: 0.10566912562335516, 7: 0.08267466177528007, 45: 0.061959219104767996, 8: 0.1277676533326448, 9: 0.15188022282282598, 32: 0.3558117649603071, 38: 0.07531646453054974, 47: 0.060242987519435645, 34: 0.08786367126085788, 33: 0.055712820471659395, 10: 0.1540931474648185, 11: 0.08703487502978537, 12: 0.02664931447399513, 13: 0.03161820778804042, 30: 0.042565390898229194, 24: 0.04996908258118087, 42: 0.011302736073531213}\n"
     ]
    }
   ],
   "source": [
    "eigenvector = nx.eigenvector_centrality(graph)\n",
    "print(eigenvector)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Open questions\n",
    "- Are there nodes without any connection? Because the graph is built from the edges first, such nodes are presumably not included so far.\n",
    "- When a company is selected, connected nodes are not highlighted in color.\n",
    "- If there are multiple connections between two nodes, only the first one is currently displayed. This can be worked around by passing the option `directed = True` to the network. However, this turns the edges into arrows, and care must be taken when storing the connections. Are there also options for undirected graphs?\n",
    "- Should the edges additionally be weighted?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Resulting requirements for the data\n",
    "\n",
    "Relational data for the edges and nodes is sufficient. For the nodes (= companies, persons), the following is required:\n",
    "- Unique ID\n",
    "- Label, e.g. the name of the company or person\n",
    "- Further information to be shown on mouse-over or used to configure the colors or sizes of the nodes\n",
    "\n",
    "For the edges (= connections), the following is required:\n",
    "- Unique IDs of the two nodes between which the connection exists\n",
    "- Type of connection\n",
    "- Weights, if applicable\n"
   ]
  }
 ],
 "metadata": {
  "interpreter": {
   "hash": "aee8b7b246df8f9039afb4144a1f6fd8d2ca17a180786b69acc140d282b71a49"
  },
  "kernelspec": {
   "display_name": "Python 3.10.1 64-bit",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.1"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}