mirror of
https://github.com/fhswf/aki_prj23_transparenzregister.git
synced 2025-08-13 21:19:31 +02:00
Corporate Intelligence
Down due to maintenance work from 24.03. to 26.03.
Basically a Bundesanzeiger Scraping Wrapper
In [2]:
from deutschland.bundesanzeiger import Bundesanzeiger
ba = Bundesanzeiger()
# search term
data = ba.get_reports("Atos IT-Dienstleistung & Beratung GmbH")
# returns a dictionary with all reports found as fulltext reports
print(data.keys())
In [9]:
# Note: A company can have multiple "Aufsichtsrat" entries, but the API returns only one because the dictionary keys are overwritten
jahresabschluss = data[
    "Jahresabschluss zum Geschäftsjahr vom 01.01.2019 bis zum 31.12.2019"
]
# Note: The report contains the entire text but lacks the formatting that would make extracting information much easier, because the data was originally wrapped in a <table>
with open("./jahresabschluss-example.txt", "w", encoding="utf-8") as file:
    file.write(jahresabschluss["report"])
print(jahresabschluss.keys())
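The note above about overwritten keys can be illustrated with a minimal sketch. It does not call the Bundesanzeiger wrapper; the `scraped` list is hypothetical stand-in data, and grouping with `defaultdict` is only one possible way to keep duplicates, not something the library does itself.

```python
from collections import defaultdict

# Hypothetical (title, report) pairs standing in for scraped search results;
# these are NOT real Bundesanzeiger data.
scraped = [
    ("Aufsichtsrat", {"date": "2019-05-01", "report": "first entry"}),
    ("Aufsichtsrat", {"date": "2020-05-01", "report": "second entry"}),
    ("Jahresabschluss", {"date": "2020-06-01", "report": "annual statement"}),
]

# Keyed by title alone: the second "Aufsichtsrat" entry replaces the first.
by_title = {title: report for title, report in scraped}

# Grouping into lists keeps every entry that shares a title.
grouped = defaultdict(list)
for title, report in scraped:
    grouped[title].append(report)

print(by_title["Aufsichtsrat"]["report"])
print(len(grouped["Aufsichtsrat"]))
```

With a plain dictionary only the last entry per title survives, which matches the behaviour described in the note; the grouped variant would need to be applied where the results are collected.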
In [3]:
from deutschland.handelsregister import Handelsregister
hr = Handelsregister()
results = hr.search(keywords="BLUECHILLED Verwaltungs GmbH")
print(results)
Offene Register
Hint: Visualize the schema with tools such as DBeaver
Note: The data is not up to date
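If no GUI tool is at hand, the schema can also be read from Python. The following is a rough sketch using an in-memory database; `describe_schema`, `demo_con`, and the `company` columns shown here are hypothetical and do not reflect the actual layout of the export.

```python
import sqlite3

# In-memory stand-in database; the table and its columns are hypothetical.
demo_con = sqlite3.connect(":memory:")
demo_con.execute(
    "CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT, registered_address TEXT)"
)


def describe_schema(connection):
    """Map each table name to a list of (column name, declared type)."""
    tables = connection.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
    ).fetchall()
    schema = {}
    for (table,) in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = connection.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = [(col[1], col[2]) for col in columns]
    return schema


demo_schema = describe_schema(demo_con)
print(demo_schema)
```

Passing a connection to the real export file instead of `demo_con` should list its tables and columns in the same way.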
In [3]:
# SQLite export
import sqlite3
con = sqlite3.connect("../data/openregister.db")
In [4]:
cur = con.cursor()
In [5]:
schema = cur.execute("SELECT name FROM sqlite_master WHERE type='table';")
schema.fetchall()
Out[5]:
In [6]:
import pandas as pd
df = pd.read_sql_query("SELECT * FROM company LIMIT 100", con)
df.head()
Out[6]:
Open Corporates
In [7]:
import requests
BASE_URL = "https://api.opencorporates.com"
In [8]:
response = requests.get(f"{BASE_URL}/companies/search")
response.status_code
Out[8]:
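The request above sends no search term. A small sketch of how a query string might be attached follows; `build_search_url` is a hypothetical helper, and the `q` and `api_token` parameter names are assumptions that should be checked against the OpenCorporates API documentation (which may also require a version prefix in the path). The code only builds the URL and does not send a request.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.opencorporates.com"


def build_search_url(query, api_token=None):
    """Build a company search URL; parameter names are assumptions."""
    params = {"q": query}
    if api_token:
        # Assumed name of the optional authentication parameter.
        params["api_token"] = api_token
    return f"{BASE_URL}/companies/search?{urlencode(params)}"


url = build_search_url("BLUECHILLED Verwaltungs GmbH")
print(url)
```

The resulting URL could then be passed to `requests.get` in place of the bare endpoint.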