Database of schools in Slovakia
At the time of writing, there is no single, unified, easily accessible database of schools in Slovakia. This project aims to show how to create one using data that is already publicly available but hard to work with.
Sources of intermediate datasets:
The first website contains a list of datasets, one for each kind of school; we will download them all. The second link contains a number of datasets with additional information about some properties of schools (metadata such as regions, districts, and teaching languages).
Schools have some metadata, such as region or district IDs. We download these additional datasets so that they can be joined with the school tables.
Each kind of school has many possible types (a type is a more granular specification within a kind).
(dataset of possible 'primary teaching languages' in schools)
For each school, this is the institution that founded or operates the school.
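The metadata datasets described above could be fetched with a small helper like the one below. This is only a sketch: the function name `download` and the default `data` directory are assumptions for illustration, and the actual URLs have to be taken from the sources listed earlier.

```python
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def download(url, data_dir="data"):
    """Download `url` into `data_dir` and return the local path.

    Hypothetical helper: a file that already exists locally is not
    downloaded again, so re-running the notebook stays cheap.
    """
    target_dir = Path(data_dir)
    target_dir.mkdir(parents=True, exist_ok=True)

    # Use the last path component of the URL as the local file name.
    target = target_dir / Path(urlparse(url).path).name

    if not target.exists():
        with urllib.request.urlopen(url) as response:
            target.write_bytes(response.read())
    return target
```

Skipping files that already exist avoids repeated requests to the source websites while the rest of the pipeline is being developed.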
Fetching school datasets
Now that we have all the required metadata datasets, we can download all school datasets (again to the data directory), concatenate the individual datasets, and merge them with the collected metadata.
We collect the list of all school datasets by parsing the table on this website.
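One way to collect the dataset links is to walk the page's HTML and keep every link found inside a table. The sketch below uses only the standard library; the class and function names are hypothetical, and it assumes that each dataset is linked from an `<a>` element within a `<table>`, which may not match the real page exactly.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class TableLinkParser(HTMLParser):
    """Collect the href of every <a> that appears inside a <table>."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.table_depth = 0  # > 0 while we are inside a table
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.table_depth += 1
        elif tag == "a" and self.table_depth > 0:
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag == "table" and self.table_depth > 0:
            self.table_depth -= 1


def extract_dataset_links(html, base_url):
    """Return absolute URLs of all links found inside tables in `html`."""
    parser = TableLinkParser(base_url)
    parser.feed(html)
    return parser.links
```

The returned list can then be passed, one URL at a time, to whatever download routine is used for the data directory.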
Merging school datasets
This is tricky because not all tables have the same columns. However, there is a large overlap, so we simply merge all tables and allow missing values. We also:
- drop certain columns related only to a specific kind of school
- add the 'kind' column (which is constant for a single dataset)
- add the 'category' column (derived from 'kind' but with fewer values, i.e. more general categories)
- rename remaining columns
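The steps above can be sketched with pandas as follows. All column names, kinds, and categories in this example (`Name`, `RegionCode`, `FieldsOfStudy`, and the contents of the three constants) are made up for illustration and are not the headers of the real datasets.

```python
import pandas as pd

# Hypothetical mapping from the fine-grained kind to a broader category.
KIND_TO_CATEGORY = {
    "primary school": "primary",
    "grammar school": "secondary",
    "vocational school": "secondary",
}

# Hypothetical raw headers mapped to the column names we want to keep.
RENAMES = {"Name": "name", "RegionCode": "region_id"}

# Hypothetical columns that only make sense for one kind of school.
KIND_SPECIFIC = ["FieldsOfStudy"]


def merge_school_tables(tables):
    """Stack per-kind tables (a dict kind -> DataFrame) into one frame."""
    frames = []
    for kind, df in tables.items():
        # Drop kind-specific columns; ignore tables that lack them.
        df = df.drop(columns=KIND_SPECIFIC, errors="ignore")
        # 'kind' is constant within one dataset; 'category' is derived.
        df = df.assign(kind=kind, category=KIND_TO_CATEGORY.get(kind))
        frames.append(df)
    # concat keeps the union of columns; missing cells become NaN.
    merged = pd.concat(frames, ignore_index=True, sort=False)
    return merged.rename(columns=RENAMES)
```

Because `pd.concat` uses the union of all columns by default, a column that exists in only some tables is simply filled with missing values elsewhere, which is the behavior described above.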
Joining with school metadata
Now that we have all schools together, we can join the table with metadata on columns such as
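A left join is one reasonable way to attach a metadata table while keeping every school row, even when a school's ID has no match. The helper below is a minimal sketch; its name, its parameters, and the column names in the usage example are assumptions, not the names used in the real datasets.

```python
import pandas as pd


def join_metadata(schools, lookup, key, value, new_name):
    """Attach `lookup[value]` to `schools` as `new_name`, matching on `key`.

    how="left" keeps every school; unmatched keys yield NaN.
    validate="many_to_one" raises if `key` is not unique in `lookup`,
    which would otherwise silently duplicate school rows.
    """
    renamed = lookup[[key, value]].rename(columns={value: new_name})
    return schools.merge(renamed, on=key, how="left", validate="many_to_one")


# Usage with made-up data: the third school has an unknown region ID.
schools = pd.DataFrame({"name": ["A", "B", "C"], "region_id": [1, 2, 9]})
regions = pd.DataFrame({"region_id": [1, 2], "label": ["West", "East"]})
joined = join_metadata(schools, regions, key="region_id", value="label", new_name="region")
```

The same helper can be applied once per metadata table, each time with the matching ID column as `key`.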
Done! We have assembled all the data together. Let's analyze the data.
Kinds of schools
How many schools are there of each kind?
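Counting rows per kind is a one-liner in pandas. The sketch below runs on a small made-up table; the real table is assumed to have a 'kind' column as described in the merging step.

```python
import pandas as pd


def count_by(df, column):
    """Return the number of rows per distinct value, most frequent first."""
    # dropna=False also counts rows where the value is missing.
    return df[column].value_counts(dropna=False)


# Made-up example data standing in for the merged school table.
example = pd.DataFrame(
    {"kind": ["primary school", "grammar school", "primary school"]}
)
kind_counts = count_by(example, "kind")
```

Keeping missing values in the count makes it easy to spot schools whose kind was lost during merging.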