Wikidata is one of the most prominent crowdsourced knowledge bases built on the Linked Data vision and technologies. The project is run by the Wikimedia Foundation, and its goal is to build a machine-understandable database of knowledge. Data provided by the Italian Ministry of Education, Universities and Research about the names, addresses and types of Italian schools were transformed into the Wikidata data format and loaded into the knowledge base. Now nearly every school in Italy has its own Wikidata item, interconnected with a wider data context that was not available in the starting dataset.
Facts & figures
Data integration involves tricky tasks that require extensive, systematic preprocessing to normalize data patterns and reduce noise. I developed a cleaning pipeline to improve the quality of the starting datasets and to stage clean information. Finally, I loaded data about 65k+ Italian schools using Python libraries that wrap calls to the Wikidata APIs.
- An example of a school item in Wikidata
- ItalianSchoolsBot, the bot I created for data ingestion
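
To give an idea of what the cleaning and staging step described above can look like, here is a minimal sketch in plain Python. It is not the actual pipeline: the field names (`code`, `name`, `address`, `postal_code`, `type`) and the helpers `normalize_whitespace`, `clean_record` and `stage` are hypothetical, chosen only for illustration, and the upload to Wikidata is left out entirely.

```python
import re


def normalize_whitespace(text):
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def clean_record(raw):
    """Return a cleaned copy of one raw school record, or None if unusable.

    All field names here are hypothetical; the real dataset may differ.
    """
    code = normalize_whitespace(raw.get("code", "")).upper()
    name = normalize_whitespace(raw.get("name", ""))
    if not code or not name:
        # Without an identifier and a name the record cannot be staged.
        return None
    # Keep only digits, then left-pad to the five digits of an Italian postal code.
    postal = re.sub(r"\D", "", raw.get("postal_code", ""))
    return {
        "code": code,
        "name": name.title(),
        "address": normalize_whitespace(raw.get("address", "")).title(),
        "postal_code": postal.zfill(5) if postal else "",
        "type": normalize_whitespace(raw.get("type", "")).lower(),
    }


def stage(records):
    """Clean every record and drop duplicates, keeping the first one per code."""
    staged = {}
    for raw in records:
        cleaned = clean_record(raw)
        if cleaned is not None and cleaned["code"] not in staged:
            staged[cleaned["code"]] = cleaned
    return list(staged.values())
```

Calling `stage()` on a list of raw dictionaries returns one cleaned dictionary per school code, ready to be mapped to Wikidata statements in a later step. Keeping cleaning separate from loading means the staged data can be inspected before anything is written to the knowledge base.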