One of the dreams of every scientist and data analyst is to harness machines to integrate diverse datasets in an easy and continuous fashion. This would allow them to aggregate many data sources into a wider picture that ultimately makes much more sense. Unfortunately, such aggregation is an extremely effort-consuming process. I spent most of my Ph.D. working on these needs, and I came up with Semalytics.
Semalytics is a framework that aims to address those integration issues and provide smart analytical features. The platform stores local data as RDF triples and can map them to Wikidata, the Wikimedia Foundation's crowdsourced semantic project, or to other knowledge bases. It is designed for rapid data integration and clean analytics.
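To make the mapping idea concrete, here is a minimal sketch of how a local RDF node can be linked to a Wikidata entity via `owl:sameAs`. All URIs, the predicate names, and the Wikidata ID below are hypothetical placeholders, not Semalytics' actual vocabulary:

```python
# Minimal sketch: local RDF triples plus a mapping to Wikidata via owl:sameAs.
# Every identifier here is an illustrative placeholder.
LOCAL = "http://example.org/lab/"
WIKIDATA = "http://www.wikidata.org/entity/"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

triples = [
    # A local fact: sample S1 carries a KRAS mutation (hypothetical predicate).
    (LOCAL + "sample/S1", LOCAL + "hasMutation", LOCAL + "gene/KRAS"),
    # The mapping: the local KRAS node is declared equivalent to a Wikidata
    # entity (Q12345 is a placeholder, not the real item ID).
    (LOCAL + "gene/KRAS", OWL_SAME_AS, WIKIDATA + "Q12345"),
]

def wikidata_equivalents(triples, local_uri):
    """Return the Wikidata URIs linked to a local node via owl:sameAs."""
    return [o for s, p, o in triples
            if s == local_uri and p == OWL_SAME_AS and o.startswith(WIKIDATA)]
```

Once such mappings exist, any query touching the local node can also pull in whatever the external knowledge base knows about the equivalent entity.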
Facts & figures
I designed and implemented the Semalytics data framework, exploiting Semantic Web and Linked Data technologies as built-in mapping logic. Under the hood, the platform leverages GraphDB and stores data and relationships as a graph. Moreover, entailment rulesets enable reasoning over the data, inferring new information logically drawn from explicit facts. SPARQL federation makes it possible to distribute queries among different data endpoints at query runtime, thus exploring interconnected pieces of knowledge.
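The entailment idea can be illustrated with a toy forward-chaining rule, in the spirit of (but much simpler than) GraphDB's rulesets. The sketch below makes a hypothetical `ex:derivedFrom` predicate transitive, so provenance implicit in a chain of derivations becomes explicit; the predicate and node names are illustrative only:

```python
# Toy rule-based entailment: if A derivedFrom B and B derivedFrom C,
# infer A derivedFrom C. Forward-chain until no new triples appear.
DERIVED_FROM = "ex:derivedFrom"

def entail_transitive(triples, predicate=DERIVED_FROM):
    """Return the input facts closed under the transitivity rule."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        for s1, p1, o1 in list(facts):
            if p1 != predicate:
                continue
            for s2, p2, o2 in list(facts):
                if p2 == predicate and s2 == o1:
                    inferred = (s1, predicate, o2)
                    if inferred not in facts:
                        facts.add(inferred)
                        changed = True
    return facts

# Two explicit facts about a (hypothetical) derivation chain.
explicit = {
    ("ex:xenograft1", DERIVED_FROM, "ex:biopsy1"),
    ("ex:biopsy1", DERIVED_FROM, "ex:patient1"),
}
entailed = entail_transitive(explicit)
# The closure now also contains ("ex:xenograft1", DERIVED_FROM, "ex:patient1").
```

Real entailment regimes (RDFS, OWL profiles) work on the same principle but with a fixed catalogue of rules applied by the triple store itself.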
This system has been developed in collaboration with the Translational Cancer Medicine unit of the Candiolo Cancer Institute, and it is used to analyze the hierarchical data produced by cascading experiments in pre-clinical cancer research.
A big thank you to Andrea Bertotti M.D. Ph.D., and to Alessandro Fiori Ph.D. for their precious support.
- A Docker-based demo
- Semalytics in action! A computational narrative
- My Ph.D. thesis
- The paper about Semalytics
- jsondesign: a Python library handling JSON Schema for mimicking entity design in an OO fashion