Introduction

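  • graph theory goes back to Leonhard Euler, whose 1736 solution to the Seven Bridges of Königsberg problem is regarded as the first result in the field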

There are different graph models to choose from:

  • directed vs undirected graphs (edges have direction or not)
  • weighted vs unweighted graphs (edges have weights or not)
  • cyclic vs acyclic graphs (graphs with cycles or not)
  • hypergraphs (edges can connect more than two nodes)
  • property graphs (nodes and edges can have properties)
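The directed and cyclic distinctions are easy to see in code. The Python sketch below uses only the standard library and made-up node names and weights. It stores a directed, weighted graph as an adjacency dictionary and uses a depth-first search to test whether the graph contains a cycle. An undirected graph could be stored by inserting every edge in both directions, but this particular cycle check is written for directed graphs only.

```python
def build_graph(edges):
    """Directed, weighted adjacency map: {source: {target: weight}}."""
    graph = {}
    for source, target, weight in edges:
        graph.setdefault(source, {})[target] = weight
        graph.setdefault(target, {})  # sink nodes still appear as keys
    return graph

def has_cycle(graph):
    """True if the directed graph contains a cycle (three-color DFS)."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GREY
        for neighbor in graph[node]:
            if color[neighbor] == GREY:  # back edge to a node on the current path
                return True
            if color[neighbor] == WHITE and visit(neighbor):
                return True
        color[node] = BLACK
        return False

    return any(color[node] == WHITE and visit(node) for node in graph)

dag = build_graph([("A", "B", 2.0), ("B", "C", 1.5), ("A", "C", 4.0)])
cyclic = build_graph([("A", "B", 1.0), ("B", "C", 1.0), ("C", "A", 1.0)])
print(has_cycle(dag))     # False
print(has_cycle(cyclic))  # True
```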

All of these models can be reduced to four basic primitives, which together let you build any graph structure:

  • nodes (entities)
  • labels (names that group nodes into categories, e.g. Person)
  • relationships (edges)
  • properties (attributes)

Properties are the defining feature of the property graph model, in which nodes and relationships carry key-value pairs. It is the most common model today.
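As a rough illustration of how the four primitives fit together, here is a minimal in-memory property graph in Python. It is not the API of any graph database. The class names, the `WORKS_AT` relationship type, and the sample data are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    id: str
    labels: set = field(default_factory=set)        # labels: names that classify the node
    properties: dict = field(default_factory=dict)  # properties: key-value attributes

@dataclass
class Relationship:
    source: str
    type: str                                       # relationship type, e.g. "WORKS_AT"
    target: str
    properties: dict = field(default_factory=dict)  # relationships can carry properties too

class PropertyGraph:
    def __init__(self):
        self.nodes = {}           # node id -> Node
        self.relationships = []   # directed edges between node ids

    def add_node(self, node_id, labels=(), **properties):
        node = Node(node_id, set(labels), dict(properties))
        self.nodes[node_id] = node
        return node

    def add_relationship(self, source, rel_type, target, **properties):
        # Both endpoints must already exist
        if source not in self.nodes or target not in self.nodes:
            raise KeyError(f"unknown node in ({source})-[{rel_type}]->({target})")
        rel = Relationship(source, rel_type, target, dict(properties))
        self.relationships.append(rel)
        return rel

    def neighbors(self, node_id, rel_type=None):
        """Targets of outgoing relationships, optionally filtered by type."""
        return [r.target for r in self.relationships
                if r.source == node_id and (rel_type is None or r.type == rel_type)]

g = PropertyGraph()
g.add_node("alice", labels=["Person"], name="Alice")
g.add_node("acme", labels=["Company"], name="Acme Corp")
g.add_relationship("alice", "WORKS_AT", "acme", since=2020)
print(g.neighbors("alice", "WORKS_AT"))  # ['acme']
```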

Motivation

  • Data needs to be understandable in order to have value
  • Providing contextualized understanding of data is a powerful way to make sense of it and to generate knowledge
  • Graphs represent complex relationships and networks intuitively, for both humans and machines
  • Graphs are very flexible: just add more nodes and edges as needed

Definition

  • Knowledge graphs are a specific type of graph that provides contextual understanding of data.
  • They provide a holistic view of data, capturing not only entities but also their relationships and attributes.
  • The data can come from anywhere: self-contained graph databases, data lakes, or federated systems.

Holistic means dealing with or treating the whole of something or someone rather than just a part. Federated means a group of entities that are joined together while each keeps its own autonomy.

Organizing principle

  • graphs that are functional but not understandable are not very useful
  • knowledge graphs are organized around semantics
  • semantics refers to the meaning of things, made explicit so that people and machines can understand them
  • semantics is what makes knowledge graphs different from regular graphs
  • data shouldn’t need a manual to be understood
  • organizing principles are contracts between the graph and its users
  • taxonomies and ontologies are high-level organizing principles
  • taxonomies are hierarchical classifications of concepts
  • taxonomies can be defined as Category nodes connected by SUBCLASS_OF relationships
  • ontologies describe richer relationships between concepts than a simple hierarchy
  • ontologies allow users and machines to take actions based on the data
  • several widely used, off-the-shelf ontologies exist for particular domains, such as SNOMED CT for clinical terminology
  • you can choose to build your own ontology or extend an existing one, depending on your domain and use case
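To illustrate why a taxonomy works as a contract, the sketch below stores hypothetical Category names and SUBCLASS_OF edges as Python pairs and follows them upward. A consumer can then infer that anything classified as a Dog is also an Animal. It uses plain Python in place of a graph database query, and the category names are made up.

```python
# Hypothetical taxonomy: (child, parent) pairs, i.e. (:Category)-[:SUBCLASS_OF]->(:Category)
SUBCLASS_OF = [
    ("Dog", "Mammal"),
    ("Cat", "Mammal"),
    ("Mammal", "Animal"),
    ("Sparrow", "Bird"),
    ("Bird", "Animal"),
]

def parents_of(edges):
    """Index the SUBCLASS_OF edges by child category."""
    parents = {}
    for child, parent in edges:
        parents.setdefault(child, set()).add(parent)
    return parents

def ancestors(category, parents):
    """Every category reachable by following SUBCLASS_OF edges upward."""
    seen, stack = set(), [category]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def is_a(category, other, parents):
    return category == other or other in ancestors(category, parents)

parents = parents_of(SUBCLASS_OF)
print(sorted(ancestors("Dog", parents)))   # ['Animal', 'Mammal']
print(is_a("Sparrow", "Mammal", parents))  # False
```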

Learning Path: Semantic Web

1. RDF Basics

  • Triple structure: Subject-Predicate-Object
  • URIs as identifiers: How resources are named
  • Literals vs Resources: Data values vs things
  • Serialization formats: Turtle (most readable), RDF/XML, JSON-LD
  • Exercise: Describe yourself and 3 relationships in Turtle
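Here is a minimal sketch of the triple structure in Python 3 with no RDF library. Resources and literals are separate types, and each triple is written out as one line of N-Triples. N-Triples is a line-based subset of Turtle, so the output is valid Turtle, just without prefixes. The `EX` namespace and its `age` property are hypothetical. The FOAF and XSD namespace URIs are the published ones. Blank nodes are left out.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class URI:
    """A resource: a thing identified by a URI (IRI)."""
    value: str

    def to_nt(self) -> str:
        return f"<{self.value}>"

@dataclass(frozen=True)
class Literal:
    """A data value, optionally with a datatype or a language tag."""
    value: str
    datatype: Optional[str] = None
    lang: Optional[str] = None

    def to_nt(self) -> str:
        # Escape characters that would break a quoted N-Triples string
        escaped = (self.value.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
        if self.lang:
            return f'"{escaped}"@{self.lang}'
        if self.datatype:
            return f'"{escaped}"^^<{self.datatype}>'
        return f'"{escaped}"'

def to_ntriples(triples) -> str:
    lines = []
    for s, p, o in triples:
        # Subjects and predicates must be resources; only objects may be literals
        # (blank nodes are not handled in this sketch)
        if not isinstance(s, URI) or not isinstance(p, URI):
            raise TypeError("subject and predicate must be URIs")
        lines.append(f"{s.to_nt()} {p.to_nt()} {o.to_nt()} .")
    return "\n".join(lines)

EX = "http://example.org/"            # hypothetical namespace for your own data
FOAF = "http://xmlns.com/foaf/0.1/"
XSD = "http://www.w3.org/2001/XMLSchema#"

me = [
    (URI(EX + "alice"), URI(FOAF + "name"), Literal("Alice")),
    (URI(EX + "alice"), URI(FOAF + "knows"), URI(EX + "bob")),
    (URI(EX + "alice"), URI(EX + "age"), Literal("30", datatype=XSD + "integer")),
]
print(to_ntriples(me))
```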

2. Namespaces & Vocabularies

  • Prefixes: Shorthand for long URIs (rdf:, rdfs:, skos:)
  • Common vocabularies: Dublin Core, FOAF, Schema.org
  • Exercise: Create your own namespace and vocabulary
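Prefixes are purely a shorthand: a prefixed name expands to the namespace URI followed by the local name. The sketch below shows expansion, the reverse compaction, and generating Turtle `@prefix` declarations. The `ex:` namespace is hypothetical and stands in for your own vocabulary. The other namespace URIs are the published ones. The function names are illustrative and do not come from any library.

```python
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "dcterms": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "ex": "http://example.org/vocab#",   # hypothetical: your own namespace
}

def expand(curie, prefixes=PREFIXES):
    """Turn a prefixed name like 'foaf:name' into a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {curie!r}")
    return prefixes[prefix] + local

def compact(uri, prefixes=PREFIXES):
    """Inverse of expand: use the longest matching namespace, else return the URI unchanged."""
    best = max((p for p in prefixes if uri.startswith(prefixes[p])),
               key=lambda p: len(prefixes[p]), default=None)
    return uri if best is None else f"{best}:{uri[len(prefixes[best]):]}"

def turtle_prefix_header(prefixes=PREFIXES):
    """The @prefix lines that start a Turtle document."""
    return "\n".join(f"@prefix {p}: <{ns}> ." for p, ns in prefixes.items())

print(expand("foaf:name"))  # http://xmlns.com/foaf/0.1/name
print(compact("http://www.w3.org/2004/02/skos/core#prefLabel"))  # skos:prefLabel
print(turtle_prefix_header())
```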

3. RDFS Essentials

  • Classes: rdfs:Class, rdfs:subClassOf
  • Properties: rdf:Property, rdfs:domain, rdfs:range
  • Labels: rdfs:label, rdfs:comment
  • Exercise: Model a simple domain (e.g., library with books, authors, genres)
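RDFS classes and properties become useful once something draws conclusions from them. The Python sketch below applies four RDFS entailment rules by forward chaining until no new triples appear: rdfs2 for domain, rdfs3 for range, and rdfs9 and rdfs11 for subclasses, using the RDF 1.1 Semantics numbering. It runs on a hypothetical version of the library exercise. It covers only a small subset of RDFS entailment, is written for clarity rather than speed, and uses plain prefixed strings instead of full URIs.

```python
RDF_TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"
DOMAIN, RANGE = "rdfs:domain", "rdfs:range"

def rdfs_closure(triples):
    """Apply a few RDFS entailment rules until a fixpoint is reached."""
    graph = set(triples)
    while True:
        new = set()
        for s, p, o in graph:
            for s2, p2, o2 in graph:
                # rdfs11: subClassOf is transitive
                if p == SUBCLASS and p2 == SUBCLASS and o == s2:
                    new.add((s, SUBCLASS, o2))
                # rdfs9: an instance of a subclass is an instance of the superclass
                if p == RDF_TYPE and p2 == SUBCLASS and o == s2:
                    new.add((s, RDF_TYPE, o2))
                # rdfs2: the subject of a property gets the property's domain as a type
                if p2 == DOMAIN and p == s2:
                    new.add((s, RDF_TYPE, o2))
                # rdfs3: the object of a property gets the property's range as a type
                if p2 == RANGE and p == s2:
                    new.add((o, RDF_TYPE, o2))
        if new <= graph:  # nothing new was derived
            return graph
        graph |= new

library = {
    ("ex:Novel", SUBCLASS, "ex:Book"),
    ("ex:Book", SUBCLASS, "ex:CreativeWork"),
    ("ex:writtenBy", DOMAIN, "ex:Book"),
    ("ex:writtenBy", RANGE, "ex:Author"),
    ("ex:dune", RDF_TYPE, "ex:Novel"),
    ("ex:dune", "ex:writtenBy", "ex:herbert"),
}
inferred = rdfs_closure(library) - library
for triple in sorted(inferred):
    print(triple)
```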

4. SKOS for Organization

  • Concepts: skos:Concept, skos:ConceptScheme
  • Labels: skos:prefLabel, skos:altLabel
  • Relationships: skos:broader, skos:narrower, skos:related
  • Exercise: Build a 3-level taxonomy with multilingual labels
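Here is a sketch of the SKOS exercise in plain Python. It assumes a hypothetical `ex:` concept scheme with German and Spanish labels alongside English. Only skos:broader is stored. skos:narrower is derived as its inverse, and label lookup falls back to English when a language is missing. For simplicity each concept has a single broader concept, although SKOS allows several.

```python
# Hypothetical concept scheme: each concept maps to its skos:broader concept
BROADER = {
    "ex:Dog": "ex:Mammal",
    "ex:Cat": "ex:Mammal",
    "ex:Mammal": "ex:Animal",
}
# skos:prefLabel per language tag (at most one preferred label per language)
PREF_LABEL = {
    "ex:Animal": {"en": "Animal", "de": "Tier", "es": "Animal"},
    "ex:Mammal": {"en": "Mammal", "de": "Säugetier", "es": "Mamífero"},
    "ex:Dog": {"en": "Dog", "de": "Hund", "es": "Perro"},
    "ex:Cat": {"en": "Cat", "de": "Katze", "es": "Gato"},
}

def narrower_index(broader):
    """skos:narrower is the inverse of skos:broader, so it can be derived."""
    narrower = {}
    for child, parent in broader.items():
        narrower.setdefault(parent, []).append(child)
    return narrower

def label(concept, lang, fallback="en"):
    """Preferred label in `lang`, else in `fallback`, else the concept id itself."""
    labels = PREF_LABEL.get(concept, {})
    return labels.get(lang) or labels.get(fallback) or concept

def render_tree(concept, narrower, lang, depth=0):
    """Indented lines for the concept and everything below it."""
    lines = ["  " * depth + label(concept, lang)]
    for child in sorted(narrower.get(concept, [])):
        lines.extend(render_tree(child, narrower, lang, depth + 1))
    return lines

narrower = narrower_index(BROADER)
print("\n".join(render_tree("ex:Animal", narrower, "de")))
```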

5. OWL (Optional, Advanced)

  • Only if needed: Property characteristics, restrictions, inference
  • Start simple: Equivalence, disjointness
  • Exercise: Add logical constraints to your RDFS model
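Disjointness is what turns an OWL model into something the data can violate. The sketch below is a deliberately naive check and not an OWL reasoner. It treats owl:equivalentClass as mutual subclassing, collects every class each individual belongs to, and reports individuals that end up in two classes declared disjoint. All class and individual names are hypothetical.

```python
def superclasses(cls, edges):
    """cls itself plus every class reachable through subclass edges."""
    result, stack = {cls}, [cls]
    while stack:
        for parent in edges.get(stack.pop(), ()):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result

def disjointness_violations(types, subclass_of, equivalent, disjoint):
    # owl:equivalentClass: each class is treated as a subclass of the other
    edges = {cls: set(parents) for cls, parents in subclass_of.items()}
    for a, b in equivalent:
        edges.setdefault(a, set()).add(b)
        edges.setdefault(b, set()).add(a)
    violations = []
    for individual, declared in types.items():
        all_types = set().union(*(superclasses(c, edges) for c in declared))
        # owl:disjointWith: no individual may belong to both classes
        for a, b in disjoint:
            if a in all_types and b in all_types:
                violations.append((individual, a, b))
    return violations

subclass_of = {"ex:Novel": {"ex:Book"}}
equivalent = [("ex:Book", "ex:Volume")]
disjoint = [("ex:Book", "ex:Person")]
types = {
    "ex:dune": {"ex:Novel"},
    "ex:herbert": {"ex:Person"},
    "ex:oops": {"ex:Volume", "ex:Person"},  # deliberately inconsistent
}
print(disjointness_violations(types, subclass_of, equivalent, disjoint))
# [('ex:oops', 'ex:Book', 'ex:Person')]
```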

Tools to Try

  • Protégé: Visual ontology editor
  • RDF Playground: Online Turtle editor
  • SPARQL: Query language (learn after understanding triples)
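Before reaching for SPARQL, it can help to see its core idea. A query is a set of triple patterns with variables, and the answer is every variable binding that makes all patterns match at once (a basic graph pattern). The matcher below is a toy illustration of that idea in Python, not a SPARQL engine. It assumes variables are strings starting with `?` and that no data value starts with `?`. The data and the roughly equivalent SPARQL query in the comment are hypothetical.

```python
def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    result = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if p_term in result and result[p_term] != t_term:
                return None  # variable already bound to something else
            result[p_term] = t_term
        elif p_term != t_term:
            return None  # constant term does not match
    return result

def query(patterns, triples):
    """All bindings that satisfy every pattern, joining on shared variables."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for b in bindings:
            for t in triples:
                extended = match_pattern(pattern, t, b)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings

triples = [
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:carol", "foaf:name", "Carol"),
    ("ex:bob", "foaf:name", "Bob"),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:carol", "foaf:knows", "ex:bob"),
]
# Roughly: SELECT ?person ?name WHERE { ?person foaf:knows ex:bob . ?person foaf:name ?name }
results = query([("?person", "foaf:knows", "ex:bob"), ("?person", "foaf:name", "?name")], triples)
for row in sorted(results, key=lambda r: r["?name"]):
    print(row["?person"], row["?name"])
```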