Knowledge Graphs

Introduction

Graph theory dates back to Euler and his 1736 solution of the Seven Bridges of Königsberg problem.

There are different graph models to choose from:

directed vs undirected graphs (edges have direction or not)
weighted vs unweighted graphs (edges have weights or not)
cyclic vs acyclic graphs (graphs with cycles or not)
hypergraphs (edges can connect more than two nodes)
property graphs (nodes and edges can have properties)

They can all be boiled down to four basic primitives that allow you to build any graph structure:

nodes (entities)
labels (names)
relationships (edges)
properties (attributes)

The last one, properties, can be seen in the property graph model, where nodes and edges can have key-value pairs associated with them. It is the most common model today.
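The four primitives can be sketched in a few lines of plain Python. This is only an illustration of the property graph idea, not how any particular graph database stores data; the class names, the `neighbours` helper, and the Alice/Acme data are all made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An entity with labels and key-value properties."""
    node_id: str
    labels: set[str] = field(default_factory=set)
    properties: dict[str, object] = field(default_factory=dict)


@dataclass
class Relationship:
    """A directed, typed edge that can carry its own properties."""
    start: str  # node_id of the source node
    end: str    # node_id of the target node
    rel_type: str
    properties: dict[str, object] = field(default_factory=dict)


# Hypothetical example data, not taken from any real dataset.
alice = Node("n1", {"Person"}, {"name": "Alice"})
acme = Node("n2", {"Company"}, {"name": "Acme"})
works_at = Relationship("n1", "n2", "WORKS_AT", {"since": 2020})

nodes = {n.node_id: n for n in (alice, acme)}
relationships = [works_at]


def neighbours(node_id: str, rel_type: str) -> list[Node]:
    """Return nodes reachable from node_id over outgoing edges of rel_type."""
    return [
        nodes[r.end]
        for r in relationships
        if r.start == node_id and r.rel_type == rel_type
    ]


employers = [n.properties["name"] for n in neighbours("n1", "WORKS_AT")]
print(employers)
```

Note that the edge itself carries a property (`since`), which is what separates a property graph from a plain labeled graph.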

Motivation

Data needs to be understandable in order to have value
a contextualized understanding of data is a powerful tool for making sense of it and generating knowledge
graphs represent complex relationships and networks intuitively, for both humans and machines
graphs are very flexible: just add more nodes and edges as needed

Definition

Knowledge graphs are a specific type of graph that provides contextual understanding.
They provide a holistic view of data, capturing not only entities but also their relationships and attributes.
Data can come from anywhere: self-contained graph databases, data lakes, or federated systems.

Holistic means dealing with or treating the whole of something or someone and not just a part. Federated means a group of entities that are joined together but still maintain their own autonomy.

Organizing principle

graphs that are functional but not understandable are not very useful
knowledge graphs are organized around semantics
semantics is the name for the meaning of things, so people and machines can understand them
semantics is what makes knowledge graphs different from regular graphs
data shouldn’t need a manual to be understood
organizing principles are contracts between the graph and its users
taxonomies and ontologies are high level organizing principles
taxonomies are hierarchical classifications of concepts
taxonomies can be defined as Category nodes connected by SUBCLASS_OF relationships
ontologies describe more complex relationships between concepts
ontologies allow users and machines to take actions based on the data
there are several off-the-shelf widely used ontologies for particular domains, such as SNOMED CT
you can choose to build your own ontology or extend an existing one, depending on your domain and use case
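As a rough illustration of a taxonomy built from Category nodes and SUBCLASS_OF relationships, the sketch below walks up the hierarchy from one category. The category names are invented, and the code assumes each category has at most one parent, which real taxonomies do not always guarantee.

```python
# Hypothetical taxonomy: each entry is (child, "SUBCLASS_OF", parent).
edges = [
    ("Laptop", "SUBCLASS_OF", "Computer"),
    ("Computer", "SUBCLASS_OF", "Electronics"),
    ("Phone", "SUBCLASS_OF", "Electronics"),
]

# Assumption: a single parent per category, so a dict is enough.
parent_of = {child: parent for child, rel, parent in edges if rel == "SUBCLASS_OF"}


def ancestors(category: str) -> list[str]:
    """Walk SUBCLASS_OF edges upward, nearest ancestor first."""
    result = []
    seen = {category}
    while category in parent_of:
        category = parent_of[category]
        if category in seen:  # guard against an accidental cycle
            break
        seen.add(category)
        result.append(category)
    return result


laptop_ancestors = ancestors("Laptop")
print(laptop_ancestors)
```

This is the kind of "contract" an organizing principle gives you: anyone who knows the SUBCLASS_OF convention can answer "is a Laptop a kind of Electronics?" without extra documentation.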

Learning Path: Semantic Web

1. RDF Basics

Triple structure: Subject-Predicate-Object
URIs as identifiers: How resources are named
Literals vs Resources: Data values vs things
Serialization formats: Turtle (most readable), RDF/XML, JSON-LD
Exercise: Describe yourself and 3 relationships in Turtle
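To make the triple structure concrete, here is a small sketch that stores triples as Python tuples, matches them against a pattern, and prints them in N-Triples style. It deliberately avoids any RDF library; the `match` and `to_ntriples` helpers and the example.org resources are assumptions for the example, while `foaf:name` and `foaf:knows` are real FOAF terms.

```python
from typing import Optional

# Hypothetical example.org resources. Literals are kept in double quotes
# so they can be told apart from URIs in this simplified model.
EX = "http://example.org/"
FOAF = "http://xmlns.com/foaf/0.1/"

Triple = tuple[str, str, str]

triples: list[Triple] = [
    (EX + "alice", FOAF + "name", '"Alice"'),
    (EX + "alice", FOAF + "knows", EX + "bob"),
    (EX + "bob", FOAF + "name", '"Bob"'),
]


def match(
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> list[Triple]:
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def to_ntriples(triple: Triple) -> str:
    """Render one triple as an N-Triples style line."""
    s, p, o = triple
    obj = o if o.startswith('"') else f"<{o}>"
    return f"<{s}> <{p}> {obj} ."


for t in match(s=EX + "alice"):
    print(to_ntriples(t))
```

The wildcard matching here is the same idea SPARQL triple patterns build on, which is why it helps to be comfortable with triples before learning the query language.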

2. Namespaces & Vocabularies

Prefixes: Shorthand for long URIs (rdf:, rdfs:, skos:)
Common vocabularies: Dublin Core, FOAF, Schema.org
Exercise: Create your own namespace and vocabulary
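Prefixes are pure string shorthand, which a short sketch can show. The `rdf`, `rdfs`, `skos`, and `foaf` namespace URIs below are the standard ones; the `ex` namespace and the `expand`/`shorten` helper names are hypothetical.

```python
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "ex": "http://example.org/vocab#",  # hypothetical custom namespace
}


def expand(prefixed_name: str) -> str:
    """Turn a prefixed name such as 'skos:prefLabel' into a full URI."""
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in PREFIXES:
        raise KeyError(f"unknown prefix in {prefixed_name!r}")
    return PREFIXES[prefix] + local


def shorten(uri: str) -> str:
    """Turn a full URI back into a prefixed name when a namespace matches."""
    for prefix, namespace in PREFIXES.items():
        if uri.startswith(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
    return uri  # no known namespace, leave the URI untouched


print(expand("skos:prefLabel"))
print(shorten("http://xmlns.com/foaf/0.1/knows"))
```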

3. RDFS Essentials

Classes: rdfs:Class, rdfs:subClassOf
Properties: rdf:Property, rdfs:domain, rdfs:range
Labels: rdfs:label, rdfs:comment
Exercise: Model a simple domain (e.g., library with books, authors, genres)
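What `rdfs:subClassOf` buys you is inference: an instance of a subclass is also an instance of every superclass. The sketch below applies only that one rule until nothing new appears. It is a toy, not a full RDFS reasoner, and the `ex:` library terms and the `infer_types` name are invented for the example.

```python
# Hypothetical library domain, written with prefixed names for brevity.
triples = {
    ("ex:Novel", "rdfs:subClassOf", "ex:Book"),
    ("ex:Book", "rdfs:subClassOf", "ex:Work"),
    ("ex:dune", "rdf:type", "ex:Novel"),
}


def infer_types(triples: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Add rdf:type triples implied by rdfs:subClassOf, up to a fixed point."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        subclass = [(s, o) for s, p, o in inferred if p == "rdfs:subClassOf"]
        typed = [(s, o) for s, p, o in inferred if p == "rdf:type"]
        for instance, cls in typed:
            for sub, sup in subclass:
                new = (instance, "rdf:type", sup)
                if cls == sub and new not in inferred:
                    inferred.add(new)
                    changed = True
    return inferred


closure = infer_types(triples)
new_facts = sorted(closure - triples)
print(new_facts)
```

Only one resource was typed by hand, yet the schema lets a machine conclude that `ex:dune` is also an `ex:Book` and an `ex:Work`.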

4. SKOS for Organization

Concepts: skos:Concept, skos:ConceptScheme
Labels: skos:prefLabel, skos:altLabel
Relationships: skos:broader, skos:narrower, skos:related
Exercise: Build a 3-level taxonomy with multilingual labels
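A three-level SKOS-style taxonomy with multilingual labels might look roughly like this in plain Python. The dictionary layout, the `ex:` concepts, and the helper names are assumptions for illustration; only the ideas of preferred labels per language, alternative labels, and broader/narrower links come from SKOS.

```python
# Hypothetical concept scheme; "broader" points one level up, None marks the top.
concepts = {
    "ex:animals": {
        "prefLabel": {"en": "Animals", "es": "Animales"},
        "broader": None,
    },
    "ex:mammals": {
        "prefLabel": {"en": "Mammals", "es": "Mamíferos"},
        "broader": "ex:animals",
    },
    "ex:dogs": {
        "prefLabel": {"en": "Dogs", "es": "Perros"},
        "altLabel": {"en": ["Canines"]},
        "broader": "ex:mammals",
    },
}


def label(concept_id: str, lang: str, fallback: str = "en") -> str:
    """Preferred label in the requested language, else the fallback language."""
    labels = concepts[concept_id]["prefLabel"]
    return labels.get(lang, labels[fallback])


def narrower(concept_id: str) -> list[str]:
    """Derive narrower concepts as the inverse of the broader links."""
    return sorted(c for c, d in concepts.items() if d["broader"] == concept_id)


def breadcrumb(concept_id: str, lang: str) -> str:
    """Labels from the top concept down to concept_id."""
    path = []
    current = concept_id
    while current is not None:
        path.append(label(current, lang))
        current = concepts[current]["broader"]
    return " > ".join(reversed(path))


print(breadcrumb("ex:dogs", "es"))
```

Storing only `broader` and deriving `narrower` keeps the two directions from drifting apart, which mirrors how the two SKOS properties are inverses of each other.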

5. OWL (Optional Advanced)

Only if needed: Property characteristics, restrictions, inference
Start simple: Equivalence, disjointness
Exercise: Add logical constraints to your RDFS model
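Disjointness is a good first constraint to experiment with: if two classes are declared disjoint, no individual should belong to both. The following is a minimal consistency check under that single rule. A real OWL reasoner does far more, and the `ex:` data and the `disjointness_violations` name are hypothetical.

```python
# Hypothetical data; ex:oops is deliberately typed with two disjoint classes.
triples = {
    ("ex:Book", "owl:disjointWith", "ex:Author"),
    ("ex:dune", "rdf:type", "ex:Book"),
    ("ex:herbert", "rdf:type", "ex:Author"),
    ("ex:oops", "rdf:type", "ex:Book"),
    ("ex:oops", "rdf:type", "ex:Author"),
}


def disjointness_violations(triples: set[tuple[str, str, str]]) -> list[str]:
    """Return individuals that are members of two classes declared disjoint."""
    # Collect the asserted classes of every individual.
    types: dict[str, set[str]] = {}
    for s, p, o in triples:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)

    violations = set()
    for class_a, p, class_b in triples:
        if p != "owl:disjointWith":
            continue
        for individual, classes in types.items():
            if class_a in classes and class_b in classes:
                violations.add(individual)
    return sorted(violations)


violations = disjointness_violations(triples)
print(violations)
```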

Tools to Try

Protégé: Visual ontology editor
RDF Playground: Online Turtle editor
SPARQL: Query language (learn after understanding triples)
