Introduction
- graph theory goes back to Euler and his 1736 solution of the Königsberg bridge problem
There are different graph models to choose from:
- directed vs undirected graphs (edges have direction or not)
- weighted vs unweighted graphs (edges have weights or not)
- cyclic vs acyclic graphs (graphs with cycles or not)
- hypergraphs (edges can connect more than two nodes)
- property graphs (nodes and edges can have properties)
They can all be boiled down to four basic primitives that allow you to build any graph structure:
- nodes (entities)
- labels (names)
- relationships (edges)
- properties (attributes)
The last one is the defining feature of the property graph model, where nodes and edges can have key-value pairs associated with them. It is the most common model today.
Motivation
- Data needs to be understandable in order to have value
- providing a contextualized understanding of data is a powerful way to make sense of it and generate knowledge
- graphs represent complex relationships and networks very intuitively, for both humans and machines
- graphs are very flexible: just add more nodes and edges as needed
Definition
- Knowledge graphs are a specific type of graph that provides contextual understanding.
- They provide a holistic view of data, capturing not only entities but also their relationships and attributes.
- Data can come from anywhere: self-contained graph databases, data lakes, or federated systems.
Holistic means dealing with or treating the whole of something and not just a part. Federated describes a group of entities that are joined together but still maintain their own autonomy.
Organizing principle
- graphs that are functional but not understandable are not very useful
- knowledge graphs are organized around semantics
- semantics is the name for the meaning of things, so people and machines can understand them
- semantics is what makes knowledge graphs different from regular graphs
- data shouldn’t need a manual to be understood
- organizing principles are contracts between the graph and its users
- taxonomies and ontologies are high-level organizing principles
- taxonomies are hierarchical classifications of concepts
- taxonomies can be defined as Category nodes connected by SUBCLASS_OF relationships
- ontologies describe more complex relationships between concepts
- ontologies allow users and machines to take actions based on the data
- there are several off-the-shelf widely used ontologies for particular domains, such as SNOMED CT
- you can choose to build your own ontology or extend an existing one, depending on your domain and use case
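A taxonomy built from Category nodes and SUBCLASS_OF relationships can be sketched with a simple child-to-parent mapping. The category names below are hypothetical, and the sketch assumes each category has at most one parent, which real taxonomies do not always guarantee.

```python
# Hypothetical taxonomy: each (child, parent) entry stands for
# (child:Category)-[:SUBCLASS_OF]->(parent:Category).
SUBCLASS_OF = {
    "Laptop": "Computer",
    "Computer": "Electronics",
    "Electronics": "Product",
}


def ancestors(category: str) -> list[str]:
    """Walk SUBCLASS_OF edges upward and return every broader category."""
    path = []
    seen = {category}
    while category in SUBCLASS_OF:
        category = SUBCLASS_OF[category]
        if category in seen:  # guard against an accidental cycle
            break
        seen.add(category)
        path.append(category)
    return path


def is_a(category: str, candidate: str) -> bool:
    """True if `category` equals `candidate` or is one of its subclasses."""
    return category == candidate or candidate in ancestors(category)
```

This is the sense in which the organizing principle acts as a contract: a user or program can ask whether a Laptop is a Product without knowing how the data was entered.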
Learning Path: Semantic Web
1. RDF Basics
- Triple structure: Subject-Predicate-Object
- URIs as identifiers: How resources are named
- Literals vs Resources: Data values vs things
- Serialization formats: Turtle (most readable), RDF/XML, JSON-LD
- Exercise: Describe yourself and 3 relationships in Turtle
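To make the triple structure concrete, here is a small sketch in plain Python with no RDF library. The `http://example.org/` namespace and the `Triple`, `Literal`, `match`, and `to_ntriples` names are assumptions for illustration; the FOAF namespace and its `name` and `knows` terms are real.

```python
from typing import NamedTuple


class Literal(str):
    """A data value, as opposed to a resource named by a URI."""


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: object  # a URI string (resource) or a Literal (data value)


EX = "http://example.org/"  # hypothetical namespace
FOAF = "http://xmlns.com/foaf/0.1/"

triples = [
    Triple(EX + "alice", FOAF + "name", Literal("Alice")),
    Triple(EX + "alice", FOAF + "knows", EX + "bob"),
]


def match(s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t.subject == s)
            and (p is None or t.predicate == p)
            and (o is None or t.obj == o)]


def to_ntriples(t: Triple) -> str:
    """Serialize one triple as an N-Triples line (also valid Turtle).

    Simplified: special characters inside literals are not escaped.
    """
    obj = f'"{t.obj}"' if isinstance(t.obj, Literal) else f"<{t.obj}>"
    return f"<{t.subject}> <{t.predicate}> {obj} ."
```

Calling `match(s=EX + "alice")` returns both statements about Alice. The same wildcard idea underlies SPARQL triple patterns.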
2. Namespaces & Vocabularies
- Prefixes: Shorthand for long URIs (rdf:, rdfs:, skos:)
- Common vocabularies: Dublin Core, FOAF, Schema.org
- Exercise: Create your own namespace and vocabulary
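Prefix handling is just string substitution, which the following sketch illustrates. The `rdf`, `rdfs`, and `skos` namespace URIs are the standard W3C ones; the `ex` namespace and the `expand`/`shorten` helper names are hypothetical.

```python
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "ex": "http://example.org/vocab#",  # hypothetical custom namespace
}


def expand(prefixed_name: str) -> str:
    """Turn a prefixed name such as 'skos:prefLabel' into a full URI."""
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown prefix in {prefixed_name!r}")
    return PREFIXES[prefix] + local


def shorten(uri: str) -> str:
    """Turn a full URI back into a prefixed name when a prefix matches."""
    for prefix, namespace in PREFIXES.items():
        if uri.startswith(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
    return uri  # no known namespace: leave the URI as it is
```

Creating your own vocabulary then amounts to choosing a namespace URI you control and minting terms under it, for example `ex:Book`.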
3. RDFS Essentials
- Classes: rdfs:Class, rdfs:subClassOf
- Properties: rdf:Property, rdfs:domain, rdfs:range
- Labels: rdfs:label, rdfs:comment
- Exercise: Model a simple domain (e.g., library with books, authors, genres)
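What RDFS adds over bare triples is inference: subclass, domain, and range statements let a machine derive facts nobody wrote down. The sketch below applies three of the RDFS entailment rules to a tiny library model. The `ex:` terms are hypothetical, prefixed names are kept as plain strings for brevity, and the nested loop is a naive fixed-point iteration rather than how a real reasoner works.

```python
# Hypothetical library model; prefixed names are kept as plain strings.
triples = {
    ("ex:Novel", "rdfs:subClassOf", "ex:Book"),
    ("ex:Book", "rdfs:subClassOf", "ex:Work"),
    ("ex:writtenBy", "rdfs:domain", "ex:Book"),
    ("ex:writtenBy", "rdfs:range", "ex:Author"),
    ("ex:dune", "rdf:type", "ex:Novel"),
    ("ex:dune", "ex:writtenBy", "ex:herbert"),
}


def infer(graph):
    """Apply three RDFS rules repeatedly until nothing new is derived."""
    graph = set(graph)
    while True:
        new = set()
        for s, p, o in graph:
            for s2, p2, o2 in graph:
                # Subclass: an instance of C is also an instance of C's superclass.
                if p == "rdf:type" and p2 == "rdfs:subClassOf" and o == s2:
                    new.add((s, "rdf:type", o2))
                # Domain: using a property implies the type of the subject.
                if p2 == "rdfs:domain" and p == s2:
                    new.add((s, "rdf:type", o2))
                # Range: using a property implies the type of the object.
                if p2 == "rdfs:range" and p == s2:
                    new.add((o, "rdf:type", o2))
        if new <= graph:  # fixed point reached
            return graph
        graph |= new
```

After `infer(triples)`, the graph also states that `ex:dune` is an `ex:Work` and that `ex:herbert` is an `ex:Author`, although neither triple was asserted.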
4. SKOS for Organization
- Concepts: skos:Concept, skos:ConceptScheme
- Labels: skos:prefLabel, skos:altLabel
- Relationships: skos:broader, skos:narrower, skos:related
- Exercise: Build a 3-level taxonomy with multilingual labels
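The SKOS ideas above (preferred and alternative labels per language, broader/narrower links) can be sketched with a dictionary. The concept identifiers, labels, and helper names are hypothetical, and the dictionary keys only mirror the SKOS property names rather than being real RDF.

```python
# Hypothetical 3-level concept scheme with English and Spanish labels.
concepts = {
    "animal": {"prefLabel": {"en": "Animal", "es": "Animal"}, "broader": None},
    "mammal": {"prefLabel": {"en": "Mammal", "es": "Mamífero"}, "broader": "animal"},
    "dog": {
        "prefLabel": {"en": "Dog", "es": "Perro"},
        "altLabel": {"en": ["Hound"]},
        "broader": "mammal",
    },
}


def narrower(concept_id: str) -> list[str]:
    """skos:narrower is the inverse of skos:broader, so derive it."""
    return [cid for cid, c in concepts.items() if c["broader"] == concept_id]


def label(concept_id: str, lang: str, fallback: str = "en") -> str:
    """Preferred label in `lang`, falling back to another language."""
    labels = concepts[concept_id]["prefLabel"]
    return labels.get(lang, labels[fallback])


def breadcrumb(concept_id: str, lang: str) -> list[str]:
    """Labels from the top concept down to `concept_id`."""
    trail = []
    current = concept_id
    while current is not None:
        trail.append(label(current, lang))
        current = concepts[current]["broader"]
    return trail[::-1]
```

Storing only `broader` and deriving `narrower` avoids the two directions drifting out of sync, a design choice for this sketch rather than something SKOS mandates.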
5. OWL (Optional Advanced)
- Only if needed: Property characteristics, restrictions, inference
- Start simple: Equivalence, disjointness
- Exercise: Add logical constraints to your RDFS model
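As a taste of what a logical constraint buys you, here is a rough sketch of a disjointness check in the spirit of owl:disjointWith: no individual may belong to two classes declared disjoint. The `ex:` names and the `violations` helper are hypothetical, and a real OWL reasoner does far more than this membership test.

```python
# Hypothetical constraint: nothing can be both a Book and an Author.
disjoint_pairs = {frozenset({"ex:Book", "ex:Author"})}

# Asserted (or previously inferred) classes per individual.
types = {
    "ex:dune": {"ex:Book"},
    "ex:herbert": {"ex:Author"},
    "ex:oops": {"ex:Book", "ex:Author"},  # deliberately inconsistent
}


def violations(types, disjoint_pairs):
    """Individuals that are members of both classes in a disjoint pair."""
    return sorted(
        individual
        for individual, classes in types.items()
        if any(pair <= classes for pair in disjoint_pairs)
    )
```

Here `violations(types, disjoint_pairs)` flags only `ex:oops`, the kind of modelling error that plain RDFS cannot detect.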
Tools to Try
- Protégé: Visual ontology editor
- RDF Playground: Online Turtle editor
- SPARQL: Query language (learn after understanding triples)