Introduction

  • graph theory originated with Euler's 1736 solution to the Seven Bridges of Königsberg problem

There are different graph models to choose from:

  • directed vs undirected graphs (edges have direction or not)
  • weighted vs unweighted graphs (edges have weights or not)
  • cyclic vs acyclic graphs (graphs with cycles or not)
  • hypergraphs (edges can connect more than two nodes)
  • property graphs (nodes and edges can have properties)
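
The first three distinctions in the list above can be made concrete with a small adjacency-map sketch. This is only an illustration: the function names `build_adjacency` and `has_cycle` and the sample edges are assumptions made for this example, not part of any particular graph library.

```python
def build_adjacency(edges, directed=True):
    """Build a weighted adjacency map from (source, target, weight) triples.

    For an undirected graph every edge is stored in both directions.
    An unweighted graph can be modelled by giving every edge weight 1.
    """
    adjacency = {}
    for source, target, weight in edges:
        adjacency.setdefault(source, {})[target] = weight
        if not directed:
            adjacency.setdefault(target, {})[source] = weight
    return adjacency


def has_cycle(adjacency):
    """Return True if a DIRECTED graph contains a cycle (depth-first search).

    Not meaningful for the undirected maps built above, where every edge
    trivially leads back to its origin.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state = {}

    def visit(node):
        state[node] = GREY
        for neighbour in adjacency.get(node, {}):
            colour = state.get(neighbour, WHITE)
            if colour == GREY:
                return True  # reached a node that is still on the current path
            if colour == WHITE and visit(neighbour):
                return True
        state[node] = BLACK
        return False

    return any(state.get(n, WHITE) == WHITE and visit(n) for n in list(adjacency))


# Hypothetical sample data: two weighted edges.
edges = [("a", "b", 1.0), ("b", "c", 2.5)]
directed = build_adjacency(edges, directed=True)
undirected = build_adjacency(edges, directed=False)
cyclic = build_adjacency(edges + [("c", "a", 0.5)], directed=True)
```

Here `directed` only records the edge from `a` to `b`, while `undirected` also records the reverse direction, and `has_cycle` separates the acyclic `directed` graph from `cyclic`.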

They can all be boiled down to four basic primitives that allow you to build any graph structure:

  • nodes (entities)
  • labels (names)
  • relationships (edges)
  • properties (attributes)

The last one is what characterizes the property graph model, where nodes and edges can have key-value pairs associated with them. It is the most common model today.
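
As a rough illustration of how the four primitives fit together, here is a minimal in-memory sketch. The class names (`Node`, `Relationship`, `PropertyGraph`), their methods, and the sample data are all hypothetical and chosen for this example; real graph databases expose their own APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int                                     # node (entity)
    labels: set = field(default_factory=set)         # labels (names)
    properties: dict = field(default_factory=dict)   # properties (attributes)


@dataclass
class Relationship:
    rel_type: str                                    # relationship (edge) type
    start: int                                       # id of the source node
    end: int                                         # id of the target node
    properties: dict = field(default_factory=dict)   # edges carry properties too


@dataclass
class PropertyGraph:
    nodes: dict = field(default_factory=dict)
    relationships: list = field(default_factory=list)

    def add_node(self, node_id, labels, **properties):
        node = Node(node_id, set(labels), dict(properties))
        self.nodes[node_id] = node
        return node

    def add_relationship(self, rel_type, start, end, **properties):
        if start not in self.nodes or end not in self.nodes:
            raise KeyError("both endpoints must exist before they are connected")
        rel = Relationship(rel_type, start, end, dict(properties))
        self.relationships.append(rel)
        return rel

    def neighbors(self, node_id, rel_type=None):
        """Nodes reachable over outgoing relationships, optionally filtered by type."""
        return [
            self.nodes[r.end]
            for r in self.relationships
            if r.start == node_id and (rel_type is None or r.rel_type == rel_type)
        ]


# Hypothetical sample data.
graph = PropertyGraph()
graph.add_node(1, ["Person"], name="Alice")
graph.add_node(2, ["Company"], name="Acme")
graph.add_relationship("WORKS_AT", 1, 2, since=2020)
employers = [n.properties["name"] for n in graph.neighbors(1, "WORKS_AT")]
```

In this sketch the key-value pairs live directly on both nodes and relationships, which is the feature that distinguishes the property graph model from a plain node-and-edge graph.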

Motivation

  • Data needs to be understandable in order to have value
  • providing a contextualized understanding of data is a powerful way to make sense of it and generate knowledge
  • graphs allow complex relationships and networks to be represented very intuitively, for both humans and machines
  • graphs are very flexible: just add more nodes and edges as needed

Definition

  • Knowledge graphs are a specific type of graph that provide contextual understanding.
  • They provide a holistic view of data, capturing not only entities but also their relationships and attributes.
  • Data can come from anywhere: self-contained graph databases, data lakes, or federated systems.

Holistic means dealing with or treating the whole of something or someone and not just a part. Federated means made up of a group of entities that are joined together but still maintain their own autonomy.

Organizing principle

  • graphs that are functional but not understandable are not very useful
  • knowledge graphs are organized around semantics
  • semantics is the meaning attached to things, which lets people and machines understand them
  • semantics is what makes knowledge graphs different from regular graphs
  • data shouldn’t need a manual to be understood
  • organizing principles are contracts between the graph and its users
  • taxonomies and ontologies are high level organizing principles
  • taxonomies are hierarchical classifications of concepts
  • taxonomies can be defined as Category nodes connected by SUBCLASS_OF relationships
  • ontologies capture more complex relationships between concepts than taxonomies do
  • ontologies allow users and machines to take actions based on the data
  • there are several widely used, off-the-shelf ontologies for particular domains, such as SNOMED CT
  • you can choose to build your own ontology or extend an existing one, depending on your domain and use case
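
The taxonomy structure mentioned above (Category nodes connected by SUBCLASS_OF relationships) can be sketched as a simple child-to-parent map. The category names and the helper functions `ancestors` and `is_subclass_of` are assumptions made for this example, and the sketch assumes each category has at most one parent.

```python
# Each entry reads: (child Category) -[SUBCLASS_OF]-> (parent Category).
# Hypothetical categories, for illustration only.
SUBCLASS_OF = {
    "Laptop": "Computer",
    "Desktop": "Computer",
    "Computer": "Electronics",
    "Phone": "Electronics",
}


def ancestors(category):
    """Follow SUBCLASS_OF relationships upwards and return every broader category."""
    chain = []
    seen = {category}
    while category in SUBCLASS_OF:
        category = SUBCLASS_OF[category]
        if category in seen:
            raise ValueError("taxonomy contains a cycle")
        seen.add(category)
        chain.append(category)
    return chain


def is_subclass_of(category, candidate):
    """True if `candidate` is a direct or indirect parent of `category`."""
    return candidate in ancestors(category)


laptop_chain = ancestors("Laptop")  # ["Computer", "Electronics"]
```

Because the hierarchy is explicit, a query for everything classified under `Electronics` can include laptops without anyone having tagged them that way, which is the kind of contract an organizing principle provides.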