Knowledge Graphs

What should AI Know?

8. How to Evolve a Knowledge Graph?

1. Introduction

Once a knowledge graph is built, it needs to evolve over time to reflect changes in the real world and updates in business requirements. These changes can happen at two levels: the schema (the structure of the graph) or individual facts (the data stored in the graph). Updating facts is usually easier than updating the schema, because schema changes can affect a large portion of the data and sometimes even the software that relies on the schema.

Evolving a knowledge graph involves both social and technical challenges. Social challenges arise because design decisions often involve subjectivity and require agreement from multiple stakeholders. Changes can impact the work of different stakeholders, so proper workflows are needed to roll out updates smoothly. Unfortunately, there are few standard guidelines for managing these social processes.

The technical problems involved in evolving a knowledge graph have been fundamental to database and knowledge base management, and have been researched under the topics of schema evolution, view maintenance, and truth maintenance. Each of these techniques is meant for a different category of updates, and is backed by significant theory that can be adapted for the context of knowledge graphs.

Which approach to use often depends on business priorities and the cost of making changes. In large-scale knowledge graphs, many inconsistencies may persist simply because they do not affect critical business functionality.

In this chapter, we will start by looking at concrete examples of changes in knowledge graphs. We will then introduce schema evolution, view maintenance, and truth maintenance, focusing on how these techniques are relevant to knowledge graphs and how they can be adapted in practice.

2. Examples of Changes to a Knowledge Graph

Knowledge graphs can change for many reasons. Here, we consider five common categories: changes in the world, changes in requirements, changes in data sources, changes that affect previous inferences, and changes that require redesign. This classification helps structure our examples, though other approaches are possible.

Changes in the world: Consider Amazon's product knowledge graph. New products are constantly added by suppliers, some products are discontinued, and inventory levels must be updated as items are sold or restocked. Some new products may even require introducing new properties, such as "battery life" for electronics.

Changes in requirements: In the Google Knowledge Graph, the concept of Artist was originally defined to include only Person. Over time, new data showed that virtual performers like Vocaloids should also be classified as artists. To accommodate users' expectations, the schema had to be updated.

Changes in sources: When building a music album knowledge graph, information may come from multiple sources. New sources are often added, and existing sources change over time. The knowledge graph must stay synchronized with these updates to maintain accuracy.

Changes affecting inference: A knowledge graph may infer that any program shown in a movie theater is a movie. But theaters now show sporting events or operas, which breaks this inference rule. Updating the relevant inference rules is necessary to avoid incorrect conclusions.

Changes requiring redesign: Initially, a knowledge graph may represent the CEO of a company as a relation pointing directly to a Person. Later, it may be redesigned so that the CEO relation points to an object containing additional information, such as the period during which the person held the role.
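
The CEO redesign can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed migration procedure: the triple layout, the company and person names, and the CeoTenure class with its start_year and end_year fields are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical triples in the original design: (subject, relation, object).
old_triples = [
    ("AcmeCorp", "ceo", "Jane Doe"),
    ("Globex", "ceo", "John Roe"),
]

@dataclass(frozen=True)
class CeoTenure:
    """Hypothetical intermediate object that the redesigned relation points to."""
    person: str
    start_year: Optional[int] = None  # unknown until filled in from a source
    end_year: Optional[int] = None    # unknown, or the tenure is still ongoing

def migrate_ceo_triples(triples):
    """Rewrite each direct ceo -> Person triple to point to a CeoTenure object."""
    migrated = []
    for subject, relation, obj in triples:
        if relation == "ceo":
            obj = CeoTenure(person=obj)
        migrated.append((subject, relation, obj))
    return migrated

new_triples = migrate_ceo_triples(old_triples)
```

After the migration every ceo value is an object with room for the period of the role. Any query or application code that expected a Person at the end of the ceo relation must be updated as well, which is why this kind of change counts as a redesign.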

3. Schema Evolution Techniques

Schema evolution in relational databases, often called database reorganization, handles adding or removing columns. In knowledge graphs, schema evolution is more complex. Common operations include:

Adding or removing a class from the class hierarchy
Adding or removing a superclass for an existing class
Adding or removing a type for an individual
Adding or removing a relation or property from a class

Complexity arises because changes can propagate through the hierarchy, affecting subclasses, instances, and inherited properties.

Removing or renaming a property: These changes must be propagated to all affected entities. Summaries of impacted areas are often generated to guide updates.
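
As a rough sketch of what propagating a rename might look like, the Python fragment below first summarizes which entity types are affected and then applies the rename everywhere. The entity layout and the property names (battery_life, battery_hours) are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical entities: each has a type and a dict of property values.
entities = {
    "phone1": {"type": "Phone", "props": {"battery_life": "20h", "color": "black"}},
    "laptop1": {"type": "Laptop", "props": {"battery_life": "10h"}},
    "book1": {"type": "Book", "props": {"pages": 300}},
}

def impact_summary(entities, prop):
    """Count, per entity type, how many entities carry the given property."""
    return Counter(e["type"] for e in entities.values() if prop in e["props"])

def rename_property(entities, old, new):
    """Rename a property on every entity that has it; return the number changed."""
    affected = [e for e in entities.values() if old in e["props"]]
    # Refuse to overwrite an existing property, and do so before changing anything.
    if any(new in e["props"] for e in affected):
        raise ValueError(f"property {new!r} already exists on an affected entity")
    for e in affected:
        e["props"][new] = e["props"].pop(old)
    return len(affected)

summary = impact_summary(entities, "battery_life")
changed = rename_property(entities, "battery_life", "battery_hours")
```

Producing the summary before the update gives stakeholders a chance to review the scope of the change before it is rolled out.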

Adding a new class: If a new class is added without specifying a superclass, it is typically assigned to a system-defined root class to avoid orphan classes.

Deleting a class: When a class is deleted from the hierarchy, or the superclass of a class is removed, we must decide what happens to its subclasses and instances. A subclass that has another superclass poses no problem. If the deleted class is the only superclass of a subclass, however, we must either delete that subclass along with its instances or assign it a new superclass. For a class A whose only superclass B is being deleted, the new superclass could be one or all of the immediate superclasses of B. Properties need similar care: any property defined on the deleted class must either be removed from every class and instance that inherited it, or be reattached to another class that remains in the knowledge graph after the deletion.
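
One possible policy for class deletion is sketched below in Python. It assumes a hierarchy stored as a mapping from each class to its direct superclasses, re-parents orphaned subclasses to all immediate superclasses of the deleted class, and moves a property up only when there is a single parent to receive it. The class and property names are hypothetical, and instance handling is left out to keep the sketch short.

```python
# Hypothetical hierarchy: each class maps to the set of its direct superclasses.
superclasses = {
    "Thing": set(),
    "Agent": {"Thing"},
    "Artist": {"Agent"},
    "Musician": {"Artist"},         # Artist is its only superclass
    "Singer": {"Artist", "Agent"},  # has another superclass besides Artist
}
# Hypothetical property definitions: property name -> class that defines it.
property_domain = {"stage_name": "Artist", "name": "Thing"}

def delete_class(superclasses, property_domain, doomed):
    """Delete a class, re-parenting orphaned subclasses and re-homing its properties."""
    parents = superclasses.pop(doomed)
    for supers in superclasses.values():
        if doomed in supers:
            supers.discard(doomed)
            if not supers:
                # The only superclass was deleted: attach to all of its immediate superclasses.
                supers.update(parents)
    for prop, domain in list(property_domain.items()):
        if domain == doomed:
            if len(parents) == 1:
                # Keep the property by moving it to the single remaining parent.
                property_domain[prop] = next(iter(parents))
            else:
                # No unambiguous new home: drop the property definition.
                del property_domain[prop]

delete_class(superclasses, property_domain, "Artist")
```

Moving a property to a parent class widens the set of classes that inherit it, so a real system would likely ask for confirmation rather than apply this policy silently.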

Transitive relationships: Consider a situation in which A is a superclass of B, and B is a superclass of C. If A is then asserted as a direct superclass of C, many systems reject the update as redundant, since the relationship is already implied by transitivity.
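
A redundancy check of this kind can be sketched as follows, again assuming a hierarchy stored as a mapping from each class to its direct superclasses. The function names are illustrative and do not come from any particular system.

```python
def ancestors(superclasses, cls):
    """Return every class reachable from cls by following superclass links."""
    seen = set()
    stack = list(superclasses.get(cls, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(superclasses.get(current, ()))
    return seen

def add_superclass(superclasses, cls, new_super):
    """Add a direct superclass link unless it is already asserted or implied."""
    if new_super in ancestors(superclasses, cls):
        return False  # redundant: already implied by transitivity
    superclasses.setdefault(cls, set()).add(new_super)
    return True

hierarchy = {"A": set(), "B": {"A"}, "C": {"B"}}
accepted = add_superclass(hierarchy, "C", "A")  # A is already an ancestor of C via B
```

Here the update is rejected and the hierarchy is left unchanged.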

Maintaining acyclicity: Adding new superclass relationships must not create cycles in the class hierarchy. Systems often detect and prevent such cycles.
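
A simple way to detect such cycles, assuming the same mapping from classes to direct superclasses, is to check whether the class is already an ancestor of its proposed superclass. The sketch below is one straightforward approach, not the algorithm of any specific system.

```python
def would_create_cycle(superclasses, cls, new_super):
    """Return True if making new_super a superclass of cls would close a loop."""
    if cls == new_super:
        return True
    # A cycle appears exactly when cls is already an ancestor of new_super.
    seen, stack = set(), [new_super]
    while stack:
        current = stack.pop()
        for parent in superclasses.get(current, ()):
            if parent == cls:
                return True
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return False

hierarchy = {"A": set(), "B": {"A"}, "C": {"B"}}
bad = would_create_cycle(hierarchy, "A", "C")   # C already descends from A
fine = would_create_cycle(hierarchy, "C", "A")  # redundant, but not cyclic
```

The check runs before the link is added, so a rejected update leaves the hierarchy untouched.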

Changing constraints on relations: Relaxing constraints usually does not affect existing data, but tightening constraints may invalidate some data, requiring repair actions.
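
To illustrate the asymmetry, the sketch below checks a maximum-cardinality constraint on a hypothetical ceo relation. The data and the function name are assumptions for the example; real systems support many other kinds of constraints.

```python
# Hypothetical data: company -> values currently stored for its ceo relation.
ceo_values = {
    "AcmeCorp": ["Jane Doe"],
    "Globex": ["John Roe", "Mary Major"],
    "Initech": [],
}

def violations(values_by_subject, max_cardinality):
    """Return the subjects whose value count exceeds the allowed maximum."""
    return {
        subject: values
        for subject, values in values_by_subject.items()
        if len(values) > max_cardinality
    }

relaxed = violations(ceo_values, max_cardinality=3)    # looser limit: nothing breaks
tightened = violations(ceo_values, max_cardinality=1)  # stricter limit: Globex needs repair
```

The list of violations is the input to whatever repair action is chosen, such as deleting a value, asking a curator, or deferring the constraint change.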

4. View Maintenance Techniques

In relational databases, a view is a named query that can be reused simply by referencing its name. The query is defined based on one or more base tables. If we choose to store the results of a view, it is called a materialized view. When the underlying base tables change, the materialized view must be updated to reflect these changes.

The simplest approach is to recompute the entire view from scratch, but there are more efficient methods called incremental view maintenance algorithms that update only the affected parts.

For example, consider a knowledge graph that contains information about employees and departments. We create a view that lists all employees in the "Sales" department. If an employee transfers to a different department, an incremental update can adjust the view without recomputing the entire list from scratch.
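
The Sales example can be sketched in Python as follows. The employee names and the transfer function are hypothetical; the point is only that the stored view is patched from the change itself rather than recomputed from the base data.

```python
# Hypothetical base data: employee -> department.
department_of = {"alice": "Sales", "bob": "Sales", "carol": "Engineering"}

def compute_view(department_of, dept):
    """Full recomputation: scan all base data."""
    return {emp for emp, d in department_of.items() if d == dept}

# Materialized view, computed once and then stored.
sales_view = compute_view(department_of, "Sales")

def transfer(department_of, view, view_dept, employee, new_dept):
    """Apply a transfer to the base data and patch the view using only the delta."""
    old_dept = department_of.get(employee)
    department_of[employee] = new_dept
    if old_dept == view_dept and new_dept != view_dept:
        view.discard(employee)  # employee left the viewed department
    elif new_dept == view_dept:
        view.add(employee)      # employee joined the viewed department

transfer(department_of, sales_view, "Sales", "bob", "Engineering")
transfer(department_of, sales_view, "Sales", "carol", "Sales")
```

Each transfer touches one entry of the view, whereas compute_view scans every employee. Comparing the patched view against a full recomputation is a useful sanity check when testing such code.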

In the context of knowledge graphs, the use of view computation and maintenance techniques is currently not very common. There are situations where data in the knowledge graph is processed, and the results are stored. For such situations, view definition and maintenance techniques are directly applicable. Leveraging view update methods for such situations is open for future work.

5. Truth Maintenance Techniques

Truth maintenance techniques were originally developed in rule-based systems to keep track of conclusions that are derived from existing facts and rules. When the underlying facts or rules change, the system can determine which derived conclusions must be updated or removed.

A common implementation of truth maintenance is the justification-based truth maintenance system (JTMS). Whenever the system derives a new conclusion, it records a justification explaining how that conclusion was obtained. This justification typically includes the facts and rules that were used in the derivation. Later, if a fact or a rule changes, the system can identify all conclusions that depended on it and update them accordingly.

For example, suppose a system derives the conclusion that "Alice is an employee" based on the fact "Alice works for Company X" and the rule "Anyone who works for a company is an employee." If that fact is later removed, the justification for the conclusion no longer holds, and the derived conclusion must be retracted.
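
The Alice example can be sketched with a small justification store in Python. This is a deliberately simplified illustration: the class and method names are hypothetical, rules are treated as premises alongside facts, and support is checked on demand rather than by propagating labels as a full truth maintenance system would.

```python
class JustificationStore:
    """Minimal sketch of justification tracking (hypothetical API)."""

    def __init__(self):
        self.base = set()         # asserted facts and rules
        self.justifications = {}  # conclusion -> list of premise sets

    def assert_base(self, statement):
        self.base.add(statement)

    def retract_base(self, statement):
        self.base.discard(statement)

    def derive(self, conclusion, premises):
        """Record that the conclusion follows from the given premises."""
        self.justifications.setdefault(conclusion, []).append(frozenset(premises))

    def holds(self, statement, _visiting=frozenset()):
        """A statement holds if asserted, or if some justification is fully supported."""
        if statement in self.base:
            return True
        if statement in _visiting:
            return False  # circular support does not count
        visiting = _visiting | {statement}
        return any(
            all(self.holds(p, visiting) for p in premises)
            for premises in self.justifications.get(statement, [])
        )

store = JustificationStore()
fact = "Alice works for Company X"
rule = "Anyone who works for a company is an employee"
store.assert_base(fact)
store.assert_base(rule)
store.derive("Alice is an employee", [fact, rule])

before = store.holds("Alice is an employee")
store.retract_base(fact)
after = store.holds("Alice is an employee")
```

Before the retraction the conclusion is supported; afterwards its only justification is missing a premise, so it no longer holds. Nothing has to be recomputed from scratch, because the stored justification identifies exactly what the conclusion depended on.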

In current knowledge graph practice, most inferences are computed by application code, and the justifications for these inferences are not explicitly stored. As a result, when data or rules change, inferred facts are often recomputed from scratch.

As knowledge graph systems continue to mature, explicitly tracking the sources and justifications of inferences is likely to become more important. Doing so would enable more efficient update mechanisms and support better management of evolving knowledge graphs.

6. Summary

Knowledge graphs are created to meet specific business needs and typically have a long life cycle. They must evolve in response to changes in the real world and shifting business requirements.

The evolution of a knowledge graph must take into account how it is used by its user community and should follow well-defined social and engineering processes. It can also benefit from established design principles and algorithms.

In this chapter, we showed how techniques from schema evolution, view maintenance, and truth maintenance provide useful foundations for managing change in knowledge graphs. While these techniques originated in database and rule-based systems, they are increasingly relevant to modern knowledge graph systems and are likely to play a central role in their future development.

7. Further Reading

Classic work on schema evolution techniques was done in the context of object-oriented database systems [Banerjee et al. 1987]. An overview of incremental view maintenance techniques was presented as an invited lecture at a recent conference on the Principles of Database Systems [Olteanu 2024]. The classic work on truth maintenance systems was done by Jon Doyle [Doyle 1978].

[Banerjee et al. 1987] Banerjee, Jay; Kim, Won; Kim, Hyoung Joo; and Korth, Henry F. 1987. Semantics and implementation of schema evolution in object-oriented databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD 87), pages 311-322. ACM. DOI: 10.1145/38713.38748.

[Olteanu 2024] Olteanu, Dan. 2024. Recent Increments in Incremental View Maintenance. arXiv:2404.17679. Invited Gems of PODS 2024 paper.

[Doyle 1978] Doyle, Jon. 1978. Truth Maintenance Systems for Problem Solving. MIT AI Lab Technical Report AI-TR-419, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology.

Exercises

Exercise 8.1. Assess the qualitative ease (i.e., easy, moderately involved, highly involved) of making the following changes in a knowledge graph.

(a) Incorporating the release of a new iPhone into the Amazon product graph

(b) Incorporating the effect of Brexit

(c) Launch of a new vendor for distributing face masks

(d) Repurposing a hotel as a hospital for COVID patients

(e) Changing the Wikipedia knowledge graph to model corporate mergers and acquisitions

Exercise 8.2. For each of the following changes, which of the knowledge graph change management techniques would be most directly applicable? (Recall that the change management techniques are schema evolution, view maintenance, and truth maintenance.)

(a) Eliminating a product category

(b) Wikidata integrates data from different museums, city governments, and bibliographic databases

(c) Evolving the Microsoft Academic publications knowledge graph

(d) Updates in the schema mapping rules

(e) Splitting a category into two different categories

Exercise 8.3. Consider the following scenarios for the company knowledge graph that you created in the previous sections. For each scenario, assess how the knowledge graph would need to evolve.

(a) You need to evaluate a company in terms of how prepared it is to leverage and benefit from AI technologies.

(b) You need to assess how a company would be affected by a newly announced government tariff policy.

(c) New issues are revealed in a recent earnings call transcript.

For each scenario, identify whether the required changes involve updates to the schema, the data, the application code, or some combination of these.

Exercise 8.4. A common challenge in evolving a textbook knowledge graph is keeping it aligned with new editions of the textbook. For the textbook knowledge graph you created in the previous chapter, download two successive versions of the textbook you have been using. First, analyze the changes between the two versions of the textbook. Next, assess how these changes will affect the knowledge graph, and design the steps necessary to align the knowledge graph with the new version.