1. Introduction
Once a knowledge graph is built, it needs to evolve over time to
reflect changes in the real world and updates in business
requirements. These changes can happen at two levels: the schema (the
structure of the graph) or individual facts (the data stored in the
graph). Updating facts is usually easier than updating the schema,
because schema changes can affect a large portion of the data and
sometimes even the software that relies on the schema.
Evolving a knowledge graph involves both social and technical
challenges. Social challenges arise because design decisions often
involve subjectivity and require agreement from multiple
stakeholders. Changes can impact the work of different stakeholders,
so proper workflows are needed to roll out updates
smoothly. Unfortunately, there are few standard guidelines for
managing these social processes.
The technical
problems involved in evolving a knowledge graph have been fundamental
to database and knowledge base management, and have been researched
under the topics of schema evolution, view maintenance, and truth
maintenance. Each of these techniques is meant for a different
category of updates, and is backed by significant theory that can be
adapted for the context of knowledge graphs.
Which approach to use often depends on business priorities and the
cost of making changes. In large-scale knowledge graphs, many
inconsistencies may persist simply because they do not affect critical
business functionality.
In this chapter, we will start by looking at concrete examples of
changes in knowledge graphs. We will then introduce schema evolution,
view maintenance, and truth maintenance, focusing on how these
techniques are relevant to knowledge graphs and how they can be
adapted in practice.
2. Examples of Changes to a Knowledge Graph
Knowledge graphs can change for many reasons. Here, we consider
five common categories: changes in the world, changes in requirements,
changes in data sources, changes that affect previous inferences, and
changes that require redesign. This classification helps structure our
examples, though other approaches are possible.
Changes in the world: Consider Amazon's product knowledge
graph. New products are constantly added by suppliers, some products
are discontinued, and inventory levels must be updated as items are
sold or restocked. Some new products may even require introducing new
properties, such as "battery life" for electronics.
Changes in requirements: In the Google Knowledge Graph, the
concept of Artist was originally defined to include
only Person. Over time, new data showed that virtual performers
like Vocaloids should also be classified as artists. To
accommodate users' expectations, the schema had to be
updated.
Changes in sources: When building a music album knowledge
graph, information may come from multiple sources. New sources are
often added, and existing sources change over time. The knowledge
graph must stay synchronized with these updates to maintain
accuracy.
Changes affecting inference: A knowledge graph may infer
that any program shown in a movie theater is a movie. But theaters now
show sporting events or operas, which breaks this inference
rule. Updating the relevant inference rules is necessary to avoid
incorrect conclusions.
Changes requiring redesign: Initially, a knowledge graph may
represent the CEO of a company as a relation pointing directly
to a Person. Later, it may be redesigned so that the CEO
relation points to an object containing additional information, such
as the period during which the person held the role.
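The CEO redesign amounts to a data migration. Below is a minimal sketch that assumes the graph is held as a Python set of (subject, predicate, object) triples. The predicate names `ceo`, `ceoRole`, and `person`, the sample company and person, and the naming scheme for role nodes are all hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

def reify_ceo(triples: Set[Triple]) -> Set[Triple]:
    """Replace each direct (company, 'ceo', person) edge with an
    intermediate role node that can carry extra attributes."""
    migrated: Set[Triple] = set()
    for s, p, o in triples:
        if p == "ceo":
            # Hypothetical naming scheme for the new role node.
            role = f"{s}_ceoRole_{o}"
            migrated.add((s, "ceoRole", role))
            migrated.add((role, "person", o))
            # The old edge has no dates, so the role's period is left unset.
        else:
            migrated.add((s, p, o))
    return migrated

graph = {("AcmeCorp", "ceo", "Jane"), ("Jane", "type", "Person")}
new_graph = reify_ceo(graph)
```

The old `ceo` edges carry no dates, so the period of each role has to be filled in by a later update. Any application code that reads `ceo` directly would also have to change at the same time. This is one reason schema changes are harder than fact updates.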
3. Schema Evolution Techniques
Schema evolution in relational databases, often called database
reorganization, handles changes such as adding or removing columns. In
knowledge graphs, schema evolution is more complex. Common operations
include:
- Adding or removing a class from the class hierarchy
- Adding or removing a superclass for an existing class
- Adding or removing a type for an individual
- Adding or removing a relation or property from a class
Complexity arises because changes can propagate through the
hierarchy, affecting subclasses, instances, and inherited
properties.
Removing or renaming a property: These changes must be
propagated to all affected entities. Summaries of impacted areas are
often generated to guide updates.
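The following sketch propagates a property rename and produces a summary of impacted areas. It assumes that triples are stored in a list and that a separate map records the classes of each entity. The property names `batteryHours` and `batteryLife` and the sample entities are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

def rename_property(
    triples: List[Triple], old: str, new: str, types: Dict[str, Set[str]]
) -> Tuple[List[Triple], Counter]:
    """Rename property `old` to `new` everywhere and return a summary that
    counts the rewritten triples under each class of their subject."""
    updated: List[Triple] = []
    impact: Counter = Counter()
    for s, p, o in triples:
        if p == old:
            updated.append((s, new, o))
            for cls in types.get(s, {"<untyped>"}):
                impact[cls] += 1
        else:
            updated.append((s, p, o))
    return updated, impact

triples = [("phone1", "batteryHours", "20"), ("phone2", "batteryHours", "18"),
           ("laptop1", "batteryHours", "10"), ("phone1", "brand", "Acme")]
types = {"phone1": {"Phone"}, "phone2": {"Phone"}, "laptop1": {"Laptop"}}
updated, impact = rename_property(triples, "batteryHours", "batteryLife", types)
```

Removing a property follows the same pattern: the matching triples are dropped instead of rewritten. The per-class counts show which parts of the graph, and therefore which applications built on them, need review.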
Adding a new class: If a new class is added without
specifying a superclass, it is typically assigned to a system-defined
root class to avoid orphan classes.
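Here is a minimal sketch of this default. It assumes the hierarchy is a dictionary that maps each class to its set of direct superclasses. It also assumes the root class is named `Thing`, which is a hypothetical choice because the actual root name is system-specific.

```python
from typing import Dict, Optional, Set

ROOT = "Thing"  # hypothetical name for the system-defined root class

def add_class(hierarchy: Dict[str, Set[str]], name: str,
              superclass: Optional[str] = None) -> None:
    """Add `name` to a hierarchy mapping each class to its direct superclasses,
    attaching it to ROOT when no superclass is given."""
    if name in hierarchy:
        raise ValueError(f"class {name!r} already exists")
    parent = superclass if superclass is not None else ROOT
    if parent not in hierarchy:
        raise ValueError(f"unknown superclass {parent!r}")
    hierarchy[name] = {parent}

h = {ROOT: set(), "Product": {ROOT}}
add_class(h, "Electronics", "Product")
add_class(h, "GiftCard")  # no superclass given, so it is attached to ROOT
```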
Deleting a class: When we delete a class from the class hierarchy, or
remove the superclass of a class, we must decide what to do with its
subclasses and its instances. If a subclass has another superclass,
this does not pose a problem. But if the class being deleted is the
only superclass of some class, we must either delete all such
subclasses and their instances, or assign them a new superclass. To
assign a new superclass to a class A whose only superclass B is being
deleted, the new superclass could be one or all of the immediate
superclasses of B. Furthermore, if any property is associated with the
class being deleted, we must either delete that property from all the
classes and instances that inherited it, or associate it with another
class that will remain in the knowledge graph after the class for
which the property was originally defined has been deleted.
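The following sketch implements one combination of the choices described above. It uses a dictionary for the hierarchy, plus hypothetical maps for the properties defined on each class and the asserted types of each individual. Orphaned subclasses adopt all immediate superclasses of the deleted class. The deleted class's properties are pushed down to its former direct subclasses so that they keep what they used to inherit. Cascading deletion would be the alternative.

```python
from typing import Dict, Set

def delete_class(
    hierarchy: Dict[str, Set[str]],   # class -> direct superclasses
    properties: Dict[str, Set[str]],  # class -> properties defined on it
    instances: Dict[str, Set[str]],   # individual -> asserted types
    b: str,
) -> None:
    """Delete class `b`, reattaching its dependents instead of cascading."""
    parents = hierarchy.pop(b)
    own_props = properties.pop(b, set())
    for cls, supers in hierarchy.items():
        if b in supers:
            supers.discard(b)
            if not supers:
                # b was the only superclass: adopt all of b's immediate superclasses.
                supers.update(parents)
            # Push b's properties down so cls keeps what it used to inherit.
            properties.setdefault(cls, set()).update(own_props)
    # Instances typed directly with b are retyped to b's superclasses.
    for types in instances.values():
        if b in types:
            types.discard(b)
            types.update(parents)

hierarchy = {"Thing": set(), "Product": {"Thing"}, "Wearable": {"Product"},
             "Electronics": {"Product"}, "Phone": {"Electronics"},
             "SmartWatch": {"Electronics", "Wearable"}}
properties = {"Product": {"price"}, "Electronics": {"batteryLife"}}
instances = {"pixel": {"Phone"}, "charger": {"Electronics"}}
delete_class(hierarchy, properties, instances, "Electronics")
```

Instances typed directly with the deleted class (here, `charger`) are retyped to its superclasses, so they lose the pushed-down properties. A real system would flag such cases for review rather than drop the properties silently.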
Transitive relationships: Consider a situation in
which A is a superclass of B, and B is a
superclass of C. If A is then asserted as a direct
superclass of C, many systems may reject the update, since this
relationship is already implied by transitivity.
Maintaining acyclicity: Adding new superclass relationships
must not create cycles in the class hierarchy. Systems often detect
and prevent such cycles.
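Both the transitivity check and the acyclicity check reduce to reachability in the class hierarchy. The minimal sketch below assumes the hierarchy is a dictionary that maps each class to its set of direct superclasses. The function rejects both cycle-creating and redundant assertions. Rejecting redundant edges is a policy choice that only some systems make.

```python
from typing import Dict, Set

def ancestors(hierarchy: Dict[str, Set[str]], cls: str) -> Set[str]:
    """All superclasses of `cls`, direct or inherited (depth-first search)."""
    seen: Set[str] = set()
    stack = list(hierarchy.get(cls, ()))
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(hierarchy.get(c, ()))
    return seen

def add_superclass(hierarchy: Dict[str, Set[str]], sub: str, sup: str) -> None:
    """Assert `sup` as a direct superclass of `sub`, rejecting unsafe updates."""
    # A cycle arises if `sub` is already an ancestor of `sup`.
    if sub == sup or sub in ancestors(hierarchy, sup):
        raise ValueError(f"making {sup} a superclass of {sub} would create a cycle")
    # The edge is redundant if `sup` is already reachable from `sub`.
    if sup in ancestors(hierarchy, sub):
        raise ValueError(f"{sup} is already implied as a superclass of {sub}")
    hierarchy.setdefault(sub, set()).add(sup)

h = {"A": set(), "B": {"A"}, "C": {"B"}}
add_superclass(h, "D", "B")  # accepted: no cycle and not already implied
```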
Changing constraints on relations: Relaxing constraints
usually does not affect existing data, but tightening constraints may
invalidate some data, requiring repair actions.
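Before a constraint is tightened, the data it would invalidate can be located. The following sketch handles one kind of tightening: lowering the maximum number of values a relation may have for each entity. It assumes triples are stored in a list. The relation `ceo` and the sample companies are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

def cardinality_violations(
    triples: List[Triple], prop: str, max_count: int
) -> Dict[str, Set[str]]:
    """Return the entities that have more than `max_count` distinct values
    for `prop`; these are the candidates for repair actions."""
    values: Dict[str, Set[str]] = defaultdict(set)
    for s, p, o in triples:
        if p == prop:
            values[s].add(o)
    return {s: vals for s, vals in values.items() if len(vals) > max_count}

triples = [("AcmeCorp", "ceo", "Jane"), ("AcmeCorp", "ceo", "Raj"),
           ("Globex", "ceo", "Lee")]
to_repair = cardinality_violations(triples, "ceo", 1)
```

Raising `max_count` is a relaxation, and it can only shrink the returned set. This matches the observation that relaxing a constraint does not affect existing data.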
4. View Maintenance Techniques
In relational databases, a view is a named query that can be
reused simply by referencing its name. The query is defined based on
one or more base tables. If we choose to store the results of a
view, it is called a materialized view. When the underlying
base tables change, the materialized view must be updated to reflect
these changes.
The simplest approach is to recompute the entire view from scratch,
but there are more efficient methods called incremental view
maintenance algorithms that update only the affected parts.
For example, consider a knowledge graph that contains information
about employees and departments. We create a view that lists all
employees in the "Sales" department. If an employee transfers to a
different department, an incremental update can adjust the view
without recomputing the entire list from scratch.
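The Sales example can be sketched as follows, assuming the base data is a set of hypothetical (employee, department) facts. `sales_view` recomputes the view from scratch. `apply_delta` instead patches the materialized view using only the inserted and deleted facts. This simple version is correct because each employee appears in the view through exactly one base fact, `(employee, "Sales")`. Views that combine several facts generally need extra bookkeeping, such as counting the derivations of each result.

```python
from typing import Iterable, Set, Tuple

Fact = Tuple[str, str]  # (employee, department)

def sales_view(works_in: Set[Fact]) -> Set[str]:
    """Full recomputation: every employee in the Sales department."""
    return {emp for emp, dept in works_in if dept == "Sales"}

def apply_delta(works_in: Set[Fact], view: Set[str],
                inserted: Iterable[Fact], deleted: Iterable[Fact]) -> None:
    """Update the base facts and patch the materialized view in place,
    looking only at the changed facts instead of rescanning works_in."""
    for emp, dept in deleted:
        works_in.discard((emp, dept))
        if dept == "Sales":
            view.discard(emp)
    for emp, dept in inserted:
        works_in.add((emp, dept))
        if dept == "Sales":
            view.add(emp)

works_in = {("ana", "Sales"), ("bo", "Sales"), ("cy", "Marketing")}
view = sales_view(works_in)
# bo transfers from Sales to Marketing: one deletion plus one insertion.
apply_delta(works_in, view, inserted=[("bo", "Marketing")],
            deleted=[("bo", "Sales")])
```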
In the context of knowledge graphs, view computation and maintenance
techniques are currently not widely used. There are, however,
situations in which data in the knowledge graph is processed and the
results are stored. For such situations, view definition and
maintenance techniques are directly applicable. Leveraging view
maintenance methods in these situations remains open for future work.
5. Truth Maintenance Techniques
Truth maintenance techniques were originally developed in rule-based
systems to keep track of conclusions that are derived from existing
facts and rules. When the underlying facts or rules change, the system
can determine which derived conclusions must be updated or removed.
A common implementation of truth maintenance is known as a
justification-based system. Whenever the system derives a new
conclusion, it records a justification explaining how that
conclusion was obtained. This justification typically includes the
facts and rules that were used in the derivation. Later, if a fact or a
rule changes, the system can identify all conclusions that depended on
it and update them accordingly.
For example, suppose a system derives the conclusion that
"Alice is an employee" based on the fact "Alice works for
Company X" and the rule "Anyone who works for a company is an
employee." If the fact is later removed, the justification
for the conclusion no longer holds, and the derived conclusion must be
retracted.
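The following toy justification-based system handles the Alice example. It is a sketch under simplifying assumptions: nodes are plain strings, and the rule is itself a node so that it appears in the justification. The inference step is assumed to happen elsewhere and to be reported through the hypothetical `justify` method. When a premise changes, only the nodes that transitively depend on it are relabeled from their recorded justifications.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Set

class JTMS:
    """Toy justification-based truth maintenance: each derived node records
    the sets of antecedents (facts and rules) that justify it."""

    def __init__(self) -> None:
        self.premises: Set[str] = set()
        self.justs: Dict[str, List[FrozenSet[str]]] = defaultdict(list)
        self.dependents: Dict[str, Set[str]] = defaultdict(set)
        self.believed: Set[str] = set()

    def add_premise(self, node: str) -> None:
        self.premises.add(node)
        self._relabel(node)

    def retract_premise(self, node: str) -> None:
        self.premises.discard(node)
        self._relabel(node)

    def justify(self, conclusion: str, antecedents: Iterable[str]) -> None:
        """Record that `conclusion` follows from all of `antecedents`."""
        j = frozenset(antecedents)
        self.justs[conclusion].append(j)
        for a in j:
            self.dependents[a].add(conclusion)
        self._relabel(conclusion)

    def _closure(self, start: str) -> Set[str]:
        """`start` plus every node that transitively depends on it."""
        seen: Set[str] = set()
        stack = [start]
        while stack:
            n = stack.pop()
            if n not in seen:
                seen.add(n)
                stack.extend(self.dependents[n])
        return seen

    def _relabel(self, start: str) -> None:
        # Withdraw belief in every affected node, then re-derive to a fixpoint
        # using only the recorded justifications.
        affected = self._closure(start)
        self.believed -= affected
        changed = True
        while changed:
            changed = False
            for n in affected - self.believed:
                if n in self.premises or any(j <= self.believed for j in self.justs[n]):
                    self.believed.add(n)
                    changed = True

tms = JTMS()
fact = "worksFor(Alice, CompanyX)"
rule = "rule: worksFor(p, c) -> employee(p)"
tms.add_premise(fact)
tms.add_premise(rule)
tms.justify("employee(Alice)", [fact, rule])
tms.retract_premise(fact)  # employee(Alice) is no longer believed
```

Retracting the rule instead of the fact would withdraw `employee(Alice)` in the same way. Marking all dependents as unbelieved before re-deriving them prevents a conclusion from staying believed when its only support is a circular chain of justifications.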
In current knowledge graph practice, most inferences are computed by
application code, and the justifications for these
inferences are not explicitly stored. As a result, when data or rules
change, inferred facts are often recomputed from scratch.
As knowledge graph systems continue to mature, explicitly tracking
the sources and justifications of inferences is likely to become more
important. Doing so would enable more efficient update mechanisms and
support better management of evolving knowledge graphs.
6. Summary
Knowledge graphs are created to meet specific business needs and
typically have a long life cycle. They must evolve in response to
changes in the real world and shifting business requirements.
The evolution of a knowledge graph must take into account how it is
used by its user community and should follow well-defined social and
engineering processes. It can also benefit from established design
principles and algorithms.
In this chapter, we showed how techniques from schema evolution,
view maintenance, and truth maintenance provide useful foundations for
managing change in knowledge graphs. While these techniques originated
in database and rule-based systems, they are increasingly relevant to
modern knowledge graph systems and are likely to play a central role
in their future development.
7. Further Reading
Classic work on schema evolution techniques was done in the context
of object-oriented database systems [Banerjee et al. 1987]. A recent
overview of incremental view maintenance techniques was presented as
an invited Gems of PODS paper at the 2024 Symposium on Principles of
Database Systems [Olteanu 2024]. The classic work on truth maintenance
systems was done by Jon Doyle [Doyle 1978].
[Banerjee et al. 1987] Banerjee, Jay; Kim, Won; Kim, Hyoung Joo; and
Korth, Henry F. 1987. Semantics and implementation of schema evolution
in object-oriented databases. In Proceedings of the ACM SIGMOD
International Conference on Management of Data (SIGMOD '87), pages
311-322. ACM. DOI: 10.1145/38713.38748.
[Olteanu 2024] Olteanu, Dan. 2024. Recent Increments in Incremental
View Maintenance. arXiv:2404.17679. Invited Gems of PODS 2024 paper.
[Doyle 1978] Doyle, Jon. 1978. Truth Maintenance Systems for Problem
Solving. MIT AI Lab Technical Report AI-TR-419, Department of
Electrical Engineering and Computer Science, Massachusetts Institute
of Technology.
Exercises
Exercise 8.1.
Assess qualitatively how easy it would be (i.e., easy, moderately
involved, or highly involved) to make each of the following changes in
a knowledge graph.
(a) Incorporating the release of a new iPhone into the Amazon product graph
(b) Incorporating the effect of Brexit
(c) Launch of a new vendor for distributing face masks
(d) Repurposing a hotel as a hospital for COVID patients
(e) Changing the Wikipedia knowledge graph to model corporate mergers and acquisitions
Exercise 8.2.
For each of the following changes, which knowledge graph change
management technique would be most directly applicable? (Recall that
the change management techniques are schema evolution, view
maintenance, and truth maintenance.)
(a) Eliminating a product category
(b) Wikidata integrates data from different museums, city governments, and bibliographic databases
(c) Evolving the Microsoft academic publications knowledge graph
(d) Updates in the schema mapping rules
(e) Splitting a category into two different categories
Exercise 8.3. Consider the following scenarios for the
company knowledge graph that you created in the previous sections.
For each scenario, assess how the knowledge graph would need to
evolve.
- You need to evaluate a company in terms of how prepared it is
to leverage and benefit from AI technologies.
- You need to assess how a company would be affected by a newly
announced government tariff policy.
- New issues are revealed in a recent earnings call transcript.
For each scenario, identify whether the required changes involve
updates to the schema, the data, the application code, or some
combination of these.
Exercise 8.4. A common challenge in evolving a textbook
knowledge graph is to keep it aligned with the new editions of the
textbook. For the textbook knowledge graph you have created in the
previous chapter, download two successive versions of the textbook
you have been using. First, analyze the changes between the two
versions of the textbook. Next, assess how these changes will affect
the knowledge graph, and design the steps necessary to align the
knowledge graph with the new version.