CS520 Knowledge Graphs: What Should AI Know?

7. How do Users Interact with a Knowledge Graph?


1. Introduction

In previous chapters, we have discussed the design of a knowledge graph, different methods for creating it, and techniques for doing inference with it. We will now turn our attention to how users interact with a knowledge graph. To some degree, the design of interaction begins with the design of a knowledge graph schema as the schema is meant to be easily understood. In fact, a key advantage of knowledge graphs is that the conceptual view of a domain expressed in the schema is also the basis for its implementation. In this chapter, we will consider interaction techniques once the knowledge graph schema has been populated with instances.

The purpose of a knowledge graph is to answer a user's questions. Some of the questions may be known upfront, while others the users may never think of themselves. The interaction of a user with the knowledge graph could be real-time, or a batch process might be run to answer certain predefined questions and to produce certain reports and analytics. This results in a matrix of four different modes of interaction with the knowledge graph along two dimensions: whether the interaction is initiated by the user (i.e., pull) or in response to information presented to the user (i.e., push), and whether the questions are known in advance or not.


The above modes of interaction are usually supported through a combination of search, query answering, and graphical interfaces. A search interface is like the interface of a search engine, where the user may simply type keywords. Query interfaces range from a formal query language to a natural language interface. A graphical interface may be used for composing a query, for viewing the results of a query, or for browsing the graph defined by the instances in the knowledge graph.

The actual interface to a knowledge graph will typically use a combination of methods. For example, a query might be composed through a combination of search and a structured query interface. Similarly, the results may be partly graphical and partly textual. In this chapter, we will consider graphical visualization of knowledge graphs, structured query interfaces, and natural language query interfaces. Formal query languages such as Cypher and SPARQL have already been covered in the previous chapters.

2. Visualization of a Knowledge Graph

It is all too common to see graphical visualizations of knowledge graphs containing thousands of nodes and edges on a screen. Many times, such a visualization is simply a backdrop for the points to be made, rather than something that drives and contributes to the insights that help us identify what points to make. Just because we are working with a knowledge graph does not automatically imply that a graphical visualization is the best way to interact with it. One should turn to the best principles for visualization design and choose the most effective medium for presenting the information. Consequently, we begin this section by summarizing the key principles for visualization design, and then outline a few best practices for graphically visualizing knowledge graphs.

2.1 General Principles for Visualization Design

The overall purpose of visualizing a knowledge graph is to come up with a representation of the data that significantly amplifies the end-user's understanding of it. The improvement in user understanding results from the effective use of the following elements. First, the visualization presents more information in a display than the user might be able to remember at one time. Second, it takes away from the user the burden of having to look for important pieces of information. Third, by placing relevant data next to each other, it enhances the ability to make comparisons. Fourth, it keeps track of the user's attention as they navigate the information. Fifth, it provides a more abstract view of a situation through omission and recoding of information. Finally, by letting the user interact with and manipulate the visualization, it helps the user deeply engage and immerse in the information.

A visualization is an adjustable mapping from the data to a visual form for a human perceiver. A knowledge graph is powerful because its schema can be visualized directly in the same form in which it is stored as data, i.e., without requiring any transformation. We should not assume that this approach always carries over when we are visualizing the data stored in the knowledge graph, because the stored data is much larger than the schema. We therefore need to explicitly design visual structures for best presenting knowledge graph data.

The design of a visual structure involves mapping the desired information into a combination of visual encodings: spatial substrate, marks, connection and enclosure, retinal properties, and temporal encoding. Choosing an appropriate spatial encoding for our information is the first and most important step. For example, if we want to display geographical data, its representation on a map is usually the most intuitive. For non-geographical data, we may choose suitable axes and coordinates along which to display the information. Marks are visible things that occur in space and help a user distinguish different data values (e.g., points, lines, areas, and volumes). Connections include graphs, trees, and other hierarchical organizations, and enclosing lines can be drawn around certain objects. Retinal properties include elements such as color, crispness, resolution, transparency, hue, and saturation, which can visually highlight certain aspects of the data. Finally, a temporal encoding, such as an animation, can enhance a visualization and help us see the dynamics of how the data change over a period of time.

We can group visualizations into four categories: simple, composed, interactive, and attentive/reactive. In a simple visualization, we show up to three different dimensions or variables, which is generally considered the limit for human understandability. In a composed visualization, we combine two or more simple visualizations so that we can capture more variables in the same display. In an interactive visualization, the user can selectively explore, expand, and navigate through the information. Finally, in an attentive/reactive visualization, the system responds to user actions and potentially anticipates the most useful things to display next.

2.2 Best Practices for Knowledge Graph Visualization

In the previous section, we reviewed some principles for designing visualizations and argued that we should never assume that displaying all the data in a graph in a cluttered display is the most useful presentation. In this section, we consider a recipe for the design of a knowledge graph visualization that leverages some of those principles.

The design of a visualization for a knowledge graph can be divided into the following steps. First, determine which variables of the problem domain to map into spatial positions in the visual structure. Second, combine these mappings to increase dimensionality. Third, use retinal properties as an overlay to add more dimensions. Fourth, add controls for user interaction so that selective navigation is possible. Finally, consider attentive/reactive features to expand space and manage attention.

A possible visualization for a knowledge graph would have the following elements: overview, dynamic queries, zooming in, details on demand, and retrieval by example. As the amount of data in a knowledge graph can be huge, we could begin by presenting an aggregated summary of the data. The user can then filter the data and focus on an area of interest by either posing queries or zooming into certain parts of the data. While viewing a segment of the knowledge graph, the user could ask for details on a particular element. Finally, as an attentive/reactive feature, the system could proactively suggest the kinds of data elements that already exist in the knowledge graph and offer the user the choice to retrieve them.
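
To make this recipe concrete, here is a minimal Python sketch of the overview, zoom, and details-on-demand steps, assuming the instance data is held in a networkx multigraph whose nodes carry a hypothetical "type" attribute. It is an illustration of the pattern, not a full visualization front end.

import networkx as nx
from collections import Counter

def overview(kg):
    """Aggregated summary: how many instances of each node type exist."""
    return Counter(data.get("type", "unknown") for _, data in kg.nodes(data=True))

def zoom(kg, node_type):
    """Filter to the subgraph induced by nodes of a chosen type."""
    nodes = [n for n, d in kg.nodes(data=True) if d.get("type") == node_type]
    return kg.subgraph(nodes)

def details(kg, node):
    """Details on demand: the attributes and outgoing edges of one node."""
    return kg.nodes[node], list(kg.edges(node, data=True))

kg = nx.MultiDiGraph()
kg.add_node("Oregon", type="State")
kg.add_node("Washington", type="State")
kg.add_node("Mount Hood", type="Mountain", height_m=3429)
kg.add_edge("Mount Hood", "Oregon", relation="locatedIn")
kg.add_edge("Oregon", "Washington", relation="borders")

print(overview(kg))              # Counter({'State': 2, 'Mountain': 1})
print(details(kg, "Mount Hood")) # attributes and edges of a single node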

3. Structured Query Interfaces

A structured query interface can be an important ingredient in interacting with a knowledge graph. In such an interface, the user starts typing expressions, with the system suggesting completions, in such a way that the resulting expression can be mapped into an underlying query language such as Cypher or SPARQL. To illustrate structured queries, consider the following snippet of a knowledge graph schema from Wikidata.

An Example Property Graph Schema

For the instance data corresponding to the above schema, we can pose the following queries.

city with largest area
top five cities by area
countries whose capitals have area at least 500 square kilometers
states bordering Oregon and Washington
second tallest mountain in France
country with the most number of rivers

One way to specify a structured query interface is to specify the rules of a grammar in Backus-Naur Form (BNF). The rules shown below illustrate this approach for the set of examples considered above.

<np> ::= <noun> "and" <noun>
<np> ::= <geographical-region> |
        <geographical-region> <spatial-relation> <geographical-region>
<geographical-region> ::= "capital of country" | "city" | "country" |
                         "mountain" | "river" | "state"
<property> ::= "area" | "height"
<aggregate-relator> ::= "with the most number of" | "with largest" | "with"
<aggregate-modifier> ::= "top" <number> | "second tallest"
<spatial-relation> ::= "bordering" | "inside"
<number-constraint> ::= "at least" <quantity>
<quantity> ::= <number> <unit>
<unit> ::= "square kilometer"
<ranking> ::= "by"

We can reduce the queries above to expressions that conform to the grammar. For example, the first query, "city with largest area", conforms to the following expression:

<geographical-region> <aggregate-relator> <property>

Next, consider the query "top five cities by area". The following expression captures this query, but it literally corresponds to "top five city by area". For our structured query interface to be faithful to the original English, we would need to incorporate lexical knowledge about plurals.

<aggregate-modifier> <geographical-region> <ranking> <property>

Finally, we show below the expression for the third query: "countries whose capitals have area at least 500 square kilometers". The expression shown below reads in English as "country with capital with area at least 500 square kilometers". In addition to the incorrect pluralization, it uses different words, for example, "with" instead of "whose", and "with" instead of "have". Most reasonable speakers of English would consider both queries to be equivalent. This example highlights the tradeoffs in designing structured query interfaces: they may not be faithful to all the different ways of posing a query in English, but they can handle a large number of practically useful cases. For example, the above grammar will correctly handle the query "state with largest area" and numerous other variations.

<geographical-region> <aggregate-relator> <geographical-region>
     <aggregate-relator> <property> <number-constraint>

Once we have developed a BNF grammar for a structured query interface that specifies the range of queries of interest, it is straightforward to check whether an input query is legal, and also to generate a set of legal queries which could be suggested to the user proactively to autocomplete what they have already typed. We show below a logic programming translation of the above BNF grammar.

np(X) :- np(Y) & np(Z) & append(X,Y,Z)
np(X) :- geographical_region(X)
np(A) :- geographical_region(X) & spatial_relation(Y) & geographical_region(Z) & append(A,X,Y,Z)
geographical_region(capital)
geographical_region(city)
geographical_region(country)
geographical_region(mountain)
geographical_region(river)
geographical_region(state)
property(area)
property(height)
aggregate_relator(with_the_most_number_of)
aggregate_relator(with_largest)
aggregate_relator(with)
aggregate_modifier(second_tallest)
aggregate_modifier(X) :- number(N) & append(X,top,N)
spatial_relation(bordering)
spatial_relation(inside)
number_constraint(X) :- quantity(Q) & append(X,atleast,Q)
quantity(Q) :- number(N) & unit(U) & append(Q,N,U)
unit(square_kilometer)
ranking(by)
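
To make the legality check and autocompletion concrete, the following Python sketch works over a small, non-recursive subset of the grammar above; the top-level <query> symbol and the particular choice of rules are simplifications introduced for illustration.

from itertools import product

GRAMMAR = {
    "<query>": [["<geographical-region>", "<aggregate-relator>", "<property>"],
                ["<geographical-region>", "<spatial-relation>", "<geographical-region>"]],
    "<geographical-region>": [["city"], ["country"], ["mountain"], ["river"], ["state"]],
    "<property>": [["area"], ["height"]],
    "<aggregate-relator>": [["with", "largest"], ["with", "the", "most", "number", "of"]],
    "<spatial-relation>": [["bordering"], ["inside"]],
}

def expand(symbol):
    """Return every token sequence derivable from a grammar symbol."""
    if symbol not in GRAMMAR:                      # a terminal word
        return [[symbol]]
    sentences = []
    for production in GRAMMAR[symbol]:
        # every way of expanding each symbol of the production, concatenated
        for parts in product(*(expand(s) for s in production)):
            sentences.append([tok for part in parts for tok in part])
    return sentences

LEGAL = [" ".join(tokens) for tokens in expand("<query>")]

def is_legal(query):
    return query in LEGAL

def completions(prefix):
    """Suggest full legal queries that extend what the user has typed so far."""
    return [q for q in LEGAL if q.startswith(prefix)]

print(is_legal("city with largest area"))   # True
print(completions("state border"))          # ['state bordering city', ...]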

Improving the range of structured queries accepted by the system requires improving the grammar. This approach can be very cost-effective in situations where the entities of interest and their desired relationships can be easily captured in a grammar. In the next section, we consider approaches that aim to accept queries in English, and that try to overcome the grammar engineering effort through a machine learning approach.

4. Natural Language Query Interfaces

A possible approach to go beyond structured queries, and to get around the problem of extensive grammar engineering, is to use a semantic parsing framework. In this framework, we begin with a minimal grammar and use it with a natural language parser that is trained to choose the most likely interpretation of a question. A semantic parsing system for understanding natural language questions has five components: an executor, a grammar, a model, a parser, and a learner. We briefly describe each of these components and then illustrate them using examples.

Unlike traditional natural language parsers that produce a parse tree, a semantic parsing system produces a representation that can be directly executed on a suitable platform. For example, it could produce a SPARQL or a Cypher query. The engine that evaluates the resulting query then becomes the executor for the semantic parsing system.

The grammar in a semantic parsing system specifies a set of rules for processing the input sentences. It connects utterances to possible derivations of logical forms. Formally, a grammar is a set of rules of the form α ⇒ β. We show below a snippet of a grammar used by a semantic parsing system.

NP(x) with the largest RelNP(r) ⇒ argmax(1,1,x,r)
top NP(n) NP(x) by the RelNP(r) ⇒ argmax(1,n,x,r)
NP(x) has RelNP(r) at least NP(n) ⇒ (>= arg(x,r) n)

The first rule in the above grammar can be used to process "city with the largest area". The right-hand side of each rule specifies an executable form. For the first rule, argmax(m,n,x,r) is a relation that consists of a ranked list of x (ranked between positions m and n) such that the ranking is defined on the r values of x. Similarly, the third rule handles the query "countries whose capital has area at least 15000 square kilometers". In the right-hand side of the rule, (>= arg(x,r) n) specifies the computation to determine those x whose r value is at least n.
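
The sketch below illustrates, in Python, how rules of this shape can be applied: each rule pairs a surface pattern with a template for the executable form. The regular-expression patterns and the string representation of the logical forms are simplifications assumed for this example; a real system would operate over grammar categories rather than raw words.

import re

# each rule: (surface pattern, template that builds the executable form)
RULES = [
    (r"^(?P<x>\w+) with the largest (?P<r>\w+)$",
     lambda m: f"argmax(1,1,{m['x']},{m['r']})"),
    (r"^top (?P<n>\w+) (?P<x>\w+) by the (?P<r>\w+)$",
     lambda m: f"argmax(1,{m['n']},{m['x']},{m['r']})"),
    (r"^(?P<x>\w+) has (?P<r>\w+) at least (?P<n>\w+)$",
     lambda m: f"(>= arg({m['x']},{m['r']}) {m['n']})"),
]

def interpret(utterance):
    """Return the executable form of every rule whose pattern matches."""
    forms = []
    for pattern, build in RULES:
        match = re.match(pattern, utterance)
        if match:
            forms.append(build(match))
    return forms

print(interpret("city with the largest area"))       # ['argmax(1,1,city,area)']
print(interpret("capital has area at least 15000"))  # ['(>= arg(capital,area) 15000)']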

Given an input question, the application of the grammar may result in multiple alternative interpretations. A model defines a probability distribution over those interpretations to specify which of them is most likely. A common approach is to use a log-linear machine learning model that takes as input the features of each interpretation. We define the features of an interpretation by maintaining a vector in which each position is a count of how many times a particular rule of the grammar was applied in arriving at that interpretation. The model also contains a weight vector that captures the importance of each feature. One goal of learning, then, is to determine an optimal weight vector.
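
As an illustrative sketch (assuming each interpretation is represented only by its rule-count features, with made-up rule names and weights), the log-linear model assigns each interpretation a probability proportional to the exponentiated dot product of the weight vector and the feature vector:

import math

def scores(interpretations, weights):
    """Linear score w . phi(d) for each interpretation's rule-count features."""
    return [sum(weights.get(rule, 0.0) * count for rule, count in feats.items())
            for feats in interpretations]

def probabilities(interpretations, weights):
    """Softmax over the linear scores: p(d) proportional to exp(w . phi(d))."""
    s = scores(interpretations, weights)
    m = max(s)                                # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in s]
    z = sum(exps)
    return [e / z for e in exps]

# two hypothetical interpretations of one question, described by rule counts
candidates = [{"largest_rule": 1}, {"at_least_rule": 1, "largest_rule": 1}]
weights = {"largest_rule": 0.8, "at_least_rule": -0.3}
print(probabilities(candidates, weights))     # roughly [0.57, 0.43]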

Given a trained model and an input, the parser computes the highest-probability interpretations. Assuming that the utterance is given as a sequence of tokens, the parser (usually a chart parser) recursively builds interpretations for each span of text. As the total number of interpretations can be exponential, we typically limit the number of interpretations kept at each step to a pre-specified number (e.g., 20). As a result, the final interpretation is not guaranteed to be optimal, but this very often turns out to be an effective heuristic.

The learner computes the parameters of the model, and in some cases additions to the grammar, by processing the training examples. The learner optimizes the parameters to maximize the likelihood of observing the examples in the training data. Even though we may have training data, we do not typically have the correct interpretation for each instance in the data. Therefore, the training process considers any interpretation that can reproduce an instance in the training data to be correct. We typically use stochastic gradient descent to optimize the model.
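
A sketch of one such gradient step is shown below, reusing the rule-count representation and probabilities computed as in the previous sketch; the flags marking which interpretations reproduce the training answer would come from the executor, and all names and numbers here are illustrative assumptions.

def expected_features(interpretations, probs):
    """Expected rule counts E[phi(d)] under a distribution over interpretations."""
    exp_feats = {}
    for feats, p in zip(interpretations, probs):
        for rule, count in feats.items():
            exp_feats[rule] = exp_feats.get(rule, 0.0) + p * count
    return exp_feats

def sgd_step(interpretations, probs, consistent, weights, lr=0.1):
    """One stochastic gradient step on log p(some consistent interpretation)."""
    # distribution restricted (and renormalized) to the consistent interpretations
    good = [p if ok else 0.0 for p, ok in zip(probs, consistent)]
    z = sum(good) or 1.0
    good = [p / z for p in good]
    target = expected_features(interpretations, good)   # E[phi(d) | consistent]
    model = expected_features(interpretations, probs)   # E[phi(d)]
    for rule in set(target) | set(model):
        grad = target.get(rule, 0.0) - model.get(rule, 0.0)
        weights[rule] = weights.get(rule, 0.0) + lr * grad
    return weights

# continuing the toy example: suppose only the second candidate executes to the
# answer given in the training example
candidates = [{"largest_rule": 1}, {"at_least_rule": 1, "largest_rule": 1}]
print(sgd_step(candidates, [0.57, 0.43], [False, True],
               {"largest_rule": 0.8, "at_least_rule": -0.3}))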

The components of a semantic parsing system are relatively loosely coupled. The executor is concerned purely with what we want to express, independent of how it would be expressed in natural language. The grammar describes how candidate logical forms are constructed from the utterance but does not provide algorithmic guidance nor specify a way to score the candidates. The model focuses on a single interpretation and defines features that could be helpful for predicting accurately. The parser and the learner provide algorithms that are largely independent of the semantic representation. This modularity allows us to improve each component in isolation.

Understanding natural language queries accurately is an extremely difficult problem. Most semantic parsing systems report an accuracy in the range of 50-60%. Improving the performance further requires amassing more training data or engineering the grammar. The overall success of the system depends on tightly scoping the desired set of queries and on the availability of training data and computational power.

5. Summary

In this chapter, we considered different ways end-users interact with knowledge graphs. Perhaps the most important takeaway from this chapter is that just because we are working with a knowledge graph, it does not imply that displaying many nodes and edges on a computer screen is the most appropriate way to interact with it. Graphical views are often appropriate for the schema because of its limited size, but are not always effective for interacting with the instance data. We argued that the best user interface needs to be designed by considering the business problem to be solved, the kind of data we are working with, and the resources that can be invested in the design and engineering. The best interfaces will often use a combination of methods that leverage search and structured queries, and use natural language question answering only to a limited degree.