CS520 Knowledge Graphs
What Should AI Know?

What Are Some High-Value Use Cases of Knowledge Graphs?


1. Introduction

Knowledge graphs are used in many domains, from biomedicine and journalism to network security and entertainment. In this chapter, we focus on two high-value families of applications: finance and open data.

2. Knowledge Graphs for Finance

We examine three distinct roles that knowledge graphs play in finance: analytics, tax and financial calculations, and financial reporting. Analytical knowledge graphs, which support querying and inference over large datasets, are the most widely used today. Knowledge graphs for financial calculations emphasize formal structure and rule-based reasoning. Finally, using knowledge graphs to exchange financial reporting data is an emerging application, driven by regulation and the need for machine-interpretable disclosures.

2.1 Knowledge Graphs for Financial Analytics

Consider a large financial organization such as a bank that serves millions of customers, including both individuals and companies. Such organizations routinely face complex relational questions, many of which can be expressed using the following templates:

  1. If a company goes into financial trouble, which of our clients are its suppliers or vendors? Are any of them applying for a loan?
  2. In a supply chain network, is there a single company that connects many others?
  3. Which startups have attracted the most influential investors?
  4. Which investors tend to co-invest?
  5. Which companies are most similar to a given company?
  6. Which company might be a good future client for us?

Answering the first question requires supplier and vendor data, which is often purchased from third-party providers and integrated with internal customer data. This integration relies on schema mapping and entity linking techniques discussed in Chapter 4. Increasingly, financial institutions also use news data for market intelligence, which requires entity and relation extraction from text, as described in Chapter 5. Once this information is represented as a knowledge graph, path-finding algorithms (Chapter 6) can be used to traverse supplier and vendor relationships and answer the query. Visualizing the resulting subgraph can further help analysts understand supply chain dependencies.
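As a minimal sketch of this traversal, assuming a toy supplier graph with invented company names, a breadth-first search finds every client transitively connected to the troubled company:

```python
from collections import deque

# Hypothetical supplier/vendor relationships: an edge u -> v means
# "u supplies v". All names are invented for illustration.
supplies = {
    "TroubledCo": ["PartsInc", "MetalWorks"],
    "PartsInc": ["AcmeBank-Client-1"],
    "MetalWorks": ["AcmeBank-Client-2"],
    "AcmeBank-Client-1": [],
    "AcmeBank-Client-2": [],
}

loan_applicants = {"AcmeBank-Client-1"}

def affected_clients(graph, source):
    """BFS from the troubled company to every transitively connected node."""
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(source)
    return seen

reached = affected_clients(supplies, "TroubledCo")
# Intersect with open loan applications to flag at-risk exposure.
at_risk = reached & loan_applicants
```

Intersecting the reachable set with the set of loan applicants answers the second half of the question in one set operation.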

The second question can be addressed using centrality measures, such as betweenness centrality, to identify companies that play a critical connecting role.
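A brute-force sketch of that computation on a toy network follows; company names are invented, and production systems would use an optimized library implementation of Brandes' algorithm rather than enumerating paths:

```python
from collections import deque
from itertools import combinations

def shortest_paths(adj, s, t):
    """All shortest paths from s to t, via BFS that records predecessors."""
    dist, preds = {s: 0}, {s: []}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v], preds[v] = dist[u] + 1, [u]
                queue.append(v)
            elif dist[v] == dist[u] + 1:
                preds[v].append(u)
    if t not in dist:
        return []
    paths = []
    def unwind(node, suffix):
        if node == s:
            paths.append([node] + suffix)
        else:
            for p in preds[node]:
                unwind(p, [node] + suffix)
    unwind(t, [])
    return paths

def betweenness(adj):
    """Fraction of pairwise shortest paths passing through each node."""
    score = {v: 0.0 for v in adj}
    for s, t in combinations(adj, 2):
        paths = shortest_paths(adj, s, t)
        for v in adj:
            if v not in (s, t) and paths:
                score[v] += sum(v in p for p in paths) / len(paths)
    return score

# Toy supply chain: "Hub Corp" connects two otherwise separate clusters.
network = {
    "Alpha": ["Hub Corp"], "Beta": ["Hub Corp"],
    "Hub Corp": ["Alpha", "Beta", "Gamma"],
    "Gamma": ["Hub Corp", "Delta"], "Delta": ["Gamma"],
}
scores = betweenness(network)
```

The node with the highest score is the single company through which most inter-company paths pass.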

The third question is relevant to startup valuation. It again requires integrating investment data from external sources with internal data. Here, centrality-based techniques such as graph-based PageRank can be used to identify influential investors. Visualizing investment networks helps reveal how these investors are connected to startups.

The fourth question is an example of community detection. In a knowledge graph that represents investment relationships, groups of investors who frequently co-invest often form identifiable communities.
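A crude sketch of this idea treats connected components of a co-investment graph as communities; investor names are invented, and real pipelines use modularity-based methods such as Louvain rather than plain components:

```python
# Hypothetical co-investment edges: two investors are linked if they
# invested in the same startup. Connected components are a crude
# community proxy; modularity methods (e.g., Louvain) are the norm.
co_invest = {
    "Fund A": {"Fund B", "Fund C"},
    "Fund B": {"Fund A"},
    "Fund C": {"Fund A"},
    "Fund X": {"Fund Y"},
    "Fund Y": {"Fund X"},
}

def communities(graph):
    """Return connected components via iterative DFS."""
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(graph[node] - group)
        seen |= group
        groups.append(group)
    return groups

groups = communities(co_invest)
```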

The fifth and sixth questions rely on graph embedding techniques, introduced briefly in Chapter 1. Similarity between companies can be computed using distances between their embedding vectors. Predicting a potential future client is an instance of the link prediction problem: rather than predicting the next word, as in language models, the goal is to predict the most likely new relationships in the graph.
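A minimal sketch of embedding-based similarity, assuming invented low-dimensional vectors (real systems learn hundreds of dimensions with methods such as TransE or node2vec), ranks candidate companies by cosine similarity:

```python
import math

# Hypothetical 4-dimensional embeddings; names and values are invented.
embeddings = {
    "RetailCo": [0.9, 0.1, 0.0, 0.2],
    "ShopMart": [0.8, 0.2, 0.1, 0.3],
    "SteelInc": [0.1, 0.9, 0.8, 0.0],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def most_similar(name):
    """The company whose embedding is closest to the given one."""
    others = (c for c in embeddings if c != name)
    return max(others, key=lambda c: cosine(embeddings[name], embeddings[c]))
```

The same scores can serve as a crude link-prediction signal: a high similarity to existing clients suggests a promising prospect.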

2.2 Knowledge Graphs for Income Tax Calculations

Income tax law in the United States spans more than 80,000 pages, and over 150 million returns are filed each year. The law includes thousands of forms and instructions, which change annually and sometimes even mid-year. Tax preparation software makes this complex body of law accessible, allowing end users to prepare and file returns on their own.

Some tax tools represent the law as a set of rules. Once encoded as rules, the system can perform calculations, generate user dialogs, provide explanations, and check input completeness. While tax computation relies on rule-based reasoning (Chapter 6), many supporting operations -- such as assessing the impact of rule changes or guiding user interactions -- are enabled by representing rules as a knowledge graph and applying graph-based algorithms.

For example, consider the following simplified rule: a person qualifies for a tax benefit if

  • the person is a resident of California, and
  • the person is older than 18 years.

Expressed in Datalog, the rule is:

qualified_for_tax_benefit(P) :-
    resident_of(P, california), age(P, N), N > 18.

From this rule, we can construct a knowledge graph, shown below:

Figure 1: An Example Calculation Graph for an Income Tax Rule

If the taxpayer is 17 years old, reachability analysis on this graph shows that the rule cannot be satisfied, so evaluating residency is unnecessary. While this example involves only a single rule, real tax systems contain thousands of interdependent rules, resulting in large and complex knowledge graphs.
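The short-circuit evaluation that this reachability analysis enables can be sketched directly; the condition names mirror the Datalog rule, and the cheap-first ordering is an illustrative choice:

```python
# Each condition of the rule is a node in a tiny calculation graph.
# Cheap conditions are evaluated first; once one fails, the conjunction
# cannot be satisfied, so the remaining nodes are pruned.
def qualifies(taxpayer, checked):
    conditions = [
        ("age_over_18", lambda t: t["age"] > 18),
        ("resident_of_california", lambda t: t["state"] == "CA"),
    ]
    for name, test in conditions:
        checked.append(name)  # record which conditions were evaluated
        if not test(taxpayer):
            return False      # prune: later conditions never evaluated
    return True

checked = []
result = qualifies({"age": 17, "state": "CA"}, checked)
```

For the 17-year-old taxpayer, the age check fails and the residency check is never reached, exactly the pruning the reachability analysis predicts.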

2.3 Knowledge Graphs for Financial Reporting

Financial institutions are required to report the derivative contracts they hold, such as interest rate and commodity swaps, options, futures, forwards, and various asset-backed securities. These reports are important to compliance teams, brokers, and regulators. If each institution reports in a different format, it becomes difficult to process, aggregate, and analyze this information -- a classic data integration challenge that knowledge graphs can address.

An industry-wide solution is the Financial Industry Business Ontology (FIBO), a common semantic model defined using the Web Ontology Language (OWL). FIBO specifies the key concepts in financial business applications and the relationships between them. It provides a shared vocabulary that can give meaning to diverse data sources, including spreadsheets, databases, and XML documents. FIBO concepts are developed collaboratively by financial institutions to reflect a consensus of industry-standard concepts and data models.

FIBO includes concepts and terminology needed for reporting on derivatives. For example, it defines that a Derivatives Contract can have one or more Options as parts, and an Option can have a call option buyer. The diagram below illustrates how a call option buyer is connected to related entities in the ontology:

Figure 2: Different relations associated with a call option buyer in the FIBO ontology

This is an ontology graph: it captures relationships at the schema level rather than the level of individual data values. For instance, it shows that a call option buyer must be an independent party, a legal person, and an autonomous agent. By using FIBO definitions in financial reports, institutions can integrate their data with reports from other providers, streamlining analysis and reducing costs.
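This schema-level reasoning can be sketched as a transitive closure over subclass triples. The class names below echo the FIBO terms mentioned above, but the exact identifiers, and whether these classes form a direct chain, are illustrative assumptions:

```python
# Illustrative schema-level triples in (subject, predicate, object) form.
# Class names echo FIBO terms but are not exact FIBO identifiers, and
# the subclass chain is an assumption made for illustration.
triples = [
    ("CallOptionBuyer", "subClassOf", "IndependentParty"),
    ("IndependentParty", "subClassOf", "LegalPerson"),
    ("LegalPerson", "subClassOf", "AutonomousAgent"),
]

def superclasses(cls, triples):
    """All classes reachable via subClassOf (transitive closure)."""
    found, frontier = set(), {cls}
    while frontier:
        frontier = {o for s, p, o in triples
                    if p == "subClassOf" and s in frontier and o not in found}
        found |= frontier
    return found

ancestors = superclasses("CallOptionBuyer", triples)
```

This is the kind of inference an OWL reasoner performs at scale when integrating reports expressed against FIBO.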

FIBO development is driven by practical use cases. While derivatives reporting is one example, other applications include counterparty exposure, index analysis for ETFs, and exchange instrument data management.

3. Knowledge Graphs for Open Data

Our discussion in this section is based on the Open Knowledge Network (OKN) being created under the sponsorship of the National Science Foundation. Inspired by specific use cases, the OKN interconnects openly available government data. We consider three use cases in detail: an integrated justice platform, monitoring food contaminants in the national food and water systems, and precision medicine. Even though these are independent projects, effort is being made to explore use cases that cut across projects and can demonstrate the value of the Open Knowledge Network as a whole.

3.1 Integrated Justice Platform

The Systematic Content Analysis of Litigation EventS Open Knowledge Network (SCALES OKN) aims to transform the transparency and accessibility of federal court records by building an open, AI-ready knowledge network that enables systematic analysis of the judiciary for the benefit of the public, researchers, and policymakers. The initial version of the SCALES OKN included data from all 94 U.S. district courts, comprising over 6 million triples.

A major challenge for the SCALES team has been that court records contain a wealth of information, but much of it is unstructured text. This includes (i) chronological entries on a case's docket sheet and (ii) longer documents filed by the parties or judge. While docket entries track the case events and the parties' strategic choices, a full understanding of the case also requires interpreting the attached documents, which often contain the legal reasoning, arguments, and evidence underlying those events. Without tools to synthesize both docket entries and associated documents, it is difficult to grasp the complete history and context of a case.

As an initial step toward understanding the case documents, the SCALES team focused on annotating the documents so that judges, parties, and events are identified, searchable, and analyzable. This problem overlaps with the entity extraction task that we considered in Chapter 5. Based on the initial annotations, the team developed and tested regular expressions that could extract the same information, with the goal of generating an even larger data set for training natural language processing tools.

Another challenge is giving users the ability to analyze the court data and annotations. This requires developing protocols to disambiguate entities -- litigants, lawyers, judges, third parties, and others -- and discovering the complex relationships among them as a case proceeds. To support this task, the team has been developing an entity and event ontology.

The entity recognition task for court records has numerous unique challenges. For example, a 2016 civil rights case from Indiana lists current Transportation Secretary Peter Buttigieg as a defendant. That case, however, was not brought against the person Peter Buttigieg, but instead against him in his capacity as then-mayor of South Bend, Indiana. Knowing in what role a party relates to a case can dramatically alter the understanding of the case and its relevance as a feature in AI models. Clear semantic definitions of roles versus persons in the SCALES ontology are essential for training better machine learning models.

3.2 Monitoring Contaminants in the National Food and Water System

The Safe Agricultural Products and Water Graph (SAWGraph) is a sophisticated Open Knowledge Network (OKN) designed to address the critical environmental challenge of PFAS (per- and polyfluoroalkyl substances) contamination. By integrating disparate datasets from hydrology, chemistry, industry, and agriculture, SAWGraph provides a semantic framework for tracing the flow of toxic "forever chemicals" through the environment. The project serves as a premier example of how knowledge graphs can bridge the gap between scientific silos, allowing researchers and policymakers to move beyond isolated spreadsheets toward a holistic understanding of how industrial activities impact food and water safety.

Technically, the project utilizes a graph of graphs architecture, aligning several domain-specific ontologies to maintain data integrity across scales. It leverages the SOSA/SSN (Sensor, Observation, Sample, and Actuator) ontology to standardize chemical measurement data, alongside FoodOn for agricultural products and NAICS codes for industrial facilities. A key innovation in its design is the use of S2 cells -- a discrete global grid system -- to handle spatial data. By mapping all environmental features to these hierarchical cells rather than performing complex real-time geometric intersections, the system achieves high-performance spatial querying that is essential for identifying proximity-based contamination risks.
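The idea behind S2-style indexing can be sketched with a simplified quadtree grid. Real S2 cell IDs are 64-bit integers along a Hilbert curve, but representing cells here as strings of quadrant digits preserves the key property: containment between cells becomes a prefix test, so no runtime geometry is needed.

```python
# Simplified stand-in for S2: a cell ID is a string of quadrant digits,
# so cell "021" contains cell "0213". Real S2 IDs are 64-bit integers
# along a Hilbert curve, but containment is likewise a prefix relation.
# Feature names and cell assignments are invented for illustration.
features = {
    "factory_outfall": "0213",
    "well_7":          "0212",
    "farm_field":      "0310",
}

def nearby(a, b, level):
    """True if two features share an ancestor cell at the given level."""
    return features[a][:level] == features[b][:level]
```

A proximity query such as "which wells sit near a known discharge site" then reduces to grouping features by a shared cell prefix.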

Beyond simple data storage, SAWGraph enables advanced reasoning through transitive connectivity within its hydrology and supply chain components. This allows the graph to model upstream and downstream relationships; for instance, a user can trace the path of a contaminant from an industrial discharge point through river networks and into the groundwater wells used for crop irrigation. This ability to model flow and influence across different physical domains demonstrates the power of knowledge graphs in predictive modeling, specifically in identifying at-risk agricultural lands that may not yet have documented contamination but are hydrologically linked to known PFAS sources.
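As a minimal sketch, this upstream/downstream tracing is a transitive closure over a directed flow graph; all node names and edges below are hypothetical:

```python
# Hypothetical directed flow edges: u -> v means "u drains into v".
flows = {
    "outfall_1": ["creek_a"],
    "creek_a": ["river_main"],
    "river_main": ["aquifer_east"],
    "aquifer_east": ["well_7", "well_9"],
    "well_7": [], "well_9": [],
}

def downstream(graph, source):
    """Transitive closure of the flow relation from a single source."""
    found, stack = set(), [source]
    while stack:
        for nxt in graph.get(stack.pop(), []):
            if nxt not in found:
                found.add(nxt)
                stack.append(nxt)
    return found

# Wells hydrologically linked to a known discharge point.
at_risk_wells = {n for n in downstream(flows, "outfall_1")
                 if n.startswith("well")}
```

Reversing the edge direction gives the corresponding upstream query: which discharge points could have contributed to a contaminated well.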

As a case study for knowledge graph implementation, SAWGraph highlights the importance of interoperability and community standards. The project demonstrates that the value of an environmental knowledge graph lies not just in the volume of data it contains, but in its ability to provide actionable decision support through semantic queries. By creating a unified view of the source-to-sink journey of contaminants, SAWGraph provides a scalable blueprint for using semantic web technologies to monitor ecological health and protect public resources in an increasingly complex industrial landscape.

3.3 Precision Medicine

The Scalable PrecisiOn Medicine Knowledge Engine (SPOKE) is an open knowledge network that connects curated information from thirty-seven specialized, human-curated databases into a single property graph, with 3 million nodes and 15 million edges by a recent count. Much of this data consists of genomic associations with disease, chemical compounds and their binding targets, and metabolic reactions from select bacterial organisms of relevance to human health.

The development of SPOKE is being driven by several precision medicine questions: What are the potential side effects of a drug? How can an existing drug be repurposed for new indications? Is a protein involved in a specific disease suitable for drug discovery?

SPOKE makes extensive use of many of the concepts introduced in this course. Analyses such as centrality are used to test the utility of specific nodes. User interfaces are designed for specific purposes: instead of showing a hairball view of the graph, a multi-level neighborhood explorer that resembles a geographic map was built, and separate interfaces are designed for clinicians.

Figure 3: A map-like visualization of the knowledge graph in SPOKE

The SPOKE knowledge graph was used to predict a potential treatment to reduce mortality in COVID-19 patients on mechanical ventilation. By tracing a path in SPOKE from the ACE2 protein -- the receptor SARS-CoV-2 uses to enter cells -- to the drug dexamethasone, researchers uncovered a previously unknown connection. Analysis of gene expression revealed that mechanical stress from ventilation upregulates ACE2, while dexamethasone suppresses the hormone midkine (MK), which mediates this stress response. This creates a vicious cycle: ventilation intended to support patients could inadvertently increase viral spread in the lungs. SPOKE helped "connect the dots" across multiple domains of knowledge, suggesting that corticosteroids could break this cycle. Subsequent clinical studies confirmed that corticosteroids reduce mortality in ventilated COVID-19 patients by about 30%, demonstrating the power of knowledge graphs to reveal hidden insights in complex biomedical data.

4. Summary

Knowledge graphs have wide-ranging applications across industries, enabling analytics, computations, and data exchange. In the financial industry, knowledge graphs are applied to analytics, calculations, and reporting, helping organizations uncover novel insights, perform complex reasoning, and streamline data integration. Government open data illustrates their power in enabling solutions to difficult problems, ranging from drug discovery to providing greater visibility into court processes. By making heterogeneous data interoperable and queryable, knowledge graphs reveal hidden patterns, support sophisticated decision-making, and enable cross-domain insights. As more sectors -- including finance, healthcare, and government -- adopt knowledge graphs, their role in driving intelligent, data-informed strategies will continue to expand and become increasingly mainstream.

5. Further Reading

Most graph database vendors maintain an inventory of use cases for exploiting their technology. As an example, see the collection maintained by Neo4j [Neo4j Graph Use Cases]. An overview of the Open Knowledge Network and the SPOKE and SCALES KGs discussed in this chapter can be found in a special issue of the AI Magazine [AIM 2022]. The current status of these projects, along with an overview of SAWGraph and several other related projects, is available as part of the OKN masterclass offered by the Edugate project [Proto OKN].

In recent years, the use case of leveraging knowledge graphs with Large Language Models in systems known as Retrieval Augmented Generation (RAG) has become very popular. Several surveys exist; a good starting point is the ongoing work at Meta on building a knowledgeable intelligent assistant [Dong 2025].

[Neo4j Graph Use Cases] Neo4j. "Graph Database Use Cases and Solutions." Neo4j, https://neo4j.com/use-cases/ . Accessed 29 December 2025.

[AIM 2022] AI Magazine, Special Issue on the NSF Convergence Accelerator Program.

[Proto OKN] Proto OKN Masterclass, offered as part of the Edugate project.

[Dong 2025] Luna Dong. "Where Are We in the Journey to a Knowledgeable Assistant?"

Exercises

Exercise 9.1. Suppose you are facing the problem of finding alternative routes in the face of a traffic jam. Which of the following graph algorithms might be a good choice for this use case?
(a) A* Search
(b) Minimal Spanning Tree
(c) Random walk
(d) All pairs shortest path
(e) Depth-first Search

Exercise 9.2. Suppose you are facing the problem of deciding the location of a branch office in a city, and you need to choose the most accessible location. Which graph algorithm might you use to help make this decision?
(a) Degree centrality
(b) Closeness centrality
(c) Betweenness centrality
(d) PageRank
(e) Public opinion survey

Exercise 9.3. Suppose you are working on fraud analysis and need to determine whether a group exhibits a few discrete bad behaviors or is acting as a systematic collection of entities. Which of the following graph algorithms might be ideally suited?
(a) Triangle count
(b) Connected components
(c) Strongly connected components
(d) PageRank
(e) Fast unfolding algorithm

Exercise 9.4. An income tax application can use a knowledge graph in which of the following ways?
(a) Analyze the graph of rules to determine what inputs are required
(b) Create a taxonomy of classes to better organize the tax calculation rules
(c) Connect customer data with different investment opportunities
(d) All of the above
(e) None of the above

Exercise 9.5. Which of the following could not be achieved by financial ontologies such as FIBO?
(a) Stock recommendations that will outperform the market
(b) A standard vocabulary to exchange information
(c) Index analysis for ETF development
(d) Analysis of counterparty exposure
(e) Exchange instrument data offering