CS520 Knowledge Graphs: What Should AI Know?

How to Create a Knowledge Graph from Text?


1. Introduction

Textual sources—ranging from financial news and SEC filings to digital publications like the Wall Street Journal—contain a wealth of data critical for market research and business intelligence. By leveraging Natural Language Processing (NLP), we can transform this unstructured text into knowledge graphs to enable advanced analytics.

Because NLP is a vast and evolving field, this section does not aim to provide an exhaustive overview. Instead, we focus on the core NLP concepts essential for building knowledge graphs. Understanding these fundamentals is increasingly important regardless of the specific NLP method used for information extraction. Many specialized data vendors have created dedicated systems to extract structured data from the vast sea of natural language text.

To construct a robust knowledge graph from unstructured text, we primarily rely on three fundamental NLP tasks:

  • Entity Extraction: Also known as Named Entity Recognition (NER), this is the process of identifying and classifying key entities within the text—such as Organizations, People, or Locations. In the context of a knowledge graph, these extracted entities serve as the primary nodes.
  • Relation Extraction: Once entities are identified, the next step is to determine the associations between them. For example, from a financial report, we might extract the relation "is CEO of" between a Person entity and an Organization entity. This process is also used to identify specific properties of an entity, such as "Net Sales" or "Headquarters." These relations and properties form the edges or node attributes within the graph.
  • Entity Resolution: This task ensures that multiple mentions across a text refer to the correct single entity. This involves coreference resolution (linking "John Smith" to the pronoun "he" later in a paragraph) and entity linking (recognizing that "Apple," "Apple Inc.," and "the Cupertino-based tech giant" all refer to the same node). Proper resolution prevents the creation of duplicate nodes and ensures the graph remains accurate and connected.

In this chapter, we provide an overview of the techniques used for entity and relation extraction. We have omitted entity resolution from this discussion, as it is an advanced topic that falls outside the scope of this volume.

Most modern extraction techniques rely on adapting pre-trained language models to specific tasks. For our requirements, we treat these models and their underlying machine learning architectures as "black boxes," as they are now widely available as accessible, off-the-shelf tools. This shift in the NLP landscape allows knowledge graph creators to focus on the final product—the graph itself—rather than the intricacies of model architecture. Instead, the primary responsibility of the developer shifts toward providing high-quality training and evaluation data to fine-tune these models.

We will begin with a high-level overview of language models before diving into the specifics of the entity and relation extraction tasks.

2. Overview of Language Models

Language modeling is the task of predicting the next word in a sequence based on the words that precede it. For example, given the sentence fragment "students opened their", a language model predicts the most likely subsequent words, such as "book", "exam", or "laptop".

More formally, given a sequence of words x1, ..., xn-1, a language model calculates the probability P(xn | x1, ..., xn-1) for every word xn in its vocabulary. This technology is the backbone of familiar tools like search engine autocomplete, smartphone autocorrect, and modern generative AI.

To understand how these probabilities are calculated, consider the following tiny corpus consisting of only three sentences.

dogs chase cats
cats love milk
dogs love people

This corpus has six distinct words: dogs, chase, cats, love, milk, and people. The number of times each of them occurs in the corpus is, respectively, 2, 1, 2, 2, 1, 1. We can also observe that every pair of adjacent words that appears in the corpus occurs exactly once; for example, love is followed by milk exactly once. To calculate the probability P(milk|love), we take the ratio of count(love milk) to count(love), where count(love milk) denotes the number of times love is followed by milk in the corpus (i.e., 1), and count(love) denotes the number of times love appears in the corpus (i.e., 2). The resulting probability is 1/2 = 0.5.
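
To make this concrete, here is a minimal Python sketch (a toy illustration written for this chapter, not part of any standard library) that computes such bigram probabilities from the three-sentence corpus above.

    from collections import Counter

    # The three-sentence toy corpus from the text.
    corpus = ["dogs chase cats", "cats love milk", "dogs love people"]

    sentences = [s.split() for s in corpus]
    unigrams = Counter(word for sent in sentences for word in sent)
    # Count bigrams within each sentence only, never across sentence boundaries.
    bigrams = Counter(pair for sent in sentences for pair in zip(sent, sent[1:]))

    def prob(next_word, prev_word):
        """P(next_word | prev_word) = count(prev_word next_word) / count(prev_word)."""
        return bigrams[(prev_word, next_word)] / unigrams[prev_word]

    print(prob("milk", "love"))    # 1 / 2 = 0.5
    print(prob("people", "love"))  # 1 / 2 = 0.5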

Modern language models are created by training a deep learning model, such as a Recurrent Neural Network or a Transformer, on a large corpus of text. Numerous pre-trained language models are available as open-source software and can be adapted to the specific task at hand. As we explore the techniques for entity and relation extraction in the following sections, we will describe how these models are adapted to transform raw sentences into the triples required for a knowledge graph.

3. Entity Extraction

We begin with a concrete example of entity extraction, then give an overview of different approaches to the task, and conclude by discussing some of the challenges in performing it well.

3.1 An Example of Entity Extraction

A named entity is generally defined as anything that can be referred to by a proper name, such as a person, location, or organization. In practice, this definition is often extended to include "entity-like" values such as dates, times, currencies, and numerical expressions.

Consider the following sentence from a news story:

    Cecilia Love, 52, a retired police investigator who lives in Massachusetts, said she paid around $370 a ticket with tax for nonstop United Airlines flights to Sacramento from Boston for her niece's high school graduation in June, 2020.    

To a computer, this sentence is just a string of characters. An entity extraction model "extracts" the entities by tagging the segments of text with their corresponding types:

    [PER Cecilia Love], 52, a retired police investigator who lives in [LOC Massachusetts], said she paid around [MONEY $370] a ticket with tax for nonstop [ORG United Airlines] flights to [LOC Sacramento] from [LOC Boston] for her niece's high school graduation in [TIME June, 2020].

The paragraph contains seven named entities: one person (PER), three locations (LOC), one monetary amount (MONEY), one organization (ORG), and one time (TIME). Depending on the application domain, we may introduce more or fewer entity types. For example, in the task of identifying key terms in a text, there is only one entity type, which captures a key term.

Entity extraction is a versatile tool used across many modern software applications.

  • Question Answering (QA): When a user asks a specific question, entity extraction helps the system isolate potential answers from a retrieved passage. For instance, if a user asks, "Which airline did Cecilia fly?", the system uses NER to identify [ORG United Airlines] as the target answer.
  • Semantic Augmentation: In word processing or web browsing, entity extraction can identify entities in real-time and provide "hover-over" definitions, historical facts, or direct links to external databases like Wikipedia or a corporate CRM.
  • Knowledge Discovery: By extracting entities across thousands of documents, researchers can identify hidden patterns, such as a specific person appearing frequently in proximity to a particular location or organization.

3.2 Approaches to Entity Extraction

At its core, entity extraction is framed as a sequence labeling problem. In this framework, we associate a label with every word (or token) in a sentence, and the model's task is to predict the most likely label for each.

We can perform entity extraction using three broad approaches:

  • Classical Sequence Labeling: These are statistical models, such as Conditional Random Fields (CRF), that analyze features of a word and its neighbors to determine the most probable sequence of labels.
  • Deep Learning Models: Modern approaches use neural architectures like Transformers to learn complex linguistic patterns. These models represent the current state-of-the-art and are highly effective at understanding context.
  • Rule-Based Approaches: These rely on predefined patterns, regular expressions, or dictionaries (gazetteers). They are particularly useful for highly structured data such as phone numbers, dates, or standardized product codes.

To facilitate the labeling, we use a labeling scheme known as BIOES, in which the tags have the following meanings: B stands for the beginning of an entity, I for the interior of an entity, O for a word that is not part of any entity, E for the end of an entity, and S for a single-word entity. As an example, the words in the text snippet shown above are labeled as shown below (each word is annotated with its tag following a slash).

Cecilia/B Love/E ,/O 52/O ,/O
a/O retired/O police/O investigator/O who/O
lives/O in/O Massachusetts/S ,/O said/O
she/O paid/O around/O $370/S a/O
ticket/O with/O tax/O for/O nonstop/O
United/B Airlines/E flights/O to/O Sacramento/S
from/O Boston/S for/O her/O niece's/O
high/O school/O graduation/O in/O June/B
,/I 2020/E
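
The conversion from entity spans to BIOES tags is mechanical. The short Python sketch below (a toy illustration; entity spans are given as (start, end) token offsets with an exclusive end, an assumption of this example) shows one way to produce such tags.

    def bioes_tags(tokens, entity_spans):
        """Assign a BIOES tag to every token, given entity spans as
        (start, end) token offsets with an exclusive end."""
        tags = ["O"] * len(tokens)
        for start, end in entity_spans:
            if end - start == 1:
                tags[start] = "S"        # single-word entity
            else:
                tags[start] = "B"        # beginning of the entity
                for i in range(start + 1, end - 1):
                    tags[i] = "I"        # interior of the entity
                tags[end - 1] = "E"      # end of the entity
        return tags

    tokens = ["Cecilia", "Love", ",", "52", ",", "lives", "in", "Massachusetts"]
    print(bioes_tags(tokens, [(0, 2), (7, 8)]))
    # ['B', 'E', 'O', 'O', 'O', 'O', 'O', 'S']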

In the sequence labeling approach, we train a statistical model—such as conditional random fields (CRF) —to predict the correct BIOES tag for each token. This method is characterized by a heavy reliance on feature engineering, where developers must manually identify and extract relevant attributes from the text to guide the model's decisions. These features often include linguistic attributes like part-of-speech tags and the base form of the word, as well as orthographic patterns such as whether the word is in all-caps, contains digits, or possesses specific prefixes and suffixes. Furthermore, models may incorporate lexical lookups against a gazetteer (a list of known entities) or leverage word embeddings to capture semantic context. A significant challenge of this approach is that the performance can vary drastically depending on the application domain and the choice of features. As a result, moving from a general news corpus to a specialized field like medicine or law requires substantial effort to "hand-craft" and tune a new feature set that can accommodate the specific terminologies and structures of that domain.
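
As an illustration, a hand-crafted feature function might look like the following sketch (the feature names and the tiny gazetteer are invented for this example; toolkits such as sklearn-crfsuite consume per-token feature dictionaries of roughly this form).

    # Illustrative only: a real gazetteer would list thousands of known entities.
    GAZETTEER = {"Massachusetts", "Boston", "Sacramento"}

    def token_features(tokens, i):
        """Hand-crafted features for the token at position i of a sentence."""
        word = tokens[i]
        return {
            "word.lower": word.lower(),
            "is_capitalized": word[0].isupper(),
            "is_all_caps": word.isupper(),
            "has_digit": any(c.isdigit() for c in word),
            "prefix2": word[:2],
            "suffix3": word[-3:],
            "prev_word": tokens[i - 1].lower() if i > 0 else "<BOS>",
            "next_word": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
            "in_gazetteer": word in GAZETTEER,
        }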

In a deep learning approach, there is no feature engineering; we simply input word embeddings to a language model. Instead of predicting the next word, the language model now predicts one of the five tags (B, I, O, E, S) required for entity recognition. To adapt the language model to this new task, we first pre-train it on a corpus for the target domain, and then fine-tune it on labeled data for the task at hand. In the task-specific training, each input sentence is wrapped between a distinguished token [CLS], which marks the beginning of the input, and a distinguished token [SEP], which marks its end, and the model is trained to predict a tag for every token in between. These per-token predictions are enough to produce one of the five required tags for each word.
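
In practice, this adaptation is rarely implemented from scratch. The sketch below illustrates the "black box" usage described in the introduction, assuming the Hugging Face transformers library is installed; the model name is just one publicly available fine-tuned NER model, not one specific to this chapter.

    # Assumes the Hugging Face `transformers` library; "dslim/bert-base-NER"
    # is one publicly available model already fine-tuned for entity extraction.
    from transformers import pipeline

    ner = pipeline("token-classification",
                   model="dslim/bert-base-NER",
                   aggregation_strategy="simple")  # merge sub-word pieces into spans

    sentence = ("Cecilia Love, a retired police investigator who lives in "
                "Massachusetts, paid for United Airlines flights to Sacramento.")

    for entity in ner(sentence):
        print(entity["entity_group"], entity["word"], round(float(entity["score"]), 3))
    # Expected output (approximately):
    # PER Cecilia Love / LOC Massachusetts / ORG United Airlines / LOC Sacramento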

Finally, in a rule-based approach, one specifies labeling rules in a formal query language. The rules can include regular expressions, references to dictionaries, and semantic constraints, and may also invoke automated extractors and reference table structures. The rules may even invoke machine learning modules for specific subtasks. Rule application can be sequenced so that we first apply high-precision rules, then look up standard name lists, then apply language-based heuristics, and, when all else fails, resort to probabilistic machine learning techniques.
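
For instance, high-precision rules for structured entity types such as money and dates can be written as regular expressions, as in the following sketch (the patterns are deliberately simplified and would need broadening for production use).

    import re

    # Simplified high-precision patterns for two structured entity types.
    MONEY = re.compile(r"\$\d+(?:,\d{3})*(?:\.\d{2})?")
    DATE = re.compile(r"(?:January|February|March|April|May|June|July|August|"
                      r"September|October|November|December),?\s+\d{4}")

    text = "she paid around $370 a ticket for her niece's graduation in June, 2020"
    print(MONEY.findall(text))  # ['$370']
    print(DATE.findall(text))   # ['June, 2020']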

3.3 Challenges in Entity Extraction

Although entity extractors can achieve precision and recall above 90% on specific tasks, maintaining strong performance across diverse domains remains challenging. Differences in vocabulary, writing style, and underlying assumptions often cause methods that work well in one setting to degrade in another. In this section, we examine several key challenges encountered in entity extraction.

When labeling entities with semantic classes, ambiguity often arises. For example, the name Louis Vuitton can refer to a person, an organization, or a commercial product. Resolving such ambiguities typically requires analyzing surrounding context, such as nearby words, sentence structure, and the broader topic of the text.

Machine learning models typically require large amounts of labeled training data. In practice, such data may be unavailable or significantly incomplete. Training models on incomplete or biased datasets can substantially degrade their performance and limit their ability to generalize.

A related variation of entity extraction is key phrase identification, which aims to extract salient phrases from text rather than instances of a small, fixed set of entity classes. Because key phrases can belong to many possible categories—or may not fit cleanly into any predefined class—identifying key phrases is more difficult. In addition, key phrases can vary widely in complexity, ranging from highly specific expressions (e.g., duplication of a cell by fission) to very general terms (e.g., attach), making it difficult to design a single technique that performs well in all cases.

Entities can appear in many different surface forms, including synonyms, acronyms, plural forms, and other morphological variations. For example, the organization Louis Vuitton may also be referred to simply as LV in text, while key phrases such as duplication of a cell by fission may appear in shortened or rephrased forms like cell fission or cell duplication. Effective entity extraction therefore requires access to lexical knowledge that captures these variations—knowledge that is often unavailable when working in a new domain. Consequently, lexicon extraction, the task of automatically identifying relevant terms and their variants, becomes an important complementary problem for improving entity extraction performance.

4. Relation Extraction

In this section, we begin with several concrete examples of relation extraction to build intuition for the task. We then provide an overview of common approaches used to extract relations from text, and conclude by discussing the key challenges involved in achieving strong performance.

4.1 Examples of Relation Extraction

Continuing with the text snippet from the previous section, we can extract relations such as Cecilia Love lives in Massachusetts, United Airlines flies from Boston, and United Airlines flies to Sacramento. In a typical relation extraction task, the relevant entities are assumed to have already been identified; relation extraction therefore builds directly on entity extraction. In addition, the set of relations to be extracted (e.g., lives in, flies from, flies to) is usually defined in advance.

A common example of a relation extraction task is extracting information from Wikipedia infoboxes; this information can be used to improve web search results. Wikipedia infoboxes define relationships such as preceded by, succeeded by, children, spouse, etc. Achieving high accuracy on this task is challenging because of numerous corner cases. For example, Larry King has been married multiple times, and the extractor must therefore take into account the time period during which each marriage existed.

Relation extraction is often applied to extract domain-specific relationships. For example, the Unified Medical Language System supports relationships such as causes, treats, disrupts, etc. In addition to standard relations like subclass-of and has part, extracting domain-specific relationships requires careful design and selection of the relationships ahead of time. Some approaches attempt to extract relations without specifying them in advance, but in practice these methods are generally less useful for producing accurate and meaningful results.

4.2 Approaches to Relation Extraction

There are three broad approaches to relation extraction: syntactic patterns, supervised machine learning, and unsupervised machine learning. As discussed earlier, unsupervised machine learning has limited use in practice. Therefore, we primarily consider the use of syntactic patterns and supervised machine learning for relation extraction.

A classical approach to extracting relations relies on syntactic patterns known as Hearst patterns, which are designed to identify specific semantic relationships in text. For example, consider the following sentence:

    The bow lute, such as the Bambara ndang, is plucked and has an individual curved neck for each string.     

Even though we may never have heard of the Bambara ndang, we can still infer that it is a kind of bow lute. More generally, we can identify syntactic patterns that are strong indicators of the subclass-of relationship. The following five syntactic patterns for identifying subclass-of relations are well established and have been shown to be highly effective in practice.

Pattern name     Example
such as          ... works by authors such as Herrick, Goldsmith, and Shakespeare ...
or other         ... bruises, wounds, broken bones, or other injuries ...
and other        ... temples, treasuries, and other civic buildings ...
including        ... all common law countries including Canada and England ...
especially       ... most European countries especially France, England, and Spain ...
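
To illustrate, the "such as" pattern can be approximated with a regular expression, as in the sketch below (a crude version: real systems match noun-phrase chunks rather than single words, so this is illustrative only).

    import re

    # Crude "X such as Y1, Y2, and Y3" matcher: group(1) is the superclass,
    # group(2) the enumeration of its subclasses.
    SUCH_AS = re.compile(r"(\w+)\s+such as\s+([^.]+)")

    def extract_subclass_of(sentence):
        match = SUCH_AS.search(sentence)
        if not match:
            return []
        superclass = match.group(1)
        # Split the enumeration "A, B, and C" into individual items.
        items = re.split(r",\s*(?:and\s+)?|\s+and\s+", match.group(2))
        return [(item.strip(), "subclass-of", superclass) for item in items if item.strip()]

    print(extract_subclass_of("works by authors such as Herrick, Goldsmith, and Shakespeare"))
    # [('Herrick', 'subclass-of', 'authors'), ('Goldsmith', 'subclass-of', 'authors'),
    #  ('Shakespeare', 'subclass-of', 'authors')]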

New syntactic patterns for extracting relationships can be discovered using a bootstrapping approach. First, we collect a small set of entity pairs for which the relationship is already known. We then search for sentences in a corpus where these pairs co-occur. By identifying common structures across such sentences, we can define new patterns, which are then tested against the corpus to extract additional entity pairs.

A well-known algorithm for this approach is Dual Iterative Pattern Relation Expansion (DIPRE). Consider the task of extracting (author, title) relationships. We start with a small set of known pairs and locate all sentences containing them. From these sentences, we generate new syntactic patterns. The algorithm recursively uses the newly discovered patterns to identify additional entity pairs, which in turn are used to generate further patterns.

For example, given the seed pair (William Shakespeare, The Comedy of Errors) and the following sentences,

  • The Comedy of Errors, by William Shakespeare, was ...
  • The Comedy of Errors, by William Shakespeare, is ...
  • The Comedy of Errors, one of William Shakespeare's earliest attempts ...
  • The Comedy of Errors, one of William Shakespeare's most ...
we can derive the following patterns:
  • ?x , by ?y,
  • ?x , one of ?y's
Using the newly derived patterns, the extraction process continues recursively.
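
The following Python sketch walks through one such iteration on the (author, title) example (a toy simplification: the full DIPRE algorithm also records surrounding context such as URL and prefix/suffix strings, and applies checks to avoid pattern drift).

    import re

    # One toy DIPRE iteration for the (author, title) relationship.
    seed_pairs = {("William Shakespeare", "The Comedy of Errors")}
    corpus = [
        "The Comedy of Errors, by William Shakespeare, was written early.",
        "Hamlet, by William Shakespeare, is a tragedy.",
    ]

    def derive_patterns(pairs, sentences):
        """Keep the text between a known title and author as a candidate pattern
        (assuming, for simplicity, that the title precedes the author)."""
        patterns = set()
        for author, title in pairs:
            for s in sentences:
                if author in s and title in s:
                    patterns.add(s[s.index(title) + len(title):s.index(author)])
        return patterns

    def apply_patterns(patterns, sentences):
        """Use each derived pattern to harvest new (author, title) pairs."""
        pairs = set()
        for p in patterns:
            regex = re.compile(r"^(.+?)" + re.escape(p) + r"(.+?),")
            for s in sentences:
                m = regex.search(s)
                if m:
                    pairs.add((m.group(2), m.group(1)))
        return pairs

    patterns = derive_patterns(seed_pairs, corpus)  # {', by '}
    print(apply_patterns(patterns, corpus))
    # {('William Shakespeare', 'The Comedy of Errors'),
    #  ('William Shakespeare', 'Hamlet')}  <- one newly discovered pair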

Supervised approaches to relation extraction require large amounts of labeled training data. When such data is available, standard machine learning algorithms can be trained to extract relationships effectively. However, in many domains, obtaining sufficient labeled data is difficult. To address this, weak supervision techniques have become popular. The basic idea of weak supervision is to write several approximate labeling functions that can automatically generate noisy training data. These weak labels are then combined using a probabilistic model to produce a final set of training labels, which can be used to train a supervised relation extraction system.

As an example of a weak labeling function, consider the has part relation. For this relation, it has been difficult to develop reliable syntactic patterns of the sort suggested above. One possible weak labeling function is to first generate a parse tree of the sentence, and then look for two entity nodes connected by a path of length one that contains the verb has or have. For example, consider the sentence: Most prokaryotes have a cell wall located outside the cell membrane. In the parse tree of this sentence, prokaryotes and cell wall are connected through a path of length one labeled have, indicating a has part relationship. For taxonomic (subclass-of) relationships, another weak labeling function is based on entity modifiers: if two entities share the same base word but one includes an additional modifier, this can indicate a taxonomic relationship. For example, eukaryotic cell can be identified as a subclass of cell.
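
Two such weak labeling functions might be sketched as follows (illustrative only: the first approximates the parse-tree test with a surface regular expression, and both will mislabel some individual examples, which is acceptable in the weak supervision setting).

    import re

    HAS_PART, SUBCLASS_OF, ABSTAIN = "has part", "subclass-of", None

    def lf_have_verb(sentence, term1, term2):
        """Guess has part when term1 is linked to term2 by 'has' or 'have'.
        A surface approximation of the parse-tree test described in the text."""
        pattern = rf"{re.escape(term1)}\s+(?:has|have)\s+(?:an?\s+)?{re.escape(term2)}"
        return HAS_PART if re.search(pattern, sentence) else ABSTAIN

    def lf_shared_head(term1, term2):
        """Guess subclass-of when term1 is term2 plus an extra modifier."""
        return SUBCLASS_OF if term1.endswith(" " + term2) else ABSTAIN

    print(lf_have_verb("Most prokaryotes have a cell wall located outside the cell membrane.",
                       "prokaryotes", "cell wall"))   # has part
    print(lf_shared_head("eukaryotic cell", "cell"))  # subclass-of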

To adapt a language model for relation extraction, we modify the input representation of a sentence so that each term of interest is explicitly marked. For example, consider the sentence: All cells have a cell membrane. This sentence has two terms: cells and cell membrane. We indicate the presence of these terms by enclosing the first in the markers [TERM1-START] and [TERM1-END], and the second in [TERM2-START] and [TERM2-END], denoting the start and end of each term. The set of tokens in the resulting sentence will be: ["All", "[TERM1-START]", "cells", "[TERM1-END]", "have", "a", "[TERM2-START]", "cell", "membrane", "[TERM2-END]", "."]. Our training data then consists of sentences paired with the expected relationship between the two marked terms. Once trained on such data, the model's task is to predict the relationship between the two terms indicated in an input sentence. With this simple modification to the input representation, a general-purpose language model can be repurposed for relation extraction.
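
A sketch of this input transformation in Python (term spans are given as (start, end) token indices with an exclusive end, an assumption of this example; the marker names follow the text):

    def mark_terms(tokens, span1, span2):
        """Wrap two token spans, given as (start, end) with an exclusive end,
        in [TERM1-START]/[TERM1-END] and [TERM2-START]/[TERM2-END] markers."""
        out = []
        for i, token in enumerate(tokens):
            if i == span1[0]:
                out.append("[TERM1-START]")
            if i == span2[0]:
                out.append("[TERM2-START]")
            out.append(token)
            if i == span1[1] - 1:
                out.append("[TERM1-END]")
            if i == span2[1] - 1:
                out.append("[TERM2-END]")
        return out

    tokens = ["All", "cells", "have", "a", "cell", "membrane", "."]
    print(mark_terms(tokens, (1, 2), (4, 6)))
    # ['All', '[TERM1-START]', 'cells', '[TERM1-END]', 'have', 'a',
    #  '[TERM2-START]', 'cell', 'membrane', '[TERM2-END]', '.']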

4.3 Challenges in Relation Extraction

A major challenge in relation extraction is obtaining sufficient labeled training data. Manually annotating text with relation labels is time-consuming and expensive, which limits the scale of supervised approaches. Weak supervision offers a promising alternative by allowing models to be trained on data that contain noisy or imperfect labels. In this approach, external knowledge bases such as Wikidata and lexical resources like WordNet can be used to define labeling functions—heuristic rules that automatically assign tentative labels to text. Although these labels may be inaccurate for individual examples, learning algorithms can often aggregate many such weak signals to produce effective models. Designing better and more expressive weak labeling functions, and methods for combining them, remains an active area of research.

In addition to training, an effective workflow is needed to validate the outputs of relation extraction systems. In many cases, this validation can be performed through crowdsourcing, where human annotators check whether extracted relations are correct. To reduce annotation effort, validation can be prioritized for extracted relations with low confidence scores, which are more likely to contain errors. This selective validation process naturally leads to active learning loops, in which human feedback is used to iteratively improve the model. Designing efficient validation workflows and active learning strategies for relation extraction remains an important and active area of research.

5. Summary

In this chapter, we examined the problem of automatically constructing a knowledge graph from text. We focused on two fundamental tasks: entity extraction and relation extraction. Early approaches to both tasks relied on manually defined rules and syntactic patterns derived from linguistic analysis. In contrast, most modern methods build on large pre-trained language models, which are then fine-tuned or adapted to the specific text corpus and extraction task of interest.

For both entity extraction and relation extraction, the most prevalent current approach is to adapt pre-trained language models learned using deep learning techniques. Earlier syntactic and rule-based methods continue to play an important role, particularly in bootstrapping the training data needed for these models through weak or heuristic labeling. Despite these advances, validating the outputs of entity and relation extraction systems at scale remains an important and unresolved challenge.

Entity linking, also known as entity resolution, is another important task in constructing a knowledge graph from text. It involves mapping an entity mention in a document (such as a person, organization, or location) to the corresponding unique node in a knowledge graph. Accurate entity and relation extraction are prerequisites for effective entity linking, since errors at earlier stages propagate downstream. For this reason, entity linking is often considered a more advanced technique. In practice, however, it may or may not be the primary bottleneck in a given application, depending on factors such as the quality of the extracted entities, the ambiguity of entity names, and the goals of the underlying business problem.

6. Further Reading

The discussion on entity and relation extraction in this chapter draws from the information extraction chapter of the NLP textbook by Jurafsky and Martin [Jurafsky & Martin 2025], which is also an excellent source for more in-depth discussion of modern approaches to natural language processing. The weak labeling approach was pioneered in the Snorkel project at Stanford [Ratner et al. 2017]. The methodology described here was used in bootstrapping an ontology graph from a textbook [Chaudhri et al. 2022].

Much recent work focuses on leveraging generative AI models for knowledge graph construction. An overview of efforts in the knowledge engineering community is available in [Shimizu & Hitzler 2025]. The work, however, remains preliminary and aspirational, with ample room for systematic studies [Walker et al. 2024]. A workshop on this topic is being held as part of the International Semantic Web Conference [LLMs4OL 2025].

[Jurafsky & Martin 2025] Jurafsky, D., & Martin, J. H. (2025). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition with Language Models (3rd ed., online draft). Stanford University. Retrieved from https://web.stanford.edu/~jurafsky/slp3/

[Ratner et al. 2017] Ratner, Alexander, Stephen Bach, Henry Ehrenberg, Jason Fries, Sen Wu, and Christopher Ré. "Snorkel: Rapid Training Data Creation with Weak Supervision", Proceedings of the VLDB Endowment, 11, no. 3 (November 1, 2017): 269–282. https://doi.org/10.14778/3157794.3157797.

[Chaudhri et al. 2022] Chaudhri, V. K., Boggess, M., Aung, H. L., Mallick, D. B., Waters, A. C., & Baraniuk, R. E. (2021). A case study in bootstrapping ontology graphs from textbooks. In Proceedings of the 3rd Conference on Automated Knowledge Base Construction (AKBC).

[Shimizu & Hitzler 2025] Shimizu, C., & Hitzler, P. (2025). Accelerating knowledge graph and ontology engineering with large language models. Journal of Web Semantics, 85, 100862. https://doi.org/10.1016/j.websem.2025.100862

[Walker et al. 2024] Walker, J., Koutsiana, E., Nwachukwu, M., Meroño Peñuela, A., & Simperl, E. (2024). The promise and challenge of large language models for knowledge engineering: Insights from a hackathon. In Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems (pp. 1–9). ACM. https://doi.org/10.1145/3613905.3650844

[LLMs4OL 2025] LLMs4OL 2025: The 2nd Large Language Models for Ontology Learning Challenge @ ISWC 2025. (2025). Retrieved December 14, 2025, from https://sites.google.com/view/llms4ol2025

Exercises

Exercise 5.1. Using the concept of a language model, answer the questions below for the following sentence corpus:

  • I love running.
  • Good health can be achieved by those who love running.
  • I love good health.
  • I love those who love running.
(a) What is P(health|good)?
(b) What is P(running|love)?
(c) What is P(love|I)?
(d) What is P(good|love)?
(e) What is P(love|running)?

Exercise 5.2. An important feature used for entity extraction is word shape: it represents the abstract letter pattern of the word by mapping lower-case letters to 'x', upper-case letters to 'X', and digits to 'd', while retaining punctuation. Thus, for example, C.I.A. would map to X.X.X. and IRS-1040 would map to XXX-dddd. In a shorter version of word shape, runs of consecutive identical character types are collapsed. For example, C.I.A. would still map to X.X.X., but IRS-1040 would map to X-d. With these definitions, address the following questions.

(a) What is the shape of the word: Googenheim?
(b) What is the short-shape of the word: Googenheim?
(c) What is the regular expression for the shape of the word Googenheim?
(d) What is the regular expression for the short-shape of the word Googenheim?
(e) Is it true that the short shape of a word is always strictly shorter than its regular shape?

Exercise 5.3. Which of the following may not be a good feature for learning entity extraction?

(a) word shape
(b) part of speech
(c) presence in Gazetteer
(d) presence in Wikipedia
(e) number of characters

Exercise 5.4. Given the following sentence corpus, and the seed (Sacramento,California), what patterns will be extracted by the DIPRE algorithm?

  • The bill was signed in Sacramento, California.
  • Sacramento is the capital of California.
  • Sacramento is the capital of California, and its sixth largest city.
  • California's Sacramento is home to the state legislature, but not the state supreme court.
  • California Governor Jerry Brown signed the bill in Sacramento.
(a) in ?x, ?y
(b) ?x is the capital of ?y
(c) ?y's ?x
(d) ?x's ?y
(e) ?y Governor * in ?x

Exercise 5.5. Which of the following would be a candidate for a weak labeling function to extract the parthood relationship between entities, i.e., an entity A has part entity B.
(a) ?xs have ?y
(b) ?x includes ?y
(c) ?x contains ?y
(d) ?y surrounds ?x
(e) ?x causes ?y

Exercise 5.6. The goal of this project is to extend the companies database created in Exercise 4.7, and automatically populate it using information extracted from earnings call transcripts. For this project, you can use the publicly available earnings call transcripts from StruxData: https://struxdata.github.io/. This project will give you hands-on experience with text processing, information extraction, and knowledge graph construction. Proceed in the following steps.

  1. Select companies and time window
    • Choose a small number of well-known companies to focus on.
    • Select a suitable time window so that you can work with multiple transcripts for the same company across different quarters.
  2. Extract relationships between companies
    • For each earnings call transcript, identify mentions of other companies and determine their relationship to the company hosting the call.
    • If your current database schema does not include these relationships, extend it appropriately (e.g., "competitor," "partner," "customer").
  3. Identify financial headwinds and tailwinds
    • Process each transcript to extract factors that may positively (tailwinds) or negatively (headwinds) affect the company’s stock performance.
    • Consider categorizing these factors (e.g., market trends, regulatory changes, supply chain issues) to make the database more informative.
  4. Optional analysis
    • Although this analysis is intended for future chapters, once your database is populated you can begin exploring patterns across companies and quarters, or visualizing the relationships and factors affecting stock performance.

Exercise 5.7. The goal of this project is to bootstrap a knowledge graph from a textbook, following the approach described by Chaudhri et al. (2022). You may use any textbook from the OpenStax textbook library. While the previous work used BERT for information extraction, in this project you will use state-of-the-art language models such as Gemini or ChatGPT.

Additionally, you should design a suitable scheme to evaluate the quality of your extraction. Consider metrics such as precision, recall, or coverage of entities and relations, and think about how to validate the extracted knowledge in a systematic way.