Knowledge Graphs

Exercise 4.2 - TF/IDF

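A commonly used approach to account for the importance of words is a measure known as TF/IDF. Term frequency (TF) denotes the number of times a term occurs in a document. Inverse document frequency (IDF) is inversely related to the number of documents that contain the term, so it decreases as the term appears in more documents; a common choice is the logarithm of the total number of documents divided by the number of documents containing the term. The TF/IDF score is calculated by taking the product of TF and IDF. Use your intuition to decide whether each of the following statements is true or false.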

a. The higher the TF/IDF score of a word, the rarer it is.
b. In a general news corpus, the TF/IDF for the word Apple is likely to be higher than the TF/IDF for the word Corporation.
c. Common words such as stop words will have a high TF/IDF.
d. If a document contains words with high TF/IDF, it is likely to be ranked higher by search engines.
e. The concept of TF/IDF is not limited to words, and can also be applied to sequences of characters, for example, character bigrams.
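
To check intuition against concrete numbers, the definition above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes TF is the raw count of a term in a document and IDF is log(N / df), where N is the number of documents and df is the number of documents containing the term. The function name `tf_idf`, the whitespace tokenization, and the three toy documents are all hypothetical choices made for this example; other TF/IDF variants use different weighting or smoothing.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document.

    Assumption: tf is the raw term count and idf is log(N / df).
    """
    n_docs = len(docs)
    # Naive tokenization: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return scores


# Toy corpus, invented for illustration only.
docs = [
    "the apple fell from the tree",
    "the corporation bought the company",
    "the market rose",
]
scores = tf_idf(docs)
```

Inspecting `scores[0]` for terms that appear in every document versus terms that appear in only one is a quick way to test the statements above. Replacing the whitespace split with a function that yields character bigrams would apply the same computation to character sequences.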