A commonly used approach to account for the importance
of words is a measure known as TF/IDF. Term frequency (TF) denotes
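the number of times a term occurs in a document. Inverse Document
Frequency (IDF) measures how rare a term is across the corpus: it is
commonly computed as log(N/df), where N is the total number of documents
and df is the number of documents containing the term.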
The TF/IDF score is calculated by taking the product of TF and
IDF. Use your intuition to decide whether each of the following
statements is true or false.
a.
The higher the TF/IDF score of a word, the rarer it is.
b.
In a general news corpus, the TF/IDF for the word "Apple" is likely to be higher than the TF/IDF for the word "Corporation".
c.
Common words such as stop words will have a high TF/IDF.
d.
If a document contains words with high TF/IDF, it is likely to be ranked higher by search engines.
e.
The concept of TF/IDF is not limited to words, and can also be applied to sequences of characters, for example, character bigrams.
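
To experiment with these statements, the TF and IDF definitions from the question can be sketched in a few lines of Python. This is only one possible formulation: it assumes raw-count TF and IDF = log(N/df), and other weighting variants exist. The names `tf_idf` and `char_bigrams` and the three-document toy corpus are hypothetical, chosen purely for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document.

    docs is a list of token lists; tokens may be words or character n-grams.
    Assumption: TF is the raw count and IDF is log(N / df).
    """
    n_docs = len(docs)
    # df[t] = number of documents that contain term t at least once
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this document
        scores.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return scores


def char_bigrams(text):
    """Split a string into overlapping two-character tokens."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


# Hypothetical toy corpus, already tokenized into words.
corpus = [
    "the apple fell from the tree".split(),
    "the corporation sold the tree".split(),
    "the market rose".split(),
]

word_scores = tf_idf(corpus)
bigram_scores = tf_idf([char_bigrams(" ".join(doc)) for doc in corpus])
```

Because `tf_idf` only needs lists of tokens, the same function handles words and character bigrams. Comparing entries such as `word_scores[0]["the"]` and `word_scores[0]["apple"]` is one way to check your intuition about how frequency within a document and spread across the corpus interact.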