An important feature used for entity extraction is
word shape: it represents the abstract letter pattern of the
word by mapping lower-case letters to x, upper-case letters to X, digits
to d, and retaining punctuation. Thus, for example, C.I.A. would map
to X.X.X. and IRS-1040 would map to XXX-dddd. In a shorter version of
word shape, consecutive characters of the same type are collapsed into one. For example,
C.I.A. would still map to X.X.X., but IRS-1040 would map to X-d. With
these definitions, address the following questions.
a. What is the shape of the word: Googenheim?
b. What is the short shape of the word: Googenheim?
c. What is the regular expression for the shape of the word Googenheim?
d. What is the regular expression for the short shape of the word Googenheim?
e. Is it true that the short shape is always strictly shorter than the regular shape of a word?
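
As a way to check answers, the two definitions above can be sketched in Python. This is a minimal illustration, not a reference implementation: the function names word_shape and short_word_shape are hypothetical, and the sketch assumes that only runs of x, X, and d are collapsed in the short shape, while punctuation is always kept as-is (the text does not say how repeated punctuation should be handled).

```python
import re


def word_shape(word: str) -> str:
    """Map lower-case letters to x, upper-case to X, digits to d; keep the rest."""
    out = []
    for ch in word:
        if ch.islower():
            out.append("x")
        elif ch.isupper():
            out.append("X")
        elif ch.isdigit():
            out.append("d")
        else:
            # Punctuation and any other character are retained unchanged.
            out.append(ch)
    return "".join(out)


def short_word_shape(word: str) -> str:
    """Collapse each run of identical shape symbols (x, X, d) to one symbol."""
    # Assumption: punctuation runs are NOT collapsed; only x, X, d are.
    return re.sub(r"([xXd])\1+", r"\1", word_shape(word))


print(word_shape("C.I.A."))          # X.X.X.
print(word_shape("IRS-1040"))        # XXX-dddd
print(short_word_shape("C.I.A."))    # X.X.X.
print(short_word_shape("IRS-1040"))  # X-d
```

The printed values reproduce the two examples given in the definitions, which is a quick sanity check before applying the functions to other words.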