Big data analytics at North Carolina State University
Uploader Comments (IBMetinfo)
All Comments (3)
How does IBM Content Analytics recognize whether something is a company name or just a random name (considering we might deal with other languages)?
Also, do you have a link to a video, or a step-by-step explanation, of creating the keywords? Is IBM LanguageWare required in order to create those keywords? Thanks.
We first used a dictionary of several hundred company names, then we added natural language processing rules to identify names not in a predefined list. For example, words starting with capital letters followed by "Inc.", "Corp", "Company", "LLC", "S.A.", "plc", "Ltd.", etc. There might be some false positives, but refining the rules can reduce them.
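To make the two-stage idea above concrete, here is a rough Python sketch: a dictionary lookup first, then a rule that catches capitalized words followed by a legal suffix. This is not ICA or LanguageWare code, just plain regular expressions used as a stand-in; the `KNOWN_COMPANIES` entries and the `find_company_names` function are hypothetical names chosen for illustration.

```python
import re

# Hypothetical seed dictionary (an assumption for this sketch);
# the real list described above held several hundred names.
KNOWN_COMPANIES = {"IBM", "Example Widgets"}

# Legal suffixes quoted in the reply above.
SUFFIXES = ["Inc.", "Corp", "Company", "LLC", "S.A.", "plc", "Ltd."]

# Longest suffixes first so the alternation prefers the fuller match.
_SUFFIX_ALT = "|".join(
    re.escape(s) for s in sorted(SUFFIXES, key=len, reverse=True)
)

# Rule: one or more capitalized words, then a legal suffix that is not
# immediately followed by another word character (so "Corp" does not
# match inside "Corporation").
SUFFIX_RULE = re.compile(
    r"\b((?:[A-Z][\w&'-]*\s+)+(?:" + _SUFFIX_ALT + r"))(?!\w)"
)


def find_company_names(text):
    """Return a sorted list of candidate company names found in text."""
    found = set()

    # Stage 1: exact matches against the predefined dictionary.
    for name in KNOWN_COMPANIES:
        if re.search(r"(?<!\w)" + re.escape(name) + r"(?!\w)", text):
            found.add(name)

    # Stage 2: rule-based matches for names not in the dictionary.
    for match in SUFFIX_RULE.finditer(text):
        found.add(match.group(1))

    return sorted(found)


if __name__ == "__main__":
    sample = "We met with Globex Corp and Initech LLC, then IBM."
    print(find_company_names(sample))
```

A rule this simple will also pick up a capitalized sentence opener that happens to precede a name (for example "Yesterday Acme Inc."), which is the kind of false positive that further rule refinement is meant to reduce.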
The ICA tool can recognize which language a document is written in, so if needed, special language-specific rules could be defined.
IBMetinfo 8 months ago