Tuesday, April 24, 2012

Membership testing in a large list that has some wildcards

How do I test whether a phrase is in a large (650k) list of phrases when that list includes special categories?



For instance, I want to test if the phrase ["he", "had", "the", "nerve"] is in the list. It is, but under ["he", "had", "!DETERMINER", "nerve"] where "!DETERMINER" is the name of a wordclass that contains several choices (a, an, the). I have about 350 wordclasses and some of them are quite lengthy, so I don't think it would be feasible to enumerate each item in the list that has one (or more) wordclasses.



I would like to use a set of these phrases instead of slowly working my way through a list, but I don't know how to deal with the variability of the wordclasses. Speed is pretty important, since I need to make this comparison hundreds of thousands of times per go.





No comments:

Post a Comment