C# - Excluding words from dictionary


I am reading through documents and splitting them into words, storing each word in a dictionary. How can I exclude certain words (like "the", "a", "an")?

This is the function:

    private void Splitter(string[] file)
    {
        try
        {
            tempDict = file
                .SelectMany(i => File.ReadAllLines(i)
                    .SelectMany(line => line.Split(new[] { ' ', ',', '.', '?', '!', }, StringSplitOptions.RemoveEmptyEntries))
                    .AsParallel()
                    .Distinct())
                .GroupBy(word => word)
                .ToDictionary(g => g.Key, g => g.Count());
        }
        catch (Exception ex)
        {
            Ex(ex);
        }
    }

Also, in this scenario, where is the right place to add a .ToLower() call to make all the words from the file lowercase? I was thinking of something like this before the (tempDict = file...) assignment:

    file.ToList().ConvertAll(d => d.ToLower());

Do you want to filter out stop words?

    HashSet<string> stopWords = new HashSet<string> {
        "a", "an", "the"
    };

    ...

    tempDict = file
        .SelectMany(i => File.ReadAllLines(i)
            .SelectMany(line => line.Split(new[] { ' ', ',', '.', '?', '!', }, StringSplitOptions.RemoveEmptyEntries))
            .AsParallel()
            .Select(word => word.ToLower())            // <- lower case
            .Where(word => !stopWords.Contains(word))  // <- no stop words
            .Distinct())
        .GroupBy(word => word)
        .ToDictionary(g => g.Key, g => g.Count());

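However, this code is only a partial solution: proper names like Berlin will be converted to lower case (berlin), acronyms like KISS (Keep It Simple, Stupid) will become just kiss, and some numbers will be incorrect (splitting on '.' and ',' breaks a number such as 3.14 into 3 and 14).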
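
One possible way around the casing problem, as a rough sketch rather than a definitive fix: leave the words untouched and make the comparisons case-insensitive instead of calling ToLower(). The class name WordCounter and the method CountWords below are made up for illustration, and the method takes lines already in memory instead of file paths so it can be tried without any files. Unlike the query above, it counts every occurrence of a word (not the number of files containing it), and it does nothing about the number-splitting issue.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the original question or answer.
public static class WordCounter
{
    // Case-insensitive set: "The", "THE" and "the" are all treated as stop words.
    private static readonly HashSet<string> StopWords =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "a", "an", "the" };

    private static readonly char[] Separators = { ' ', ',', '.', '?', '!' };

    // Counts how often each word occurs in the given lines, skipping stop words.
    // Words are grouped case-insensitively; each key keeps the casing of the
    // first occurrence, so "Berlin" and "KISS" are not lowercased.
    public static Dictionary<string, int> CountWords(IEnumerable<string> lines)
    {
        return lines
            .SelectMany(line => line.Split(Separators, StringSplitOptions.RemoveEmptyEntries))
            .Where(word => !StopWords.Contains(word))
            .GroupBy(word => word, StringComparer.OrdinalIgnoreCase)
            .ToDictionary(g => g.Key, g => g.Count(), StringComparer.OrdinalIgnoreCase);
    }
}
```

For example, WordCounter.CountWords(new[] { "The KISS principle", "Berlin, the city. berlin!" }) should give four entries: KISS, principle and city with a count of 1, and Berlin with a count of 2, with no entry for "the".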

