Python: How to optimize comparison between two large sets?
Greetings! I'm new here, and I've got a little problem trying to optimize part of my code.
I'm reading 2 files:

corpus.txt -----> contains the text (about 1,000,000 words)
stop_words.txt -----> contains the stop list (about 4,000 words)
I must compare each word of the corpus against every word in the stop list, because I want the text without stop words. So I have 1,000,000 * 4,000 comparisons with the code below:
import nltk

fich = open("corpus.txt", "r")
text = fich.readlines()
fich1 = open("stop_words.txt", "r")
stop = fich1.read()
tokens_stop = nltk.wordpunct_tokenize(stop)
tokens_stop = sorted(set(tokens_stop))
for line in text:
    tokens_rm = nltk.wordpunct_tokenize(line)
    z = [val for val in tokens_rm if val not in tokens_stop]
    for elem in z:
        print(elem)
My question: can this be done differently? Is there a data structure that would optimize it?
You can create a set of stop words, then for every word in the text check whether it is in the set.
Actually, it looks like you are already using a set, though I don't know why you are sorting it. Calling sorted() on the set turns it back into a list, so each `val not in tokens_stop` test scans the list instead of doing a constant-time set lookup.
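A minimal sketch of the set-based filtering (the nltk tokenization is replaced with a plain `split()` here so the example is self-contained; the sample words are hypothetical):

```python
# Build the stop-word set once; membership tests on a set are O(1) on
# average, so filtering ~1,000,000 tokens costs ~1,000,000 lookups
# instead of 1,000,000 * 4,000 list comparisons.
stop_words = {"the", "a", "and", "of"}  # in practice: set(tokens_stop)

def remove_stop_words(tokens, stop_set):
    """Return the tokens that are not in stop_set."""
    return [tok for tok in tokens if tok not in stop_set]

tokens = "the cat and the dog of paris".split()
print(remove_stop_words(tokens, stop_words))  # ['cat', 'dog', 'paris']
```

The key point is simply to drop the `sorted()` call: keep `tokens_stop` as a set and the existing list comprehension becomes fast without any other change.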