list - Python: How to optimize comparison between two large sets?


Hello! I'm new here, and I've got a little problem trying to optimize part of my code.

I'm reading 2 files:

corpus.txt -----> contains text (of 1.000.000 words)

stop_words.txt -----> contains stop_list (of 4000 words)

I must compare each word of the corpus with every word in the stop_list, because I want to have the text without stop words. So I have 1.000.000*4000 comparisons to do with the code below:

    import nltk

    fich = open("corpus.txt", "r")
    text = fich.readlines()

    fich1 = open("stop_words.txt", "r")
    stop = fich1.read()

    tokens_stop = nltk.wordpunct_tokenize(stop)
    tokens_stop = sorted(set(tokens_stop))

    for line in text:
        tokens_rm = nltk.wordpunct_tokenize(line)
        z = [val for val in tokens_rm if val not in tokens_stop]
        for i in z:
            print(i)

My question: is there a way to do this differently? Is there a structure that would optimize it?

You can create a set of stop_words, then for every word in your text see if it is in the set.

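Actually, it looks like you are already using a set, though I don't know why you are sorting it. sorted() returns a list, so each "not in" test becomes a linear scan over the stop words instead of a hash lookup.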
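
A minimal sketch of the set-based approach, assuming a simple regex tokenizer in place of nltk.wordpunct_tokenize so that it runs without NLTK installed. The helper names (tokenize, load_stop_words, remove_stop_words) and the sample strings are made up for illustration; in the real code the text would come from corpus.txt and stop_words.txt.

```python
import re

# Assumption: this regex stands in for nltk.wordpunct_tokenize
# (runs of word characters, or runs of punctuation).
TOKEN_RE = re.compile(r"\w+|[^\w\s]+")


def tokenize(text):
    """Split text into word and punctuation tokens."""
    return TOKEN_RE.findall(text)


def load_stop_words(stop_text):
    """Build the stop list as a set and keep it a set (no sorted())."""
    return set(tokenize(stop_text))


def remove_stop_words(lines, stop_words):
    """Yield, for each line, the tokens that are not stop words."""
    for line in lines:
        # Membership in a set is a hash lookup, not a scan of the stop list.
        yield [tok for tok in tokenize(line) if tok not in stop_words]


# Hypothetical sample data standing in for the two files.
stop_words = load_stop_words("the a of\nand")
sample_lines = ["the cat and the hat", "a bag of words"]
filtered = list(remove_stop_words(sample_lines, stop_words))

for tokens in filtered:
    for tok in tokens:
        print(tok)
```

With the stop words held in a set, each lookup takes roughly constant time on average, so the work grows with the number of corpus tokens rather than with corpus size times stop-list size.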

