Return a list of all the unique words in a file in Python

January 15, 2010

Write a function that takes 3 parameters, a filename and 2 substrings, and returns a list of the unique words in the file that contain both substrings (in the order they first appear in the file).

For example, the unique words in the previous sentence that contain the substrings 'th' and 'at' are ['that']. The function should pass the following doctests:

    def words_contain2(filename, substring1, substring2):
        """
        >>> words_contain2('words_tst.txt', 're', 'cu')
        ['recursively', 'recursive.']
        >>> words_contain2('words_tst.txt', 'th', 'at')
        ['that']
        >>> words_contain2('/usr/share/dict/words', 'ng', 'warm')
        ['afterswarming', 'hearthwarming', 'housewarming', 'inswarming', 'swarming', 'unswarming', 'unwarming', 'warming', 'warmonger', 'warmongering']
        """

    if __name__ == '__main__':
        import doctest
        doctest.testmod(verbose=True)

Actually, I've tried this:

    def words_contain2(filename, substring1, substring2):
        files=open(filename,"r")
        files_read=files.read()
        filelist=files_read.split()
        sub1=substring1
        sub2=substring2
        count=0
        result=""
        while count<len(filelist):
            if sub1 in filelist[count] and sub2 in filelist[count]:
                result = result + filelist[count]+","
            count += 1
        print result

But it returns this result: recursively, recursively, recursive, recursively

In my opinion, there are 2 mistakes:

I got a string, not a list, as the result.
The example doctest in the question prints each word in the result list only once. In the file, the same word might appear more than 1 time.

I lost the original file word_tst.txt.
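A minimal sketch of how the loop-based attempt could address both mistakes listed above, assuming Python 3. The name `words_contain2_loop` and the sample text are hypothetical, since the original test file is not available; this is one possible repair, not the expected answer to the exercise.

```python
import os
import tempfile


def words_contain2_loop(filename, substring1, substring2):
    """Return unique words containing both substrings, in first-seen order."""
    with open(filename, "r") as handle:
        words = handle.read().split()

    result = []  # collect into a list rather than a comma-joined string
    for word in words:
        # "word not in result" skips words that were already collected
        if substring1 in word and substring2 in word and word not in result:
            result.append(word)
    return result  # return the list so a doctest can compare it


# Hypothetical sample text standing in for the lost test file.
sample = "recursively that recursively recursive. that recursively"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)

found = words_contain2_loop(tmp.name, "re", "cu")
print(found)  # ['recursively', 'recursive.']
os.remove(tmp.name)
```

Returning the list instead of printing it matters here because a doctest compares the value of the call against the expected output line.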

Filtering the list down to the strings that contain both substrings, without maintaining uniqueness or order, is easy with the filter function:

    not_unique = filter(lambda x: str(x).__contains__(substring1) and str(x).__contains__(substring2), content.split())

But you need to create a unique list with the order maintained:

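    def words_contain2(filename, substring1, substring2):
        file_ = open(filename, "r")
        content = file_.read()
        file_.close()
        # keep every word that contains both substrings (duplicates included)
        not_unique = filter(lambda x: str(x).__contains__(substring1) and str(x).__contains__(substring2), content.split())
        seen = set()
        # seen.add(x) returns None, so a word passes only the first time it appears
        return [x for x in not_unique if not (x in seen or seen.add(x))]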
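As an alternative to the `seen` set, the same order-preserving de-duplication can be sketched with `dict.fromkeys`, assuming Python 3.7 or later, where dictionaries keep insertion order. The function name `words_contain2_ordered` and the sample sentence are hypothetical and used only for illustration.

```python
import os
import tempfile


def words_contain2_ordered(filename, substring1, substring2):
    """Unique words containing both substrings, in order of first appearance."""
    with open(filename, "r") as handle:  # the file is closed automatically
        words = handle.read().split()
    matches = (w for w in words if substring1 in w and substring2 in w)
    # dict keys are unique and keep insertion order (Python 3.7+)
    return list(dict.fromkeys(matches))


# Hypothetical sample sentence written to a temporary file.
sample = "a function that takes words and returns that list at the end"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)

unique_words = words_contain2_ordered(tmp.name, "th", "at")
print(unique_words)  # ['that']
os.remove(tmp.name)
```

Splitting on whitespace only leaves punctuation attached to words, which is consistent with the doctest in the question expecting 'recursive.' with its trailing period.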
