Python - Identifying keys with multiple values in a hash table
I am a beginner in Python scripting.
I have a CSV file that has 5 columns and over 1000 rows. I am attaching a screenshot to give an idea of how the file looks (I have included 4 rows; the real file has over 1000 rows). The task I am trying to achieve is this:
I need to write an output CSV file that contains the rows of the original CSV file, based on the following conditions:
- Each "number" field (column 1) is supposed to have only 1 "name" field associated with it. If it has more than 1 name field associated with it, the script must throw an error (or display a message next to the number in output.csv).
- If a number field has only 1 name associated with it, print the entire row.
The data in the CSV file is in the format below:
number  name    choices
11234   abcdef  a1b6n5
11234   abcdef  a2b6c4
11234   efghjk  a4f2
11235   abcdef  a3f5h7
11236   mnopqr  f3d4d5
So the expected output should look like this. A flag and a message should be displayed when a "number" has more than 1 "name" associated with it. If a "name" is associated with more than 1 "number", it should not be flagged (for example, 11235 has the same name as 11234, but it is not flagged).
number  name    choices  flag  message
11234                    1     more than 1 name
11234
11234
11235   abcdef  a3f5h7
11236   mnopqr  f3d4d5
I understand this can be implemented with a hash table, where the number serves as the key and the name serves as the value. If a key has more than 1 value, I can set a flag and print an error message accordingly.
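A minimal sketch of that hash-table idea, assuming the file is comma-separated with a header row `number,name,choices` (the sample rows are inlined with `io.StringIO` so the snippet runs without a file). Each number maps to a *set* of names, so a name repeated for the same number (11234 abcdef appears twice) counts only once and only different names push a number over the limit.

```python
import csv
import io
from collections import defaultdict

# Sample data from the question, inlined so the sketch runs without a file.
# Assumption: the real file is comma-separated with this header row.
SAMPLE = (
    "number,name,choices\n"
    "11234,abcdef,a1b6n5\n"
    "11234,abcdef,a2b6c4\n"
    "11234,efghjk,a4f2\n"
    "11235,abcdef,a3f5h7\n"
    "11236,mnopqr,f3d4d5\n"
)

# number (key) -> set of distinct names (values) seen for that number
names_by_number = defaultdict(set)
for row in csv.DictReader(io.StringIO(SAMPLE)):
    names_by_number[row["number"]].add(row["name"])

# A number is flagged when more than one distinct name is associated with it.
flagged = sorted(n for n, names in names_by_number.items() if len(names) > 1)
print(flagged)  # ['11234']
```

To build output.csv from this, read the rows a second time and write each one unchanged unless its number is in `flagged`.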
But how do I get started with this? That is, how do I implement it in Python?
Any help is appreciated.
Thanks!
Here are a few concepts you should learn and understand first:
Importing and exporting CSV: https://docs.python.org/2/library/csv.html
Counter: https://docs.python.org/2/library/collections.html#collections.Counter
or
defaultdict(int) for counting: https://docs.python.org/2/library/collections.html#collections.defaultdict
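A short sketch tying the first two concepts together: reading the sample rows with `csv.reader`, counting how many rows use each number with `Counter`, and writing the rows back out with `csv.writer`. It assumes comma-separated data with a header row and uses `io.StringIO` in place of real files; the extra `count` column is only for illustration.

```python
import csv
import io
from collections import Counter

# Sample data from the question; assumed comma-separated with a header row.
SAMPLE = (
    "number,name,choices\n"
    "11234,abcdef,a1b6n5\n"
    "11234,abcdef,a2b6c4\n"
    "11234,efghjk,a4f2\n"
    "11235,abcdef,a3f5h7\n"
    "11236,mnopqr,f3d4d5\n"
)

# Importing: csv.reader yields each line as a list of strings.
# With a real file, use open("yourfile.csv", newline="") instead of StringIO.
reader = csv.reader(io.StringIO(SAMPLE))
header = next(reader)  # ['number', 'name', 'choices']
rows = list(reader)

# Counter tallies how many rows use each number (column 1).
row_counts = Counter(row[0] for row in rows)

# Exporting: csv.writer turns lists back into CSV lines.
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(header + ["count"])
for row in rows:
    writer.writerow(row + [row_counts[row[0]]])
print(out.getvalue())
```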
It sounds like you need column 1 as the key of a dictionary. If you're trying to count how many times each key appears (that's not clear from the question), you can use names = defaultdict(int); names[key] += 1
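The `defaultdict(int)` counting pattern on its own, using the (number, name) pairs from the sample data typed in by hand. A missing key starts at `int()`, which is 0, so `+= 1` works on the first occurrence without a membership check.

```python
from collections import defaultdict

# (number, name) pairs from the sample data in the question
pairs = [
    ("11234", "abcdef"),
    ("11234", "abcdef"),
    ("11234", "efghjk"),
    ("11235", "abcdef"),
    ("11236", "mnopqr"),
]

names = defaultdict(int)  # missing keys start at int() == 0
for number, name in pairs:
    names[number] += 1  # count how many rows use this number

print(dict(names))  # {'11234': 3, '11235': 1, '11236': 1}
```

Note that this counts rows, not distinct names: 11234 gets 3 even though only 2 different names are attached to it.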
If you want to remove duplicates without counting, or crash if there's a duplicate, here's what you can do:
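    import csv

    mydict = {}
    with open('yourfile.csv', mode='r', newline='') as infile:
        reader = csv.reader(infile)
        for row in reader:
            key = row[0]
            if key in mydict:
                # duplicate key: skip the row here, or replace `continue`
                # with `raise ValueError(...)` to stop at the first duplicate
                print("bad key, found: %s. ignoring row: %s" % (key, row))
                continue
            mydict[key] = row  # first time this key is seen

    # write to a different file; opening the input with mode='w' would truncate it
    with open('output.csv', mode='w', newline='') as outfile:
        writer = csv.writer(outfile)
        writer.writerows(mydict.values())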
If this doesn't work, please give a sample input and the expected output. Either way, this should get you started. Also, be patient: you'll learn by doing things wrong and figuring out why they are wrong. Good luck!
====
Update: you have a few choices. The easiest for a beginner is to build 2 dictionaries and output them.
Use key = row[1].
If the key is already in the dictionary, remove it (del mydict[key]) and add it to the other dict: multiple_dict = {}; multiple_dict[key] = [number, None, None, flag, message]
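    mydict = {}         # key -> the only row seen so far for that key
    multiple_dict = {}  # key -> [number, None, None, flag, message]

    def proc_entry(row):
        key = row[1]  # column holding "number"; use row[0] if it is the first column
        if key in mydict:
            # second occurrence: move the key to multiple_dict and set the flag
            multiple_dict[key] = [key, None, None, 1, "more than 1 name"]
            del mydict[key]
        elif key in multiple_dict:
            # key duplicated again: increase the flag (index 3)
            multiple_dict[key][3] += 1
        else:
            # first occurrence: save the existing data
            mydict[key] = row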
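To also cover the "output them" step, here is a minimal, self-contained sketch of the two-dictionary idea end to end. Assumptions: "number" is the first column (row[0]) as in the sample data, the rows are already read with the header excluded, `split_rows` is a hypothetical helper name, and each flagged number is written once with blank name/choices columns rather than repeated as in the question's expected output.

```python
import csv
import io

# Rows from the sample data in the question, header excluded.
# Assumption: "number" is the first column (row[0]).
ROWS = [
    ["11234", "abcdef", "a1b6n5"],
    ["11234", "abcdef", "a2b6c4"],
    ["11234", "efghjk", "a4f2"],
    ["11235", "abcdef", "a3f5h7"],
    ["11236", "mnopqr", "f3d4d5"],
]


def split_rows(rows):
    """Hypothetical helper: sort rows into single and repeated numbers."""
    single = {}    # number -> the only row seen for it
    multiple = {}  # number -> [number, "", "", flag, message]
    for row in rows:
        key = row[0]
        if key in single:
            # second occurrence: move the number to the "multiple" dict
            multiple[key] = [key, "", "", 1, "more than 1 name"]
            del single[key]
        elif key in multiple:
            multiple[key][3] += 1  # third or later occurrence
        else:
            single[key] = row
    return single, multiple


single, multiple = split_rows(ROWS)

# Output both dicts: flagged numbers first, then the rows kept as-is.
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["number", "name", "choices", "flag", "message"])
writer.writerows(multiple.values())
writer.writerows(single.values())
print(out.getvalue())
```

Like proc_entry, this flags any repeated number, even when the name is identical each time (11234 abcdef appears twice); to follow the question's rule strictly, compare the set of distinct names per number instead.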
At this point the code is getting complicated enough that you should use things like number, name, value = row
and start splitting the code into functions. You should test those functions with known input to see if the output is what you expect, i.e. pre-load "mydict", call the processing function, and check how it worked. Even better, learn to write simple unit tests :).
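A sketch of such a unit test with the standard `unittest` module. To make pre-loading easy, this hypothetical variant of proc_entry takes its two dictionaries as arguments instead of using module-level globals, and it keys on the first column as in the sample data; both are assumptions, not the original function.

```python
import unittest


def proc_entry(row, mydict, multiple_dict):
    """Hypothetical variant of proc_entry that receives its dicts as arguments."""
    # tuple unpacking also fails loudly if a row does not have exactly 3 fields
    number, name, value = row
    if number in mydict:
        multiple_dict[number] = [number, None, None, 1, "more than 1 name"]
        del mydict[number]
    elif number in multiple_dict:
        multiple_dict[number][3] += 1
    else:
        mydict[number] = row


class ProcEntryTest(unittest.TestCase):
    def test_first_row_is_kept(self):
        mydict, multiple = {}, {}
        proc_entry(["11235", "abcdef", "a3f5h7"], mydict, multiple)
        self.assertEqual(mydict, {"11235": ["11235", "abcdef", "a3f5h7"]})
        self.assertEqual(multiple, {})

    def test_repeated_number_is_flagged(self):
        # pre-load mydict as if 11234 had already been seen once
        mydict = {"11234": ["11234", "abcdef", "a1b6n5"]}
        multiple = {}
        proc_entry(["11234", "efghjk", "a4f2"], mydict, multiple)
        self.assertNotIn("11234", mydict)
        self.assertEqual(multiple["11234"][3], 1)


if __name__ == "__main__":
    # exit=False keeps the interpreter running after the tests finish
    unittest.main(argv=["proc_entry_test"], exit=False)
```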
While I could write it for you, that's not the spirit of StackOverflow. If you have more questions, you might want to split them into precise questions that haven't been answered already. Everything mentioned above can be found on StackOverflow with a bit of practice. Knowing how to find the solution goes a long way in the art of programming! Have fun ...or hire a programmer if it isn't fun for you!