java - Most efficient data structure for storing an alphabetically ordered word list -
my program read in paragraph of words (stored in text file). need following:
- print out list of words (alphabetical order). each word, print frequency count (how many times word appears in entire paragraph) , line numbers in word appears on (does not need ordered). if word appears on line multiple times, line number not need stored twice (the frequency count of word still updated)
- display list of words ordered frequent least frequent.
- the user input specific word. if word found, print out frequency count.
limitations: cannot use collections
class , cannot store data multiple times. (e.g. reading in words paragraph , storing them set , arraylist)
coding won't hard, can't figure out efficient implementation since data size few paragraphs wikipedia article or something. here's idea now:
- have word class. word class contain methods return word's frequency count , lines in word appears on (and other relevant data).
- the paragraph stored in text file. program read data line line. split line array , read in words 1 one.
- as words being read in text file, put words sort of structure. if structure not contain word, create new word object.
- if structure contains word, update frequency counter word.
- i have
int
record down line number. these line numbers updated accordingly.
- i have
this incomplete, i'm thinking now. whole 'word' class may unnecessary, too.
first, create class holds data occurrences , row numbers (along word). class implement comparable
interface, providing easy comparisons based on word frequencies:
public class wordoccurrence implements comparable<wordoccurrence> { private final string word; private int totalcount = 0; private set<integer> linenumbers = new treeset<>(); public wordoccurrence(string word, int firstlinenumber) { this.word = word; addoccurrence(firstlinenumber); } public final void addoccurrence(int linenumber) { totalcount++; linenumbers.add(linenumber); } @override public int compareto(wordoccurrence o) { return totalcount - o.totalcount; } @override public string tostring() { stringbuilder linenumberinfo = new stringbuilder("["); (int line : linenumbers) { if (linenumberinfo.length() > 1) { linenumberinfo.append(", "); } linenumberinfo.append(line); } linenumberinfo.append("]"); return word + ", occurences: " + totalcount + ", on rows " + linenumberinfo.tostring(); } }
when reading words file, it's useful return data in map<string, wordoccurrence>
, mapping words wordoccurrence
s. using treemap
, you'll alphabetical ordering "for free". also, may want remove punctuation lines (e.g. using regexp \\p{p}
) , ignore case of words:
public treemap<string, wordoccurrence> countoccurrences(string filepath) throws ioexception { treemap<string, wordoccurrence> words = new treemap<>(); file file = new file(filepath); bufferedreader reader = new bufferedreader(new inputstreamreader( new fileinputstream(file))); string line = null; int linenumber = 0; while ((line = reader.readline()) != null) { // remove punctuation , normalize lower-case line = line.replaceall("\\p{p}", "").tolowercase(); linenumber++; string[] tokens = line.split("\\s+"); (string token : tokens) { if (words.containskey(token)) { words.get(token).addoccurrence(linenumber); } else { words.put(token, new wordoccurrence(token, linenumber)); } } } return words; }
displaying occurrences in alphabetical order using above code simple as
for (map.entry<string, wordoccurrence> entry : countoccurrences("path/to/file").entryset()) { system.out.println(entry.getvalue()); }
if cannot use collections.sort()
(and comparator<wordoccurrence>
) sorting occurrences, need write sorting yourself. should it:
public static void displayinorderofoccurrence( map<string, wordoccurrence> words) { list<wordoccurrence> orderedbyoccurrence = new arraylist<>(); // sort (map.entry<string, wordoccurrence> entry : words.entryset()) { wordoccurrence wo = entry.getvalue(); // initialize list on first round if (orderedbyoccurrence.isempty()) { orderedbyoccurrence.add(wo); } else { (int = 0; < orderedbyoccurrence.size(); i++) { if (wo.compareto(orderedbyoccurrence.get(i)) > 0) { orderedbyoccurrence.add(i, wo); break; } else if (i == orderedbyoccurrence.size() - 1) { orderedbyoccurrence.add(wo); break; } } } } // display (wordoccurrence wo : orderedbyoccurence) { system.out.println(wo); } }
running above code using following test data:
potato; orange. banana; apple, apple; potato. potato.
will produce output:
apple, occurrences: 2, on rows [2] banana, occurrences: 1, on rows [2] orange, occurrences: 1, on rows [1] potato, occurrences: 3, on rows [1, 2, 3] potato, occurrences: 3, on rows [1, 2, 3] apple, occurrences: 2, on rows [2] banana, occurrences: 1, on rows [2] orange, occurrences: 1, on rows [1]
Comments
Post a Comment