java - Most efficient data structure for storing an alphabetically ordered word list -

my program read in paragraph of words (stored in text file). need following:

print out list of words (alphabetical order). each word, print frequency count (how many times word appears in entire paragraph) , line numbers in word appears on (does not need ordered). if word appears on line multiple times, line number not need stored twice (the frequency count of word still updated)
display list of words ordered frequent least frequent.
the user input specific word. if word found, print out frequency count.

limitations: cannot use collections class , cannot store data multiple times. (e.g. reading in words paragraph , storing them set , arraylist)

coding won't hard, can't figure out efficient implementation since data size few paragraphs wikipedia article or something. here's idea now:

have word class. word class contain methods return word's frequency count , lines in word appears on (and other relevant data).
the paragraph stored in text file. program read data line line. split line array , read in words 1 one.
as words being read in text file, put words sort of structure. if structure not contain word, create new word object.
if structure contains word, update frequency counter word.
- i have int record down line number. these line numbers updated accordingly.

this incomplete, i'm thinking now. whole 'word' class may unnecessary, too.

first, create class holds data occurrences , row numbers (along word). class implement comparable interface, providing easy comparisons based on word frequencies:

public class wordoccurrence implements comparable<wordoccurrence> {      private final string word;     private int totalcount = 0;     private set<integer> linenumbers = new treeset<>();      public wordoccurrence(string word, int firstlinenumber) {         this.word = word;         addoccurrence(firstlinenumber);     }      public final void addoccurrence(int linenumber) {         totalcount++;         linenumbers.add(linenumber);     }      @override     public int compareto(wordoccurrence o) {         return totalcount - o.totalcount;     }      @override     public string tostring() {         stringbuilder linenumberinfo = new stringbuilder("[");         (int line : linenumbers) {             if (linenumberinfo.length() > 1) {                 linenumberinfo.append(", ");             }             linenumberinfo.append(line);         }         linenumberinfo.append("]");         return word + ", occurences: " + totalcount + ", on rows "                 + linenumberinfo.tostring();     } }

when reading words file, it's useful return data in map<string, wordoccurrence>, mapping words wordoccurrences. using treemap, you'll alphabetical ordering "for free". also, may want remove punctuation lines (e.g. using regexp \\p{p}) , ignore case of words:

public treemap<string, wordoccurrence> countoccurrences(string filepath)         throws ioexception {     treemap<string, wordoccurrence> words = new treemap<>();      file file = new file(filepath);     bufferedreader reader = new bufferedreader(new inputstreamreader(             new fileinputstream(file)));     string line = null;     int linenumber = 0;      while ((line = reader.readline()) != null) {         // remove punctuation , normalize lower-case         line = line.replaceall("\\p{p}", "").tolowercase();         linenumber++;         string[] tokens = line.split("\\s+");         (string token : tokens) {              if (words.containskey(token)) {                 words.get(token).addoccurrence(linenumber);             } else {                 words.put(token, new wordoccurrence(token, linenumber));             }         }     }      return words; }

displaying occurrences in alphabetical order using above code simple as

for (map.entry<string, wordoccurrence> entry :          countoccurrences("path/to/file").entryset()) {         system.out.println(entry.getvalue()); }

if cannot use collections.sort() (and comparator<wordoccurrence>) sorting occurrences, need write sorting yourself. should it:

public static void displayinorderofoccurrence(         map<string, wordoccurrence> words) {      list<wordoccurrence> orderedbyoccurrence = new arraylist<>();      // sort     (map.entry<string, wordoccurrence> entry : words.entryset()) {         wordoccurrence wo = entry.getvalue();          // initialize list on first round         if (orderedbyoccurrence.isempty()) {             orderedbyoccurrence.add(wo);         } else {              (int = 0; < orderedbyoccurrence.size(); i++) {                 if (wo.compareto(orderedbyoccurrence.get(i)) > 0) {                     orderedbyoccurrence.add(i, wo);                     break;                 } else if (i == orderedbyoccurrence.size() - 1) {                     orderedbyoccurrence.add(wo);                     break;                 }             }         }     }      // display     (wordoccurrence wo : orderedbyoccurence) {         system.out.println(wo);     } }

running above code using following test data:

 potato; orange. banana; apple, apple; potato. potato.

will produce output:

 apple, occurrences: 2, on rows [2] banana, occurrences: 1, on rows [2] orange, occurrences: 1, on rows [1] potato, occurrences: 3, on rows [1, 2, 3]  potato, occurrences: 3, on rows [1, 2, 3] apple, occurrences: 2, on rows [2] banana, occurrences: 1, on rows [2] orange, occurrences: 1, on rows [1]

Search This Blog

Lix

java - Most efficient data structure for storing an alphabetically ordered word list -

Comments

Post a Comment

Popular posts from this blog

c++ - Difference between pre and post decrement in recursive function argument -

php - Nothing but 'run(); ' when browsing to my local project, how do I fix this? -

php - How can I echo out this array? -