java - Most efficient data structure for storing an alphabetically ordered word list -


my program read in paragraph of words (stored in text file). need following:

  • print out list of words (alphabetical order). each word, print frequency count (how many times word appears in entire paragraph) , line numbers in word appears on (does not need ordered). if word appears on line multiple times, line number not need stored twice (the frequency count of word still updated)
  • display list of words ordered frequent least frequent.
  • the user input specific word. if word found, print out frequency count.

limitations: cannot use collections class , cannot store data multiple times. (e.g. reading in words paragraph , storing them set , arraylist)

coding won't hard, can't figure out efficient implementation since data size few paragraphs wikipedia article or something. here's idea now:

  • have word class. word class contain methods return word's frequency count , lines in word appears on (and other relevant data).
  • the paragraph stored in text file. program read data line line. split line array , read in words 1 one.
  • as words being read in text file, put words sort of structure. if structure not contain word, create new word object.
  • if structure contains word, update frequency counter word.
    • i have int record down line number. these line numbers updated accordingly.

this incomplete, i'm thinking now. whole 'word' class may unnecessary, too.

first, create class holds data occurrences , row numbers (along word). class implement comparable interface, providing easy comparisons based on word frequencies:

public class wordoccurrence implements comparable<wordoccurrence> {      private final string word;     private int totalcount = 0;     private set<integer> linenumbers = new treeset<>();      public wordoccurrence(string word, int firstlinenumber) {         this.word = word;         addoccurrence(firstlinenumber);     }      public final void addoccurrence(int linenumber) {         totalcount++;         linenumbers.add(linenumber);     }      @override     public int compareto(wordoccurrence o) {         return totalcount - o.totalcount;     }      @override     public string tostring() {         stringbuilder linenumberinfo = new stringbuilder("[");         (int line : linenumbers) {             if (linenumberinfo.length() > 1) {                 linenumberinfo.append(", ");             }             linenumberinfo.append(line);         }         linenumberinfo.append("]");         return word + ", occurences: " + totalcount + ", on rows "                 + linenumberinfo.tostring();     } } 

when reading words file, it's useful return data in map<string, wordoccurrence>, mapping words wordoccurrences. using treemap, you'll alphabetical ordering "for free". also, may want remove punctuation lines (e.g. using regexp \\p{p}) , ignore case of words:

public treemap<string, wordoccurrence> countoccurrences(string filepath)         throws ioexception {     treemap<string, wordoccurrence> words = new treemap<>();      file file = new file(filepath);     bufferedreader reader = new bufferedreader(new inputstreamreader(             new fileinputstream(file)));     string line = null;     int linenumber = 0;      while ((line = reader.readline()) != null) {         // remove punctuation , normalize lower-case         line = line.replaceall("\\p{p}", "").tolowercase();         linenumber++;         string[] tokens = line.split("\\s+");         (string token : tokens) {              if (words.containskey(token)) {                 words.get(token).addoccurrence(linenumber);             } else {                 words.put(token, new wordoccurrence(token, linenumber));             }         }     }      return words; } 

displaying occurrences in alphabetical order using above code simple as

for (map.entry<string, wordoccurrence> entry :          countoccurrences("path/to/file").entryset()) {         system.out.println(entry.getvalue()); } 

if cannot use collections.sort() (and comparator<wordoccurrence>) sorting occurrences, need write sorting yourself. should it:

public static void displayinorderofoccurrence(         map<string, wordoccurrence> words) {      list<wordoccurrence> orderedbyoccurrence = new arraylist<>();      // sort     (map.entry<string, wordoccurrence> entry : words.entryset()) {         wordoccurrence wo = entry.getvalue();          // initialize list on first round         if (orderedbyoccurrence.isempty()) {             orderedbyoccurrence.add(wo);         } else {              (int = 0; < orderedbyoccurrence.size(); i++) {                 if (wo.compareto(orderedbyoccurrence.get(i)) > 0) {                     orderedbyoccurrence.add(i, wo);                     break;                 } else if (i == orderedbyoccurrence.size() - 1) {                     orderedbyoccurrence.add(wo);                     break;                 }             }         }     }      // display     (wordoccurrence wo : orderedbyoccurence) {         system.out.println(wo);     } } 

running above code using following test data:

 potato; orange. banana; apple, apple; potato. potato. 

will produce output:

 apple, occurrences: 2, on rows [2] banana, occurrences: 1, on rows [2] orange, occurrences: 1, on rows [1] potato, occurrences: 3, on rows [1, 2, 3]  potato, occurrences: 3, on rows [1, 2, 3] apple, occurrences: 2, on rows [2] banana, occurrences: 1, on rows [2] orange, occurrences: 1, on rows [1] 

Comments

Popular posts from this blog

c++ - Difference between pre and post decrement in recursive function argument -

php - Nothing but 'run(); ' when browsing to my local project, how do I fix this? -

php - How can I echo out this array? -