Word Cloud for Pride and Prejudice
Your program will:
- Read in a text file, one word at a time.
- Clean up each word by throwing away any non-letter characters, such as " . , ! and ?, and turning it to lowercase.
- If the word is not a stop word, keep track of how many times it appears in the file.
- Print out the number of unique words you found and the top 20 most frequent words, along with the number of times each appears in the file.
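The reading and cleaning steps above could be sketched roughly as follows. This is only one possible approach, not the required solution. The class and method names (`WordCleaningSketch`, `clean`, `readWords`) are made up for illustration, and the sketch reads from a string so it runs on its own; a real program would wrap a file in the `Scanner` instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper class; the names are illustrative, not part of the assignment.
class WordCleaningSketch {
    // Keep only letter characters and lowercase them.
    // Note: Character.isLetter also accepts non-English letters.
    static String clean(String token) {
        StringBuilder sb = new StringBuilder();
        for (char c : token.toCharArray()) {
            if (Character.isLetter(c)) {
                sb.append(Character.toLowerCase(c));
            }
        }
        return sb.toString();
    }

    // Read whitespace-separated tokens one at a time and keep the cleaned,
    // non-empty ones (a token such as "--" cleans to the empty string).
    static List<String> readWords(Scanner in) {
        List<String> words = new ArrayList<>();
        while (in.hasNext()) {
            String word = clean(in.next());
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // A Scanner over a String stands in for a Scanner over the text file.
        Scanner in = new Scanner("\"It is a truth, universally acknowledged!\"");
        System.out.println(readWords(in));
    }
}
```

With this kind of cleaning, "Mr." becomes "mr" and a contraction such as "don't" becomes "dont", since the apostrophe is dropped along with other punctuation.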
Here is the output of the program for Pride and Prejudice:
There are 6898 unique words.
mr 783
elizabeth 594
such 393
darcy 371
mrs 343
much 328
more 326
bennet 293
miss 283
one 266
jane 263
bingley 257
know 239
before 229
herself 224
though 221
never 220
soon 216
well 212
think 211
And here is the output for Sense and Sensibility:
There are 7531 unique words.
elinor 616
mrs 525
marianne 488
more 403
such 359
one 317
much 287
herself 249
time 237
now 230
know 228
dashwood 224
though 213
sister 213
edward 210
miss 209
well 209
think 205
mother 200
before 198
The list of stop words you will use is:
private static final String[] stopWordsList = { "a","able","about","after","all","almost","also","am","among","an", "and","any","are","as","at","be","because","been","but","by","can", "cannot","could","dear","did","do","does","either","else","ever", "every","for","from","get","got","had","has","have","he","her","hers", "him","his","how","however","i","if","in","into","is","it","its","just", "least","let","like","likely","may","me","might","most","must","my", "neither","no","nor","not","of","off","often","on","only","or","other", "our","own","rather","said","say","says","she","should","since","so", "some","than","that","the","their","them","then","there","these","they", "this","tis","to","too","twas","us","very","wants","was","we","were","what", "when","where","which","while","who","whom","why","will","with", "would","yet","you","your"};
Your program will use a HashMap to keep track of the counts. You might also want to use a HashSet.
This homework is due Monday, 9 April @ noon in the dropbox.cse.sc.edu.
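As a rough illustration of how a HashMap and a HashSet might fit together here, the sketch below copies a stop-word array into a `HashSet` for fast lookups, counts the remaining words in a `HashMap`, and sorts the entries to find the most frequent ones. Several things are assumptions rather than requirements: the names `WordCountSketch`, `countWords`, and `topN`, the shortened sample stop list, the use of `Map.merge` (which needs Java 8 or later), and breaking ties alphabetically, which the assignment does not specify.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper class; the names are illustrative, not part of the assignment.
class WordCountSketch {
    // Shortened stop list for illustration only; the real program uses the full list.
    static final String[] SAMPLE_STOP_WORDS = { "a", "and", "is", "it", "the" };

    // Count every word that is not a stop word.
    static Map<String, Integer> countWords(List<String> words, Set<String> stopWords) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            if (!stopWords.contains(word)) {
                counts.merge(word, 1, Integer::sum); // insert 1, or add 1 to the old count
            }
        }
        return counts;
    }

    // Return up to n entries, highest count first.
    // Assumption: ties are broken alphabetically.
    static List<Map.Entry<String, Integer>> topN(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((x, y) -> {
            int byCount = y.getValue().compareTo(x.getValue());
            return byCount != 0 ? byCount : x.getKey().compareTo(y.getKey());
        });
        return entries.subList(0, Math.min(n, entries.size()));
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList(SAMPLE_STOP_WORDS));
        List<String> words = Arrays.asList(
                "the", "cat", "and", "the", "hat", "cat", "is", "it", "cat");
        Map<String, Integer> counts = countWords(words, stopWords);
        System.out.println("There are " + counts.size() + " unique words.");
        for (Map.Entry<String, Integer> e : topN(counts, 20)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

The number of unique words is simply the size of the map, because each distinct non-stop word is stored as exactly one key.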