In this assignment, you will write a program that analyzes the sentiment (positive or negative) of a sentence based on the words it contains by implementing methods that use the List, Set, and Map interfaces from the Java Collections Framework.

Learning Objectives

In completing this assignment, you will:
- Become familiar with the methods in the java.util.List, java.util.Set, and java.util.Map interfaces
- Continue working with abstract data types by using only the interface of an implementation
- Apply what you have learned about how lists, sets, and maps work
- Get a better understanding of the difference between lists and sets
- Demonstrate that you can use lists, sets, and maps to solve real-world problems
- Gain experience writing Java code that reads an input file

Background

Sentiment analysis is a task from the field of computational linguistics that seeks to determine the general attitude of a given piece of text. For instance, we would like to have a program that could look at the text “This assignment was joyful and a pleasure” and realize that it was a positive statement while “It made me want to pull out my hair” is negative. For more on sentiment analysis in the context of this assignment see the supplemental document provided along with these directions.

Definitions for this assignment:

Valid Line: (in the context of reading the input corpus) a line starting with an optional sign character (- or +) and a single digit representing a valid score (integers from -2 to 2, inclusive), followed by a single whitespace character, followed by a statement.
Statement: a string that may be empty and may contain 0 or more whitespace-separated tokens, each of which may be a word.
Sentence: An object of type Sentence contains a text String that is the textual statement, as well as an integer sentiment score.
Token: All of the non-whitespace characters between whitespace characters or at the beginning or end of a sentence/statement.
Word: A token starting with one letter.
Any additional characters may be letters or any other non-whitespace character.
Letter: any character for which the method java.lang.Character.isLetter returns true. https://docs.oracle.com/javase/8/docs/api/java/lang/Character.html#isLetter-char-
Whitespace: any character for which Character.isWhitespace returns true. https://docs.oracle.com/javase/8/docs/api/java/lang/Character.html#isWhitespace-char-
ObservationTally: An accumulator object of the ObservationTally class (ObservationTally.java) that holds the word’s accumulated context scores from all of its appearances which have been analyzed so far. An ObservationTally’s count is the total number of times the word has been seen so far, and its total is the sum of the sentiment scores from every occurrence of the word seen so far.

Getting Started

Download the starter code files: Sentence.java, ObservationTally.java, and Analyzer.java. All tasks for the assignment should be written in Analyzer.java; do not modify the other two (grading will use the original versions, not yours).

You may also download reviews.txt, which can be used for testing your readFile method. It should be placed in the base directory of your project, unlike the .java files, which should be placed in the source directory of your project. (Later, Analyzer.java should be uploaded to your submit folder on Codio.)

Activities

General note: for each activity the method should return a sensible output even if the input is invalid. For methods that return a collection of some sort, the return for bad input should just be an empty collection. For methods with numerical return types, the default return value should be 0. Bad items in the input, such as null strings or non-word tokens in non-null strings, should be ignored, and the method should continue processing subsequent valid items. When processing words, they should be converted to lower case to simplify case-insensitive comparison. Tokens and words should not be altered in any other way.

1.
Implement Analyzer.readFile

This method takes as input a (nullable) filename, reads the given file from the filesystem, and returns a non-null List of Sentence objects parsed from the valid lines of the file in the order in which they are encountered. Invalid lines should be ignored and not entered into the output list. If the input filename is null or the file cannot be opened for reading, this method should return an empty List. For the return object you are free to select from any class that implements java.util.List. For an explanation of how to read a file line by line see: https://docs.oracle.com/javase/tutorial/essential/io/file.html#textfiles

Valid lines are defined above, and in regular expression syntax the exact definition of the line is:
^([+-]?[0-2])\s(.*)$
The documentation for this regular expression syntax may be found here: https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html

Note: You are not required to use regular expressions (nor are you prohibited from using them) for this assignment; the expression here is provided as a formal exact definition. If you have any questions about what is or is not valid, please test with that expression before asking. You can test regular expressions on regex101.com and in Java using jshell.

Note: the first whitespace character on the line is the separator between the score and the text. That character should not be considered part of either the score or the text. String.split with a limit of 2 (i.e. line.split("\\s", 2);) is one easy way to separate the line into those two components. https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#split-java.lang.String-int-

For a valid line such as:
2 I am learning a lot .
the score field of the Sentence object should be set to 2, and the text field should be: “I am learning a lot .” Evaluation will be based on exact String matching; do not alter the statement text.

2.
Implement Analyzer.allOccurrences

This method takes as input a (nullable) List of (nullable) Sentence objects and outputs a (non-null) List containing every word, converted to lowercase, encountered in the input List in the order in which it was encountered. Null Sentence objects and invalid words (i.e. tokens not starting with a letter) should both be ignored. If the input parameter is null, the output should be an empty List. You may select any implementation of java.util.List for the output.

3. Implement Analyzer.uniqueWords

This method is identical to Analyzer.allOccurrences, except that the output should be in the form of a Set (i.e. without duplicates). You may select any implementation of java.util.Set for the output.

4. Implement Analyzer.wordTallies

This method takes as input a (nullable) List of (nullable) Sentence objects and outputs a non-null Map, whose keys are the (valid, lowercased) words in each input Sentence and whose values are the final context scores, represented by ObservationTally, for that word. If the input List of Sentences is null or empty, the method should return an empty Map. If a Sentence object in the input List is null or if the text of a Sentence is null, this method should ignore it and continue processing the remaining Sentences. You may return any implementation of java.util.Map.

Note: if a word appears in multiple sentences or multiple times in the same sentence, its corresponding ObservationTally object should accrue multiple scores, one for each occurrence.

Note: While your output keys should be lowercased, do not assume that the strings in the Sentence objects have already been converted to lowercase. Do not make any other alterations, assumptions, or interpretations about the tokens or words.

Hint: if you use String.split to tokenize the text of the Sentence, keep the pattern as simple as possible. Consult the Java documentation for help with this: https://docs.oracle.com/javase/8/docs/api/java/lang/String.html

5.
Implement Analyzer.calculateScores

This method takes a (nullable) Map from (non-null) word to (non-null) ObservationTally and outputs a non-null Map with the original word as key and the word’s average sentiment score as value. If the input Map is null, return an empty Map. You may return any implementation of java.util.Map. For this method, use the ObservationTally’s calculateScore method to get the average sentiment score for that word from its previously recorded context scores, and then place the text of the word (as key) and calculated score (as value) in the new Map.

6. Implement Analyzer.calculateSentenceScore

This method takes as input a (nullable) Map from (non-null) words to (non-null) sentiment scores as well as an arbitrary statement text and, using the sentiment scores for each word in this Map, outputs the sentiment score for the given statement text, which is the arithmetic mean score of all its (valid) words.

Note: each occurrence of a word counts towards the mean (i.e. do not filter out duplicates).

Note: you will need to tokenize/split/filter the sentence to its valid words, as you did previously. Your calculateSentenceScore method must be case insensitive. Recall that, to ensure case insensitivity, you normalized all words by converting them to lowercase. Accordingly, for this method, you may assume that the keys given in the input Map are all lowercase words.

If the input Map is null or empty, or if the input sentence is null or empty or does not contain any valid words, this method should return 0.

General Hints

Documentation about the methods in the List, Set, and Map interfaces is available as part of the Java API docs:
https://docs.oracle.com/javase/8/docs/api/java/util/List.html
https://docs.oracle.com/javase/8/docs/api/java/util/Set.html
https://docs.oracle.com/javase/8/docs/api/java/util/Map.html
Refer to this documentation if you need help understanding the methods that are available to you.
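
As a rough illustration of the Valid Line and Word definitions that the activities rely on, here is a minimal sketch. The class name DefinitionsSketch and the helpers parseLine, isWord, and words are hypothetical and not part of the starter code. The sketch assumes that the regex class \s is an acceptable stand-in for Character.isWhitespace, and it returns plain strings rather than Sentence objects, since the Sentence constructor is not described in these directions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration; not part of the starter code.
class DefinitionsSketch {
    // Optional sign, one digit from 0 to 2, one whitespace character, then the statement.
    private static final Pattern VALID_LINE = Pattern.compile("^([+-]?[0-2])\\s(.*)$");

    /** Returns {score, text} for a valid line, or null for an invalid or null line. */
    static String[] parseLine(String line) {
        if (line == null) {
            return null;
        }
        Matcher m = VALID_LINE.matcher(line);
        return m.matches() ? new String[] { m.group(1), m.group(2) } : null;
    }

    /** A word is a token whose first character is a letter. */
    static boolean isWord(String token) {
        return token != null && !token.isEmpty() && Character.isLetter(token.charAt(0));
    }

    /** Lowercased words of a statement, in order of appearance, duplicates kept. */
    static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        if (text == null) {
            return out;
        }
        // Assumption: splitting on \s+ approximates the Whitespace definition.
        for (String token : text.split("\\s+")) {
            if (isWord(token)) {
                out.add(token.toLowerCase());
            }
        }
        return out;
    }
}
```

For example, parseLine("2 I am learning a lot .") yields the pair "2" and "I am learning a lot .", and words("It's GREAT , 2nd time !") yields [it's, great, time], because "," and "!" and "2nd" do not start with a letter. The score string can be converted with Integer.parseInt, which accepts a leading + or - sign.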
In implementing this program, we recommend that you implement and test each of the six methods individually. Each method is required to tolerate partially or entirely invalid input and return valid output. We also recommend you test the entire program using the main method in Analyzer.java. Be sure to specify the name of the input file as the argument to main.

Before You Submit

Please be sure that:
- your Analyzer class is in the default package, i.e. there is no “package” declaration at the top of the source code
- your Analyzer class compiles and you have not changed the signatures of any of the six methods you implemented
- you did not add other methods with the same name as the existing methods in Analyzer.java
- you have not created any additional .java files and have not made any changes to Sentence.java or ObservationTally.java (you do not need to submit these files)
- any new methods you added have unique names that do not conflict with the existing methods, even if the input arguments are different
- you filled in and signed the academic integrity statement at the top of Analyzer.java

Optional Challenges

Use try-with-resources to simplify file reading

Read the Oracle tutorial on try-with-resources: https://docs.oracle.com/javase/tutorial/essential/exceptions/tryResourceClose.html. Try to make use of this Java construct in your code, but make sure you understand it well. It’s easy to misuse, leading to unexpected behavior in your code.

Use lambdas and the Stream API to simplify your code

If you’re not familiar with lambdas or streams (sometimes called sequences), this challenge will have a steep learning curve, but it will also introduce you to a completely different model of programming that is sometimes used in modern Java, and very often used in programming languages like Scala (which also runs on the JVM), JavaScript, and others.
The high-level goal of this challenge is to rewrite your code into a style that uses almost no loops or other imperative constructs by making each of the six required methods into a pipeline of aggregate operations over a stream. When done consistently and with reasonably good style, the resulting code will likely have no more than 2 loops across the entire file. (These might be actual explicit for / while loops, or they could be .forEach calls.)

Start here with learning about lambda expressions: https://dev.java/learn/lambda-expressions/
Then move on to learning about the Stream API: https://dev.java/learn/the-stream-api/
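
To give a feel for both optional challenges, here is one possible shape, offered as a sketch under stated assumptions rather than a reference solution. The class name StreamSketch and the methods validLines and words are hypothetical. The sketch returns raw lines and lowercase strings instead of Sentence objects, and it assumes that the regex class \s is an acceptable stand-in for Character.isWhitespace.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical class for illustration; not part of the starter code.
class StreamSketch {
    /** Valid lines of the file in order; an empty list if the file cannot be read. */
    static List<String> validLines(String filename) {
        if (filename == null) {
            return Collections.emptyList();
        }
        // Files.lines keeps the file open, so try-with-resources closes the stream for us.
        try (Stream<String> lines = Files.lines(Paths.get(filename))) {
            return lines
                    .filter(line -> line.matches("[+-]?[0-2]\\s.*"))
                    .collect(Collectors.toList());
        } catch (IOException | RuntimeException e) {
            // Covers a missing file, an invalid path, and read errors raised mid-stream.
            return Collections.emptyList();
        }
    }

    /** Lowercased words of a statement, built as a stream pipeline instead of a loop. */
    static List<String> words(String text) {
        if (text == null) {
            return Collections.emptyList();
        }
        return Arrays.stream(text.split("\\s+"))
                .filter(token -> !token.isEmpty() && Character.isLetter(token.charAt(0)))
                .map(String::toLowerCase)
                .collect(Collectors.toList());
    }
}
```

Note that the stream is consumed inside the try block. Returning the Stream itself from validLines would hand the caller a stream whose file has already been closed, which is one of the ways try-with-resources can be misused.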