Exercises
Download the Plain Text UTF-8 file “The Fairy Tales of Charles Perrault by Charles Perrault” from https://www.gutenberg.org/ebooks/29021 . In the following this file is referred to as the text.
Question 1: Functional Programming
Create the following functions, and any other auxiliary function you consider necessary, using the Functional Programming style:
• read_lines_in_text(fname): read in memory a text file with filename fname and output an iterable over non empty text lines.
• define a collection of forbidden words that contains at least the following strings: 'Illustration', '*', '#', '_facing_', '_page'.
• define a function to filter out lines that contain any of the words in the collection of forbidden words.
• define a function to determine if a line is the title of a story, where a line is a title if it starts and ends with the underscore character.
• split_text(start_definition_func, text_iterator): use the function start_definition_func, passed as the first argument, to split a stream of lines into stories and output a list of stories, where each story is a list of lines. Make sure that the first line of each story is its title.
• use function composition to create a function that reads a text file, filters lines and outputs a list of stories.
• define a collection of forbidden titles that contains at least the following strings: '_The Moral_', '_Another_'.
• define a function to filter out stories whose title is one of the entries in the collection of forbidden titles.
• define a function to filter out stories whose size, expressed as the number of lines, is not within the two user-defined values min_num_lines and max_num_lines.
• use function composition to create a function that filters out stories that have a forbidden title or do not satisfy the size constraints.
• transform_into_sentences(story): given a story, i.e. a list of strings, convert the text to lowercase and transform the story into a list of sentences, where a sentence is a string delimited by the full stop symbol '.'.
• use the pipeline function to create a list of stories from the fairy tale text such that lines containing forbidden words are excluded, no story has a forbidden title, every story has between 10 and 200 lines, and each story is organized as a list of sentences.
[40 marks]
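As a starting point, the steps above could be sketched as follows. This is only one possible reading of the specification, not a model answer. The helper names compose, filter_lines, filter_titles, make_size_filter and make_pipeline are hypothetical and do not appear in the exercise; lines that occur before the first title are assumed to be discarded, and because a title line contains no full stop, it ends up merged into the first sentence of its story.

```python
from functools import partial, reduce

FORBIDDEN_WORDS = ("Illustration", "*", "#", "_facing_", "_page")
FORBIDDEN_TITLES = ("_The Moral_", "_Another_")


def compose(*funcs):
    """Hypothetical helper: compose(f, g)(x) == g(f(x)) (left to right)."""
    return lambda x: reduce(lambda acc, f: f(acc), funcs, x)


def read_lines_in_text(fname):
    """Read the whole file and return an iterator over stripped, non-empty lines."""
    with open(fname, encoding="utf-8") as handle:
        lines = handle.read().splitlines()
    return filter(None, map(str.strip, lines))


def filter_lines(lines):
    """Keep only the lines that contain none of the forbidden words."""
    return filter(lambda line: not any(w in line for w in FORBIDDEN_WORDS), lines)


def is_title(line):
    """A title starts and ends with an underscore."""
    return len(line) > 1 and line.startswith("_") and line.endswith("_")


def split_text(start_definition_func, text_iterator):
    """Fold the stream of lines into a list of stories (each a list of lines)."""
    def step(stories, line):
        if start_definition_func(line):
            return stories + [[line]]          # a title opens a new story
        if not stories:
            return stories                     # assumption: drop lines before the first title
        return stories[:-1] + [stories[-1] + [line]]
    return reduce(step, text_iterator, [])


def filter_titles(stories):
    """Drop the stories whose first line is a forbidden title."""
    return [s for s in stories if s[0] not in FORBIDDEN_TITLES]


def make_size_filter(min_num_lines, max_num_lines):
    """Return a function that keeps stories whose length is within the bounds."""
    return lambda stories: [s for s in stories
                            if min_num_lines <= len(s) <= max_num_lines]


def transform_into_sentences(story):
    """Lowercase the story and split it on full stops, dropping empty pieces."""
    text = " ".join(story).lower()
    return [s.strip() for s in text.split(".") if s.strip()]


def make_pipeline(min_num_lines=10, max_num_lines=200):
    """Hypothetical helper: file name -> list of stories as lists of sentences."""
    return compose(
        read_lines_in_text,
        filter_lines,
        partial(split_text, is_title),
        filter_titles,
        make_size_filter(min_num_lines, max_num_lines),
        lambda stories: list(map(transform_into_sentences, stories)),
    )
```

Under these assumptions, make_pipeline()(fname) would be called with the path of the downloaded text. Each stage takes and returns plain iterables or lists, so any stage can be tested on a short in-memory list of lines before running it on the whole file.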
Question 2: Numpy
Create the following functions, and any other auxiliary function you consider necessary, using the functionalities offered by the Numpy library:
• code(item, nbits=10): convert a tuple of strings into an integer in the range $[0, 2^b]$, where $b$ is the number of bits. You may use the built-in function hash(object).
• vectorize_line(line, k=3, nbits=10): convert a string into a 1 dimensional numpy array of size $2^b$. The conversion should be performed as follows: extract all possible subsequences of k words from the input string; convert each k-word tuple into an integer using the previous function code; use the resulting integer p as the position index; count the number of occurrences of the resulting integer in the input line; this count is the value in position p of the returned array.
• vectorize_story(story, k=3, nbits=10): convert a story, which is a list of n strings, into a 2 dimensional numpy array of size $n \times 2^b$ using vectorize_line.
• convert all stories in the text into a list of 2 dimensional numpy arrays and assign it to the variable mtxs.
[35 marks]
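The vectorization described above might be sketched as follows. This is a sketch under stated assumptions, not a reference solution: "subsequences of k words" is read as contiguous, whitespace-separated word windows; the hash is reduced modulo $2^b$ so that every code is a valid index into an array of size $2^b$; and k_word_tuples is a hypothetical helper. Note that Python randomizes the hash of strings per process unless PYTHONHASHSEED is set, so the positions (but not the counts) can differ between runs.

```python
import numpy as np


def code(item, nbits=10):
    """Map a tuple of strings to an integer index in 0 .. 2**nbits - 1."""
    # Python's % always returns a non-negative result for a positive modulus.
    return hash(item) % (2 ** nbits)


def k_word_tuples(line, k=3):
    """Hypothetical helper: all contiguous windows of k words in the line."""
    words = line.split()
    return [tuple(words[i:i + k]) for i in range(len(words) - k + 1)]


def vectorize_line(line, k=3, nbits=10):
    """Count how many k-word tuples of the line fall on each position."""
    positions = np.array(
        [code(item, nbits) for item in k_word_tuples(line, k)],
        dtype=np.int64,
    )
    # bincount counts the occurrences of each index; minlength pads to 2**nbits.
    return np.bincount(positions, minlength=2 ** nbits)


def vectorize_story(story, k=3, nbits=10):
    """Stack one vector per line into an array of shape (n, 2**nbits)."""
    return np.array([vectorize_line(line, k, nbits) for line in story])
```

With a list of stories named, say, stories (an assumed name), mtxs could then be built as [vectorize_story(s) for s in stories]. Different tuples can collide on the same position; a larger nbits makes collisions rarer at the cost of wider arrays.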
Question 3: Data processing and visualization

Create the following functions, and any other auxiliary function you consider necessary, using when needed the libraries Matplotlib and Numpy:
• pairwise_distance(vec1, vec2): compute the Euclidean distance between two 1 dimensional Numpy arrays $v$, $u$ as $d(v, u) = \sqrt{\sum_i (v_i - u_i)^2}$, where $v_i$ indicates the $i$-th entry of the array.
• pairwise_distances(vec, mtx): compute a 1 dimensional Numpy array containing the distance of vec from each row of mtx, which is a 2 dimensional numpy array of size $n \times 2^b$.
• mean(mtx): compute the mean of a 2 dimensional numpy array of size $n \times 2^b$ as a 1 dimensional numpy array of size $2^b$.
• plot_hists(dist_vecs): plot the histograms of a list of 1 dimensional numpy arrays (one histogram per array) on the same figure.
• plot_comparison(mtxs): plot the comparison between each non-redundant and distinct pair of stories; a single comparison between 2 stories $s_i$, $s_j$ is the output of plot_hists for the following 3 arrays: array 1) the distance of each sentence in story $s_i$ from the average sentence in story $s_i$, array 2) the distance of each sentence in story $s_j$ from the average sentence in story $s_j$ and array 3) the distance of each sentence in story $s_j$ from the average sentence in story $s_i$.
• execute the plot_comparison function for stories vectorized with parameters k=2 and nbits=15 .
[20 marks]
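A minimal sketch of these functions is given below, under a few assumptions that are not part of the exercise: plot_hists takes an optional labels argument and returns the figure, the histograms use 20 semi-transparent bins, and "non-redundant and distinct pairs" is read as all index pairs with i < j. Matplotlib is imported inside the plotting functions only so that the numeric helpers can be tried without it.

```python
import itertools

import numpy as np


def pairwise_distance(vec1, vec2):
    """Euclidean distance between two 1-D arrays."""
    return np.sqrt(np.sum((vec1 - vec2) ** 2))


def pairwise_distances(vec, mtx):
    """Distance of vec from each row of mtx; broadcasting subtracts vec row-wise."""
    return np.sqrt(np.sum((mtx - vec) ** 2, axis=1))


def mean(mtx):
    """Column-wise mean: the 'average sentence' of a story."""
    return mtx.mean(axis=0)


def plot_hists(dist_vecs, labels=None):
    """Draw one histogram per array on the same axes and return the figure."""
    import matplotlib.pyplot as plt

    labels = labels or [f"array {i + 1}" for i in range(len(dist_vecs))]
    fig, ax = plt.subplots()
    for vec, label in zip(dist_vecs, labels):
        ax.hist(vec, bins=20, alpha=0.5, label=label)  # bins/alpha are arbitrary choices
    ax.set_xlabel("Euclidean distance")
    ax.set_ylabel("number of sentences")
    ax.legend()
    return fig


def plot_comparison(mtxs):
    """One figure per pair (i, j) with i < j, comparing three distance arrays."""
    import matplotlib.pyplot as plt

    for (i, m_i), (j, m_j) in itertools.combinations(enumerate(mtxs), 2):
        dists = [
            pairwise_distances(mean(m_i), m_i),  # s_i sentences vs. mean of s_i
            pairwise_distances(mean(m_j), m_j),  # s_j sentences vs. mean of s_j
            pairwise_distances(mean(m_i), m_j),  # s_j sentences vs. mean of s_i
        ]
        labels = [f"s{i} vs mean(s{i})", f"s{j} vs mean(s{j})", f"s{j} vs mean(s{i})"]
        fig = plot_hists(dists, labels)
        fig.suptitle(f"stories {i} and {j}")
    plt.show()
```

For n stories this produces n(n − 1)/2 figures, so it may be worth trying it on a small slice of mtxs first.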
Question 4: Analysis
Describe, using up to a maximum of 200 words, what information can be deduced from the plots obtained by plot_comparison . Discuss also what changes when the comparison is run on the same stories but vectorized with parameters k=1 and k=3 and nbits=15 .
[5 marks]
