Natural Language Processing and Text Analytics: Mandatory Assignment 1
Student IDs: 141042, 119577, 141840, 141044
Part 1: NLP Assignment
Question 1: Language Modelling
Creating Language Models
Finding values for λ1, λ2, λ3
Conclusion: The interpolation strategy with the highest probability is the one based purely on the trigram model (lambda set 4).
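To make the role of the three weights concrete, the following is a minimal sketch of linear interpolation over unigram, bigram and trigram estimates. It is not the code used for the assignment: the toy corpus, the helper names `ngram_counts` and `interpolated_prob`, and the convention that λ1 weights the unigram, λ2 the bigram and λ3 the trigram estimate are all assumptions made for illustration.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every n-gram (stored as a tuple) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def interpolated_prob(w1, w2, w3, uni, bi, tri, lambdas):
    """Estimate P(w3 | w1, w2) as l1*P(w3) + l2*P(w3 | w2) + l3*P(w3 | w1, w2).

    Assumed convention: lambdas = (l1, l2, l3) with l1 for the unigram,
    l2 for the bigram and l3 for the trigram maximum-likelihood estimate.
    """
    l1, l2, l3 = lambdas
    total = sum(uni.values())
    # Each estimate falls back to 0.0 when its context was never observed.
    p_uni = uni[(w3,)] / total if total else 0.0
    p_bi = bi[(w2, w3)] / uni[(w2,)] if uni[(w2,)] else 0.0
    p_tri = tri[(w1, w2, w3)] / bi[(w1, w2)] if bi[(w1, w2)] else 0.0
    return l1 * p_uni + l2 * p_bi + l3 * p_tri


# Hypothetical toy corpus, used only to exercise the functions.
tokens = "the cat sat on the mat the cat ate".split()
uni = ngram_counts(tokens, 1)
bi = ngram_counts(tokens, 2)
tri = ngram_counts(tokens, 3)

# Compare a pure trigram weighting with an even split of the weights.
p_trigram_only = interpolated_prob("the", "cat", "sat", uni, bi, tri, (0.0, 0.0, 1.0))
p_even_split = interpolated_prob("the", "cat", "sat", uni, bi, tri, (1 / 3, 1 / 3, 1 / 3))
print(p_trigram_only, p_even_split)
```

On text that the counts were collected from, the trigram estimate is usually the largest of the three, so moving weight towards λ3 tends to raise the interpolated probability. On unseen text a pure trigram weighting can assign zero probability to trigrams that never occurred in the counts, which is the usual reason for keeping some weight on the lower-order models.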
Generate Random Sentences
Conclusion: The text segment randomly generated from the trigram model is the most syntactically correct. In comparison, the 100 words generated from the unigram ("bag of words") model are the least usable, since the words are completely out of context and don't form meaningful sentences.
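The difference between the models can be illustrated with a small sampler. This is a sketch under assumptions rather than the assignment's implementation: the helper names `build_successors` and `generate`, the toy corpus, and the choice to stop early when a context has no observed continuation are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def build_successors(tokens, n):
    """Map each (n-1)-word context to a Counter of the words observed after it."""
    successors = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        successors[context][tokens[i + n - 1]] += 1
    return successors


def generate(tokens, n, length, seed=None):
    """Sample up to `length` words from an n-gram model estimated on `tokens`."""
    rng = random.Random(seed)
    successors = build_successors(tokens, n)
    # Start from a randomly chosen observed context (the empty tuple when n == 1).
    output = list(rng.choice(sorted(successors)))
    while len(output) < length:
        context = tuple(output[-(n - 1):]) if n > 1 else ()
        options = successors.get(context)
        if not options:
            # Assumption: stop early when the context was only seen at the corpus end.
            break
        words = list(options)
        weights = list(options.values())
        # Draw the next word in proportion to how often it followed the context.
        output.append(rng.choices(words, weights=weights, k=1)[0])
    return output[:length]


# Hypothetical toy corpus, used only to exercise the sampler.
corpus = "the cat sat on the mat and the cat ate the fish on the mat".split()
unigram_text = generate(corpus, 1, 10, seed=0)
trigram_text = generate(corpus, 3, 10, seed=0)
print(" ".join(unigram_text))
print(" ".join(trigram_text))
```

With n = 1 every word is drawn independently of its neighbours, which is why unigram output reads as an unordered bag of words. With n = 3 each new word is conditioned on the two preceding words, so every three-word window in the output has been seen in the corpus. Calling `generate(corpus, 3, 100)` would request 100 words, as in the experiment above, although this sketch may return fewer if it reaches a dead end.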
The remainder of question 2 is done by hand.