Machine learning


The Automatic Classification of Noh Chant Books with Machine Learning

Master's Thesis

Three machine learning models were used to identify components specific to Noh chant books. First, document analysis models extracted all musical notation from the document images. The extracted notation was then split into individual symbols with connected component analysis, and each component was classified by a symbol classification model. The final step is binary classification: a decision tree classifier uses the output of the symbol classification model to determine whether a document is a Noh chant book.
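The connected component step, which turns a page of extracted notation into separate symbol crops for the classifier, can be sketched as a breadth-first flood fill over a binarized image. This is a minimal pure-Python illustration, assuming the page is a 2-D grid of 0/1 values; the function name and the bounding-box output format are hypothetical, and the thesis may well have used a library routine instead.

```python
from collections import deque


def connected_components(binary, connectivity=8):
    """Label foreground (1) pixels of a 2-D 0/1 grid and return one bounding box per component.

    Boxes are (top, left, bottom, right), inclusive, in row-major discovery order.
    """
    h, w = len(binary), len(binary[0])
    if connectivity == 8:
        offsets = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]
    else:  # 4-connectivity: horizontal and vertical neighbours only
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if binary[r][c] and not seen[r][c]:
                # Flood-fill the component that starts at this unvisited foreground pixel.
                seen[r][c] = True
                queue = deque([(r, c)])
                top, left, bottom, right = r, c, r, c
                while queue:
                    y, x = queue.popleft()
                    top, bottom = min(top, y), max(bottom, y)
                    left, right = min(left, x), max(right, x)
                    for dy, dx in offsets:
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                boxes.append((top, left, bottom, right))
    return boxes


page = [
    [1, 1, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
    [1, 0, 0, 0, 0],
]
print(connected_components(page))  # [(0, 0, 1, 1), (1, 3, 2, 4), (3, 0, 3, 0)]
```

Each box would then be cropped from the page and passed to the symbol classifier. The choice between 4- and 8-connectivity matters for thin, diagonal brush strokes: with 4-connectivity a stroke that only touches diagonally is split into several components.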


Data Deletion in Machine Learning

We reproduced the results reported in the paper “Making AI Forget You: Data Deletion in Machine Learning”, which explores deletion-efficient clustering algorithms. We improved the baselines for the K-means algorithm and implemented a new data deletion method based on hierarchical density-based spatial clustering of applications with noise (HDBSCAN).
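For context, the simplest exact way to delete a data point from a trained clustering model is to retrain from scratch on the remaining data; deletion-efficient algorithms aim to produce the same result much more cheaply. Below is a minimal pure-Python sketch of that naive retraining baseline using Lloyd's K-means. It is an illustration under stated assumptions (plain random initialisation instead of k-means++, a fixed seed, hypothetical class and function names), not the algorithms proposed in the paper and not the HDBSCAN method described above.

```python
import math
import random


def lloyd_kmeans(points, k, iters=100, seed=0):
    """Lloyd's K-means on a list of equal-length tuples; returns the final centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initialisation (k-means++ omitted for brevity)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster (keep it if the cluster is empty).
        updated = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if updated == centroids:  # converged: assignments can no longer change
            break
        centroids = updated
    return centroids


class RetrainOnDelete:
    """Naive exact-deletion baseline: after removing a point, retrain from scratch on what remains."""

    def __init__(self, points, k, seed=0):
        self.points, self.k, self.seed = list(points), k, seed
        self.centroids = lloyd_kmeans(self.points, k, seed=seed)

    def delete(self, point):
        self.points.remove(point)  # drop one occurrence of the point from the training set
        # By construction, the new model is the one trained without the deleted point.
        self.centroids = lloyd_kmeans(self.points, self.k, seed=self.seed)


model = RetrainOnDelete([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], k=2)
model.delete((10, 11))
print(sorted(model.centroids))
```

Every deletion here costs a full training run, which is exactly the cost that deletion-efficient methods try to avoid.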


Modified MNIST Classification

We implemented a model that detects and identifies the numerically largest of the handwritten digits in an image. The training dataset consists of 50,000 images, each containing three handwritten digits over a noisy background, together with a corresponding label. Four models were compared on this classification task: a modified VGGNet, ResNet50, ResNet50V2, and ResNet101V2.
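To make the task concrete: the label is not any single digit but the maximum of the three, so the problem is still a 10-way classification, just with a target that depends on all three digits at once. The sketch below shows how one such example could be composed. The project itself used a provided dataset; the canvas size, the one-slot-per-digit placement, the uniform noise model, and the function name are illustrative assumptions only.

```python
import random


def compose_example(digit_patches, digit_labels, canvas=(64, 128), noise=0.3, seed=None):
    """Paste digit patches side by side on a noisy canvas; the label is the largest digit.

    digit_patches: 2-D lists of intensities in [0, 1] (e.g. 28x28 MNIST crops), one per digit.
    Assumes each patch fits inside its horizontal slot and inside the canvas height.
    """
    rng = random.Random(seed)
    h, w = canvas
    img = [[rng.uniform(0, noise) for _ in range(w)] for _ in range(h)]  # noisy background
    slot = w // len(digit_patches)  # one horizontal slot per digit so digits never overlap
    for i, patch in enumerate(digit_patches):
        ph, pw = len(patch), len(patch[0])
        top = rng.randint(0, h - ph)
        left = i * slot + rng.randint(0, slot - pw)
        for y in range(ph):
            for x in range(pw):
                # Keep the brighter value so background noise still shows around the strokes.
                img[top + y][left + x] = max(img[top + y][left + x], patch[y][x])
    return img, max(digit_labels)


blank = [[1.0] * 28 for _ in range(28)]  # stand-in patch; real MNIST crops would go here
image, label = compose_example([blank, blank, blank], [3, 7, 5], seed=0)
print(label)  # 7
```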


Reddit Comments Classification

We compared the performance of different classifiers at predicting which community a given Reddit comment came from. The data was a balanced set of comments from 20 different subreddits (Reddit communities). The classifiers compared were a Bernoulli Naive Bayes model, implemented from scratch, and four models from the scikit-learn library: a random forest classifier, a voting classifier, a support vector machine, and a logistic regression model.
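A Bernoulli Naive Bayes model treats each comment as a set of binary "word present / word absent" features and, unlike the multinomial variant, also counts evidence from vocabulary words that do not appear. Below is a minimal from-scratch sketch with Laplace smoothing, assuming comments are already tokenized into word lists; the class name and the smoothing parameter `alpha` are illustrative, and this is not the project's own implementation. scikit-learn's `sklearn.naive_bayes.BernoulliNB` can serve as a cross-check.

```python
import math


class ScratchBernoulliNB:
    """Bernoulli Naive Bayes over binary word-presence features, with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        # docs: list of token lists; each document is reduced to its set of distinct words.
        self.vocab = sorted({w for d in docs for w in d})
        self.classes = sorted(set(labels))
        self.log_prior, self.log_p, self.log_not_p = {}, {}, {}
        for c in self.classes:
            class_docs = [set(d) for d, y in zip(docs, labels) if y == c]
            n_c = len(class_docs)
            self.log_prior[c] = math.log(n_c / len(docs))
            self.log_p[c], self.log_not_p[c] = {}, {}
            for w in self.vocab:
                # Smoothed estimate of P(word present | class).
                p = (sum(w in d for d in class_docs) + self.alpha) / (n_c + 2 * self.alpha)
                self.log_p[c][w] = math.log(p)
                self.log_not_p[c][w] = math.log(1 - p)
        return self

    def predict(self, doc):
        present = set(doc)  # words outside the training vocabulary are ignored
        best, best_score = None, -math.inf
        for c in self.classes:
            score = self.log_prior[c]
            for w in self.vocab:  # every vocabulary word contributes, present or absent
                score += self.log_p[c][w] if w in present else self.log_not_p[c][w]
            if score > best_score:
                best, best_score = c, score
        return best


docs = [["goal", "match", "team"], ["team", "score"], ["gpu", "driver"], ["cpu", "gpu", "benchmark"]]
labels = ["soccer", "soccer", "hardware", "hardware"]
model = ScratchBernoulliNB().fit(docs, labels)
print(model.predict(["team", "goal"]), model.predict(["gpu"]))  # soccer hardware
```

Working in log space avoids numerical underflow when multiplying many small probabilities over a large vocabulary; for speed, the sum of the "absent" terms per class could be precomputed once so prediction only adjusts for the words that are present.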