Brendan Herger
 

Previous Work

A look at publicly available code I've written, talks I've given, and projects I've led


Automated Movie Spoiler Tagging

Review code on GitHub

Spoilers are a complicated concept (see this guideline), but avoiding movie and TV show spoilers is a common goal.

With this in mind, I built a model that determines whether message board posts contain spoilers, using data I pulled from Reddit. The model generalizes well and robustly handles edge cases and new concepts, such as speculation and previously unseen characters.

 

Machine Learning for Class Imbalances & Adversaries

▶ Watch on YouTube

There are many areas of applied Machine Learning which require models optimized for rare occurrences (i.e. class imbalance), as well as users actively attempting to subvert the system (i.e. adversaries).

The approaches discussed include ensemble models, deep learning, genetic algorithms, outlier detection via dimensionality reduction (PCA and neural network auto-encoders), time-decay weighting, and the Synthetic Minority Over-sampling Technique (SMOTE).
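
To give a feel for one of these approaches, here is a minimal sketch of the SMOTE idea in plain Python: pick a minority-class sample, pick one of its nearest minority-class neighbours, and create a synthetic point somewhere on the line between them. This is not code from the talk; the function name `smote`, its parameters, and the toy data are assumptions made for illustration.

```python
import math
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Create synthetic minority samples by interpolating between neighbours.

    `minority` is a list of feature vectors (lists of floats) and must
    contain at least two samples. Hypothetical helper, for illustration only.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # The k nearest minority-class neighbours of `base`, excluding itself.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Random position along the segment from `base` to `neighbour`.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with four 2-D samples (made-up values).
minority_samples = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority_samples, n_synthetic=6)
```

Because every synthetic point is an interpolation between two real minority samples, the new points stay inside the region the minority class already occupies. In practice a maintained library implementation is usually a better choice than a hand-rolled one.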

 

 

Natural Language Processing (NLP) with Deep Learning

Review code on GitHub

There aren't many good batteries-included examples for modeling text with deep learning, so I built this repo to contain starter code for:

  • Text processing: Preparing text for use with Keras (text pre-processing, converting to indices, padding)

  • Pre-trained embedding: Using a pre-trained text embedding (GoogleNews 300) with Keras (translating each word to a point in 300-dimensional real space)

  • Convolutional architecture: Modeling text with a convolutional architecture (functionally similar to n-grams)

  • RNN architecture: Modeling text with a Recurrent Neural Net (RNN) architecture (functionally similar to a rolling window)
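
The text processing step in the first bullet can be sketched in a few lines of plain Python. This is not the repo's code: the helper names (`tokenize`, `build_vocab`, `encode`), the reserved indices, and the choice to pad and truncate at the end of the sequence are all assumptions made for illustration.

```python
import re

PAD_ID = 0  # assumed index reserved for padding
UNK_ID = 1  # assumed index reserved for words not in the vocabulary


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocab(texts):
    """Assign each distinct token an integer index, in order of first appearance."""
    vocab = {"<pad>": PAD_ID, "<unk>": UNK_ID}
    for text in texts:
        for token in tokenize(text):
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(text, vocab, max_len):
    """Convert text to a fixed-length list of indices (truncate, then pad at the end)."""
    ids = [vocab.get(token, UNK_ID) for token in tokenize(text)][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


training_texts = ["The ship explodes", "Who is the pilot?"]
vocab = build_vocab(training_texts)
encoded = encode("The pilot explodes again", vocab, max_len=6)
print(encoded)  # [2, 7, 4, 1, 0, 0]
```

The fixed-length integer sequences are what an embedding layer consumes: each index is looked up in an embedding matrix, and the padding index lets short posts share a batch with longer ones.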


 

Resume Parser

Review code on GitHub

A utility that makes handling many resumes easier by automatically pulling contact information, required skills, and custom text fields. The results are then surfaced as a convenient summary CSV.

 

This started as a side project in grad school, but has become a community project used at companies across the globe.
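
As a rough idea of what pulling contact information and skills into a summary CSV can look like, here is a small standard-library sketch. It is not the project's actual implementation: the regular expressions, the `summarize` and `to_csv` helpers, and the sample resume text are all hypothetical, and the patterns only cover simple email addresses and US-style phone numbers.

```python
import csv
import io
import re

# Deliberately simple patterns (assumptions); real resumes need more care.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")


def summarize(resume_text, skills):
    """Extract the first email, first phone number, and any listed skills."""
    email = EMAIL_RE.search(resume_text)
    phone = PHONE_RE.search(resume_text)
    lowered = resume_text.lower()
    # Naive substring match; short skill names could produce false positives.
    found = [skill for skill in skills if skill.lower() in lowered]
    return {
        "email": email.group() if email else "",
        "phone": phone.group() if phone else "",
        "skills": ";".join(found),
    }


def to_csv(rows):
    """Render a list of summary dicts as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["email", "phone", "skills"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


sample_resume = (
    "Jane Doe\n"
    "jane.doe@example.com | (415) 555-0100\n"
    "Skills: Python, SQL, Keras\n"
)
row = summarize(sample_resume, skills=["python", "sql", "spark"])
summary_csv = to_csv([row])
```

Running `summarize` over a folder of resumes and passing the collected rows to `to_csv` gives one line per candidate, which is easy to sort and filter in a spreadsheet.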