v2.1.0 Bayes + LSI

Text classification for Ruby made simple

Bayesian classification and Latent Semantic Indexing for categorizing documents, detecting spam, analyzing sentiment, and building semantic search.

See it in action

Train a Classifier
 require 'classifier'

classifier = Classifier::Bayes.new 'Spam', 'Ham'

classifier.train_spam "Buy cheap viagra now!!!"
classifier.train_spam "You've won $1,000,000!"
classifier.train_ham "Meeting tomorrow at 3pm"
classifier.train_ham "Project deadline reminder" 
Create categories and train with example text
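
For readers curious what training does, here is a minimal pure-Ruby sketch of a naive Bayes classifier with add-one smoothing. `TinyBayes` and its methods are hypothetical names used only for illustration; this is not the gem's implementation, and the gem's tokenization and scoring may differ.

```ruby
# Illustrative only: a tiny multinomial naive Bayes with add-one smoothing.
# TinyBayes is a hypothetical class, not part of the classifier gem.
class TinyBayes
  def initialize(*categories)
    @word_counts = categories.to_h { |c| [c, Hash.new(0)] }
    @doc_counts  = categories.to_h { |c| [c, 0] }
  end

  # Count the words of one training example under its category.
  def train(category, text)
    @doc_counts[category] += 1
    tokenize(text).each { |word| @word_counts[category][word] += 1 }
  end

  # Log-scale score per category: log prior plus summed log likelihoods.
  def scores(text)
    vocab_size = @word_counts.values.flat_map(&:keys).uniq.size
    total_docs = @doc_counts.values.sum
    words = tokenize(text)
    @word_counts.to_h do |category, counts|
      total_words = counts.values.sum
      log_prior = Math.log(@doc_counts[category].to_f / total_docs)
      log_likelihood = words.sum do |word|
        Math.log((counts[word] + 1.0) / (total_words + vocab_size))
      end
      [category, log_prior + log_likelihood]
    end
  end

  # The best category is the one with the highest (closest to zero) score.
  def classify(text)
    scores(text).max_by { |_, score| score }.first
  end

  private

  def tokenize(text)
    text.downcase.scan(/[a-z0-9']+/)
  end
end

nb = TinyBayes.new('Spam', 'Ham')
nb.train('Spam', 'Buy cheap viagra now!!!')
nb.train('Spam', "You've won $1,000,000!")
nb.train('Ham', 'Meeting tomorrow at 3pm')
nb.train('Ham', 'Project deadline reminder')

result = nb.classify('Buy now')
puts result
```

Because each score is a sum of logarithms of values below 1, scores come out negative, and the category whose score is closest to zero wins.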

Classify Text
 classifier.classify "Claim your free prize now"
# => "Spam"

classifier.classify "Quarterly review scheduled"
# => "Ham"

# Get per-category scores (log scale: the score closest to zero is the best match)
classifier.classifications "Limited time offer"
# => {"Spam" => -4.2, "Ham" => -8.7} 
Get the best category for new text
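
The scores are on a log scale, so they are not directly readable as percentages. If you want relative probabilities, one option is to normalize them yourself. The sketch below assumes the scores are natural-log values; `normalize_log_scores` is a hypothetical helper, not part of the gem, and the input hash reuses the illustrative numbers from the example.

```ruby
# Hypothetical helper (not part of the classifier gem): converts log-scale
# scores into probabilities that sum to 1, assuming natural-log inputs.
def normalize_log_scores(log_scores)
  max = log_scores.values.max
  # Subtract the maximum before exponentiating to avoid floating-point underflow.
  weights = log_scores.transform_values { |score| Math.exp(score - max) }
  total = weights.values.sum
  weights.transform_values { |weight| weight / total }
end

scores = { "Spam" => -4.2, "Ham" => -8.7 }
probabilities = normalize_log_scores(scores)

probabilities.each do |category, probability|
  puts format("%s: %.3f", category, probability)
end
# Spam comes out near 0.989 and Ham near 0.011
```

Treat the result as a relative ranking between the trained categories, not a calibrated confidence.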

Semantic Search with LSI
lsi = Classifier::LSI.new

# add_item takes the text to index first; any further arguments are categories
doc1 = "Ruby programming language"
doc2 = "Python snake reptile"
doc3 = "Rails web framework Ruby"
[doc1, doc2, doc3].each { |doc| lsi.add_item doc }

lsi.search "web development with Ruby"
# => [doc3, doc1]  # Semantically related results
Find documents by meaning, not just keywords

Find Related Documents
 lsi.find_related doc1, 3
# => [doc3, ...]  # Documents similar to doc1

# Classify by learned categories
lsi.add_item "Ruby gem tutorial", :ruby
lsi.add_item "Python pip package", :python

lsi.classify "Installing bundler gem"
# => :ruby 
Discover similar content automatically

Start in seconds

$ gem install classifier

Or add to your Gemfile: gem 'classifier'