v2.2.0 Bayes + LSI + kNN + TF-IDF + Logistic Regression

Text classification for Ruby made simple

Bayesian, LSI, kNN, TF-IDF, and Logistic Regression algorithms for categorizing documents, detecting spam, analyzing sentiment, and building semantic search.

See it in action

Train a Classifier
require 'classifier'

classifier = Classifier::Bayes.new 'Spam', 'Ham'

# Train with keyword arguments
classifier.train(spam: "Buy cheap viagra now!!!")
classifier.train(ham: "Meeting tomorrow at 3pm")

# Or batch train multiple examples
classifier.train(
  spam: ["You won $1M!", "Free money!"],
  ham: ["Project update", "Lunch tomorrow?"]
) 
Create categories and train with example text
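
To make the training step less opaque, here is a minimal sketch of a textbook multinomial naive Bayes with add-one smoothing in plain Ruby. It does not use the gem and is not its implementation: `TinyBayes`, its tokenizer, and the smoothing choice are all assumptions made for illustration.

```ruby
# Toy multinomial naive Bayes with add-one (Laplace) smoothing.
# TinyBayes is a hypothetical class for illustration, not part of the gem.
class TinyBayes
  def initialize(*categories)
    @word_counts = categories.to_h { |c| [c, Hash.new(0)] }
    @doc_counts = Hash.new(0)
    @vocabulary = {}
  end

  # Record one training document under a category.
  def train(category, text)
    @doc_counts[category] += 1
    tokenize(text).each do |word|
      @word_counts[category][word] += 1
      @vocabulary[word] = true
    end
  end

  # Log-probability score per category: log prior plus summed log likelihoods.
  def classifications(text)
    total_docs = @doc_counts.values.sum.to_f
    words = tokenize(text)
    @word_counts.to_h do |category, counts|
      total_words = counts.values.sum
      score = Math.log(@doc_counts[category] / total_docs)
      words.each do |word|
        score += Math.log((counts[word] + 1.0) / (total_words + @vocabulary.size))
      end
      [category, score]
    end
  end

  # The best category is the one with the highest (least negative) score.
  def classify(text)
    classifications(text).max_by { |_, score| score }.first
  end

  private

  # Assumed tokenizer: lowercase alphanumeric words, no stemming or stopwords.
  def tokenize(text)
    text.downcase.scan(/[a-z0-9]+/)
  end
end

bayes = TinyBayes.new("Spam", "Ham")
["Buy cheap viagra now!!!", "You won $1M!", "Free money!"].each { |t| bayes.train("Spam", t) }
["Meeting tomorrow at 3pm", "Project update", "Lunch tomorrow?"].each { |t| bayes.train("Ham", t) }

puts bayes.classify("Claim your free prize now") # prints "Spam"
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied together, which is why the scores are negative numbers rather than values between 0 and 1.
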
Classify Text
classifier.classify "Claim your free prize now"
# => "Spam"

classifier.classify "Quarterly review scheduled"
# => "Ham"

# Get log-probability scores (less negative = more likely)
classifier.classifications "Limited time offer"
# => {"Spam" => -10.5, "Ham" => -10.2} 
Get the best category for new text
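
Log-probability scores are hard to read at a glance. The sketch below, which is plain Ruby and independent of the gem, shows one common way to turn such scores into probabilities that sum to 1: subtract the maximum score before exponentiating, then normalize. The helper name `normalize_log_scores` is hypothetical, and the input reuses the example scores above.

```ruby
# Convert log scores to normalized probabilities.
# Subtracting the maximum first keeps Math.exp from underflowing to zero.
def normalize_log_scores(log_scores)
  max = log_scores.values.max
  exps = log_scores.transform_values { |s| Math.exp(s - max) }
  total = exps.values.sum
  exps.transform_values { |e| e / total }
end

scores = { "Spam" => -10.5, "Ham" => -10.2 }
normalize_log_scores(scores).each do |category, probability|
  puts format("%s: %.2f", category, probability)
end
# Spam: 0.43
# Ham: 0.57
```

A gap of only 0.3 between two log scores corresponds to a fairly even split, so a threshold on the normalized value can be a more intuitive way to flag uncertain predictions.
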
Semantic Search with LSI
lsi = Classifier::LSI.new

# Add documents with categories
lsi.add("Ruby" => "Ruby programming language")
lsi.add("Python" => "Python snake reptile")
lsi.add("Ruby" => "Rails web framework")

lsi.search "web development with Ruby"
# => ["Rails web framework", "Ruby programming..."] 
Find documents by meaning, not just keywords
k-Nearest Neighbors
knn = Classifier::KNN.new(k: 3)

knn.add(
  spam: ["Free prize winner", "You won free money"],
  ham: ["Meeting at 3pm", "Review doc"]
)

result = knn.classify_with_neighbors "Claim your free prize"
result[:category]    # => "spam"
result[:confidence]  # => 0.67 
Instance-based classification with interpretable results
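
One plausible reading of the 0.67 confidence above is the share of the k nearest neighbors that voted for the winning category (2 of 3). The plain-Ruby sketch below illustrates that idea with Jaccard word overlap as the similarity measure. Both the similarity measure and the confidence formula are assumptions, and `tokens`, `jaccard`, and `knn_classify` are hypothetical helpers rather than the gem's API.

```ruby
require 'set'

# Hypothetical helper: the set of lowercase words in a text.
def tokens(text)
  text.downcase.scan(/[a-z0-9]+/).to_set
end

# Jaccard similarity: shared words divided by all distinct words.
def jaccard(a, b)
  union = (a | b).size
  union.zero? ? 0.0 : (a & b).size.to_f / union
end

# Pick the k most similar examples and let them vote on the category.
def knn_classify(examples, query, k: 3)
  q = tokens(query)
  neighbors = examples
    .map { |label, text| { label: label, text: text, similarity: jaccard(q, tokens(text)) } }
    .max_by(k) { |n| n[:similarity] }
  votes = neighbors.group_by { |n| n[:label] }.transform_values(&:size)
  category, count = votes.max_by { |_, c| c }
  { category: category, confidence: count.to_f / neighbors.size, neighbors: neighbors }
end

examples = [
  [:spam, "Free prize winner"],
  [:spam, "You won free money"],
  [:ham, "Meeting at 3pm"],
  [:ham, "Review doc"]
]

result = knn_classify(examples, "Claim your free prize")
puts result[:category]            # prints "spam"
puts result[:confidence].round(2) # prints 0.67
```

Returning the neighbors alongside the label is what makes this style of classifier easy to inspect: each prediction can be traced back to the specific training examples that produced it.
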
TF-IDF Vectorizer
tfidf = Classifier::TFIDF.new

tfidf.fit([
  "Ruby programming language",
  "Python programming language",
  "Dogs are great pets"
])

vector = tfidf.transform "Ruby programming"
# => {:rubi => 0.80, :program => 0.61} 
Transform text into weighted feature vectors
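
For readers who want to see where such weights can come from, here is a plain-Ruby sketch of TF-IDF. It assumes a smoothed IDF of `ln((1 + N) / (1 + df)) + 1` followed by L2 normalization, and it skips stemming, so the keys are `:ruby` and `:programming` rather than `:rubi` and `:program`. Under these assumptions the weights round to 0.8 and 0.61, in line with the output above, but the gem's actual formula is not confirmed here. `terms`, `fit_idf`, and `tfidf_vector` are hypothetical helpers.

```ruby
# Hypothetical helper: lowercase word symbols, no stemming.
def terms(text)
  text.downcase.scan(/[a-z]+/).map(&:to_sym)
end

# Learn a smoothed inverse document frequency for every term in the corpus.
def fit_idf(documents)
  n = documents.size
  doc_freq = Hash.new(0)
  documents.each { |doc| terms(doc).uniq.each { |t| doc_freq[t] += 1 } }
  doc_freq.transform_values { |df| Math.log((1.0 + n) / (1.0 + df)) + 1.0 }
end

# Weight each known term by tf * idf, then scale the vector to unit length.
def tfidf_vector(idf, text)
  counts = terms(text).tally.select { |t, _| idf.key?(t) }
  weights = counts.to_h { |t, tf| [t, tf * idf[t]] }
  norm = Math.sqrt(weights.values.sum { |w| w * w })
  return {} if norm.zero?

  weights.transform_values { |w| w / norm }
end

idf = fit_idf([
  "Ruby programming language",
  "Python programming language",
  "Dogs are great pets"
])

vector = tfidf_vector(idf, "Ruby programming")
vector.each { |term, weight| puts "#{term}: #{weight.round(2)}" }
# ruby: 0.8
# programming: 0.61
```

"Ruby" appears in one training document and "programming" in two, so the rarer term receives the larger weight.
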
Logistic Regression
lr = Classifier::LogisticRegression.new([:spam, :ham])

lr.train(
  spam: ["Buy now!", "Free money!"],
  ham: ["Meeting at 3pm", "Project update"]
)

# Probabilities always sum to 1.0
lr.probabilities "Claim your prize"
# => {"spam" => 0.37, "ham" => 0.63} 
Well-calibrated probabilities for confident decisions
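
The sum-to-1 property follows from how logistic regression is defined. The sketch below is a minimal two-class version in plain Ruby, trained with stochastic gradient descent on bag-of-words counts. It is an illustration only: the gem may use a multiclass softmax formulation, regularization, or different features, and `features`, `sigmoid`, `train_logistic`, and `spam_ham_probabilities` are hypothetical helpers.

```ruby
# Hypothetical helper: bag-of-words counts for a text.
def features(text)
  text.downcase.scan(/[a-z0-9]+/).tally
end

def sigmoid(z)
  1.0 / (1.0 + Math.exp(-z))
end

# Train a binary model with plain stochastic gradient descent.
# `examples` is a list of [text, label] pairs; label is 1 for spam, 0 for ham.
def train_logistic(examples, epochs: 200, rate: 0.5)
  weights = Hash.new(0.0)
  bias = 0.0
  epochs.times do
    examples.each do |text, label|
      x = features(text)
      score = bias + x.sum { |term, count| weights[term] * count }
      error = label - sigmoid(score)
      x.each { |term, count| weights[term] += rate * error * count }
      bias += rate * error
    end
  end
  [weights, bias]
end

# p(spam) is the sigmoid of the score and p(ham) is its complement,
# so the two values sum to 1 by construction.
def spam_ham_probabilities(weights, bias, text)
  score = bias + features(text).sum { |term, count| weights.fetch(term, 0.0) * count }
  spam = sigmoid(score)
  { "spam" => spam, "ham" => 1.0 - spam }
end

examples = [
  ["Buy now!", 1], ["Free money!", 1],
  ["Meeting at 3pm", 0], ["Project update", 0]
]
weights, bias = train_logistic(examples)

probs = spam_ham_probabilities(weights, bias, "Free money!")
puts probs["spam"] > probs["ham"] # prints true
```

With only four training sentences, text made entirely of unseen words falls back to the bias term alone, so its probabilities stay close to an even split.
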

Start in seconds

$ gem install classifier

Or add to your Gemfile: gem 'classifier'