Getting Started with Ruby Classifier
This tutorial will get you up and running with the classifier gem in under 5 minutes. By the end, you’ll have a working text classifier.
Prerequisites
- Ruby 2.7 or higher
- Bundler (optional, but recommended)
Installation
Install the gem directly:
gem install classifier
Or add it to your Gemfile:
gem 'classifier'
Then run:
bundle install
Your First Classifier
Let’s build a simple spam detector. Create a file called spam_detector.rb:
require 'classifier'
# Create a classifier with two categories
classifier = Classifier::Bayes.new 'Spam', 'Ham'
# Train it with some examples
classifier.train_spam "Get rich quick! Buy now!"
classifier.train_spam "You've won a million dollars!"
classifier.train_spam "Click here for free stuff"
classifier.train_ham "Meeting tomorrow at 10am"
classifier.train_ham "Please review the attached document"
classifier.train_ham "Thanks for your email"
# Now classify some new text
puts classifier.classify "Claim your free prize today!"
# => "Spam"
puts classifier.classify "See you at the meeting"
# => "Ham"
Run it:
ruby spam_detector.rb
Understanding the Output
The classify method returns the most likely category for the given text. Under the hood, the Bayesian classifier:
- Tokenizes the text into words
- Stems each word to its root form
- Calculates a score for each category from how often those words appeared in its training examples
- Returns the category with the highest probability
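To show the idea behind the four steps above, here is a toy multinomial naive Bayes classifier in plain Ruby. It is a simplified sketch for intuition, not the gem's source. The `ToyBayes` class, its crude `stem` helper (which only strips a few English suffixes instead of running a real stemmer), the category prior, and the add-one smoothing are all illustrative assumptions. The gem may tokenize, smooth, and weight differently.

```ruby
# Toy naive Bayes: tokenize -> stem -> score each category -> pick the best.
# Hypothetical sketch for intuition only; NOT the classifier gem's implementation.
class ToyBayes
  def initialize(*categories)
    @word_counts = categories.to_h { |c| [c, Hash.new(0)] } # word counts per category
    @doc_counts  = categories.to_h { |c| [c, 0] }           # training texts per category
    @vocab = {}                                              # every word seen in training
  end

  # Step 1: split text into lowercase alphabetic words.
  def tokenize(text)
    text.downcase.scan(/[a-z']+/)
  end

  # Step 2: crude stand-in for a real stemmer (assumption): strip a few suffixes.
  def stem(word)
    word.sub(/(ing|ed|s)\z/, '')
  end

  def words(text)
    tokenize(text).map { |w| stem(w) }
  end

  def train(category, text)
    @doc_counts[category] += 1
    words(text).each do |w|
      @word_counts[category][w] += 1
      @vocab[w] = true
    end
  end

  # Step 3: log P(category) + sum of log P(word | category), add-one smoothed.
  def scores(text)
    total_docs = @doc_counts.values.sum
    @word_counts.to_h do |category, counts|
      total_words = counts.values.sum
      score = Math.log(@doc_counts[category].to_f / total_docs)
      words(text).each do |w|
        score += Math.log((counts[w] + 1.0) / (total_words + @vocab.size))
      end
      [category, score]
    end
  end

  # Step 4: return the category with the highest (least negative) score.
  def classify(text)
    scores(text).max_by { |_category, score| score }.first
  end
end

bayes = ToyBayes.new('Spam', 'Ham')
bayes.train('Spam', "Get rich quick! Buy now!")
bayes.train('Spam', "Click here for free stuff")
bayes.train('Ham', "Meeting tomorrow at 10am")
bayes.train('Ham', "Please review the attached document")

puts bayes.classify("Free stuff, buy now")          # => Spam
puts bayes.classify("Review the meeting document")  # => Ham
```

Working with summed logarithms in step 3 mirrors the log scores discussed in the next section.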
Getting Probability Scores
Want to see the actual scores? Use classifications:
scores = classifier.classifications "Limited time offer!"
puts scores
# => {"Spam" => -5.2, "Ham" => -9.8}  (example values; yours depend on the training data)
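Higher (less negative) scores indicate a more likely category. The scores are log probabilities: scoring a text means multiplying many small per-word probabilities, and that product can underflow to zero in floating point, so the classifier adds their logarithms instead.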
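The snippet below makes the underflow concrete, using made-up per-word probabilities. It then turns a score hash like the one above into probabilities that sum to 1, using the log-sum-exp trick. The `to_probabilities` helper is hypothetical and not part of the gem. Normalizing this way assumes the scores are log probabilities on a common scale, so they differ from the true posteriors only by a shared constant.

```ruby
# Made-up example: 500 words, each with probability 0.001 under some category.
word_probs = Array.new(500, 0.001)

# Multiplying directly: the true value, 1e-1500, is far below the smallest
# positive Float (about 4.9e-324), so the product underflows to 0.0.
product = word_probs.reduce(1.0) { |acc, prob| acc * prob }
puts product                 # => 0.0

# Summing logarithms keeps a finite, comparable score.
log_sum = word_probs.sum { |prob| Math.log(prob) }
puts log_sum.round(2)        # => -3453.88

# Hypothetical helper (not part of the gem): convert log scores into
# probabilities that sum to 1. Subtracting the largest score before Math.exp
# (the log-sum-exp trick) keeps the biggest term at exp(0) = 1.
def to_probabilities(log_scores)
  max = log_scores.values.max
  weights = log_scores.transform_values { |score| Math.exp(score - max) }
  total = weights.values.sum
  weights.transform_values { |weight| weight / total }
end

probs = to_probabilities({ "Spam" => -5.2, "Ham" => -9.8 })  # illustrative scores
puts probs["Spam"].round(3)  # => 0.99
puts probs["Ham"].round(3)   # => 0.01

# With very negative scores, a naive Math.exp returns 0.0 for every category
# (and 0.0 / 0.0 is NaN); the shifted version still gives sensible results.
deep = to_probabilities({ "A" => -1000.0, "B" => -1001.0 })
puts deep["A"].round(3)      # => 0.731
```

Only the differences between scores matter after normalization, which is why the shift by the maximum does not change the result.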
Next Steps
Now that you have a basic classifier working, explore these topics:
- Build a Complete Spam Filter - A production-ready email classifier
- Bayes Basics Guide - Deep dive into how Bayesian classification works
- LSI for Semantic Search - Find related documents using meaning, not just keywords
Quick Reference
# Create classifier
classifier = Classifier::Bayes.new 'Category1', 'Category2'
# Train (two equivalent ways)
classifier.train 'Category1', "example text"
classifier.train_category1 "example text"
# Classify
result = classifier.classify "text to classify"
# Get scores
scores = classifier.classifications "text to classify"
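The `train_category1` shortcut above relies on Ruby's dynamic dispatch. As a rough illustration of how such methods can work, the sketch below routes `train_<category>` calls to a generic `train` method through `method_missing`. The `MiniClassifier` class and its case-insensitive category lookup are hypothetical and only show the general Ruby pattern; they are not the gem's actual source.

```ruby
# Hypothetical sketch of dynamic train_<category> methods; NOT the gem's source.
class MiniClassifier
  attr_reader :examples

  def initialize(*categories)
    # Training texts stored per category, keyed by the name given here.
    @examples = categories.to_h { |c| [c.to_s, []] }
  end

  def train(category, text)
    key = find_category(category.to_s)
    raise ArgumentError, "unknown category: #{category}" unless key

    @examples[key] << text
  end

  # Turn calls like train_category1("...") into train("Category1", "...").
  def method_missing(name, *args, &block)
    match = name.to_s.match(/\Atrain_(\w+)\z/)
    return super unless match && find_category(match[1]) && args.size == 1

    train(match[1], args.first)
  end

  # Keep respond_to? consistent with method_missing.
  def respond_to_missing?(name, include_private = false)
    match = name.to_s.match(/\Atrain_(\w+)\z/)
    (match && !find_category(match[1]).nil?) || super
  end

  private

  # Case-insensitive lookup so "category1" finds "Category1" (an assumption).
  def find_category(name)
    @examples.keys.find { |key| key.casecmp?(name) }
  end
end

classifier = MiniClassifier.new('Category1', 'Category2')
classifier.train 'Category1', "example text"
classifier.train_category1 "another example"
puts classifier.examples['Category1'].size  # => 2
```

Unknown names such as `train_category3` fall through to `super` and raise `NoMethodError`, as an ordinary missing method would.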