v2.2.0 • Bayes + LSI + kNN + TF-IDF + Logistic Regression
Text classification for Ruby made simple
Bayesian, LSI, kNN, TF-IDF, and Logistic Regression algorithms for categorizing documents, detecting spam, analyzing sentiment, and building semantic search.
Bayesian Classification
Train categories with examples, classify new text with probability scores.
Semantic Search
Find related documents using Latent Semantic Indexing and SVD.
k-Nearest Neighbors
Instance-based classification with interpretable neighbor results.
TF-IDF Vectorizer
Transform text into weighted feature vectors for ML and search.
Logistic Regression
Well-calibrated probabilities with interpretable feature weights.
Streaming Training
Train on multi-GB datasets with memory-efficient streaming and progress tracking.
See it in action
Train a Classifier
require 'classifier'

classifier = Classifier::Bayes.new 'Spam', 'Ham'

# Train with keyword arguments
classifier.train(spam: "Buy cheap viagra now!!!")
classifier.train(ham: "Meeting tomorrow at 3pm")

# Or batch train multiple examples
classifier.train(
  spam: ["You won $1M!", "Free money!"],
  ham: ["Project update", "Lunch tomorrow?"]
)
Create categories and train with example text
Classify Text
classifier.classify "Claim your free prize now"
# => "Spam"

classifier.classify "Quarterly review scheduled"
# => "Ham"

# Get log-probability scores (less negative = more likely)
classifier.classifications "Limited time offer"
# => {"Spam" => -10.5, "Ham" => -10.2}
Get the best category for new text
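Log-probability scores like the ones above are easy to rank but harder to read at a glance. The sketch below shows one common way to turn them into shares that sum to 1, using the log-sum-exp trick to avoid floating-point underflow. `normalize_log_scores` is a hypothetical helper written for illustration; it is not part of the classifier gem, and the gem may expose its own way to do this.

```ruby
# Hypothetical helper (not part of the classifier gem): converts a hash of
# log-probability scores into normalized probabilities.
def normalize_log_scores(log_scores)
  max = log_scores.values.max
  # Subtract the largest score before exponentiating so very negative
  # scores do not all underflow to 0.0 (the log-sum-exp trick).
  exps = log_scores.transform_values { |score| Math.exp(score - max) }
  total = exps.values.sum
  exps.transform_values { |e| e / total }
end

scores = { "Spam" => -10.5, "Ham" => -10.2 }
probs = normalize_log_scores(scores)
# The less negative score ("Ham" here) receives the larger share.
```

Only the differences between scores matter for the result, so shifting every score by the same constant leaves the shares unchanged.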
Semantic Search with LSI
lsi = Classifier::LSI.new

# Add documents with categories
lsi.add("Ruby" => "Ruby programming language")
lsi.add("Python" => "Python snake reptile")
lsi.add("Ruby" => "Rails web framework")

lsi.search "web development with Ruby"
# => ["Rails web framework", "Ruby programming..."]
Find documents by meaning, not just keywords
k-Nearest Neighbors
knn = Classifier::KNN.new(k: 3)

knn.add(
  spam: ["Free prize winner", "You won free money"],
  ham: ["Meeting at 3pm", "Review doc"]
)

result = knn.classify_with_neighbors "Claim your free prize"
result[:category]   # => "spam"
result[:confidence] # => 0.67
Instance-based classification with interpretable results
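To see where a confidence such as 0.67 can come from, here is a minimal sketch of an unweighted majority vote over the labels of the k nearest neighbors. `vote` is a hypothetical helper, and the assumption that confidence is the winning share of votes is ours; the gem may weight neighbors by similarity instead.

```ruby
# Hypothetical helper (not the gem's implementation): unweighted majority
# vote over the labels of the k nearest neighbors.
def vote(neighbor_labels)
  counts = neighbor_labels.tally
  category, votes = counts.max_by { |_label, n| n }
  # Confidence is assumed to be the fraction of neighbors that agree.
  { category: category, confidence: votes.to_f / neighbor_labels.size }
end

result = vote(%w[spam spam ham])
result[:category]            # => "spam"
result[:confidence].round(2) # => 0.67
```

With k = 3, two agreeing neighbors out of three give 2/3, which rounds to 0.67.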
TF-IDF Vectorizer
tfidf = Classifier::TFIDF.new

tfidf.fit([
  "Ruby programming language",
  "Python programming language",
  "Dogs are great pets"
])

vector = tfidf.transform "Ruby programming"
# => {:rubi => 0.80, :program => 0.61}
Transform text into weighted feature vectors
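The weighting itself can be sketched in a few lines of plain Ruby. This version assumes a smoothed inverse document frequency, `log((1 + N) / (1 + df)) + 1`, followed by L2 normalization, and it skips stemming, so terms stay as `ruby` and `programming` rather than `:rubi` and `:program`. The helpers `tokenize`, `fit_idf`, and `transform` are hypothetical names, and the gem's exact formula is not stated here.

```ruby
# Hypothetical helpers (not the gem's implementation).
def tokenize(text)
  text.downcase.scan(/[a-z0-9]+/)
end

# Learn a smoothed IDF weight for every term seen in the corpus.
def fit_idf(documents)
  n = documents.size
  df = Hash.new(0)
  documents.each { |doc| tokenize(doc).uniq.each { |term| df[term] += 1 } }
  df.transform_values { |count| Math.log((1.0 + n) / (1.0 + count)) + 1.0 }
end

# Weight term counts by IDF, then scale the vector to unit length.
def transform(text, idf)
  weights = tokenize(text).tally.each_with_object({}) do |(term, count), acc|
    acc[term] = count * idf[term] if idf.key?(term) # ignore unseen terms
  end
  norm = Math.sqrt(weights.values.sum { |w| w * w })
  return {} if norm.zero?
  weights.transform_values { |w| w / norm }
end

idf = fit_idf([
  "Ruby programming language",
  "Python programming language",
  "Dogs are great pets"
])

vector = transform("Ruby programming", idf)
vector.transform_values { |w| w.round(2) }
# => {"ruby" => 0.8, "programming" => 0.61}
```

"ruby" appears in one of the three documents and "programming" in two, so the rarer term receives the larger weight. Under these assumptions the rounded values agree with the example above.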
Logistic Regression
lr = Classifier::LogisticRegression.new([:spam, :ham])

lr.train(
  spam: ["Buy now!", "Free money!"],
  ham: ["Meeting at 3pm", "Project update"]
)

# Probabilities always sum to 1.0
lr.probabilities "Claim your prize"
# => {"spam" => 0.37, "ham" => 0.63}
Well-calibrated probabilities for confident decisions
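One way to act on calibrated probabilities is to accept a prediction only when the top class clears a threshold, and to abstain otherwise. The sketch below assumes a probabilities hash shaped like the one above; `decide` and its `threshold:` parameter are hypothetical and not part of the gem.

```ruby
# Hypothetical helper (not part of the classifier gem): returns the most
# likely label, or nil when the model is not confident enough.
def decide(probabilities, threshold: 0.8)
  label, prob = probabilities.max_by { |_label, value| value }
  prob >= threshold ? label : nil
end

probs = { "spam" => 0.37, "ham" => 0.63 }
decide(probs)                 # => nil
decide(probs, threshold: 0.6) # => "ham"
```

Returning nil lets the caller route uncertain messages to a fallback, such as manual review, instead of forcing a label.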
Start in seconds
$ gem install classifier
Or add to your Gemfile:
gem 'classifier'