beginner

Save and Load Classifiers

Persist trained classifiers to disk and reload them for production use.

Training a classifier takes time. You don’t want to retrain every time your application starts. In this tutorial, you’ll learn how to save trained classifiers and load them back.

What You’ll Learn

Saving classifiers to files
Loading pre-trained classifiers
Using different storage backends
Building a reusable model manager

The Problem

Without persistence, you’d need to retrain on every app restart:

# This runs every time your app starts - slow!
# (training_data is a placeholder for your own array of labeled texts)
classifier = Classifier::Bayes.new 'Positive', 'Negative'
1000.times do |i|
  classifier.train(positive: training_data[i])
end

With persistence, you train once and load instantly:

# Fast - loads pre-trained model
classifier = Classifier::Bayes.load(storage: storage)

Basic Save and Load

Create train_and_save.rb:

require 'classifier'

# Create and train the classifier
classifier = Classifier::Bayes.new 'Tech', 'Sports', 'Politics'

classifier.train(tech: "New JavaScript framework released today")
classifier.train(tech: "Apple announces new MacBook Pro")
classifier.train(tech: "Python 4.0 features announced")

classifier.train(sports: "Lakers win championship game")
classifier.train(sports: "World Cup finals draw huge crowd")
classifier.train(sports: "Tennis star wins grand slam")

classifier.train(politics: "Senate passes new legislation")
classifier.train(politics: "Election results announced")
classifier.train(politics: "New policy affects healthcare")

# Configure file storage
classifier.storage = Classifier::Storage::File.new(path: "news_classifier.json")

# Save to disk
classifier.save

puts "Classifier saved to news_classifier.json"

Now create load_and_use.rb:

require 'classifier'

# Configure the same storage
storage = Classifier::Storage::File.new(path: "news_classifier.json")

# Load the pre-trained classifier
classifier = Classifier::Bayes.load(storage: storage)

# Use it immediately - no training needed!
test_articles = [
  "Google releases new AI model",
  "Football team signs star player",
  "Congress debates new bill",
]

test_articles.each do |article|
  category = classifier.classify(article)
  puts "#{category}: #{article}"
end

Run the scripts:

ruby train_and_save.rb
# => Classifier saved to news_classifier.json

ruby load_and_use.rb
# => Tech: Google releases new AI model
# => Sports: Football team signs star player
# => Politics: Congress debates new bill
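
The gem's file storage handles serialization for you, so you normally never write this yourself. To make the idea of a storage backend concrete, here is a minimal stand-in built only on Ruby's standard library. The class name JsonFileStore and its write, read, and exists? methods are hypothetical and chosen for this sketch; this is not the gem's implementation.

```ruby
require 'json'
require 'tmpdir'

# Hypothetical stand-in for a file-backed store; NOT the classifier gem's code.
class JsonFileStore
  def initialize(path:)
    @path = path
  end

  # Write to a temporary sibling file, then rename it over the target.
  # If the process dies mid-write, the previous model file is left untouched.
  def write(data)
    tmp = "#{@path}.tmp"
    File.write(tmp, JSON.generate(data))
    File.rename(tmp, @path)
  end

  def read
    JSON.parse(File.read(@path))
  end

  def exists?
    File.exist?(@path)
  end
end

Dir.mktmpdir do |dir|
  store = JsonFileStore.new(path: File.join(dir, "model.json"))
  # Illustrative payload only; the real file layout is defined by the gem.
  store.write("categories" => { "Tech" => { "javascript" => 1 } })
  puts store.exists?
  # => true
  puts store.read["categories"].keys.inspect
  # => ["Tech"]
end
```

The write-then-rename step is the part worth copying if you ever roll your own backend: readers see either the old file or the new one, never a half-written model.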

Building a Model Manager

For production apps, wrap persistence in a manager class. Create model_manager.rb:

require 'classifier'
require 'fileutils'

class ModelManager
  def initialize(models_dir: "models")
    @models_dir = models_dir
    # mkdir_p also creates missing parent directories and is a no-op if the directory exists
    FileUtils.mkdir_p(models_dir)
  end

  def save(classifier, name)
    path = File.join(@models_dir, "#{name}.json")
    classifier.storage = Classifier::Storage::File.new(path: path)
    classifier.save
    puts "Saved #{name} to #{path}"
  end

  def load(name, type: :bayes)
    path = File.join(@models_dir, "#{name}.json")

    unless File.exist?(path)
      raise "Model '#{name}' not found at #{path}"
    end

    storage = Classifier::Storage::File.new(path: path)

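    case type
    when :bayes
      Classifier::Bayes.load(storage: storage)
    when :lsi
      Classifier::LSI.load(storage: storage)
    else
      # Fail loudly instead of silently returning nil for an unknown type
      raise ArgumentError, "Unknown classifier type: #{type.inspect}"
    end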
  end

  def exists?(name)
    File.exist?(File.join(@models_dir, "#{name}.json"))
  end

  def list
    Dir.glob(File.join(@models_dir, "*.json")).map do |path|
      File.basename(path, ".json")
    end
  end
end

Use it in your application:

require_relative 'model_manager'

manager = ModelManager.new(models_dir: "trained_models")

# First run: train and save
unless manager.exists?("sentiment")
  classifier = Classifier::Bayes.new 'Positive', 'Negative'

  classifier.train(positive: "I love this product!")
  classifier.train(positive: "Excellent service")
  classifier.train(positive: "Highly recommended")

  classifier.train(negative: "Terrible experience")
  classifier.train(negative: "Waste of money")
  classifier.train(negative: "Very disappointed")

  manager.save(classifier, "sentiment")
end

# All subsequent runs: just load
sentiment = manager.load("sentiment")

puts sentiment.classify("This is amazing!")
# => Positive

puts sentiment.classify("Total garbage")
# => Negative
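
The manager builds file paths directly from the model name. If names can ever come from user input, it is worth validating them first so a name such as "../../etc/passwd" cannot escape the models directory. The helper below is a sketch: safe_model_path and SAFE_MODEL_NAME are hypothetical names, not part of the gem or of the ModelManager above.

```ruby
# Hypothetical helper; not part of the classifier gem.
# Allow only letters, digits, underscores, and hyphens in model names.
SAFE_MODEL_NAME = /\A[A-Za-z0-9_-]+\z/

def safe_model_path(models_dir, name)
  unless name.to_s.match?(SAFE_MODEL_NAME)
    raise ArgumentError, "invalid model name: #{name.inspect}"
  end

  File.join(models_dir, "#{name}.json")
end

puts safe_model_path("trained_models", "sentiment")
# => trained_models/sentiment.json

begin
  safe_model_path("trained_models", "../../etc/passwd")
rescue ArgumentError => e
  puts e.message
  # => invalid model name: "../../etc/passwd"
end
```

One way to use it would be to call the helper from save, load, and exists? in place of the repeated File.join.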

Checking for Unsaved Changes

The classifier tracks whether you have unsaved changes:

classifier = Classifier::Bayes.new :a, :b
classifier.storage = Classifier::Storage::File.new(path: "model.json")

classifier.dirty?
# => false

classifier.train(a: "new training data")
classifier.dirty?
# => true

classifier.save
classifier.dirty?
# => false

Use this to prompt users or auto-save:

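at_exit do
  if classifier.dirty?
    print "You have unsaved changes. Save before exit? (y/n) "
    # Read from $stdin explicitly: bare gets reads from files named in ARGV,
    # and returns nil at end of input
    answer = $stdin.gets.to_s.chomp.downcase
    classifier.save if answer == 'y'
  end
end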
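
For the auto-save case, one possible approach is a small wrapper that saves after every few training calls. This is a sketch, not a gem feature: AutoSaver is a hypothetical class that only assumes the wrapped object responds to train, save, and dirty? as shown in this tutorial. FakeClassifier exists only so the example runs without the gem installed.

```ruby
# Hypothetical wrapper; relies only on train, save, and dirty?.
class AutoSaver
  def initialize(classifier, every: 10)
    @classifier = classifier
    @every = every
    @pending = 0
  end

  # Forward the training call, then save once enough updates have piled up.
  def train(**examples)
    @classifier.train(**examples)
    @pending += 1
    flush if @pending >= @every
  end

  # Save only when there is something to write.
  def flush
    @classifier.save if @classifier.dirty?
    @pending = 0
  end
end

# Tiny fake used for demonstration; it counts saves instead of writing files.
class FakeClassifier
  attr_reader :saves

  def initialize
    @dirty = false
    @saves = 0
  end

  def train(**_examples)
    @dirty = true
  end

  def save
    @saves += 1
    @dirty = false
  end

  def dirty?
    @dirty
  end
end

fake = FakeClassifier.new
saver = AutoSaver.new(fake, every: 3)
7.times { |i| saver.train(a: "example #{i}") }
puts fake.saves
# => 2
saver.flush
puts fake.saves
# => 3
```

Batching saves like this trades a little durability (up to `every - 1` updates can be lost on a crash) for fewer disk writes.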

Using Memory Storage for Tests

In tests, use memory storage to avoid file system dependencies:

require 'minitest/autorun'
require 'classifier'

class ClassifierTest < Minitest::Test
  def setup
    @storage = Classifier::Storage::Memory.new
    @classifier = Classifier::Bayes.new 'Spam', 'Ham'
    @classifier.storage = @storage
  end

  def test_persistence_roundtrip
    @classifier.train(spam: "Buy now!")
    @classifier.train(ham: "Meeting at 3pm")
    @classifier.save

    # Load into a new instance
    loaded = Classifier::Bayes.load(storage: @storage)

    assert_equal "Spam", loaded.classify("Buy cheap stuff")
    assert_equal "Ham", loaded.classify("See you at the meeting")
  end

  def test_dirty_tracking
    refute @classifier.dirty?

    @classifier.train(spam: "test")
    assert @classifier.dirty?

    @classifier.save
    refute @classifier.dirty?
  end
end
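
Some tests do need real files, for example when the code under test lists or checks model files on disk. In that case a temporary directory keeps the test isolated and cleans up after itself. The sketch below uses only the standard library. list_models is a hypothetical helper that mirrors the list method of the ModelManager above, and the JSON files are empty placeholders, not real saved models.

```ruby
require 'json'
require 'tmpdir'

# Hypothetical helper mirroring ModelManager#list from earlier in this tutorial.
def list_models(dir)
  Dir.glob(File.join(dir, "*.json")).map { |path| File.basename(path, ".json") }.sort
end

# Dir.mktmpdir yields a fresh directory and deletes it when the block exits.
Dir.mktmpdir("models") do |dir|
  # Placeholder files standing in for saved models.
  %w[sentiment spam].each do |name|
    File.write(File.join(dir, "#{name}.json"), JSON.generate({}))
  end

  puts list_models(dir).inspect
  # => ["sentiment", "spam"]
end
```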

Complete Example: Persistent Sentiment API

Here’s a complete example combining everything. Create sentiment_api.rb:

require 'classifier'
require 'json'

class SentimentAPI
  MODEL_PATH = "sentiment_model.json"

  def initialize
    @storage = Classifier::Storage::File.new(path: MODEL_PATH)

    if File.exist?(MODEL_PATH)
      @classifier = Classifier::Bayes.load(storage: @storage)
      puts "Loaded existing model"
    else
      @classifier = Classifier::Bayes.new 'Positive', 'Negative', 'Neutral'
      @classifier.storage = @storage
      seed_training_data
      @classifier.save
      puts "Created and saved new model"
    end
  end

  def analyze(text)
    scores = @classifier.classifications(text)
    category = @classifier.classify(text)

    {
      text: text,
      sentiment: category.downcase,
      confidence: calculate_confidence(scores, category)
    }
  end

  def train(text, sentiment)
    @classifier.train(sentiment.downcase.to_sym => text)
    puts "Trained: #{sentiment}"
  end

  def save
    @classifier.save
    puts "Model saved"
  end

  def stats
    {
      categories: ['Positive', 'Negative', 'Neutral'],
      has_unsaved_changes: @classifier.dirty?
    }
  end

  private

  def seed_training_data
    # Positive
    @classifier.train(positive: "I love this!")
    @classifier.train(positive: "Excellent work")
    @classifier.train(positive: "This is fantastic")
    @classifier.train(positive: "Great job, well done")
    @classifier.train(positive: "Amazing results")

    # Negative
    @classifier.train(negative: "This is terrible")
    @classifier.train(negative: "I hate it")
    @classifier.train(negative: "Worst experience ever")
    @classifier.train(negative: "Completely disappointed")
    @classifier.train(negative: "Total waste of time")

    # Neutral
    @classifier.train(neutral: "It's okay I guess")
    @classifier.train(neutral: "Nothing special")
    @classifier.train(neutral: "Average performance")
    @classifier.train(neutral: "Could be better or worse")
    @classifier.train(neutral: "No strong feelings")
  end

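  def calculate_confidence(scores, category)
    max_score = scores[category]
    # Exclude the winner by name, not by value, so tied scores are not dropped
    other_scores = scores.reject { |name, _| name == category }.values
    return 100.0 if other_scores.empty?

    gap = max_score - other_scores.max

    # Heuristic: map the log-score gap onto a 0-100 scale
    # (this is not a calibrated probability)
    confidence = (1 - Math.exp(-gap.abs)) * 100
    confidence.round(1)
  end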
end

# Interactive demo
if __FILE__ == $PROGRAM_NAME
  api = SentimentAPI.new

  loop do
    print "\nEnter text (or 'quit'): "
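    input = $stdin.gets

    # gets returns nil at end of input (for example Ctrl-D), so check before chomp
    break if input.nil? || input.chomp.downcase == 'quit'

    result = api.analyze(input.chomp)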
    puts "Sentiment: #{result[:sentiment]} (#{result[:confidence]}% confidence)"
  end

  api.save if api.stats[:has_unsaved_changes]
end

Run it:

ruby sentiment_api.rb

# => Loaded existing model (or "Created and saved new model" on first run)
# (the confidence values below are illustrative; yours will differ)
# Enter text (or 'quit'): This product is amazing!
# Sentiment: positive (87.3% confidence)
# Enter text (or 'quit'): I regret buying this
# Sentiment: negative (72.1% confidence)
# Enter text (or 'quit'): quit
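
To see where a confidence figure comes from, it helps to work the formula by hand. The sketch below applies the same idea as calculate_confidence (one minus the exponential of the negative gap between the top two log scores) to made-up numbers. The scores hash and the confidence_from helper are hypothetical and are not output from a trained classifier.

```ruby
# Made-up log scores for illustration only.
scores = { "Positive" => -4.2, "Negative" => -6.2, "Neutral" => -7.5 }

# Hypothetical helper: gap between the best and second-best score,
# squashed onto a 0-100 scale.
def confidence_from(scores)
  best, runner_up = scores.values.sort.reverse.first(2)
  return 100.0 if runner_up.nil?

  gap = best - runner_up
  ((1 - Math.exp(-gap)) * 100).round(1)
end

puts confidence_from(scores)
# => 86.5
```

Here the gap is 2.0, and 1 - e^(-2) is about 0.865. A gap of zero (a tie) gives 0.0, and the value approaches 100 as the gap grows. Treat it as a rough ranking signal, not a true probability.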

Next Steps

Persistence Framework Guide - Deep dive into storage backends
Build a Spam Filter - Another practical classifier project
LSI Basics - Persist semantic search indexes