Persistence Framework

Save and load classifiers with pluggable storage backends.

The classifier gem provides a flexible persistence framework that lets you save and load trained classifiers using pluggable storage backends. Whether you need simple file storage, in-memory caching, or distributed storage like Redis, the API remains consistent.

Quick Start

require 'classifier'

# Create and train a classifier
classifier = Classifier::Bayes.new 'Spam', 'Ham'
classifier.train_spam "Buy cheap products now!"
classifier.train_ham "Meeting scheduled for tomorrow"

# Configure storage
classifier.storage = Classifier::Storage::File.new(path: "spam_filter.json")

# Save to storage
classifier.save

# Later, load it back
loaded = Classifier::Bayes.load(storage: classifier.storage)
loaded.classify "Limited time offer!"
# => "Spam"

Storage Backends

The gem includes two built-in storage backends, with a simple protocol for creating custom ones.

File Storage

Persist classifiers to JSON files on disk:

storage = Classifier::Storage::File.new(path: "/var/models/classifier.json")

classifier.storage = storage
classifier.save

# The file is human-readable JSON
File.read("/var/models/classifier.json")
# => {"type":"bayes","categories":{"Spam":{...},...}

File storage is ideal for:

  • Single-server deployments
  • Development and testing
  • Backup and versioning (commit models to git)
  • Serverless functions with mounted storage

Memory Storage

Keep classifiers in memory for testing or ephemeral use:

storage = Classifier::Storage::Memory.new

classifier.storage = storage
classifier.save

# Data persists only for the lifetime of the storage object
loaded = Classifier::Bayes.load(storage: storage)

Memory storage is ideal for:

  • Unit tests and integration tests
  • Caching layers
  • Ephemeral processing pipelines

The Storage API

Both Bayes and LSI classifiers share the same persistence API.

Saving

# Save to configured storage
classifier.storage = Classifier::Storage::File.new(path: "model.json")
classifier.save

# Or save directly to a file (legacy API)
classifier.save_to_file("model.json")

Loading

# Load with storage pre-configured
storage = Classifier::Storage::File.new(path: "model.json")
classifier = Classifier::Bayes.load(storage: storage)
classifier.storage  # => #<Classifier::Storage::File...>

# Or load directly from file (legacy API)
classifier = Classifier::Bayes.load_from_file("model.json")
classifier.storage  # => nil

Dirty Tracking

The classifier tracks whether it has unsaved changes:

classifier = Classifier::Bayes.new 'A', 'B'
classifier.dirty?
# => false

classifier.train_a "some text"
classifier.dirty?
# => true

classifier.save
classifier.dirty?
# => false
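
Dirty tracking makes it easy to skip redundant writes. The following is a minimal sketch that relies only on the `dirty?` and `save` methods shown above. The `save_if_dirty` helper and the `FakeClassifier` stand-in are hypothetical names, introduced so the snippet runs without the gem installed; they are not part of the classifier API.

```ruby
# Hypothetical helper: persist only when there are unsaved changes.
# Returns true if a save happened, false if it was skipped.
def save_if_dirty(classifier)
  return false unless classifier.dirty?

  classifier.save
  true
end

# Stand-in object so the sketch runs without the classifier gem.
# A real Classifier::Bayes or Classifier::LSI instance with storage
# configured would be passed to save_if_dirty in the same way.
class FakeClassifier
  attr_reader :saves

  def initialize
    @dirty = false
    @saves = 0
  end

  def train(_category, _text)
    @dirty = true
  end

  def dirty?
    @dirty
  end

  def save
    @saves += 1
    @dirty = false
  end
end

classifier = FakeClassifier.new
save_if_dirty(classifier)            # nothing to write yet, skipped
classifier.train(:spam, "buy now")
save_if_dirty(classifier)            # writes once
save_if_dirty(classifier)            # clean again, skipped
puts classifier.saves                # => 1
```

A helper like this fits naturally in a periodic job or an `at_exit` hook, where the caller does not know whether any training happened since the last save.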

Reloading

Discard in-memory changes and reload from storage:

classifier.train_spam "new training data"
classifier.dirty?
# => true

# Safe reload - raises if there are unsaved changes
classifier.reload
# => raises Classifier::UnsavedChangesError

# Force reload - discards unsaved changes
classifier.reload!
classifier.dirty?
# => false

Creating Custom Storage Backends

Subclass Classifier::Storage::Base and implement its protocol to create a custom backend:

class RedisStorage < Classifier::Storage::Base
  def initialize(redis:, key:)
    super()
    @redis = redis
    @key = key
  end

  def write(data)
    @redis.set(@key, data)
  end

  def read
    @redis.get(@key)
  end

  def delete
    @redis.del(@key)
  end

  def exists?
    @redis.exists?(@key)
  end
end

Use your custom backend:

require 'redis'

redis = Redis.new(url: ENV['REDIS_URL'])
storage = RedisStorage.new(redis: redis, key: "classifier:spam_filter")

classifier.storage = storage
classifier.save

Storage Protocol

Your storage class must implement these four methods:

Method    Signature          Description
write     (String) -> void   Save serialized classifier data
read      () -> String?      Load data; return nil if not found
delete    () -> void         Remove stored data
exists?   () -> bool         Check if data exists
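
A small round-trip check can catch protocol mistakes in a custom backend before it is wired into a classifier. The sketch below exercises the four methods listed above in order. `storage_protocol_violations` and `HashStorage` are hypothetical names used for illustration; `HashStorage` is duck-typed here so the snippet runs without the gem, whereas a real backend would subclass Classifier::Storage::Base.

```ruby
# Minimal backend used only to exercise the check below.
class HashStorage
  def initialize
    @slot = nil
  end

  def write(data)
    @slot = data
  end

  def read
    @slot
  end

  def delete
    @slot = nil
  end

  def exists?
    !@slot.nil?
  end
end

# Hypothetical helper: returns a list of protocol violations (empty if none).
# Note that it deletes and overwrites whatever the storage currently holds.
def storage_protocol_violations(storage, sample: '{"type":"test"}')
  problems = []
  %i[write read delete exists?].each do |m|
    problems << "missing ##{m}" unless storage.respond_to?(m)
  end
  return problems unless problems.empty?

  storage.delete
  problems << "exists? should be false when empty" if storage.exists?
  problems << "read should return nil when empty" unless storage.read.nil?

  storage.write(sample)
  problems << "exists? should be true after write" unless storage.exists?
  problems << "read should return what was written" unless storage.read == sample

  storage.delete
  problems << "exists? should be false after delete" if storage.exists?
  problems
end

puts storage_protocol_violations(HashStorage.new).inspect  # => []
```

Because the check is destructive, point it at a scratch key, table row, or bucket object rather than at a stored production model.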

Example: PostgreSQL Storage

class PostgresStorage < Classifier::Storage::Base
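  # `table` is interpolated directly into the SQL below, so pass only a
  # trusted identifier; row values go through bound parameters ($1, $2).
  def initialize(connection:, table: 'classifiers', id:)
    super()
    @conn = connection
    @table = table
    @id = id
  end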

  def write(data)
    @conn.exec_params(
      "INSERT INTO #{@table} (id, data, updated_at) VALUES ($1, $2, NOW())
       ON CONFLICT (id) DO UPDATE SET data = $2, updated_at = NOW()",
      [@id, data]
    )
  end

  def read
    result = @conn.exec_params(
      "SELECT data FROM #{@table} WHERE id = $1",
      [@id]
    )
    result.ntuples > 0 ? result[0]['data'] : nil
  end

  def delete
    @conn.exec_params("DELETE FROM #{@table} WHERE id = $1", [@id])
  end

  def exists?
    result = @conn.exec_params(
      "SELECT 1 FROM #{@table} WHERE id = $1",
      [@id]
    )
    result.ntuples > 0
  end
end

Example: S3 Storage

require 'aws-sdk-s3'

class S3Storage < Classifier::Storage::Base
  def initialize(bucket:, key:, client: Aws::S3::Client.new)
    super()
    @bucket = bucket
    @key = key
    @client = client
  end

  def write(data)
    @client.put_object(bucket: @bucket, key: @key, body: data)
  end

  def read
    @client.get_object(bucket: @bucket, key: @key).body.read
  rescue Aws::S3::Errors::NoSuchKey
    nil
  end

  def delete
    @client.delete_object(bucket: @bucket, key: @key)
  end

  def exists?
    @client.head_object(bucket: @bucket, key: @key)
    true
  rescue Aws::S3::Errors::NotFound
    false
  end
end
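
Because every backend exposes the same four methods, backends can also be composed. The following is a sketch of a read-through cache that puts a fast store in front of a slow one, for example a memory store in front of S3. `CachedStorage` and `SlotStorage` are hypothetical names and are duck-typed so the snippet runs without the gem; with the gem loaded, `CachedStorage` would subclass Classifier::Storage::Base like the examples above.

```ruby
# Hypothetical wrapper: serves reads from `cache` when possible and falls
# back to `backing`, writing through to both on save.
class CachedStorage
  def initialize(cache:, backing:)
    @cache = cache
    @backing = backing
  end

  def write(data)
    @backing.write(data) # durable copy first, so a failure leaves the cache untouched
    @cache.write(data)
  end

  def read
    cached = @cache.read
    return cached unless cached.nil?

    data = @backing.read
    @cache.write(data) unless data.nil? # populate the cache on a miss
    data
  end

  def delete
    @cache.delete
    @backing.delete
  end

  def exists?
    @cache.exists? || @backing.exists?
  end
end

# Stand-in backend that counts reads, used to show the cache taking effect.
class SlotStorage
  attr_reader :reads

  def initialize
    @data = nil
    @reads = 0
  end

  def write(data)
    @data = data
  end

  def read
    @reads += 1
    @data
  end

  def delete
    @data = nil
  end

  def exists?
    !@data.nil?
  end
end

backing = SlotStorage.new
backing.write('{"type":"bayes"}') # pretend a model was saved earlier
storage = CachedStorage.new(cache: SlotStorage.new, backing: backing)

storage.read         # cache miss: falls through to the backing store
storage.read         # cache hit: backing store is not consulted
puts backing.reads   # => 1
```

This sketch does no invalidation: if another process writes to the backing store, the cached copy stays stale until `delete` or `write` is called through the wrapper.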

Error Handling

The persistence framework defines specific exceptions:

# Base error class
Classifier::Error

# Raised when reload would discard unsaved changes
Classifier::UnsavedChangesError

# Raised when storage operations fail
Classifier::StorageError

Rescue the specific error classes so that each failure gets its own recovery path:

begin
  classifier.reload
rescue Classifier::UnsavedChangesError
  # Prompt user or auto-save
  classifier.save
  classifier.reload
rescue Classifier::StorageError => e
  # Storage backend failed
  logger.error "Failed to reload: #{e.message}"
end
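
Network-backed storage can fail transiently, so a save or reload is sometimes worth retrying. The sketch below retries a block with exponential backoff. `with_storage_retries` and its parameters are hypothetical, and the error class is passed in: with the gem loaded you would pass Classifier::StorageError, assuming your backend surfaces its failures as that class. `FakeStorageError` is a stand-in so the snippet runs on its own.

```ruby
# Hypothetical helper: retry the block when it raises `error_class`,
# re-raising the last error once `attempts` is exhausted.
def with_storage_retries(error_class:, attempts: 3, base_delay: 0.1)
  tries = 0
  begin
    tries += 1
    yield
  rescue error_class
    raise if tries >= attempts

    sleep(base_delay * (2**(tries - 1))) # backoff: base, 2x base, 4x base, ...
    retry
  end
end

# Stand-in error and flaky operation so the sketch runs without the gem.
class FakeStorageError < StandardError; end

calls = 0
result = with_storage_retries(error_class: FakeStorageError, base_delay: 0.01) do
  calls += 1
  raise FakeStorageError, "temporary outage" if calls < 3

  :saved
end

puts "#{result} after #{calls} attempts"  # => saved after 3 attempts
```

With a real classifier the block body would be `classifier.save` or `classifier.reload`. Only retry operations that are safe to repeat.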

Best Practices

1. Configure Storage at Initialization

def create_classifier
  classifier = Classifier::Bayes.new 'Spam', 'Ham'
  classifier.storage = Classifier::Storage::File.new(
    path: Rails.root.join('models', 'spam_filter.json').to_s
  )
  classifier
end

2. Save After Batch Training

# Don't save after every training example
emails.each do |email|
  classifier.train(email.label, email.body)
end

# Save once at the end
classifier.save

3. Use Memory Storage in Tests

RSpec.describe SpamFilter do
  let(:storage) { Classifier::Storage::Memory.new }
  let(:classifier) do
    c = Classifier::Bayes.new 'Spam', 'Ham'
    c.storage = storage
    c
  end

  it "persists training" do
    classifier.train_spam "buy now"
    classifier.save

    loaded = Classifier::Bayes.load(storage: storage)
    expect(loaded.classify("buy now")).to eq("Spam")
  end
end

4. Version Your Models

class VersionedStorage < Classifier::Storage::File
  def initialize(path:, version:)
    super(path: "#{path}.v#{version}.json")
    @version = version
  end
end

# Deploy new model versions without downtime
storage_v1 = VersionedStorage.new(path: "spam_filter", version: 1)
storage_v2 = VersionedStorage.new(path: "spam_filter", version: 2)
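
With the `name.vN.json` naming produced by VersionedStorage above, the newest model on disk can be found by scanning the directory. This is a sketch under that naming assumption; `latest_model_version` is a hypothetical helper, not part of the gem.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: find the highest N among files named "<base>.vN.json".
# Versions are compared numerically, so v10 sorts after v2.
# Returns nil when no versioned file exists.
def latest_model_version(dir, base)
  pattern = /\A#{Regexp.escape(base)}\.v(\d+)\.json\z/
  Dir.children(dir)
     .filter_map { |name| name[pattern, 1]&.to_i }
     .max
end

Dir.mktmpdir do |dir|
  [1, 2, 10].each do |v|
    FileUtils.touch(File.join(dir, "spam_filter.v#{v}.json"))
  end
  FileUtils.touch(File.join(dir, "other.v99.json")) # different model, ignored

  puts latest_model_version(dir, "spam_filter").inspect  # => 10
end
```

The returned number can be passed straight to `VersionedStorage.new(path: ..., version: ...)` to load the most recent model.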

Works with Both Classifiers

The persistence API is identical for Bayes and LSI:

# Bayes
bayes = Classifier::Bayes.new 'A', 'B'
bayes.storage = Classifier::Storage::File.new(path: "bayes.json")
bayes.train_a "text"
bayes.save

# LSI
lsi = Classifier::LSI.new
lsi.storage = Classifier::Storage::File.new(path: "lsi.json")
lsi.add_item "document", :category
lsi.save
