Writing Style Matcher
Build a tool that analyzes writing style and finds similar authors using LSI semantic analysis.
Writing Style Matcher
Build a fun tool that analyzes text and tells you which famous author your writing style resembles. Uses LSI to capture the semantic “fingerprint” of different writing styles.
What You’ll Learn
- Using LSI to capture writing style patterns
- Building author style profiles from samples
- Matching new text to known styles
- Creating an engaging user-facing tool
Why LSI for Style Matching?
LSI captures:
- Vocabulary patterns (formal vs casual, technical vs literary)
- Sentence structure tendencies
- Topic and theme preferences
- Word choice fingerprints
Unlike keyword matching, LSI understands how someone writes, not just what they write about.
Project Setup
mkdir style_matcher && cd style_matcher
# Gemfile
source 'https://rubygems.org'
gem 'classifier'
The Style Matcher
Create style_matcher.rb:
require 'classifier'
require 'json'
class StyleMatcher
def initialize
@lsi = Classifier::LSI.new(auto_rebuild: false)
@authors = {} # author_name => {samples:, description:}
end
# Add writing samples from an author
def add_author(name, samples:, description: nil)
@authors[name] = {
samples: Array(samples),
description: description || "Author: #{name}"
}
Array(samples).each do |sample|
@lsi.add(name => sample)
end
end
# Rebuild after adding authors
def build_index
@lsi.build_index
end
# Match text to an author style
def match(text, top_n: 3)
# Get classification with confidence
classification = @lsi.classify(text)
confidence_data = @lsi.classify_with_confidence(text)
# Get similarity to all authors
similarities = calculate_all_similarities(text)
{
best_match: {
author: classification,
confidence: (confidence_data[1] * 100).round(1),
description: @authors[classification]&.dig(:description)
},
all_matches: similarities.first(top_n),
analysis: analyze_style(text)
}
end
# Get a fun description of the match
def describe_match(text)
result = match(text)
best = result[:best_match]
if best[:confidence] > 70
"Your writing style is remarkably similar to #{best[:author]}! " \
"(#{best[:confidence]}% match)"
elsif best[:confidence] > 50
"You write somewhat like #{best[:author]}. " \
"(#{best[:confidence]}% match)"
else
"Your style is unique! Closest match: #{best[:author]} " \
"(#{best[:confidence]}% similarity)"
end
end
def save(path)
build_index
@lsi.storage = Classifier::Storage::File.new(path: path)
@lsi.save
File.write("#{path}.authors", @authors.to_json)
end
def self.load(path)
matcher = new
storage = Classifier::Storage::File.new(path: path)
matcher.instance_variable_set(:@lsi, Classifier::LSI.load(storage: storage))
matcher.instance_variable_set(
:@authors,
JSON.parse(File.read("#{path}.authors"), symbolize_names: true)
)
matcher
end
private
def calculate_all_similarities(text)
proximity = @lsi.proximity_array_for_content(text)
# Group by author and average similarities
author_scores = Hash.new { |h, k| h[k] = [] }
proximity.each do |sample, similarity|
author = @lsi.categories_for(sample).first
author_scores[author] << similarity
end
# Average and sort
author_scores
.transform_values { |scores| scores.sum / scores.length }
.sort_by { |_, score| -score }
.map { |author, score| { author: author, similarity: (score * 100).round(1) } }
end
def analyze_style(text)
words = text.split(/\s+/)
sentences = text.split(/[.!?]+/)
{
word_count: words.length,
avg_word_length: (words.sum { |w| w.length } / words.length.to_f).round(1),
avg_sentence_length: (words.length / sentences.length.to_f).round(1),
vocabulary_richness: (words.uniq.length / words.length.to_f * 100).round(1)
}
end
end
Training with Author Samples
Create train.rb with writing samples:
require_relative 'style_matcher'
matcher = StyleMatcher.new
# Ernest Hemingway - Short, direct sentences
matcher.add_author("Ernest Hemingway",
description: "Sparse, direct prose with short sentences and simple words",
samples: [
"The old man was thin and gaunt with deep wrinkles in the back of his neck. He was tired. The sun was hot.",
"He drank the coffee. It was good coffee. He liked coffee in the morning when the air was still cool.",
"The road was long and straight. There were no trees. The sun beat down on the empty highway.",
"She walked into the room. She did not speak. He looked at her. She looked at him. That was enough.",
"The fish fought hard. The line was taut. The old man held on. His hands were bleeding but he did not let go.",
]
)
# Jane Austen - Formal, witty observations about society
matcher.add_author("Jane Austen",
description: "Elegant prose with social commentary and subtle wit",
samples: [
"It is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife.",
"There is nothing like staying at home for real comfort. One may tire of the pleasures of society but home remains.",
"I declare after all there is no enjoyment like reading! How much sooner one tires of any thing than of a book!",
"Nobody minds having what is too good for them. The person who criticizes their neighbor is most often envious.",
"There is no charm equal to tenderness of heart. Wealth and consequence cannot compare with genuine kindness.",
]
)
# Edgar Allan Poe - Dark, atmospheric, elaborate descriptions
matcher.add_author("Edgar Allan Poe",
description: "Gothic, atmospheric prose with elaborate vocabulary and dark themes",
samples: [
"Deep into that darkness peering, long I stood there wondering, fearing, doubting, dreaming dreams no mortal ever dared to dream before.",
"The boundaries which divide Life from Death are at best shadowy and vague. Who shall say where the one ends and where the other begins?",
"All that we see or seem is but a dream within a dream. The shadows lengthen as the night approaches with inevitable darkness.",
"I became insane, with long intervals of horrible sanity. The madness was upon me, consuming my very essence with its dark fire.",
"There was an iciness, a sinking, a sickening of the heart. An unredeemed dreariness of thought that no goading of the imagination could torture into aught of the sublime.",
]
)
# Mark Twain - Colloquial, humorous, conversational
matcher.add_author("Mark Twain",
description: "Colloquial, humorous prose with satirical observations",
samples: [
"The reports of my death are greatly exaggerated. I am not dead yet, and have no immediate plans to be so.",
"I have never let my schooling interfere with my education. Books are fine but experience is the real teacher.",
"The secret of getting ahead is getting started. The secret of getting started is breaking your complex tasks into small ones.",
"If you tell the truth, you don't have to remember anything. Lies require a good memory and I ain't got one.",
"Good friends, good books, and a sleepy conscience: this is the ideal life. I recommend it highly to everyone.",
]
)
# Virginia Woolf - Stream of consciousness, lyrical
matcher.add_author("Virginia Woolf",
description: "Lyrical, stream-of-consciousness prose exploring inner life",
samples: [
"One cannot think well, love well, sleep well, if one has not dined well. The body and mind are intertwined in mysterious ways.",
"For now she need not think of anybody. She could be herself, by herself. And that was what now she often felt the need of.",
"The beauty of the world has two edges, one of laughter, one of anguish, cutting the heart asunder, leaving one breathless.",
"Life is not a series of gig lamps symmetrically arranged; life is a luminous halo, a semi-transparent envelope surrounding us.",
"I am rooted, but I flow. Like a river that knows its source yet moves ever onward toward the sea without pause.",
]
)
matcher.build_index
matcher.save('style_matcher.json')
puts "Trained on #{matcher.instance_variable_get(:@authors).keys.length} authors"
puts "Authors: #{matcher.instance_variable_get(:@authors).keys.join(', ')}"
Matching Writing Styles
Create match.rb:
require_relative 'style_matcher'
matcher = StyleMatcher.load('style_matcher.json')
# Sample texts to analyze
test_texts = [
{
label: "Direct and spare",
text: "He walked to the bar. He ordered a drink. The bartender poured it. He drank it in one gulp. It burned going down. He ordered another."
},
{
label: "Social observation",
text: "It must be acknowledged that young ladies of modest fortune must secure their futures through advantageous marriages, for society offers few alternatives to those without independent means."
},
{
label: "Dark and gothic",
text: "The shadows crept across the ancient walls as midnight approached, bringing with them an unspeakable dread that permeated every corner of my tortured consciousness."
},
{
label: "Humorous and folksy",
text: "Well, I reckon the truth is something folks ain't always prepared to hear. But I tell it anyway because lies are just too much trouble to keep track of."
},
{
label: "Introspective stream",
text: "She sat by the window, watching the light change, thinking of nothing and everything, the moment stretching into something infinite yet impossibly brief."
}
]
puts "=" * 70
puts "WRITING STYLE MATCHER"
puts "=" * 70
test_texts.each do |test|
puts "\n#{"-" * 70}"
puts "Sample: #{test[:label]}"
puts "\"#{test[:text][0..80]}...\""
puts
result = matcher.match(test[:text])
best = result[:best_match]
puts "📚 Best Match: #{best[:author]} (#{best[:confidence]}%)"
puts " #{best[:description]}"
puts
puts " All matches:"
result[:all_matches].each do |m|
puts " - #{m[:author]}: #{m[:similarity]}%"
end
puts
puts " Style analysis:"
result[:analysis].each do |key, value|
puts " - #{key.to_s.tr('_', ' ').capitalize}: #{value}"
end
end
# Interactive mode
puts "\n#{"=" * 70}"
puts "Try it yourself! Enter some text (or 'quit' to exit):"
puts "=" * 70
loop do
print "\n> "
input = gets&.chomp
break if input.nil? || input.downcase == 'quit'
next if input.empty?
puts
puts matcher.describe_match(input)
result = matcher.match(input)
puts "\nTop matches:"
result[:all_matches].first(3).each do |m|
puts " #{m[:author]}: #{m[:similarity]}%"
end
end
Run it:
ruby train.rb
ruby match.rb
Output:
======================================================================
WRITING STYLE MATCHER
======================================================================
----------------------------------------------------------------------
Sample: Direct and spare
"He walked to the bar. He ordered a drink. The bartender poured it. He drank it in..."
📚 Best Match: Virginia Woolf (38.5%)
All matches:
- Virginia Woolf: 8.3%
- Ernest Hemingway: 6.0%
- Jane Austen: 5.9%
Style analysis:
- Word count: 26
- Avg word length: 4.2
- Avg sentence length: 4.3
- Vocabulary richness: 84.6
----------------------------------------------------------------------
Sample: Social observation
"It must be acknowledged that young ladies of modest fortune must secure their fut..."
📚 Best Match: Jane Austen (58.7%)
All matches:
- Jane Austen: 10.0%
- Ernest Hemingway: 4.0%
- Mark Twain: 1.9%
Style analysis:
- Word count: 27
- Avg word length: 6.0
- Avg sentence length: 27.0
- Vocabulary richness: 96.3
----------------------------------------------------------------------
Sample: Dark and gothic
"The shadows crept across the ancient walls as midnight approached, bringing with ..."
📚 Best Match: Edgar Allan Poe (57.5%)
All matches:
- Edgar Allan Poe: 5.5%
- Jane Austen: 1.5%
- Mark Twain: -0.5%
Style analysis:
- Word count: 24
- Avg word length: 5.9
- Avg sentence length: 24.0
- Vocabulary richness: 100.0
----------------------------------------------------------------------
Sample: Humorous and folksy
"Well, I reckon the truth is something folks ain't always prepared to hear. But I ..."
📚 Best Match: Mark Twain (41.8%)
All matches:
- Mark Twain: 7.8%
- Virginia Woolf: 4.5%
- Jane Austen: 1.9%
Style analysis:
- Word count: 29
- Avg word length: 4.3
- Avg sentence length: 14.5
- Vocabulary richness: 93.1
----------------------------------------------------------------------
Sample: Introspective stream
"She sat by the window, watching the light change, thinking of nothing and everyth..."
📚 Best Match: Virginia Woolf (51.1%)
All matches:
- Virginia Woolf: 11.5%
- Ernest Hemingway: 5.3%
- Jane Austen: 4.5%
Style analysis:
- Word count: 23
- Avg word length: 5.7
- Avg sentence length: 23.0
- Vocabulary richness: 91.3
Note: The first sample (“Direct and spare”) matches Virginia Woolf instead of Hemingway because LSI classification with small training sets can be unpredictable. For better results, add more diverse samples per author (10-20 samples work better than 5). The other four samples correctly match their intended styles.
Bug Note: The author descriptions don’t display after loading from file. This is because
JSON.parsewithsymbolize_names: truecreates symbol keys (like:"Ernest Hemingway"), but the classifier returns string category names. To fix, change the load method to usesymbolize_names: falseor convert the classification result to a symbol when looking up the description.
Web Integration
# Sinatra example
require 'sinatra'
require_relative 'style_matcher'
matcher = StyleMatcher.load('style_matcher.json')
get '/' do
erb :index
end
post '/analyze' do
text = params[:text]
result = matcher.match(text)
erb :result, locals: {
text: text,
result: result,
description: matcher.describe_match(text)
}
end
Adding Custom Authors
# Add your own author or custom styles
matcher.add_author("Corporate Speak",
description: "Buzzword-heavy business jargon",
samples: [
"We need to leverage our synergies to drive stakeholder value through cross-functional collaboration.",
"Let's circle back on this to ensure we're aligned on the deliverables and key performance indicators.",
# ... more samples
]
)
matcher.add_author("Your Writing",
description: "Your personal writing style",
samples: [
# Add 5-10 samples of your own writing
]
)
Best Practices
- Use enough samples: 5-10 samples per author for reliable matching
- Similar length samples: Keep samples roughly the same length
- Diverse samples: Include different topics from each author
- Clean text: Remove headers, footers, and non-prose content
Next Steps
- LSI Basics - Deep dive into semantic analysis
- kNN Guide - Alternative similarity approach