The Intent Recognition System is a multi-tenant, machine learning-powered platform designed to understand and classify user queries into predefined business intents. Built for enterprise use, it provides real-time intent matching with high accuracy, automatic pattern generation, and intelligent memory management.
Transform natural language queries like "What time do you open?" into structured business intents like store_hours
with confidence scores, enabling automated customer service, chatbots, FAQ systems, and voice assistants.
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Client Apps   │    │  Load Balancer  │    │   Monitoring    │
│  (Web, Mobile,  │◄──►│   (Optional)    │◄──►│ (Grafana, etc)  │
│    Chatbots)    │    │                 │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │                      │
         └──────────────────────┼──────────────────────┘
                                │
                      ┌─────────────────┐
                      │    Flask App    │
                      │   (Port 5000)   │
                      │                 │
                      │ • CRUD API      │
                      │ • Query API     │
                      │ • Cache Mgmt    │
                      │ • Rate Limiting │
                      └─────────────────┘
                                │
               ┌────────────────┼────────────────┐
               │                │                │
       ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
       │  PostgreSQL  │ │    Redis     │ │  Background  │
       │   Database   │ │    Cache     │ │  Job Worker  │
       │              │ │              │ │              │
       │ • Intents    │ │ • Model      │ │ • Pattern    │
       │ • Patterns   │ │   Cache      │ │   Generation │
       │ • Metadata   │ │ • Queue      │ │ • OpenAI     │
       └──────────────┘ └──────────────┘ └──────────────┘
Containers:
├── intent_app
├── intent_background_job
├── intent_postgres
└── intent_redis
1. Client creates intent via POST /intents (example request after this flow)
   ├── Validates input data (customer_id, intent_key, description, etc.)
   ├── Stores intent in PostgreSQL with empty query_pattern
   ├── Caches intent data in Redis
   ├── Invalidates existing model cache for customer
   └── Queues pattern generation job
2. Background worker processes queue
   ├── Receives intent_id from Redis queue
   ├── Calls OpenAI API to generate query patterns
   ├── Updates database with generated patterns
   └── Triggers model retraining on next query
3. First query triggers model training
   ├── Loads all customer intents from database
   ├── Trains TF-IDF vectorizer + LogisticRegression model
   ├── Caches trained model in memory
   └── Returns query result
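A minimal client-side sketch of the creation call in step 1, assuming the service runs locally on port 5000; the payload fields mirror the intents schema, but the exact response shape is an assumption:

```python
import requests

# Hypothetical create-intent request; field names follow the intents table
# (customer_id, intent_key, description, info_blob, tags).
payload = {
    "customer_id": "9b2f6c1e-0000-0000-0000-000000000000",  # example UUID
    "intent_key": "store_hours",
    "description": "Questions about opening hours",
    "info_blob": "We are open Monday-Friday 9am-6pm and Saturday 10am-4pm.",
    "tags": ["hours", "store"],
}

resp = requests.post("http://localhost:5000/intents", json=payload, timeout=10)
resp.raise_for_status()
print(resp.json())  # expected to include the new intent_id
```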
1. Client sends query via POST /query (example request after this flow)
   ├── Validates query string and parameters
   ├── Checks rate limits
   └── Preprocesses query (tokenization, lemmatization)
2. Model loading/caching
   ├── Checks memory cache for customer model
   ├── If cached: Uses in-memory model (fast path, ~60ms)
   ├── If not cached: Loads from disk and caches
   └── If no model exists: Trains new model
3. Intent matching
   ├── Vectorizes query using TF-IDF
   ├── Computes cosine similarity with all intent patterns
   ├── Finds best match above similarity threshold
   └── Returns intent details with confidence score
4. Response formatting
   ├── Formats result with intent metadata
   ├── Updates model cache access statistics
   └── Returns JSON response to client
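And the corresponding query call; the request and response field names here are assumptions consistent with the flow above:

```python
import requests

# Hypothetical query request against POST /query; "confidence" stands in
# for the similarity score described above (exact field names may differ).
resp = requests.post(
    "http://localhost:5000/query",
    json={
        "customer_id": "9b2f6c1e-0000-0000-0000-000000000000",
        "query": "What time do you open?",
    },
    timeout=10,
)
print(resp.json())  # e.g. {"intent_key": "store_hours", "confidence": 0.87}
```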
All intents, trained models, and cache entries are partitioned by customer_id, which serves as the multi-tenancy key.
Customer Growth Pattern:
├── 1-10 customers: Single instance, shared resources
├── 10-100 customers: Model caching essential, memory management
├── 100-500 customers: Consider horizontal scaling
└── 500+ customers: Distributed architecture, shared base models
For each customer:
1. Collect all intent patterns from database
2. Preprocess text:
   ├── Tokenization (NLTK word_tokenize)
   ├── Lowercasing and stopword removal
   ├── Lemmatization (WordNet)
   └── Clean text normalization
3. Feature extraction:
   ├── TF-IDF vectorization (max 1000 features)
   ├── English stopwords filtering
   └── Sparse matrix generation
4. Model training:
   ├── LogisticRegression (max_iter=1000)
   ├── Handle single-class edge case
   └── Cross-validation ready
5. Model persistence:
   ├── Save vectorizer and model to disk (joblib)
   ├── Cache in memory for fast access
   └── Track memory usage and metadata
For each query:
1. Preprocess query text (same pipeline as training)
2. Transform to TF-IDF vector using trained vectorizer
3. Compute cosine similarity with all intent patterns
4. Find maximum similarity score
5. Return intent if above threshold, else "no match"
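To make the pipeline concrete, here is a self-contained sketch of TF-IDF vectorization plus cosine-similarity matching; the NLTK preprocessing, LogisticRegression classifier, and persistence steps are omitted, and the 0.3 threshold is an assumption:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy patterns, standing in for the OpenAI-generated query patterns.
patterns = [
    "what time do you open", "when are you open", "store hours please",
    "where is your store", "what is your address",
]
labels = [
    "store_hours", "store_hours", "store_hours",
    "store_location", "store_location",
]

# Feature extraction: TF-IDF, capped at 1000 features, English stopwords.
vectorizer = TfidfVectorizer(max_features=1000, stop_words="english")
pattern_vectors = vectorizer.fit_transform(patterns)

def match_intent(query: str, threshold: float = 0.3):
    """Vectorize the query and return (intent, score), or (None, score)."""
    similarities = cosine_similarity(
        vectorizer.transform([query]), pattern_vectors)[0]
    best = similarities.argmax()
    if similarities[best] < threshold:
        return None, float(similarities[best])  # "no match"
    return labels[best], float(similarities[best])

print(match_intent("what time do you open tomorrow"))
```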
Background Job Process:
1. Receive intent from queue (intent_id, info_blob, pattern_count)
2. Generate prompt for OpenAI:
"Generate N questions showing intent to know things from: {info_blob}"
3. Call GPT-4o-mini API with retry logic
4. Parse response into individual patterns
5. Update database with generated patterns
6. Handle failures with exponential backoff
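A hedged sketch of the generation step using the openai Python client (v1-style API); the prompt follows the template above, while the helper name and line-based parsing are assumptions:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_patterns(info_blob: str, pattern_count: int):
    """Ask gpt-4o-mini for N query patterns; parsing is an assumption."""
    prompt = (
        f"Generate {pattern_count} questions showing intent to know "
        f"things from: {info_blob}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    text = response.choices[0].message.content
    # Split into one pattern per line, stripping list markers and blanks.
    return [line.strip("-. 0123456789") for line in text.splitlines()
            if line.strip()]
```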
Redis Queue Architecture:
├── pattern_generation_queue: New jobs
├── pattern_generation_failed: Failed jobs
├── BRPOP blocking: Real-time processing
├── Retry logic: 3 attempts with backoff
└── Fallback polling: Every 5 minutes
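A minimal worker loop over this queue layout, assuming jobs are JSON blobs carrying intent_id, info_blob, and pattern_count, and reusing the generate_patterns helper sketched above; the retry bookkeeping is an assumption:

```python
import json
import time

import redis

r = redis.Redis(host="intent_redis", port=6379, decode_responses=True)
MAX_ATTEMPTS = 3

while True:
    # Block up to 10s for a new job (mirrors SLEEP_INTERVAL); a periodic
    # database rescan would cover the fallback-polling path.
    item = r.brpop("pattern_generation_queue", timeout=10)
    if item is None:
        continue
    _, raw = item
    job = json.loads(raw)
    try:
        patterns = generate_patterns(job["info_blob"], job["pattern_count"])
        # ... update the intents row for job["intent_id"] with patterns ...
    except Exception:
        job["attempts"] = job.get("attempts", 0) + 1
        if job["attempts"] >= MAX_ATTEMPTS:
            r.lpush("pattern_generation_failed", json.dumps(job))
        else:
            time.sleep(2 ** job["attempts"])  # exponential backoff
            r.lpush("pattern_generation_queue", json.dumps(job))
```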
ModelMemoryManager:
├── OrderedDict for LRU ordering
├── Memory estimation per model
├── Automatic eviction when limits reached
├── TTL-based expiration (24 hours default)
├── Background cleanup thread (every 5 minutes)
└── Thread-safe operations with RLock
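A compact sketch of the LRU and TTL mechanics listed above; the real manager also estimates per-model memory and runs a background cleanup thread, which this sketch omits:

```python
import threading
import time
from collections import OrderedDict

class ModelMemoryManager:
    """LRU model cache with TTL expiry (illustrative sketch only)."""

    def __init__(self, max_models=10, ttl_hours=24):
        self._models = OrderedDict()    # customer_id -> (model, stored_at)
        self._lock = threading.RLock()  # thread-safe operations
        self.max_models = max_models
        self.ttl_seconds = ttl_hours * 3600

    def put(self, customer_id, model):
        with self._lock:
            self._models[customer_id] = (model, time.time())
            self._models.move_to_end(customer_id)
            while len(self._models) > self.max_models:
                self._models.popitem(last=False)  # evict least recently used

    def get(self, customer_id):
        with self._lock:
            entry = self._models.get(customer_id)
            if entry is None:
                return None
            model, stored_at = entry
            if time.time() - stored_at > self.ttl_seconds:
                del self._models[customer_id]  # TTL expired
                return None
            self._models.move_to_end(customer_id)  # refresh LRU position
            return model
```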
Model Caching Flow:
1. Model trained → Estimate memory usage → Cache if space available
2. Model accessed → Move to end of LRU → Update access count
3. Memory pressure → Evict LRU models → Force garbage collection
4. TTL expired → Remove stale models → Free memory
5. Intent updated → Invalidate cache → Force retrain on next query
CREATE TABLE intents (
intent_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
customer_id UUID NOT NULL,
intent_key VARCHAR(255) NOT NULL,
query_pattern TEXT[] DEFAULT '{}',
description TEXT NOT NULL,
info_blob TEXT NOT NULL,
tags TEXT[] DEFAULT '{}',
intent_query_pattern_count INTEGER DEFAULT 0,
process_time TIMESTAMP,
processing_error TEXT,
last_processing_time TIMESTAMP,
processing_attempts INTEGER DEFAULT 0,
UNIQUE(customer_id, intent_key)
);
CREATE INDEX idx_intents_customer_id ON intents(customer_id);
CREATE INDEX idx_intents_tags ON intents USING GIN(tags);
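The GIN index makes array-containment lookups on tags cheap; a hedged psycopg2 sketch, with connection values mirroring the environment configuration below:

```python
import psycopg2

# Connection parameters mirror the DB_* environment variables below.
conn = psycopg2.connect(host="intent_postgres", port=5432,
                        dbname="mydatabase", user="user", password="password")
with conn, conn.cursor() as cur:
    # `@>` is array containment, served by idx_intents_tags (GIN).
    cur.execute(
        "SELECT intent_key, description FROM intents"
        " WHERE customer_id = %s AND tags @> %s",
        ("9b2f6c1e-0000-0000-0000-000000000000", ["hours"]),
    )
    for intent_key, description in cur.fetchall():
        print(intent_key, description)
```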
POST /intents # Create new intent
GET /intents/{intent_id} # Get intent details
PUT /intents/{intent_id} # Update intent
DELETE /intents/{intent_id} # Delete intent
GET /intents/customer/{customer_id} # List customer intents
POST /query # Process query and return intent match
GET /up # Health check
GET /queue/status # Queue monitoring
GET /models/cache/status # Model cache statistics
POST /models/cache/clear # Clear model cache
# Database Configuration
DB_HOST=intent_postgres
DB_PORT=5432
DB_NAME=mydatabase
DB_USER=user
DB_PASSWORD=password
# Redis Configuration
REDIS_HOST=intent_redis
REDIS_PORT=6379
# OpenAI Configuration
OPENAI_API_KEY=sk-your-api-key-here
# Model Memory Management
MAX_CACHED_MODELS=10 # Maximum models in cache
MAX_MODEL_MEMORY_MB=500 # Memory limit in MB
MODEL_TTL_HOURS=24 # Cache expiration time
# Background Job Configuration
SLEEP_INTERVAL=10 # Queue polling interval
services:
  app:
    environment:
      MAX_CACHED_MODELS: '10'
      MAX_MODEL_MEMORY_MB: '500'
      MODEL_TTL_HOURS: '24'
  background_job:
    environment:
      MAX_CACHED_MODELS: '5'
      MAX_MODEL_MEMORY_MB: '200'
      MODEL_TTL_HOURS: '12'
| Scenario | Response Time | Notes |
|---|---|---|
| Cold Cache | ~6000ms | First query, model training required |
| Warm Cache | ~60-70ms | Model cached in memory |
| Cache Miss | ~100-200ms | Model loaded from disk |
| No Match | ~60ms | Fast similarity computation |

| Component | Memory Usage | Scaling |
|---|---|---|
| Base Application | ~60MB | Fixed overhead |
| Per Model | ~1-2MB | Linear with customers |
| Cache Limit | 500MB (app) | Configurable |
| Background Job | ~60MB | Fixed overhead |

| Metric | Capacity | Bottleneck |
|---|---|---|
| Queries/sec | 50-100 | Model inference |
| Concurrent Users | 100+ | Rate limiting |
| Pattern Generation | 10/min | OpenAI API limits |
| Model Training | 1-2/sec | CPU intensive |
Current: Basic validation (customer_id in requests)
Recommended: JWT tokens, API keys, OAuth2 integration
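As one possible hardening step, a minimal Flask before_request API-key check; the header name and environment-based key store are assumptions, not the current implementation:

```python
import os

from flask import Flask, abort, request

app = Flask(__name__)

# Hypothetical key store; production would use a secrets manager or database.
VALID_API_KEYS = set(filter(None, os.environ.get("API_KEYS", "").split(",")))

@app.before_request
def require_api_key():
    # Leave the health check open so container probes keep working.
    if request.path == "/up":
        return None
    if request.headers.get("X-API-Key") not in VALID_API_KEYS:
        abort(401)
```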
Docker Compose Stack:
├── intent_postgres (PostgreSQL 16)
├── intent_redis (Redis 7 with AOF persistence)
├── intent_app (Python 3.9 Flask application)
└── intent_background_job (Python 3.9 worker)

Health Checks:
├── /up endpoint (application health)
├── /queue/status (background job health)
├── /models/cache/status (memory management)
├── PostgreSQL health check (pg_isready)
└── Redis health check (ping)
Increase MAX_MODEL_MEMORY_MB to keep more models cached.

Chatbots
Escalate complex issues to humans

Voice Assistants
Provide voice-based customer support

FAQ Systems
Improve self-service capabilities

Content Management
Business Metrics:
├── Intent match accuracy (similarity scores)
├── Query volume per customer
├── Most common intents
└── Customer engagement patterns

Technical Metrics:
├── Response time percentiles (p50, p95, p99)
├── Cache hit rates
├── Memory usage trends
├── Error rates and types
└── Background job processing times
alerts:
  high_response_time:
    threshold: "> 500ms p95"
    action: "Check model cache performance"
  low_cache_hit_rate:
    threshold: "< 80%"
    action: "Review TTL settings"
  high_memory_usage:
    threshold: "> 90%"
    action: "Clear cache or increase limits"
  queue_backlog:
    threshold: "> 100 items"
    action: "Scale background workers"
This documentation provides a comprehensive overview of how the Intent Recognition System works.