Intent Recognition System - Complete Documentation

🎯 System Overview

The Intent Recognition System is a multi-tenant, machine learning-powered platform designed to understand and classify user queries into predefined business intents. Built for enterprise use, it provides real-time intent matching with high accuracy, automatic pattern generation, and intelligent memory management.

Core Purpose

Transform natural language queries like "What time do you open?" into structured business intents like store_hours with confidence scores, enabling automated customer service, chatbots, FAQ systems, and voice assistants.


๐Ÿ—๏ธ System Architecture

High-Level Architecture

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Client Apps   │    │   Load Balancer │    │   Monitoring    │
│  (Web, Mobile,  │────│   (Optional)    │────│  (Grafana, etc) │
│   Chatbots)     │    │                 │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                       │                       │
         └──────────────────────┼──────────────────────┘
                                │
                       ┌─────────────────┐
                       │   Flask App     │
                       │  (Port 5000)    │
                       │                 │
                       │ • CRUD API      │
                       │ • Query API     │
                       │ • Cache Mgmt    │
                       │ • Rate Limiting │
                       └─────────────────┘
                                │
                ┌───────────────┼───────────────┐
                │               │               │
         ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
         │ PostgreSQL  │ │   Redis     │ │ Background  │
         │ Database    │ │   Cache     │ │ Job Worker  │
         │             │ │             │ │             │
         │ • Intents   │ │ • Model     │ │ • Pattern   │
         │ • Patterns  │ │   Cache     │ │   Generation│
         │ • Metadata  │ │ • Queue     │ │ • OpenAI    │
         └─────────────┘ └─────────────┘ └─────────────┘

Service Components

1. Flask Application (intent_app)
2. Background Job Worker (intent_background_job)
3. PostgreSQL Database (intent_postgres)
4. Redis Cache (intent_redis)


🔄 How the System Works

1. Intent Lifecycle Management

Intent Creation Flow

1. Client creates intent via POST /intents
   ├── Validates input data (customer_id, intent_key, description, etc.)
   ├── Stores intent in PostgreSQL with empty query_pattern
   ├── Caches intent data in Redis
   ├── Invalidates existing model cache for customer
   └── Queues pattern generation job (see the enqueue sketch below)

2. Background worker processes queue
   ├── Receives intent_id from Redis queue
   ├── Calls OpenAI API to generate query patterns
   ├── Updates database with generated patterns
   └── Triggers model retraining on next query

3. First query triggers model training
   ├── Loads all customer intents from database
   ├── Trains TF-IDF vectorizer + LogisticRegression model
   ├── Caches trained model in memory
   └── Returns query result
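
A minimal sketch of the queueing step in step 1, assuming redis-py; the queue name pattern_generation_queue comes from the queue architecture described later, while the JSON job shape and helper name are illustrative assumptions:

import json

import redis

r = redis.Redis(host="intent_redis", port=6379, decode_responses=True)

def enqueue_pattern_generation(intent_id: str, info_blob: str, pattern_count: int) -> None:
    # Push a job for the background worker to pick up via BRPOP.
    job = {"intent_id": intent_id, "info_blob": info_blob, "pattern_count": pattern_count}
    r.lpush("pattern_generation_queue", json.dumps(job))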

Query Processing Flow

1. Client sends query via POST /query
   ├── Validates query string and parameters
   ├── Checks rate limits
   └── Preprocesses query (tokenization, lemmatization)

2. Model loading/caching (see the sketch below)
   ├── Checks memory cache for customer model
   ├── If cached: Uses in-memory model (fast path ~60ms)
   ├── If not cached: Loads from disk and caches
   └── If no model exists: Trains new model

3. Intent matching
   ├── Vectorizes query using TF-IDF
   ├── Computes cosine similarity with all intent patterns
   ├── Finds best match above similarity threshold
   └── Returns intent details with confidence score

4. Response formatting
   ├── Formats result with intent metadata
   ├── Updates model cache access statistics
   └── Returns JSON response to client
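
A sketch of the loading decision in step 2 (cache hit, disk load, or cold train). The helper names, cache interface, and on-disk path are illustrative assumptions:

import joblib

def get_model(customer_id: str, cache, train_fn):
    # `cache` is the in-memory model cache; `train_fn` rebuilds a model
    # from the customer's intents in PostgreSQL.
    model = cache.get(customer_id)
    if model is not None:
        return model                                         # fast path (~60ms end to end)
    try:
        model = joblib.load(f"models/{customer_id}.joblib")  # warm path (~100-200ms)
    except FileNotFoundError:
        model = train_fn(customer_id)                        # cold path (~6000ms)
    cache.put(customer_id, model)
    return model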

2. Multi-Tenant Architecture

Customer Isolation
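
Every intent, pattern, and trained model is scoped to a customer_id: intents are unique per (customer_id, intent_key), and models are trained and cached per tenant, so one customer's patterns never influence another customer's matches.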

Scaling Strategy

Customer Growth Pattern:
├── 1-10 customers: Single instance, shared resources
├── 10-100 customers: Model caching essential, memory management
├── 100-500 customers: Consider horizontal scaling
└── 500+ customers: Distributed architecture, shared base models

3. Machine Learning Pipeline

Training Process

For each customer:
1. Collect all intent patterns from database
2. Preprocess text:
   ├── Tokenization (NLTK word_tokenize)
   ├── Lowercasing and stopword removal
   ├── Lemmatization (WordNet)
   └── Clean text normalization

3. Feature extraction:
   ├── TF-IDF vectorization (max 1000 features)
   ├── English stopwords filtering
   └── Sparse matrix generation

4. Model training:
   ├── LogisticRegression (max_iter=1000)
   ├── Handle single-class edge case
   └── Cross-validation ready

5. Model persistence:
   ├── Save vectorizer and model to disk (joblib)
   ├── Cache in memory for fast access
   └── Track memory usage and metadata
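
A condensed sketch of this pipeline, assuming the NLTK resources (punkt, stopwords, wordnet) are already downloaded; function names and the models/ path are illustrative:

import joblib
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words("english"))

def preprocess(text: str) -> str:
    # Tokenize, lowercase, drop stopwords/punctuation, lemmatize.
    tokens = word_tokenize(text.lower())
    return " ".join(lemmatizer.lemmatize(t) for t in tokens
                    if t.isalnum() and t not in stop_words)

def train_customer_model(patterns, intent_keys, customer_id: str):
    # patterns[i] is one generated query pattern; intent_keys[i] is its label.
    vectorizer = TfidfVectorizer(max_features=1000, stop_words="english")
    X = vectorizer.fit_transform(preprocess(p) for p in patterns)
    model = LogisticRegression(max_iter=1000)
    if len(set(intent_keys)) > 1:   # single-class edge case: nothing to classify
        model.fit(X, intent_keys)
    joblib.dump((vectorizer, model), f"models/{customer_id}.joblib")
    return vectorizer, model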

Inference Process

For each query:
1. Preprocess query text (same pipeline as training)
2. Transform to TF-IDF vector using trained vectorizer
3. Compute cosine similarity with all intent patterns
4. Find maximum similarity score
5. Return intent if above threshold, else "no match"
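
A sketch of the matching step, reusing the preprocess helper from the training sketch above; the 0.3 threshold is an illustrative assumption:

from sklearn.metrics.pairwise import cosine_similarity

def match_intent(query, vectorizer, pattern_matrix, pattern_intents, threshold=0.3):
    # pattern_matrix: TF-IDF matrix of all stored patterns for this customer;
    # pattern_intents[i] is the intent_key that owns row i.
    vec = vectorizer.transform([preprocess(query)])
    sims = cosine_similarity(vec, pattern_matrix)[0]
    best = sims.argmax()
    if sims[best] >= threshold:
        return {"intent_key": pattern_intents[best], "confidence": float(sims[best])}
    return None  # below threshold: "no match"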

4. Pattern Generation System

OpenAI Integration

Background Job Process:
1. Receive intent from queue (intent_id, info_blob, pattern_count)
2. Generate prompt for OpenAI:
   "Generate N questions showing intent to know things from: {info_blob}"
3. Call GPT-4o-mini API with retry logic
4. Parse response into individual patterns
5. Update database with generated patterns
6. Handle failures with exponential backoff
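
A hedged sketch of steps 2-4 using the openai Python SDK; the prompt wording follows the documentation, while the retry parameters and line-based parsing are assumptions:

import time

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_patterns(info_blob: str, pattern_count: int, attempts: int = 3):
    prompt = (f"Generate {pattern_count} questions showing intent "
              f"to know things from: {info_blob}")
    for attempt in range(attempts):
        try:
            resp = client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[{"role": "user", "content": prompt}],
            )
            lines = resp.choices[0].message.content.splitlines()
            # Strip list markers ("-", "1.") and blank lines from the response.
            return [l.lstrip("-0123456789. ").strip() for l in lines if l.strip()]
        except Exception:
            if attempt == attempts - 1:
                raise                   # caller moves the job to the failed queue
            time.sleep(2 ** attempt)    # exponential backoff: 1s, 2s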

Queue Management

Redis Queue Architecture:
├── pattern_generation_queue: New jobs
├── pattern_generation_failed: Failed jobs
├── BRPOP blocking: Real-time processing
├── Retry logic: 3 attempts with backoff
└── Fallback polling: Every 5 minutes
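
A worker-loop sketch matching this architecture; the queue names and the three-attempt retry come from the documentation, while process_job and the job payload shape are hypothetical:

import json

import redis

r = redis.Redis(host="intent_redis", port=6379, decode_responses=True)

def worker_loop(process_job):
    while True:
        item = r.brpop("pattern_generation_queue", timeout=300)  # 5-minute fallback poll
        if item is None:
            continue                    # timed out; loop back and poll again
        _, raw = item
        job = json.loads(raw)
        try:
            process_job(job)            # generate patterns, update PostgreSQL
        except Exception:
            job["attempts"] = job.get("attempts", 0) + 1
            if job["attempts"] < 3:
                r.lpush("pattern_generation_queue", json.dumps(job))   # retry
            else:
                r.lpush("pattern_generation_failed", json.dumps(job))  # dead-letter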

5. Memory Management System

LRU Cache Implementation

ModelMemoryManager:
├── OrderedDict for LRU ordering
├── Memory estimation per model
├── Automatic eviction when limits reached
├── TTL-based expiration (24 hours default)
├── Background cleanup thread (every 5 minutes)
└── Thread-safe operations with RLock
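
A minimal sketch of the eviction, TTL, and locking behavior, assuming a simple count-based limit; the real manager also estimates per-model memory and runs a background cleanup thread:

import threading
import time
from collections import OrderedDict

class ModelMemoryManager:
    def __init__(self, max_models: int = 10, ttl_hours: int = 24):
        self._models = OrderedDict()    # customer_id -> (model, cached_at)
        self._lock = threading.RLock()  # thread-safe access from request handlers
        self.max_models = max_models
        self.ttl_seconds = ttl_hours * 3600

    def get(self, customer_id):
        with self._lock:
            entry = self._models.get(customer_id)
            if entry is None:
                return None
            model, cached_at = entry
            if time.time() - cached_at > self.ttl_seconds:
                del self._models[customer_id]       # TTL expired: drop stale model
                return None
            self._models.move_to_end(customer_id)   # mark most-recently-used
            return model

    def put(self, customer_id, model):
        with self._lock:
            self._models[customer_id] = (model, time.time())
            self._models.move_to_end(customer_id)
            while len(self._models) > self.max_models:
                self._models.popitem(last=False)    # evict least-recently-used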

Cache Lifecycle

Model Caching Flow:
1. Model trained → Estimate memory usage → Cache if space available
2. Model accessed → Move to end of LRU → Update access count
3. Memory pressure → Evict LRU models → Force garbage collection
4. TTL expired → Remove stale models → Free memory
5. Intent updated → Invalidate cache → Force retrain on next query

🔧 Technical Implementation

Database Schema

CREATE TABLE intents (
    intent_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    customer_id UUID NOT NULL,
    intent_key VARCHAR(255) NOT NULL,
    query_pattern TEXT[] DEFAULT '{}',
    description TEXT NOT NULL,
    info_blob TEXT NOT NULL,
    tags TEXT[] DEFAULT '{}',
    intent_query_pattern_count INTEGER DEFAULT 0,
    process_time TIMESTAMP,
    processing_error TEXT,
    last_processing_time TIMESTAMP,
    processing_attempts INTEGER DEFAULT 0,
    UNIQUE(customer_id, intent_key)
);

CREATE INDEX idx_intents_customer_id ON intents(customer_id);
CREATE INDEX idx_intents_tags ON intents USING GIN(tags);

API Endpoints

Intent Management

POST   /intents              # Create new intent
GET    /intents/{intent_id}  # Get intent details
PUT    /intents/{intent_id}  # Update intent
DELETE /intents/{intent_id}  # Delete intent
GET    /intents/customer/{customer_id}  # List customer intents

Query Processing

POST   /query               # Process query and return intent match
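
An illustrative client call; the request body fields and response shape are assumptions based on the system description:

import requests

resp = requests.post(
    "http://localhost:5000/query",
    json={
        "customer_id": "11111111-1111-1111-1111-111111111111",  # example UUID
        "query": "What time do you open?",
    },
    timeout=10,
)
print(resp.json())
# Illustrative response shape: {"intent_key": "store_hours", "confidence": 0.87, ...}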

System Management

GET    /up                  # Health check
GET    /queue/status        # Queue monitoring
GET    /models/cache/status # Model cache statistics
POST   /models/cache/clear  # Clear model cache

Configuration Management

Environment Variables

# Database Configuration
DB_HOST=intent_postgres
DB_PORT=5432
DB_NAME=mydatabase
DB_USER=user
DB_PASSWORD=password

# Redis Configuration
REDIS_HOST=intent_redis
REDIS_PORT=6379

# OpenAI Configuration
OPENAI_API_KEY=sk-your-api-key-here

# Model Memory Management
MAX_CACHED_MODELS=10        # Maximum models in cache
MAX_MODEL_MEMORY_MB=500     # Memory limit in MB
MODEL_TTL_HOURS=24          # Cache expiration time

# Background Job Configuration
SLEEP_INTERVAL=10           # Queue polling interval
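
A sketch of how a service might read these settings, with defaults matching the documented values:

import os

MAX_CACHED_MODELS   = int(os.getenv("MAX_CACHED_MODELS", "10"))
MAX_MODEL_MEMORY_MB = int(os.getenv("MAX_MODEL_MEMORY_MB", "500"))
MODEL_TTL_HOURS     = int(os.getenv("MODEL_TTL_HOURS", "24"))
SLEEP_INTERVAL      = int(os.getenv("SLEEP_INTERVAL", "10"))  # seconds between polls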

Docker Compose Services

services:
  app:
    environment:
      MAX_CACHED_MODELS: '10'
      MAX_MODEL_MEMORY_MB: '500'
      MODEL_TTL_HOURS: '24'

  background_job:
    environment:
      MAX_CACHED_MODELS: '5'
      MAX_MODEL_MEMORY_MB: '200'
      MODEL_TTL_HOURS: '12'

📊 Performance Characteristics

Response Times

Scenario     Response Time   Notes
Cold Cache   ~6000ms         First query, model training required
Warm Cache   ~60-70ms        Model cached in memory
Cache Miss   ~100-200ms      Model loaded from disk
No Match     ~60ms           Fast similarity computation

Memory Usage

Component          Memory Usage   Scaling
Base Application   ~60MB          Fixed overhead
Per Model          ~1-2MB         Linear with customers
Cache Limit        500MB (app)    Configurable
Background Job     ~60MB          Fixed overhead

Throughput

Metric               Capacity   Bottleneck
Queries/sec          50-100     Model inference
Concurrent Users     100+       Rate limiting
Pattern Generation   10/min     OpenAI API limits
Model Training       1-2/sec    CPU intensive

🔒 Security & Compliance

Data Protection

Authentication & Authorization

Current: Basic validation (customer_id in requests)
Recommended: JWT tokens, API keys, OAuth2 integration

Compliance Features


🚀 Deployment & Operations

Container Architecture

Docker Compose Stack:
├── intent_postgres (PostgreSQL 16)
├── intent_redis (Redis 7 with AOF persistence)
├── intent_app (Python 3.9 Flask application)
└── intent_background_job (Python 3.9 worker)

Health Monitoring

Health Checks:
├── /up endpoint (application health)
├── /queue/status (background job health)
├── /models/cache/status (memory management)
├── PostgreSQL health check (pg_isready)
└── Redis health check (ping)
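
A minimal sketch of the /up endpoint wiring, assuming Flask as documented; real dependency checks (PostgreSQL, Redis) would extend it:

from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/up")
def up():
    # Returns 200 while the process is alive; extend with DB/Redis pings as needed.
    return jsonify({"status": "ok"}), 200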

Scaling Strategies

Vertical Scaling

Horizontal Scaling


🎯 Use Cases & Applications

Primary Use Cases

1. Customer Service Chatbots
   ├── Route queries to appropriate departments
   ├── Provide instant FAQ responses
   └── Escalate complex issues to humans

2. Voice Assistants
   ├── Understand spoken commands
   ├── Execute business-specific actions
   └── Provide voice-based customer support

3. FAQ Systems
   ├── Automatically categorize user questions
   ├── Suggest relevant documentation
   └── Improve self-service capabilities

4. Content Management
   ├── Tag and categorize user-generated content
   ├── Route support tickets automatically
   └── Analyze customer feedback themes

Industry Applications


📈 Monitoring & Analytics

Key Metrics

Business Metrics:
├── Intent match accuracy (similarity scores)
├── Query volume per customer
├── Most common intents
└── Customer engagement patterns

Technical Metrics:
├── Response time percentiles (p50, p95, p99)
├── Cache hit rates
├── Memory usage trends
├── Error rates and types
└── Background job processing times

Alerting Thresholds

alerts:
  high_response_time:
    threshold: "> 500ms p95"
    action: "Check model cache performance"

  low_cache_hit_rate:
    threshold: "< 80%"
    action: "Review TTL settings"

  high_memory_usage:
    threshold: "> 90%"
    action: "Clear cache or increase limits"

  queue_backlog:
    threshold: "> 100 items"
    action: "Scale background workers"

🔮 Future Roadmap

Short Term (1-3 months)

Medium Term (3-6 months)

Long Term (6+ months)


This documentation provides a comprehensive overview of how the Intent Recognition System works.