Local Search System

The SignalWire Agents SDK includes a local search system that provides DataSphere-compatible search without relying on external services. It combines query preprocessing, local embeddings, and hybrid search so that agents can search document collections offline.

Overview

The local search system provides:

Architecture

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Documents     │───▶│   Index Builder  │───▶│  .swsearch DB   │
│ (MD, PDF, etc.) │    │                  │    │                 │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                                         │
                                                         ▼
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│     Agent       │───▶│  Search Skill    │───▶│  Search Engine  │
│                 │    │                  │    │                 │
└─────────────────┘    └──────────────────┘    └─────────────────┘

Installation Options

The search system uses optional dependencies to keep the base SDK lightweight. Choose the installation option that fits your needs:

Basic Search (~500MB)

pip install signalwire-agents[search]

Includes:

Full Document Processing (~600MB)

pip install signalwire-agents[search-full]

Adds:

Advanced NLP Features (~700MB)

pip install signalwire-agents[search-nlp]

Adds:

⚠️ Additional Setup Required:

python -m spacy download en_core_web_sm

Performance Note: Advanced NLP features provide significantly better query understanding, synonym expansion, and search relevance, but are 2-3x slower than basic search. They are recommended only if you have sufficient CPU power and can tolerate longer response times.

NLP Backend Control: You can choose which NLP backend to use:
- NLTK (default): Fast processing, good for most use cases
- spaCy: Better quality but slower, requires model download

Configure via the nlp_backend parameter in your search skill.

All Search Features (~700MB)

pip install signalwire-agents[search-all]

Includes everything above

⚠️ Additional Setup Required:

python -m spacy download en_core_web_sm

Performance Note: This includes advanced NLP features which improve search quality but increase response times.

Minimal Installation (Base SDK only)

pip install signalwire-agents

Search functionality will show helpful error messages when dependencies are missing.
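A check of this kind can be approximated with the standard library. The following is a minimal sketch, not the SDK's actual implementation: the helper name `find_missing_modules` and the module names in the usage part are assumptions chosen for illustration.

```python
import importlib.util


def find_missing_modules(module_names):
    """Return the names from module_names that cannot be found for import."""
    missing = []
    for name in module_names:
        # find_spec returns None when a top-level module is not installed
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hypothetical list of modules the search extras might provide
    missing = find_missing_modules(["numpy", "sentence_transformers"])
    if missing:
        print("Search dependencies missing: " + ", ".join(missing))
        print("Try: pip install signalwire-agents[search]")
    else:
        print("Search dependencies look available")
```

Because `find_spec` only locates a module without importing it, the check stays cheap even when the optional packages are large.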

Quick Start

1. Install Dependencies

pip install signalwire-agents[search-full]

2. Build a Search Index

# Build from the comprehensive concepts guide
sw-search docs/signalwire_agents_concepts_guide.md --output concepts.swsearch

# Build from multiple individual files
sw-search README.md docs/agent_guide.md docs/architecture.md --output knowledge.swsearch

# Build from mixed sources (files and directories)
sw-search docs/signalwire_agents_concepts_guide.md examples --file-types md,py --output comprehensive.swsearch

# Build from a directory (traditional approach)
sw-search docs --output docs.swsearch

# Include specific file types
sw-search docs --file-types md,txt,py

# Exclude patterns
sw-search docs --exclude "**/test/**,**/__pycache__/**"

3. Use in Your Agent

from signalwire_agents import AgentBase

class MyAgent(AgentBase):
    def __init__(self):
        super().__init__()

        # Add search capability using the concepts guide
        self.add_skill("native_vector_search", {
            "tool_name": "search_docs",
            "description": "Search the comprehensive SDK concepts guide for information",
            "index_file": "concepts.swsearch",
            "count": 5
        })

agent = MyAgent()
agent.serve()

Ask your agent: "How do I create a new agent?" and it will search the comprehensive concepts guide to provide detailed answers.

Building Search Indexes

Search indexes are SQLite databases with the .swsearch extension that contain processed documents, embeddings, and search metadata.
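Since an index is an ordinary SQLite file, it can be inspected with Python's built-in `sqlite3` module. The sketch below lists only the table names it finds and makes no assumption about the schema. The helper name `list_tables` is hypothetical.

```python
import os
import sqlite3


def list_tables(index_path):
    """Return the table names stored in a SQLite file, sorted alphabetically."""
    # sqlite3.connect would silently create a missing file, so check first
    if not os.path.exists(index_path):
        raise FileNotFoundError(index_path)
    conn = sqlite3.connect(index_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

For example, `list_tables("concepts.swsearch")` would return whatever tables the index builder created in that file.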

Basic Index Building

# Build index from the comprehensive concepts guide
sw-search docs/signalwire_agents_concepts_guide.md --output concepts.swsearch

# Build from multiple individual files
sw-search README.md docs/agent_guide.md docs/architecture.md --output knowledge.swsearch

# Build from mixed sources (files and directories)
sw-search docs/signalwire_agents_concepts_guide.md examples --file-types md,py --output comprehensive.swsearch

# Build from a directory (traditional approach)
sw-search docs --output docs.swsearch

# Include specific file types
sw-search docs --file-types md,txt,py

# Exclude patterns
sw-search docs --exclude "**/test/**,**/__pycache__/**"
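To get a feel for how comma-separated exclude patterns like these select files, here is a small sketch built on the standard `fnmatch` module. It is an approximation under stated assumptions: the SDK's real matching rules may differ, and `is_excluded` is a hypothetical helper. With `fnmatch`, for instance, a pattern such as `**/test/**` only matches when `test` sits below at least one parent directory.

```python
from fnmatch import fnmatchcase


def is_excluded(path, patterns):
    """Return True if path matches any of the glob-style exclude patterns."""
    # Normalise Windows separators so patterns written with "/" still apply
    normalized = path.replace("\\", "/")
    return any(fnmatchcase(normalized, pattern) for pattern in patterns)


# Same comma-separated form as the --exclude option above
exclude = "**/test/**,**/__pycache__/**".split(",")
for candidate in ["docs/guide.md", "docs/test/fixtures.md", "src/__pycache__/mod.pyc"]:
    print(candidate, "excluded" if is_excluded(candidate, exclude) else "kept")
```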

Advanced Index Building

# Full configuration example with multiple sources
sw-search docs/signalwire_agents_concepts_guide.md ./examples README.md \
    --output ./knowledge.swsearch \
    --chunking-strategy sentence \
    --max-sentences-per-chunk 8 \
    --file-types md,txt,rst,py \
    --exclude "**/test/**,**/__pycache__/**" \
    --model sentence-transformers/all-mpnet-base-v2 \
    --tags documentation,api \
    --verbose
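The idea behind `--chunking-strategy sentence` with `--max-sentences-per-chunk 8` can be illustrated as follows. This is a rough sketch, not the index builder's code: it assumes a naive sentence boundary rule (terminal punctuation followed by whitespace), whereas the real builder may use an NLP tokenizer. The function name `chunk_by_sentences` is hypothetical.

```python
import re


def chunk_by_sentences(text, max_sentences_per_chunk=8):
    """Split text into chunks of at most max_sentences_per_chunk sentences."""
    if max_sentences_per_chunk < 1:
        raise ValueError("max_sentences_per_chunk must be at least 1")
    # Naive boundary rule: split after ., ! or ? when followed by whitespace
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [
        " ".join(sentences[i:i + max_sentences_per_chunk])
        for i in range(0, len(sentences), max_sentences_per_chunk)
    ]
```

Smaller values give more, tighter chunks. Larger values keep more context in each chunk at the cost of less precise matches.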

Supported File Types

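| Format           | Extension | Requirements |
|------------------|-----------|--------------|
| Markdown         | .md       | Built-in     |
| Text             | .txt      | Built-in     |
| Python           | .py       | Built-in     |
| reStructuredText | .rst      | Built-in     |
| PDF              | .pdf      | search-full  |
| Word Documents   | .docx     | search-full  |
| HTML             | .html     | search-full  |
| JSON             | .json     | Built-in     |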

Index Structure

Each .swsearch file contains:

Using the Search Skill

The native_vector_search skill provides search functionality to your agents.

Basic Configuration

self.add_skill("native_vector_search", {
    "tool_name": "search_knowledge",
    "description": "Search the knowledge base",
    "index_file": "knowledge.swsearch"
})

Advanced Configuration

NLP Backend Selection

Choose between NLTK (fast) and spaCy (better quality) for query processing:

# Fast NLTK processing (default)
self.add_skill("native_vector_search", {
    "tool_name": "search_docs",
    "index_file": "docs.swsearch",
    "nlp_backend": "nltk"  # Fast, good for most use cases
})

# Better quality spaCy processing
self.add_skill("native_vector_search", {
    "tool_name": "search_docs", 
    "index_file": "docs.swsearch",
    "nlp_backend": "spacy"  # Slower but better quality, requires model download
})

Performance Comparison:

Custom Embedding Models

# Use a different embedding model
self.add_skill("native_vector_search", {
    "tool_name": "search_docs",
    "index_file": "docs.swsearch",
    "model": "sentence-transformers/all-MiniLM-L6-v2"  # Smaller, faster model
})

Query Enhancement

The system automatically enhances queries using:
- Language detection
- POS tagging (with NLP dependencies)
- Synonym expansion using WordNet
- Keyword extraction
- Vector embeddings
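Two of these steps, keyword extraction and synonym expansion, can be illustrated with a toy example. This is only a sketch. The stopword set and synonym table below are tiny hand-written stand-ins (assumptions) for the NLTK, spaCy, and WordNet resources the real system uses, and `enhance_query` is a hypothetical name. Language detection, POS tagging, and embeddings are left out.

```python
import re

# Tiny illustrative tables; a real system would use NLTK or spaCy resources
STOPWORDS = {"a", "an", "the", "how", "do", "i", "to", "is", "of", "in"}
SYNONYMS = {"create": ["build", "make"], "agent": ["bot"]}


def enhance_query(query):
    """Return extracted keywords and a synonym-expanded text for a raw query."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    keywords = [t for t in tokens if t not in STOPWORDS]
    expanded = list(keywords)
    for word in keywords:
        for synonym in SYNONYMS.get(word, []):
            if synonym not in expanded:
                expanded.append(synonym)
    return {"keywords": keywords, "enhanced_text": " ".join(expanded)}


print(enhance_query("How do I create a new agent?"))
```

The expanded text gives keyword matching more terms to hit, which helps when the documents use different wording than the caller.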

Response Customization

self.add_skill("native_vector_search", {
    "tool_name": "search_docs",
    "index_file": "docs.swsearch",

    # Customize responses for voice calls
    "response_prefix": "Based on the documentation, here's what I found:",
    "response_postfix": "Would you like me to search for more specific information?",

    # Custom no-results message
    "no_results_message": "I couldn't find information about '{query}'. Try rephrasing your question.",

    # SWAIG function fillers for natural conversation
    "swaig_fields": {
        "fillers": {
            "en-US": [
                "Let me search the documentation",
                "Checking our knowledge base",
                "Looking that up for you"
            ]
        }
    }
})

Tag-Based Filtering

# Only search documents tagged with specific categories
self.add_skill("native_vector_search", {
    "tool_name": "search_api_docs",
    "index_file": "docs.swsearch", 
    "tags": ["api", "reference"],  # Only search API docs
    "description": "Search API reference documentation"
})

Complete Configuration Example

self.add_skill("native_vector_search", {
    # Tool configuration
    "tool_name": "search_docs",
    "description": "Search SDK documentation for detailed information",

    # Index configuration
    "index_file": "docs.swsearch",
    "build_index": True,  # Auto-build if missing
    "source_dir": "./docs",  # Source for auto-build
    "file_types": ["md", "txt"],

    # Search parameters
    "count": 5,  # Number of results
    "distance_threshold": 0.1,  # Similarity threshold
    "tags": ["documentation"],  # Filter by tags

    # NLP backend selection
    "nlp_backend": "nltk",  # or "spacy" for better quality

    # Response formatting
    "response_prefix": "Based on the documentation:",
    "response_postfix": "Would you like more details?",
    "no_results_message": "No information found for '{query}'",

    # SWAIG configuration
    "swaig_fields": {
        "fillers": {
            "en-US": ["Let me search for that", "Checking the docs"]
        }
    }
})
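To make the `count` and `distance_threshold` parameters more concrete, the sketch below ranks candidate chunks by cosine similarity, discards those under the threshold, and keeps the top `count`. It assumes the threshold acts as a minimum similarity score and uses plain Python lists as vectors. The helper names are hypothetical, and the engine's real scoring (which also involves keyword matching) is more elaborate.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_matches(query_vector, chunks, count=5, distance_threshold=0.1):
    """Rank (text, vector) pairs by similarity, drop weak ones, keep the top count."""
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in chunks]
    kept = [item for item in scored if item[0] >= distance_threshold]
    kept.sort(key=lambda item: item[0], reverse=True)
    return kept[:count]


chunks = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
for score, text in top_matches([1.0, 0.0], chunks):
    print(f"{text}: {score:.3f}")
```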

Multiple Search Instances

You can add multiple search instances for different document collections:

# Documentation search with spaCy for better quality
self.add_skill("native_vector_search", {
    "tool_name": "search_docs",
    "index_file": "docs.swsearch",
    "nlp_backend": "spacy",
    "description": "Search SDK documentation"
})

# Code examples search with NLTK for speed
self.add_skill("native_vector_search", {
    "tool_name": "search_examples", 
    "index_file": "examples.swsearch",
    "nlp_backend": "nltk",
    "description": "Search code examples"
})

Local vs Remote Modes

The search skill supports both local and remote operation modes.

Local Mode (Default)

Pros:

Cons:

Configuration:

self.add_skill("native_vector_search", {
    "tool_name": "search_docs",
    "index_file": "docs.swsearch",  # Local file
    "nlp_backend": "nltk"  # Choose NLP backend
})

Remote Mode

Pros:

Cons:

Configuration:

self.add_skill("native_vector_search", {
    "tool_name": "search_docs",
    "remote_url": "http://localhost:8001",  # Search server
    "index_name": "docs",  # Index name on server
    "nlp_backend": "nltk"  # NLP backend for query preprocessing
})

Running a Remote Search Server

  1. Start the search server:

     python examples/search_server_standalone.py

  2. The server provides an HTTP API:
     - POST /search - Search the indexes
     - GET /health - Health check and available indexes
     - POST /reload_index - Add or reload an index

  3. Test the API:

curl -X POST "http://localhost:8001/search" \
     -H "Content-Type: application/json" \
     -d '{"query": "how to create an agent", "index_name": "docs", "count": 3}'
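The same request can be sent from Python using only the standard library. This sketch mirrors the curl call above (endpoint, headers, and JSON fields). The function names are hypothetical, and because the response format is not described here, the response is returned as decoded JSON without further interpretation.

```python
import json
import urllib.request


def build_search_request(base_url, query, index_name, count=3):
    """Build a POST request for the /search endpoint of a search server."""
    payload = {"query": query, "index_name": index_name, "count": count}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(base_url, query, index_name, count=3, timeout=10):
    """Send the request and return the decoded JSON response."""
    request = build_search_request(base_url, query, index_name, count)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `search("http://localhost:8001", "how to create an agent", "docs")` performs the same query as the curl example.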

Automatic Mode Detection

The skill automatically detects which mode to use:
- If remote_url is provided → Remote mode
- If index_file is provided → Local mode
- Remote mode takes priority if both are specified
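The detection rules can be written as a few lines of Python. This is a sketch of the stated behaviour rather than the skill's source. The function name `detect_mode` is hypothetical, and raising an error when neither key is set is an assumption.

```python
def detect_mode(config):
    """Return "remote" or "local" based on which keys the skill config sets."""
    if config.get("remote_url"):
        return "remote"  # takes priority even if index_file is also set
    if config.get("index_file"):
        return "local"
    # Assumption: a config with neither key is treated as an error
    raise ValueError("Set either 'remote_url' or 'index_file'")


print(detect_mode({"index_file": "docs.swsearch"}))
print(detect_mode({"remote_url": "http://localhost:8001", "index_file": "docs.swsearch"}))
```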

CLI Reference

sw-search Command

sw-search <source_dir> [options]

Arguments:

Options:

Subcommands:

validate - Validate Search Index

sw-search validate <index_file> [--verbose]

Validates an existing .swsearch index file and shows statistics.

search - Search Within Index

sw-search search <index_file> <query> [options]

Search within an existing .swsearch index file. This is useful for:
- Testing search quality and relevance
- Exploring index contents
- Debugging search results
- Scripting and automation

Search Options:

Examples:

# Build from the comprehensive concepts guide
sw-search docs/signalwire_agents_concepts_guide.md --output concepts.swsearch

# Build from multiple sources (files and directories)
sw-search docs/signalwire_agents_concepts_guide.md examples README.md \
    --output comprehensive.swsearch \
    --file-types md,py,txt \
    --verbose

# Traditional directory-based approach
sw-search ./documentation \
    --output knowledge.swsearch \
    --chunking-strategy sentence \
    --max-sentences-per-chunk 8 \
    --file-types md,rst,txt \
    --exclude "**/drafts/**" \
    --tags documentation,help \
    --verbose

# Validate an existing index
sw-search validate concepts.swsearch --verbose

# Search within an index
sw-search search concepts.swsearch "how to create an agent"
sw-search search concepts.swsearch "API reference" --count 3 --verbose
sw-search search concepts.swsearch "configuration" --tags documentation --json

# Use different NLP backends
sw-search search concepts.swsearch "deployment options" --nlp-backend nltk  # Fast
sw-search search concepts.swsearch "deployment options" --nlp-backend spacy  # Better quality

# Advanced search with filtering
sw-search search concepts.swsearch "deployment options" \
    --count 10 \
    --distance-threshold 0.1 \
    --tags "deployment,production" \
    --nlp-backend spacy \
    --verbose

# JSON output for scripting
sw-search search concepts.swsearch "error handling" --json | jq '.results[0].content'

# Build multiple specialized indexes
sw-search docs/signalwire_agents_concepts_guide.md --output concepts.swsearch
sw-search examples --output examples.swsearch --file-types py,md
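For scripting from Python instead of jq, the `--json` output can be parsed directly. The sketch below relies only on the structure implied by the jq expression above (a top-level `results` list whose items have a `content` field) and assumes the command prints nothing but JSON to stdout. Both function names are hypothetical.

```python
import json
import subprocess


def first_result_content(json_text):
    """Return the content of the first result, or None if there are no results."""
    results = json.loads(json_text).get("results", [])
    return results[0].get("content") if results else None


def search_index(index_file, query):
    """Run sw-search with --json and return the first result's content."""
    completed = subprocess.run(
        ["sw-search", "search", index_file, query, "--json"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return first_result_content(completed.stdout)
```

Calling `search_index("concepts.swsearch", "error handling")` corresponds to the jq pipeline shown in the examples.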

Index Validation

# Validate an existing index
python -c "
from signalwire_agents.search import SearchEngine
engine = SearchEngine('docs.swsearch')
print(f'Index stats: {engine.get_stats()}')
"

API Reference

SearchEngine Class

from signalwire_agents.search import SearchEngine

# Load an index
engine = SearchEngine("docs.swsearch")

# Perform search
results = engine.search(
    query_vector=[...],  # Optional: pre-computed query vector
    enhanced_text="search query",  # Enhanced query text
    count=5,  # Number of results
    distance_threshold=0.0,  # Minimum similarity score
    tags=["documentation"]  # Filter by tags
)

# Get index statistics
stats = engine.get_stats()
print(f"Total chunks: {stats['total_chunks']}")
print(f"Total files: {stats['total_files']}")

IndexBuilder Class

from signalwire_agents.search import IndexBuilder

# Create index builder
builder = IndexBuilder(
    model_name="sentence-transformers/all-mpnet-base-v2",
    chunk_size=500,
    chunk_overlap=50,
    verbose=True
)

# Build index
builder.build_index(
    source_dir="./docs",
    output_file="docs.swsearch",
    file_types=["md", "txt"],
    exclude_patterns=["**/test/**"],
    tags=["documentation"]
)
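The `chunk_size` and `chunk_overlap` parameters suggest a sliding-window split in which consecutive chunks share some content. The sketch below shows that general technique under a stated assumption: sizes are counted in whitespace-separated words, which may not be the unit `IndexBuilder` actually uses. The function name `chunk_words` is hypothetical.

```python
def chunk_words(text, chunk_size=500, chunk_overlap=50):
    """Split text into word windows of chunk_size that overlap by chunk_overlap."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in the range [0, chunk_size)")
    words = text.split()
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks
```

Overlap keeps a sentence that straddles a chunk boundary retrievable from either side, at the cost of storing some text twice.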