The SignalWire Agents SDK includes a powerful local search system that provides DataSphere-compatible search functionality without external dependencies. This system uses advanced query preprocessing, local embeddings, and hybrid search techniques to enable agents to search through document collections offline.
The local search system provides offline document search backed by portable `.swsearch` index files:

```
┌──────────────────┐     ┌──────────────────┐     ┌──────────────────┐
│    Documents     │────▶│  Index Builder   │────▶│   .swsearch DB   │
│  (MD, PDF, etc.) │     │                  │     │                  │
└──────────────────┘     └──────────────────┘     └──────────────────┘
                                                           │
                                                           ▼
┌──────────────────┐     ┌──────────────────┐     ┌──────────────────┐
│      Agent       │────▶│   Search Skill   │────▶│  Search Engine   │
│                  │     │                  │     │                  │
└──────────────────┘     └──────────────────┘     └──────────────────┘
```
The search system uses optional dependencies to keep the base SDK lightweight. Choose the installation option that fits your needs:
pip install signalwire-agents[search]

Includes the core search functionality: local embeddings, hybrid vector/keyword search, and NLTK-based query preprocessing.
pip install signalwire-agents[search-full]

Adds document processing for additional formats such as PDF, DOCX, and HTML (see the supported formats table below).
pip install signalwire-agents[search-nlp]

Adds advanced NLP features via spaCy for improved query understanding.
⚠️ Additional Setup Required:
python -m spacy download en_core_web_sm
Performance Note: Advanced NLP features provide significantly better query understanding, synonym expansion, and search relevance, but are 2-3x slower than basic search. Only recommended if you have sufficient CPU power and can tolerate longer response times.
NLP Backend Control: You can choose which NLP backend to use:

- NLTK (default): fast processing, good for most use cases
- spaCy: better quality but slower, requires a model download

Configure via the `nlp_backend` parameter in your search skill.
pip install signalwire-agents[search-all]
Includes everything above
⚠️ Additional Setup Required:
python -m spacy download en_core_web_sm
Performance Note: This includes advanced NLP features which improve search quality but increase response times.
pip install signalwire-agents
Search functionality will show helpful error messages when dependencies are missing.
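To check programmatically whether the optional dependencies are present, you can probe for the sentence-transformers package that powers the local embeddings. This is a minimal sketch; the exact packages each extra installs are an assumption based on the embedding models named in this guide:

```python
import importlib.util

# The [search] extra pulls in sentence-transformers for local embeddings;
# without it, local search cannot run.
if importlib.util.find_spec("sentence_transformers") is None:
    print("Search dependencies missing - install signalwire-agents[search]")
else:
    print("Search dependencies available")
```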
To get started quickly, install full search support and build an index from your documents:

pip install signalwire-agents[search-full]
# Build from the comprehensive concepts guide
sw-search docs/signalwire_agents_concepts_guide.md --output concepts.swsearch
# Build from multiple individual files
sw-search README.md docs/agent_guide.md docs/architecture.md --output knowledge.swsearch
# Build from mixed sources (files and directories)
sw-search docs/signalwire_agents_concepts_guide.md examples --file-types md,py --output comprehensive.swsearch
# Build from a directory (traditional approach)
sw-search docs --output docs.swsearch
# Include specific file types
sw-search docs --file-types md,txt,py
# Exclude patterns
sw-search docs --exclude "**/test/**,**/__pycache__/**"
from signalwire_agents import AgentBase
class MyAgent(AgentBase):
def __init__(self):
super().__init__()
# Add search capability using the concepts guide
self.add_skill("native_vector_search", {
"tool_name": "search_docs",
"description": "Search the comprehensive SDK concepts guide for information",
"index_file": "concepts.swsearch",
"count": 5
})
agent = MyAgent()
agent.serve()
Ask your agent: "How do I create a new agent?" and it will search the comprehensive concepts guide to provide detailed answers.
Search indexes are SQLite databases with the `.swsearch` extension that contain processed documents, embeddings, and search metadata.
# Build index from the comprehensive concepts guide
sw-search docs/signalwire_agents_concepts_guide.md --output concepts.swsearch
# Build from multiple individual files
sw-search README.md docs/agent_guide.md docs/architecture.md --output knowledge.swsearch
# Build from mixed sources (files and directories)
sw-search docs/signalwire_agents_concepts_guide.md examples --file-types md,py --output comprehensive.swsearch
# Build from a directory (traditional approach)
sw-search docs --output docs.swsearch
# Include specific file types
sw-search docs --file-types md,txt,py
# Exclude patterns
sw-search docs --exclude "**/test/**,**/__pycache__/**"
# Full configuration example with multiple sources
sw-search docs/signalwire_agents_concepts_guide.md ./examples README.md \
--output ./knowledge.swsearch \
--chunking-strategy sentence \
--max-sentences-per-chunk 8 \
--file-types md,txt,rst,py \
--exclude "**/test/**,**/__pycache__/**" \
--model sentence-transformers/all-mpnet-base-v2 \
--tags documentation,api \
--verbose
| Format | Extension | Requirements |
|---|---|---|
| Markdown | `.md` | Built-in |
| Text | `.txt` | Built-in |
| Python | `.py` | Built-in |
| reStructuredText | `.rst` | Built-in |
| PDF | `.pdf` | `search-full` |
| Word Documents | `.docx` | `search-full` |
| HTML | `.html` | `search-full` |
| JSON | `.json` | Built-in |
Each `.swsearch` file contains the processed document chunks, their vector embeddings, and the search metadata needed for offline queries.
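Because the index is a standard SQLite file, you can peek inside it directly. The internal table layout is not part of the public API, so treat this as a debugging sketch and use `sw-search validate` for a supported view:

```python
import sqlite3

# A .swsearch index is a regular SQLite database, so it opens directly.
conn = sqlite3.connect("docs.swsearch")
tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'"
).fetchall()
print("Tables in index:", [t[0] for t in tables])
conn.close()
```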
The `native_vector_search` skill provides search functionality to your agents.
self.add_skill("native_vector_search", {
"tool_name": "search_knowledge",
"description": "Search the knowledge base",
"index_file": "knowledge.swsearch"
})
Choose between NLTK (fast) and spaCy (better quality) for query processing:
# Fast NLTK processing (default)
self.add_skill("native_vector_search", {
"tool_name": "search_docs",
"index_file": "docs.swsearch",
"nlp_backend": "nltk" # Fast, good for most use cases
})
# Better quality spaCy processing
self.add_skill("native_vector_search", {
"tool_name": "search_docs",
"index_file": "docs.swsearch",
"nlp_backend": "spacy" # Slower but better quality, requires model download
})
Performance Comparison: NLTK processes queries quickly and suits most use cases, while spaCy gives noticeably better query understanding at roughly 2-3x the processing time (plus a one-time model download).
# Use a different embedding model
self.add_skill("native_vector_search", {
"tool_name": "search_docs",
"index_file": "docs.swsearch",
"model": "sentence-transformers/all-MiniLM-L6-v2" # Smaller, faster model
})
The system automatically enhances queries using:

- Language detection
- POS tagging (with NLP dependencies)
- Synonym expansion using WordNet
- Keyword extraction
- Vector embeddings
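To illustrate the kind of synonym expansion involved, here is a minimal standalone sketch using NLTK's WordNet. It demonstrates the technique only and is not the SDK's internal implementation:

```python
import nltk
from nltk.corpus import wordnet

# WordNet data is a one-time download
nltk.download("wordnet", quiet=True)

def expand_with_synonyms(word):
    # Collect unique lemma names across all WordNet synsets for the word
    synonyms = {lemma.name().replace("_", " ")
                for synset in wordnet.synsets(word)
                for lemma in synset.lemmas()}
    return sorted(synonyms)

print(expand_with_synonyms("create"))  # e.g. ['create', 'make', 'produce', ...]
```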
self.add_skill("native_vector_search", {
"tool_name": "search_docs",
"index_file": "docs.swsearch",
# Customize responses for voice calls
"response_prefix": "Based on the documentation, here's what I found:",
"response_postfix": "Would you like me to search for more specific information?",
# Custom no-results message
"no_results_message": "I couldn't find information about '{query}'. Try rephrasing your question.",
# SWAIG function fillers for natural conversation
"swaig_fields": {
"fillers": {
"en-US": [
"Let me search the documentation",
"Checking our knowledge base",
"Looking that up for you"
]
}
}
})
# Only search documents tagged with specific categories
self.add_skill("native_vector_search", {
"tool_name": "search_api_docs",
"index_file": "docs.swsearch",
"tags": ["api", "reference"], # Only search API docs
"description": "Search API reference documentation"
})
self.add_skill("native_vector_search", {
# Tool configuration
"tool_name": "search_docs",
"description": "Search SDK documentation for detailed information",
# Index configuration
"index_file": "docs.swsearch",
"build_index": True, # Auto-build if missing
"source_dir": "./docs", # Source for auto-build
"file_types": ["md", "txt"],
# Search parameters
"count": 5, # Number of results
"distance_threshold": 0.1, # Similarity threshold
"tags": ["documentation"], # Filter by tags
# NLP backend selection
"nlp_backend": "nltk", # or "spacy" for better quality
# Response formatting
"response_prefix": "Based on the documentation:",
"response_postfix": "Would you like more details?",
"no_results_message": "No information found for '{query}'",
# SWAIG configuration
"swaig_fields": {
"fillers": {
"en-US": ["Let me search for that", "Checking the docs"]
}
}
})
You can add multiple search instances for different document collections:
# Documentation search with spaCy for better quality
self.add_skill("native_vector_search", {
"tool_name": "search_docs",
"index_file": "docs.swsearch",
"nlp_backend": "spacy",
"description": "Search SDK documentation"
})
# Code examples search with NLTK for speed
self.add_skill("native_vector_search", {
"tool_name": "search_examples",
"index_file": "examples.swsearch",
"nlp_backend": "nltk",
"description": "Search code examples"
})
The search skill supports both local and remote operation modes.
Local mode pros:

- Searches run fully offline with no network dependency
- Lowest latency, since the index is queried in-process

Local mode cons:

- Each agent process loads the index and embedding model, increasing memory use

Configuration:
self.add_skill("native_vector_search", {
"tool_name": "search_docs",
"index_file": "docs.swsearch", # Local file
"nlp_backend": "nltk" # Choose NLP backend
})
Remote mode pros:

- Indexes and embedding resources are centralized on a dedicated search server
- Multiple agents can share the same indexes, keeping each agent lightweight

Remote mode cons:

- Requires running and maintaining a separate search server, plus network connectivity

Configuration:
self.add_skill("native_vector_search", {
"tool_name": "search_docs",
"remote_url": "http://localhost:8001", # Search server
"index_name": "docs", # Index name on server
"nlp_backend": "nltk" # NLP backend for query preprocessing
})
Run the standalone search server example:

python examples/search_server_standalone.py
The server exposes these endpoints:

- `POST /search` - Search the indexes
- `GET /health` - Health check and available indexes
- `POST /reload_index` - Add or reload an index
Test the API:
curl -X POST "http://localhost:8001/search" \
-H "Content-Type: application/json" \
-d '{"query": "how to create an agent", "index_name": "docs", "count": 3}'
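The same request from Python, mirroring the payload in the curl example (assumes the `requests` package is available):

```python
import requests

# Query the standalone search server's /search endpoint
resp = requests.post(
    "http://localhost:8001/search",
    json={"query": "how to create an agent", "index_name": "docs", "count": 3},
)
resp.raise_for_status()
print(resp.json())
```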
The skill automatically detects which mode to use:

- If `remote_url` is provided → Remote mode
- If `index_file` is provided → Local mode
- Remote mode takes priority if both are specified
sw-search <source_dir> [options]

Arguments:

- `source_dir` - Directory containing documents to index

Options:

- `--output FILE` - Output .swsearch file (default: `<source_dir>.swsearch`)
- `--chunk-size SIZE` - Chunk size in characters (default: 500)
- `--chunk-overlap SIZE` - Overlap between chunks (default: 50)
- `--file-types TYPES` - Comma-separated file extensions (default: md,txt,rst)
- `--exclude PATTERNS` - Comma-separated glob patterns to exclude
- `--languages LANGS` - Comma-separated language codes (default: en)
- `--model MODEL` - Embedding model name (default: sentence-transformers/all-mpnet-base-v2)
- `--tags TAGS` - Comma-separated tags to add to all chunks
- `--verbose` - Show detailed progress information
- `--validate` - Validate the created index after building

Subcommands:
sw-search validate <index_file> [--verbose]
Validates an existing .swsearch index file and shows statistics.
sw-search search <index_file> <query> [options]
Search within an existing .swsearch index file. This is useful for:

- Testing search quality and relevance
- Exploring index contents
- Debugging search results
- Scripting and automation
Search Options:

- `--count COUNT` - Number of results to return (default: 5)
- `--distance-threshold FLOAT` - Minimum similarity score (default: 0.0)
- `--tags TAGS` - Comma-separated tags to filter by
- `--nlp-backend {nltk,spacy}` - NLP backend to use (default: nltk)
- `--verbose` - Show detailed information including index stats
- `--json` - Output results as JSON for scripting
- `--no-content` - Hide content in results (show only metadata)

Examples:
# Build from the comprehensive concepts guide
sw-search docs/signalwire_agents_concepts_guide.md --output concepts.swsearch
# Build from multiple sources (files and directories)
sw-search docs/signalwire_agents_concepts_guide.md examples README.md \
--output comprehensive.swsearch \
--file-types md,py,txt \
--verbose
# Traditional directory-based approach
sw-search ./documentation \
--output knowledge.swsearch \
--chunking-strategy sentence \
--max-sentences-per-chunk 8 \
--file-types md,rst,txt \
--exclude "**/drafts/**" \
--tags documentation,help \
--verbose
# Validate an existing index
sw-search validate concepts.swsearch --verbose
# Search within an index
sw-search search concepts.swsearch "how to create an agent"
sw-search search concepts.swsearch "API reference" --count 3 --verbose
sw-search search concepts.swsearch "configuration" --tags documentation --json
# Use different NLP backends
sw-search search concepts.swsearch "deployment options" --nlp-backend nltk # Fast
sw-search search concepts.swsearch "deployment options" --nlp-backend spacy # Better quality
# Advanced search with filtering
sw-search search concepts.swsearch "deployment options" \
--count 10 \
--distance-threshold 0.1 \
--tags "deployment,production" \
--nlp-backend spacy \
--verbose
# JSON output for scripting
sw-search search concepts.swsearch "error handling" --json | jq '.results[0].content'
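Beyond jq, the `--json` output is easy to consume from Python. This sketch assumes only that the output contains a `results` list whose entries have a `content` field, as the jq example above implies:

```python
import json
import subprocess

# Run the CLI search and parse its JSON output
raw = subprocess.run(
    ["sw-search", "search", "concepts.swsearch", "error handling", "--json"],
    capture_output=True, text=True, check=True,
).stdout
for result in json.loads(raw)["results"]:
    print(result["content"][:80])  # preview the first 80 characters of each hit
```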
# Build multiple specialized indexes
sw-search docs/signalwire_agents_concepts_guide.md --output concepts.swsearch
sw-search examples --output examples.swsearch --file-types py,md
# Inspect an existing index from Python
python -c "
from signalwire_agents.search import SearchEngine
engine = SearchEngine('docs.swsearch')
print(f'Index stats: {engine.get_stats()}')
"
from signalwire_agents.search import SearchEngine
# Load an index
engine = SearchEngine("docs.swsearch")
# Perform search
results = engine.search(
query_vector=[...], # Optional: pre-computed query vector
enhanced_text="search query", # Enhanced query text
count=5, # Number of results
distance_threshold=0.0, # Minimum similarity score
tags=["documentation"] # Filter by tags
)
# Get index statistics
stats = engine.get_stats()
print(f"Total chunks: {stats['total_chunks']}")
print(f"Total files: {stats['total_files']}")
from signalwire_agents.search import IndexBuilder
# Create index builder
builder = IndexBuilder(
model_name="sentence-transformers/all-mpnet-base-v2",
chunk_size=500,
chunk_overlap=50,
verbose=True
)
# Build index
builder.build_index(
source_dir="./docs",
output_file="docs.swsearch",
file_types=["md", "txt"],
exclude_patterns=["**/test/**"],
tags=["documentation"]
)
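Putting the two APIs together, here is a minimal end-to-end sketch. It uses only the signatures shown above; the exact shape of the returned results is not guaranteed by this guide:

```python
from signalwire_agents.search import IndexBuilder, SearchEngine

# Build a small index from local markdown docs...
builder = IndexBuilder(
    model_name="sentence-transformers/all-mpnet-base-v2",
    verbose=True,
)
builder.build_index(
    source_dir="./docs",
    output_file="docs.swsearch",
    file_types=["md"],
)

# ...then query it immediately
engine = SearchEngine("docs.swsearch")
results = engine.search(enhanced_text="how to create an agent", count=3)
for result in results:
    print(result)
```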