The SignalWire Agents SDK includes a powerful local search system that provides DataSphere-compatible search functionality without external dependencies. This system uses advanced query preprocessing, local embeddings, and hybrid search techniques to enable agents to search through document collections offline.
The local search system provides offline document search backed by portable `.swsearch` index files:

```
┌──────────────────┐     ┌──────────────────┐     ┌──────────────────┐
│    Documents     │────▶│  Index Builder   │────▶│   .swsearch DB   │
│  (MD, PDF, etc.) │     │                  │     │                  │
└──────────────────┘     └──────────────────┘     └──────────────────┘
                                                           │
                                                           ▼
┌──────────────────┐     ┌──────────────────┐     ┌──────────────────┐
│      Agent       │────▶│   Search Skill   │────▶│  Search Engine   │
│                  │     │                  │     │                  │
└──────────────────┘     └──────────────────┘     └──────────────────┘
```
The search system uses optional dependencies to keep the base SDK lightweight. Choose the installation option that fits your needs:
pip install signalwire-agents[search]

Includes the core search functionality: local embeddings, hybrid vector/keyword search, and NLTK-based query preprocessing.
pip install signalwire-agents[search-full]

Adds document processing for additional formats such as PDF, DOCX, and HTML (see the supported formats table below).
pip install signalwire-agents[search-nlp]

Adds advanced NLP features via spaCy for improved query understanding.
⚠️ Additional Setup Required:
python -m spacy download en_core_web_sm
Performance Note: Advanced NLP features provide significantly better query understanding, synonym expansion, and search relevance, but are 2-3x slower than basic search. Only recommended if you have sufficient CPU power and can tolerate longer response times.
NLP Backend Control: You can choose which NLP backend to use:

- NLTK (default): fast processing, good for most use cases
- spaCy: better quality but slower, requires a model download

Configure via the `nlp_backend` parameter in your search skill.
pip install signalwire-agents[search-all]
Includes everything above
⚠️ Additional Setup Required:
python -m spacy download en_core_web_sm
Performance Note: This includes advanced NLP features which improve search quality but increase response times.
pip install signalwire-agents
Search functionality will show helpful error messages when dependencies are missing.
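To check programmatically whether the optional dependencies are present, you can probe for the sentence-transformers package that powers the local embeddings. This is a minimal sketch; the exact packages each extra installs are an assumption based on the embedding models named in this guide:

```python
import importlib.util

# The [search] extra pulls in sentence-transformers for local embeddings;
# without it, local search cannot run.
if importlib.util.find_spec("sentence_transformers") is None:
    print("Search dependencies missing - install signalwire-agents[search]")
else:
    print("Search dependencies available")
```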
To get started quickly, install full search support and build an index from your documents:

pip install signalwire-agents[search-full]
# Build from the comprehensive concepts guide
sw-search docs/signalwire_agents_concepts_guide.md --output concepts.swsearch
# Build from multiple individual files
sw-search README.md docs/agent_guide.md docs/architecture.md --output knowledge.swsearch
# Build from mixed sources (files and directories)
sw-search docs/signalwire_agents_concepts_guide.md examples --file-types md,py --output comprehensive.swsearch
# Build from a directory (traditional approach)
sw-search docs --output docs.swsearch
# Include specific file types
sw-search docs --file-types md,txt,py
# Exclude patterns
sw-search docs --exclude "**/test/**,**/__pycache__/**"
from signalwire_agents import AgentBase
class MyAgent(AgentBase):
def __init__(self):
super().__init__()
# Add search capability using the concepts guide
self.add_skill("native_vector_search", {
"tool_name": "search_docs",
"description": "Search the comprehensive SDK concepts guide for information",
"index_file": "concepts.swsearch",
"count": 5
})
agent = MyAgent()
agent.serve()
Ask your agent: "How do I create a new agent?" and it will search the comprehensive concepts guide to provide detailed answers.
Search indexes are SQLite databases with the `.swsearch` extension that contain processed documents, embeddings, and search metadata.
# Build index from the comprehensive concepts guide
sw-search docs/signalwire_agents_concepts_guide.md --output concepts.swsearch
# Build from multiple individual files
sw-search README.md docs/agent_guide.md docs/architecture.md --output knowledge.swsearch
# Build from mixed sources (files and directories)
sw-search docs/signalwire_agents_concepts_guide.md examples --file-types md,py --output comprehensive.swsearch
# Build from a directory (traditional approach)
sw-search docs --output docs.swsearch
# Include specific file types
sw-search docs --file-types md,txt,py
# Exclude patterns
sw-search docs --exclude "**/test/**,**/__pycache__/**"
# Full configuration example with multiple sources
sw-search docs/signalwire_agents_concepts_guide.md ./examples README.md \
--output ./knowledge.swsearch \
--chunking-strategy sentence \
--max-sentences-per-chunk 8 \
--file-types md,txt,rst,py \
--exclude "**/test/**,**/__pycache__/**" \
--model sentence-transformers/all-mpnet-base-v2 \
--tags documentation,api \
--verbose
| Format | Extension | Requirements |
|---|---|---|
| Markdown | `.md` | Built-in |
| Text | `.txt` | Built-in |
| Python | `.py` | Built-in |
| reStructuredText | `.rst` | Built-in |
| PDF | `.pdf` | `search-full` |
| Word Documents | `.docx` | `search-full` |
| HTML | `.html` | `search-full` |
| JSON | `.json` | Built-in |
Each `.swsearch` file contains the processed document chunks, their vector embeddings, and the search metadata needed for offline queries.
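Because the index is a standard SQLite file, you can peek inside it directly. The internal table layout is not part of the public API, so treat this as a debugging sketch and use `sw-search validate` for a supported view:

```python
import sqlite3

# A .swsearch index is a regular SQLite database, so it opens directly.
conn = sqlite3.connect("docs.swsearch")
tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'"
).fetchall()
print("Tables in index:", [t[0] for t in tables])
conn.close()
```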
The `native_vector_search` skill provides search functionality to your agents.
self.add_skill("native_vector_search", {
"tool_name": "search_knowledge",
"description": "Search the knowledge base",
"index_file": "knowledge.swsearch"
})
Choose between NLTK (fast) and spaCy (better quality) for query processing:
# Fast NLTK processing (default)
self.add_skill("native_vector_search", {
"tool_name": "search_docs",
"index_file": "docs.swsearch",
"nlp_backend": "nltk" # Fast, good for most use cases
})
# Better quality spaCy processing
self.add_skill("native_vector_search", {
"tool_name": "search_docs",
"index_file": "docs.swsearch",
"nlp_backend": "spacy" # Slower but better quality, requires model download
})
Performance Comparison: NLTK processes queries quickly and suits most use cases, while spaCy gives noticeably better query understanding at roughly 2-3x the processing time (plus a one-time model download).
# Use a different embedding model
self.add_skill("native_vector_search", {
"tool_name": "search_docs",
"index_file": "docs.swsearch",
"model": "sentence-transformers/all-MiniLM-L6-v2" # Smaller, faster model
})
The system automatically enhances queries using:

- Language detection
- POS tagging (with NLP dependencies)
- Synonym expansion using WordNet
- Keyword extraction
- Vector embeddings
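To illustrate the kind of synonym expansion involved, here is a minimal standalone sketch using NLTK's WordNet. It demonstrates the technique only and is not the SDK's internal implementation:

```python
import nltk
from nltk.corpus import wordnet

# WordNet data is a one-time download
nltk.download("wordnet", quiet=True)

def expand_with_synonyms(word):
    # Collect unique lemma names across all WordNet synsets for the word
    synonyms = {lemma.name().replace("_", " ")
                for synset in wordnet.synsets(word)
                for lemma in synset.lemmas()}
    return sorted(synonyms)

print(expand_with_synonyms("create"))  # e.g. ['create', 'make', 'produce', ...]
```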
self.add_skill("native_vector_search", {
"tool_name": "search_docs",
"index_file": "docs.swsearch",
# Customize responses for voice calls
"response_prefix": "Based on the documentation, here's what I found:",
"response_postfix": "Would you like me to search for more specific information?",
# Custom no-results message
"no_results_message": "I couldn't find information about '{query}'. Try rephrasing your question.",
# SWAIG function fillers for natural conversation
"swaig_fields": {
"fillers": {
"en-US": [
"Let me search the documentation",
"Checking our knowledge base",
"Looking that up for you"
]
}
}
})
# Only search documents tagged with specific categories
self.add_skill("native_vector_search", {
"tool_name": "search_api_docs",
"index_file": "docs.swsearch",
"tags": ["api", "reference"], # Only search API docs
"description": "Search API reference documentation"
})
self.add_skill("native_vector_search", {
# Tool configuration
"tool_name": "search_docs",
"description": "Search SDK documentation for detailed information",
# Index configuration
"index_file": "docs.swsearch",
"build_index": True, # Auto-build if missing
"source_dir": "./docs", # Source for auto-build
"file_types": ["md", "txt"],
# Search parameters
"count": 5, # Number of results
"distance_threshold": 0.1, # Similarity threshold
"tags": ["documentation"], # Filter by tags
# NLP backend selection
"nlp_backend": "nltk", # or "spacy" for better quality
# Response formatting
"response_prefix": "Based on the documentation:",
"response_postfix": "Would you like more details?",
"no_results_message": "No information found for '{query}'",
# SWAIG configuration
"swaig_fields": {
"fillers": {
"en-US": ["Let me search for that", "Checking the docs"]
}
}
})
You can add multiple search instances for different document collections:
# Documentation search with spaCy for better quality
self.add_skill("native_vector_search", {
"tool_name": "search_docs",
"index_file": "docs.swsearch",
"nlp_backend": "spacy",
"description": "Search SDK documentation"
})
# Code examples search with NLTK for speed
self.add_skill("native_vector_search", {
"tool_name": "search_examples",
"index_file": "examples.swsearch",
"nlp_backend": "nltk",
"description": "Search code examples"
})
The search skill supports both local and remote operation modes.
Local mode pros:

- Searches run fully offline with no network dependency
- Lowest latency, since the index is queried in-process

Local mode cons:

- Each agent process loads the index and embedding model, increasing memory use

Configuration:
self.add_skill("native_vector_search", {
"tool_name": "search_docs",
"index_file": "docs.swsearch", # Local file
"nlp_backend": "nltk" # Choose NLP backend
})
Remote mode pros:

- Indexes and embedding resources are centralized on a dedicated search server
- Multiple agents can share the same indexes, keeping each agent lightweight

Remote mode cons:

- Requires running and maintaining a separate search server, plus network connectivity

Configuration:
self.add_skill("native_vector_search", {
"tool_name": "search_docs",
"remote_url": "http://localhost:8001", # Search server
"index_name": "docs", # Index name on server
"nlp_backend": "nltk" # NLP backend for query preprocessing
})
Run the standalone search server example:

python examples/search_server_standalone.py
The server exposes these endpoints:

- `POST /search` - Search the indexes
- `GET /health` - Health check and available indexes
- `POST /reload_index` - Add or reload an index
Test the API:
curl -X POST "http://localhost:8001/search" \
-H "Content-Type: application/json" \
-d '{"query": "how to create an agent", "index_name": "docs", "count": 3}'
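The same request from Python, mirroring the payload in the curl example (assumes the `requests` package is available):

```python
import requests

# Query the standalone search server's /search endpoint
resp = requests.post(
    "http://localhost:8001/search",
    json={"query": "how to create an agent", "index_name": "docs", "count": 3},
)
resp.raise_for_status()
print(resp.json())
```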
The skill automatically detects which mode to use:

- If `remote_url` is provided → Remote mode
- If `index_file` is provided → Local mode
- Remote mode takes priority if both are specified
sw-search <source_dir> [options]

Arguments:

- `source_dir` - Directory containing documents to index

Options:

- `--output FILE` - Output .swsearch file (default: `<source_dir>.swsearch`)
- `--chunk-size SIZE` - Chunk size in characters (default: 500)
- `--chunk-overlap SIZE` - Overlap between chunks (default: 50)
- `--file-types TYPES` - Comma-separated file extensions (default: md,txt,rst)
- `--exclude PATTERNS` - Comma-separated glob patterns to exclude
- `--languages LANGS` - Comma-separated language codes (default: en)
- `--model MODEL` - Embedding model name (default: sentence-transformers/all-mpnet-base-v2)
- `--tags TAGS` - Comma-separated tags to add to all chunks
- `--verbose` - Show detailed progress information
- `--validate` - Validate the created index after building

Subcommands:
sw-search validate <index_file> [--verbose]
Validates an existing .swsearch index file and shows statistics.
sw-search search <index_file> <query> [options]
Search within an existing .swsearch index file. This is useful for:

- Testing search quality and relevance
- Exploring index contents
- Debugging search results
- Scripting and automation
Search Options:

- `--count COUNT` - Number of results to return (default: 5)
- `--distance-threshold FLOAT` - Minimum similarity score (default: 0.0)
- `--tags TAGS` - Comma-separated tags to filter by
- `--nlp-backend {nltk,spacy}` - NLP backend to use (default: nltk)
- `--verbose` - Show detailed information including index stats
- `--json` - Output results as JSON for scripting
- `--no-content` - Hide content in results (show only metadata)

Examples:
# Build from the comprehensive concepts guide
sw-search docs/signalwire_agents_concepts_guide.md --output concepts.swsearch
# Build from multiple sources (files and directories)
sw-search docs/signalwire_agents_concepts_guide.md examples README.md \
--output comprehensive.swsearch \
--file-types md,py,txt \
--verbose
# Traditional directory-based approach
sw-search ./documentation \
--output knowledge.swsearch \
--chunking-strategy sentence \
--max-sentences-per-chunk 8 \
--file-types md,rst,txt \
--exclude "**/drafts/**" \
--tags documentation,help \
--verbose
# Validate an existing index
sw-search validate concepts.swsearch --verbose
# Search within an index
sw-search search concepts.swsearch "how to create an agent"
sw-search search concepts.swsearch "API reference" --count 3 --verbose
sw-search search concepts.swsearch "configuration" --tags documentation --json
# Use different NLP backends
sw-search search concepts.swsearch "deployment options" --nlp-backend nltk # Fast
sw-search search concepts.swsearch "deployment options" --nlp-backend spacy # Better quality
# Advanced search with filtering
sw-search search concepts.swsearch "deployment options" \
--count 10 \
--distance-threshold 0.1 \
--tags "deployment,production" \
--nlp-backend spacy \
--verbose
# JSON output for scripting
sw-search search concepts.swsearch "error handling" --json | jq '.results[0].content'
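Beyond jq, the `--json` output is easy to consume from Python. This sketch assumes only that the output contains a `results` list whose entries have a `content` field, as the jq example above implies:

```python
import json
import subprocess

# Run the CLI search and parse its JSON output
raw = subprocess.run(
    ["sw-search", "search", "concepts.swsearch", "error handling", "--json"],
    capture_output=True, text=True, check=True,
).stdout
for result in json.loads(raw)["results"]:
    print(result["content"][:80])  # preview the first 80 characters of each hit
```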
# Build multiple specialized indexes
sw-search docs/signalwire_agents_concepts_guide.md --output concepts.swsearch
sw-search examples --output examples.swsearch --file-types py,md
# Inspect an existing index from Python
python -c "
from signalwire_agents.search import SearchEngine
engine = SearchEngine('docs.swsearch')
print(f'Index stats: {engine.get_stats()}')
"
from signalwire_agents.search import SearchEngine
# Load an index
engine = SearchEngine("docs.swsearch")
# Perform search
results = engine.search(
query_vector=[...], # Optional: pre-computed query vector
enhanced_text="search query", # Enhanced query text
count=5, # Number of results
distance_threshold=0.0, # Minimum similarity score
tags=["documentation"] # Filter by tags
)
# Get index statistics
stats = engine.get_stats()
print(f"Total chunks: {stats['total_chunks']}")
print(f"Total files: {stats['total_files']}")
from signalwire_agents.search import IndexBuilder
# Create index builder
builder = IndexBuilder(
model_name="sentence-transformers/all-mpnet-base-v2",
chunk_size=500,
chunk_overlap=50,
verbose=True
)
# Build index
builder.build_index(
source_dir="./docs",
output_file="docs.swsearch",
file_types=["md", "txt"],
exclude_patterns=["**/test/**"],
tags=["documentation"]
)
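Putting the two APIs together, here is a minimal end-to-end sketch. It uses only the signatures shown above; the exact shape of the returned results is not guaranteed by this guide:

```python
from signalwire_agents.search import IndexBuilder, SearchEngine

# Build a small index from local markdown docs...
builder = IndexBuilder(
    model_name="sentence-transformers/all-mpnet-base-v2",
    verbose=True,
)
builder.build_index(
    source_dir="./docs",
    output_file="docs.swsearch",
    file_types=["md"],
)

# ...then query it immediately
engine = SearchEngine("docs.swsearch")
results = engine.search(enhanced_text="how to create an agent", count=3)
for result in results:
    print(result)
```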