When running sw-search with embedding generation, you may encounter an "Illegal instruction" error that crashes the process.
This error occurs when the installed PyTorch build was compiled with newer CPU instruction sets (such as AVX2 or AVX-512) that your CPU does not support. This is common on older server hardware.
Check your CPU capabilities:
cat /proc/cpuinfo | grep -E "(model name|flags)" | head -5
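The same check can be scripted when you want an explicit yes/no answer per instruction set. The following is a minimal sketch, assuming a Linux x86 system where /proc/cpuinfo contains a `flags` line; the helper names `parse_cpu_flags` and `report` are illustrative and not part of sw-search.

```python
from pathlib import Path

# Instruction-set flags relevant to the "Illegal instruction" crash.
# Names follow the Linux /proc/cpuinfo convention (avx512f is the
# AVX-512 foundation flag).
FLAGS_OF_INTEREST = ("avx", "avx2", "avx512f")


def parse_cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the flag set of the first CPU listed in /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def report(cpuinfo_text: str) -> dict[str, bool]:
    """Map each flag of interest to whether the CPU advertises it."""
    flags = parse_cpu_flags(cpuinfo_text)
    return {name: name in flags for name in FLAGS_OF_INTEREST}


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        for name, present in report(cpuinfo.read_text()).items():
            print(f"{name}: {'yes' if present else 'no'}")
    else:
        print("/proc/cpuinfo not found (non-Linux system?)")
```

If `avx2` or `avx512f` is reported as missing, the CPU cannot execute code compiled for that instruction set.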
Check PyTorch version:
python -c "import torch; print('PyTorch version:', torch.__version__)"
Test model loading:
python -c "from sentence_transformers import SentenceTransformer; model = SentenceTransformer('sentence-transformers/all-mpnet-base-v2')"
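Because an illegal instruction kills the interpreter outright, it can help to run the model-loading test in a child process and inspect how it exited. The sketch below assumes a POSIX system, where a process killed by a signal reports a negative return code; `run_isolated` is a hypothetical helper, not part of sw-search or PyTorch.

```python
import signal
import subprocess
import sys

# Same model-loading snippet as the command above.
LOAD_MODEL = (
    "from sentence_transformers import SentenceTransformer; "
    "SentenceTransformer('sentence-transformers/all-mpnet-base-v2')"
)


def run_isolated(code: str) -> str:
    """Run `code` in a child interpreter and classify how it exited."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
    )
    if result.returncode == 0:
        return "ok"
    # On POSIX, a negative return code means the child died from a signal.
    if result.returncode == -signal.SIGILL:
        return "illegal instruction"
    return f"failed (exit code {result.returncode})"
```

Calling `run_isolated(LOAD_MODEL)` returns "illegal instruction" when the child process was terminated by SIGILL, which separates this crash from ordinary failures such as a missing package.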
Set environment variables to disable unsupported instruction sets:
export PYTORCH_DISABLE_AVX2=1
export PYTORCH_DISABLE_AVX512=1
Then run your commands:
PYTORCH_DISABLE_AVX2=1 PYTORCH_DISABLE_AVX512=1 sw-search ./docs --output index.swsearch
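When sw-search is launched from a Python script rather than a shell, the same variables can be applied to that one child process only. This is a minimal sketch: the variable names come from the export lines above, while `env_without_avx` and `build_index` are hypothetical helper names.

```python
import os
import subprocess
from typing import Optional

# Variable names are taken from the export lines above.
AVX_OVERRIDES = {"PYTORCH_DISABLE_AVX2": "1", "PYTORCH_DISABLE_AVX512": "1"}


def env_without_avx(base: Optional[dict] = None) -> dict:
    """Return a copy of the environment with the AVX overrides applied."""
    env = dict(os.environ if base is None else base)
    env.update(AVX_OVERRIDES)
    return env


def build_index(source: str, output: str) -> int:
    """Run sw-search with the overrides set for this one process only."""
    cmd = ["sw-search", source, "--output", output]
    return subprocess.run(cmd, env=env_without_avx()).returncode
```

For example, `build_index("./docs", "index.swsearch")` mirrors the command above without changing the environment of the calling process.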
Add these to your shell profile (.bashrc, .zshrc, etc.):
export PYTORCH_DISABLE_AVX2=1
export PYTORCH_DISABLE_AVX512=1
The sw-search tool supports four different chunking strategies, each with different trade-offs:
Sentence strategy. Best for: Most use cases, balanced content quality
sw-search ./docs --chunking-strategy sentence --max-sentences-per-chunk 8
Results: Medium-sized chunks with complete thoughts and good context.
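To make the idea concrete, here is a minimal sketch of sentence-based chunking. It is not sw-search's actual implementation: the regular-expression sentence splitter is a naive stand-in, and `sentence_chunks` is a hypothetical name. The `max_sentences` parameter plays the role of --max-sentences-per-chunk.

```python
import re


def sentence_chunks(text: str, max_sentences: int = 8) -> list[str]:
    """Group consecutive sentences into chunks of at most `max_sentences`."""
    # Naive splitter: break after ., ! or ? when followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [
        " ".join(sentences[i : i + max_sentences])
        for i in range(0, len(sentences), max_sentences)
    ]
```

Each chunk ends on a sentence boundary, which is why this strategy tends to keep complete thoughts together.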
Sliding strategy. Best for: Dense content, overlapping context needed
sw-search ./docs --chunking-strategy sliding --chunk-size 100 --overlap-size 20
Results: Smaller, overlapping chunks that capture all content nuances.
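The overlap mechanism can be sketched as follows. This is an illustration under the assumption that --chunk-size and --overlap-size are measured in words, which the text above does not state; `sliding_chunks` is a hypothetical name and not sw-search's own code.

```python
def sliding_chunks(text: str, chunk_size: int = 100, overlap: int = 20) -> list[str]:
    """Split text into word windows that share `overlap` words with the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start : start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks
```

With a chunk size of 100 and an overlap of 20, each window advances by 80 words, so text near a chunk boundary appears in two neighboring chunks.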
Paragraph strategy. Best for: Well-structured documents with clear paragraph breaks
sw-search ./docs --chunking-strategy paragraph
Results: Variable-sized chunks that respect document structure.
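A minimal sketch of paragraph-based splitting, assuming paragraphs are separated by one or more blank lines. How sw-search itself detects paragraph breaks is not specified here, and `paragraph_chunks` is a hypothetical name.

```python
import re


def paragraph_chunks(text: str) -> list[str]:
    """Split text into paragraphs separated by one or more blank lines."""
    paragraphs = re.split(r"\n\s*\n", text)
    return [p.strip() for p in paragraphs if p.strip()]
```

Chunk sizes vary because they follow the author's own paragraph lengths rather than a fixed budget.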
Page strategy. Best for: PDFs, presentations, or documents with page boundaries
sw-search ./docs --chunking-strategy page
Results: Page-sized chunks that maintain document flow.
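Page-based splitting can be sketched in the same way. The example assumes the extracted text marks page breaks with form-feed characters, as some PDF-to-text tools (for example pdftotext) do; this is an assumption about the input, not a description of how sw-search reads pages, and `page_chunks` is a hypothetical name.

```python
def page_chunks(text: str) -> list[tuple[int, str]]:
    """Split text on form feeds and return (page_number, page_text) pairs."""
    pages = text.split("\f")
    return [
        (number, page.strip())
        for number, page in enumerate(pages, start=1)
        if page.strip()  # skip blank pages but keep original numbering
    ]
```

Keeping the page number alongside each chunk makes it possible to point a search hit back to its place in the original document.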
Use Case | Recommended Strategy | Reason |
---|---|---|
General documentation | sentence | Balanced, readable chunks |
Technical manuals | sliding | Overlapping context prevents information loss |
Blog posts/articles | paragraph | Respects natural structure |
PDFs/presentations | page | Maintains original pagination |
Code documentation | sentence with --split-newlines 2 | Respects code blocks |
The following example configurations are based on search results for "how to deploy agents":
# Fine-tuned sentence chunking
sw-search ./docs \
  --chunking-strategy sentence \
  --max-sentences-per-chunk 30 \
  --split-newlines 2
# Optimized sliding window
sw-search ./docs \
  --chunking-strategy sliding \
  --chunk-size 150 \
  --overlap-size 30
# Combined with other options
sw-search ./docs \
  --chunking-strategy sentence \
  --max-sentences-per-chunk 40 \
  --file-types md,txt,rst,py \
  --exclude "**/test/**" \
  --verbose
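When comparing strategies against the same test query, it can be convenient to generate one command per configuration. The sketch below only assembles argument lists from flags that appear above. The `index-<name>.swsearch` output names and the `build_command` helper are illustrative assumptions, as is combining --output with the chunking flags in one invocation.

```python
import shlex

# Hypothetical configurations to compare; flag names are the ones used above.
CONFIGS = {
    "sentence": [
        "--chunking-strategy", "sentence",
        "--max-sentences-per-chunk", "30",
        "--split-newlines", "2",
    ],
    "sliding": [
        "--chunking-strategy", "sliding",
        "--chunk-size", "150",
        "--overlap-size", "30",
    ],
}


def build_command(source: str, name: str) -> list[str]:
    """Return the sw-search argument list for one named configuration."""
    # The output file name pattern is an illustrative choice.
    return ["sw-search", source, *CONFIGS[name], "--output", f"index-{name}.swsearch"]


if __name__ == "__main__":
    for name in CONFIGS:
        print(shlex.join(build_command("./docs", name)))
```

Each printed line can be pasted into a shell, or the argument list can be passed to `subprocess.run` directly.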