Configuration Guide¶
Langvio can be configured through YAML files or programmatically to customize models, performance, and output settings.
Basic Configuration¶
Using Different Models¶
import langvio

# Use specific models
pipeline = langvio.create_pipeline(
    llm_name="gpt-4",           # Alias for OpenAI GPT-4.1 Mini
    vision_name="yoloe_large",  # Alias for YOLO-World v2 large
)

# Or load the settings from a configuration file
pipeline = langvio.create_pipeline(config_path="my_config.yaml")
Available Models¶
Vision Models:
- yolo_world_v2_s - YOLO-World v2 small (fastest, good accuracy)
- yolo_world_v2_m - YOLO-World v2 medium (balanced speed/accuracy, recommended default)
- yolo_world_v2_l - YOLO-World v2 large (better accuracy)
- yolo_world_v2x - YOLO-World v2 extra-large (best accuracy, slower)
- yolo11n - YOLO11 nano (alias to YOLO-World v2 small, fastest)
- yolo - YOLO11 alias (alias to YOLO-World v2 small)
- yoloe - YOLOe alias (alias to YOLO-World v2 small)
- yoloe_medium - YOLOe medium (alias to YOLO-World v2 medium)
- yoloe_large - YOLOe large (alias to YOLO-World v2 large)
Note: All vision models use YOLO-World v2 under the hood. The aliases (yolo11n, yoloe, etc.) map to YOLO-World v2 models for flexible object detection without predefined classes.
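As a rough illustration of the alias note, the mapping can be written out as a lookup table. This is only a sketch transcribed from the list above: `VISION_ALIASES` and `resolve_vision_name` are hypothetical names, not part of Langvio's API, and the library may resolve aliases differently internally.

```python
# Alias table transcribed from the vision model list in this guide.
# VISION_ALIASES and resolve_vision_name are hypothetical helpers,
# not Langvio functions.
VISION_ALIASES = {
    "yolo11n": "yolo_world_v2_s",
    "yolo": "yolo_world_v2_s",
    "yoloe": "yolo_world_v2_s",
    "yoloe_medium": "yolo_world_v2_m",
    "yoloe_large": "yolo_world_v2_l",
}


def resolve_vision_name(name: str) -> str:
    """Return the YOLO-World v2 name an alias points to, or the name itself."""
    return VISION_ALIASES.get(name, name)


print(resolve_vision_name("yoloe_large"))      # yolo_world_v2_l
print(resolve_vision_name("yolo_world_v2_m"))  # yolo_world_v2_m
```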
Language Models:
- gpt-4o-mini - OpenAI GPT-4o Mini (default, fast, cost-effective)
- gpt-3 or gpt-3.5 - OpenAI GPT-3.5 Turbo (fast, cost-effective)
- gpt-4 - OpenAI GPT-4.1 Mini (best reasoning)
- gpt-4.1-mini - OpenAI GPT-4.1 Mini (latest model)
- gpt-4.1-nano - OpenAI GPT-4.1 Nano (smallest, fastest)
- gemini - Google Gemini 2.0 Flash (free tier available)
Configuration File¶
Create a config.yaml file:
# Language Model Settings
llm:
  default: "gpt-4o-mini"  # Default model
  models:
    gemini:
      model_name: "gemini-2.0-flash"
      model_kwargs:
        temperature: 0.2
        max_tokens: 1024
    gpt-4o-mini:
      model_name: "gpt-4o-mini"
      model_kwargs:
        temperature: 0.1
        max_tokens: 2048
    gpt-4:
      model_name: "gpt-4.1-mini"
      model_kwargs:
        temperature: 0.1
        max_tokens: 2048

# Vision Model Settings
vision:
  default: "yolo11n"  # Fastest default, or use "yolo_world_v2_m" for balanced
  models:
    yolo_world_v2_m:
      type: "yolo_world"
      model_name: "yolov8m-worldv2"
      confidence: 0.45
      track_thresh: 0.3
      track_buffer: 70
      match_thresh: 0.6
    yolo_world_v2_s:
      type: "yolo_world"
      model_name: "yolov8s-worldv2"
      confidence: 0.5
      track_thresh: 0.5
      track_buffer: 30
      match_thresh: 0.8
    yolo11n:  # Alias for YOLO-World v2 small
      type: "yolo"
      model_path: "yolo11n.pt"
      confidence: 0.8
      model_type: "yolo"

# Output Settings
media:
  output_dir: "./output"
  temp_dir: "./temp"
  visualization:
    box_color: [0, 255, 0]       # Green boxes
    text_color: [255, 255, 255]  # White text
    line_thickness: 2
    show_attributes: true
    show_confidence: true

# Logging
logging:
  level: "INFO"
  file: "langvio.log"
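Before handing a file like this to Langvio, it can help to sanity-check the parsed structure. The sketch below works on a plain dictionary shaped like the `vision` section above and assumes the file is meant to be self-contained (every default is defined under `models`); `check_model_section` is a hypothetical helper, and Langvio may merge in built-in defaults that make the first check unnecessary.

```python
def check_model_section(section):
    """Return a list of problems found in an `llm` or `vision` section.

    Hypothetical helper, not part of Langvio.
    """
    problems = []
    models = section.get("models", {})
    default = section.get("default")
    if default not in models:
        problems.append(f"default {default!r} is not defined under models")
    for name, settings in models.items():
        confidence = settings.get("confidence")
        if confidence is not None and not 0.0 <= confidence <= 1.0:
            problems.append(f"{name}: confidence {confidence} is outside [0, 1]")
    return problems


# A trimmed copy of the vision section from the example file.
vision = {
    "default": "yolo11n",
    "models": {
        "yolo_world_v2_m": {"type": "yolo_world", "confidence": 0.45},
        "yolo11n": {"type": "yolo", "confidence": 0.8},
    },
}
print(check_model_section(vision))  # []
```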
Performance Tuning¶
For Speed (Real-time Applications)¶
pipeline = langvio.create_pipeline(
    llm_name="gpt-4o-mini",  # Faster LLM
    vision_name="yolo11n",   # Fastest vision model (or yolo_world_v2_s)
)
For Accuracy (Research/Analysis)¶
pipeline = langvio.create_pipeline(
    llm_name="gpt-4",               # Best reasoning
    vision_name="yolo_world_v2_l",  # Large model for higher accuracy
)
For Cost Optimization¶
pipeline = langvio.create_pipeline(
    llm_name="gemini",      # Google's free tier
    vision_name="yolo11n",  # Fastest model (or yolo_world_v2_m for balance)
)
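The three presets above can be kept in one place so an application switches between them by name. This is a minimal sketch: `PROFILES` and `pipeline_kwargs` are hypothetical names introduced for illustration, and the model pairs are copied from the examples in this section.

```python
# Model pairs copied from the speed, accuracy, and cost examples above.
# PROFILES and pipeline_kwargs are hypothetical helpers, not Langvio API.
PROFILES = {
    "speed": {"llm_name": "gpt-4o-mini", "vision_name": "yolo11n"},
    "accuracy": {"llm_name": "gpt-4", "vision_name": "yolo_world_v2_l"},
    "cost": {"llm_name": "gemini", "vision_name": "yolo11n"},
}


def pipeline_kwargs(profile):
    """Return a copy of the keyword arguments for the named profile."""
    if profile not in PROFILES:
        raise ValueError(f"unknown profile {profile!r}; choose from {sorted(PROFILES)}")
    return dict(PROFILES[profile])


print(pipeline_kwargs("accuracy"))
```

The returned dictionary uses the same keyword arguments as the calls above, so it could be passed along as `langvio.create_pipeline(**pipeline_kwargs("speed"))`.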
Video Processing Settings¶
Adjust Frame Sampling¶
# Process every frame (accurate but slow)
result = pipeline.process(query, video_path)

# Or modify in config.yaml
vision:
  models:
    yoloe:
      confidence: 0.3  # Lower = more detections
      sample_rate: 5   # Process every 5th frame
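To get a feel for what `sample_rate` saves, the sketch below lists which frame indices would be analyzed. It assumes sampling starts at the first frame and takes every Nth frame after it; Langvio's actual frame selection may differ, and `sampled_frame_indices` is a hypothetical helper.

```python
def sampled_frame_indices(total_frames, sample_rate):
    """Indices of the frames analyzed when every `sample_rate`-th frame is kept.

    Assumes sampling starts at frame 0 (an assumption, not confirmed
    Langvio behavior).
    """
    if sample_rate < 1:
        raise ValueError("sample_rate must be at least 1")
    return list(range(0, total_frames, sample_rate))


print(sampled_frame_indices(12, 5))        # [0, 5, 10]
# A 30-second clip at 30 fps has 900 frames; sampling every 5th leaves 180.
print(len(sampled_frame_indices(900, 5)))  # 180
```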
Memory Optimization¶
vision:
  models:
    yolo_efficient:
      model_path: "yolo11n.pt"
      confidence: 0.4
      max_detections: 50  # Limit detections per frame
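The combined effect of `confidence` and `max_detections` can be pictured as a filter followed by a cap. The sketch below is an illustration only: the detection dictionaries are a made-up format, and keeping the highest-confidence detections is an assumption about how a cap would typically be applied, not a description of Langvio's internals.

```python
def limit_detections(detections, confidence, max_detections):
    """Drop detections below `confidence`, then keep the top `max_detections`.

    Hypothetical helper; the dict layout of a detection is illustrative.
    """
    kept = [d for d in detections if d["confidence"] >= confidence]
    kept.sort(key=lambda d: d["confidence"], reverse=True)
    return kept[:max_detections]


frame = [
    {"label": "car", "confidence": 0.91},
    {"label": "person", "confidence": 0.35},
    {"label": "dog", "confidence": 0.62},
    {"label": "car", "confidence": 0.48},
]
print([d["label"] for d in limit_detections(frame, 0.4, 2)])  # ['car', 'dog']
```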
Output Customization¶
Change Visualization Colors¶
media:
  visualization:
    box_color: [255, 0, 0]        # Red boxes
    highlight_color: [0, 0, 255]  # Blue for highlighted objects
    text_color: [255, 255, 255]   # White text
    line_thickness: 3
    show_attributes: true
    show_confidence: false        # Hide confidence scores
Custom Output Directory¶
pipeline = langvio.create_pipeline()
pipeline.config.config["media"]["output_dir"] = "/custom/output/path"
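When several nested settings need to change at once, a small recursive merge avoids overwriting sibling keys. The sketch below operates on a plain dictionary shaped like the `media` section; `deep_update` is a hypothetical helper, not a Langvio function.

```python
def deep_update(base, overrides):
    """Merge `overrides` into `base` in place, recursing into nested dicts."""
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            deep_update(base[key], value)
        else:
            base[key] = value
    return base


# Stand-in for the configuration dictionary; values mirror the example file.
config = {
    "media": {
        "output_dir": "./output",
        "temp_dir": "./temp",
        "visualization": {"line_thickness": 2, "show_confidence": True},
    }
}
deep_update(config, {
    "media": {
        "output_dir": "/custom/output/path",
        "visualization": {"line_thickness": 3},
    }
})
print(config["media"]["output_dir"])  # /custom/output/path
print(config["media"]["temp_dir"])    # ./temp
```

With a real pipeline, the dictionary being updated would presumably be `pipeline.config.config`, as in the assignment above.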
Environment Variables¶
Set these in your .env file or environment:
# Required: LLM API Keys
OPENAI_API_KEY=your_openai_key
GOOGLE_API_KEY=your_google_key
# Optional: Performance Settings
CUDA_VISIBLE_DEVICES=0 # Use specific GPU
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
# Optional: Model Settings (override defaults)
LANGVIO_DEFAULT_LLM=gemini
LANGVIO_DEFAULT_VISION=yolo_world_v2_m
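A quick check that the right key is present can give a clearer error than a failed API call. The sketch below assumes that `gemini` models need `GOOGLE_API_KEY` and every other LLM name in this guide needs `OPENAI_API_KEY`; both helper functions are hypothetical and not part of Langvio.

```python
import os
from typing import Mapping, Optional


def required_api_key(llm_name: str) -> str:
    """Map an LLM name from this guide to the API key variable it needs."""
    return "GOOGLE_API_KEY" if llm_name.startswith("gemini") else "OPENAI_API_KEY"


def missing_api_key(llm_name: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the name of the missing variable, or None if it is set."""
    key = required_api_key(llm_name)
    return None if env.get(key) else key


# Example with an explicit mapping instead of the real environment.
example_env = {"OPENAI_API_KEY": "placeholder"}
print(missing_api_key("gemini", example_env))       # GOOGLE_API_KEY
print(missing_api_key("gpt-4o-mini", example_env))  # None
```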
Command Line Usage¶
# Basic usage
langvio --query "Count cars" --media parking.jpg
# With custom config
langvio --query "Find red objects" --media scene.jpg --config my_config.yaml
# Specify models
langvio --query "Analyze video" --media traffic.mp4 --llm gpt-4 --vision yolo_world_v2_l
# Set output directory
langvio --query "Count people" --media crowd.jpg --output ./results
# List available models
langvio --list-models
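When the CLI is driven from a script, building the argument list programmatically avoids shell quoting mistakes. This sketch uses only the flags shown above; `build_langvio_command` is a hypothetical helper.

```python
import shlex


def build_langvio_command(query, media, config=None, llm=None, vision=None, output=None):
    """Build an argument list for the langvio CLI from the flags shown above."""
    command = ["langvio", "--query", query, "--media", media]
    optional = (("--config", config), ("--llm", llm),
                ("--vision", vision), ("--output", output))
    for flag, value in optional:
        if value is not None:
            command += [flag, value]
    return command


command = build_langvio_command("Count cars", "parking.jpg", llm="gpt-4", output="./results")
print(shlex.join(command))
# langvio --query 'Count cars' --media parking.jpg --llm gpt-4 --output ./results
```

The list can then be passed to `subprocess.run(command, check=True)` without going through a shell.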
Advanced Configuration¶
Custom Model Paths¶
vision:
  models:
    custom_yolo:
      model_path: "/path/to/custom/model.pt"
      confidence: 0.3
      device: "cuda"  # or "cpu"
Batch Processing Settings¶
# For processing multiple files
import os

import langvio

pipeline = langvio.create_pipeline()

# Process all images in a directory
image_dir = "images/"
for filename in os.listdir(image_dir):
    if filename.lower().endswith(('.jpg', '.jpeg', '.png')):
        result = pipeline.process(
            "What's in this image?",
            os.path.join(image_dir, filename)
        )
        print(f"{filename}: {result['explanation']}")
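For larger batches, it may be worth collecting failures instead of letting one unreadable file stop the loop. The sketch below is one way to do that: `process_all` takes any callable with the same `(query, media_path)` shape as `pipeline.process`, and `fake_process` is a stand-in used here so the example runs without Langvio installed. Both names are hypothetical.

```python
from pathlib import Path


def process_all(paths, process, query):
    """Run `process(query, path)` for each path, collecting results and errors."""
    results, errors = {}, {}
    for path in paths:
        try:
            results[path.name] = process(query, str(path))
        except Exception as exc:  # keep going; report the failure afterwards
            errors[path.name] = str(exc)
    return results, errors


def fake_process(query, media_path):
    """Stand-in for pipeline.process so the sketch is self-contained."""
    if media_path.endswith("broken.png"):
        raise ValueError("could not read image")
    return {"explanation": f"stub answer for {media_path}"}


results, errors = process_all(
    [Path("a.jpg"), Path("broken.png")], fake_process, "What's in this image?"
)
print(sorted(results))  # ['a.jpg']
print(errors)           # {'broken.png': 'could not read image'}
```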
Integration with Other Systems¶
# Use in existing applications
import langvio


class MyVisionAnalyzer:
    def __init__(self):
        # Build the pipeline once and reuse it for every request
        self.pipeline = langvio.create_pipeline(
            config_path="production_config.yaml"
        )

    def analyze_security_footage(self, video_path):
        return self.pipeline.process(
            "Detect any unusual activities or security concerns",
            video_path
        )
Configuration Tips¶
- Start with defaults and adjust based on your needs
- Use YOLO-World v2 models - they offer flexible object detection without predefined classes
- Model selection: Use yolo_world_v2_s for speed, yolo_world_v2_l for accuracy
- Gemini is free for personal use, GPT-4 for best results
- Lower confidence values detect more objects but may have false positives
- Increase line thickness for better visibility in high-resolution images
- Disable attributes for faster processing if not needed
- Use environment variables for easy configuration without code changes
- Check logs - Langvio provides detailed logging for debugging configuration issues