Configuration Guide

Langvio can be configured through YAML files or programmatically to customize models, performance, and output settings.

Basic Configuration

Using Different Models

import langvio

# Use specific models
pipeline = langvio.create_pipeline(
    llm_name="gpt-4",           # Alias for OpenAI GPT-4.1 Mini
    vision_name="yoloe_large"   # YOLOe large model
)

# Use configuration file
pipeline = langvio.create_pipeline(config_path="my_config.yaml")

Available Models

Vision Models:

- yolo_world_v2_s - YOLO-World v2 small (fastest, good accuracy)
- yolo_world_v2_m - YOLO-World v2 medium (balanced speed/accuracy, recommended default)
- yolo_world_v2_l - YOLO-World v2 large (better accuracy)
- yolo_world_v2x - YOLO-World v2 extra-large (best accuracy, slower)
- yolo11n - YOLO11 nano (alias for YOLO-World v2 small, fastest)
- yolo - YOLO11 alias (alias for YOLO-World v2 small)
- yoloe - YOLOe alias (alias for YOLO-World v2 small)
- yoloe_medium - YOLOe medium (alias for YOLO-World v2 medium)
- yoloe_large - YOLOe large (alias for YOLO-World v2 large)

Note: All vision models use YOLO-World v2 under the hood. The aliases (yolo11n, yoloe, etc.) map to YOLO-World v2 models for flexible object detection without predefined classes.
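The alias relationships listed above can be written down as a small lookup table. The sketch below is only an illustration transcribed from that list; `VISION_ALIASES` and `resolve_vision_name` are hypothetical names and are not part of Langvio's API.

```python
# Illustrative only: alias table transcribed from the vision model list.
VISION_ALIASES = {
    "yolo11n": "yolo_world_v2_s",
    "yolo": "yolo_world_v2_s",
    "yoloe": "yolo_world_v2_s",
    "yoloe_medium": "yolo_world_v2_m",
    "yoloe_large": "yolo_world_v2_l",
}


def resolve_vision_name(name: str) -> str:
    """Return the YOLO-World v2 model an alias points to (or the name itself)."""
    return VISION_ALIASES.get(name, name)


print(resolve_vision_name("yoloe_large"))      # yolo_world_v2_l
print(resolve_vision_name("yolo_world_v2_m"))  # yolo_world_v2_m
```

Names that are not aliases pass through unchanged, so the helper can be applied to any vision model name.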

Language Models:

- gpt-4o-mini - OpenAI GPT-4o Mini (default, fast, cost-effective)
- gpt-3 or gpt-3.5 - OpenAI GPT-3.5 Turbo (fast, cost-effective)
- gpt-4 - OpenAI GPT-4.1 Mini (best reasoning)
- gpt-4.1-mini - OpenAI GPT-4.1 Mini (latest model)
- gpt-4.1-nano - OpenAI GPT-4.1 Nano (smallest, fastest)
- gemini - Google Gemini 2.0 Flash (free tier available)

Configuration File

Create a config.yaml file:

# Language Model Settings
llm:
  default: "gpt-4o-mini"  # Default model
  models:
    gemini:
      model_name: "gemini-2.0-flash"
      model_kwargs:
        temperature: 0.2
        max_tokens: 1024

    gpt-4o-mini:
      model_name: "gpt-4o-mini"
      model_kwargs:
        temperature: 0.1
        max_tokens: 2048

    gpt-4:
      model_name: "gpt-4.1-mini"
      model_kwargs:
        temperature: 0.1
        max_tokens: 2048

# Vision Model Settings
vision:
  default: "yolo11n"  # Fastest option; use "yolo_world_v2_m" for balanced speed/accuracy
  models:
    yolo_world_v2_m:
      type: "yolo_world"
      model_name: "yolov8m-worldv2"
      confidence: 0.45
      track_thresh: 0.3
      track_buffer: 70
      match_thresh: 0.6

    yolo_world_v2_s:
      type: "yolo_world"
      model_name: "yolov8s-worldv2"
      confidence: 0.5
      track_thresh: 0.5
      track_buffer: 30
      match_thresh: 0.8

    yolo11n:  # Alias for YOLO-World v2 small
      type: "yolo"
      model_path: "yolo11n.pt"
      confidence: 0.8
      model_type: "yolo"

# Output Settings
media:
  output_dir: "./output"
  temp_dir: "./temp"
  visualization:
    box_color: [0, 255, 0]      # Green boxes
    text_color: [255, 255, 255]  # White text
    line_thickness: 2
    show_attributes: true
    show_confidence: true

# Logging
logging:
  level: "INFO"
  file: "langvio.log"
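
A typo in a `default` value is easy to miss. The following is a minimal sketch of a consistency check, under the assumption that each `default` is expected to name an entry under `models` in the same file (Langvio may also merge in built-in definitions, in which case a warning here is harmless). It works on an already-parsed dictionary, such as the result of PyYAML's `yaml.safe_load`; `find_config_problems` is a hypothetical helper, not a Langvio function.

```python
def find_config_problems(config: dict) -> list:
    """Report sections whose `default` is not defined under `models`."""
    problems = []
    for section in ("llm", "vision"):
        settings = config.get(section) or {}
        default = settings.get("default")
        models = settings.get("models") or {}
        if default is not None and default not in models:
            problems.append(
                f"{section}.default '{default}' has no entry in {section}.models"
            )
    return problems


# A trimmed-down, hypothetical configuration, already parsed into a dict.
config = {
    "llm": {"default": "gpt-4o-mini", "models": {"gpt-4o-mini": {}, "gemini": {}}},
    "vision": {"default": "yolo11n", "models": {"yolo_world_v2_m": {}}},
}
for problem in find_config_problems(config):
    print(problem)
```

In this trimmed example only the vision section is reported, because `yolo11n` is not among the models it defines.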

Performance Tuning

For Speed (Real-time Applications)

pipeline = langvio.create_pipeline(
    llm_name="gpt-4o-mini",       # Faster LLM
    vision_name="yolo11n"          # Fastest vision model (or yolo_world_v2_s)
)

For Accuracy (Research/Analysis)

pipeline = langvio.create_pipeline(
    llm_name="gpt-4",
    vision_name="yolo_world_v2_l"  # Large model for best accuracy
)

For Cost Optimization

pipeline = langvio.create_pipeline(
    llm_name="gemini",            # Google's free tier
    vision_name="yolo11n"          # Fastest model (or yolo_world_v2_m for balance)
)
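
The three profiles above can be collected into one table so that an application switches between them with a single setting. This is a sketch: `PRESETS` and `preset_kwargs` are hypothetical names, and the model choices simply repeat the examples above.

```python
PRESETS = {
    "speed": {"llm_name": "gpt-4o-mini", "vision_name": "yolo11n"},
    "accuracy": {"llm_name": "gpt-4", "vision_name": "yolo_world_v2_l"},
    "cost": {"llm_name": "gemini", "vision_name": "yolo11n"},
}


def preset_kwargs(profile: str) -> dict:
    """Return keyword arguments for langvio.create_pipeline for a profile."""
    try:
        return dict(PRESETS[profile])  # copy so callers cannot mutate PRESETS
    except KeyError:
        raise ValueError(
            f"Unknown profile {profile!r}; choose from {sorted(PRESETS)}"
        ) from None


print(preset_kwargs("accuracy"))
# Usage with Langvio installed:
#   pipeline = langvio.create_pipeline(**preset_kwargs("accuracy"))
```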

Video Processing Settings

Adjust Frame Sampling

# Process every frame (accurate but slow)
result = pipeline.process(query, video_path)

Or set the sampling rate in config.yaml:

vision:
  models:
    yoloe:
      confidence: 0.3  # Lower = more detections
      sample_rate: 5   # Process every 5th frame
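
To get a feel for how much work `sample_rate` saves, the arithmetic can be sketched as follows. This assumes that `sample_rate: 5` means frames 0, 5, 10, and so on are analyzed, which is a common convention but is not confirmed here for Langvio; `sampled_frame_count` is a hypothetical helper.

```python
import math


def sampled_frame_count(total_frames: int, sample_rate: int) -> int:
    """Number of frames analyzed when every `sample_rate`-th frame is kept."""
    if sample_rate < 1:
        raise ValueError("sample_rate must be at least 1")
    return math.ceil(total_frames / sample_rate)


# A 60-second clip at 30 fps has 1800 frames.
total = 60 * 30
print(sampled_frame_count(total, 1))  # 1800
print(sampled_frame_count(total, 5))  # 360
```

Under that assumption, a sample rate of 5 cuts the number of analyzed frames to one fifth, at the cost of possibly missing objects that appear only briefly.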

Memory Optimization

vision:
  models:
    yolo_efficient:
      model_path: "yolo11n.pt"
      confidence: 0.4
      max_detections: 50  # Limit detections per frame

Output Customization

Change Visualization Colors

media:
  visualization:
    box_color: [255, 0, 0]      # Red boxes
    highlight_color: [0, 0, 255] # Blue for highlighted objects
    text_color: [255, 255, 255]  # White text
    line_thickness: 3
    show_attributes: true
    show_confidence: false       # Hide confidence scores

Custom Output Directory

pipeline = langvio.create_pipeline()
pipeline.config.config["media"]["output_dir"] = "/custom/output/path"

Environment Variables

Set these in your .env file or environment:

# Required: API key for each LLM provider you use
OPENAI_API_KEY=your_openai_key
GOOGLE_API_KEY=your_google_key

# Optional: Performance Settings
CUDA_VISIBLE_DEVICES=0          # Use specific GPU
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True

# Optional: Model Settings (override defaults)
LANGVIO_DEFAULT_LLM=gemini
LANGVIO_DEFAULT_VISION=yolo_world_v2_m
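
It can help to check for the needed API key before creating a pipeline. The sketch below assumes that OpenAI models (names starting with `gpt-`) need `OPENAI_API_KEY` and that `gemini` needs `GOOGLE_API_KEY`; `required_key` and `missing_key` are hypothetical helpers, not Langvio functions.

```python
import os
from typing import Mapping, Optional


def required_key(llm_name: str) -> str:
    """Name of the environment variable assumed to be needed for a model."""
    if llm_name.startswith("gpt-"):
        return "OPENAI_API_KEY"
    if llm_name.startswith("gemini"):
        return "GOOGLE_API_KEY"
    raise ValueError(f"No known API key for model {llm_name!r}")


def missing_key(llm_name: str, env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the variable name if it is unset or empty, else None."""
    env = os.environ if env is None else env
    key = required_key(llm_name)
    return None if env.get(key) else key


llm_name = os.environ.get("LANGVIO_DEFAULT_LLM", "gpt-4o-mini")
problem = missing_key(llm_name)
if problem:
    print(f"Set {problem} before using {llm_name}")
```

Passing a plain dictionary as `env` makes the check easy to exercise without touching the real environment.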

Command Line Usage

# Basic usage
langvio --query "Count cars" --media parking.jpg

# With custom config
langvio --query "Find red objects" --media scene.jpg --config my_config.yaml

# Specify models
langvio --query "Analyze video" --media traffic.mp4 --llm gpt-4 --vision yolo_world_v2_l

# Set output directory
langvio --query "Count people" --media crowd.jpg --output ./results

# List available models
langvio --list-models
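
When the CLI is driven from a script, building the argument list programmatically avoids shell-quoting problems with queries that contain spaces. The sketch below uses only the flags shown above; `build_command` is a hypothetical helper, and the `subprocess.run` call is left commented out because it requires Langvio to be installed.

```python
import subprocess  # only needed for the commented-out call at the bottom


def build_command(query, media, config=None, llm=None, vision=None, output=None):
    """Build an argv list for the langvio CLI from the flags shown above."""
    command = ["langvio", "--query", query, "--media", media]
    optional = {"--config": config, "--llm": llm, "--vision": vision, "--output": output}
    for flag, value in optional.items():
        if value is not None:
            command += [flag, str(value)]
    return command


command = build_command("Count people", "crowd.jpg", output="./results")
print(command)
# subprocess.run(command, check=True)
```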

Advanced Configuration

Custom Model Paths

vision:
  models:
    custom_yolo:
      model_path: "/path/to/custom/model.pt"
      confidence: 0.3
      device: "cuda"  # or "cpu"

Batch Processing Settings

# For processing multiple files
import os

import langvio

pipeline = langvio.create_pipeline()

# Process all images in a directory
image_dir = "images/"
for filename in os.listdir(image_dir):
    # Match common image extensions regardless of case
    if filename.lower().endswith(('.jpg', '.jpeg', '.png')):
        result = pipeline.process(
            "What's in this image?",
            os.path.join(image_dir, filename)
        )
        print(f"{filename}: {result['explanation']}")
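
For larger batches it can be convenient to gather the file list first, in a stable order, and to keep one failing file from aborting the run. A minimal sketch using only the standard library follows; `find_images` is a hypothetical helper, and the commented loop assumes the `pipeline` object and the `result['explanation']` key from the example above.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def find_images(directory):
    """Return image files directly inside `directory`, sorted by path."""
    return sorted(
        path
        for path in Path(directory).iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )


# With a pipeline created as above:
# for path in find_images("images/"):
#     try:
#         result = pipeline.process("What's in this image?", str(path))
#         print(f"{path.name}: {result['explanation']}")
#     except Exception as error:  # keep going if one file fails
#         print(f"{path.name}: failed ({error})")
```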

Integration with Other Systems

# Use in existing applications
import langvio


class MyVisionAnalyzer:
    def __init__(self):
        self.pipeline = langvio.create_pipeline(
            config_path="production_config.yaml"
        )

    def analyze_security_footage(self, video_path):
        return self.pipeline.process(
            "Detect any unusual activities or security concerns",
            video_path
        )

Configuration Tips

  1. Start with defaults and adjust based on your needs
  2. Use YOLO-World v2 models - they offer flexible object detection without predefined classes
  3. Model selection: Use yolo_world_v2_s for speed, yolo_world_v2_l for accuracy
  4. LLM selection: Use gemini to stay on Google's free tier, gpt-4 for the best reasoning
  5. Lower confidence values detect more objects but may have false positives
  6. Increase line thickness for better visibility in high-resolution images
  7. Disable attributes for faster processing if not needed
  8. Use environment variables for easy configuration without code changes
  9. Check logs - Langvio provides detailed logging for debugging configuration issues