API Reference

Core Functions

create_pipeline(config_path=None, llm_name=None, vision_name=None)

Creates and configures a Langvio pipeline.

Parameters:

  • config_path (str, optional): Path to a YAML configuration file
  • llm_name (str, optional): LLM processor name ("gpt-3.5", "gpt-4", "gemini")
  • vision_name (str, optional): Vision processor name ("yolo_world_v2_s", "yolo_world_v2_m", "yolo_world_v2_l", "yolo_world_v2x", "yolo11n", "yolo", "yoloe", "yoloe_medium", "yoloe_large")

Returns:

  • Pipeline: Configured pipeline object

Example:

import langvio

# Default pipeline
pipeline = langvio.create_pipeline()

# With specific models
pipeline = langvio.create_pipeline(
    llm_name="gpt-4",
    vision_name="yolo_world_v2_l"
)

# With config file
pipeline = langvio.create_pipeline(config_path="config.yaml")

Pipeline Class

Pipeline.process(query, media_path)

Process a query on an image or video.

Parameters:

  • query (str): Natural language question about the media
  • media_path (str): Path to an image or video file

Returns:

  • dict: Analysis results containing:
      • explanation (str): Natural language answer
      • output_path (str): Path to annotated media file
      • detections (dict): Structured detection data
      • query_params (dict): Parsed query parameters
      • highlighted_objects (list): Objects highlighted in visualization

Example:

result = pipeline.process(
    "Count all people wearing red",
    "crowd_scene.jpg"
)
print(result['explanation'])
print(result['output_path'])

Pipeline.set_llm_processor(processor_name)

Change the language model processor.

Parameters:

  • processor_name (str): Name of the LLM processor

Example:

pipeline.set_llm_processor("gpt-4")

Pipeline.set_vision_processor(processor_name)

Change the vision model processor.

Parameters:

  • processor_name (str): Name of the vision processor

Example:

pipeline.set_vision_processor("yolo_world_v2_l")

Configuration Class

Config(config_path=None)

Manages configuration settings.

Parameters:

  • config_path (str, optional): Path to a YAML configuration file

Methods:

get_llm_config(model_name=None)

Get LLM model configuration.

get_vision_config(model_name=None)

Get vision model configuration.

get_media_config()

Get media processing configuration.

Example:

from langvio.config import Config

config = Config("my_config.yaml")
llm_settings = config.get_llm_config("gpt-4")

Result Structure

Image Analysis Result

{
    'explanation': 'I found 3 people in the image...',
    'output_path': './output/image_processed.jpg',
    'detections': {
        'objects': [
            {
                'id': 'obj_0',
                'label': 'person',
                'confidence': 0.85,
                'bbox': [100, 150, 200, 400],
                'size': 'medium',
                'color': 'red',
                'position': 'center-left'
            }
        ],
        'summary': {
            'total_objects': 3,
            'by_type': {'person': 3}
        }
    },
    'query_params': {
        'task_type': 'counting',
        'target_objects': ['person'],
        'attributes': [{'attribute': 'color', 'value': 'red'}]
    },
    'highlighted_objects': [...]
}
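
The structure above can be queried with ordinary dictionary access. The following is a minimal sketch, assuming each entry in detections['objects'] carries the keys shown (label, confidence, color). The helper name filter_objects is hypothetical and is not part of Langvio.

```python
from typing import Any, Dict, List, Optional


def filter_objects(
    result: Dict[str, Any],
    label: Optional[str] = None,
    color: Optional[str] = None,
    min_confidence: float = 0.0,
) -> List[Dict[str, Any]]:
    """Return detected objects matching the label, color, and confidence floor."""
    # .get() with defaults keeps the helper safe if a key is absent.
    objects = result.get("detections", {}).get("objects", [])
    matches = []
    for obj in objects:
        if label is not None and obj.get("label") != label:
            continue
        if color is not None and obj.get("color") != color:
            continue
        if obj.get("confidence", 0.0) < min_confidence:
            continue
        matches.append(obj)
    return matches


# Sample data copied from the image result shape shown above.
sample_result = {
    "detections": {
        "objects": [
            {
                "id": "obj_0",
                "label": "person",
                "confidence": 0.85,
                "bbox": [100, 150, 200, 400],
                "size": "medium",
                "color": "red",
                "position": "center-left",
            }
        ],
        "summary": {"total_objects": 3, "by_type": {"person": 3}},
    },
}

red_people = filter_objects(
    sample_result, label="person", color="red", min_confidence=0.5
)
print([obj["id"] for obj in red_people])
```

In real use, the dictionary returned by pipeline.process(...) would take the place of sample_result.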

Video Analysis Result

{
    'explanation': 'Throughout the video, I observed...',
    'output_path': './output/video_processed.mp4',
    'detections': {
        'summary': {
            'video_info': {
                'duration_seconds': 30.5,
                'resolution': '1920x1080',
                'activity_level': 'high_activity'
            },
            'counting_analysis': {
                'total_crossings': 15,
                'objects_entered': 8,
                'objects_exited': 7,
                'by_object_type': {
                    'person': {'entered': 5, 'exited': 4},
                    'car': {'entered': 3, 'exited': 3}
                }
            },
            'speed_analysis': {
                'average_speed_kmh': 25.3,
                'by_object_type': {
                    'car': {'average_speed': 35.2}
                }
            }
        },
        'frame_detections': {
            '0': [{'label': 'person', ...}],
            '5': [{'label': 'car', ...}]
        }
    }
}
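
As an illustration of reading the video summary, the sketch below derives a net flow (entered minus exited) per object type and lists the frame indices that carry detections. It assumes the keys shown above, and that frame_detections is keyed by frame index as a string, as the sample suggests. Both helper names are hypothetical.

```python
from typing import Any, Dict, List


def net_flow_by_type(result: Dict[str, Any]) -> Dict[str, int]:
    """Compute entered minus exited for each object type in a video result."""
    summary = result.get("detections", {}).get("summary", {})
    by_type = summary.get("counting_analysis", {}).get("by_object_type", {})
    return {
        label: counts.get("entered", 0) - counts.get("exited", 0)
        for label, counts in by_type.items()
    }


def frames_with_detections(result: Dict[str, Any]) -> List[int]:
    """Return frame indices in numeric order (the keys are strings)."""
    frames = result.get("detections", {}).get("frame_detections", {})
    return sorted(int(index) for index in frames)


# Sample data trimmed from the video result shape shown above.
sample_video_result = {
    "detections": {
        "summary": {
            "counting_analysis": {
                "total_crossings": 15,
                "objects_entered": 8,
                "objects_exited": 7,
                "by_object_type": {
                    "person": {"entered": 5, "exited": 4},
                    "car": {"entered": 3, "exited": 3},
                },
            }
        },
        "frame_detections": {
            "0": [{"label": "person"}],
            "5": [{"label": "car"}],
        },
    }
}

print(net_flow_by_type(sample_video_result))        # {'person': 1, 'car': 0}
print(frames_with_detections(sample_video_result))  # [0, 5]
```

Converting the keys to int before sorting matters once indices reach two digits, because a string sort would place '10' before '5'.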

Command Line Interface

Basic Usage

langvio --query "QUERY" --media "FILE_PATH"

Options

  • --query, -q: Natural language query (required)
  • --media, -m: Path to media file (required)
  • --config, -c: Configuration file path
  • --llm, -l: LLM processor name
  • --vision, -v: Vision processor name
  • --output, -o: Output directory
  • --log-level: Logging level (DEBUG, INFO, WARNING, ERROR)
  • --list-models: List available models

Examples

# Basic analysis
langvio -q "Count cars" -m parking.jpg

# With specific models
langvio -q "Find red objects" -m scene.jpg -l gpt-4 -v yolo_world_v2_l

# With config file
langvio -q "Analyze traffic" -m traffic.mp4 -c config.yaml

# List available models
langvio --list-models
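
When the CLI is driven from a script, for example to run the same query over several files, it can help to build the argument list in one place. The sketch below uses only the flags listed under Options. The function name build_langvio_command is hypothetical, and the example prints the commands rather than running them.

```python
import shlex
from typing import List, Optional


def build_langvio_command(
    query: str,
    media: str,
    config: Optional[str] = None,
    llm: Optional[str] = None,
    vision: Optional[str] = None,
    output: Optional[str] = None,
) -> List[str]:
    """Build an argument list using only the documented CLI flags."""
    cmd = ["langvio", "--query", query, "--media", media]
    optional_flags = (
        ("--config", config),
        ("--llm", llm),
        ("--vision", vision),
        ("--output", output),
    )
    for flag, value in optional_flags:
        if value is not None:
            cmd += [flag, value]
    return cmd


commands = [
    build_langvio_command("Count cars", path, vision="yolo11n")
    for path in ("parking.jpg", "traffic.mp4")
]
for cmd in commands:
    # shlex.join quotes arguments so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

Each list can be passed to subprocess.run(cmd) as-is. Passing a list rather than a single string avoids shell quoting problems with queries that contain spaces.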

Supported File Formats

Images

  • JPEG (.jpg, .jpeg)
  • PNG (.png)
  • BMP (.bmp)
  • TIFF (.tiff)
  • WebP (.webp)

Videos

  • MP4 (.mp4)
  • AVI (.avi)
  • MOV (.mov)
  • MKV (.mkv)
  • WebM (.webm)
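
A caller may want to reject unsupported files before starting a pipeline. The following is a minimal sketch built from the extension lists above. The name media_kind is hypothetical, and treating extensions as case-insensitive is an assumption, since the lists above do not say how Langvio itself matches them.

```python
from pathlib import Path

# Extensions copied from the lists above.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".tiff", ".webp"}
VIDEO_EXTENSIONS = {".mp4", ".avi", ".mov", ".mkv", ".webm"}


def media_kind(path: str) -> str:
    """Classify a path as 'image' or 'video' by its file extension."""
    suffix = Path(path).suffix.lower()  # assumption: case-insensitive match
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    raise ValueError(f"Unsupported media format: {suffix or '(no extension)'}")


print(media_kind("crowd_scene.jpg"))  # image
print(media_kind("traffic.mp4"))      # video
```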

Query Types

Object Detection

"What objects are in this image?"
"Identify all items in the scene"
"What can you see in this picture?"

Counting

"How many people are there?"
"Count all vehicles"
"How many red objects do you see?"

Attribute Filtering

"Find all red objects"
"Show me large items"
"Identify objects on the left side"

Spatial Relationships

"What is on the table?"
"What objects are near the car?"
"Describe object positions"

Video Analysis

"Track movement patterns"
"How many people crossed the street?"
"What activities are happening?"
"Measure vehicle speeds"

Verification

"Is there a dog in this image?"
"Are people wearing masks?"
"Is the area crowded?"

Error Handling

Common Errors

Missing API Key:

# Error: LLM processor initialization failed
# Solution: Set OPENAI_API_KEY or GOOGLE_API_KEY in .env file

File Not Found:

# Error: Media file not found
# Solution: Check file path and permissions

Model Download Failed:

# Error: Failed to download YOLO model
# Solution: Check internet connection, ensure sufficient disk space

Out of Memory:

# Error: CUDA out of memory
# Solution: Use smaller model (vision_name="yolo_world_v2_s") or enable CPU mode

Error Recovery

try:
    result = pipeline.process(query, media_path)
except Exception as e:
    print(f"Analysis failed: {e}")
    # Fallback to basic analysis or retry with different settings

Performance Guidelines

For Speed

  • Use vision_name="yolo11n" or vision_name="yolo_world_v2_s" (fastest)
  • Use llm_name="gpt-4o-mini", llm_name="gpt-3.5", or llm_name="gemini"
  • Lower confidence thresholds
  • Process fewer video frames (higher sample_rate)

For Accuracy

  • Use vision_name="yolo_world_v2_l" (most accurate)
  • Use llm_name="gpt-4" (best reasoning)
  • Higher confidence thresholds
  • Process more video frames (lower sample_rate)
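
The speed and accuracy recommendations can be captured as named presets. This is only a sketch: PIPELINE_PROFILES and pipeline_kwargs are hypothetical names, the model names come from the lists above, and thresholds and sample_rate are left out because their configuration keys are not described here.

```python
from typing import Dict

# Model names taken from the speed and accuracy guidelines above.
PIPELINE_PROFILES: Dict[str, Dict[str, str]] = {
    "speed": {"llm_name": "gpt-3.5", "vision_name": "yolo11n"},
    "accuracy": {"llm_name": "gpt-4", "vision_name": "yolo_world_v2_l"},
}


def pipeline_kwargs(profile: str) -> Dict[str, str]:
    """Return keyword arguments for create_pipeline for a named profile."""
    if profile not in PIPELINE_PROFILES:
        known = ", ".join(sorted(PIPELINE_PROFILES))
        raise ValueError(f"Unknown profile {profile!r}; expected one of: {known}")
    # Return a copy so callers cannot mutate the shared presets.
    return dict(PIPELINE_PROFILES[profile])


print(pipeline_kwargs("speed"))
```

With Langvio installed, the result would be used as langvio.create_pipeline(**pipeline_kwargs("speed")).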

For Cost Optimization

  • Use llm_name="gemini" (free tier available)
  • Use batch processing for multiple files
  • Cache results when possible
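
Result caching is not described as a built-in feature here, so the following is a sketch of one way to do it on the caller's side. It keys each entry on the query text plus a hash of the media file, and assumes the result dictionary is JSON-serializable, as the documented structure appears to be. The names cached_process and _StubPipeline are hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def _cache_key(query: str, media_path: str) -> str:
    """Hash the query and the file contents so edits to either miss the cache."""
    digest = hashlib.sha256()
    digest.update(query.encode("utf-8"))
    digest.update(b"\0")
    with open(media_path, "rb") as handle:
        # Read in 1 MiB chunks so large videos are not loaded into memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cached_process(
    pipeline: Any, query: str, media_path: str, cache_dir: str = ".langvio_cache"
) -> Dict[str, Any]:
    """Return a cached result if present, otherwise run the pipeline and store it."""
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    entry = cache / f"{_cache_key(query, media_path)}.json"
    if entry.exists():
        return json.loads(entry.read_text(encoding="utf-8"))
    result = pipeline.process(query, media_path)
    entry.write_text(json.dumps(result), encoding="utf-8")
    return result


class _StubPipeline:
    """Stand-in so the sketch runs without Langvio or an API key."""

    def __init__(self) -> None:
        self.calls = 0

    def process(self, query: str, media_path: str) -> Dict[str, Any]:
        self.calls += 1
        return {"explanation": "stub answer", "output_path": media_path}


with tempfile.TemporaryDirectory() as tmp:
    media = Path(tmp) / "parking.jpg"
    media.write_bytes(b"not a real image")
    cache_dir = str(Path(tmp) / "cache")
    stub = _StubPipeline()
    first = cached_process(stub, "Count cars", str(media), cache_dir=cache_dir)
    second = cached_process(stub, "Count cars", str(media), cache_dir=cache_dir)

print(stub.calls)  # 1: the second call was served from the cache
```

A cached result still points at the original output_path, so the cache is only useful while the annotated output files are kept.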

Environment Variables

# Required (set the key for the LLM provider you use)
OPENAI_API_KEY=your_openai_key
GOOGLE_API_KEY=your_google_key

# Optional
CUDA_VISIBLE_DEVICES=0
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
LANGVIO_DEFAULT_LLM=gpt-4o-mini
LANGVIO_DEFAULT_VISION=yolo11n
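
A start-up check can catch the missing API key error before a pipeline is created. The sketch below assumes that LLM names starting with "gpt" need OPENAI_API_KEY and names starting with "gemini" need GOOGLE_API_KEY. That mapping is inferred from the model names and is not confirmed by this reference. The function name missing_api_key is hypothetical.

```python
import os
from typing import Mapping, Optional


def required_key_for(llm_name: str) -> str:
    """Map an LLM name to the environment variable it is assumed to need."""
    if llm_name.startswith("gpt"):
        return "OPENAI_API_KEY"  # assumption: gpt-* models use OpenAI
    if llm_name.startswith("gemini"):
        return "GOOGLE_API_KEY"  # assumption: gemini models use Google
    raise ValueError(f"No known API key mapping for LLM {llm_name!r}")


def missing_api_key(
    llm_name: str, env: Optional[Mapping[str, str]] = None
) -> Optional[str]:
    """Return the name of the unset variable, or None if the key is present."""
    env = os.environ if env is None else env
    key = required_key_for(llm_name)
    return None if env.get(key) else key


# Explicit mappings are passed here so the example does not depend on the shell.
print(missing_api_key("gpt-4", env={}))                           # OPENAI_API_KEY
print(missing_api_key("gemini", env={"GOOGLE_API_KEY": "dummy"}))  # None
```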