API Reference¶

Core Functions¶

`create_pipeline(config_path=None, llm_name=None, vision_name=None)`¶

Creates and configures a Langvio pipeline.

Parameters:
- config_path (str, optional): Path to YAML configuration file
- llm_name (str, optional): LLM processor name ("gpt-3.5", "gpt-4", "gemini")
- vision_name (str, optional): Vision processor name ("yolo_world_v2_s", "yolo_world_v2_m", "yolo_world_v2_l", "yolo_world_v2x", "yolo11n", "yolo", "yoloe", "yoloe_medium", "yoloe_large")

Returns:
- Pipeline: Configured pipeline object

Example:

import langvio

# Default pipeline
pipeline = langvio.create_pipeline()

# With specific models
pipeline = langvio.create_pipeline(
    llm_name="gpt-4",
    vision_name="yolo_world_v2_l"
)

# With config file
pipeline = langvio.create_pipeline(config_path="config.yaml")

Pipeline Class¶

`Pipeline.process(query, media_path)`¶

Process a query on an image or video.

Parameters:
- query (str): Natural language question about the media
- media_path (str): Path to image or video file

Returns:
- dict: Analysis results containing:
  - explanation (str): Natural language answer
  - output_path (str): Path to annotated media file
  - detections (dict): Structured detection data
  - query_params (dict): Parsed query parameters
  - highlighted_objects (list): Objects highlighted in visualization

Example:

result = pipeline.process(
    "Count all people wearing red",
    "crowd_scene.jpg"
)
print(result['explanation'])
print(result['output_path'])

`Pipeline.set_llm_processor(processor_name)`¶

Change the language model processor.

Parameters:
- processor_name (str): Name of LLM processor

Example:

pipeline.set_llm_processor("gpt-4")

`Pipeline.set_vision_processor(processor_name)`¶

Change the vision model processor.

Parameters:
- processor_name (str): Name of vision processor

Example:

pipeline.set_vision_processor("yolo_world_v2_l")

Configuration Class¶

`Config(config_path=None)`¶

Manages configuration settings.

Parameters:
- config_path (str, optional): Path to YAML configuration file

Methods:

`get_llm_config(model_name=None)`¶

Get LLM model configuration.

`get_vision_config(model_name=None)`¶

Get vision model configuration.

`get_media_config()`¶

Get media processing configuration.

Example:

from langvio.config import Config

config = Config("my_config.yaml")
llm_settings = config.get_llm_config("gpt-4")

Result Structure¶

Image Analysis Result¶

{
    'explanation': 'I found 3 people in the image...',
    'output_path': './output/image_processed.jpg',
    'detections': {
        'objects': [
            {
                'id': 'obj_0',
                'label': 'person',
                'confidence': 0.85,
                'bbox': [100, 150, 200, 400],
                'size': 'medium',
                'color': 'red',
                'position': 'center-left'
            }
        ],
        'summary': {
            'total_objects': 3,
            'by_type': {'person': 3}
        }
    },
    'query_params': {
        'task_type': 'counting',
        'target_objects': ['person'],
        'attributes': [{'attribute': 'color', 'value': 'red'}]
    },
    'highlighted_objects': [...]
}
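
The image result is a plain dict, so it can be post-processed without any Langvio calls. The sketch below counts detections per label above a confidence cutoff. The helper name `count_confident` and the 0.5 cutoff are illustrative assumptions, and the sample dict only mirrors the fields shown above.

```python
from collections import Counter


def count_confident(result, min_confidence=0.5):
    """Count detected objects per label, keeping only confident detections."""
    objects = result.get("detections", {}).get("objects", [])
    return Counter(
        obj["label"]
        for obj in objects
        if obj.get("confidence", 0.0) >= min_confidence
    )


# Sample shaped like the image result above (values are illustrative).
sample = {
    "detections": {
        "objects": [
            {"id": "obj_0", "label": "person", "confidence": 0.85},
            {"id": "obj_1", "label": "person", "confidence": 0.40},
            {"id": "obj_2", "label": "car", "confidence": 0.72},
        ]
    }
}

counts = count_confident(sample, min_confidence=0.5)
print(dict(counts))
```

Missing keys fall back to empty containers, so the helper returns an empty counter rather than raising when a result has no detections.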

Video Analysis Result¶

{
    'explanation': 'Throughout the video, I observed...',
    'output_path': './output/video_processed.mp4',
    'detections': {
        'summary': {
            'video_info': {
                'duration_seconds': 30.5,
                'resolution': '1920x1080',
                'activity_level': 'high_activity'
            },
            'counting_analysis': {
                'total_crossings': 15,
                'objects_entered': 8,
                'objects_exited': 7,
                'by_object_type': {
                    'person': {'entered': 5, 'exited': 4},
                    'car': {'entered': 3, 'exited': 3}
                }
            },
            'speed_analysis': {
                'average_speed_kmh': 25.3,
                'by_object_type': {
                    'car': {'average_speed': 35.2}
                }
            }
        },
        'frame_detections': {
            '0': [{'label': 'person', ...}],
            '5': [{'label': 'car', ...}]
        }
    }
}
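
As a rough illustration of reading the video summary, the sketch below derives the net flow (entered minus exited) per object type from `counting_analysis`. The helper name `net_flow_by_type` is hypothetical, and the sample reuses the figures from the structure above.

```python
def net_flow_by_type(result):
    """Return entered-minus-exited per object type from a video result."""
    counting = (
        result.get("detections", {})
        .get("summary", {})
        .get("counting_analysis", {})
    )
    return {
        label: stats.get("entered", 0) - stats.get("exited", 0)
        for label, stats in counting.get("by_object_type", {}).items()
    }


# Sample shaped like the video result above.
sample = {
    "detections": {
        "summary": {
            "counting_analysis": {
                "total_crossings": 15,
                "objects_entered": 8,
                "objects_exited": 7,
                "by_object_type": {
                    "person": {"entered": 5, "exited": 4},
                    "car": {"entered": 3, "exited": 3},
                },
            }
        }
    }
}

flow = net_flow_by_type(sample)
print(flow)
```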

Command Line Interface¶

Basic Usage¶

langvio --query "QUERY" --media "FILE_PATH"

Options¶

--query, -q: Natural language query (required)
--media, -m: Path to media file (required)
--config, -c: Configuration file path
--llm, -l: LLM processor name
--vision, -v: Vision processor name
--output, -o: Output directory
--log-level: Logging level (DEBUG, INFO, WARNING, ERROR)
--list-models: List available models

Examples¶

# Basic analysis
langvio -q "Count cars" -m parking.jpg

# With specific models
langvio -q "Find red objects" -m scene.jpg -l gpt-4 -v yolo_world_v2_l

# With config file
langvio -q "Analyze traffic" -m traffic.mp4 -c config.yaml

# List available models
langvio --list-models
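
When the CLI is driven from a script, building the argument list programmatically avoids shell-quoting problems with queries that contain spaces. The sketch below uses only the long options listed above. The helper names are hypothetical, and it assumes the `langvio` executable is on `PATH` and exits non-zero on failure.

```python
import subprocess


def build_langvio_command(query, media, config=None, llm=None, vision=None, output=None):
    """Build the argv list for the langvio CLI from the documented options."""
    cmd = ["langvio", "--query", query, "--media", media]
    optional = (
        ("--config", config),
        ("--llm", llm),
        ("--vision", vision),
        ("--output", output),
    )
    for flag, value in optional:
        if value is not None:
            cmd += [flag, value]
    return cmd


def run_langvio(*args, **kwargs):
    """Run the CLI; check=True raises CalledProcessError on a non-zero exit."""
    return subprocess.run(build_langvio_command(*args, **kwargs), check=True)


cmd = build_langvio_command(
    "Find red objects", "scene.jpg", llm="gpt-4", vision="yolo_world_v2_l"
)
print(cmd)
```

Passing a list to `subprocess.run` keeps the query as a single argument, so no manual quoting is needed.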

Supported File Formats¶

Images¶

JPEG (.jpg, .jpeg)
PNG (.png)
BMP (.bmp)
TIFF (.tiff)
WebP (.webp)

Videos¶

MP4 (.mp4)
AVI (.avi)
MOV (.mov)
MKV (.mkv)
WebM (.webm)
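
A quick pre-check on the file extension can reject unsupported media before a pipeline is created. This is a minimal sketch based only on the extensions listed above. The helper `media_kind` is hypothetical, and the case-insensitive matching is an assumption about how Langvio treats extensions.

```python
from pathlib import Path

# Extensions copied from the lists above.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".tiff", ".webp"}
VIDEO_EXTENSIONS = {".mp4", ".avi", ".mov", ".mkv", ".webm"}


def media_kind(path):
    """Return "image", "video", or None based on the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    return None


print(media_kind("crowd_scene.JPG"))
print(media_kind("traffic.mp4"))
print(media_kind("notes.txt"))
```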

Query Types¶

Object Detection¶

"What objects are in this image?"
"Identify all items in the scene"
"What can you see in this picture?"

Counting¶

"How many people are there?"
"Count all vehicles"
"How many red objects do you see?"

Attribute Search¶

"Find all red objects"
"Show me large items"
"Identify objects on the left side"

Spatial Relationships¶

"What is on the table?"
"What objects are near the car?"
"Describe object positions"

Video Analysis¶

"Track movement patterns"
"How many people crossed the street?"
"What activities are happening?"
"Measure vehicle speeds"

Verification¶

"Is there a dog in this image?"
"Are people wearing masks?"
"Is the area crowded?"

Error Handling¶

Common Errors¶

Missing API Key:

# Error: LLM processor initialization failed
# Solution: Set OPENAI_API_KEY or GOOGLE_API_KEY in .env file

File Not Found:

# Error: Media file not found
# Solution: Check file path and permissions

Model Download Failed:

# Error: Failed to download YOLO model
# Solution: Check internet connection, ensure sufficient disk space

Out of Memory:

# Error: CUDA out of memory
# Solution: Use smaller model (vision_name="yolo_world_v2_s") or enable CPU mode

Error Recovery¶

try:
    result = pipeline.process(query, media_path)
except Exception as e:
    print(f"Analysis failed: {e}")
    # Fallback to basic analysis or retry with different settings
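
One way to fill in the fallback is to retry once with the smaller vision model suggested under "Out of Memory". The sketch below assumes `pipeline` exposes the `process` and `set_vision_processor` methods documented above. The helper name is hypothetical, and because the exception types Langvio raises are not documented here, it catches `Exception` broadly.

```python
import os


def process_with_fallback(pipeline, query, media_path,
                          fallback_vision="yolo_world_v2_s"):
    """Try the current vision model, then retry once with a smaller one."""
    # A missing file will not be fixed by switching models, so fail early.
    if not os.path.exists(media_path):
        raise FileNotFoundError(f"Media file not found: {media_path}")
    try:
        return pipeline.process(query, media_path)
    except Exception as first_error:
        # Assumption: the failure may be model related (for example, CUDA
        # out of memory), so switch to the smaller model and retry once.
        pipeline.set_vision_processor(fallback_vision)
        try:
            return pipeline.process(query, media_path)
        except Exception as second_error:
            # Chain the errors so both failures show in the traceback.
            raise second_error from first_error
```

A call such as `process_with_fallback(pipeline, "Count cars", "parking.jpg")` returns the same result dict as `Pipeline.process` and leaves the pipeline on the fallback model if the first attempt failed.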

Performance Guidelines¶

For Speed¶

Use vision_name="yolo11n" or vision_name="yolo_world_v2_s" (fastest)
Use llm_name="gpt-4o-mini", llm_name="gpt-3.5", or llm_name="gemini"
Lower confidence thresholds
Process fewer video frames (higher sample_rate)

For Accuracy¶

Use vision_name="yolo_world_v2_l" (most accurate)
Use llm_name="gpt-4" (best reasoning)
Higher confidence thresholds
Process more video frames (lower sample_rate)

For Cost Optimization¶

Use llm_name="gemini" (free tier available)
Use batch processing for multiple files
Cache results when possible
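
Caching can be sketched as a thin wrapper that keys results on the file contents and the query, so a repeated question about the same file skips both the vision and LLM calls. `CachedPipeline` and `file_digest` are hypothetical names, not part of Langvio. The cache lives in memory only and does not account for a model change between calls.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


class CachedPipeline:
    """Wrap any object with a process(query, media_path) method."""

    def __init__(self, pipeline):
        self._pipeline = pipeline
        self._cache = {}

    def process(self, query, media_path):
        # Key on content, not path, so a renamed copy still hits the cache.
        key = (file_digest(media_path), query)
        if key not in self._cache:
            self._cache[key] = self._pipeline.process(query, media_path)
        return self._cache[key]
```

Wrapping an existing pipeline with `CachedPipeline(pipeline)` keeps the calling code unchanged, since the wrapper exposes the same `process` signature.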

Environment Variables¶

# Required: set the key for the LLM provider you use (OpenAI or Google)
OPENAI_API_KEY=your_openai_key
GOOGLE_API_KEY=your_google_key

# Optional
CUDA_VISIBLE_DEVICES=0
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
LANGVIO_DEFAULT_LLM=gpt-4o-mini
LANGVIO_DEFAULT_VISION=yolo11n
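
Checking for the API key up front gives a clearer message than a failed LLM initialization. The sketch below is an assumption-laden illustration: the mapping from model name to key ("gemini" needs `GOOGLE_API_KEY`, everything else needs `OPENAI_API_KEY`) is inferred from the model names in this reference, and both helper names are hypothetical.

```python
import os


def required_key(llm_name):
    """Guess which API key an LLM name needs (assumed mapping, see above)."""
    return "GOOGLE_API_KEY" if llm_name.startswith("gemini") else "OPENAI_API_KEY"


def check_api_key(llm_name, env=None):
    """Return the key name if it is set, otherwise raise RuntimeError."""
    env = os.environ if env is None else env
    key = required_key(llm_name)
    if not env.get(key):
        raise RuntimeError(f"{key} is not set; {llm_name!r} cannot be initialized")
    return key


# Passing a dict instead of os.environ keeps the check easy to test.
print(check_api_key("gpt-4", {"OPENAI_API_KEY": "your_openai_key"}))
print(check_api_key("gemini", {"GOOGLE_API_KEY": "your_google_key"}))
```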
