Model Library
Try the latest models and share the outputs with your team - all in one place.
New models added every week.
Copyright © 2026 Oxen Labs, Inc., All Rights Reserved
36 models.
Model Types: Text, Image, Video, Embeddings
Fine-tuning: Fine-tunable (7)
Modalities
Image to Image: 6
Image to Text: 1
Image to Video: 2
Text to Embeddings: 1
Text to Image: 5
Text to Text: 20
Text to Video: 3
Video to Text: 1
Video to Video: 5
Developers
Mistral AI: 10
OpenAI: 6
Meta: 4
Google: 3
Black Forest Labs: 2
Lightricks: 2
Qwen: 2
Alibaba: 2
Kling: 2
Topaz Labs: 2
ByteDance: 1
FLUX.2 Klein 4B
Black Forest Labs
FLUX.2 Klein 4B is a compact 4 billion parameter text-to-image diffusion model optimized for fast inference and high-quality image generation.
text-to-image
image-to-image
LTX-2 Pro - 19B
Lightricks
Fine-tunable
Generates high-res 4K@25FPS videos from image+text, with multi-keyframe control, 3D camera logic, and synced audio.
text-to-video
image-to-video
google/nano-banana-pro
Google
Generates high-resolution images from text, excels at character and background consistency, style adaptation, and natural language-based image editing.
image-to-image
FLUX.2 Klein 9B
Black Forest Labs
FLUX.2 Klein 9B is a compact 9 billion parameter text-to-image diffusion model optimized for fast inference and high-quality image generation.
text-to-image
image-to-image
LTX-2 Retake
Lightricks
Multimodal LLM for targeted video editing: regenerate 2-16s segments (video/audio/both) via prompts, preserving motion, lighting, and continuity.
text-to-video
video-to-video
Qwen Image - 2512
Qwen
Fine-tunable
Large vision model excelling in photorealistic human portraits, finer natural textures, superior text rendering (esp. Chinese), and instruction-based image editing.
text-to-image
Qwen-Image-Edit-2511
Qwen
Fine-tunable
Performs advanced image edits with bilingual text support, precise style preservation, multi-image and semantic editing, plus native ControlNet for enhanced control.
image-to-image
Gemini 3 Flash
Google
Fast multimodal model with configurable reasoning, strong agentic workflows, long context, and tool use for interactive chat, coding, and complex tasks.
text-to-text
image-to-text
video-to-text
GPT Image 1.5
OpenAI
Diffusion model for high‑fidelity image generation and editing, with strong prompt adherence, preserved composition and lighting, and adjustable quality controls.
text-to-image
WAN 2.6 - Video to Video
Alibaba
Generates videos from reference videos, maintaining character consistency, with multi-shot narratives, up to 15s duration, and native audio sync.
video-to-video
Kling O1 - Reference to Video
Kling
Multimodal video model for reference-guided synthesis, preserving motion, camera, and styles from videos/images. Supports video-to-video and multi-image conditioning.
image-to-video
Kling O1 Edit - Video to Video
Kling
Text-guided video-to-video editing that preserves motion and continuity while enabling character swaps, style changes, motion transfer, and scene transformations.
video-to-video
Segment Anything 3 - Video
Meta
video-to-video
Topaz Video Upscaler
Topaz Labs
Professional-grade video upscaling powered by AI, from Topaz Labs.
video-to-video
topazlabs/image-upscale
Topaz Labs
Professional-grade image upscaling powered by AI, from Topaz Labs.
image-to-image
Seedream 4.0
ByteDance
Delivers ultra-fast, high-resolution image generation, precise natural-language editing, and consistent multi-image output—ideal for creative, batch, or professional workflows.
text-to-image
image-to-image
openai/gpt-oss-20b
OpenAI
Fine-tunable
text-to-text
Wan2.2 A14B - Text to Video
Alibaba
Fine-tunable
Delivers high-fidelity text-to-video synthesis at 480p/720p using dual expert models for scene layout and fine motion detail, ideal for creative production.
text-to-video
Llama 3.2 3B Instruct
Meta
Fine-tunable
text-to-text
Llama 3.1 8B Instruct
Meta
Fine-tunable
text-to-text
mistral-nemo
Mistral AI
A 12B general-purpose model built in collaboration with NVIDIA.
text-to-text
open-mixtral-8x7b
Mistral AI
An 8x7B sparse Mixture-of-Experts (SMoE). Uses 12.9B active parameters out of 45B total.
text-to-text
o1-mini
OpenAI
o1-mini is a fast, cost-efficient reasoning model tailored to coding, math, and science use cases. The model has 128K context and an October 2023 knowledge cutoff.
text-to-text
open-mistral-7b
Mistral AI
A 7B transformer model, fast-deployed and easily customisable.
text-to-text
meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo
Meta
text-to-text
ministral-8b-latest
Mistral AI
Powerful model for on-device use cases.
text-to-text
codestral-2405
Mistral AI
State-of-the-art Mistral model trained specifically for code tasks.
text-to-text
pixtral-12b
Mistral AI
Vision-capable small model.
text-to-text
open-mixtral-8x22b
Mistral AI
Mixtral 8x22B is currently the most performant open model. An 8x22B sparse Mixture-of-Experts (SMoE). Uses only 39B active parameters out of 141B.
text-to-text
mistral-small-2409
Mistral AI
Cost-efficient, fast, and reliable option for use cases such as translation, summarization, and sentiment analysis.
text-to-text
Gemini 1.5 Flash
Google
Fast, lightweight model.
text-to-text
mistral-large-2407
Mistral AI
Top-tier reasoning for high-complexity tasks, for your most sophisticated needs.
text-to-text
gpt-4o-mini
OpenAI
GPT-4o mini is our most cost-efficient small model that’s smarter and cheaper than GPT-3.5 Turbo, and has vision capabilities. The model has 128K context and an October 2023 knowledge cutoff.
text-to-text
gpt-4o
OpenAI
GPT-4o is our most advanced multimodal model that’s faster and cheaper than GPT-4 Turbo with stronger vision capabilities. The model has 128K context and an October 2023 knowledge cutoff.
text-to-text
o1-preview
OpenAI
o1-preview is our new reasoning model for complex tasks. The model has 128K context and an October 2023 knowledge cutoff.
text-to-text
ministral-3b-latest
Mistral AI
Most efficient edge model.
text-to-text