Access 100+ AI models for image generation, video creation, and more.
Agentic coding, 1M-token context
4K image editing from text
Reference-guided video with audio
Text-to-video up to 1080p
Coding and agents, 1M-token context
Image-to-video up to 1080p
Reference-to-video, up to 9 images
Image-to-video with prompt control, up to 1080p
Aspect-ratio reframe with outpainting, up to 1080p
Text-to-video with prompt control, up to 1080p
Prompt-guided video editing, up to 1080p
High-quality audio-driven video
Lift SDR video into HDR
High-quality image-to-video with audio
Structure-guided video-to-video generation
High-quality text-to-video with audio
Image-to-video with audio (Grok 1.5)
HDR video conversion
Prompt-based video editing up to 1080p
Image-to-video, 1080p, frame control
Multi-reference video, up to 1080p
Text-to-video, 1080p, multi-shot
Efficient MoE model, 1M context
Large MoE for reasoning and coding
Reasoning and coding model, 1M context
Deep reasoning, 1M-token context
4K images with near-perfect text rendering
Prompt-based video editing
Fast, affordable image-to-video
Fast reference-guided video generation
Fast, affordable text-to-video
Animate images into 4K video
Cinematic text-to-video, up to 4K
Open multimodal model, 256K context
Lightweight on-device multimodal model
Compact on-device multimodal model
Closed model for reasoning, coding
Detail-preserving video upscaling
Coding and agentic engineering, 200K context
Fast model for coding, computer use
Consistent depth maps from video
Controllable text/image-to-video generation
Agentic coding model, 1M context
Fast image-to-video with native audio
Fast text-to-video with native audio
Image-to-video with native audio
Audio-driven video with lip-sync
Extend existing video clips
Regenerate video segments via prompts
Text-to-video with native audio
GPT-5.3 instant chat model
Agentic reasoning MoE, 1M context
Pro quality at Flash speed
Flash-speed natural-language image editing
Multimodal model with vision, 256K context
Intelligent visual reasoning image model
Multimodal reasoning over text, audio, video
Balanced coding and agents, 1M context
Text-and-image to image generation
Coding and agents, 1M context
4K cinematic image-to-video
4K reference-driven video
4K cinematic text-to-video
Pro cinematic image-to-video
Pro reference-driven cinematic video
Pro video-to-video editing
Transfer motion from video to image
Versatile image styles by xAI
Edit images with Grok Imagine
Edit videos with Grok Imagine
Image-to-video with audio by xAI
Typography-focused image and design generation
High-fidelity text-to-image with style references
Compact, fast text-to-image model
Targeted video segment editing
Fast HD image-to-video generation
Open MoE multimodal agentic model
Coding, reasoning, and agentic tasks
Photorealistic portraits and natural scenes
Improved natural-language image editing
Low-latency multimodal model, 1M context
Cinematic image-to-video with audio
Cinematic text-to-video with native audio
True-color precision rendering
Image-to-video with audio and lip-sync
Reference video with character consistency
Reasoning model, 400K context
GPT-5.2 chat model for ChatGPT
Cinematic image-to-video generation
Reference-driven cinematic video
Advanced video editing
Text-to-image and editing, up to 4K
Fast photorealistic bilingual image generation
Open text-to-image with multi-reference
Tunable open text-to-image model
Professional FLUX.2 image generation
Software engineering and agentic workflows
Studio-quality 4K image generation
Zero-shot image segmentation
Zero-shot video object segmentation
Adaptive reasoning and instruction following
Professional AI image upscaling
Professional AI video upscaling
Text/image-to-video, audio, narrative control
Fast image-to-video with audio, 4K
Cost-efficient image-to-video by Google
Realistic video generation with audio
Fast cinematic image-to-video
Multi-image natural-language editing
On-device photorealistic image generation
Balanced coding, agents, computer use
Photorealistic 4K image generation
Natural-language image editing
Vision-language model, 256K context
Vision-language model, long context
Vision-language model for text and images
Open text-to-image with editing tools
Multi-model routing, 400K context
Low-cost GPT-5 for real-time apps
Ultra-low-latency GPT-5 model
Refined reasoning and coding model
Open MoE reasoning model
Open MoE, runs on consumer hardware
Text-to-image with strong text rendering
High-fidelity open text-to-video, 720p
Efficient 720p video on consumer GPUs
Fast multimodal reasoning, 1M context
In-context image editing model
Reasoning-heavy coding and analysis
Balanced everyday reasoning and coding
Text/image-to-video with native audio
Multimodal reasoning and coding, 1M context
Open model, 256K context
Model for edge and offline use
Reasoning for coding, math, science
Fast, low-cost multimodal reasoning
Coding and chat, 1M-token context
Lower-cost GPT-4.1, 1M context
Low-latency GPT-4.1 for classification
Multimodal model, 128K context
Open multimodal model, 128K context
Multi-step web research and synthesis
Multi-step reasoning over web search
Multilingual on-device chat model
Text-to-video with bilingual text
Fast web-grounded search, built on Llama
Advanced web-grounded search, 200K context
Efficient model for edge deployment
Vision-language model, 128K context
Fast, low-cost multimodal model
Fine-detail video upscaling
Text embeddings for search, RAG
Multimodal model for text, audio, vision
MoE model for code and multilingual tasks
Long-context reasoning model, 128K
High-quality text embeddings
Efficient text embeddings
Versatile video upscaling
Efficient open MoE model
Face-focused video upscaling
General-purpose open model
Model for on-device and edge use
Reasoning for science, math, code
Fast reasoning for STEM tasks
Efficient multilingual model
Lightweight text-to-video on consumer GPUs