GPT-4.1 is a flagship large language model optimized for advanced instruction following, real-world software engineering, and long-context reasoning. It supports a 1 million token context window and outperforms GPT-4o and GPT-4.5 across coding (54.6% SWE-bench Verified), instruction compliance (87.4% IFEval), and multimodal understanding benchmarks. It is tuned for precise code diffs, agent reliability, and high recall in large document contexts, making it ideal for agents, IDE tooling, and enterprise knowledge retrieval.
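As a rough illustration of the long-context use case described above, here is a minimal sketch using the OpenAI Python SDK. The file name, prompt, and question are placeholders; "gpt-4.1" is simply the model identifier implied by this entry, and the call otherwise follows the standard Chat Completions pattern.

```python
# Minimal long-document Q&A sketch with the OpenAI Python SDK.
# Assumes OPENAI_API_KEY is set; "design_doc.md" is a hypothetical file.
from openai import OpenAI

client = OpenAI()

with open("design_doc.md") as f:
    document = f.read()  # large documents fit within the 1M-token context window

response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {"role": "system", "content": "Answer questions using only the attached document."},
        {"role": "user", "content": f"{document}\n\nQuestion: summarize the open risks."},
    ],
)
print(response.choices[0].message.content)
```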
GPT-4.1 mini is a mid-sized model delivering performance competitive with GPT-4o at substantially lower latency and cost. It retains a 1 million token context window and scores 45.1% on hard instruction evals, 35.8% on MultiChallenge, and 84.1% on IFEval. Mini also shows strong coding ability (e.g., 31.6% on Aider’s polyglot diff benchmark) and vision understanding, making it suitable for interactive applications with tight performance constraints.
OpenAI's high-intelligence flagship model for complex, multi-step tasks. GPT-4o is cheaper and faster than GPT-4 Turbo.
OpenAI's previous high-intelligence model, optimized for chat but also well suited to traditional completions tasks.
OpenAI o4-mini is a compact reasoning model in the o-series, optimized for fast, cost-efficient performance while retaining strong multimodal and agentic capabilities. It supports tool use and demonstrates competitive reasoning and coding performance across benchmarks like AIME (99.5% with Python) and SWE-bench, outperforming its predecessor o3-mini and even approaching o3 in some domains.
o3-mini is an OpenAI model optimized for STEM reasoning that excels in science, math, and coding tasks. It matches o1's performance in these domains while delivering faster responses at lower cost. The model supports tool calling, structured outputs, and system messages, making it a good option for a wide range of applications.
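As a hedged sketch of the tool-calling support mentioned above, the snippet below uses the OpenAI Python SDK; the get_weather tool, its schema, and the prompt are invented for illustration.

```python
# Tool-calling sketch with the OpenAI Python SDK; get_weather is a hypothetical tool.
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical function the application would implement
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="o3-mini",
    messages=[
        {"role": "system", "content": "Use tools when they help answer the question."},
        {"role": "user", "content": "What's the weather in Lisbon right now?"},
    ],
    tools=tools,
)

# If the model decides to call the tool, the request to execute it appears here.
print(response.choices[0].message.tool_calls)
```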
The o1 model family from OpenAI is designed to spend more time thinking before responding. The o1 series is trained with large-scale reinforcement learning to reason using chain of thought. These models are optimized for math, science, programming, and other STEM-related tasks, and they consistently exhibit PhD-level accuracy on benchmarks in physics, chemistry, and biology.
Claude 3.7 Sonnet is an advanced large language model with improved reasoning, coding, and problem-solving capabilities. It introduces a hybrid reasoning approach, allowing users to choose between rapid responses and extended, step-by-step processing for complex tasks. The model demonstrates notable improvements in coding, particularly in front-end development and full-stack updates, and excels in agentic workflows, where it can autonomously navigate multi-step processes.
Claude 3.7 Sonnet with extended thinking enabled, trading response latency for step-by-step reasoning on complex tasks.
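A minimal sketch of enabling that extended thinking via the Anthropic Python SDK follows; the model identifier and token budgets are assumptions to verify against Anthropic's documentation.

```python
# Extended-thinking sketch with the Anthropic Python SDK.
# The model ID and budgets below are assumptions, not authoritative values.
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-3-7-sonnet-20250219",                   # assumed model identifier
    max_tokens=2048,                                      # must exceed the thinking budget
    thinking={"type": "enabled", "budget_tokens": 1024},  # opt in to step-by-step reasoning
    messages=[{"role": "user", "content": "How many primes are there below 100?"}],
)

# The reply interleaves "thinking" blocks with the final "text" blocks.
for block in response.content:
    print(block.type)
```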
Claude 3.5 Sonnet is a high-speed, cost-effective model offering industry-leading performance in reasoning, knowledge, and coding. It operates twice as fast as its predecessor. Key features include a stronger grasp of nuance and humor, advanced coding capabilities, and strong visual reasoning.
Claude 3.5 Haiku offers enhanced speed, coding accuracy, and tool use. Engineered for real-time applications, it delivers the quick response times essential for dynamic tasks such as chat interactions and immediate coding suggestions.
Anthropic's Claude 3 Opus can handle complex analysis, longer tasks with multiple steps, and higher-order math and coding tasks. It provides top-level performance, intelligence, fluency, and understanding.
Gemini 2.5 Pro is Google's state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs “thinking” capabilities, enabling it to reason through responses with enhanced accuracy and nuanced context handling. Gemini 2.5 Pro achieves top-tier performance on multiple benchmarks, including first-place positioning on the LMArena leaderboard, reflecting superior human-preference alignment and complex problem-solving abilities.
Gemini 2.5 Flash is Google's first fully hybrid reasoning model that allows developers to toggle thinking capabilities on or off according to their needs, offering enhanced reasoning abilities while maintaining the speed and cost-effectiveness of its predecessor.
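As a sketch of that on/off toggle, the snippet below uses the google-genai Python SDK; the model name and config fields are assumptions based on this description, with a thinking budget of 0 standing in for "thinking off".

```python
# Thinking-toggle sketch with the google-genai SDK (assumes an API key in the environment).
from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash",  # assumed model identifier
    contents="Classify this ticket as bug, feature request, or question: 'App crashes on login.'",
    config=types.GenerateContentConfig(
        # A budget of 0 skips thinking for latency-sensitive calls; raise it to enable reasoning.
        thinking_config=types.ThinkingConfig(thinking_budget=0)
    ),
)
print(response.text)
```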
Gemini 2.0 Flash builds on the success of 1.5 Flash, offering improved performance and twice the speed of 1.5 Pro on key benchmarks. It supports multimodal inputs like images, video, and audio, as well as outputs such as generated images, text, and multilingual text-to-speech. Additionally, it can natively integrate with tools like Google Search, execute code, and use third-party functions.
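A short sketch of the native Google Search integration mentioned above, again with the google-genai SDK; the exact tool wiring is an assumption drawn from that description.

```python
# Search-grounding sketch with the google-genai SDK.
from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="What changed in the latest stable Chrome release?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())]  # ground the answer in live search results
    ),
)
print(response.text)
```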
Gemini 2.0 Flash Thinking is an experimental model that exposes its reasoning through transparent thought processes. It shows its planning steps as it works, handles complex problems quickly, and offers expanded functionality, letting users observe the model's reasoning in real time while receiving fast, sophisticated solutions.
Gemini 1.5 Pro is a multimodal AI model developed by Google DeepMind. It excels at processing and understanding text, images, audio, and video, and features a long context window of up to 1 million tokens. The model powers generative AI services across Google's platforms and is available to third-party developers.
DeepSeek R1 is a model developed by DeepSeek and released as a competitor to OpenAI's o1. It emphasizes strong reasoning capabilities in areas such as complex math, coding, and logic, and it combines transparency with competitive performance, making it a significant step forward in open-source AI development.
DeepSeek-V3 is the latest open-source model from DeepSeek. It has outperformed other open-source models such as Qwen2.5-72B and Llama-3.1-405B in various evaluations, and its performance is on par with leading closed-source models like GPT-4o and Claude 3.5 Sonnet.
Grok 3 is the most advanced model from xAI. Grok 3 displays significant improvements in reasoning, mathematics, coding, world knowledge, and instruction-following tasks.
Grok-2 is xAI's frontier language model with state-of-the-art reasoning capabilities.
The Llama 4 collection consists of natively multimodal models that enable text and multimodal experiences. These models use a mixture-of-experts architecture to deliver industry-leading performance in text and image understanding.
Meta Llama 3.3 is a pretrained and instruction-tuned multilingual large language model (LLM) available in a 70B size (text in/text out). The instruction-tuned, text-only model is optimized for multilingual dialogue use cases and outperforms many available open-source and closed chat models on common industry benchmarks.
Mistral Large is a cutting-edge language model developed by Mistral AI, renowned for its advanced reasoning capabilities. It excels in multilingual tasks, code generation, and complex problem-solving, making it ideal for diverse text-based applications.
Pixtral Large is a 124B open-weights multimodal model built on top of Mistral Large 2. The model is able to understand documents, charts and natural images.
Gemma 2 is a state-of-the-art, lightweight open model developed by Google, available in 9 billion and 27 billion parameter sizes. It offers enhanced performance and efficiency, building on the technology used in the Gemini models. Designed for a wide range of applications, Gemma 2 excels in text-to-text tasks, making it a versatile tool for developers.
The Perplexity Sonar Online model is a state-of-the-art large language model developed by Perplexity AI. It offers real-time internet access, ensuring up-to-date information retrieval. Known for its cost-efficiency, speed, and enhanced performance, it surpasses previous models in the Sonar family, making it ideal for dynamic and accurate data processing.
Command A is an open-weights 111B parameter model with a 256k context window focused on delivering great performance across agentic, multilingual, and coding use cases.
Qwen2.5-Max is a large MoE LLM pretrained on massive data and post-trained with curated SFT and RLHF recipes. It achieves competitive performance against top-tier models and outcompetes DeepSeek V3 on benchmarks such as Arena-Hard, LiveBench, LiveCodeBench, and GPQA-Diamond.
Qwen2.5 is a model pretrained on a large-scale dataset of up to 18 trillion tokens, offering significant improvements in knowledge, coding, mathematics, and instruction following compared to its predecessor Qwen2. It also features enhanced capabilities in generating long texts, understanding structured data, and producing structured outputs, while supporting over 29 languages.
QwQ is an experimental research model developed by the Qwen Team to advance AI reasoning capabilities. The model embodies the spirit of philosophical inquiry, approaching problems with genuine wonder and doubt. QwQ demonstrates impressive analytical abilities, achieving scores of 65.2% on GPQA, 50.0% on AIME, 90.6% on MATH-500, and 50.0% on LiveCodeBench, pairing its contemplative approach with exceptional performance on complex problems.
QVQ-Max, the inaugural official visual reasoning model from the Qwen Team, was released in March 2025. This model builds upon their earlier exploratory work with QVQ-72B-Preview. QVQ-Max is engineered to not only "see" content within images and videos but also to analyze and reason based on this visual data. Furthermore, it can generate solutions for diverse challenges, spanning mathematical problems, everyday scenarios, programming code, and artistic endeavors. Although this marks its first iteration, QVQ-Max showcases considerable promise as a practical visual agent equipped with both strong visual perception and analytical capabilities.
Doubao 1.5 Pro is a large language model developed in China by ByteDance's Doubao team.
The Yi Large model was designed by 01.AI with the following use cases in mind: knowledge search, data classification, human-like chatbots, and customer service. It stands out for its multilingual proficiency, particularly in Spanish, Chinese, Japanese, German, and French.
WizardLM-2 8x22B is Microsoft AI's most advanced Wizard model. It demonstrates highly competitive performance compared to leading proprietary models and consistently outperforms existing state-of-the-art open-source models.
Amazon Nova Pro 1.0 is a capable multimodal model from Amazon focused on providing a combination of accuracy, speed, and cost for a wide range of tasks. As of December 2024, it achieves state-of-the-art performance on key benchmarks including visual question answering (TextVQA) and video understanding (VATEX).