8 LLM Types Compared: Open vs. Closed Source – State of the Art April 2026

AI Report – Ernst Gamauf, Managing Partner – 12 min read

Claude Opus 4.7, Qwen3.5-Omni, Qwen3.6, GLM-5.1: Which of the 8 LLM types fits which enterprise use case? Strategic comparison with current benchmarks for the DACH market.


AI that thinks. AI that sees. AI that acts. AI that reasons like a brain. The model landscape in 2026 is more complex – and strategically more important – than ever before. Within just three months (February to April 2026), Anthropic, OpenAI, Google, Alibaba, and NVIDIA have released new flagship models. Alibaba's Qwen team alone has established a complete model family with Qwen3.5, Qwen3.5-Omni, and Qwen3.6 that directly challenges closed-source competition across many fronts. At the same time, open source is catching up so fast that choosing the right model has become a genuine business decision.

This article provides a structured overview of the eight core LLM types, their current representatives, and the implications for enterprise AI architectures in the DACH market.


Why LLM Types Are Strategically Relevant

Back in 2023, model selection was simple: ChatGPT or LLaMA. Today there are eight architecturally distinct categories, each solving different problems in different ways. Pairing the wrong type with a task – say, a Large Action Model for contract analysis, or a plain chat model for a medical diagnostic process that calls for deep reasoning – is a decision that hurts both performance and cost.

According to a recent market report (WhatLLM, October 2025), open-source models already account for 62.8% of all available LLMs and deliver SOTA performance for around 80% of real-world use cases at a fraction of the cost – on average 7.3× cheaper.


1. GPT – Generative Pre-trained Transformer

The standard that started it all

The GPT type refers to decoder-only Transformer models with autoregressive token prediction and massive pretraining on web data, refined through RLHF and DPO. This architecture is the foundation of virtually all current frontier models.

Closed Source – Current SOTA Models

GPT-5.4 (OpenAI, March 5, 2026) – The most capable general-purpose frontier model to date, featuring native Computer Use, a 1-million-token context window via API, and integrated Codex coding capabilities. It achieves 83% on the GDPval benchmark (professional knowledge work across 44 occupational fields), surpassing human experts in many scenarios. Hallucinations are down 33% compared to GPT-5.2. On March 17, 2026, GPT-5.4 mini and nano followed for sub-agents and high-volume workloads ($0.75/1M input tokens, 400K context).

Claude Opus 4.7 (Anthropic, April 16, 2026) – 87.6% on SWE-bench Verified (+6.8 pts vs. Opus 4.6), 64.3% on SWE-bench Pro (industry leader, +10.9 pts), 78.0% on OSWorld-Verified. Vision resolution tripled to 3.75 megapixels (2,576 px), visual acuity up from 54.5% to 98.5%. The new xhigh effort level is the default in Claude Code; the /ultrareview command handles multi-stage code reviews. Pricing is unchanged at $5/$25 per million tokens. Note: the new tokenizer generates 1.0–1.35× more tokens. Terminal-Bench 2.0 is slightly down (69.4% vs. GPT-5.4 at 75.1%).

Claude Sonnet 4.6 (Anthropic, February 17, 2026) – Achieves 79.6% on SWE-bench Verified, just 1.2 percentage points behind Opus 4.6 at one-fifth of the price ($3/$15 per million tokens). Developers preferred it over the previous Opus 4.5 in 59% of coding sessions. Effectively the best price-performance ratio on the market.

Gemini 3 Pro (Google) – 1M-token context, 100% on AIME 2025 (with code execution), 80.6% on SWE-bench Verified.

Open Source – Current SOTA Models

| Model | Parameters | License | Strength |
| --- | --- | --- | --- |
| Qwen3-235B | 235B / 22B active | Apache 2.0 | AIME 89.2%, Chatbot Arena 1422 |
| Qwen3.6-35B-A3B | 35B / 3B active | Apache 2.0 | 73.4% SWE-bench Verified, 92.7% AIME 2026, beats Gemma 4-31B |
| GPT-OSS-120B | 117B / 5.1B active | Apache 2.0 | OpenAI's first open weights since GPT-2 |
| GLM-5.1 (Zhipu AI) | 744B / 40B active | MIT | SWE-bench Pro: surpasses Claude Opus 4.6 and GPT-5.4; $3/month in GLM Coding Plan |
| Mistral Large 3 | 675B MoE | Apache 2.0 | 92% of GPT-5.2 performance at 15% of the cost |
| DeepSeek-V3.2 | 671B / 37B active | Open | Fine-Grained Sparse Attention, $0.07/MTok |

Enterprise Recommendation: Claude Opus 4.7 as the default for demanding coding and agent workflows (SWE-bench Pro #1, Vision 3.75MP). Sonnet 4.6 for standard tasks at 5× lower cost. Hybrid routing saves 60–80% of budget at near-identical quality.


2. LRM – Large Reasoning Model

AI that thinks before it answers

LRMs extend standard LLMs with explicit chain-of-thought phases trained through Reinforcement Learning (GRPO/PPO). The model "thinks" – in visible or hidden reasoning traces – before responding. Inference-time scaling rather than parameter scaling is the key concept: more compute at inference time rather than more parameters during training.
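Inference-time scaling can be illustrated without any vendor API. The sketch below uses self-consistency – sampling several independent reasoning chains and majority-voting the answer. The toy solver, its 60% accuracy, and the answer "42" are arbitrary assumptions standing in for a real model:

```python
import collections
import random

def sample_answer(rng: random.Random, p_correct: float = 0.6) -> str:
    """Toy stand-in for one chain-of-thought sample: right 60% of the time."""
    return "42" if rng.random() < p_correct else str(rng.randint(0, 9))

def self_consistency(n_samples: int, seed: int = 0) -> str:
    """Inference-time scaling: spend more compute at answer time by sampling
    several independent reasoning chains and majority-voting the result."""
    rng = random.Random(seed)
    votes = collections.Counter(sample_answer(rng) for _ in range(n_samples))
    return votes.most_common(1)[0][0]

# More samples -> higher accuracy from the same underlying "model".
single = sum(self_consistency(1, seed=s) == "42" for s in range(200))
voted = sum(self_consistency(25, seed=s) == "42" for s in range(200))
print(f"1 sample: {single}/200 correct; 25-sample vote: {voted}/200 correct")
```

The same trade-off is what the commercial effort levels expose as a dial: higher effort buys more reasoning tokens per request rather than a larger model.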

Closed Source

  • GPT-5.4 Thinking – Upfront Planning: users can view and correct the reasoning process before answer generation
  • Claude Opus 4.7 Adaptive Thinking + xhigh – 5 effort levels (low/medium/high/xhigh/max); xhigh is the new default in Claude Code; Opus 4.7 at high surpasses Opus 4.6 at max using fewer tokens
  • Gemini 3 Deep Think – 2.5× reasoning improvement over its predecessor, 45.1% on ARC-AGI-2

Open Source

  • DeepSeek-R1 / R1-0528 – 671B MoE, MIT license; AIME score improved from 70% to 87.5%
  • QwQ-32B – Alibaba, 32B, RL-trained, Apache 2.0
  • Sky-T1-32B – UC Berkeley, trained for ~$450, fully open source
  • Qwen3 (Thinking Mode) – Hybrid: thinking and non-thinking modes switchable via toggle

Enterprise Recommendation: For compliance reviews, legal analysis, financial modeling, and medical decision support, LRMs are the first choice. Costs are well manageable through effort level control.


3. MoE – Mixture of Experts

The efficiency revolution of the Transformer era

MoE models activate only a small fraction of their parameters per token: a router network selects 2–8 of a much larger pool of expert sub-networks. DeepSeek-V3, for example, activates only 37 billion of its 671 billion parameters per forward pass. This architecture has established itself as the de facto standard for all major frontier models in 2025/26.
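The routing idea fits in a few lines of NumPy. The sketch below is a toy top-2 gate over eight miniature experts; dimensions, expert count, and weights are arbitrary placeholders, not any production architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
D, N_EXPERTS, TOP_K = 16, 8, 2   # hidden size, expert pool, experts per token

# Each "expert" is a tiny feed-forward layer (here just one weight matrix).
experts = [rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(N_EXPERTS)]
router = rng.standard_normal((D, N_EXPERTS)) / np.sqrt(D)

def moe_forward(x: np.ndarray):
    """Run one token through the layer: the router scores all experts,
    but only the top-k are actually computed and mixed."""
    logits = x @ router
    top = np.argsort(logits)[-TOP_K:]                 # chosen expert indices
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                              # softmax over chosen experts
    out = sum(g * (x @ experts[i]) for g, i in zip(gates, top))
    return out, top.tolist()

out, active = moe_forward(rng.standard_normal(D))
print(f"activated {len(active)}/{N_EXPERTS} experts -> {TOP_K / N_EXPERTS:.0%} of capacity")
```

Only the selected experts' matmuls run; the rest of the layer's parameters sit idle for that token, which is where the cost advantage comes from.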

Open Source – Current SOTA Models

| Model | Total / Active Params | Highlight |
| --- | --- | --- |
| DeepSeek-V3.2 | 671B / 37B | Fine-Grained Sparse Attention, +50% efficiency |
| Kimi K2.5 (Moonshot) | 1T / 32B | HumanEval 99.0%, MATH-500 98.0% |
| Nemotron 3 Super (NVIDIA) | 120B / 12B | Hybrid Mamba-Transformer + Latent MoE, 1M Ctx, 5× throughput, agentic-optimized |
| Gemma 4 26B MoE (Google) | 26B / 3.8B active | Apache 2.0, #6 Arena AI, 97% of 31B performance, 256K Ctx, ollama-ready |
| GLM-5.1 (Zhipu AI) | 744B / 40B active | MIT license, 200K Ctx, SWE-bench Pro above Claude Opus 4.6 level; $3/month Coding Plan |
| GLM-4.7 (Z.ai) | 355B MoE | #1 Open-Source Leaderboard early 2026 |
| Qwen3.5-397B-A17B | 397B / 17B | Apache 2.0, 8.6–19× higher decode throughput; basis of the Qwen3.5 family |
| Qwen3.5-122B-A10B | 122B / 10B | Apache 2.0, Feb 2026; balance of capacity and on-prem efficiency |
| Qwen3.6-35B-A3B | 35B / 3B | Apache 2.0, Apr 2026; 73.4% SWE-bench Verified, 262K Ctx, consumer-GPU-capable |
| Mixtral 8x22B | 141B / 39B | Apache 2.0, proven in production |

Enterprise Recommendation: For co-location deployments and on-premises setups with NVIDIA infrastructure, MoE models are the most cost-efficient option. DeepSeek-V3.2 for batch workloads ($0.07/MTok with cache), Qwen3.5-397B and Qwen3.6-35B-A3B for coding agents. GLM-5.1 (MIT, 744B/40B active) is the most aggressive price-performance attack on closed source in the current cycle – 94.6% of Opus 4.6 coding quality at a fraction of the cost. For high-volume multi-agent pipelines on NVIDIA Blackwell hardware: Nemotron 3 Super – the only open-source model explicitly designed to address context explosion and the thinking-tax effect in agentic workflows.


4. VLM – Vision-Language Model

AI that sees and understands

VLMs combine a vision encoder (typically ViT-based) with an LLM backbone through cross-attention fusion layers. They process text, images, documents, and video within a unified framework. Open-source VLMs have reduced inference costs by up to 60% compared to commercial APIs while maintaining competitive benchmark scores.
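The fusion mechanism can be sketched as a single cross-attention head in which text-token states query image-patch embeddings. The random weight matrices below are placeholders for a trained fusion layer; shapes and sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
D = 32
txt = rng.standard_normal((6, D))    # 6 text-token states (LLM backbone)
img = rng.standard_normal((49, D))   # 49 image-patch embeddings (ViT encoder)

def cross_attention(q_in: np.ndarray, kv_in: np.ndarray) -> np.ndarray:
    """Single-head cross-attention: text queries attend over vision keys/values.
    Weight matrices are random stand-ins for a trained fusion layer."""
    Wq, Wk, Wv = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))
    Q, K, V = q_in @ Wq, kv_in @ Wk, kv_in @ Wv
    scores = Q @ K.T / np.sqrt(D)                       # (6, 49) relevance map
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)       # softmax over patches
    return weights @ V                                  # vision-informed text states

fused = cross_attention(txt, img)
print(fused.shape)
```

Each of the six text positions ends up as a mixture of the 49 patch values, which is how visual evidence flows into the language backbone's residual stream.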

Closed Source

  • GPT-5.4 Vision – 81.2% on MMMU-Pro without tools, up to 10.24 megapixels (original resolution), native Computer Use for screenshot-based GUI automation
  • Claude Opus 4.7 – Vision at 3.75MP (3× vs. 4.6), 82.1% visual reasoning without tools, 98.5% visual acuity; pixel-precise coordinates eliminate scaling errors in Computer Use
  • Claude Sonnet 4.6 – 94% accuracy on insurance benchmarks (highest measured value)
  • Gemini 3 Pro – Video-native approach, 1M-token context, Pan & Scan for dynamic resolution

Open Source

  • Gemma 4 31B / 26B MoE (Google) – Natively multimodal (text, image, video up to 60 s), audio in E2B/E4B; 140+ languages; Apache 2.0; ollama run gemma4; 256K Ctx
  • Qwen3.5-Omni (Alibaba, March 30, 2026) – The most advanced fully omnimodal open-weight model to date: processes text, images, audio, and video natively in a single inference call. Thinker-Talker architecture with Hybrid-Attention MoE. 256K context corresponds to >10 hours of audio or ~400 seconds of 720p video. Speech recognition in 113 languages/dialects, speech output in 36 languages. 215 SOTA benchmark results; surpasses Gemini 3.1 Pro on audio benchmarks. Emergent capability: Audio-Visual Vibe Coding – code generation directly from audio/video instructions without text input. Three variants: Plus (flagship), Flash (latency-optimized), Light (edge/on-device). Realtime API with semantic interrupt detection and ARIA technology (Adaptive Rate Interleave Alignment) for natural speech flow control. Apache 2.0.
  • Qwen3.5-VL – Video, images, documents; 200+ languages; Apache 2.0
  • LLaMA 4 Scout / Maverick – Meta, 109B/400B MoE, natively multimodal
  • GLM-4.7V (Z.ai) – Computer Vision + Video Understanding, Open
  • DeepSeek-OCR – Document OCR specialist, up to 20× token compression at 97% accuracy
  • DeepSeek-VL (1.3B) – Smallest VLM with strong reasoning results

Enterprise Recommendation: For insurance documents, medical image analysis, and automated invoice processing (IDP), VLMs are the key building block. Qwen3.5-Omni is the first genuine open-source alternative to proprietary omni models – particularly for voice AI applications, multilingual customer communication (113 ASR languages), and multimodal agents that must process text, image, audio, and video in a single pipeline. Privacy-sensitive deployments: Gemma 4 26B MoE or Qwen3.5-VL locally via Ollama – both Apache 2.0, single-GPU-capable.


5. SRM – Small Reasoning Model

Frontier reasoning in edge format

SRMs are compact reasoning models under ~15 billion parameters, derived from large reasoning models through knowledge distillation. Microsoft Phi-4-mini-reasoning (3.8B) was distilled from DeepSeek-R1 and achieves 88.6% on MATH-500 – nearly on par with significantly larger models. RL fine-tuning on synthetic mathematics data is the decisive training step.
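The distillation objective behind these models can be sketched as a temperature-softened KL loss between teacher and student logits (the classic Hinton-style soft-target formulation). The logit values below are made up purely for illustration:

```python
import math

def softmax(logits, temperature=1.0):
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distill_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened distributions,
    scaled by T^2 -- the standard soft-target distillation objective."""
    p = softmax(teacher_logits, temperature)   # soft targets from the big model
    q = softmax(student_logits, temperature)   # small model's predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return temperature ** 2 * kl

teacher = [4.0, 1.0, 0.5]       # illustrative logits, not real model output
aligned = [3.8, 1.1, 0.4]       # student that mimics the teacher
divergent = [0.5, 4.0, 1.0]     # student that prefers a different answer
print(distill_loss(aligned, teacher), distill_loss(divergent, teacher))
```

The temperature spreads probability mass over near-miss answers, so the student learns the teacher's ranking of alternatives rather than only its top pick.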

Open Source

| Model | Params | Benchmark | Highlight |
| --- | --- | --- | --- |
| Phi-4-mini-reasoning (Microsoft) | 3.8B | MATH-500: 88.6% | 128K Ctx, 20+ languages, Apache-like |
| DeepSeek-R1-Distill-Qwen3-8B | 8B | AIME: 87.5% | Beats Gemini 2.5 Flash; single-GPU |
| Gemma 4 E2B / E4B (Google) | 2.3B / 4.5B eff. | MMLU 85.2% (31B) | Apache 2.0, audio+image+text, on-device, 128K Ctx, ollama run gemma4 |
| Qwen3 1.7B–8B | 1.7–8B | Best in class | Apache 2.0, Ollama-ready |
| SmolLM3-3B (HuggingFace) | 3B | Beats Llama-3.2-3B | Fully transparent (data, methodology) |

Closed Source (SRM tier)

  • GPT-5.4 mini / nano – 400K context, $0.75/1M input, 2× faster than GPT-5.4
  • Claude Sonnet 4.6 – Effectively positioned in the SRM price segment ($3/$15) with flagship quality
  • Gemini 3 Flash – 78% SWE-bench Verified, fastest closed-source tier

Enterprise Recommendation: For local AI workstations (ASUS/NVIDIA with Ollama), GDPR-compliant on-device deployments, and offline scenarios in healthcare facilities, SRMs are the first choice. Gemma 4 E4B runs on smartphones, Gemma 4 26B MoE on a consumer GPU – both Apache 2.0. Phi-4-mini-reasoning runs on a single NVIDIA RTX 4090.


6. LAM – Large Action Model

From answer to action

LAMs combine language understanding with an execution layer for real-world actions: calling APIs, filling out forms, controlling software, managing files. The defining characteristic: LAMs learn from human action sequences and can autonomously execute multi-step plans without asking for confirmation at each step.

GPT-5.4 marks a milestone in 2026: it is the first general-purpose frontier model with native Computer Use – 75% on OSWorld-Verified, the benchmark for desktop automation. Claude Opus 4.6 introduces Agent Teams: multiple parallel Claude instances coordinate on complex projects.

A special role is played by NVIDIA Nemotron 3 Super (120B, 12B active, March 11, 2026): multi-agent systems generate up to 15× more tokens than standard chats, as history, tool outputs, and reasoning traces are resent at every turn. Over long tasks, this leads to "context explosion" and "goal drift" – the model gradually loses alignment with the original objective. Nemotron 3 Super directly addresses this "thinking tax" effect through its 1M-token context with linear Mamba scaling, achieving up to 2.2× higher inference throughput than GPT-OSS-120B.
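The context-explosion arithmetic is easy to reproduce: if the full history is resent on every turn, total processed tokens grow quadratically with turn count. The per-turn figure below is an assumed round number, not a measured value:

```python
def cumulative_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total tokens processed when each turn resends all previous turns:
    turn t carries t * tokens_per_turn of history, so the sum is triangular."""
    return tokens_per_turn * turns * (turns + 1) // 2

TURNS, PER_TURN = 40, 2_000       # assumed agent-loop figures
resent = cumulative_tokens(TURNS, PER_TURN)
single_pass = TURNS * PER_TURN    # cost if history were never replayed
print(f"{resent:,} vs {single_pass:,} tokens -> {resent / single_pass:.1f}x amplification")
```

Linear-scaling architectures and prompt caching attack exactly this triangular sum, which is why they matter far more for agent loops than for single-shot chat.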

Open Source

  • Nemotron 3 Super (NVIDIA) – 120B / 12B active, Hybrid Mamba-Transformer + Latent MoE, 1M Ctx, 85.6% on PinchBench (agentic benchmark), #1 open model in its class; via build.nvidia.com, OpenRouter and Hugging Face
  • Qwen3-Coder-Next (Alibaba, Feb 2026) – 80B MoE / 3B active; specialized coding-agent model for multi-file refactoring, repo-level tasks, and autonomous debugging. 70.6–71.3% on SWE-bench; 262K Ctx; $0.12/$0.75 per million tokens via API. Local use requires at least 46 GB of RAM (Mac Studio with 64 GB+ recommended).
  • xLAM (Salesforce) – #2 on Berkeley Function Calling Leaderboard V1
  • OpenHands (All-Hands AI) – Open-source framework for software engineering agents
  • Qwen3-Agent – MCP-ready, tool-use-optimized, Apache 2.0

Closed Source

  • GPT-5.4 + Computer Use – Tool Search API reduces token consumption in multi-tool setups by up to 47%
  • Claude Opus 4.7 Agent Teams – 87.6% SWE-bench Verified, MCP-Atlas industry leader (+9.2 pts vs. GPT-5.4); Task Budgets (Beta) for controlled agent loops
  • Qwen3.6-Plus (Alibaba, April 2, 2026) – Proprietary agentic coding flagship with 1M-token context and always-on chain-of-thought. 78.8% on SWE-bench Verified, 61.6% on Terminal-Bench 2.0. Preserve-Thinking parameter for consistent agent loops. Approximately 12× cheaper than Claude Opus 4.6 ($0.29/$1.65 per million tokens). Via OpenRouter and Alibaba Cloud Bailian. Important: closed source, no on-premises deployment.
  • Google Agentspace – Enterprise integration in Google Workspace

Enterprise Recommendation: LAMs are the most direct path to replacing RPA systems (UiPath, Automation Anywhere) with language-driven agents. MCP (Model Context Protocol, now under the Linux Foundation) is the emerging industry standard for tool and data access. For on-premises multi-agent deployments on NVIDIA DGX or Blackwell hardware: Nemotron 3 Super as a powerful open-source alternative to Claude Opus 4.7 Agent Teams.


7. HRM – Hierarchical Reasoning Model

The paradigm shift: thinking in latent space

The Hierarchical Reasoning Model is the most spectacular architectural innovation of 2025. Developed by Sapient Intelligence (Singapore, July 2025), it is inspired by the hierarchical, multi-timescale processing of the human brain.

The architecture consists of two interdependent recurrent modules:

  • High-level module – slow, abstract planning (corresponds to Kahneman's System 2)
  • Low-level module – fast, detailed computation (System 1)

The key distinction: HRM does not think in token space, but in latent space – no chain-of-thought, no verbalization of the reasoning process, no pretraining, no CoT data required.
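The two-timescale interplay can be caricatured in a few lines: a slow planner sets a subgoal, a fast worker iterates toward it, and the planner corrects itself from the worker's result. This is a conceptual toy on a scalar problem, not Sapient's trained 27M-parameter model:

```python
def hrm_solve(target: float, outer_steps: int = 5, inner_steps: int = 8) -> float:
    """Toy hierarchical loop: slow abstract planning (outer) wrapped around
    fast detailed computation (inner). Conceptual sketch only."""
    plan, state = 0.0, 0.0
    for _ in range(outer_steps):             # slow timescale (System 2)
        for _ in range(inner_steps):         # fast timescale (System 1)
            state += 0.5 * (plan - state)    # worker converges on the subgoal
        plan += 0.5 * (target - state)       # planner corrects the subgoal
    return state

print(round(hrm_solve(1.0), 3))
```

The point of the caricature: neither loop verbalizes anything; all intermediate "thought" lives in the numeric state, which is the latent-space idea in miniature.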

Benchmark Results

With just 27 million parameters and approximately 1,000 training examples, Sapient HRM outperforms models such as OpenAI o3-mini-high, DeepSeek-R1 (671B!) and Claude 3.7 Sonnet on specific reasoning benchmarks:

| Benchmark | HRM (27M) | o3-mini-high | DeepSeek-R1 |
| --- | --- | --- | --- |
| ARC-AGI-2 | 5% | 0% | 0% |
| Sudoku-Extreme | ~100% | 0% | 0% |
| Maze-Hard (30×30) | ~100% | 0% | 0% |

Important caveat: HRM is not a generalist model. It cannot converse, generate code, or summarize. It is a specialized reasoning system that must be trained directly for each new task.

Open source on GitHub: github.com/sapientinc/HRM

Enterprise Recommendation: Not yet production-ready as a standalone system. Highly relevant for research collaborations, specialized constraint-satisfaction problems (routing, planning, optimization), and as a foundation for hybrid architectures. Observation horizon: 12–24 months.


8. LCM – Large Concept Model

Beyond the token: thinking in sentences

Meta's Large Concept Model represents the most conceptually radical approach in this comparison: instead of processing text token by token, the LCM operates at the sentence level within the SONAR embedding space.

The three core components:

  1. Concept Encoder – converts input into a semantic embedding space
  2. Core (Inference) – operates on abstract concept representations
  3. Concept Decoder – transforms abstractions back into natural language

The SONAR space supports 200 text languages and 76 languages for audio – without language-specific retraining. A single LCM model is therefore natively multilingual to a degree that token-based models can barely match.

Current Implementation

Meta LCM 7B – Surpasses LLaMA-3.1-8B on multilingual summarization (XLSum); modularly extensible; open source: github.com/facebookresearch/large_concept_model

Meta positions LCM as scientific diversification – explicitly not as a direct competitor to current frontier LLMs, but as a long-term research path away from the token paradigm.

Enterprise Recommendation: For multilingual DACH deployments and scenarios with 20+ languages (European financial institutions, international insurance groups), LCM is a fascinating research candidate. Production deployment: 12–24 month horizon.


Market Landscape: Open vs. Closed Source in April 2026

The balance of power has shifted fundamentally. Three key messages for enterprise AI decision-makers:

1. Performance parity for 80% of use cases

Open-source models such as DeepSeek V3.2, Qwen3-235B, GLM-5.1, and Qwen3.6-35B-A3B offer pricing from $0.07–0.29 per million tokens at quality scores of 56–58 (on a 0–70 scale) – compared to $15–30 for comparable closed-source models. GLM-5.1 (MIT, 744B/40B active) is the most aggressive assault: 94.6% of Claude Opus 4.6 coding performance at a fraction of the cost. For compliance-intensive industries (insurance, banking, healthcare), on-premises deployments of these models are more economically attractive than ever.

2. Closed source retains the lead for the hardest 20% of tasks

GPT-5.4 leads in native computer-use workflows and professional knowledge-work benchmarks. Claude Opus 4.7 dominates in complex reasoning chains and terminal-based coding agents (69.5% on Terminal-Bench 2.0). For these premium use cases, the quality advantage justifies the higher cost.

3. Hybrid multi-model routing as best practice

The strategically optimal architecture in April 2026 is not a single model, but an intelligent routing system:

  • 80–90% of requests → Cost-efficient tier (Sonnet 4.6, GPT-5.4 mini, Qwen3.6-35B-A3B)
  • 10–20% of requests → Escalation to Opus 4.7 or GPT-5.4 for complex tasks
  • Coding agents (cloud) → Qwen3.6-Plus (1M Ctx, $0.29/$1.65, 78.8% SWE-bench) as a cost-efficient alternative to Opus 4.7
  • Agentic pipelines on-prem → Nemotron 3 Super on NVIDIA hardware (eliminates context-explosion costs)
  • Multimodal voice pipelines → Qwen3.5-Omni (113 ASR languages, Apache 2.0, on-prem-capable)
  • Savings: 60–80% of budget at near-identical quality
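A minimal version of such a router is a few lines of heuristics in front of two API tiers. The model identifiers, thresholds, and keyword markers below are placeholders for illustration, not a production routing policy:

```python
CHEAP = "sonnet-4.6"     # placeholder ID for the cost-efficient tier
PREMIUM = "opus-4.7"     # placeholder ID for the escalation tier

def route(prompt: str, needs_agents: bool = False) -> str:
    """Heuristic pre-router: keep most traffic on the cheap tier and
    escalate only on signals of hard reasoning or agentic work."""
    hard_markers = ("legal analysis", "multi-file refactor",
                    "financial model", "diagnose")
    if needs_agents or len(prompt) > 8_000:      # tool use or very long context
        return PREMIUM
    if any(marker in prompt.lower() for marker in hard_markers):
        return PREMIUM
    return CHEAP

print(route("Summarize this meeting note in three bullets."))
print(route("Run a legal analysis of this supplier contract."))
```

In practice, teams replace the keyword list with a small classifier or let the cheap model self-assess and escalate; the budget effect comes from the traffic split, not from the sophistication of the gate.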

Conclusion: Strategic Recommendations for the DACH Market

The eight LLM types are not competitors – they are a toolkit. A well-considered enterprise AI architecture combines:

| Type | Recommended Use | Model (Recommendation) |
| --- | --- | --- |
| GPT / LRM (Closed) | Frontier coding, knowledge work, vision | Claude Opus 4.7 / Sonnet 4.6 |
| MoE / SRM (Open) | Scalable batch workloads, GDPR, on-prem | DeepSeek-V3.2, Qwen3.6-35B-A3B, GLM-5.1, Gemma 4 26B |
| VLM Omni (Open) | Voice AI, multilingual agents, audio-video analysis | Qwen3.5-Omni (Apache 2.0, 113 languages) |
| VLM (Mix) | Document and image processing, IDP | Opus 4.7 (API) / Gemma 4 / Qwen3.5-VL (local) |
| SRM Edge / On-Device | Mobile, offline, on-device, GDPR | Gemma 4 E2B/E4B / Phi-4-mini / Qwen3.5-4B |
| LAM – Agentic Coding (Open) | Repo-level coding, multi-file refactoring; multi-agent on-prem on NVIDIA infra | Qwen3-Coder-Next / Nemotron 3 Super |
| LAM – Agentic (Cloud) | RPA replacement, cost-efficient agent loops via API | Qwen3.6-Plus (API) / xLAM |
| LAM – Agentic (Closed) | Parallel agent teams, Computer Use API | Opus 4.7 Agent Teams / GPT-5.4 |
| HRM / LCM | Research collaborations, future planning | Sapient HRM / Meta LCM (Open) |

For companies in the DACH market: now is the right time to define an AI model strategy – not as a one-time decision, but as a living architecture capable of keeping pace with the rapid rate of innovation.


This article was produced by the AI Data Center Practice at GlobalCore Consulting. GlobalCore supports companies in the DACH market in selecting, architecting, and implementing enterprise AI systems – from local workstation deployments to co-location data center strategies.

For an individual LLM architecture analysis, contact us via our contact form.
