Meta’s Llama 4 just dropped and it’s a huge step forward for open source LLMs

With three powerful models, native multimodal support, and open access, Llama 4 shows that open source AI can match or even beat top proprietary systems.

Key Takeaways

  • Open source – Download and run anywhere, with no API calls. Total control over data privacy and model customization
  • Powerful – Outperforms GPT-4o on many benchmarks
  • Lightweight – 17B active parameters
  • Cheap – ~$0.40 per 1M input + 1M output tokens vs. $4.38 for GPT-4o
  • Multimodal and multilingual – Supports text, image, and video inputs, with 12 input/output languages
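
The cost gap is easy to sanity-check with a few lines of arithmetic. A minimal sketch, assuming the per-million-token rates are split evenly for Llama 4 and split 2.50/1.88 for GPT-4o (both splits are hypothetical; only the blended totals above come from the article):

```python
# Rough cost comparison for a workload of 1M input + 1M output tokens.
# Per-token rates below are illustrative assumptions, not official pricing;
# only the blended totals ($0.40 and $4.38) match the figures quoted above.

def workload_cost(input_rate_per_m, output_rate_per_m,
                  input_tokens=1_000_000, output_tokens=1_000_000):
    """Cost in USD, given per-1M-token rates for input and output."""
    return (input_rate_per_m * input_tokens / 1e6
            + output_rate_per_m * output_tokens / 1e6)

llama4 = workload_cost(0.20, 0.20)   # hypothetical even split of ~$0.40
gpt4o = workload_cost(2.50, 1.88)    # hypothetical split summing to $4.38

print(f"Llama 4: ${llama4:.2f}  GPT-4o: ${gpt4o:.2f}  "
      f"ratio: {gpt4o / llama4:.1f}x")
```

At these rates Llama 4 comes out roughly an order of magnitude cheaper for the same token volume.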

Three Models, Three Strengths

Llama 4 is a suite of three specialized models, each offering unique advantages—but all share some impressive core features:

  • Open source: You can download the architecture and weights, run them locally, and fine-tune them for your specific needs. This has huge implications for data privacy as no data needs to be sent to external APIs.
  • Multimodal: Native support for text, images, and video, with strong cross-modal understanding.
  • Multilingual: Supports 12 languages for both input and output.

Maverick: Built for Performance and Efficiency

Maverick is the most capable Llama 4 model currently available.

  • 1 million token context window – on par with Gemini 2.5 Pro.
  • 17B active parameters with 400B total parameters.
  • Mixture-of-Experts (MoE) model with 128 experts.
  • Runs on high-performance hardware (e.g., a single NVIDIA H100 GPU), with support for BF16 and FP8 precision.
  • Scores 1417 on LMSYS’s LMArena, second only to Gemini 2.5 Pro. It beats all current GPT, Claude, and Grok models.

Scout: Built for Accessibility and Memory

Scout is a lightweight but powerful model with an unprecedented context window size.

  • 10 million token context window – roughly 5x the next largest (Gemini 1.5) and the current best in class.
  • 17B active parameters, 109B total parameters.
  • MoE model with 16 experts.
  • Supports 4-bit and 8-bit quantization, great for edge devices or resource-constrained environments.
  • Despite its compact footprint, it handles massive inputs and delivers solid performance.
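
Quantization is what makes that "edge device" claim plausible: storing weights in 4 or 8 bits instead of 32 shrinks memory by 4–8x. Below is a minimal sketch of symmetric int8 quantization in NumPy — a conceptual illustration of the idea, not Scout's actual quantization scheme:

```python
import numpy as np

# Symmetric per-tensor int8 quantization: map float weights onto the
# integer range [-127, 127] with a single scale factor. This is a toy
# illustration of the technique, not Scout's real quantization code.

def quantize_int8(weights: np.ndarray):
    """Return int8 weights plus the scale needed to reconstruct them."""
    scale = np.abs(weights).max() / 127.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Approximate reconstruction of the original float weights."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)  # stand-in weight matrix
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print("int8:", q.nbytes, "bytes vs float32:", w.nbytes, "bytes")
print("max reconstruction error:", np.abs(w - w_hat).max())
```

The reconstruction error is bounded by half the scale factor, which is why quantized models lose surprisingly little accuracy relative to the memory they save.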

Scout’s massive context window might raise a fair question: Can a model actually make use of all that space?

The answer is yes. In “needle-in-a-haystack” benchmarks, which test a model’s ability to retrieve relevant information buried in huge contexts, Scout performs exceptionally well across a wide range of context lengths across both video and text.
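
The setup behind these benchmarks is simple to sketch: bury one fact (the "needle") at a chosen depth inside a long run of filler text (the "haystack"), then ask the model to retrieve it. The filler and needle below are made up for illustration:

```python
# Sketch of how a needle-in-a-haystack eval prompt is constructed.
# The needle and filler text here are invented for illustration.

def build_haystack(needle: str, filler: str,
                   total_chars: int, depth: float) -> str:
    """Repeat filler to ~total_chars and insert the needle at a
    fractional depth (0.0 = start, 1.0 = end)."""
    haystack = (filler * (total_chars // len(filler) + 1))[:total_chars]
    pos = int(total_chars * depth)
    return haystack[:pos] + " " + needle + " " + haystack[pos:]

needle = "The magic number is 7481."
prompt = build_haystack(needle, "The sky was a uniform gray. ",
                        total_chars=10_000, depth=0.5)
question = "What is the magic number?"

# A strong long-context model should answer "7481" no matter how deep
# the needle is buried or how long the haystack grows.
print(len(prompt), needle in prompt)
```

Sweeping `depth` and `total_chars` produces the familiar retrieval-accuracy grids these benchmarks report.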

For context windows, it’s not just the size that counts; it’s how you use it. Scout comes through on both.


Behemoth: Built for Power

Behemoth is Meta’s largest and most powerful Llama 4 model, still unreleased but already making waves.

  • 288B active parameters and over 2 trillion total parameters.
  • MoE model with 16 experts.
  • Built for scale and maximum performance; details on its release are still emerging.

Under the Hood: Mixture-of-Experts and Multimodality

Llama 4 was trained on over 30 trillion tokens, more than double the dataset used for Llama 3. The training data integrates text, images, and videos, giving Llama 4 strong native multimodal capabilities.

The Architecture

  • Alternating dense and Mixture-of-Experts (MoE) layers.
  • In MoE layers, each token is routed through one shared expert plus one routed expert (1 of 128 in Maverick, 1 of 16 in Scout).
  • Only a fraction of parameters are active during inference, reducing compute and increasing efficiency.
  • Faster training and lower inference latency compared to dense models.
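
The routing scheme above can be sketched in a few lines of NumPy: every token passes through the shared expert, and a learned router picks exactly one routed expert per token (top-1). All dimensions and weights here are made up for illustration:

```python
import numpy as np

# Toy MoE layer matching the description above: one shared expert that
# every token uses, plus a top-1 router over the routed experts.
# Shapes and random weights are illustrative, not Llama 4's real sizes.

rng = np.random.default_rng(0)
d_model, n_experts, n_tokens = 8, 16, 5            # Scout-like: 16 experts

router_w = rng.normal(size=(d_model, n_experts))   # router projection
shared_w = rng.normal(size=(d_model, d_model))     # shared expert weights
expert_w = rng.normal(size=(n_experts, d_model, d_model))  # routed experts

def moe_layer(x: np.ndarray) -> np.ndarray:
    """x: (n_tokens, d_model) -> (n_tokens, d_model)."""
    logits = x @ router_w                # (n_tokens, n_experts) router scores
    choice = logits.argmax(axis=-1)      # top-1: one routed expert per token
    out = x @ shared_w                   # every token hits the shared expert
    for t, e in enumerate(choice):
        out[t] += x[t] @ expert_w[e]     # plus exactly one routed expert
    return out

x = rng.normal(size=(n_tokens, d_model))
y = moe_layer(x)
print(y.shape)
```

This is why "active parameters" is so much smaller than total parameters: each token touches only 2 of the 17 expert matrices here, and the same logic lets Maverick run 400B total parameters with just 17B active per token.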

Training

Multimodal pre-training with early fusion: text, image, and video tokens are unified from the start.

Post-training includes:

  • Lightweight supervised fine-tuning (SFT).
  • Online reinforcement learning (RL).
  • Lightweight direct preference optimization (DPO).

Meta found that traditional SFT and DPO can over-constrain the model and limit performance during RL. Their lightweight approach preserves guidance while letting RL shine; this is one of the key reasons for Llama 4’s strong reasoning ability.
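
The DPO step mentioned above boils down to one loss term: push the policy’s log-probability of the preferred answer up relative to a frozen reference model, and the rejected answer’s down. A minimal numeric sketch of the standard DPO loss (the log-probabilities below are invented; this is the textbook formulation, not Meta’s exact recipe):

```python
import math

# Standard DPO loss for a single preference pair:
#   -log sigmoid(beta * [(logp_w - ref_logp_w) - (logp_l - ref_logp_l)])
# where w = chosen (preferred) answer, l = rejected answer.
# All log-probability values here are made up for illustration.

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair; lower is better."""
    margin = ((logp_chosen - ref_logp_chosen)
              - (logp_rejected - ref_logp_rejected))
    return -math.log(1 / (1 + math.exp(-beta * margin)))  # -log sigmoid

# Policy already prefers the chosen answer -> positive margin, small loss.
good = dpo_loss(-1.0, -3.0, -2.0, -2.5)
# Policy prefers the rejected answer -> negative margin, larger loss.
bad = dpo_loss(-3.0, -1.0, -2.5, -2.0)
print(good, bad)
```

The `beta` knob controls how hard the loss pulls the policy away from the reference; keeping it small is one way a DPO pass can stay "lightweight" and avoid over-constraining later RL.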


How to Use Llama 4 Today

You can try out Llama 4 in several places:

  • Llama.com – Model download
  • Hugging Face – Download + online inference
  • Meta.ai – Online inference
  • WhatsApp, Facebook Messenger, Instagram DMs – Integrated chat features

Whether you’re building apps or integrating into existing tools, Llama 4 is ready to go.


Final Thoughts

Llama 4 makes a clear case: open source can lead.

With models like Maverick outperforming closed systems, and Scout enabling efficient, affordable deployment, there’s never been a better time to explore what open source AI can do.

Which version will you use?