Engram: How DeepSeek Added a Second Brain to Their LLM
2026-01-13 • 18 min read
#deep learning #llm architecture #memory #mixture of experts #deepseek #sparse computation
A technical deep dive into DeepSeek's Engram architecture, which introduces conditional memory as a new axis of sparsity for large language models.
Mixture of Experts: The Efficiency Trick Behind Modern AI
2025-12-08 • 7 min read
#machine learning #moe #efficiency #llm #architecture #mixtral #deepseek #neural networks
Mixtral uses 46.7B parameters but only activates 13B per token. This architectural trick, called Mixture of Experts, powers Gemini 1.5, DeepSeek V3, and more. Learn how MoE works, its hidden costs, and when to use it.