Tagged

#neural networks

2 articles tagged with #neural networks.

Mixture of Experts: The Efficiency Trick Behind Modern AI

Mixtral has 46.7B parameters but activates only 13B per token. This architectural trick, called Mixture of Experts, powers Gemini 1.5, DeepSeek V3, and more. Learn how MoE works, its hidden costs, and when to use it.

#machine-learning #moe #efficiency
Dec 8, 2025
7 min read
◆ cited

Why Your LLM Only Uses 10-20% of Its Context Window (And How TITANS Fixes It)

GPT-4's 128K context window? The model effectively uses only about 10% of it. Google's TITANS architecture introduces test-time memory learning that outperforms GPT-4 on long-context tasks with 70x fewer parameters.

#ai #machine-learning #transformers
Dec 8, 2025
15 min read
◆ cited