New MEAPs to pair up
CUDA (Compute Unified Device Architecture) provides a powerful parallel programming model that AI engineers can use to tap the massive processing power of NVIDIA GPUs. This guide shows you how to work within the CUDA ecosystem, from your first kernel to implementing advanced LLM features like Flash Attention. [Read more]
4 chapters of this MEAP are available now, with more to follow soon!
Structural techniques for efficient models
General-purpose LLMs are not optimized for specific domains and business goals. Using techniques like specialized fine-tuning, pruning unnecessary neural components, and knowledge distillation, you can rearchitect your models to cost less, run faster, and deliver more accurate results. This book turns research from the latest AI papers into production-ready practices for domain-specific model optimization. [Read more]
2 chapters of this MEAP are available now, with more to follow soon!