Book

Practical Deep Learning for Cloud, Mobile, and Edge

by Anirudh Koul, Siddha Ganju, and Meher Kasam

The central thesis of "Practical Deep Learning for Cloud, Mobile, and Edge" is that deploying deep learning efficiently requires techniques and architectures tailored to the constraints of each target: cloud environments, mobile platforms, and edge devices. The book argues that a one-size-fits-all approach is insufficient, and that successful implementation depends on understanding the trade-offs among model complexity, computational resources, and performance.

Readers will learn to design, optimize, and deploy deep learning models across diverse hardware and software stacks. Key takeaways include methods for model compression, quantization, efficient inference engines, and strategies for managing distributed learning and data privacy in resource-limited settings. The book equips practitioners with the knowledge to build performant AI applications that run closer to the data source, enabling real-time processing and reduced latency.

Key concepts

  • Model Quantization: Reducing the precision of model weights and activations to decrease memory footprint and speed up inference.
  • Knowledge Distillation: Training a smaller, more efficient model to mimic the behavior of a larger, more complex model.
  • ONNX Runtime: An open-source inference engine designed for high performance across various hardware accelerators.
  • Federated Learning: Training machine learning models on decentralized data residing on edge devices without direct data sharing.
  • TensorRT: NVIDIA's SDK for high-performance deep learning inference, optimizing models for NVIDIA GPUs.
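
To make three of the concepts above concrete, here is a minimal pure-Python sketch of int8 quantization, a temperature-scaled distillation loss, and federated averaging. It is not taken from the book: the function names (`quantize_int8`, `dequantize`, `distillation_loss`, `federated_average`) are hypothetical, the affine int8 scheme and the default temperature of 2.0 are assumptions chosen for illustration, and real deployments would normally rely on a framework's own tooling rather than hand-written code like this.

```python
import math


def quantize_int8(weights):
    """Affine quantization (illustrative): map floats to int8 codes in [-128, 127]."""
    # Widen the range to include 0.0 so that zero maps to an exact integer code.
    lo = min(min(weights), 0.0)
    hi = max(max(weights), 0.0)
    scale = (hi - lo) / 255.0
    if scale == 0.0:  # all-zero input: any positive scale works
        scale = 1.0
    zero_point = round(-128 - lo / scale)
    codes = [max(-128, min(127, round(w / scale) + zero_point)) for w in weights]
    return codes, scale, zero_point


def dequantize(codes, scale, zero_point):
    """Recover approximate float values from int8 codes."""
    return [(q - zero_point) * scale for q in codes]


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return temperature ** 2 * kl


def federated_average(client_weights, client_sizes):
    """Average per-client weight vectors, weighted by local sample counts."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


if __name__ == "__main__":
    weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
    codes, scale, zero_point = quantize_int8(weights)
    print("int8 codes:", codes)
    print("restored:  ", dequantize(codes, scale, zero_point))
    print("distillation loss:", distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0]))
    print("averaged weights:", federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

Each int8 code takes one byte instead of the four used by a float32 weight, which is where the memory saving comes from. The cost is a rounding error on the order of one quantization step (`scale`) per weight. The distillation loss is zero when the student's logits match the teacher's and grows as the two softened distributions diverge. In the federated sketch only the weight vectors leave each client, never the underlying data.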