While much of the focus of machine learning research is on the process of training models (i.e., learning), there is a unique set of challenges around serving and updating those models that is often overlooked. In this lecture we will explore the broader machine learning life-cycle and discuss the challenges around serving predictions.
Daniel Crankshaw, Xin Wang, Giulio Zhou, Michael J. Franklin, Joseph E. Gonzalez, Ion Stoica. Clipper: A Low-Latency Online Prediction Serving System. This paper is still under review.
Daniel Crankshaw, Peter Bailis, Joseph E. Gonzalez, Haoyuan Li, Zhao Zhang, Michael J. Franklin, Ali Ghodsi, Michael I. Jordan. The Missing Piece in Complex Analytics: Low Latency, Scalable Model Management and Serving with Velox. The Conference on Innovative Data Systems Research (CIDR’2015).
Mert Akdere, Ugur Cetintemel, Matteo Riondato, Eli Upfal, and Stan Zdonik. The Case for Predictive Database Systems: Opportunities and Challenges. The Conference on Innovative Data Systems Research (CIDR’2011).
Xinran He, Junfeng Pan, Ou Jin, Tianbing Xu, Bo Liu, Tao Xu, Yanxin Shi, Antoine Atallah, Ralf Herbrich, Stuart Bowers, and Joaquin Quiñonero Candela. 2014. Practical Lessons from Predicting Clicks on Ads at Facebook. In Proceedings of the Eighth International Workshop on Data Mining for Online Advertising (ADKDD’14). [direct pdf link]
H. Brendan McMahan, Gary Holt, D. Sculley, Michael Young, Dietmar Ebner, Julian Grady, Lan Nie, Todd Phillips, Eugene Davydov, Daniel Golovin, Sharat Chikkerur, Dan Liu, Martin Wattenberg, Arnar Mar Hrafnkelsson, Tom Boulos, and Jeremy Kubica. Ad click prediction: a view from the trenches. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD ’13). [direct pdf link]
Cascades for fast predictions: Paul Viola and Michael Jones. Rapid Object Detection using a Boosted Cascade of Simple Features. CVPR 2001. This landmark paper introduced a simple but efficient and accurate algorithm for face detection that is widely used. I have included this classic paper because it illustrates the advantages of cascades.
Speech recognition: Xuedong Huang, James Baker, Raj Reddy. A Historical Perspective of Speech Recognition. This survey summarizes the developments in speech recognition, an actively deployed area of prediction serving. Unfortunately, while the article does discuss the key developments in speech recognition, it only briefly discusses some of the developments in serving speech recognition models.
Baidu Research. Deep Speech 2: End-to-End Speech Recognition in English and Mandarin. This is a fairly comprehensive description of both the model and the systems used by Baidu for their English and Mandarin ASR systems. To the best of my knowledge, this is the most complete public description of a state-of-the-art ASR system.
Anuj Kumar, Anuj Tewari, Seth Horrigan, Matthew Kam, Florian Metze, and John Canny. Rethinking Speech Recognition on Mobile Devices. This paper explores the challenges of speech recognition in the developing world on mobile devices with limited connectivity.
Hosted speech service: Google’s Cloud Speech API. Any papers on this system?
Hardware for Deep Learning Inference:
GPU-Based Deep Learning Inference: A Performance and Power Analysis. This NVIDIA whitepaper looks at the performance and power implications of running the AlexNet model on various architectures. A key implication here is the importance of batching as well as the peak achievable throughput of these accelerators.
Google’s Tensor Processing Unit (TPU):
IEEE Spectrum Article: Google Translate Gets a Deep-Learning Upgrade
Technical paper on the TPU: Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Security and Prediction Serving:
Florian Tramèr, Fan Zhang, Ari Juels, Michael K. Reiter, and Thomas Ristenpart. Stealing Machine Learning Models via Prediction APIs.