Prediction Serving

ML-Lifecycle

While much machine learning research focuses on training models (i.e., learning), serving and updating those models poses a unique set of challenges that are often overlooked. In this lecture we will explore the broader machine learning lifecycle and discuss the challenges around serving predictions.

Slides

Overview and Next Directions presented by Joseph Gonzalez [pptx, pdf]
Managing the Machine Learning Lifecycle presented by Giulio Zhou [pptx, pdf]
Incrementally Maintaining Classification Using an RDBMS presented by Noah Golmant [pptx, pdf]
The LASER, Clipper, and TensorFlow Prediction Serving Systems presented by Daniel Crankshaw [pptx, pdf]

Reading lists:

Prediction Serving Systems [Daniel Crankshaw]

Deepak Agarwal, Bo Long, Jonathan Traupman, Doris Xin, and Liang Zhang. 2014. LASER: a scalable response prediction platform for online advertising. In Proceedings of the 7th ACM international conference on Web search and data mining (WSDM ‘14). [direct pdf link]

Optional Reading

Daniel Crankshaw, Xin Wang, Giulio Zhou, Michael J. Franklin, Joseph E. Gonzalez, and Ion Stoica. Clipper: A Low-Latency Online Prediction Serving System. This paper is still under review.
Daniel Crankshaw, Peter Bailis, Joseph E. Gonzalez, Haoyuan Li, Zhao Zhang, Michael J. Franklin, Ali Ghodsi, and Michael I. Jordan. The Missing Piece in Complex Analytics: Low Latency, Scalable Model Management and Serving with Velox. The Conference on Innovative Database Research (CIDR’2015).
Mert Akdere, Ugur Cetintemel, Matteo Riondato, Eli Upfal, and Stan Zdonik. The Case for Predictive Database Systems: Opportunities and Challenges. The Conference on Innovative Database Research (CIDR’2011).

Managing the ML Lifecycle [Giulio Zhou]

D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, and Michael Young. 2014. Machine Learning: The High Interest Credit Card of Technical Debt. SE4ML: Software Engineering for Machine Learning (NIPS 2014 Workshop). [direct pdf link]

Optional Reading

Xinran He, Junfeng Pan, Ou Jin, Tianbing Xu, Bo Liu, Tao Xu, Yanxin Shi, Antoine Atallah, Ralf Herbrich, Stuart Bowers, and Joaquin Quiñonero Candela. 2014. Practical Lessons from Predicting Clicks on Ads at Facebook. In Proceedings of the Eighth International Workshop on Data Mining for Online Advertising (ADKDD’14). [direct pdf link]
H. Brendan McMahan, Gary Holt, D. Sculley, Michael Young, Dietmar Ebner, Julian Grady, Lan Nie, Todd Phillips, Eugene Davydov, Daniel Golovin, Sharat Chikkerur, Dan Liu, Martin Wattenberg, Arnar Mar Hrafnkelsson, Tom Boulos, and Jeremy Kubica. Ad click prediction: a view from the trenches. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD ‘13) [direct pdf link]
D. Sculley, Matthew Eric Otey, Michael Pohl, Bridget Spitznagel, John Hainsworth, and Yunkai Zhou. Detecting adversarial advertisements in the wild. KDD’11 [direct pdf link]

Eager Materialization of Predictions [Noah Golmant]

M. Levent Koc and Christopher Re. 2011. Incrementally Maintaining Classification Using an RDBMS. Proc. VLDB Endow. 4, 5 (February 2011). [direct pdf link]

Optional Reading

Amol Deshpande and Samuel Madden. MauveDB: Supporting Model-based User Views in Database Systems. This earlier work introduces the idea of using views to encapsulate models and their predictions in a coherent interface that is composable with other data systems.

Optional Reading on Machine Learning Applications

Cascades for fast predictions: Paul Viola and Michael Jones. Rapid Object Detection using a Boosted Cascade of Simple Features. CVPR 2001. This landmark paper introduced a simple but efficient and accurate face detection algorithm that is widely used. I have included this classic paper because it illustrates the advantages of cascades.
- Viola and Jones inspired later work by David Weiss and Ben Taskar on the design of structured prediction cascades, which address the challenges of graphical model inference.
Speech recognition: Xuedong Huang, James Baker, and Raj Reddy. A Historical Perspective of Speech Recognition. This survey summarizes developments in speech recognition, an actively deployed area of prediction serving. Unfortunately, while the article discusses the key developments in speech recognition, it only briefly discusses developments in serving speech recognition models.
- Baidu Research Deep Speech 2: End-to-End Speech Recognition in English and Mandarin. This is a fairly comprehensive description of both the model and systems used by Baidu for their English and Mandarin ASR systems. To the best of my knowledge this is the most complete public description of a state-of-the-art ASR system.
- Anuj Kumar, Anuj Tewari, Seth Horrigan, Matthew Kam, Florian Metze, and John Canny. Rethinking Speech Recognition on Mobile Devices. This paper explores the challenges of speech recognition in the developing world on mobile devices with limited connectivity.
- Hosted speech service: Google’s Cloud Speech API. Any papers on this system?
Hardware for Deep Learning Inference:
- GPU-Based Deep Learning Inference: A Performance and Power Analysis. This NVIDIA whitepaper looks at the performance and power implications of running the AlexNet model on various architectures. A key takeaway is the importance of batching for reaching the peak achievable throughput of these accelerators.
- Google’s Tensor Processing Unit (TPU):
  - IEEE Spectrum Article: Google Translate Gets a Deep-Learning Upgrade
  - Technical paper on the TPU: Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
  - Google’s Tensor Processing Unit explained: this is what the future of computing looks like
  - Google supercharges machine learning tasks with TPU custom chip
Security and Prediction Serving:

Florian Tramèr, Fan Zhang, Ari Juels, Michael K. Reiter, and Thomas Ristenpart. Stealing Machine Learning Models via Prediction APIs.

Questions