Introduction to Koalas: pandas API on Apache Spark and MLflow


Time: 3:00-4:30 PM EST

Workshop Leaders:

Rexwell Minnis, Director of Software Engineering, Capital One

Andrew Gertsog, Sr Lead ML Engineer, Capital One

Pramod Lahoti, Master Software Engineer, Capital One

Arunadevi Inamdar, Senior Data Engineer, Capital One

This workshop is intended to demystify machine learning with Apache Spark. The biggest barrier people face is Spark's distributed nature, which scares many of them away. We'll show that there's nothing to be afraid of, and that the transition to machine learning at scale can be fairly smooth through a set of well-known APIs that are considered industry standards. Koalas provides the familiar pandas API on top of Spark, so data engineers and data scientists who don't work in Scala or Java can become productive very quickly. Combining Spark's big data capabilities with its machine learning features lets data scientists run algorithms on larger datasets and process data closer to real time, enabling real-time analysis.


All workshop attendees, please create a Databricks Community Edition account before attending the workshop.


This session is sponsored by Capital One.

There is no need to register for this individual session. To register for Datapalooza, go here.