Wednesday, February 15, 2023, 8:00 AM - 11:00 AM


***NOTE:  To accommodate a larger audience, this event has moved to a virtual format with a new date/time***


Moving data, transforming data types, taking small samples so they’ll fit in your sandbox – these are all things every data scientist puts up with as routine. And when you’re finished, a data engineer has to build full production pipelines to reproduce all that work at scale. It can take months, sometimes a year or more, before the company sees any benefit from your work. But what if you could leave all the data where it is and analyze it in place?

What if you could jump straight to the meat of the work, and when you’re finished, a single line of code would push it all into production?

In this presentation, you will see how to use familiar Python code, in the style of pandas and scikit-learn, to build data science pipelines at scale on real-world use cases. Examples include predicting household energy consumption from smart meter data, cell tower predictive maintenance, and more.


  • Modern in-database machine learning – what it is, how it works, and why it’s worth using
  • How to use Python code and a Jupyter notebook inside a database
  • How to manage, train, and evaluate models inside a database
  • What makes your model ready for production and how to get it there