Democratizing Data Science requires a fundamental rethinking of the way data analytics and model discovery are done. Available tools for analyzing massive data sets and curating machine learning models (e.g., R and Spark) are limited in a number of fundamental ways. First, existing tools require well-trained data scientists to select the appropriate techniques to build models and to evaluate their outcomes. Second, existing tools require heavy data preparation steps (e.g., building indexes, cleaning data from errors) and are often too slow to give interactive feedback to domain experts in the model building process, severely limiting the possible interactions. Third, current tools do not provide adequate analysis of statistical risk factors in model development.
In this project, we outline our vision for a new breed of systems designed to enable interactive data analytics and machine learning (ML). The main goal is to develop the first system for Quality-aware Interactive Curation of Models, called QuIC-M (pronounced quick-m). Using QuIC-M, we will enable domain experts to build models themselves without the need to involve a data scientist.