machine learning, software engineering, artificial intelligence

skrub like a pro: clean, prepare, and transform your data faster

 Module deeptech 
 NEW! 
skrub is a Python package that bridges dataframe libraries such as pandas and machine-learning libraries such as scikit-learn. This course introduces the library's main features and shows how skrub simplifies the path from raw data to machine-learning models.

©Inria / Photo C. Morel

Session:

No session is currently available.

Contact us!

Objectives

  • Understand the main features of skrub and how it slots into a typical machine-learning pipeline.
  • Learn the use cases for skrub transformers, and how they can simplify certain data preparation tasks.
  • Apply and combine the objects provided by the skrub library to various scenarios.

Prerequisites

  • Basic Python programming.
  • Basic pandas and scikit-learn knowledge.
  • Experience with Jupyter notebooks is helpful, but not required.
  • Participants without basic scikit-learn knowledge should take the scikit-learn beginner course first.

Program

This introductory course covers the main features of skrub by building a full machine-learning pipeline, from data exploration to model training. It presents use cases where the skrub API reduces the user's workload by simplifying common data preparation operations. The course includes a high-level overview of the library with practical explanations and examples; dedicated time slots are set aside for exercises that put the concepts into practice.

SKILLS YOU’LL GAIN:
  • Explore and diagnose tabular data with TableReport.
  • Clean and engineer features using skrub transformers.
  • Build end-to-end scikit-learn pipelines with skrub and compare performance to baseline setups.

Instructor(s)

  • Riccardo Cappuzzo

    Riccardo Cappuzzo holds a dual master’s degree: one in Computer Systems Security (Telecom Paris, 2018) and another in Communications and Computer Networks Engineering (Politecnico di Torino). He earned his PhD in Computer Science from Sorbonne Université, where his research focused on automated methods for cleaning tabular data.
    Currently, he serves as the lead developer of the skrub Python library and is a member of the SODA Team at Inria. His work involves developing new features for the library and promoting its adoption through public outreach.

    ©Private collection

Target audience

IT developers, engineers, data scientists and data analysts.

Teaching methods

  • The course combines theoretical sections with practical exercises.
  • Participants will test their understanding through quizzes and corrected assignments.
  • All course material and resources will be provided to support independent learning after the course.

Practical information

  • Launch offer: €650 per participant.
  • Free for beta testers of the December 16, 2025 session.
  • Group discounts: 10% off for groups of 5 to 9 participants; 20% off for groups of 10 or more.
  • Partnership with the Aktantis Cluster: members of the Aktantis cluster benefit from a special rate of €500 per participant.
  • Duration: 1 day (6 hours: 9:00 a.m. to 12:00 p.m. and 2:00 p.m. to 5:00 p.m.).
  • Format: Online (remote session).
  • Group size: Up to 12 participants.
  • Language: English.
  • Private sessions: Private training sessions can be organized for companies with 10 or more participants.

Program P16

This training was developed with the support of Program P16 (led by Inria), which aims to strengthen the digital sovereignty of France and Europe in the field of AI by developing open, interoperable software libraries covering the full data cycle.