Next deeptech session: skrub like a pro: clean, prepare, and transform your data faster, February 23 – NEW

machine learning, software engineering, artificial intelligence

skrub like a pro: clean, prepare, and transform your data faster

Deeptech module – NEW
skrub is a Python package that bridges dataframe libraries such as pandas and machine-learning libraries such as scikit-learn. This course introduces the main features of the library and shows how skrub simplifies the path from raw data to machine-learning models.

©Inria / Photo C. Morel

Session:

No session currently available.

Contact us!

Objectives

  • Understand the main features of skrub and how it slots into a typical machine-learning pipeline.
  • Learn the use cases for skrub transformers, and how they can simplify certain data preparation tasks.
  • Apply and combine the objects provided by the skrub library to various scenarios.

Prerequisites

  • Basic Python programming.
  • Basic pandas and scikit-learn knowledge.
  • Experience with Jupyter notebooks is helpful, but not required.
  • For participants without basic scikit-learn knowledge, the scikit-learn beginner course is a prerequisite for this course.

Program

This introductory course covers the main features of skrub by building a full machine-learning pipeline, from data exploration to model training. It describes use cases where the skrub API lightens the load on the user by simplifying common data preparation operations. The course includes a high-level overview of the library with practical explanations and examples, and dedicated time slots let participants practice the notions they have learned.

SKILLS YOU’LL GAIN:
  • Explore and diagnose tabular data with TableReport.
  • Clean and engineer features using skrub transformers.
  • Build end-to-end scikit-learn pipelines with skrub and compare performance to baseline setups.

Instructor

  • Riccardo Cappuzzo

    Riccardo Cappuzzo holds a dual master’s degree: one in Computer Systems Security (Telecom Paris, 2018) and another in Communications and Computer Networks Engineering (Politecnico di Torino). He earned his PhD in Computer Science from Sorbonne Université, where his research focused on automated methods for cleaning tabular data.
    Currently, he serves as the lead developer of the skrub Python library and is a member of the SODA Team at Inria. His work involves developing new features for the library and promoting its adoption through public outreach.

    © Private collection

Target audience

IT developers, engineers, data scientists and data analysts.

Teaching methods

  • The course combines theoretical sections with practical exercises.
  • Participants will test their understanding through quizzes and corrected assignments.
  • All course material and resources will be provided to support independent learning after the course.

Practical information

The free beta-tester session of December 16, 2025 is full.

  • Launch offer: €650 per participant.
  • Group discounts: 10% off for groups of 5 to 9 participants; 20% off for groups of 10 or more.
  • Partnership with the Aktantis Cluster: members of the Aktantis cluster benefit from a special rate of €500 per participant.
  • Duration: 1 day (6 hours, from 9:00 a.m. to 12:00 p.m. and from 2:00 p.m. to 5:00 p.m.)
  • Format: Online (remote session)
  • Group size: Up to 12 participants
  • Language: English
  • Private sessions: Private training sessions can be organized for companies with 10 or more participants.

Program P16

This training was developed with the support of Program P16 (led by Inria), which aims to strengthen the digital sovereignty of France and Europe in the field of AI by developing open, interoperable software libraries covering the full data cycle.