CNAM and Huawei ran a joint seminar (for CNAM master students) and webinar (for students of the AI4CI European master) on GenAI infrastructure design and traffic management. The ongoing NET4AI activities on GenAI traffic modeling and scheduling were presented, and the ideas were discussed and challenged with the students in a 3-hour session.

More details on the seminar are below. Students from CNAM Paris, CNAM en Grand Est, Universitat Politecnica de Catalunya (UPC, Spain), the National Technical University of Ukraine (NTUU), Avignon Université, the University of Ulm (Germany) and Universitatea Babes-Bolyai (UBB, Romania) took part actively and asked plenty of questions!

A keynote in this area is under preparation for ICIN 2026.

Date: 1/12, 9:30 am – 1 pm CET

Speakers: Davide Avesani (CNAM), Massimo Gallo & Paolo Medagliani (Huawei)

The life of a token: from words to bits on the wire

Abstract:

Large Language Models (LLMs) are reshaping our world, but how do they actually function? While it is common knowledge that Graphics Processing Units (GPUs) are the key ingredient for LLMs, a look “under the hood” reveals complex requirements: models must be trained efficiently to ensure quality and served (inference) with low latency to guarantee a good user experience. Furthermore, as model sizes exceed the capacity of a single GPU, parallelism strategies become essential. The goal is to keep all GPUs busy at maximum capacity while avoiding idle time, which requires tight synchronization and generates a high volume of communication. Achieving this efficiently remains a complex and ongoing research challenge.

The first part of this talk explores the origins and structure of this communication, with the goal of characterizing and estimating the expected network traffic. We will look at how an LLM is built, how the transformer chain works, how training data is turned into something the model can understand, and what information is passed through different model stages. Building on this foundation, we will review the main parallelization strategies for LLMs and how each one impacts communication patterns and network load.
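To give a flavor of how a parallelization strategy translates into network load, the back-of-the-envelope sketch below estimates the activation traffic that crosses a pipeline-parallel stage boundary. The model dimensions are illustrative assumptions for this post, not figures from the talk.

```python
# Back-of-the-envelope estimate of inter-stage traffic under pipeline
# parallelism: each microbatch crosses a stage boundary as one activation
# tensor of shape [batch, seq_len, hidden]. All dimensions below are
# illustrative assumptions.

def pipeline_activation_bytes(batch, seq_len, hidden, dtype_bytes=2):
    """Bytes sent from one pipeline stage to the next per microbatch."""
    return batch * seq_len * hidden * dtype_bytes

# Example: microbatch of 1, a 4096-token context, hidden size 8192, fp16.
per_microbatch = pipeline_activation_bytes(1, 4096, 8192)
print(per_microbatch // 2**20, "MiB")  # 64 MiB per microbatch, per boundary
```

Multiplied by the number of microbatches in flight and the number of stage boundaries, even this single term shows why inter-GPU links become a first-order design constraint.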

The second part of the session focuses on collective communication, the set of primitives that coordinate data exchange between GPUs. In this part, we will introduce the different types of collective communication used in both training and inference, and examine the network limitations and overheads they introduce, shedding light on how to measure performance and optimize data exchange between GPUs. Finally, we will explore how these collectives can be extended across multiple data centers over the WAN for distributed training and inference.
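To make the notion of a collective concrete, here is a toy, pure-Python simulation of ring all-reduce, the reduce-scatter + all-gather scheme used by NCCL-style libraries to sum gradients across GPUs. Ranks are plain Python lists here, not real devices; this is a sketch of the algorithm's data movement, not an implementation from the talk.

```python
# Toy simulation of ring all-reduce: n ranks arranged in a ring, each
# holding one vector; at the end every rank holds the element-wise sum.

def ring_allreduce(data):
    """data: one equal-length vector per rank (length divisible by the
    number of ranks). Returns the per-rank result vectors."""
    n = len(data)
    k = len(data[0]) // n
    # Split every rank's vector into n chunks of size k.
    chunks = [[vec[i * k:(i + 1) * k] for i in range(n)] for vec in data]

    # Phase 1 -- reduce-scatter: in step s, rank r sends chunk (r - s)
    # to its ring neighbour (r + 1), which accumulates it. After n - 1
    # steps, rank r holds the complete sum of chunk (r + 1) % n.
    for s in range(n - 1):
        for r in range(n):
            src = (r - s) % n
            dst = (r + 1) % n
            chunks[dst][src] = [a + b for a, b in
                                zip(chunks[dst][src], chunks[r][src])]

    # Phase 2 -- all-gather: the completed chunks circulate once around
    # the ring; receivers simply overwrite their stale copies.
    for s in range(n - 1):
        for r in range(n):
            src = (r + 1 - s) % n
            dst = (r + 1) % n
            chunks[dst][src] = list(chunks[r][src])

    # Reassemble each rank's full vector.
    return [[x for chunk in rank for x in chunk] for rank in chunks]

# 4 simulated ranks, each holding the vector [r, r, ..., r].
result = ring_allreduce([[float(r)] * 8 for r in range(4)])
print(result[0])  # every rank ends with [6.0, 6.0, ..., 6.0]
```

Each rank transmits roughly 2(n-1)/n times its data volume in total, which is why ring all-reduce is close to bandwidth-optimal, while its step count grows linearly with the number of ranks, one reason measuring and optimizing these collectives matters at scale.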
