Scaling Distributed Machine Learning with In-Network Aggregation
Marco Canini,
KAUST: King Abdullah University of Science and Technology –
Abstract:
Training complex machine learning models in parallel is an increasingly important workload. We accelerate distributed parallel training by designing a communication primitive that uses a programmable switch dataplane to execute a key step of the training process. Our approach reduces the volume of exchanged data by aggregating the model updates from multiple workers in the network. We co-design the switch processing with the end-host protocols and ML frameworks to provide a robust, efficient solution that speeds up training by up to 310%, and at least by 20% in most cases for a number of real-world benchmark models.
Bio
Marco does not know what the next big thing will be. But he’s sure that our next-gen computing and networking infrastructure must be a viable platform for it and avoid stifling innovation. Marco’s research area is cloud computing, distributed systems and networking. His current interest is in designing better systems support for AI/ML and provide practical implementations deployable in the real-world.
Marco is an associate professor in Computer Science at KAUST. Marco obtained his Ph.D. in computer science and engineering from the University of Genoa in 2009 after spending the last year as a visiting student at the University of Cambridge, Computer Laboratory. He was a postdoctoral researcher at EPFL from 2009 to 2012 and after that a senior research scientist for one year at Deutsche Telekom Innovation Labs & TU Berlin. Before joining KAUST, he was an assistant professor at the UCLouvain. He also held positions at Intel, Microsoft and Google.
Date: 2019-Nov-15 Time: 15:00:00 Room: 020
For more information:
Upcoming Events
Mathematics, Physics & Machine Learning Seminar Series (Online)

The Mathematics, Physics & Machine Learning seminar series has started on October 2020 and runs until March 2021.
The seminars aim to bring together mathematicians and physicists interested in machine learning (ML) with ML and AI experts interested in mathematics and physics, with the goal of introducing innovative Mathematics and Physics-inspired techniques in Machine Learning and, reciprocally, applying Machine Learning to problems in Mathematics and Physics.
Attendance is free but registration is required.
More information is available here.
International European Conference on Parallel and Distributed Computing

The 27th International European Conference on Parallel and Distributed Computing (Euro-Par 2021) will take from August 30 to September 3 2021 in Lisbon.
Euro-Par is the prime European conference covering all aspects of parallel and distributed processing, ranging from theory to practice, from small to the largest parallel and distributed systems and infrastructures, from fundamental computational problems to full-fledged applications, from architecture, compiler, language and interface design and implementation, to tools, support infrastructures, and application performance aspects.
The 2021 edition of Euro-Par will be organized as a collaboration between INESC-ID and Instituto Superior Técnico (IST).
Important Dates:
– Abstract Submission: February 5, 2021
– Paper Submission Deadline: February 12, 2021
– Author Notification: April 30, 2021
– Camera-Ready Papers: June 6, 2021
More information is available here.