Dejan Kostic (Link to Full CV)


Dejan Kostic
Chair Professor of Internetworking at the KTH Royal Institute of Technology. Associated with the Connected Intelligence unit of RISE (Research Institutes of Sweden), and a Wallenberg AI, Autonomous Systems and Software Program (WASP) faculty member. Main research interests are: Distributed Systems, Computer Networks, Operating Systems, and Mobile Computing.

News:

  • Community Award at NSDI '22 for our Packet Order Matters! paper! Also in the news at KTH EECS and Ericsson.
  • RedN is in the news at TechXplore and HPCWire!
  • Best Paper Award at SOSP 2021 for our LineFS paper! Also in the news at KTH EECS.
  • Google Ph.D. Fellowship (2021) for Alireza Farshin!
  • PacketMill is in the news in the Ericsson Blog; our load-balancing work (RSS++, Cheetah, and CrossRSS) is in the news at KTH EECS; and CacheDirector is in the news (TechXplore, Ericsson, KTH, KTH EECS).
  • Metron is in the news! ACM TechNews, PHYS.ORG, and KTH.
  • Open positions

    I am completing the recruiting process for multiple FPGA research engineers within my European Research Council (ERC) Consolidator Project ULTRA (the ERC is the flagship research funding agency in Europe, and it is funding this project with 2 million EUR). I am also seeking applications for multiple postdoc positions in Networked Systems for Machine Learning.

    In this project we want to dramatically change the way Internet services are constructed. We have already started to achieve significant gains in efficiency (about 7x), which come with substantial improvements in bandwidth (reaching and exceeding 100 Gbps with commodity server hardware) and latency (low, and with low variance). As you can imagine, services like this are crucial for society. You can get a more detailed view of the type of work we do by looking at our Metron NSDI '18, Kurma SoCC '18, RSS++ CoNEXT '19, Cheetah NSDI '20, PacketMill ASPLOS '21, and Packet Order Matters! NSDI '22 papers. More information is available in the text below and, especially, on the ULTRA project blog.

    Research

    My recent focus is on using machine learning for systems, and building systems for machine learning. One key example is our project on Scalable Federated Learning (SFL). This project aims to develop a highly scalable, flexible, extensible, distributed federated machine learning approach that can directly benefit public health and wellness. More details are available in the SFL blog. In our recent CFD paper we demonstrate our ability to perform dropout on image recognition models using coding theory (e.g., using Gold codes), resulting in more than 2X network bandwidth savings when used for Federated Learning. Our DeepGANTT IPSN '23 paper demonstrates our ability to apply transformers to graph neural networks for scaling up an IoT scheduling problem 10X beyond what a constraint optimization solver can solve in a reasonable time.

    I remain committed to working on low-latency networked systems, with my biggest effort in the 2018-2024 period being my ERC Consolidator Project called ULTRA. In this single-PI, 2-million-EUR project, we want to dramatically change the way Internet services are constructed. The project has started well: in our SoCC 2018 paper, we present Kurma, our fast and accurate load balancer for geo-distributed storage systems. We have also collaborated with Ericsson on our CacheDirector work, published at EuroSys 2019. The work on Metron (NFV service chains at the true speed of the underlying hardware) led to and inspired this project. Our CoNEXT 2019 paper, RSS++, shows how load balancing with Receive Side Scaling (RSS) can be improved to yield large increases in efficiency. In our NSDI 2020 paper, we introduced Cheetah, a new load balancer that solves the difficult challenge of remembering which server is serving which connection, without trading off uniform load balancing against efficiency. In our ASPLOS 2021 paper, we showed that vertically integrating a network protocol stack enables a single CPU core to forward packets faster than 100 Gbps. More details are available at the PacketMill project page. The Metron journal article is now also available; it shows how high-performance NFV service chaining can be done even in the presence of blackboxes. Our NSDI 2022 paper Packet Order Matters shows a surprising result: deliberately delaying packets can improve the performance of backend servers (e.g., those used for Network Function Virtualization) by up to a factor of 2! We show three different scenarios in which our Reframer can be deployed.

    In our OSDI 2020 paper on Assise, we provide a blueprint of how a next-generation distributed file system should be built on top of NVRAM, which is becoming widely available, and achieve large performance gains. Our LineFS paper (Best Paper Award at SOSP '21) builds upon this work by offloading CPU-intensive tasks to a SmartNIC (BlueField-1 in our case), for about an 80% performance improvement across the board. Our RedN NSDI 2022 paper shows another surprising result, namely that Remote Direct Memory Access (RDMA), as implemented in widely deployed RDMA Network Interface Cards, is Turing complete! We leverage this finding to reduce the tail latency of services running on busy servers by 35x!

    A major focus area for my research group was Time-Critical Clouds, a 2016-2022 project supported by SSF (the Swedish Foundation for Strategic Research) with 27 M SEK (~2.7 M EUR). This was a joint effort with the Connected Intelligence group of RISE AB. Our first major contribution is a EuroSys 2017 paper (link to video and paper available here). In this work we reduce the tail latency in key-value stores by up to 1.9x by scheduling multiget requests more efficiently. We have also shown how to run NFV service chains at the true speed of the underlying hardware in our NSDI '18 paper. In our EuroSys 2019 paper, we unlocked a performance-enhancing feature that had existed in Intel processors for almost a decade. In our USENIX ATC 2020 paper, we reexamine Direct Cache Access (DCA) to optimize I/O-intensive applications for multi-hundred-gigabit networks. In our PAM 2021 paper, we show that the forwarding throughput of widely deployed programmable Network Interface Cards (NICs) sharply degrades when i) the forwarding plane is updated and ii) packets match multiple forwarding tables in the NIC.

    We have also concluded our work on the PROPHET ERC project (2010-2016), in which we aimed to dramatically change the way networked systems are developed and deployed. For example, we improved the performance of geo-replicated storage systems using GeoPerf [SoCC '15]. We have successfully applied software verification techniques to increase the reliability of Software-Defined Networks (SDN). Some of our key contributions to testing OpenFlow networks are NICE [NSDI '12] and SOFT [CoNEXT '12]. We have identified serious issues in the interplay between the control and data planes in OpenFlow switches [PAM '15], and proposed an approach for verifying rule installation [CoNEXT '14] as well as fine-grained dynamic monitoring of switch data planes [CoNEXT '15]. Extended versions of these contributions are now available as IEEE/ACM TON and Elsevier Computer Networks journal publications. My work in the Wallenberg AI, Autonomous Systems and Software Program (WASP) as a WASP faculty member (advising an industrial PhD student at Ericsson, Amir Roozbeh) is complementary to these efforts.

    We have wrapped up our work in the BEhavioral-BAsed Forwarding (BEBA) Horizon 2020 project (2014-2017), which aimed to reshape Software-Defined Networks. Our contributions are described in George Katsikas' licentiate thesis, and involve deep understanding and performance optimization of Network Functions Virtualization (NFV) service chains. Moreover, our recent work on Synthesized Network Functions demonstrates high throughput with low, predictable latency on a single commodity server, thanks to its highly synthesized code and request dispatching. The overall project was recently highlighted by the EU Commission.

    Selected Recent Service

    ERC Starting Grant 2023 PE6 Panel Chair
    ERC Starting Grant 2021 PE6 Panel Chair
    ERC Starting Grant 2019 PE6 Panel member
    ERC Starting Grant 2017 PE6 Panel member

    TPC Co-chair of SYSTOR 2023
    TPC Co-chair of EuroSys 2020
    TPC Co-chair of ICDCS 2018 "Cloud Computing and Data Centers"
    TPC Co-chair of CoNEXT 2016

    TPC Member for NSDI '25
    TPC Member for ASPLOS '24, OSDI '24, SIGCOMM '24
    TPC Member for NSDI '23
    TPC Member for ASPLOS '22, NSDI '22, OSDI '22, SYSTOR '22
    TPC Member for NSDI '21, SOSP '21
    TPC Member for NSDI '20
    TPC Member for SIGCOMM '19, USENIX ATC '19, NSDI '19
    TPC Member for EuroSys '18, EuroSys '17, CoNEXT '18, CoNEXT '17
    TPC Member for OSDI '16, CoNEXT '15, SoCC '15, NSDI '15

    Main Publications

    Please see the complete list of publications below for full author lists. (An auto-generated publication list from the DiVA repository is also available.)

  • "A High-Speed Stateful Packet Processing Approach for Tbps Programmable Switches" NSDI 2023.
  • "Packet Order Matters! Improving Application Performance by Deliberately Delaying Packets", NSDI 2022. Community Award!
  • "RDMA is Turing complete, we just did not know it yet!", NSDI 2022.
  • "LineFS: Efficient SmartNIC Offload of a Distributed File System with Pipeline Parallelism", SOSP 2021. Best Paper Award!
  • "PacketMill: Toward per-core 100-Gbps Networking", ASPLOS 2021.
  • "Assise: Performance and Availability via Client-local NVM in a Distributed File System", USENIX OSDI 2020.
  • "Reexamining Direct Cache Access to Optimize I/O Intensive Applications for Multi-hundred-gigabit Networks", USENIX ATC 2020.
  • "A High-Speed Load-Balancer Design with Guaranteed Per-Connection-Consistency", NSDI 2020.
  • "RSS++: load and state-aware receive side scaling", CoNEXT 2019.
  • "Make the Most out of Last Level Cache in Intel Processors", EuroSys 2019.
  • "Fast and Accurate Load Balancing for Geo-Distributed Storage Systems", SoCC 2018.
  • "Metron: NFV Service Chains at the True Speed of the Underlying Hardware", NSDI 2018.
  • "Rein: Taming Tail Latency in Key-Value Stores via Multiget Scheduling", EuroSys 2017.
  • "Monocle: Dynamic, Fine-Grained Data Plane Monitoring", CoNEXT 2015.
  • "The Nearest Replica Can Be Farther Than You Think", SoCC 2015.
  • "What You Need to Know About SDN Flow Tables", PAM 2015.
  • "Providing Reliable FIB Update Acknowledgments in SDN", CoNEXT 2014.
  • "DeepDive: Transparently Identifying and Managing Performance Interference in Virtualized Environments", USENIX ATC 2013.
  • "A SOFT Way for OpenFlow Switch Interoperability Testing", CoNEXT 2012.
  • "A NICE Way to Test OpenFlow Applications", NSDI 2012.
  • "DejaVu: Accelerating Resource Allocation in Virtualized Environments", ASPLOS 2012.
  • "Identifying and Using Energy-Critical Paths", CoNEXT 2011.
  • "Insomnia in the Access (or How to Curb Access Network Related Energy Consumption)", SIGCOMM 2011.
  • "CrystalBall: Predicting and Preventing Inconsistencies in Deployed Distributed Systems", NSDI 2009.
  • "Staged Deployment in Mirage, an Integrated Software Upgrade Testing and Distribution System", SOSP 2007.
  • "Maintaining High Bandwidth under Dynamic Network Conditions", USENIX ATC 2005.
  • "FUSE: Lightweight Guaranteed Distributed Failure Notification", OSDI 2004.
  • "MACEDON: Methodology for Automatically Creating, Evaluating, and Designing Overlay Networks", NSDI 2004.
  • "Bullet: High Bandwidth Data Dissemination Using an Overlay Mesh", SOSP 2003.
  • "Using Random Subsets to Build Scalable Network Services", USITS 2003.
  • "Scalability and Accuracy in a Large-Scale Network Emulator", OSDI 2002.
    Journal Publications

    Conference and Workshop Publications

    Current and past students

    I am advising several doctoral students at KTH:


    Some of my students at KTH have already defended their licentiate theses (a degree half-way to the doctoral degree in Sweden):

    My Full CV contains the list of Master's projects that I have supervised and/or examined.

    Teaching

    At KTH, I teach (or have taught):

    Short biography

    Dejan Kostic obtained his Ph.D. in Computer Science from Duke University. He spent the last two years of his studies, and a brief stay as a postdoctoral scholar, at the University of California, San Diego. He received his Master of Science degree in Computer Science from the University of Texas at Dallas, and his Bachelor of Science degree in Computer Engineering and Information Technology from the University of Belgrade (ETF), Serbia. From 2006 until 2012 he worked as a tenure-track Assistant Professor at the School of Computer and Communication Sciences at EPFL (École Polytechnique Fédérale de Lausanne), Switzerland. In 2010, he received a European Research Council (ERC) Starting Investigator Award. From 2012 until June 2014, he worked at the IMDEA Networks Institute (Madrid, Spain) as a Research Associate Professor with tenure. He has been a Professor of Internetworking at KTH since April 2014. In 2017, he received a European Research Council (ERC) Consolidator Award.

    Contact

    d m k <at> k t h  <dot> s e

    Office phone# +46 8-790 42 65

    Mailing address


    Prof. Dejan Kostic
    KTH Kista
    Kistagangen 16
    164 40 Kista Sweden


    How to reach my office

    My office is room 4401 in the Electrum Building on the KTH Kista campus, East side, entering from Elevator B on the 4th floor. Approximate coordinates (on Google Maps): 59.404850, 17.949922

    The best way to enter the Electrum building is from Kistagangen 16, 164 40 Kista, Sweden. Another entrance, on a lower level and harder to find, is at Isafjordsgatan 26, 164 40 Kista, Sweden.

    Getting here from Stockholm Arlanda Airport: a convenient way of getting to KTH Kista is to catch the commuter train from Arlanda Airport (but NOT the Arlanda Express train!) to Helenelund station. You need to go to Arlanda C in Terminal 5 to board the train, and please expect to pay an airport supplement (roughly 85 SEK, though prices are gradually increasing). Example Google Maps itinerary from the airport (the entrance to the Electrum building is a bit inconspicuous, through the sliding doors).

    Link to Google Maps

    Getting here from downtown Stockholm: the best option is to take the Blue Line metro toward Akalla and get off at Kista T-Bana (next to the Galleria shopping mall). Then follow the signs for Kistamassan, walking up the street called Kistagangen. You will reach KTH Kista very quickly (well before Kistamassan itself).



    Personal

    I love taking photos of Stockholm, and my larger portfolio is here: https://pixels.com/profiles/dejan-kostic?tab=artwork.
    You can also follow me (dmkostic) on Instagram and Twitter. My LinkedIn profile is here.