Communications and Networking Research Group


PUBLICATIONS

JOURNAL ARTICLES

134. Vishrant Tripathi, Nick Jones, Eytan Modiano, “Fresh-CSMA: A Distributed Protocol for Minimizing Age of Information,” IEEE Journal of Communications and Networks, 2024.

133. Bai Liu, Quang Nguyen, Qingkai Liang, Eytan Modiano, “Tracking Drift-Plus-Penalty: Utility Maximization for Partially Observable and Controllable Networks,” IEEE/ACM Transactions on Networking, 2024.

132. Xinzhe Fu, Eytan Modiano, “Optimal Routing to Parallel Servers with Unknown Utilities – Multi-armed Bandit With Queues,” IEEE/ACM Transactions on Networking, January 2022.

131. Bai Liu, Qingkai Liang, Eytan Modiano, “Tracking MaxWeight: Optimal Control for Partially Observable and Controllable Networks,” IEEE/ACM Transactions on Networking, August 2023.

130. Xinzhe Fu, Eytan Modiano, “Joint Learning and Control in Stochastic Queueing Networks with Unknown Utilities,” Proceedings of the ACM on Measurement and Analysis of Computing Systems, 2023.

129. Vishrant Tripathi, Rajat Talak, Eytan Modiano, “Information Freshness in Multi-Hop Wireless Networks,” IEEE/ACM Transactions on Networking, April 2023.

128. Xinzhe Fu, Eytan Modiano, “Learning-NUM: Network Utility Maximization with Unknown Utility Functions and Queueing Delay,” IEEE/ACM Transactions on Networking, 2022.

127. Bai Liu, Qiaomin Xie, Eytan Modiano, “RL-QN: A Reinforcement Learning Framework for Optimal Control of Queueing Systems,” ACM Transactions on Modeling and Performance Evaluation of Computing Systems (TOMPECS), 2022.

126. Xinzhe Fu and E. Modiano, “Elastic Job Scheduling with Unknown Utility Functions,” Performance Evaluation, 2021.

125. Bai Liu and E. Modiano, “Optimal Control for Networks with Unobservable Malicious Nodes,” Performance Evaluation, 2021.

124. Vishrant Tripathi, Rajat Talak, Eytan Modiano, “Age Optimal Information Gathering and Dissemination on Graphs,” IEEE Transactions on Mobile Computing, April 2021.

123. Xinyu Wu, Dan Wu, Eytan Modiano, “Predicting Failure Cascades in Large Scale Power Systems via the Influence Model Framework,” IEEE Transactions on Power Systems, 2021.

122. Roy D. Yates, Yin Sun, D. Richard Brown III, Sanjit K. Kaul, Eytan Modiano and Sennur Ulukus, “Age of Information: An Introduction and Survey,” IEEE Journal on Selected Areas in Communications, February 2021.

121. Jianan Zhang, Abhishek Sinha, Jaime Llorca, Antonia Tulino, Eytan Modiano, “Optimal Control of Distributed Computing Networks with Mixed-Cast Traffic Flows,” IEEE/ACM Transactions on Networking, 2021.

120. Thomas Stahlbuhk, Brooke Shrader, Eytan Modiano, “Learning Algorithms for Minimizing Queue Length Regret,” IEEE Transactions on Information Theory, 2021.

119. Thomas Stahlbuhk, Brooke Shrader, Eytan Modiano, “Throughput Maximization in Uncooperative Spectrum Sharing Networks,” IEEE/ACM Transactions on Networking, Vol. 28, No. 6, December 2020.

118. Thomas Stahlbuhk, Brooke Shrader, Eytan Modiano, “Learning Algorithms for Scheduling in Wireless Networks with Unknown Channel Statistics,” Ad Hoc Networks, Vol. 85, pp. 131-144, 2019.

117. Rajat Talak, Eytan Modiano, “Age-Delay Tradeoffs in Queueing Systems,” IEEE Transactions on Information Theory, 2021.

116. Rajat Talak, Sertac Karaman, Eytan Modiano, “Improving Age of Information in Wireless Networks with Perfect Channel State Information,” IEEE/ACM Transactions on Networking, Vol. 28, No. 4, August 2020.

115. Igor Kadota and Eytan Modiano, “Minimizing the Age of Information in Wireless Networks with Stochastic Arrivals,” IEEE Transactions on Mobile Computing, 2020.

114. Rajat Talak, Sertac Karaman, Eytan Modiano, “Optimizing Information Freshness in Wireless Networks under General Interference Constraints,” IEEE/ACM Transactions on Networking, Vol. 28, No. 1, February 2020.

113. X. Fu and E. Modiano, “Fundamental Limits of Volume-based Network DoS Attacks,” Proceedings of the ACM on Measurement and Analysis of Computing Systems, Vol. 3, No. 3, December 2019.

112. Rajat Talak, Sertac Karaman, Eytan Modiano, “Capacity and Delay Scaling for Broadcast Transmission in Highly Mobile Wireless Networks,” IEEE Transactions on Mobile Computing, 2019.

111. Abhishek Sinha and Eytan Modiano, “Throughput-Optimal Broadcast in Wireless Networks with Point-to-Multipoint Transmissions,” IEEE Transactions on Mobile Computing, Vol. 19, No. 9, September 2020.

110. Yu-Pin Hsu, Eytan Modiano, Lingjie Duan, “Scheduling Algorithms for Minimizing Age of Information in Wireless Broadcast Networks with Random Arrivals,” IEEE Transactions on Mobile Computing, Vol. 19, No. 12, December 2020.

109. Xiaolin Jiang, Hossein S. Ghadikolaei, Gabor Fodor, Eytan Modiano, Zhibo Pang, Michele Zorzi, Carlo Fischione, “Low-latency Networking: Where Latency Lurks and How to Tame It,” Proceedings of the IEEE, 2019.

108. Jianan Zhang, Edmund Yeh, Eytan Modiano, “Robustness of Interdependent Random Geometric Networks,” IEEE Transactions on Network Science and Engineering, Vol. 6, No. 3, July-September 2019.

107. Qingkai Liang, Hyang-Won Lee, Eytan Modiano, “Robust Design of Spectrum-Sharing Networks,” IEEE Transactions on Mobile Computing, Vol. 18, No. 8, August 2019.

106. A. Sinha, L. Tassiulas, E. Modiano, “Throughput-Optimal Broadcast in Wireless Networks with Dynamic Topology,” IEEE Transactions on Mobile Computing, Vol. 18, No. 5, May 2019.

105. Igor Kadota, Abhishek Sinha, Eytan Modiano, “Scheduling Algorithms for Optimizing Age of Information in Wireless Networks With Throughput Constraints,” IEEE/ACM Transactions on Networking, August 2019.

104. Igor Kadota, Abhishek Sinha, Rahul Singh, Elif Uysal-Biyikoglu, Eytan Modiano, “Scheduling Policies for Minimizing Age of Information in Broadcast Wireless Networks,” IEEE/ACM Transactions on Networking, Vol. 26, No. 5, October 2018.

103. Jianan Zhang and Eytan Modiano, “Connectivity in Interdependent Networks,” IEEE/ACM Transactions on Networking, 2018.

102. Qingkai Liang, Eytan Modiano, “Minimizing Queue Length Regret Under Adversarial Network Models,” Proceedings of the ACM on Measurement and Analysis of Computing Systems, Vol. 2, No. 1, Article 11, pp. 1-32, April 2018. (Same as Sigmetrics 2018.)

101. A. Sinha and E. Modiano, “Optimal Control for Generalized Network Flow Problems,” IEEE/ACM Transactions on Networking, 2018.

100. Hossein Shokri-Ghadikolaei, Carlo Fischione, Eytan Modiano, “Interference Model Similarity Index and Its Applications to mmWave Networks,” IEEE Transactions on Wireless Communications, 2018.

99. Matt Johnston, Eytan Modiano, “Wireless Scheduling with Delayed CSI: When Distributed Outperforms Centralized,” IEEE Transactions on Mobile Computing, 2018.

98. A. Sinha, G. Paschos, E. Modiano, “Throughput-Optimal Multi-hop Broadcast Algorithms,” IEEE/ACM Transactions on Networking, 2017.

97. Nathan Jones, Georgios Paschos, Brooke Shrader, Eytan Modiano, “An Overlay Architecture for Throughput Optimal Multipath Routing,” IEEE/ACM Transactions on Networking, 2017.

96. Greg Kuperman, Eytan Modiano, “Providing Guaranteed Protection in Multi-Hop Wireless Networks with Interference Constraints,” IEEE Transactions on Mobile Computing, 2017.

95. Matt Johnston, Eytan Modiano, Isaac Keslassy, “Channel Probing in Opportunistic Communications Systems,” IEEE Transactions on Information Theory, November 2017.

94. Anurag Rai, Georgios Paschos, Chih-Ping Li, Eytan Modiano, “Loop-Free Backpressure Routing Using Link-Reversal Algorithms,” IEEE/ACM Transactions on Networking, October 2017.

93. Matt Johnston and Eytan Modiano, “Controller Placement in Wireless Networks with Delayed CSI,” IEEE/ACM Transactions on Networking, 2017.

92. Jianan Zhang, E. Modiano, D. Hay, “Enhancing Network Robustness via Shielding,” IEEE/ACM Transactions on Networking, 2017.

91. M. Markakis, E. Modiano, J. N. Tsitsiklis, “Delay Analysis of the Max-Weight Policy under Heavy-Tailed Traffic via Fluid Approximations,” Mathematics of Operations Research, October 2017.

90. Qingkai Liang and E. Modiano, “Survivability in Time-Varying Graphs,” IEEE Transactions on Mobile Computing, 2017.

89. A. Sinha, G. Paschos, C. P. Li, and E. Modiano, “Throughput-Optimal Multihop Broadcast on Directed Acyclic Wireless Networks,” IEEE/ACM Transactions on Networking, Vol. 25, No. 1, Feb. 2017.

88. G. Celik, S. Borst, P. Whiting, E. Modiano, “Dynamic Scheduling with Reconfiguration Delays,” Queueing Systems, 2016.

87. G. Paschos, C. P. Li, E. Modiano, K. Choumas, T. Korakis, “In-network Congestion Control for Multirate Multicast,” IEEE/ACM Transactions on Networking, 2016.

86. H. Seferoglu and E. Modiano, “TCP-Aware Backpressure Routing and Scheduling,” IEEE Transactions on Mobile Computing, 2016.

85. H. Seferoglu and E. Modiano, “Separation of Routing and Scheduling in Backpressure-Based Wireless Networks,” IEEE/ACM Transactions on Networking, Vol. 24, No. 3, 2016.

84. M. Markakis, E. Modiano, J. N. Tsitsiklis, “Delay Stability of Back-Pressure Policies in the Presence of Heavy-Tailed Traffic,” IEEE/ACM Transactions on Networking, 2015.

83. S. Neumayer, E. Modiano, “Network Reliability Under Geographically Correlated Line and Disk Failure Models,” Computer Networks, 2016.

82. S. Neumayer, E. Modiano, A. Efrat, “Geographic Max-Flow and Min-Cut Under a Circular Disk Failure Model,” Computer Networks, 2015.

81. Marzieh Parandehgheibi, Hyang-Won Lee, Eytan Modiano, “Survivable Path Sets: A New Approach to Survivability in Multi-Layer Networks,” IEEE Journal of Lightwave Technology, 2015.

80. G. Kuperman, E. Modiano, A. Narula-Tam, “Network Protection with Multiple Availability Guarantees,” Computer Networks, 2015.

79. G. Kuperman, E. Modiano, A. Narula-Tam, “Analysis and Algorithms for Partial Protection in Mesh Networks,” IEEE/OSA Journal of Optical Communications and Networking, 2014.

78. Krishna Jagannathan, Mihalis Markakis, Eytan Modiano, John Tsitsiklis, “Throughput Optimal Scheduling over Time-Varying Channels in the Presence of Heavy-Tailed Traffic,” IEEE Transactions on Information Theory, 2014.

77. Chih-Ping Li and Eytan Modiano, “Receiver-Based Flow Control for Networks in Overload,” IEEE/ACM Transactions on Networking, Vol. 23, No. 2, 2015.

76. Matthew Johnston, Hyang-Won Lee, Eytan Modiano, “A Robust Optimization Approach to Backup Network Design with Random Failures,” IEEE/ACM Transactions on Networking, Vol. 23, No. 4, 2015.

75. Guner Celik and Eytan Modiano, “Scheduling in Networks with Time-Varying Channels and Reconfiguration Delay,” IEEE/ACM Transactions on Networking, Vol. 23, No. 1, 2015.

74. Matt Johnston, H. W. Lee, E. Modiano, “Robust Network Design for Stochastic Traffic Demands,” IEEE Journal of Lightwave Technology, 2013.

73. Mihalis Markakis, Eytan Modiano, John Tsitsiklis, “Max-Weight Scheduling in Queueing Networks With Heavy-Tailed Traffic,” IEEE/ACM Transactions on Networking, 2014.

72. Kayi Lee, Hyang-Won Lee, Eytan Modiano, “Maximizing Reliability in WDM Networks through Lightpath Routing,” IEEE/ACM Transactions on Networking, 2014.

71. Krishna Jagannathan and Eytan Modiano, “The Impact of Queue Length Information on Buffer Overflow in Parallel Queues,” IEEE Transactions on Information Theory, 2013.

70. Krishna Jagannathan, Ishai Menache, Gil Zussman, Eytan Modiano, “Non-cooperative Spectrum Access - The Dedicated vs. Free Spectrum Choice,” IEEE JSAC, special issue on Economics of Communication Networks & Systems, 2012.

69. Guner Celik and Eytan Modiano, “Dynamic Server Allocation over Time Varying Channels with Switchover Delay,” IEEE Transactions on Information Theory, 2012.

68. Anand Srinivas and Eytan Modiano, “Joint Node Placement and Assignment for Throughput Optimization in Mobile Backbone Networks,” IEEE JSAC, special issue on Communications Challenges and Dynamics for Unmanned Autonomous Vehicles, June 2012.

67. Guner Celik and Eytan Modiano, “Controlled Mobility in Stochastic and Dynamic Wireless Networks,” Queueing Systems, 2012.

66. Krishna Jagannathan, Shie Mannor, Ishai Menache, Eytan Modiano, “A State Action Frequency Approach to Throughput Maximization over Uncertain Wireless Channels,” Internet Mathematics, Vol. 9, Nos. 2-3, pp. 136-160.

65. Long Le, E. Modiano, N. Shroff, “Optimal Control of Wireless Networks with Finite Buffers,” IEEE/ACM Transactions on Networking, 2012.

64.   K. Jagannathan, M. Markakis, E. Modiano, J. Tsitsiklis, “Queue Length Asymptotics for Generalized Max-Weight Scheduling in the presence of Heavy-Tailed Traffic,” IEEE/ACM Transactions on Networking, Vol. 20, No. 4, August 2012.

63. Kayi Lee, Hyang-Won Lee, Eytan Modiano, “Reliability in Layered Networks with Random Link Failures,” IEEE/ACM Transactions on Networking, December 2011.

62. Krishna Jagannathan, Eytan Modiano, Lizhong Zheng, “On the Role of Queue Length Information in Network Control,” IEEE Transactions on Information Theory, September 2011.

61. Hyang-Won Lee, Long Le, Eytan Modiano, “Distributed Throughput Maximization in Wireless Networks via Random Power Allocation,” IEEE Transactions on Mobile Computing, 2011.

60. Sebastian Neumayer, Gil Zussman, Reuven Cohen, Eytan Modiano, “Assessing the Vulnerability of the Fiber Infrastructure to Disasters,” IEEE/ACM Transactions on Networking, December 2011.

59. Kayi Lee, Eytan Modiano, Hyang-Won Lee, “Cross Layer Survivability in WDM-based Networks,” IEEE/ACM Transactions on Networking, August 2011.

58.   Emily Craparo, Jon How, and Eytan Modiano, “Throughput Optimization in Mobile Backbone Networks,” IEEE Transactions on Mobile Computing, April, 2011.

57.   Hyang-Won Lee, Kayi Lee, and Eytan Modiano, “Diverse Routing in Networks with Probabilistic Failures,” IEEE/ACM Transactions on Networking, December, 2010.

56.   Guner Celik, Gil Zussman, Wajahat Khan and Eytan Modiano, “MAC Protocols for Wireless Networks with Multi-packet Reception Capability,” IEEE Transactions on Mobile Computing, February, 2010.

55.   Atilla Eryilmaz, Asuman Ozdaglar, Devavrat Shah, and Eytan Modiano, “Distributed Cross-Layer Algorithms for the Optimal Control of Multi-hop Wireless Networks,” IEEE/ACM Transactions on Networking, April 2010.

54.   Murtaza Zafer and Eytan Modiano, “Minimum Energy Transmission over a Wireless Channel With Deadline and Power Constraints,” IEEE Transactions on Automatic Control, pp. 2841-2852, December, 2009.

53.   Murtaza Zafer and Eytan Modiano, “A Calculus Approach to Energy-Efficient Data Transmission with Quality of Service Constraints,” IEEE/ACM Transactions on Networking, 2009.

52.   Anand Srinivas, Gil Zussman, and Eytan Modiano, “Construction and Maintenance of Wireless Mobile Backbone Networks,” IEEE/ACM Transactions on Networking, 2009.

51.   Andrew Brzezinski, Gil Zussman, and Eytan Modiano, “Distributed Throughput Maximization in Wireless Mesh Networks Via Pre-Partitioning,” IEEE/ACM Transactions on Networking, December, 2008.

50.   Amir Khandani, Eytan Modiano, Jinane Abounadi, Lizhong Zheng, “Reliability and Route Diversity in Wireless Networks,” IEEE Transactions on Wireless Communications, December, 2008.

49.   Alessandro Tarello, Jun Sun, Murtaza Zafer and Eytan Modiano, “Minimum Energy Transmission Scheduling Subject to Deadline Constraints,” ACM Wireless Networks, October, 2008.

48.   Murtaza Zafer, Eytan Modiano, “Optimal Rate Control for Delay-Constrained Data Transmission over a Wireless Channel,” IEEE Transactions on Information Theory, September, 2008.

47.   Andrew Brzezinski and Eytan Modiano, “Achieving 100% Throughput In Reconfigurable IP/WDM Networks,” IEEE/ACM Transactions on Networking, August, 2008.

46.   Michael Neely, Eytan Modiano and C. Li, “Fairness and Optimal Stochastic Control for Heterogeneous Networks,” IEEE/ACM Transactions on Networking, September, 2008.

45.   Amir Khandani, Jinane Abounadi, Eytan Modiano, Lizhong Zheng, “Cooperative Routing in Static Wireless Networks,” IEEE Transactions on Communications, November 2007.

44.   Murtaza Zafer, Eytan Modiano, “Joint Scheduling of Rate-guaranteed and Best-effort Users over a Wireless Fading Channel,” IEEE Transactions on Wireless Communications, October, 2007.

43.   Krishna Jagannathan, Sem Borst, Phil Whiting and Eytan Modiano, “Scheduling of Multi-Antenna Broadcast Systems with Heterogeneous Users,” IEEE Journal on Selected Areas in Communications, September, 2007.

42.   Anand Ganti, Eytan Modiano, and John Tsitsiklis, “Optimal Transmission Scheduling in Symmetric Communication Models with Intermittent Connectivity,” IEEE Transactions on Information Theory, March, 2007.

41.   Michael Neely and Eytan Modiano, “Logarithmic Delay for NxN Packet Switches Under Crossbar Constraints,” IEEE/ACM Transactions on Networking, November, 2007.

40.   Jun Sun, Jay Gao, Shervin Shambayati and Eytan Modiano, “Ka-Band Link Optimization with Rate Adaptation for Mars and Lunar Communications,” International Journal of Satellite Communications and Networking, March, 2007.

39.   Jun Sun and Eytan Modiano, "Fair Allocation of a Wireless Fading Channel: An Auction Approach," Institute for Mathematics and its Applications, Volume 143: Wireless Communications, 2006.

38.   Jun Sun, Eytan Modiano and Lizhong Zheng, "Wireless Channel Allocation Using An Auction Algorithm," IEEE Journal on Selected Areas in Communications, May, 2006.

37.   Murtaza Zafer and Eytan Modiano, "Blocking Probability and Channel Assignment for Connection Oriented Traffic in Wireless Networks," IEEE Transactions on Wireless Communications, April, 2006.

36.   Alvin Fu, Eytan Modiano, and John Tsitsiklis, "Optimal Transmission Scheduling over a Fading Channel with Energy and Deadline Constraints," IEEE Transactions on Wireless Communications, March, 2006.

35.   Poompat Saengudomlert, Eytan Modiano and Robert Gallager, "On-line Routing and Wavelength Assignment for Dynamic Traffic in WDM Ring and Torus Networks," IEEE/ACM Transactions on Networking, April, 2006.

34.   Li-Wei Chen, Eytan Modiano and Poompat Saengudomlert, "Uniform vs. Non-Uniform Band Switching in WDM Networks," Computer Networks (special issue on optical networks), January, 2006.

33.   Andrew Brzezinski and Eytan Modiano, "Dynamic Reconfiguration and Routing Algorithms for IP-over-WDM networks with Stochastic Traffic," IEEE Journal of Lightwave Technology, November, 2005.

32.   Randall Berry and Eytan Modiano, "Optimal Transceiver Scheduling in WDM/TDM Networks," IEEE Journal on Selected Areas in Communications, August, 2005.

31.   Poompat Saengudomlert, Eytan Modiano, and Robert G. Gallager, “Dynamic Wavelength Assignment for WDM All-Optical Tree Networks,” IEEE/ACM Transactions on Networking, August, 2005.

30.   Ashwinder Ahluwalia and Eytan Modiano, "On the Complexity and Distributed Construction of Energy Efficient Broadcast Trees in Wireless Ad Hoc Networks," IEEE Transactions on Wireless Communications, October, 2005.

29.   Michael Neely, Charlie Rohrs and Eytan Modiano, "Equivalent Models for Analysis of Deterministic Service Time Tree Networks," IEEE Transactions on Information Theory, October, 2005.

28.   Michael Neely and Eytan Modiano, "Capacity and Delay Tradeoffs for Ad Hoc Mobile Networks," IEEE Transactions on Information Theory, May, 2005.

27.   Li-Wei Chen and Eytan Modiano, "Efficient Routing and Wavelength Assignment for Reconfigurable WDM Networks with Wavelength Converters," IEEE/ACM Transactions on Networking, February, 2005. Selected as one of the best papers from Infocom 2003 for fast-track publication in IEEE/ACM Transactions on Networking.

26.   Michael Neely and Eytan Modiano, "Convexity in Queues with General Inputs," IEEE Transactions on Information Theory, May, 2005.

25.   Anand Srinivas and Eytan Modiano, "Finding Minimum Energy Disjoint Paths in Wireless Ad Hoc Networks," ACM Wireless Networks, November, 2005. Selected to appear in a special issue dedicated to best papers from Mobicom 2003.

24.   Michael Neely, Eytan Modiano and Charlie Rohrs, "Dynamic Power Allocation and Routing for Time-Varying Wireless Networks," IEEE Journal of Selected Areas in Communication, January, 2005.

23.   Chunmei Liu and Eytan Modiano, "On the performance of additive increase multiplicative decrease (AIMD) protocols in hybrid space-terrestrial networks," Computer Networks, September, 2004.

22.   Li-Wei Chen and Eytan Modiano, "Dynamic Routing and Wavelength Assignment with Optical Bypass using Ring Embeddings," Optical Switching and Networking (Elsevier), December, 2004.

21.   Aradhana Narula-Tam, Eytan Modiano and Andrew Brzezinski, "Physical Topology Design for Survivable Routing of Logical Rings in WDM-Based Networks," IEEE Journal of Selected Areas in Communication, October, 2004.

20.   Randall Berry and Eytan Modiano, "The Role of Switching in Reducing the Number of Electronic Ports in WDM Networks," IEEE Journal of Selected Areas in Communication, October, 2004.

19.   Jun Sun and Eytan Modiano, "Routing Strategies for Maximizing Throughput in LEO Satellite Networks," IEEE JSAC, February, 2004.

18.   Jun Sun and Eytan Modiano, "Capacity Provisioning and Failure Recovery for Low Earth Orbit Satellite Networks," International Journal of Satellite Communications, June, 2003.

17.   Alvin Fu, Eytan Modiano, and John Tsitsiklis, "Optimal Energy Allocation and Admission Control for Communications Satellites," IEEE/ACM Transactions on Networking, June, 2003.

16.   Michael Neely, Eytan Modiano and Charles Rohrs, "Power Allocation and Routing in Multi-Beam Satellites with Time Varying Channels," IEEE/ACM Transactions on Networking, February, 2003.

15.   Eytan Modiano and Aradhana Narula-Tam, "Survivable lightpath routing: a new approach to the design of WDM-based networks," IEEE Journal of Selected Areas in Communication, May 2002.

14.   Aradhana Narula-Tam, Phil Lin and Eytan Modiano, "Efficient Routing and Wavelength Assignment for Reconfigurable WDM Networks," IEEE Journal of Selected Areas in Communication, January, 2002.

13.   Brett Schein and Eytan Modiano, "Quantifying the benefits of configurability in circuit-switched WDM ring networks with limited ports per node," IEEE Journal on Lightwave Technology, June, 2001.

12.   Aradhana Narula-Tam and Eytan Modiano, "Dynamic Load Balancing in WDM Packet Networks with and without Wavelength Constraints," IEEE Journal of Selected Areas in Communications, October 2000.

11.   Randy Berry and Eytan Modiano, "Reducing Electronic Multiplexing Costs in SONET/WDM Rings with Dynamically Changing Traffic," IEEE Journal of Selected Areas in Communications, October 2000.

10.   Eytan Modiano and Richard Barry, "A Novel Medium Access Control Protocol for WDM-Based LANs and Access Networks Using a Master-Slave Scheduler," IEEE Journal on Lightwave Technology, April 2000.

9.   Eytan Modiano and Anthony Ephremides, "Communication Protocols for Secure Distributed Computation of Binary Functions," Information and Computation, April 2000.

8.   Angela Chiu and Eytan Modiano, "Traffic Grooming Algorithms for Reducing Electronic Multiplexing Costs in WDM Ring Networks," IEEE Journal on Lightwave Technology, January 2000.

7.   Eytan Modiano, "An Adaptive Algorithm for Optimizing the Packet Size Used in Wireless ARQ Protocols," Wireless Networks, August 1999.

6.   Eytan Modiano, "Random Algorithms for Scheduling Multicast Traffic in WDM Broadcast-and-Select Networks," IEEE/ACM Transactions on Networking, July, 1999.

5.   Eytan Modiano and Richard Barry, "Architectural Considerations in the Design of WDM-based Optical Access Networks," Computer Networks, February 1999.

4.   V.W.S. Chan, K. Hall, E. Modiano and K. Rauschenbach, "Architectures and Technologies for High-Speed Optical Data Networks," IEEE Journal of Lightwave Technology, December 1998.

3.   Eytan Modiano and Anthony Ephremides, "Efficient Algorithms for Performing Packet Broadcasts in a Mesh Network," IEEE/ACM Transactions on Networking, May 1996.

2.   Eytan Modiano, Jeffrey Wieselthier and Anthony Ephremides, "A Simple Analysis of Queueing Delay in a Tree Network of Discrete-Time Queues with Constant Service Times," IEEE Transactions on Information Theory, February 1996.

1.   Eytan Modiano and Anthony Ephremides, "Communication Complexity of Secure Distributed Computation in the Presence of Noise," IEEE Transactions on Information Theory, July 1992.

Other Papers

5.  Eytan Modiano, "Satellite Data Networks," AIAA Journal on Aerospace Computing, Information and Communication, September, 2004.

4.  Eytan Modiano and Phil Lin, "Traffic Grooming in WDM networks," IEEE Communications Magazine, July, 2001.

3.  Eytan Modiano and Aradhana Narula, "Mechanisms for Providing Optical Bypass in WDM-based Networks," SPIE Optical Networks, January 2000.

2.  K. Kuznetsov, N. M. Froberg, Eytan Modiano, et al., "A Next-Generation Optical Regional Access Network," IEEE Communications Magazine, January, 2000.

1.  Eytan Modiano, "WDM-based Packet Networks," (Invited Paper) IEEE Communications Magazine, March 1999.

Conference Papers

246. Xinyu Wu, Dan Wu, Eytan Modiano, “Overload Balancing in Single-Hop Networks With Bounded Buffers,” IFIP Networking, 2022.

245. Xinzhe Fu, Eytan Modiano, “Optimal Routing for Stream Learning Systems,” IEEE Infocom, April 2022.

244. Vishrant Tripathi, Luca Ballotta, Luca Carlone, E. Modiano, “Computation and Communication Co-Design for Real-Time Monitoring and Control in Multi-Agent Systems,” IEEE Wiopt, 2021.

243. Eray Atay, Igor Kadota, E. Modiano, “Aging Wireless Bandits: Regret Analysis and Order-Optimal Learning Algorithm,” IEEE Wiopt, 2021.

242. Xinzhe Fu and E. Modiano, “Elastic Job Scheduling with Unknown Utility Functions,” IFIP Performance, Milan, 2021.

241. Bai Liu and E. Modiano, “Optimal Control for Networks with Unobservable Malicious Nodes,” IFIP Performance, Milan, 2021.

240. Bai Liu, Qiaomin Xie, Eytan Modiano, “RL-QN: A Reinforcement Learning Framework for Optimal Control of Queueing Systems,” ACM Sigmetrics Workshop on Reinforcement Learning in Networks and Queues (RLNQ), 2021.

239. Xinzhe Fu and E. Modiano, “Learning-NUM: Network Utility Maximization with Unknown Utility Functions and Queueing Delay,” ACM MobiHoc, 2021.

238. Vishrant Tripathi and Eytan Modiano, “An Online Learning Approach to Optimizing Time-Varying Costs of AoI,” ACM MobiHoc, 2021.

237. Igor Kadota, Muhammad Shahir Rahman, and Eytan Modiano, “WiFresh: Age-of-Information from Theory to Implementation,” International Conference on Computer Communications and Networks (ICCCN), 2021.

236. Vishrant Tripathi and Eytan Modiano, “Age Debt: A General Framework For Minimizing Age of Information,” IEEE Infocom Workshop on Age-of-Information, 2021.

235. Igor Kadota, Eytan Modiano, “Age of Information in Random Access Networks with Stochastic Arrivals,” IEEE Infocom, 2020.

234. Igor Kadota, M. Shahir Rahman, Eytan Modiano, “Poster: Age of Information in Wireless Networks: from Theory to Implementation,” ACM Mobicom, 2020.

233. Xinyu Wu, Dan Wu, Eytan Modiano, “An Influence Model Approach to Failure Cascade Prediction in Large Scale Power Systems,” IEEE American Control Conference, July 2020.

232. X. Fu and E. Modiano, “Fundamental Limits of Volume-based Network DoS Attacks,” Proc. ACM Sigmetrics, Boston, MA, June 2020.

231. Vishrant Tripathi, Eytan Modiano, “A Whittle Index Approach to Minimizing Functions of Age of Information,” Allerton Conference on Communication, Control, and Computing, September 2019.

230. Bai Liu, Qiaomin Xie, Eytan Modiano, “Reinforcement Learning for Optimal Control of Queueing Systems,” Allerton Conference on Communication, Control, and Computing, September 2019.

229. Rajat Talak, Sertac Karaman, Eytan Modiano, “A Theory of Uncertainty Variables for State Estimation and Inference,” Allerton Conference on Communication, Control, and Computing, September 2019.

228. Rajat Talak, Eytan Modiano, “Age-Delay Tradeoffs in Single Server Systems,” IEEE International Symposium on Information Theory, Paris, France, July 2019.

227. Rajat Talak, Sertac Karaman, Eytan Modiano, “When a Heavy Tailed Service Minimizes Age of Information,” IEEE International Symposium on Information Theory, Paris, France, July 2019.

226. Qingkai Liang, Eytan Modiano, “Optimal Network Control with Adversarial Uncontrollable Nodes,” ACM MobiHoc, Catania, Italy, June 2019.

225. Igor Kadota, Eytan Modiano, “Minimizing the Age of Information in Wireless Networks with Stochastic Arrivals,” ACM MobiHoc, June 2019.

224. Maotong Xu, Jelena Diakonikolas, Suresh Subramaniam, Eytan Modiano, “A Hierarchical WDM-based Scalable Data Center Network Architecture,” IEEE International Conference on Communications (ICC), Shanghai, China, June 2019.

223. Maotong Xu, Min Tian, Eytan Modiano, Suresh Subramaniam, “RHODA Topology Configuration Using Bayesian Optimization.”

222. Anurag Rai, Rahul Singh and Eytan Modiano, “A Distributed Algorithm for Throughput Optimal Routing in Overlay Networks,” IFIP Networking 2019, Warsaw, Poland, May 2019.

221. Qingkai Liang and Eytan Modiano, “Optimal Network Control in Partially-Controllable Networks,” IEEE Infocom, Paris, April 2019.

220. Xinzhe Fu and Eytan Modiano, “Network Interdiction Using Adversarial Traffic Flows,” IEEE Infocom, Paris, April 2019.

219. Vishrant Tripathi, Rajat Talak, Eytan Modiano, “Age Optimal Information Gathering and Dissemination on Graphs,” IEEE Infocom, Paris, April 2019.

218.   Jianan Zhang, Hyang-Won Lee, Eytan Modiano, " On the Robustness of Distributed Computing Networks ,”  DRCN 2019, Coimbra, Portugal, March, 2019.

217.   Hyang-Won Lee, Jianan Zhang and Eytan Modiano, " Data-driven Localization and Estimation of Disturbance in the Interconnected Power System ,”  IEEE Smartgridcomm, October, 2018.

216.   Jianan Zhang and Eytan Modiano, " Joint Frequency Regulation and Economic Dispatch Using Limited Communication ,”  IEEE Smartgridcomm, October, 2018.

215.   Rajat Talak, Sertac Karaman, Eytan Modiano, " Scheduling Policies for Age Minimization in Wireless Networks with Unknown Channel State ,”  IEEE International Symposium on Information Theory, July 2018.

214.   Thomas Stahlbuhk, Brooke Shrader, Eytan Modiano, " Online Learning Algorithms for Minimizing Queue Length Regret ,”  IEEE International Symposium on Information Theory, July 2018.

213.   Rajat Talak, Sertac Karaman, Eytan Modiano, " Distributed Scheduling Algorithms for Optimizing Information Freshness in Wireless Networks ,”  IEEE SPAWC, Kalamata, Greece, June, 2018.

212. Rajat Talak, Sertac Karaman, Eytan Modiano, "Optimizing Information Freshness in Wireless Networks under General Interference Constraints," ACM MobiHoc 2018, Los Angeles, CA, June 2018.

211. Thomas Stahlbuhk, Brooke Shrader, Eytan Modiano, "Learning Algorithms for Scheduling in Wireless Networks with Unknown Channel Statistics," ACM MobiHoc, June 2018.

210. Khashayar Kamran, Jianan Zhang, Edmund Yeh, Eytan Modiano, "Robustness of Interdependent Geometric Networks Under Inhomogeneous Failures," Workshop on Spatial Stochastic Models for Wireless Networks (SpaSWiN), Shanghai, China, May 2018.

209. Rajat Talak, Sertac Karaman, Eytan Modiano, "Optimizing Age of Information in Wireless Networks with Perfect Channel State Information," WiOpt 2018, Shanghai, China, May 2018.

208. Abhishek Sinha, Eytan Modiano, "Network Utility Maximization with Heterogeneous Traffic Flows," WiOpt 2018, Shanghai, China, May 2018.

207. Qingkai Liang, Eytan Modiano, "Minimizing Queue Length Regret Under Adversarial Network Models," ACM Sigmetrics, 2018.

206. Jianan Zhang, Abhishek Sinha, Jaime Llorca, Antonia Tulino, Eytan Modiano, "Optimal Control of Distributed Computing Networks with Mixed-Cast Traffic Flows," IEEE Infocom, Honolulu, HI, April 2018.

205. Qingkai Liang, Eytan Modiano, "Network Utility Maximization in Adversarial Environments," IEEE Infocom, Honolulu, HI, April 2018.

204. Igor Kadota, Abhishek Sinha, Eytan Modiano, "Optimizing Age of Information in Wireless Networks with Throughput Constraints," IEEE Infocom, Honolulu, HI, April 2018.

203. Qingkai Liang, Verina (Fanyu) Que, Eytan Modiano, "Accelerated Primal-Dual Policy Optimization for Safe Reinforcement Learning," NIPS Workshop on Transparent and Interpretable Machine Learning in Safety Critical Environments, December 2017.

202. Rahul Singh, Xueying Guo, Eytan Modiano, "Risk-Sensitive Optimal Control of Queues," IEEE Conference on Decision and Control (CDC), December 2017.

201. Rajat Talak, Sertac Karaman, Eytan Modiano, "Minimizing Age of Information in Multi-Hop Wireless Networks," Allerton Conference on Communication, Control, and Computing, September 2017.

200. Abhishek Sinha, Eytan Modiano, "Throughput-Optimal Broadcast in Wireless Networks with Point-to-Multipoint Transmissions," ACM MobiHoc, Madras, India, July 2017.

199. Rajat Talak, Sertac Karaman, Eytan Modiano, "Capacity and Delay Scaling for Broadcast Transmission in Highly Mobile Wireless Networks," ACM MobiHoc, Madras, India, July 2017.

198.5. Y.-P. Hsu, E. Modiano, and L. Duan, "Age of Information: Design and Analysis of Optimal Scheduling Algorithms," IEEE International Symposium on Information Theory (ISIT), 2017.

198. Qingkai Liang and Eytan Modiano, "Coflow Scheduling in Input-Queued Switches: Optimal Delay Scaling and Algorithms," IEEE Infocom, Atlanta, GA, May 2017.

197. Jianan Zhang and Eytan Modiano, "Robust Routing in Interdependent Networks," IEEE Infocom, Atlanta, GA, May 2017.

196. Abhishek Sinha, Eytan Modiano, "Optimal Control for Generalized Network Flow Problems," IEEE Infocom, Atlanta, GA, May 2017.

195. Rajat Talak*, Sertac Karaman, Eytan Modiano, "Speed Limits in Autonomous Vehicular Networks due to Communication Constraints," IEEE Conference on Decision and Control (CDC), Las Vegas, NV, December 2016.

194. Marzieh Parandehgheibi*, Konstantin Turitsyn, Eytan Modiano, "Distributed Frequency Control in Power Grids Under Limited Communication," IEEE Conference on Decision and Control (CDC), Las Vegas, NV, December 2016.

193. Igor Kadota, Elif Uysal-Biyikoglu, Rahul Singh, Eytan Modiano, "Minimizing Age of Information in Broadcast Wireless Networks," Allerton Conference on Communication, Control, and Computing, September 2016.

192. Jianan Zhang, Edmund Yeh, Eytan Modiano, "Robustness of Interdependent Random Geometric Networks," Allerton Conference on Communication, Control, and Computing, September 2016.

191. Abhishek Sinha, Leandros Tassiulas, Eytan Modiano, "Throughput-Optimal Broadcast in Wireless Networks with Dynamic Topology," ACM MobiHoc'16, Paderborn, Germany, July 2016. (Winner of Best Paper Award)

190. Abhishek Sinha, Georgios Paschos, Eytan Modiano, "Throughput-Optimal Multi-hop Broadcast Algorithms," ACM MobiHoc'16, Paderborn, Germany, July 2016.

189. Thomas Stahlbuhk, Brooke Shrader, Eytan Modiano, "Throughput Maximization in Uncooperative Spectrum Sharing Networks," IEEE International Symposium on Information Theory, Barcelona, Spain, July 2016.

188. Thomas Stahlbuhk, Brooke Shrader, Eytan Modiano, "Topology Control for Wireless Networks with Highly-Directional Antennas," IEEE WiOpt, Tempe, Arizona, May 2016.

187. Qingkai Liang, H.W. Lee, Eytan Modiano, "Robust Design of Spectrum-Sharing Networks," IEEE WiOpt, Tempe, Arizona, May 2016.

186. Hossein Shokri-Ghadikolaei, Carlo Fischione and Eytan Modiano, "On the Accuracy of Interference Models in Wireless Communications," IEEE International Conference on Communications (ICC), 2016.

185. Qingkai Liang and Eytan Modiano, "Survivability in Time-varying Networks," IEEE Infocom, San Francisco, CA, April 2016.

184. Kyu S. Kim, Chih-Ping Li, Igor Kadota, Eytan Modiano, "Optimal Scheduling of Real-Time Traffic in Wireless Networks with Delayed Feedback," Allerton Conference on Communication, Control, and Computing, September 2015.

183. Marzieh Parandehgheibi, Eytan Modiano, "Modeling the Impact of Communication Loss on the Power Grid Under Emergency Control," IEEE SmartGridComm, Miami, FL, November 2015.

182. Anurag Rai, Chih-ping Li, Georgios Paschos, Eytan Modiano, "Loop-Free Backpressure Routing Using Link-Reversal Algorithms," Proceedings of ACM MobiHoc, July 2015.

181. Longbo Huang, Eytan Modiano, "Optimizing Age of Information in a Multiclass Queueing System," Proceedings of IEEE ISIT 2015, Hong Kong, June 2015.

180. M. Johnston, E. Modiano, "A New Look at Wireless Scheduling with Delayed Information," Proceedings of IEEE ISIT 2015, Hong Kong, June 2015.

179. M. Johnston, E. Modiano, "Scheduling over Time Varying Channels with Hidden State Information," Proceedings of IEEE ISIT 2015, Hong Kong, June 2015.

178. M. Johnston and E. Modiano, "Controller Placement for Maximum Throughput Under Delayed CSI," IEEE WiOpt, Mumbai, India, May 2015.

177. A. Sinha, G. Paschos, C. P. Li, and E. Modiano, "Throughput Optimal Broadcast on Directed Acyclic Graphs," IEEE Infocom, Hong Kong, April 2015.

176. J. Zhang and E. Modiano, "Enhancing Network Robustness via Shielding," IEEE Design of Reliable Communication Networks, Kansas City, March 2015.

175. H. W. Lee and E. Modiano, "Robust Design of Cognitive Radio Networks," Information and Communication Technology Convergence (ICTC), 2014.

174. Greg Kuperman and Eytan Modiano, "Disjoint Path Protection in Multi-Hop Wireless Networks with Interference Constraints," IEEE Globecom, Austin, TX, December 2014.

173. Marzieh Parandehgheibi, Eytan Modiano, David Hay, "Mitigating Cascading Failures in Interdependent Power Grids and Communication Networks," IEEE SmartGridComm, Venice, Italy, November 2014.

172. Georgios Paschos and Eytan Modiano, "Throughput Optimal Routing in Overlay Networks," Allerton Conference on Communication, Control, and Computing, September 2014.

171. Nathan Jones, George Paschos, Brooke Shrader, Eytan Modiano, "An Overlay Architecture for Throughput Optimal Multipath Routing," ACM MobiHoc, August 2014.

170. Matt Johnston, Eytan Modiano, Yuri Polyanskiy, "Opportunistic Scheduling with Limited Channel State Information: A Rate Distortion Approach," IEEE International Symposium on Information Theory, Honolulu, HI, July 2014.

169. Chih-Ping Li, Georgios Paschos, Eytan Modiano, Leandros Tassiulas, "Dynamic Overload Balancing in Server Farms," Networking 2014, Trondheim, Norway, June 2014.

168. Hulya Seferoglu and Eytan Modiano, "TCP-Aware Backpressure Routing and Scheduling," Information Theory and Applications, San Diego, CA, February 2014.

167. Mihalis Markakis, Eytan Modiano, John Tsitsiklis, "Delay Stability of Back-Pressure Policies in the Presence of Heavy-Tailed Traffic," Information Theory and Applications, San Diego, CA, February 2014.

166. Kyu Soeb Kim, Chih-ping Li, Eytan Modiano, "Scheduling Multicast Traffic with Deadlines in Wireless Networks," IEEE Infocom, Toronto, Canada, April 2014.

165. Georgios Paschos, Chih-ping Li, Eytan Modiano, Kostas Choumas, Thanasis Korakis, "A Demonstration of Multirate Multicast Over an 802.11 Mesh Network," IEEE Infocom, Toronto, Canada, April 2014.

164. Sebastian Neumayer, Eytan Modiano, "Assessing the Effect of Geographically Correlated Failures on Interconnected Power-Communication Networks," IEEE SmartGridComm, 2013.

163. Marzieh Parandehgheibi, Eytan Modiano, "Robustness of Interdependent Networks: The Case of Communication Networks and the Power Grid," IEEE Globecom, December 2013.

162. Matt Johnston, Eytan Modiano, "Optimal Channel Probing in Communication Systems: The Two-Channel Case," IEEE Globecom, December 2013.

161. Mihalis Markakis, Eytan Modiano, John N. Tsitsiklis, "Delay Analysis of the Max-Weight Policy under Heavy-Tailed Traffic via Fluid Approximations," Allerton Conference on Communication, Control, and Computing, October 2013.

160. Matthew Johnston, Isaac Keslassy, Eytan Modiano, "Channel Probing in Communication Systems: Myopic Policies Are Not Always Optimal," IEEE International Symposium on Information Theory, July 2013.

159. Krishna P. Jagannathan, Libin Jiang, Palthya Lakshma Naik, Eytan Modiano, "Scheduling Strategies to Mitigate the Impact of Bursty Traffic in Wireless Networks," 11th International Symposium on Modeling and Optimization in Mobile, Ad Hoc and Wireless Networks (WiOpt 2013), Japan, May 2013. (Winner of Best Paper Award)

158. Hulya Seferoglu and Eytan Modiano, "Diff-Max: Separation of Routing and Scheduling in Backpressure-Based Wireless Networks," IEEE Infocom, Turin, Italy, April 2013.

157. Chih-Ping Li, Eytan Modiano, "Receiver-Based Flow Control for Networks in Overload," IEEE Infocom, Turin, Italy, April 2013.

156. Nathan Jones, Brooke Shrader, Eytan Modiano, "Distributed CSMA with Pairwise Coding," IEEE Infocom, Turin, Italy, April 2013.

155. Greg Kuperman and Eytan Modiano, "Network Protection with Guaranteed Recovery Times using Recovery Domains," IEEE Infocom, Turin, Italy, April 2013.

154. Greg Kuperman and Eytan Modiano, "Providing Protection in Multi-Hop Wireless Networks," IEEE Infocom, Turin, Italy, April 2013.

153. Greg Kuperman, Eytan Modiano, Aradhana Narula-Tam, "Network Protection with Multiple Availability Guarantees," IEEE ICC Workshop on New Trends in Optical Networks Survivability, June 2012.

152. Nathaniel Jones, Brooke Shrader, Eytan Modiano, "Optimal Routing and Scheduling for a Simple Network Coding Scheme," IEEE Infocom, Orlando, FL, March 2012.

151. Mihalis Markakis, Eytan Modiano, John Tsitsiklis, "Max-Weight Scheduling in Networks with Heavy-Tailed Traffic," IEEE Infocom, Orlando, FL, March 2012.

150. Guner Celik and Eytan Modiano, "Scheduling in Networks with Time-Varying Channels and Reconfiguration Delay," IEEE Infocom, Orlando, FL, March 2012.

149. Sebastian Neumayer, Alon Efrat, Eytan Modiano, "Geographic Max-Flow and Min-Cut Under a Circular Disk Failure Model," IEEE Infocom (MC), Orlando, FL, March 2012.

148. Marzieh Parandehgheibi, Hyang-Won Lee, and Eytan Modiano, "Survivable Paths in Multi-Layer Networks," Conference on Information Sciences and Systems, March 2012.

147. Greg Kuperman, Eytan Modiano, and Aradhana Narula-Tam, "Partial Protection in Networks with Backup Capacity Sharing," Optical Fiber Communications Conference (OFC), Anaheim, CA, March 2012.

146. Krishna Jagannathan, Libin Jiang, Eytan Modiano, "On Scheduling Algorithms Robust to Heavy-Tailed Traffic," Information Theory and Applications (ITA), San Diego, CA, February 2012.

145. M. Johnston, H.W. Lee, E. Modiano, "Robust Network Design for Stochastic Traffic Demands," IEEE Globecom, Next Generation Networking Symposium, Houston, TX, December 2011.

144. S. Neumayer, E. Modiano, "Network Reliability Under Random Circular Cuts," IEEE Globecom, Optical Networks and Systems Symposium, Houston, TX, December 2011.

143. H.W. Lee, K. Lee, E. Modiano, "Maximizing Reliability in WDM Networks through Lightpath Routing," IEEE Globecom, Optical Networks and Systems Symposium, Houston, TX, December 2011.

142. Guner Celik, Sem Borst, Eytan Modiano, Phil Whiting, "Variable Frame Based Max-Weight Algorithms for Networks with Switchover Delay," IEEE International Symposium on Information Theory, St. Petersburg, Russia, August 2011.

141. Krishna Jagannathan, Ishai Menache, Eytan Modiano, and Gil Zussman, "Non-cooperative Spectrum Access - The Dedicated vs. Free Spectrum Choice," ACM MOBIHOC'11, May 2011.

140. Krishna Jagannathan, Shie Mannor, Ishai Menache, Eytan Modiano, "A State Action Frequency Approach to Throughput Maximization over Uncertain Wireless Channels," IEEE Infocom (Mini-conference), Shanghai, China, April 2011.

139. Guner Celik, Long B. Le, Eytan Modiano, "Scheduling in Parallel Queues with Randomly Varying Connectivity and Switchover Delay," IEEE Infocom (Mini-conference), Shanghai, China, April 2011.

138. Gregory Kuperman, Eytan Modiano, Aradhana Narula-Tam, "Analysis and Algorithms for Partial Protection in Mesh Networks," IEEE Infocom (Mini-conference), Shanghai, China, April 2011.

137. Matthew Johnston, Hyang-Won Lee, Eytan Modiano, "A Robust Optimization Approach to Backup Network Design with Random Failures," IEEE Infocom, Shanghai, China, April 2011.

136. Krishna Jagannathan, Mihalis Markakis, Eytan Modiano, John Tsitsiklis, "Queue Length Asymptotics for Generalized Max-Weight Scheduling in the Presence of Heavy-Tailed Traffic," IEEE Infocom, Shanghai, China, April 2011.

135. Guner Celik and Eytan Modiano, "Dynamic Vehicle Routing for Data Gathering in Wireless Networks," Proc. IEEE CDC'10, December 2010.***

134. Long B. Le, Eytan Modiano, Changhee Joo, and Ness B. Shroff, "Longest-queue-first Scheduling under the SINR Interference Model," ACM MobiHoc, September 2010.***

133. Krishna Jagannathan, Mihalis Markakis, Eytan Modiano, John Tsitsiklis, "Throughput Optimal Scheduling in the Presence of Heavy-Tailed Traffic," Allerton Conference on Communication, Control, and Computing, September 2010.**

132. Delia Ciullo, Guner Celik, Eytan Modiano, "Minimizing Transmission Energy in Sensor Networks via Trajectory Control," IEEE WiOpt 2010, Avignon, France, June 2010 (10 pages; CD proceedings – page numbers not available).

131. Sebastian Neumayer and Eytan Modiano, "Network Reliability with Geographically Correlated Failures," IEEE Infocom 2010, San Diego, CA, March 2010 (9 pages; CD proceedings – page numbers not available).**

130. Long Le, Eytan Modiano, Ness Shroff, "Optimal Control of Wireless Networks with Finite Buffers," IEEE Infocom 2010, San Diego, CA, March 2010 (9 pages; CD proceedings – page numbers not available).

129. Kayi Lee, Hyang-Won Lee, Eytan Modiano, "Reliability in Layered Networks with Random Link Failures," IEEE Infocom 2010, San Diego, CA, March 2010 (9 pages; CD proceedings – page numbers not available).**

128. Krishna Jagannathan, Eytan Modiano, "The Impact of Queue Length Information on Buffer Overflow in Parallel Queues," Allerton Conference on Communication, Control, and Computing, September 2009, pgs. 1103–1110.**

127. Mihalis Markakis, Eytan Modiano, John Tsitsiklis, "Scheduling Policies for Single-Hop Networks with Heavy-Tailed Traffic," Allerton Conference on Communication, Control, and Computing, September 2009, pgs. 112–120.**

126. Dan Kan, Aradhana Narula-Tam, Eytan Modiano, "Lightpath Routing and Capacity Assignment for Survivable IP-over-WDM Networks," DRCN 2009, Alexandria, VA, October 2009, pgs. 37–44.**

125. Mehdi Ansari, Alireza Bayesteh, Eytan Modiano, "Opportunistic Scheduling in Large Scale Wireless Networks," IEEE International Symposium on Information Theory, Seoul, Korea, June 2009, pgs. 1624–1628.

124. Hyang-Won Lee, Eytan Modiano and Long Bao Le, "Distributed Throughput Maximization in Wireless Networks via Random Power Allocation," IEEE WiOpt, Seoul, Korea, June 2009 (9 pages; CD proceedings – page numbers not available).

123. Wajahat Khan, Eytan Modiano, Long Le, "Autonomous Routing Algorithms for Networks with Wide-Spread Failures," IEEE MILCOM, Boston, MA, October 2009 (6 pages; CD proceedings – page numbers not available).**

122. Guner Celik and Eytan Modiano, "Random Access Wireless Networks with Controlled Mobility," IEEE Med-Hoc-Nets, Haifa, Israel, June 2009, pgs. 8–14.**

121. Hyang-Won Lee and Eytan Modiano, "Diverse Routing in Networks with Probabilistic Failures," IEEE Infocom, April 2009, pgs. 1035–1043.

120. Kayi Lee and Eytan Modiano, "Cross-layer Survivability in WDM-based Networks," IEEE Infocom, April 2009, pgs. 1017–1025.**

119. Krishna Jagannathan, Eytan Modiano, Lizhong Zheng, "On the Trade-off between Control Rate and Congestion in Single Server Systems," IEEE Infocom, April 2009, pgs. 271–279.**

118. Sebastian Neumayer, Gil Zussman, Reuven Cohen, Eytan Modiano, "Assessing the Vulnerability of the Fiber Infrastructure to Disasters," IEEE Infocom, April 2009, pgs. 1566–1574.**

117. Long Le, Krishna Jagannathan and Eytan Modiano, "Delay Analysis of Max-Weight Scheduling in Wireless Ad Hoc Networks," Conference on Information Sciences and Systems, Baltimore, MD, March 2009, pgs. 389–394.**

116. Krishna Jagannathan, Eytan Modiano, Lizhong Zheng, "Effective Resource Allocation in a Queue: How Much Control is Necessary?," Allerton Conference on Communication, Control, and Computing, September 2008, pgs. 508–515.**

115. Sebastian Neumayer, Gil Zussman, Reuven Cohen, Eytan Modiano, "Assessing the Impact of Geographically Correlated Network Failures," IEEE MILCOM, November 2008 (6 pages; CD proceedings – page numbers not available).**

114. Emily Craparo, Jonathan P. How, and Eytan Modiano, "Simultaneous Placement and Assignment for Exploration in Mobile Backbone Networks," IEEE Conference on Decision and Control (CDC), November 2008, pgs. 1696–1701.**

113. Anand Srinivas and Eytan Modiano, "Joint Node Placement and Assignment for Throughput Optimization in Mobile Backbone Networks," IEEE INFOCOM'08, Phoenix, AZ, April 2008, pgs. 1130–1138.**

112. Guner Celik, Gil Zussman, Wajahat Khan and Eytan Modiano, "MAC for Networks with Multipacket Reception Capability and Spatially Distributed Nodes," IEEE INFOCOM'08, Phoenix, AZ, April 2008, pgs. 1436–1444.**

111. Gil Zussman, Andrew Brzezinski, and Eytan Modiano, "Multihop Local Pooling for Distributed Throughput Maximization in Wireless Networks," IEEE INFOCOM'08, Phoenix, AZ, April 2008, pgs. 1139–1147.**

110. Emily Craparo, Jonathan How and Eytan Modiano, "Optimization of Mobile Backbone Networks: Improved Algorithms and Approximation," IEEE American Control Conference, Seattle, WA, June 2008, pgs. 2016–2021.**

109. Atilla Eryilmaz, Asuman Ozdaglar, Devavrat Shah, Eytan Modiano, "Imperfect Randomized Algorithms for the Optimal Control of Wireless Networks," Conference on Information Sciences and Systems, Princeton, NJ, March 2008, pgs. 932–937.

108. Anand Srinivas and Eytan Modiano, "Optimal Path Planning for Mobile Backbone Networks," Conference on Information Sciences and Systems, Princeton, NJ, March 2008, pgs. 913–918.

107. Kayi Lee and Eytan Modiano, "Cross-layer Survivability in WDM Networks with Multiple Failures," IEEE Optical Fiber Communications Conference, San Diego, CA, February 2008 (3 pages; CD proceedings – page numbers not available).

106. Andrew Brzezinski, Gil Zussman and Eytan Modiano, "Local Pooling Conditions for Joint Routing and Scheduling," Workshop on Information Theory and Applications, La Jolla, CA, January 2008, pgs. 499–506.

105. Murtaza Zafer and Eytan Modiano, "Minimum Energy Transmission over a Wireless Fading Channel with Packet Deadlines," Proceedings of IEEE Conference on Decision and Control (CDC), New Orleans, LA, December 2007, pgs. 1148–1155.**

104. Atilla Eryilmaz, Asuman Ozdaglar, Eytan Modiano, "Polynomial Complexity Algorithms for Full Utilization of Multi-hop Wireless Networks," IEEE Infocom, Anchorage, AK, April 2007, pgs. 499–507.

103. Murtaza Zafer and Eytan Modiano, "Delay Constrained Energy Efficient Data Transmission over a Wireless Fading Channel," Workshop on Information Theory and Applications, University of California, San Diego, CA, February 2007, pgs. 289–298.**

102. Atilla Eryilmaz, Eytan Modiano, Asuman Ozdaglar, "Randomized Algorithms for Throughput-Optimality and Fairness in Wireless Networks," Proceedings of IEEE Conference on Decision and Control (CDC), San Diego, CA, December 2006, pgs. 1936–1941.

101. Anand Srinivas, Gil Zussman, and Eytan Modiano, "Distributed Mobile Disk Cover - A Building Block for Mobile Backbone Networks," Proc. Allerton Conference on Communication, Control, and Computing, Allerton, IL, September 2006 (9 pages; CD proceedings – page numbers not available).**

100. Krishna Jagannathan, Sem Borst, Phil Whiting, Eytan Modiano, "Scheduling of Multi-Antenna Broadcast Systems with Heterogeneous Users," Allerton Conference on Communication, Control, and Computing, Allerton, IL, September 2006 (10 pages; CD proceedings – page numbers not available).**

99. Andrew Brzezinski, Gil Zussman, and Eytan Modiano, "Enabling Distributed Throughput Maximization in Wireless Mesh Networks - A Partitioning Approach," Proceedings of ACM MOBICOM'06, Los Angeles, CA, September 2006 (12 pages; CD proceedings – page numbers not available).**

98. Eytan Modiano, Devavrat Shah, and Gil Zussman, "Maximizing Throughput in Wireless Networks via Gossiping," Proc. ACM SIGMETRICS / IFIP Performance'06, Saint-Malo, France, June 2006 (12 pages; CD proceedings – page numbers not available). (Winner of Best Paper Award)

97. Anand Srinivas, Gil Zussman, and Eytan Modiano, "Mobile Backbone Networks - Construction and Maintenance," Proc. ACM MOBIHOC'06, Florence, Italy, May 2006 (12 pages; CD proceedings – page numbers not available).**

96. Andrew Brzezinski and Eytan Modiano, "Achieving 100% Throughput in Reconfigurable Optical Networks," IEEE INFOCOM 2006 High-Speed Networking Workshop, Barcelona, Spain, April 2006 (5 pages; CD proceedings – page numbers not available).**

95. Krishna P. Jagannathan, Sem Borst, Phil Whiting, Eytan Modiano, "Efficient Scheduling of Multi-user Multi-antenna Systems," Proceedings of WiOpt 2006, Boston, MA, April 2006 (8 pages; CD proceedings – page numbers not available).**

94. Andrew Brzezinski and Eytan Modiano, "Greedy Weighted Matching for Scheduling the Input-Queued Switch," Conference on Information Sciences and Systems (CISS), Princeton, NJ, March 2006, pgs. 1738–1743.**

93. Murtaza Zafer and Eytan Modiano, "Optimal Adaptive Data Transmission over a Fading Channel with Deadline and Power Constraints," Conference on Information Sciences and Systems (CISS), Princeton, NJ, March 2006, pgs. 931–937.**

92. Li-Wei Chen and E. Modiano, "A Geometric Approach to Capacity Provisioning in WDM Networks with Dynamic Traffic," Conference on Information Sciences and Systems (CISS), Princeton, NJ, March 2006, pgs. 1676–1683.**

91. Jun Sun and Eytan Modiano, "Channel Allocation Using Pricing in Satellite Networks," Conference on Information Sciences and Systems (CISS), Princeton, NJ, March 2006, pgs. 182–187.**

90. Jun Sun, Jay Gao, Shervin Shambayati and Eytan Modiano, "Ka-Band Link Optimization with Rate Adaptation," IEEE Aerospace Conference, Big Sky, MT, March 2006 (7 pages; CD proceedings – page numbers not available).

89. Alessandro Tarello, Eytan Modiano and Jay Gao, "Energy Efficient Transmission Scheduling over Mars Proximity Links," IEEE Aerospace Conference, Big Sky, MT, March 2006 (10 pages; CD proceedings – page numbers not available).

88. A. Brzezinski and E. Modiano, "RWA Decompositions for Optimal Throughput in Reconfigurable Optical Networks," INFORMS Telecommunications Conference, Dallas, TX, March 2006 (3 pages; CD proceedings – page numbers not available).**

87. Li-Wei Chen and E. Modiano, "Geometric Capacity Provisioning for Wavelength Switched WDM Networks," Workshop on Information Theory and Applications, University of California, San Diego, CA, February 2006 (8 pages; CD proceedings – page numbers not available).**

86. Murtaza Zafer and Eytan Modiano, "Joint Scheduling of Rate-guaranteed and Best-effort Services over a Wireless Channel," IEEE Conference on Decision and Control, Seville, Spain, December 2005, pgs. 6022–6027.**

85. Jun Sun and Eytan Modiano, "Opportunistic Power Allocation for Fading Channels with Non-cooperative Users and Random Access," IEEE BroadNets - Wireless Networking Symposium, Boston, MA, October 2005, pgs. 397–405.**

84. Li-Wei Chen and Eytan Modiano, "Uniform vs. Non-uniform Band Switching in WDM Networks," IEEE BroadNets - Optical Networking Symposium, Boston, MA, October 2005, pgs. 219–228.**

83. Sonia Jain and Eytan Modiano, "Buffer Management Schemes for Enhanced TCP Performance over Satellite Links," IEEE MILCOM, Atlantic City, NJ, October 2005 (8 pages; CD proceedings – page numbers not available).**

82. Murtaza Zafer and Eytan Modiano, "Continuous-time Optimal Rate Control for Delay Constrained Data Transmission," Allerton Conference on Communications, Control and Computing, Allerton, IL, September 2005 (10 pages; CD proceedings – page numbers not available).**

81. Alessandro Tarello, Eytan Modiano, Jun Sun, Murtaza Zafer, "Minimum Energy Transmission Scheduling subject to Deadline Constraints," IEEE WiOpt, Trentino, Italy, April 2005, pgs. 67–76. (Winner of Best Student Paper Award)**

80. Amir Khandani, Eytan Modiano, Jinane Abounadi, Lizhong Zheng, "Reliability and Route Diversity in Wireless Networks," Conference on Information Sciences and Systems, Baltimore, MD, March 2005 (8 pages; CD proceedings – page numbers not available).**

79. Andrew Brzezinski, Iraj Saniee, Indra Widjaja, Eytan Modiano, "Flow Control and Congestion Management for Distributed Scheduling of Burst Transmissions in Time-Domain Wavelength Interleaved Networks," IEEE/OSA Optical Fiber Conference (OFC), Anaheim, CA, March 2005, pgs. WC4-1–WC4-3.

78. Andrew Brzezinski and Eytan Modiano, "Dynamic Reconfiguration and Routing Algorithms for IP-over-WDM Networks with Stochastic Traffic," IEEE Infocom 2005, Miami, FL, March 2005, pgs. 6–11.**

77. Murtaza Zafer and Eytan Modiano, "A Calculus Approach to Minimum Energy Transmission Policies with Quality of Service Guarantees," IEEE Infocom 2005, Miami, FL, March 2005, pgs. 548–559.**

76. Michael Neely and Eytan Modiano, "Fairness and Optimal Stochastic Control for Heterogeneous Networks," IEEE Infocom 2005, Miami, FL, March 2005, pgs. 1723–1734.**

75. Aradhana Narula-Tam, Thomas G. Macdonald, Eytan Modiano, and Leslie Servi, "A Dynamic Resource Allocation Strategy for Satellite Communications," IEEE MILCOM, Monterey, CA, October 2004, pgs. 1415–1421.

74. Li-Wei Chen, Poompat Saengudomlert and Eytan Modiano, "Optimal Waveband Switching in WDM Networks," IEEE International Conference on Communications (ICC), Paris, France, June 2004, pgs. 1604–1608.**

73. Michael Neely and Eytan Modiano, "Logarithmic Delay for NxN Packet Switches," IEEE Workshop on High Performance Switching and Routing (HPSR 2004), Phoenix, AZ, April 2004, pgs. 3–9.**

72. Li-Wei Chen and Eytan Modiano, "Dynamic Routing and Wavelength Assignment with Optical Bypass using Ring Embeddings," IEEE Workshop on High Performance Switching and Routing (HPSR 2004), Phoenix, AZ, April 2004, pgs. 119–125.**

71. Randall Berry and Eytan Modiano, "On the Benefits of Tunability in Reducing Electronic Port Counts in WDM/TDM Networks," IEEE Infocom, Hong Kong, March 2004, pgs. 1340–1351.

70. Andrew Brzezinski and Eytan Modiano, "A New Look at Dynamic Traffic Scheduling in WDM Networks with Transceiver Tuning Latency," INFORMS Telecommunications Conference, Boca Raton, FL, March 2004, pgs. 25–26.**

69. Chunmei Liu and Eytan Modiano, "Packet Scheduling with Window Service Constraints," Conference on Information Sciences and Systems, Princeton, NJ, March 2004, pgs. 178–184.**

68. Jun Sun, Eytan Modiano, and Lizhong Zheng, "A Novel Auction Algorithm for Fair Allocation of a Wireless Fading Channel," Conference on Information Sciences and Systems, Princeton, NJ, March 2004, pgs. 1377–1383.**

67. Murtaza Zafer and Eytan Modiano, "Impact of Interference and Channel Assignment on Blocking Probability in Wireless Networks," Conference on Information Sciences and Systems, Princeton, NJ, March 2004, pgs. 430–436.**

66. Chunmei Liu and Eytan Modiano, "An Analysis of TCP over Random Access Satellite Links," IEEE Wireless Communications and Networking Conference (WCNC), Atlanta, GA, February 2004, pgs. 2033–2040.**

65. Randall Berry and Eytan Modiano, "Using Tunable Optical Transceivers for Reducing the Number of Ports in WDM/TDM Networks," IEEE/OSA Optical Fiber Conference (OFC), Los Angeles, CA, February 2004, pgs. 23–27.

64. Aradhana Narula-Tam, Eytan Modiano and Andrew Brzezinski, "Physical Topology Design for Survivable Routing of Logical Rings in WDM-based Networks," IEEE Globecom, San Francisco, CA, December 2003, pgs. 2552–2557.

63. Jun Sun, Lizhong Zheng and Eytan Modiano, "Wireless Channel Allocation Using an Auction Algorithm," Allerton Conference on Communications, Control and Computing, October 2003, pgs. 1114–1123.**

62. Amir Khandani, Jinane Abounadi, Eytan Modiano, Lizhong Zheng, "Cooperative Routing in Wireless Networks," Allerton Conference on Communications, Control and Computing, October 2003, pgs. 1270–1279.**

61. Poompat Saengudomlert, Eytan Modiano and Robert Gallager, "Dynamic Wavelength Assignment for WDM All-Optical Tree Networks," Allerton Conference on Communications, Control and Computing, October 2003, pgs. 915–924.**

60. Aradhana Narula-Tam and Eytan Modiano, "Designing Physical Topologies that Enable Survivable Routing of Logical Rings," IEEE Workshop on Design of Reliable Communication Networks (DRCN), October 2003, pgs. 379–386.

59. Anand Srinivas and Eytan Modiano, "Minimum Energy Disjoint Path Routing in Wireless Ad Hoc Networks," ACM Mobicom, San Diego, CA, September 2003, pgs. 122–133.**

58. Michael Neely and Eytan Modiano, "Improving Delay in Ad-Hoc Mobile Networks Via Redundant Packet Transfers," Conference on Information Sciences and Systems, Baltimore, MD, March 2003 (6 pages; CD proceedings – page numbers not available).**

57. Michael Neely, Eytan Modiano and Charles Rohrs, "Dynamic Power Allocation and Routing for Time Varying Wireless Networks," IEEE Infocom 2003, San Francisco, CA, April 2003, pgs. 745–755.**

56. Alvin Fu, Eytan Modiano, and John Tsitsiklis, "Optimal Energy Allocation for Delay-Constrained Data Transmission over a Time-Varying Channel," IEEE Infocom 2003, San Francisco, CA, April 2003, pgs. 1095–1105.**

55. Poompat Saengudomlert, Eytan Modiano and Robert Gallager, "On-line Routing and Wavelength Assignment for Dynamic Traffic in WDM Ring and Torus Networks," IEEE Infocom 2003, San Francisco, CA, April 2003, pgs. 1805–1815.**

54. Li-Wei Chen and Eytan Modiano, "Efficient Routing and Wavelength Assignment for Reconfigurable WDM Networks with Wavelength Converters," IEEE Infocom 2003, San Francisco, CA, April 2003, pgs. 1785–1794. Selected as one of the best papers of Infocom 2003 for fast-track publication in IEEE/ACM Transactions on Networking.**

53.   Mike Neely, Jun Sun and Eytan Modiano, " Delay and Complexity Tradeoffs for Dynamic Routing and Power Allocation in a Wireless Network ,”  Allerton Conference on Communication, Control, and Computing, Allerton, Illinois, October, 2002, pgs. 157 –159.**

52.   Anand Ganti, Eytan Modiano and John Tsitsiklis, " Transmission Scheduling for Multi-Channel Satellite and Wireless Networks ,”  Allerton Conference on Communication, Control, and Computing, Allerton, Illinois, October, 2002, pgs. 1318–1327.**

51.   Poompat Saengudomlert, Eytan Modiano, and Robert G. Gallager, " Optimal Wavelength Assignment for Uniform All-to-All Traffic in WDM Tree Networks ,”  Allerton Conference on Communication, Control, and Computing, Allerton, Illinois, October, 2002, pgs. 528–537.**

50. Hungjen Wang, Eytan Modiano and Muriel Medard, "Partial Path Protection for WDM Networks: End-to-End Recovery Using Local Failure Information," IEEE International Symposium on Computer Communications (ISCC), Taormina, Italy, July 2002, pgs. 719–725.**

49. Jun Sun and Eytan Modiano, "Capacity Provisioning and Failure Recovery in Mesh-Torus Networks with Application to Satellite Constellations," IEEE International Symposium on Computer Communications (ISCC), Taormina, Italy, July 2002, pgs. 77–84.**

48. Alvin Fu, Eytan Modiano, and John Tsitsiklis, "Optimal Energy Allocation and Admission Control for Communications Satellites," IEEE INFOCOM 2002, New York, June 2002, pgs. 648–656.**

47. Michael Neely, Eytan Modiano and Charles Rohrs, "Power and Server Allocation in a Multi-Beam Satellite with Time Varying Channels," IEEE INFOCOM 2002, New York, June 2002, pgs. 1451–1460.**

46. Mike Neely, Eytan Modiano and Charles Rohrs, "Tradeoffs in Delay Guarantees and Computation Complexity for N x N Packet Switches," Conference on Information Sciences and Systems, Princeton, NJ, March 2002, pgs. 136–148.**

45. Alvin Fu, Eytan Modiano and John Tsitsiklis, "Transmission Scheduling Over a Fading Channel with Energy and Deadline Constraints," Conference on Information Sciences and Systems, Princeton, NJ, March 2002, pgs. 1018–1023.**

44. Chunmei Liu and Eytan Modiano, "On the Interaction of Layered Protocols: The Case of Window Flow Control and ARQ," Conference on Information Sciences and Systems, Princeton, NJ, March 2002, pgs. 118–124.**

43. Mike Neely, Eytan Modiano and Charles Rohrs, "Packet Routing over Parallel Time-varying Queues with Application to Satellite and Wireless Networks," Conference on Information Sciences and Systems, Princeton, NJ, March 2002, pgs. 360–366.**

42. Ahluwalia Ashwinder, Eytan Modiano and Li Shu, "On the Complexity and Distributed Construction of Energy Efficient Broadcast Trees in Static Ad Hoc Wireless Networks," Conference on Information Sciences and Systems, Princeton, NJ, March 2002, pgs. 807–813.**

41. Jun Sun and Eytan Modiano, "Capacity Provisioning and Failure Recovery for Satellite Constellations," Conference on Information Sciences and Systems, Princeton, NJ, March 2002, pgs. 1039–1045.**

40. Eytan Modiano, Hungjen Wang, and Muriel Medard, "Partial Path Protection for WDM Networks," Informs Telecommunications Conference, Boca Raton, FL, March 2002, pgs. 78–79.**

39. Poompat Saengudomlert, Eytan H. Modiano, and Robert G. Gallager, "An On-Line Routing and Wavelength Assignment Algorithm for Dynamic Traffic in a WDM Bidirectional Ring," Joint Conference on Information Sciences (JCIS), Durham, North Carolina, March 2002, pgs. 1331–1334.**

38. Randy Berry and Eytan Modiano, "Switching and Traffic Grooming in WDM Networks," Joint Conference on Information Sciences (JCIS), Durham, North Carolina, March 2002, pgs. 1340–1343.

37. Eytan Modiano, Hungjen Wang, and Muriel Medard, "Using Local Information for WDM Network Protection," Joint Conference on Information Sciences (JCIS), Durham, North Carolina, March 2002, pgs. 1398–1401.**

36. Aradhana Narula-Tam and Eytan Modiano, "Network architectures for supporting survivable WDM rings," IEEE/OSA Optical Fiber Conference (OFC) 2002, Anaheim, CA, March 2002, pgs. 105–107.

35. Michael Neely, Eytan Modiano, Charles Rohrs, "Packet Routing over Parallel Time-Varying Queues with Application to Satellite and Wireless Networks," Allerton Conference on Communication, Control, and Computing, Allerton, Illinois, September 2001, pgs. 1110–1111.**

34. Eytan Modiano and Randy Berry, "The Role of Switching in Reducing Network Port Counts," Allerton Conference on Communication, Control, and Computing, Allerton, Illinois, September 2001, pgs. 376–385.

33. Eytan Modiano, "Resource allocation and congestion control in next generation satellite networks," IEEE Gigabit Networking Workshop (GBN 2001), Anchorage, AK, April 2001 (2-page summary; online proceedings).

32. Eytan Modiano and Aradhana Narula-Tam, "Survivable Routing of Logical Topologies in WDM Networks," IEEE Infocom 2001, Anchorage, AK, April 2001, pgs. 348–357.

31. Michael Neely and Eytan Modiano, "Convexity and Optimal Load Distribution in Work Conserving */*/1 Queues," IEEE Infocom 2001, Anchorage, AK, April 2001, pgs. 1055–1064.

30. Eytan Modiano and Randy Berry, "Using Grooming Cross-Connects to Reduce ADM Costs in SONET/WDM Ring Networks," IEEE/OSA Optical Fiber Conference (OFC) 2001, Anaheim, CA, March 2001, pgs. WL1–WL3.

29. Eytan Modiano and Aradhana Narula-Tam, "Designing Survivable Networks Using Effective Routing and Wavelength Assignment (RWA)," IEEE/OSA Optical Fiber Conference (OFC) 2001, Anaheim, CA, March 2001, pgs. TUG5-1–TUG5-3.

28. Roop Ganguly and Eytan Modiano, "Distributed Algorithms and Architectures for Optical Flow Switching in WDM networks," IEEE International Symposium on Computer Communications (ISCC 2000), Antibes, France, July 2000, pgs. 134–139.

27. Aradhana Narula-Tam, Philip J. Lin and Eytan Modiano, "Wavelength Requirements for Virtual Topology Reconfiguration in WDM Ring Networks," IEEE International Conference on Communications (ICC 2000), New Orleans, LA, June 2000, pgs. 1650–1654.

26. Eytan Modiano, "Optical Flow Switching for the Next Generation Internet," IEEE Gigabit Networking Workshop (GBN 2000), Tel Aviv, March 2000 (2-page summary; online proceedings).

25. Aradhana Narula and Eytan Modiano, "Dynamic Reconfiguration in WDM Packet Networks with Wavelength Limitations," IEEE/OSA Optical Fiber Conference (OFC) 2000, Baltimore, MD, March 2000, pgs. 1210–1212.

24. Brett Schein and Eytan Modiano, "Quantifying the benefits of configurability in circuit-switched WDM ring networks," IEEE Infocom 2000, Tel Aviv, Israel, April 2000, pgs. 1752–1760.***

23. Aradhana Narula-Tam and Eytan Modiano, "Load Balancing Algorithms for WDM-based IP networks," IEEE Infocom 2000, Tel Aviv, Israel, April 2000, pgs. 1010–1019.

22. Nan Froberg, M. Kuznetsov, E. Modiano, et al., "The NGI ONRAMP test bed: Regional Access WDM technology for the Next Generation Internet," IEEE LEOS '99, October 1999, pgs. 230–231.

21. Randy Berry and Eytan Modiano, "Minimizing Electronic Multiplexing Costs for Dynamic Traffic in Unidirectional SONET Ring Networks," IEEE International Conference on Communications (ICC '99), Vancouver, Canada, June 1999, pgs. 1724–1730.***

20. Brett Schein and Eytan Modiano, "Increasing Traffic Capacity in WDM Ring Networks via Topology Reconfiguration," Conference on Information Sciences and Systems, Baltimore, MD, March 1999, pgs. 201–206.

19. Eytan Modiano and Richard Barry, "Design and Analysis of an Asynchronous WDM Local Area Network Using a Master/Slave Scheduler," IEEE Infocom '99, New York, NY, March 1999, pgs. 900–907.

18. Randy Berry and Eytan Modiano, "Grooming Dynamic Traffic in Unidirectional SONET Ring Networks," IEEE/OSA Optical Fiber Conference (OFC) '99, San Diego, CA, February 1999, pgs. 71–73.

17. Angela Chiu and Eytan Modiano, "Reducing Electronic Multiplexing Costs in Unidirectional SONET/WDM Ring Networks Via Efficient Traffic Grooming," IEEE Globecom '98, Sydney, Australia, November 1998, pgs. 322–327.

16. Eytan Modiano, "Throughput Analysis of Unscheduled Multicast Transmissions in WDM Broadcast-and-Select Networks," IEEE International Symposium on Information Theory, Boston, MA, September 1998, pg. 167.

15. Eytan Modiano and Angela Chiu, "Traffic Grooming Algorithms for Minimizing Electronic Multiplexing Costs in Unidirectional SONET/WDM Ring Networks," Conference on Information Sciences and Systems, Princeton, NJ, March 1998, pgs. 653–658.

14. Eytan Modiano and Eric Swanson, "An Architecture for Broadband Internet Services over a WDM-based Optical Access Network," IEEE Gigabit Networking Workshop (GBN '98), San Francisco, CA, March 1998 (2-page summary; online proceedings).

13. Eytan Modiano, "Unscheduled Multicasts in WDM Broadcast-and-Select Networks," IEEE Infocom '98, San Francisco, CA, March 1998, pgs. 86–93.

12. Eytan Modiano, Richard Barry and Eric Swanson, "A Novel Architecture and Medium Access Control (MAC) protocol for WDM Networks," IEEE/OSA Optical Fiber Conference (OFC) '98, San Jose, CA, February 1998, pgs. 90–91.

11. Eytan Modiano, "Scheduling Algorithms for Message Transmission Over a Satellite Broadcast System," IEEE MILCOM '97, Monterey, CA, November 1997, pgs. 628–634.

10. Eytan Modiano, "Scheduling Packet Transmissions in A Multi-hop Packet Switched Network Based on Message Length," IEEE International Conference on Computer Communications and Networks (IC3N), Las Vegas, Nevada, September 1997, pgs. 350–357.

9. Eytan Modiano, "A Simple Algorithm for Optimizing the Packet Size Used in ARQ Protocols Based on Retransmission History," Conference on Information Sciences and Systems, Baltimore, MD, March 1997, pgs. 672–677.

8. Eytan Modiano, "A Multi-Channel Random Access Protocol for the CDMA Channel," IEEE PIMRC '95, Toronto, Canada, September 1995, pgs. 799–803.

7. Eytan Modiano, Jeffrey Wieselthier and Anthony Ephremides, "A Simple Derivation of Queueing Delay in a Tree Network of Discrete-Time Queues with Deterministic Service Times," IEEE International Symposium on Information Theory, Trondheim, Norway, June 1994, pg. 372.

6. Eytan Modiano, Jeffrey Wieselthier and Anthony Ephremides, "An Approach for the Analysis of Packet Delay in an Integrated Mobile Radio Network," Conference on Information Sciences and Systems, Baltimore, MD, March 1993, pgs. 138–139.

5. Eytan Modiano and Anthony Ephremides, "A Method for Delay Analysis of Interacting Queues in Multiple Access Systems," IEEE INFOCOM 1993, San Francisco, CA, March 1993, pgs. 447–454.

4. Eytan Modiano and Anthony Ephremides, "A Model for the Approximation of Interacting Queues that Arise in Multiple Access Schemes," IEEE International Symposium on Information Theory, San Antonio, TX, January 1993, pg. 324.

3. Eytan Modiano and Anthony Ephremides, "Efficient Routing Schemes for Multiple Broadcasts in a Mesh," Conference on Information Sciences and Systems, Princeton, NJ, March 1992, pgs. 929–934.

2. Eytan Modiano and Anthony Ephremides, "On the Secrecy Complexity of Computing a Binary Function of Non-uniformly Distributed Random Variables," IEEE International Symposium on Information Theory, Budapest, Hungary, June 1991, pg. 213.

1. Eytan Modiano and Anthony Ephremides, "Communication Complexity of Secure Distributed Computation in the Presence of Noise," IEEE International Symposium on Information Theory, San Diego, CA, January 1990, pg. 142.

Book Chapters

  • Hyang-Won Lee, Kayi Lee, Eytan Modiano, "Cross-Layer Survivability," in Cross-Layer Design in Optical Networks, Springer, 2013.
  • Li-Wei Chen and Eytan Modiano, "Geometric Capacity Provisioning for Wavelength-Switched WDM Networks," chapter in Computer Communications and Networks Series: Algorithms for Next Generation Networks, Springer, 2010.
  • Amir Khandani, Eytan Modiano, Lizhong Zheng, Jinane Abounadi, "Cooperative Routing in Wireless Networks," chapter in Advances in Pervasive Computing and Networking, Kluwer Academic Publishers, 2005.
  • Jian-Qiang Hu and Eytan Modiano, "Traffic Grooming in WDM Networks," chapter in Emerging Optical Network Technologies, Kluwer Academic Publishers, 2004.
  • Eytan Modiano, "WDM Optical Networks," Wiley Encyclopedia of Telecommunications (John Proakis, editor), 2003.
  • Eytan Modiano, "Optical Access Networks for the Next Generation Internet," in Optical WDM Networks: Principles and Practice, Kluwer Academic Publishers, 2002.
  • Eytan Modiano, Richard Barry and Eric Swanson, "A Novel Architecture and Medium Access Control Protocol for WDM Networks," Trends in Optics and Photonics Series (TOPS) volume on Optical Networks and Their Applications, 1998.
  • Eytan Modiano and Kai-Yeung Siu, "Network Flow and Congestion Control," Wiley Encyclopedia of Electrical and Electronics Engineering, 1999.

Technical Reports

  • Amir Khandani, Eytan Modiano, Jinane Abounadi, Lizhong Zheng, "Reliability and Route Diversity in Wireless Networks," MIT LIDS Technical Report 2634, November 2004.
  • Anand Srinivas and Eytan Modiano, "Minimum Energy Disjoint Path Routing in Wireless Ad Hoc Networks," MIT LIDS Technical Report P-2559, March 2003.
  • Eytan Modiano and Aradhana Narula-Tam, "Survivable lightpath routing: a new approach to the design of WDM-based networks," LIDS Report 2552, October 2002.
  • Michael Neely, Eytan Modiano and Charles Rohrs, "Packet Routing over Parallel Time-Varying Queues with Application to Satellite and Wireless Networks," LIDS Report 2520, September 2001.
  • Jun Sun and Eytan Modiano, "Capacity Provisioning and Failure Recovery in Mesh-Torus Networks with Application to Satellite Constellations," LIDS Report 2518, September 2001.
  • Hungjen Wang, Eytan Modiano and Muriel Medard, "Partial Path Protection for WDM Networks: End-to-End Recovery Using Local Failure Information," LIDS Report 2517, September 2001.
  • Alvin Fu, Eytan Modiano, and John Tsitsiklis, "Optimal Energy Allocation and Admission Control for Communications Satellites," LIDS Report 2516, September 2001.
  • Michael Neely, Eytan Modiano and Charles Rohrs, "Power and Server Allocation in a Multi-Beam Satellite with Time Varying Channels," LIDS Report 2515, September 2001.
  • Eytan Modiano, "Scheduling Algorithms for Message Transmission Over the GBS Satellite Broadcast System," Lincoln Laboratory Technical Report TR-1035, June 1997.
  • Eytan Modiano, "Scheduling Packet Transmissions in A Multi-hop Packet Switched Network Based on Message Length," Lincoln Laboratory Technical Report TR-1036, June 1997.

Networking is central to modern computing, from WANs connecting cell phones to massive data stores, to the data-center interconnects that deliver seamless storage and fine-grained distributed computing. Because our distributed computing infrastructure is a key differentiator for the company, Google has long focused on building network infrastructure to support our scale, availability, and performance needs, and to apply our expertise and infrastructure to solve similar problems for Cloud customers. Our research combines building and deploying novel networking systems at unprecedented scale, with recent work focusing on fundamental questions around data center architecture, cloud virtual networking, and wide-area network interconnects. We helped pioneer the use of Software Defined Networking, the application of ML to networking, and the development of large-scale management infrastructure including telemetry systems. We are also addressing congestion control and bandwidth management, capacity planning, and designing networks to meet traffic demands. We build cross-layer systems to ensure high network availability and reliability. By publishing our findings at premier research venues, we continue to engage both academic and industrial partners to further the state of the art in networked systems.


  • Open access
  • Published: 21 June 2018

A comprehensive survey on machine learning for networking: evolution, applications and research opportunities

Raouf Boutaba, Mohammad A. Salahuddin, Noura Limam, Sara Ayoubi, Nashid Shahriar, Felipe Estrada-Solano & Oscar M. Caicedo

Journal of Internet Services and Applications, volume 9, Article number: 16 (2018)


Machine Learning (ML) has been enjoying an unprecedented surge in applications that solve problems and enable automation in diverse domains. Primarily, this is due to the explosion in the availability of data, significant improvements in ML techniques, and advances in computing capabilities. Undoubtedly, ML has been applied to various mundane and complex problems arising in network operation and management. There are various surveys on ML for specific areas in networking or for specific network technologies. This survey is original in that it jointly presents the application of diverse ML techniques in various key areas of networking across different network technologies. In this way, readers will benefit from a comprehensive discussion of the different learning paradigms and ML techniques applied to fundamental problems in networking, including traffic prediction, routing and classification, congestion control, resource and fault management, QoS and QoE management, and network security. Furthermore, this survey delineates the limitations, gives insights, and identifies research challenges and future opportunities to advance ML in networking. It is therefore a timely contribution on the implications of ML for networking, which is pushing the barriers of autonomic network operation and management.

1 Introduction

Machine learning (ML) enables a system to scrutinize data and deduce knowledge. It goes beyond simply learning or extracting knowledge, to utilizing and improving that knowledge over time and with experience. In essence, the goal of ML is to identify and exploit hidden patterns in “training” data. The patterns learnt are used to analyze unknown data, so that it can be grouped together or mapped to known groups. This instigates a shift in the traditional programming paradigm: instead of programs being written to automate tasks, ML creates the program (i.e. the model) that fits the data. Recently, ML has been enjoying renewed interest; early ML techniques were rigid and incapable of tolerating any variation from the training data [134].

Recent advances in ML have made these techniques flexible and resilient in their applicability to various real-world scenarios, ranging from the extraordinary to the mundane. For instance, ML in health care has greatly improved the areas of medical imaging and computer-aided diagnosis. We also routinely use technological tools founded upon ML. For example, search engines extensively use ML for non-trivial tasks, such as query suggestion, spell correction, web indexing and page ranking. Evidently, as we look forward to automating more aspects of our lives, ranging from home automation to autonomous vehicles, ML techniques will become an increasingly important facet of the systems that aid in decision making, analysis, and automation.

Apart from the advances in ML techniques, various other factors contribute to its revival. Most importantly, the success of ML techniques relies heavily on data [77]. Undoubtedly, there is a colossal amount of data in today's networks, which is bound to grow further with emerging networks, such as the Internet of Things (IoT) and its billions of connected devices [162]. This encourages the application of ML, which not only identifies hidden and unexpected patterns, but can also be applied to learn and understand the processes that generate the data.

Recent advances in computing offer the storage and processing capabilities required for training and testing ML models on voluminous data. For instance, Cloud Computing offers seemingly infinite compute and storage resources, while Graphics Processing Units (GPUs) [342] and Tensor Processing Units (TPUs) [170] provide accelerated training and inference. It is important to note that a trained ML model can be deployed for inference on less capable devices, e.g. smartphones. Despite these advances, network operation and management remain cumbersome, and network faults are prevalent, primarily due to human error [291]. Network faults lead to financial losses and damage to the reputation of network providers. Therefore, there is immense interest in building autonomic (i.e. self-configuring, self-healing, self-optimizing and self-protecting) networks [28] that are highly resilient.

Though there is a dire need for cognitive control in network operation and management [28], it poses a unique set of challenges for ML. First, each network is unique, and there is no enforcement of standards to attain uniformity across networks. For instance, the enterprise network of one organization is disparate from that of another. Therefore, the patterns proven to work in one network may not be feasible for another network of the same kind. Second, the network is continually evolving, and these dynamics inhibit the application of a fixed set of patterns to aid in network operation and management. It is almost impossible to manually keep up with network administration, due to the continuous growth in the number of applications running in the network and the kinds of devices connected to it.

Key technological advances in networking, such as network programmability via Software-Defined Networking (SDN), promote the applicability of ML in networking. Though ML has been extensively applied to problems in pattern recognition, speech synthesis, and outlier detection, its successful deployment for network operation and management has been limited. The main obstacles concern what data can be collected from legacy network devices and what control actions can be exercised on them. The ability to program the network by leveraging SDN alleviates these obstacles. The cognition from ML can then be used to aid in the automation of network operation and management tasks. It is therefore exciting, and non-trivial, to apply ML techniques to such diverse and complex problems in networking. This makes ML in networking an interesting research area, one that requires an understanding of both the ML techniques and the problems in networking.

In this paper, we discuss the advances made in the application of ML in networking. We focus on traffic engineering, performance optimization and network security. In traffic engineering, we discuss traffic prediction, classification and routing, which are fundamental to providing differentiated and prioritized services. In performance optimization, we discuss the application of ML techniques in the context of congestion control, QoS/QoE correlation, and resource and fault management. Undoubtedly, security is a cornerstone of networking, and in this regard we highlight existing efforts that use ML techniques for network security.

The primary objective of this survey is to provide a comprehensive body of knowledge on ML techniques in support of networking. Furthermore, we complement the discussion with key insights into the techniques employed, their benefits, limitations and their feasibility to real-world networking scenarios. Our contributions are summarized as follows:

A comprehensive view of ML techniques in networking. We review literature published in peer-reviewed venues over the past two decades that has high impact and has been well received by peers. The works selected and discussed in this survey are comprehensive in the advances made for networking. The key criterion used in the selection is a combination of the year of publication, citation count and merit. For example, consider two papers A and B published in the same year with citation counts x and y, respectively. If x is significantly larger than y, A would be selected for discussion. However, if upon evaluating B it is evident that it presents original ideas, critical insights or lessons learnt, then it is also selected for discussion on its merit, despite the lower citation count.

A purposeful discussion on the feasibility of ML techniques for networking. We explore ML techniques in networking, including their benefits and limitations. It is important to realize that our coverage of networking aspects is not limited to a specific network technology (e.g. cellular networks, wireless sensor networks (WSN), mobile ad hoc networks (MANET), cognitive radio networks (CRN)). This gives readers a broad view of the possible solutions to networking problems across network technologies.

Identification of key challenges and future research opportunities . The presented discussion on ML-based techniques in networking uncovers fundamental research challenges that confront networking and inhibit ultimate cognition in network operation and management. A discussion of these opportunities will motivate future work and push the boundaries of networking.

Though there are various surveys on ML in networking [18, 61, 82, 142, 246, 339], this survey is purposefully different, due to its timeliness, the comprehensiveness of the ML techniques covered, and the various aspects of networking discussed, irrespective of network technology. For instance, Nguyen and Armitage [339], though impactful, is now dated and only addresses traffic classification in networking, whereas Fadlullah et al. [142] and Buczak et al. [82], both state-of-the-art surveys, give a specialized treatment of ML for specific problems in networking. On the other hand, Klaine et al. [246], Bkassiny et al. [61] and Alsheikh et al. [18], though comprehensive in their coverage of ML techniques in networking, are specialized to a specific network technology, i.e. cellular networks, CRNs and WSNs, respectively. Therefore, our survey provides a holistic view of the applicability, challenges and limitations of ML techniques in networking.

We organize the remainder of this paper as follows. In Section 2, we provide a primer on ML, which discusses the different categories of ML-based techniques, their essential constituents and their evolution. Sections 3, 4 and 5 discuss the application of the various ML-based techniques for traffic prediction, classification and routing, respectively. We present the ML-based advances in performance management, with respect to congestion control, resource management, fault management, and QoS/QoE management for networking, in Sections 6, 7, 8 and 9. In Section 10, we examine the benefits of ML for anomaly and misuse detection for intrusion detection in networking. Finally, we delineate the lessons learned, and future research challenges and opportunities for ML in networking, in Section 11. We conclude in Section 12 with a brief overview of our contributions. To facilitate reading, Fig. 1 presents a conceptual map of the survey, and Table 1 provides the list of acronyms and definitions for ML.

Figure 1: Conceptual map of the survey.

2 Machine learning for networking—a primer

In 1959, Arthur Samuel coined the term “Machine Learning”, as “the field of study that gives computers the ability to learn without being explicitly programmed” [369]. There are four broad categories of problems that can leverage ML, namely clustering, classification, regression and rule extraction [79]. In clustering problems, the objective is to group similar data together, while increasing the gap between the groups. In classification and regression problems, by contrast, the goal is to map a set of new input data to a set of discrete or continuous valued outputs, respectively. Rule extraction problems are intrinsically different, in that the goal is to identify statistical relationships in data.

ML techniques have been applied to various problem domains. A closely related domain is data analysis for large databases, called data mining [16]. Though ML techniques can be applied to aid in data mining, the goal of data mining problems is to critically and meticulously analyze data: its features, variables, invariants, temporal granularity, probability distributions and their transformations. ML, however, goes beyond data mining to predict future events or sequences of events.

Generally, ML is ideal for inferring solutions to problems that have a large representative dataset. In this way, as illustrated in Fig. 2, ML techniques are designed to identify and exploit hidden patterns in data for (i) describing the outcome as a grouping of data for clustering problems, (ii) predicting the outcome of future events for classification and regression problems, and (iii) evaluating the outcome of a sequence of data points for rule extraction problems. Though the figure illustrates data and outcome in a two-dimensional plane, the discussion holds for multi-dimensional datasets and outcome functions. For instance, in the case of clustering, the outcome can be a non-linear function in a hyperplane that discriminates between groups of data. Networking problems can be formulated as one of these problems that can leverage ML. For example, a classification problem in networking can be formulated to predict the kind of security attack, given network conditions: Denial-of-Service (DoS), User-to-Root (U2R), Root-to-Local (R2L), or probing. Whereas a regression problem can be formulated to predict when a future failure will occur.

Figure 2: Problem categories that benefit from machine learning. (a) Clustering. (b) Classification. (c) Regression. (d) Rule extraction.
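As a concrete illustration of casting a networking task as a classification problem, the sketch below assigns a traffic sample to a class with a simple nearest-centroid rule. The feature set (packets per second, mean packet size), the numbers, and the labels are all invented for illustration; they are not drawn from the survey or from any real trace.

```python
# Toy nearest-centroid classifier: label a traffic sample by the
# closest class centroid in feature space. Features and labels are
# hypothetical (packets/sec, mean packet size in bytes).

def nearest_centroid(train, labels, x):
    """Assign x to the class whose feature centroid is closest."""
    groups = {}
    for features, label in zip(train, labels):
        groups.setdefault(label, []).append(features)
    best_label, best_dist = None, float("inf")
    for label, rows in groups.items():
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((c - v) ** 2 for c, v in zip(centroid, x))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Invented training data: high-rate, tiny packets labeled "dos".
train = [(9000, 60), (8500, 64), (120, 512), (150, 480)]
labels = ["dos", "dos", "normal", "normal"]

print(nearest_centroid(train, labels, (8800, 62)))  # dos
```

A deployed classifier would of course be trained on labeled traces and use far richer features; the point is only the mapping from observed network conditions to a discrete class.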

Though there are different categories of problems that enjoy the benefits of ML, there is a generic approach to building ML-based solutions. Figure 3 illustrates the key constituents in designing ML-based solutions for networking. Data collection pertains to gathering, generating and/or defining the set of data and the set of classes of interest. Feature engineering is used to reduce dimensionality in the data and identify discriminating features that reduce computational overhead and increase accuracy. Finally, ML techniques carefully analyze the complex inter- and intra-relationships in the data and learn a model for the outcome.

Figure 3: The constituents of ML-based solutions.
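To make the three constituents concrete, here is a minimal end-to-end sketch: a hard-coded list stands in for data collection, feature engineering drops a non-discriminating column and rescales another, and a trivial threshold rule stands in for the learned model. All names, columns and numbers are illustrative assumptions, not part of the survey.

```python
# Minimal sketch of the three constituents: data collection (here, a
# hard-coded list standing in for measurements), feature engineering
# (dropping a constant, non-discriminating column and scaling bytes
# to KB), and a trivial threshold "model".

raw = [  # (bytes, packets, ttl) per flow; ttl happens to be constant
    (1_000_000, 900, 64),
    (1_200, 3, 64),
    (950_000, 870, 64),
]

def engineer(rows):
    # Keep only columns with more than one distinct value,
    # then scale the bytes column (index 0) to kilobytes.
    keep = [i for i in range(len(rows[0]))
            if len({r[i] for r in rows}) > 1]
    return [[r[i] / 1000 if i == 0 else r[i] for i in keep] for r in rows]

def model(feature_row, kb_threshold=500):
    # Stand-in "learned" rule: large flows are bulk transfers.
    return "bulk" if feature_row[0] > kb_threshold else "interactive"

features = engineer(raw)
print([model(f) for f in features])  # ['bulk', 'interactive', 'bulk']
```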

For instance, consider the example of Gold values over time, as illustrated in Fig. 2c. Naïvely, a linear regression model, shown as a best-fit line through the historical data, can facilitate predicting future values of Gold. Therefore, once the ML model is built, it can be deployed to deduce outcomes from new data. However, the outcomes are periodically validated, since they can drift over time, a phenomenon known as concept drift. This can be used as an indicator for incremental learning and re-training of the ML model. In the following subsections, we discuss each of the components in Fig. 3.
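The regression-plus-drift idea above can be sketched as follows: an ordinary least-squares line is fit to invented historical values, and a new observation that falls far from the fitted line is flagged as possible concept drift, signalling that re-training may be needed. The data and the tolerance are made up for illustration.

```python
# Least-squares line fit to invented historical values, plus a crude
# concept-drift check: flag observations that stray from the fit.

def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    slope = num / den
    return slope, my - slope * mx        # slope, intercept

xs = [0, 1, 2, 3, 4]                     # time steps
ys = [100, 102, 104, 106, 108]           # perfectly linear history
slope, intercept = fit_line(xs, ys)

def predict(x):
    return slope * x + intercept

def drifted(x, observed, tol=5.0):
    """True when the observation strays beyond tol from the fit."""
    return abs(observed - predict(x)) > tol

print(predict(5), drifted(5, 130))  # 110.0 True
```

In practice drift detection uses statistical tests over windows of residuals rather than a single-point tolerance, but the re-training trigger works the same way.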

2.1 Learning paradigms

There are four learning paradigms in ML: supervised, unsupervised, semi-supervised and reinforcement learning. These paradigms influence data collection, feature engineering, and establishing ground truth. Recall that the objective is to infer an outcome, given some dataset. The dataset used in constructing the ML model is often denoted as training data, and labels are associated with the training data if the user is aware of the description of the data. The outcome is often perceived as the identification of membership in a class of interest.

There are two schools of thought on the methodology for learning: generative and discriminative [333]. The basis for the learning methodologies is rooted in the famous Bayes' theorem for conditional probability and the fundamental rule that relates joint probability to conditional probability. Bayes' theorem is stated as follows. Given two events A and B, the conditional probability is defined as

P(A | B) = P(A ∩ B) / P(B),

which is also stated as

P(A | B) = P(B | A) × P(A) / P(B).

The joint probability P(A, B) of events A and B is P(A ∩ B) = P(B | A) × P(A), and the conditional probability is the normalized joint probability. The generative methodology aims at modeling the joint probability P(A, B) by predicting the conditional probability. On the other hand, in the discriminative methodology a function is learned for the conditional probability.
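These identities can be checked numerically. The sketch below uses a small, made-up joint distribution over two binary events and verifies that normalizing the joint probability and applying Bayes' theorem give the same conditional probability.

```python
# Numeric check of Bayes' theorem and the joint/conditional relation,
# on an invented joint distribution P(A, B) over two binary events.

joint = {
    (True, True): 0.2, (True, False): 0.3,
    (False, True): 0.1, (False, False): 0.4,
}
p_a = joint[(True, True)] + joint[(True, False)]   # P(A) = 0.5
p_b = joint[(True, True)] + joint[(False, True)]   # P(B) = 0.3

# Conditional probability as the normalized joint probability:
p_a_given_b = joint[(True, True)] / p_b
# The same quantity via Bayes' theorem, P(A|B) = P(B|A) x P(A) / P(B):
p_b_given_a = joint[(True, True)] / p_a
via_bayes = p_b_given_a * p_a / p_b

print(round(p_a_given_b, 4), round(via_bayes, 4))  # 0.6667 0.6667
```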

Supervised learning uses labeled training datasets to create models. There are various methods for labeling datasets, known as establishing ground truth (cf., Section 2.4). This learning technique is employed to “learn” to identify patterns or behaviors in the “known” training datasets. Typically, this approach is used to solve classification and regression problems, which pertain to predicting discrete or continuous valued outcomes, respectively. On the other hand, it is possible to employ semi-supervised ML techniques in the face of partial knowledge, that is, when labels for the training data are incomplete or missing. Unsupervised learning uses unlabeled training datasets to create models that can discriminate between patterns in the data. This approach is most suited for clustering problems. For instance, outlier detection and density estimation problems in networking can pertain to grouping different instances of attacks based on their similarities.

Reinforcement learning (RL) is an agent-based iterative process for modeling decision-making problems. Generally, learning is based on exemplars from training datasets. In RL, however, an agent interacts with the external world and, instead of being taught by exemplars, learns by exploring the environment and exploiting the acquired knowledge. Its actions are rewarded or penalized, so the training data in RL constitutes a set of state-action pairs and rewards (or penalties). The agent uses feedback from the environment to learn the best sequence of actions, or “policy”, that optimizes a cumulative reward. For example, it can extract rules from the data that are statistically supported rather than merely predicted. Unlike generative and discriminative approaches, which are myopic in nature, RL may sacrifice immediate gains for long-term rewards. Hence, RL is best suited for making cognitive choices, such as decision making, planning and scheduling [ 441 ].
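As a minimal sketch of these ideas, the following tabular Q-learning agent (the toy chain environment, rewards, and hyperparameters are our own illustration, not from the text) learns from state-action-reward feedback to forgo a small immediate reward in favor of a larger long-term one:

```python
import random

random.seed(0)
# Toy 5-state chain: moving right costs -1 per step but reaching state 4
# pays +10; moving left in state 0 pays +0.3 immediately. A myopic agent
# grabs the +0.3; Q-learning learns the long-term policy of heading right.
N_STATES, ACTIONS = 5, ("left", "right")
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    """Deterministic transition: returns (next_state, reward)."""
    if a == "left":
        return (0, 0.3) if s == 0 else (s - 1, -1.0)
    if s + 1 == N_STATES - 1:
        return (s + 1, 10.0)  # terminal reward
    return (s + 1, -1.0)

alpha, gamma, eps = 0.5, 0.9, 0.2
for _ in range(500):  # training episodes
    s = 0
    while s != N_STATES - 1:
        # Epsilon-greedy exploration of the environment.
        a = random.choice(ACTIONS) if random.random() < eps else \
            max(ACTIONS, key=lambda x: Q[(s, x)])
        s2, r = step(s, a)
        best_next = 0.0 if s2 == N_STATES - 1 else max(Q[(s2, x)] for x in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

policy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)  # the learned policy heads right despite per-step costs
```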

It is important to note that there is a strong relationship between the training data, the problem, and the learning paradigm. For instance, a lack of knowledge about the training data (e.g. missing labels) may rule out supervised learning, requiring another learning paradigm for model construction.

2.2 Data collection

ML techniques require representative data, ideally free of bias, to build an effective ML model for a given networking problem. Data collection is an important step, since representative datasets vary not only from one problem to another but also from one time period to the next. In general, data collection can be achieved in two phases—offline and online [ 460 ]. Offline data collection allows gathering a large amount of historical data that can be used for model training and testing, whereas real-time network data collected in the online phase can be used as feedback to the model, or as input for re-training the model. Offline data can also be obtained from various repositories, provided it is relevant to the networking problem being studied. Examples of these repositories include Waikato Internet Traffic Storage (WITS) [ 457 ], UCI Knowledge Discovery in Databases (KDD) Archive [ 450 ], Measurement and Analysis on the WIDE Internet (MAWI) Working Group Traffic Archive [ 474 ], and Information Marketplace for Policy and Analysis of Cyber-risk & Trust (IMPACT) Archive [ 202 ].

An effective way to collect both offline and online data is by using monitoring and measurement tools. These tools provide greater control over various aspects of data collection, such as data sampling rate, monitoring duration and location (e.g. network core vs. network edge). They often use network monitoring protocols, such as Simple Network Management Protocol (SNMP) [ 208 ], Cisco NetFlow [ 100 ], and IP Flow Information Export (IPFIX) [ 209 ]. Monitoring can be active or passive [ 152 ]. Active monitoring injects measurement traffic, such as probe packets, into the network and collects relevant data from this traffic. In contrast, passive monitoring collects data by observing the actual network traffic. Evidently, active monitoring introduces additional overhead due to bandwidth consumption from the injected traffic, whereas passive monitoring eliminates this overhead at the expense of additional devices that analyze the network traffic to gather relevant information.

Once data is collected, it is decomposed into training, validation (also called development set or the “dev set”), and test datasets. The training set is leveraged to find the ideal parameters (e.g. weights of connections between neurons in a Neural Network (NN)) of an ML model, whereas the validation set is used to choose the suitable architecture (e.g. the number of hidden layers in a NN) of an ML model, or to choose a model from a pool of ML models. Note, if an ML model and its architecture are pre-selected, there is no need for a validation set. Finally, the test set is used to assess the unbiased performance of the selected model. Validation and testing can be performed using one of two methods—holdout or k-fold cross-validation. In the holdout method, part of the available dataset is set aside and used as a validation (or testing) set. In k-fold cross-validation, the available dataset is randomly divided into k equal subsets; the validation (or testing) process is repeated k times, with k−1 unique subsets for training and the remaining subset for validating (or testing) the model, and the outcomes are averaged over the rounds.
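The k-fold procedure can be sketched as follows; the synthetic dataset and the toy least-squares model are placeholders for any data and ML model:

```python
import random

random.seed(1)
# Synthetic dataset: y = 2x + Gaussian noise (illustrative).
data = [(x, 2 * x + random.gauss(0, 0.1)) for x in [i / 10 for i in range(50)]]
random.shuffle(data)

def train(points):
    """Fit y = w*x by least squares through the origin (toy model)."""
    return sum(x * y for x, y in points) / sum(x * x for x, y in points)

def mse(w, points):
    """Mean squared error of the fitted model on a set of points."""
    return sum((y - w * x) ** 2 for x, y in points) / len(points)

k = 5
fold = len(data) // k
scores = []
for i in range(k):
    # The i-th subset validates; the remaining k-1 subsets train.
    val = data[i * fold:(i + 1) * fold]
    trn = data[:i * fold] + data[(i + 1) * fold:]
    scores.append(mse(train(trn), val))

print(sum(scores) / k)  # validation error averaged over the k rounds
```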

A common decomposition of the dataset can conform to 60/20/20% among training, validation, and test datasets, or 70/30% in case validation is not required. These rule-of-thumb decompositions are reasonable for datasets that are not very large. However, in the era of big data, where a dataset can have millions of entries, more extreme decompositions, such as 98/1/1% or 99.5/0.4/0.1%, are also valid. It is important to avoid skewness in the training datasets with respect to the classes of interest, since skewness inhibits the learning and generalization of the outcome, leading to model over- and/or under-fitting. In addition, both validation and testing datasets should be independent of the training dataset and follow the same probability distribution as the training dataset.

Temporal and spatial robustness of an ML model can be evaluated by exposing the model to training and validation datasets that are temporally and spatially distant. For instance, a model that performs well when evaluated with datasets collected a year after it was trained, or collected from a different network, exhibits temporal and spatial stability, respectively.

2.3 Feature engineering

The collected raw data may be noisy or incomplete. Before being used for learning, it must go through a pre-processing phase that cleans the data. Another important step prior to learning, or training a model, is feature extraction. These features act as discriminators for learning and inference. In networking, there are many features to choose from. Broadly, they can be categorized based on their level of granularity.

At the finest level of granularity, packet-level features are extracted or derived from collected packets, e.g. statistics of packet size, including mean, root mean square (RMS) and variance, and time series information, such as the Hurst exponent. The key advantage of packet-level statistics is their insensitivity to packet sampling, which is often employed for data collection and interferes with feature characteristics [ 390 ]. Flow-level features, on the other hand, are derived using simple statistics, such as mean flow duration, mean number of packets per flow, and mean number of bytes per flow [ 390 ]. Connection-level features from the transport layer are exploited to infer connection-oriented details. In addition to the flow-level features, transport layer details, such as throughput and advertised window size in TCP connection headers, can be employed. Though these features generate high quality data, they incur computational overhead and are highly susceptible to sampling and routing asymmetries [ 390 ].
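For illustration, several of the packet- and flow-level statistics above can be computed directly from a flow's (timestamp, size) pairs; the trace below is synthetic:

```python
import math

# Synthetic packet trace for one flow: (timestamp_seconds, size_bytes).
trace = [(0.00, 1500), (0.02, 1500), (0.05, 400), (0.09, 1500), (0.10, 40)]

sizes = [s for _, s in trace]
mean_size = sum(sizes) / len(sizes)
# Inter-arrival times between consecutive packets.
iats = [t2 - t1 for (t1, _), (t2, _) in zip(trace, trace[1:])]

features = {
    "mean_pkt_size": mean_size,
    "rms_pkt_size": math.sqrt(sum(s * s for s in sizes) / len(sizes)),
    "var_pkt_size": sum((s - mean_size) ** 2 for s in sizes) / len(sizes),
    "mean_iat": sum(iats) / len(iats),
    "pkts_per_flow": len(trace),
    "bytes_per_flow": sum(sizes),
}
print(features["mean_pkt_size"], features["bytes_per_flow"])  # 988.0 4940
```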

Feature engineering is a critical aspect of ML that includes feature selection and extraction. It is used to reduce dimensionality in voluminous data and to identify discriminating features that reduce computational overhead and increase the accuracy of ML models. Feature selection is the removal of features that are irrelevant or redundant [ 321 ]. Irrelevant features increase computational overhead with marginal to no gain in accuracy, while redundant features promote over-fitting. Feature extraction is an often computationally intensive process of deriving extended or new features from existing features, using techniques such as entropy, Fourier transform and principal component analysis (PCA).
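As a sketch of feature extraction with PCA (synthetic correlated 2-D data; a pure-Python power iteration stands in for a library eigensolver), two features are compressed into one while retaining most of the variance:

```python
import math
import random

random.seed(2)
# Synthetic 2-D samples that are strongly correlated: most variance
# lies along one direction, so PCA can compress 2 features into 1.
data = [(x, 0.5 * x + random.gauss(0, 0.05))
        for x in [random.gauss(0, 1) for _ in range(200)]]

mx = sum(x for x, _ in data) / len(data)
my = sum(y for _, y in data) / len(data)
centered = [(x - mx, y - my) for x, y in data]

# Entries of the 2x2 sample covariance matrix.
cxx = sum(x * x for x, _ in centered) / len(data)
cyy = sum(y * y for _, y in centered) / len(data)
cxy = sum(x * y for x, y in centered) / len(data)

# Power iteration for the leading eigenvector (principal component).
vx, vy = 1.0, 0.0
for _ in range(100):
    vx, vy = cxx * vx + cxy * vy, cxy * vx + cyy * vy
    n = math.hypot(vx, vy)
    vx, vy = vx / n, vy / n

projected = [x * vx + y * vy for x, y in centered]  # 1-D representation
var_kept = sum(p * p for p in projected) / len(data)
print(var_kept / (cxx + cyy))  # fraction of total variance retained
```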

Feature selection and extraction can be performed using tools, such as NetMate [ 21 ] and WEKA [ 288 ]. In this case, however, the extraction and selection techniques are limited by the capability of the tool employed. Therefore, specialized filter, wrapper-based, and embedded methods are often employed for feature selection. Filter methods prune the feature set after analyzing the dataset to identify irrelevant and redundant features. In contrast, wrapper-based techniques take an iterative approach, evaluating a different subset of features in every iteration to identify the optimal subset. Embedded methods combine the benefits of filter and wrapper-based methods, performing feature selection during model creation. Examples of feature selection techniques include colored traffic activity graphs (TAG) [ 221 ], breadth-first search (BFS) [ 496 ], L1 regularization [ 259 ], backward greedy feature selection (BGFS) [ 137 ], consistency-based (CON) and correlation-based feature selection (CFS) [ 321 , 476 ]. It is crucial to carefully select a set of features that strikes a balance between exploiting correlation and reducing/eliminating over-fitting, for higher accuracy and lower computational overhead.

Furthermore, it is important to consider the characteristics of the task at hand while performing feature engineering. To illustrate this, consider the following scenario from network traffic classification. One variant of the problem entails the identification of a streaming application (e.g. Netflix) from network traces. Intuitively, average packet size and packet inter-arrival times are representative features, as they play a dominant role in traffic classification. Average packet size is fairly constant in nature [ 492 ], and packet inter-arrival times are a good discriminator between bulk data transfer (e.g. FTP) and streaming applications [ 390 ]. However, average packet size can be skewed by intermediate fragmentation and encryption, and packet inter-arrival times and their distributions are affected by queuing in routers [ 492 ]. Furthermore, streaming applications often behave similarly to bulk data transfer applications [ 390 ]. Therefore, it is imperative to consider the classes of interest, i.e. the applications, before selecting the features for this network traffic classification problem.

Finally, it is also essential to select features that do not contradict underlying assumptions in the context of the problem. For example, in traffic classification, features extracted from multi-modal application classes (e.g. WWW) tend to show non-Gaussian behavior [ 321 ]. Such features not only become irrelevant and redundant, they also contradict widely held assumptions in traffic classification, such as feature distributions being independent and following a Gaussian distribution. Therefore, careful feature extraction and selection is crucial for the performance of ML models [ 77 ].

2.4 Establishing ground truth

Establishing the ground truth pertains to giving a formal description (i.e. labels) to the classes of interest. There are various methods for labeling datasets using the features of a class. Primarily, it requires hand-labeling by domain experts, with aid from deep packet inspection (DPI) [ 462 , 496 ], pattern matching (e.g. application signatures) or unsupervised ML techniques (e.g. AutoClass using EM) [ 136 ].

For instance, in traffic classification, establishing ground truth for application classes in the training dataset can be achieved using application signature pattern matching [ 140 ]. Application signatures are built using features, such as average packet size, flow duration, bytes per flow, packets per flow, root mean square packet size and IP traffic packet payload [ 176 , 390 ]. Average packet size and flow duration have been shown to be good discriminators [ 390 ]. Application signatures for encrypted traffic (e.g. SSH, HTTPS) extract the signature from unencrypted handshakes. However, these application signatures must be kept up-to-date and adapted to the application dynamics [ 176 ].

Alternatively, it is possible to design and rely on statistical and structural content models for describing the datasets and infer the classes of interest. For instance, these models can be used to classify a protocol based on the label of a single instance of that protocol and correlations can be derived from unlabeled training data [ 286 ]. On the other hand, common substring graphs capture structural information about the training data [ 286 ]. These models are good at inferring discriminators for binary, textual and structural content [ 286 ].

Invariably, the ground truth drives the accuracy of ML models. There is also an inherent mutual dependency between the sizes of the training data of different classes of interest, which impacts model performance [ 417 ]. An imbalance in the number of training samples across classes violates the assumption, maintained by many ML techniques, that the data is independent and identically distributed. Therefore, there is typically a need to combat class imbalance by applying under-, over-, joint-, or ensemble-sampling techniques [ 267 ]. For example, uniform weighted threshold under-sampling creates smaller balanced training sets [ 222 ].
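A minimal sketch of random under-sampling (the class labels and counts are illustrative; uniform weighted threshold under-sampling itself is more elaborate):

```python
import random
from collections import Counter

random.seed(3)
# Imbalanced labeled dataset: 95 "benign" vs 5 "attack" samples (illustrative).
dataset = [("benign", i) for i in range(95)] + [("attack", i) for i in range(5)]

def undersample(samples):
    """Randomly under-sample every class down to the rarest class size."""
    by_class = {}
    for label, x in samples:
        by_class.setdefault(label, []).append((label, x))
    m = min(len(v) for v in by_class.values())
    balanced = []
    for v in by_class.values():
        balanced += random.sample(v, m)  # keep m random samples per class
    return balanced

balanced = undersample(dataset)
print(Counter(label for label, _ in balanced))  # both classes now equal-sized
```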

2.5 Performance metrics and model validation

Once an ML model has been built and the ground truth has been ascertained, it is crucial to gauge the performance of the ML model that will describe, predict, or evaluate outcomes. However, it is important to realize that there is no single “best” learning algorithm, and it is not fair to compare error rates across a whole variety of applications [ 16 ]. Performance metrics can be used to measure different aspects of the model, such as reliability, robustness, accuracy, and complexity. In this section, we discuss the validation of ML models with respect to accuracy (cf., Table 2), which is a critical aspect in the applicability of the model to networking problems. Moreover, accuracy is often used as feedback for incremental learning [ 389 ], to increase model robustness and resilience in a dynamic environment.

Let us consider the accuracy validation of ML models for prediction. This validation usually involves an error analysis that computes the difference between actual and predicted values. Recall, a prediction is an outcome of ML models for classification and regression problems. In classification, the common metrics for error analysis are based on the logistic function, such as binary and categorical cross-entropy, for binary and multi-class classification, respectively. In regression, the conventional error metrics are Mean Absolute Error (MAE) and Mean Squared Error (MSE). Both regression error metrics disregard the direction of under- and over-estimation in the predictions. MAE is simpler and easier to interpret than MSE, though MSE is more useful for heavily penalizing large errors.
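Both metrics can be computed directly; note how the signs of the errors are discarded and how MSE weighs the single large error more heavily (the values are illustrative):

```python
# Actual vs predicted values from some regression model (illustrative).
actual = [3.0, 5.0, 2.5, 7.0]
predicted = [2.5, 5.0, 4.0, 8.0]

errors = [a - p for a, p in zip(actual, predicted)]
mae = sum(abs(e) for e in errors) / len(errors)  # direction is ignored
mse = sum(e * e for e in errors) / len(errors)   # large errors penalized more
print(mae, mse)  # prints 0.75 0.875
```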

The above error metrics are commonly used to compute the cost function of ML-based classification and regression models. Computing the cost function on the training and validation datasets (cf., Section 2.2) allows diagnosing performance problems due to high bias or high variance. High bias refers to a simple ML model that poorly maps the relations between features and outcomes (under-fitting). High variance implies an ML model that fits the training data but does not generalize well to predict new data (over-fitting). Depending on the diagnosed problem, the ML model can be improved by going back to one of the following design constituents (cf., Fig. 3): (i) data collection, for getting more training data (only for high variance), (ii) feature engineering, for increasing or reducing the set of features, and (iii) model learning, for building a simpler or more complex model, or for adjusting a regularization parameter.

After tuning the ML model on the training and validation datasets, the accuracy metrics for the test dataset are reported as the performance validation of the model. Regression models often use MAE or MSE (i.e. error metrics) to report the performance results. Other error metrics commonly used in the literature to gauge the accuracy of regression models include Mean Absolute Percentage Error (MAPE), Root MSE (RMSE), and Normalized RMSE (NRMSE). MAPE states the prediction error as a percentage, while RMSE expresses the standard deviation of the error. NRMSE, unlike the other error metrics described, allows comparing between models working on different scales.

In classification, the conventional metric to report the performance of an ML model is the accuracy. The accuracy metric is defined as the proportion of true predictions $T_{C_i}$ for each class $C_i$, $\forall i = 1 \dots N$, among the total number of predictions, as follows:

$$\mathrm{Accuracy} = \frac{\sum_{i=1}^{N} T_{C_i}}{\text{total number of predictions}}$$

For example, let us consider a classification model that predicts whether an email should go to the spam, inbox, or promotion folder (i.e. multi-class classification). In this case, the accuracy is the sum of emails correctly predicted as spam, inbox, and promotion, divided by the total number of predicted emails. However, the accuracy metric is not reliable when the data is skewed with respect to the classes. For example, if the actual number of spam and promotion emails is very small compared to inbox emails, a simple classification model that predicts every email as inbox will still achieve a high accuracy.
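The pitfall can be demonstrated in a few lines (the class counts are illustrative):

```python
# Skewed email dataset: 990 inbox, 5 spam, 5 promotion (illustrative).
labels = ["inbox"] * 990 + ["spam"] * 5 + ["promotion"] * 5

# A trivial model that predicts every email as "inbox".
predictions = ["inbox"] * len(labels)

correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = correct / len(labels)
print(accuracy)  # prints 0.99: high accuracy despite never detecting spam
```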

To tackle this limitation, it is recommended to use the metrics derived from a confusion matrix, as illustrated in Fig. 4. Consider that each row in the confusion matrix represents a predicted outcome and each column represents an actual instance. In this manner, True Positive (TP) counts the correct predictions for actual positive instances. Similarly, True Negative (TN) counts the cases where the classification model correctly predicts an actual negative instance. Whereas, False Positive (FP) and False Negative (FN) describe incorrect predictions for negative and positive actual instances, respectively. Note that TP and TN correspond to the true predictions for the positive and negative classes, respectively. Therefore, the accuracy metric can also be defined in terms of the confusion matrix:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

Figure 4: Confusion matrix for binary classification

The confusion matrix in Fig. 4 is defined for a binary classification model. Nevertheless, it can be used in multi-class classification by building the confusion matrix for a specific class. This is achieved by transforming the multi-class classification problem into multiple binary classification subproblems, using the one-vs-rest strategy. For example, in the email multi-class classification, the confusion matrix for the spam class sets the positive class as spam and the negative class as the rest of the email classes (i.e. inbox and promotion), obtaining a binary classification model for spam and not-spam email.

Consequentially, the True Positive Rate (TPR), describing the proportion of actual positive instances that are correctly predicted, is inferred from the confusion matrix as:

$$TPR = \frac{TP}{TP + FN}$$

The converse, the False Positive Rate (FPR), is the proportion of actual negative instances that are incorrectly predicted as positive, and is defined as:

$$FPR = \frac{FP}{FP + TN}$$

Similarly, True Negative Rate (TNR) and False Negative Rate (FNR) are used to deduce the number of correct and incorrect negative predictions, respectively. The terms recall, sensitivity, and detection rate (DR) are often used to refer to TPR. A comparison of the TPR versus the FPR is depicted in a Receiver Operating Characteristic (ROC) graph. In a ROC graph, where TPR is on the y-axis and FPR is on the x-axis, a good classification model will yield a ROC curve with a steep positive gradient, implying a large gain in TP for a small change in FP. As the gradient gets closer to 1, the prediction of the ML model deteriorates, as the number of correct and incorrect predictions becomes approximately the same. It should be noted that a classification model whose ROC curve falls below the diagonal (i.e. performs worse than random guessing) can be easily improved by flipping the predictions of the model [ 16 ] or swapping the labels of the actual instances.

In this way, multiple classification models for the same problem can be compared, giving insight into the different conditions under which one model outperforms another. Though the ROC curve aids visual analysis, the area under the ROC curve (AUC), ideally 1, measures the probability that the model ranks an actual positive instance above an actual negative one. Furthermore, the precision of the ML model is formally defined as the fraction of positive predictions that correspond to actual positive instances:

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

The trade-off between recall and precision allows tuning the parameters of a classification model to achieve the desired results. For example, in the binary spam classifier, a higher recall would avoid missing too many spam emails (lower FN), but would incorrectly predict more emails as spam (higher FP). A higher precision would reduce the number of emails incorrectly predicted as spam (lower FP), but would miss more spam emails (higher FN). The F-measure allows analyzing the trade-off between recall and precision by providing the harmonic mean, ideally 1, of these metrics:

$$F = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
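Putting the confusion-matrix metrics together for the binary spam example (the outcome counts below are illustrative):

```python
# Binary spam classifier outcomes on a labeled test set (illustrative).
# Each pair is (actual, predicted); "spam" is the positive class.
pairs = ([("spam", "spam")] * 40 + [("spam", "inbox")] * 10 +
         [("inbox", "spam")] * 5 + [("inbox", "inbox")] * 45)

tp = sum(1 for a, p in pairs if a == "spam" and p == "spam")
fn = sum(1 for a, p in pairs if a == "spam" and p == "inbox")
fp = sum(1 for a, p in pairs if a == "inbox" and p == "spam")
tn = sum(1 for a, p in pairs if a == "inbox" and p == "inbox")

recall = tp / (tp + fn)     # TPR: fraction of spam actually caught
precision = tp / (tp + fp)  # fraction of predicted spam that is spam
f_measure = 2 * precision * recall / (precision + recall)
print(recall, precision, f_measure)
```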

The Coefficient of Variation (CV) is another accuracy metric, particularly used for reporting the performance of unsupervised models that conduct classification using clusters (or states). CV is a standardized measure of dispersion that represents the intra-cluster (or intra-state) similarity. A lower CV implies a small variability of each outcome in relation to the mean of the corresponding cluster. This represents a higher intra-cluster similarity and a higher classification accuracy.
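For instance, comparing the CV of a tight and a dispersed cluster (the values are illustrative):

```python
import statistics

# Outcomes grouped into two clusters by an unsupervised model (illustrative).
clusters = {
    "c1": [10.0, 10.5, 9.8, 10.2],  # tight: low CV, high intra-cluster similarity
    "c2": [5.0, 9.0, 1.5, 12.0],    # dispersed: high CV
}

for name, values in clusters.items():
    # CV: population standard deviation normalized by the cluster mean.
    cv = statistics.pstdev(values) / statistics.mean(values)
    print(name, round(cv, 3))
```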

2.6 Evolution of machine learning techniques

Machine learning is a branch of artificial intelligence whose foundational concepts were acquired over the years from contributions in the areas of computer science, mathematics, philosophy, economics, neuroscience, psychology, control theory, and more [ 397 ]. Research efforts during the last 75 years have given rise to a plethora of ML techniques [ 15 , 169 , 397 , 435 ]. In this section, we provide a brief history of ML focusing on the techniques that have been particularly applied in the area of computer networks (cf., Fig.  5 ).

Figure 5: The evolution of machine learning techniques with key milestones

The beginning of ML dates back to 1943, when the first mathematical model of NNs for computers was proposed by McCulloch and Pitts [ 302 ]. This model introduced a basic unit, called the artificial neuron, that has been at the center of NN development to this day. However, this early model required the correct weights of the connections between neurons to be established manually. This limitation was addressed in 1949 by Hebbian learning [ 184 ], a simple rule-based algorithm for updating the connection weights of the early NN model. Like the neuron unit, Hebbian learning greatly influenced the progress of NNs. These two concepts led to the construction of the first NN computer in 1950, called SNARC (Stochastic Neural Analog Reinforcement Computer) [ 397 ]. In the same year, Alan Turing proposed a test –where a computer tries to fool a human into believing it is also human– to determine if a computer is capable of showing intelligent behavior. He described the challenges underlying his idea of a “learning machine” in [ 449 ]. These developments encouraged many researchers to work on similar approaches, resulting in two decades of enthusiastic and prolific research in the ML area.

In the 1950s, the simplest linear regression model, called Ordinary Least Squares (OLS) –derived from the least squares method [ 266 , 423 ] developed around the 1800s– was used to calculate linear regressions on electro-mechanical desk calculators [ 168 ]. To the best of our knowledge, this is the first evidence of using OLS in computing machines. Following this trend, two linear models for conducting classification were introduced: Maximum Entropy (MaxEnt) [ 215 , 216 ] and logistic regression [ 105 ]. A different research trend centered on pattern recognition produced two non-parametric models (i.e. not restricted to a bounded set of parameters) capable of performing regression and classification: k-Nearest Neighbors (k-NN) [ 147 , 420 ] and Kernel Density Estimation (KDE) [ 388 ], also known as the Parzen density [ 349 ]. The former uses a distance metric to analyze the data, while the latter applies a kernel function (usually Gaussian) to estimate the probability density function of the data.

The 1950s also witnessed the first applications of the Naïve Bayes (NB) classifier in the fields of pattern recognition [ 97 ] and information retrieval [ 297 ]. NB, whose foundations date back to the 18th and 19th centuries [ 43 , 261 ], is a simple probabilistic classifier that applies Bayes’ theorem to features under strong independence assumptions. NB was later generalized using KDE, also known as NB with Kernel Estimation (NBKE), to estimate the conditional probabilities of the features. In the area of clustering, Steinhaus [ 422 ] was the first to propose a continuous version of what would later be called the k-Means algorithm [ 290 ], to partition a heterogeneous solid with a given internal mass distribution into k subsets. The proposed centroid model employs a distance metric to partition the data into clusters where the distance to the centroid is minimized.

In addition, the Markov model [ 159 , 296 ] (elaborated 50 years earlier) was leveraged to construct a process based on discrete-time state transitions and action rewards, named the Markov Decision Process (MDP), which formalizes sequential decision-making problems in a fully observable, controlled environment [ 46 ]. MDP has been essential for the development of prevailing RL techniques [ 435 ]. Research efforts building on the initial NN model flourished too: the modern concept of the perceptron was introduced as the first NN model that could learn the weights from input examples [ 387 ]. This model describes two NN classes according to the number of layers: Single-Layer Perceptron (SLP), an NN with one input layer and one output layer, and Multi-Layer Perceptron (MLP), an NN with one or more hidden layers between the input and the output layers. The perceptron model is also known as a Feedforward NN (FNN), since the nodes of each layer exhibit directed connections only to the nodes of the next layer. In the remainder of the paper, MLP-NNs, and NNs in general, will be denoted by the tuple (input_nodes, hidden_layer_nodes+, output_nodes); for instance, a (106,60,40,1) MLP-NN has a 106-node input layer, two hidden layers of 60 and 40 nodes respectively, and a single-node output layer.

By the end of the 1950s, the term “Machine Learning” was coined and defined for the first time by Arthur Samuel (cf., Section 2), who also developed a checkers-playing game that is recognized as the earliest self-learning program [ 401 ]. ML research continued to flourish in the 1960s, giving rise to a novel statistical class of the Markov model, named the Hidden Markov Model (HMM) [ 426 ]. An HMM describes the conditional probabilities between hidden states and visible outputs in a partially observable, autonomous environment. The Baum-Welch algorithm [ 41 ] was proposed in the mid-1960s to learn those conditional probabilities. At the same time, MDP continued to instigate various research efforts. The partially observable Markov decision process (POMDP) approach to finding optimal or near-optimal control strategies for partially observable stochastic environments, given a complete model of the environment, was first proposed in 1965 [ 25 ], while the algorithm to find the optimal solution was only devised 5 years later [ 416 ]. Another development in MDP was the learning automata –officially published in 1973 [ 448 ]–, a Reinforcement Learning (RL) technique that continuously updates the probabilities of taking actions in an observed environment, according to given rewards. Depending on the nature of the action set, learning automata are classified as Finite Action-set Learning Automata (FALA) or Continuous Action-set Learning Automata (CALA) [ 330 ].

In 1963, Morgan and Sonquist published Automatic Interaction Detection (AID) [ 323 ], the first regression tree algorithm, which seeks a sequential partitioning of an observation set into a series of mutually exclusive subsets whose means reduce the error in predicting the dependent variable. AID marked the beginning of the first generation of Decision Trees (DT). However, the application of DTs to classification problems was only initiated a decade later by Morgan and Messenger’s Theta AID (THAID) [ 305 ] algorithm.

In the meantime, the first algorithm for training MLP-NNs with many layers –also known as Deep NNs (DNN) in today’s jargon– was published by Ivakhnenko and Lapa in 1965 [ 210 ]. This algorithm marked the commencement of the Deep Learning (DL) discipline, though the term only started to be used in the 1980s in the general context of ML, and in the year 2000 in the specific context of NNs [ 9 ]. By the end of the 1960s, Minsky and Papert’s Perceptrons book [ 315 ] exposed the limitations of perceptron-based NNs through mathematical analysis, marking a historical turn for AI and ML in particular, and significantly reducing the research interest in this area over the next several years [ 397 ].

Although ML research was progressing slower than projected in the 1970s [ 397 ], the decade was marked by milestones that greatly shaped the evolution of ML and contributed to its success in the following years. These include the Backpropagation (BP) algorithm, the Cerebellar Model Articulation Controller (CMAC) NN model [ 11 ], the Expectation Maximization (EM) algorithm [ 115 ], what would later be called Temporal Difference (TD) learning [ 478 ], and the Iterative Dichotomiser 3 (ID3) algorithm [ 373 ].

Werbos’s application of BP –originally a control theory algorithm from the 1960s [ 80 , 81 , 233 ]– to train NNs [ 472 ] resurrected research in the area. BP is to date the most popular NN training algorithm, and comes in different variants such as Gradient Descent (GD), Conjugate Gradient (CG), One Step Secant (SS), Levenberg-Marquardt (LM), and Resilient backpropagation (Rp). Though BP is widely used for training NNs, its efficiency depends on the choice of initial weights. In particular, BP has been shown to have a slow speed of convergence and to fall into local optima. Over the years, global optimization methods have been proposed to replace BP, including Genetic Algorithms (GA), Simulated Annealing (SA), and the Ant Colony (AC) algorithm [ 500 ]. In 1975, Albus proposed CMAC, a new type of NN, as an alternative to MLP [ 11 ]. Although CMAC was primarily designed as a function modeler for robotic controllers, it has been extensively used in RL and classification problems for its faster learning compared to MLP.

In 1977, in the area of statistical learning, Dempster et al. proposed EM, a generalization of previous iterative, unsupervised methods, such as the Baum-Welch algorithm, for learning the unknown parameters of statistical HMMs [ 115 ]. At the same time, Witten developed an RL approach for solving MDPs, inspired by animal behavior and learning theories [ 478 ], that was later referred to as Temporal Difference (TD) learning in Sutton's work [ 433 , 434 ]. In this approach, the learning process is driven by the changes, or differences, in predictions over successive time steps, such that the prediction at any given time step is updated to bring it closer to the prediction of the same quantity at the next time step.
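The TD idea of updating a prediction toward the next step's prediction can be sketched as the standard TD(0) value update; the state names, reward, and step size here are illustrative.

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """TD(0): move V[s] toward the one-step bootstrapped target r + gamma*V[s_next]."""
    V[s] += alpha * (r + gamma * V[s_next] - V[s])
    return V

V = {"a": 0.0, "b": 1.0}
# V["a"] moves a fraction alpha of the way toward r + gamma*V["b"] = 0.5 + 0.9 = 1.4
td0_update(V, "a", r=0.5, s_next="b")
```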

Towards the end of the 1970s, the second generation of DTs emerged with the release of the Iterative Dichotomiser 3 (ID3) algorithm. The algorithm, developed by Quinlan [ 373 ], relies on a novel concept for attribute selection based on information gain, i.e. entropy reduction. ID3 is a precursor to the popular and widely used C4.5 and C5.0 algorithms.
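ID3's attribute-selection criterion can be sketched as computing the information gain (entropy reduction) of each candidate split; the toy weather-style dataset below is illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label list, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on attribute index `attr`."""
    total = entropy(labels)
    by_value = {}
    for row, lab in zip(rows, labels):
        by_value.setdefault(row[attr], []).append(lab)
    remainder = sum(len(sub) / len(labels) * entropy(sub)
                    for sub in by_value.values())
    return total - remainder

# Attribute 0 perfectly separates the labels; attribute 1 is uninformative
rows = [("sunny", "hot"), ("sunny", "cold"), ("rainy", "hot"), ("rainy", "cold")]
labels = ["yes", "yes", "no", "no"]
```

ID3 greedily picks the attribute with the highest gain at each node, which here is attribute 0 (gain 1.0 bit) over attribute 1 (gain 0).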

The 1980s witnessed a renewed interest in ML research, and in particular in NNs. In the early 1980s, three new classes of NNs emerged, namely the Convolutional Neural Network (CNN) [ 157 ], the Self-Organizing Map (SOM) [ 249 ], and the Hopfield network [ 195 ]. CNN is a feedforward NN specifically designed for visual imagery analysis and classification, and thus requires minimal image preprocessing. Connectivity between neurons in a CNN is inspired by the organization of the animal visual cortex, modeled by Hubel in the 1960s [ 200 , 201 ], where the visual field is divided between neurons, each responding to stimuli only in its corresponding region. Similarly to CNN, SOM was also designed for a specific application domain: dimensionality reduction [ 249 ]. SOMs employ an unsupervised competitive learning approach, unlike traditional NNs that apply error-correction learning (such as BP with gradient descent).

In 1982, the first form of Recurrent Neural Network (RNN) was introduced by Hopfield. Named after its inventor, the Hopfield network is an RNN where the weights connecting the neurons are bidirectional. The modern definition of RNN, as a network where connections between neurons form one or more cycles, was introduced by Jordan in 1986 [ 226 ]. Cycles provide a structure for internal states, or memory, allowing RNNs to process arbitrary sequences of inputs. As such, RNNs have proven particularly useful in Time Series Forecasting (TSF), handwriting recognition, and speech recognition.

Several key concepts emerged from the 1980s' connectionism movement, one of which is the concept of distributed representation [ 187 ]. Introduced by Hinton in 1986, this concept supports the idea that a system should be represented by many features and that each feature may have different values. Distributed representation establishes a many-to-many relationship between neurons and (feature, value) pairs for improved efficiency, such that a (feature, value) input is represented by a pattern of activity across neurons, as opposed to being locally represented by a single neuron. The second half of the 1980s also witnessed the increase in popularity of the BP algorithm and its successful application in training DNNs [ 263 , 394 ], as well as the emergence of new classes of NNs, such as Restricted Boltzmann Machines (RBM) [ 413 ], Time-Lagged Feedforward Networks (TLFN) [ 260 ], and Radial Basis Function Neural Networks (RBFNN) [ 260 ].

Originally named Harmonium by Smolensky, RBM is a variant of the Boltzmann machine [ 2 ] with the restriction that there are no connections within any of the network layers, whether visible or hidden. Therefore, neurons in an RBM form a bipartite graph. This restriction allows for more efficient and simpler learning compared to traditional Boltzmann machines. RBMs are useful in a variety of application domains, such as dimensionality reduction, feature learning, and classification, as they can be trained in both supervised and unsupervised ways. The popularity of RBMs, and the extent of their applicability, increased significantly after the mid-2000s, when Hinton introduced in 2006 a faster learning method for Boltzmann machines called Contrastive Divergence [ 186 ], making RBMs even more attractive for deep learning [ 399 ]. Interestingly, although the use of the term “deep learning” in the ML community dates back to 1986 [ 111 ], it did not apply to NNs at that time.

Towards the end of the 1980s, TLFN (an MLP that incorporates the time dimension into the model for conducting TSF [ 260 ]) and RBFNN (an NN with a weighted set of Radial Basis Function (RBF) kernels, trainable in unsupervised and supervised ways [ 78 ]) joined the growing list of NN classes. Indeed, any of the above NNs can be employed in a DL architecture, either by implementing a larger number of hidden layers or by stacking multiple simple NNs.

In addition to NNs, several other ML techniques thrived during the 1980s. Among these, the Bayesian Network (BN) arose as a Directed Acyclic Graph (DAG) representation model for the statistical models in use [ 352 ], such as NB and HMM, the latter considered the simplest dynamic BN [ 107 , 110 ]. Two DT learning algorithms, similar to ID3 but developed independently, referred to as Classification And Regression Trees (CART) [ 76 ], were proposed to model classification and regression problems. Another DT algorithm, under the name of Reduced Error Pruning Tree (REPTree), was also introduced for classification. REPTree aims at building faster and simpler tree models using information gain for splitting, along with reduced-error pruning [ 374 ].

Towards the end of the 1980s, two TD approaches were proposed for reinforcement learning: TD(λ) [ 433 ] and Q-learning [ 471 ]. TD(λ) adds a discount factor (0 ≤ λ ≤ 1) that determines, in the policy evaluation process, to what extent estimates of previous state-values are eligible for updating based on current errors. For example, TD(0) only updates the estimate of the value of the state preceding the current state. Q-learning, however, replaces the traditional state-value function of TD by an action-value function (i.e. Q-value) that estimates the utility of taking a specific action in a specific state. As of today, Q-learning is the most well-studied and widely used model-free RL algorithm. By the end of the decade, the application domains of ML started expanding to the operation and management of communication networks [ 57 , 217 , 289 ].
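The Q-learning action-value update can be sketched as follows; the states, actions, reward, and step size are illustrative. Note the `max` over next-state actions, which makes the update off-policy.

```python
def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Off-policy update: bootstrap from the best action in the next state."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q

actions = ["left", "right"]
Q = {("s0", "left"): 0.0, ("s0", "right"): 0.0,
     ("s1", "left"): 1.0, ("s1", "right"): 2.0}
# Bootstraps from max(Q[("s1", .)]) = 2.0, regardless of which action is taken next
q_learning_update(Q, "s0", "left", r=0.0, s_next="s1", actions=actions)
```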

In the 1990s, significant advances were realized in ML research, focusing primarily on NNs and DTs. Bio-inspired optimization algorithms, such as Genetic Algorithms (GA) and Particle Swarm Optimization (PSO), received increasing attention and were used to train NNs for improved performance over the traditional BP-based learning [ 234 , 319 ]. Probably one of the most important achievements in NNs was the work on Long Short-Term Memory (LSTM), an RNN capable of learning long-term dependencies for solving DL tasks that involve long input sequences [ 192 ]. Today, LSTM is widely used in speech recognition as well as natural language processing. In DT research, Quinlan published the M5 algorithm in 1992 [ 375 ] to construct tree-based multivariate linear models analogous to piecewise linear functions. One well-known variant of the M5 algorithm is M5P, which aims at building trees for regression models. A year later, Quinlan published C4.5 [ 376 ], that builds on and extends ID3 to address most of its practical shortcomings, including data overfitting and training with missing values. C4.5 is to date one of the most important and widely used algorithms in ML and data mining.

Several techniques other than NNs and DTs also prospered in the 1990s. Research on regression analysis produced the Least Absolute Selection and Shrinkage Operator (LASSO), which performs variable selection and regularization for higher prediction accuracy [ 445 ]. Another well-known ML technique introduced in the 1990s was the Support Vector Machine (SVM). SVM enables plugging in different kernel functions (e.g. linear, polynomial, RBF) to find the optimal solution in higher-dimensional feature spaces. SVM-based classifiers find a hyperplane to discriminate between categories. A single-class SVM is a binary classifier that deduces the hyperplane differentiating the data belonging to one class from the rest of the data, that is, one-vs-rest. A multi-class approach in SVM can be formulated as a series of single-class classifiers, where the data is assigned to the class that maximizes an output function. SVM has been used primarily for classification, although a regression variant exists, known as Support Vector Regression (SVR) [ 70 ].
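The three kernel functions named above can be sketched in plain Python; the parameter values (`degree`, `c`, `gamma`) are illustrative defaults, not prescribed by any cited work.

```python
import math

def linear_kernel(x, y):
    """Inner product: the implicit feature space is the input space itself."""
    return sum(a * b for a, b in zip(x, y))

def polynomial_kernel(x, y, degree=2, c=1.0):
    """Implicitly maps inputs to all monomials up to the given degree."""
    return (linear_kernel(x, y) + c) ** degree

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian similarity; equals 1 when x == y and decays with distance."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)
```

An SVM's decision function is a weighted sum of such kernel evaluations between a test point and the support vectors, which is what lets it separate data that is not linearly separable in the input space.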

In the area of RL, SARSA (State-Action-Reward-State-Action) was introduced as a more realistic, though less practical, variation of Q-learning [ 395 ]. Unlike Q-learning, SARSA does not update the Q-value of an action based on the maximum action-value of the next state; instead, it uses the Q-value of the action actually chosen in the next state.
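The contrast with Q-learning is visible in the update target; a minimal sketch (state and action names illustrative):

```python
def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy update: bootstrap from the action actually chosen next."""
    Q[(s, a)] += alpha * (r + gamma * Q[(s_next, a_next)] - Q[(s, a)])
    return Q

Q = {("s0", "a"): 0.0, ("s1", "a"): 1.0, ("s1", "b"): 2.0}
# SARSA uses Q[("s1", "a")] = 1.0 (the chosen action), whereas Q-learning
# would bootstrap from max(Q[("s1", .)]) = 2.0
sarsa_update(Q, "s0", "a", r=0.0, s_next="s1", a_next="a")
```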

A new emerging concept called ensemble learning demonstrated that the predictive performance of a single learning model can be improved when combined with other models [ 397 ]. As a result, the poor performance of a single predictor or classifier can be compensated for by ensemble learning, at the price of (significantly) extra computation. Indeed, the results from ensemble learning must be aggregated, and a variety of techniques have been proposed for this purpose. The first instances of ensemble learning include the Weighted Majority Algorithm (WMA) [ 279 ], boosting [ 403 ], bootstrap aggregating (or bagging) [ 75 ], and Random Forests (RF) [ 191 ]. RF focused explicitly on tree models and marked the beginning of a new generation of ensemble DTs. In addition, variants of the original boosting algorithm were developed, such as Adaptive Boosting (AdaBoost) [ 153 ] and Stochastic Gradient Boosting (SGBoost) [ 155 ].
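Two of the building blocks just mentioned, bagging's bootstrap resampling and plurality-vote aggregation, can be sketched as below; the toy data is illustrative.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw |data| items with replacement: bagging's resampling step,
    used to train each ensemble member on a slightly different dataset."""
    return [rng.choice(data) for _ in data]

def majority_vote(predictions):
    """Aggregate the ensemble members' class outputs by plurality."""
    return Counter(predictions).most_common(1)[0][0]

# Three hypothetical classifiers disagree; the ensemble outputs the plurality class
vote = majority_vote(["elephant", "mice", "elephant"])
```

For regression, the aggregation step is typically an average of the members' outputs instead of a vote.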

These advances in ML facilitated the successful deployment of major use cases in the 1990s, particularly handwriting recognition [ 419 ] and data mining [ 3 ]. The latter represented a great shift to data-driven ML, and since then it has been applied in many areas (e.g., retail, finance, manufacturing, medicine, science) for processing huge amounts of data to build models of practical value [ 169 ]. Furthermore, from a conceptual perspective, Tom Mitchell formally defined ML: “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E” [ 317 ].

The 21st century began with a new wave of increasing interest in SVM and ensemble learning, and in particular ensemble DTs. Research efforts in the field generated some of the most widely used implementations of ensemble DTs as of today: Multiple Additive Regression Trees (MART) [ 154 ], extra-trees [ 164 ], and eXtreme Gradient Boosting (XGBoost) [ 93 ]. MART and XGBoost are, respectively, a commercial and an open source implementation of Friedman's Gradient Boosting Decision Tree (GBDT) algorithm, an ensemble DT algorithm based on gradient boosting [ 154 , 155 ]. Extra-trees stands for extremely randomized trees, an ensemble DT algorithm that builds random trees based on k randomly chosen features. However, instead of computing an optimal split-point for each of the k features at each node, as in RF, extra-trees selects a split-point randomly for reduced computational complexity.

At the same time, the popularity of DL increased significantly after the term “deep learning” was first introduced in the context of NNs in 2000 [ 9 ]. However, the attractiveness of DNNs started decreasing shortly after, due to the experienced difficulty of training DNNs using BP (e.g. the vanishing gradient problem), in addition to the increasing competitiveness of other ML techniques (e.g. SVM) [ 169 ]. Hinton's work on Deep Belief Networks (DBN), published in 2006 [ 188 ], breathed new life into research on DNNs. DBN introduced an efficient training strategy for deep learning models, which was further used successfully in different classes of DNNs [ 49 , 381 ]. The development in ML, particularly in DNNs, grew exponentially with advances in storage capacity and large-scale data processing (i.e. Big Data) [ 169 ]. This wave of popularity in deep learning has continued to this day, yielding major research advances over the years. One approach that is currently receiving tremendous attention is Deep RL, which incorporates deep learning models into RL for solving complex problems. For example, Deep Q-Networks (DQN), a combination of DNN and Q-learning, was proposed for mastering video games [ 318 ]. Although the term Deep RL was coined recently, the concept was already discussed and applied 25 years ago [ 275 , 440 ].

It is important to mention that the evolution in ML research has enabled improved learning capabilities which were found useful in several application domains, ranging from games, image and speech recognition, network operation and management, to self-driving cars [ 120 ].

3 Traffic prediction

Network traffic prediction plays a key role in network operations and management for today's increasingly complex and diverse networks. It entails forecasting future traffic, and has traditionally been addressed via time series forecasting (TSF). The objective in TSF is to construct a regression model capable of drawing an accurate correlation between future traffic volume and previously observed traffic volumes.
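Casting traffic prediction as regression amounts to turning the measured volume series into (past-window, next-value) training pairs; a minimal sketch with illustrative numbers:

```python
def sliding_windows(series, window):
    """Turn a traffic-volume series into (past-window, next-value)
    supervised training pairs for a TSF regression model."""
    return [(series[i:i + window], series[i + window])
            for i in range(len(series) - window)]

# Each pair maps the last `window` observed volumes to the next one
pairs = sliding_windows([10, 12, 11, 13, 14], window=3)
```

Any regressor (an MLP-NN, SVR, etc.) can then be fit on these pairs; the window length controls how much history the model conditions on.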

Existing TSF models for traffic prediction can be broadly decomposed into statistical analysis models and supervised ML models. Statistical analysis models are typically built upon the generalized autoregressive integrated moving average (ARIMA) model, while the majority of learning for traffic prediction is achieved via supervised NNs. Generally, the ARIMA model is a popular approach for TSF, where autoregressive (AR) and moving average (MA) models are applied in tandem to perform auto-regression on the differenced and “stationarized” data. However, with the rapid growth of networks and the increasing complexity of network traffic, traditional TSF models are seemingly compromised, giving rise to more advanced ML models. More recently, efforts have been made to reduce overhead and/or improve accuracy in traffic prediction by employing flow features other than traffic volume. In the following subsections, we discuss the various traffic prediction techniques that leverage ML and summarize them in Table  3 .
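The “stationarizing” step in ARIMA (the “I”, for integrated) is first-order differencing; a minimal sketch with illustrative data:

```python
def difference(series, lag=1):
    """First-order differencing: the 'I' step that removes trend
    so the AR and MA parts operate on a (more) stationary series."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]

def invert_difference(first_value, diffs):
    """Undo differencing to recover forecasts in the original scale."""
    out = [first_value]
    for d in diffs:
        out.append(out[-1] + d)
    return out
```

Forecasts produced on the differenced series must be passed through the inverse transform to obtain traffic volumes in the original units.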

3.1 Traffic prediction as a pure TSF problem

To the best of our knowledge, Yu et al. [ 489 ] were the first to apply ML to traffic prediction, using MLP-NN. Their primary motive was to improve accuracy over traditional AR methods. This was supported by rigorous independent mathematical proofs published in the late eighties and early nineties by Cybenko [ 106 ], Hornik [ 196 ], and Funahashi [ 158 ]. These proofs showed that SLP-NNs employing a sufficient number of neurons of continuous sigmoidal activation type (a constraint introduced by Cybenko and relaxed by Hornik) are universal approximators, capable of approximating any continuous function to any desired accuracy.

In the last decade, different types of NNs (SLP, MLP, RNN, etc.) and other supervised techniques have been employed for TSF of network traffic. Eswaradass et al. [ 141 ] propose an MLP-NN-based bandwidth prediction system for Grid environments and compare it to the Network Weather Service (NWS) [ 480 ] bandwidth forecasting AR models for traffic monitoring and measurement. The goal of the system is to forecast the available bandwidth on a given path by feeding the NN with the minimum, maximum, and average number of bits per second used on that path in the last epoch (ranging from 10 to 30 s). Experiments on the dotresearch.org network and the 40 gigabit/s NSF TeraGrid network datasets show that the NN outperforms the NWS bandwidth forecasting models, with error rates of up to 8% and 121.9% for the MLP-NN and NWS, respectively. Indeed, the proposed NN-based forecasting system shows better learning ability than NWS's. However, no details are provided on the characteristics of the MLP employed in the study, nor on the time complexity of the system compared to NWS.

Cortez et al. [ 104 ] choose to use an NN ensemble (NNE) of five MLP-NNs, with one hidden layer each. Resilient backpropagation (Rp) training is used on SNMP traffic data collected from two different ISP networks. The first dataset represents the traffic on a transatlantic link, while the second represents the aggregated traffic in the ISP backbone. Linear interpolation is used to complete missing SNMP data. The NNE is tested for real-time forecasting (online forecasting on a few-minute sample), short-term forecasting (a one-hour to several-hour sample), and mid-term forecasting (a one-day to several-day sample). The NNE is compared against AR models: traditional Holt-Winters, the double seasonal Holt-Winters variant that identifies repeating patterns at fixed time periods, and ARIMA. The comparison amongst the TSF methods shows that, in general, the NNE produces the lowest MAPE for both datasets. It also shows that, in terms of time and computational complexity, the NNE outperforms the other methods by an order of magnitude, and is well suited for real-time forecasting.

The applicability of NNs in traffic prediction instigated various other efforts [ 86 , 500 ] to compare and contrast various training algorithms for network traffic prediction. Chabaa et al. [ 86 ] evaluate the performance of various BP training algorithms to adjust the weights in the MLP-NN, when applied to Internet traffic time series. They show superior performance, with respect to RMSE and RPE, of the Levenberg-Marquardt (LM) and the Resilient backpropagation (Rp) algorithms over other BP algorithms.

In contrast to the local optimization in BP, Zhu et al. [ 500 ] propose a hybrid training algorithm based on global optimization, the PSO-ABC technique [ 98 ]. It is an artificial bee colony (ABC) algorithm employing particle swarm optimization (PSO), an evolutionary search algorithm. The training algorithm is implemented with a (5, 11, 1) MLP-NN. Experiments on a two-week hourly traffic measurement dataset show that PSO-ABC has higher prediction accuracy than BP, with MSEs of 0.006 and 0.011, respectively, on normalized data, and has stable prediction performance. Furthermore, the hybrid PSO-ABC has a faster training time than ABC or PSO alone.

SVM, on the other hand, has a low computational overhead and is generally more robust to parameter variations (e.g. time scale, number of samples). However, it is usually applied to classification rather than TSF. SVM and its regression variant, SVR, are scrutinized for their applicability to traffic prediction in [ 52 ]. Bermolen et al. [ 52 ] consider the prospect of applying SVR for link load forecasting. The SVR model is tested on heterogeneous Internet traffic collected at the PoP of an ISP network. At a 1 s timescale, the SVR model shows a slight improvement over an AR model in terms of RMSE. A more significant 10% improvement is achieved over an MA model. Most importantly, SVR is able to produce 9000 forecasts per second with 10 input samples, and shows the potential for real-time operation.

3.2 Traffic prediction as a non-TSF problem

In contrast to TSF methods, network traffic can be predicted by leveraging other methods and features. For instance, Li et al. [ 274 ] propose a frequency-domain-based method that uses network traffic flows, instead of just traffic volume. The focus is on predicting incoming and outgoing traffic volume on an inter-data center link dominated by elephant flows. Their models incorporate an FNN, trained with BP using simple gradient descent, and a wavelet transform to capture both the time and frequency features of the traffic time series. Elephant flows are added as separate feature dimensions in the prediction. However, collecting all elephant flows at high frequencies is more expensive than byte counts for traffic volume. Therefore, elephant flow information is collected at lower frequencies and interpolated to fill in the missing values, avoiding the overhead of elephant flow collection.

The dataset contains the total incoming and outgoing traffic collected at 30 s intervals using SNMP counters on the data center (DC) edge routers and the inter-DC link at Baidu, over a six-week period. The top five applications account for 80% of the total incoming and outgoing traffic; their data is collected every 5 min and interpolated to estimate missing values at the 30 s scale. The time series is decomposed using a level-10 wavelet transform, leading to 120 features per timestamp.

Thus, k-step-ahead predictions feed k × 120 features into the NN, and show a relative RMSE (RRMSE) ranging from 4 to 10% for the NN-wavelet transformation model as the prediction horizon varies from 30 s to 20 min. Evidently, the wavelet transformation reduces the average prediction errors for different prediction horizons by 5.4% and 2.9% for incoming and outgoing traffic, respectively. In contrast, the linear ARIMA model has prediction errors of approximately 8.5% and 6.9% for incoming and outgoing traffic, respectively. The combined NN and wavelet transform model reduces the peak inter-DC link utilization, i.e. the ISP's billed utilization, by about 9%. However, the model does not seem to fully consider the features related to elephant flows, which may explain the unexpectedly good performance of 0-interpolation, a simple method that fills zeros for all unknown points.

Chen et al. [ 94 ] investigate the possibility of reducing the cost of monitoring and collecting traffic volume by inferring future traffic volume from flow count only. They propose an HMM to describe the relationship between flow count, flow volume, and their temporal dynamic behavior. The Kernel Bayes Rule (KBR) and an RNN with LSTM units are used to predict future traffic volume based on the current flow count. A normalized dataset (zero mean, unit standard deviation) consists of network traffic volumes and corresponding flow counts collected every 5 min over a 24-week period [ 391 ]. The RNN shows a prediction MSE of 0.3 at best, 0.05 higher than KBR, and twice the prediction error of an RNN fed with traffic volume instead of flow count. Therefore, though the motive was to promote flow-count-based traffic prediction to overcome the cost of monitoring traffic volume, the performance is compromised.

Poupart et al. [ 365 ] explore the use of different ML techniques for flow size prediction and elephant flow detection. These techniques include Gaussian process regression (GPR), online Bayesian moment matching (oBMM), and a (106, 60, 40, 1) MLP-NN. Seven features are considered for each flow: source IP, destination IP, source port, destination port, protocol, server versus client (if the protocol is TCP), and the sizes of the first three data packets after the protocol/synchronization packets.

The datasets consist of three public datasets from two university networks [ 50 ] and an academic building at Dartmouth College [ 251 ], with over three million flows each. Elephant flow detection is based on a flow size threshold that varies from 10 KB to 1 MB. The experiments show noticeable discrepancies in the performance of the approaches across datasets. Although oBMM outperforms all other approaches in one dataset, with average TPR and TNR very close to 100%, it fails miserably in the other datasets, with an average TPR below 50% for one of them. In the latter dataset, oBMM seems to suffer the most from class imbalance. As the detection flow size threshold increases, fewer flows are tagged as elephant flows, creating class imbalance in the training dataset and leading to lower TPR. However, it is worth noting that oBMM outperforms all other approaches in terms of average TNR in all three datasets. On the other hand, NN and GPR have an average TPR consistently above 80%. Although NN outperforms GPR in robustness to class imbalance, judging by the consistency of its TPR with varying flow size threshold, it has the lowest average TNR, below 80% in all datasets.

The motive for flow size prediction in [ 365 ] is to discriminate elephant flows from mice flows in routing, to speed up elephant flow completion time. Presumably, mice flows are routed through equal-cost multi-path routing, while elephant flows are routed through the least congested path. The performance of the routing policy, combined with GPR and oBMM for elephant flow prediction, is tested with a varying subset of features. According to the authors, GPR improves the completion time by 6.6% on average for 99% of elephant flows when only the first packet header information is considered. A 14% improvement is observed when the sizes of the first three packets are used along with the header information. It is also noticed that considering the sizes of the first three packets alone leads to over 13.5% improvement, regardless of whether GPR or oBMM is used, with a very slight impact on the completion time of mice flows.

3.3 Summary

Supervised NNs (including MLP and RNN) have been successfully applied to traffic prediction, as shown in Table  3 . TSF approaches, such as [ 52 , 86 , 104 , 141 , 500 ], where NNs are used to infer traffic volumes from past measured volumes, show high long-term and short-term prediction accuracy at low complexity, with a limited number of features and a limited number of layers and neurons.

Unfortunately, TSF approaches are restrictive in general. In fact, they are only possible if past observations of the prediction variable are available. For instance, in order to predict the traffic of a particular flow f on link l, there must be a counter on link l actively measuring the traffic of that particular flow f, which can be challenging on very high speed links. Because it might not be possible to have the appropriate counter in place, or because it might be technically difficult to conduct measurements at the required speed or granularity, non-TSF approaches can be useful.

Non-TSF approaches were investigated in [ 94 , 274 , 365 ] to infer traffic volumes from flow counts and packet header fields. Although higher prediction error rates are experienced, these rates remain relatively low, not only for NNs but also for other supervised learning techniques, such as GPR and oBMM. According to [ 365 ], a more complex MLP-NN (in terms of number of layers and neurons) might be required to achieve better accuracy in a non-TSF setting, and the performance of supervised learning techniques varies when tested on different datasets. This motivates the need for ensemble learning.

It is envisaged that as the applicability of ML techniques for traffic prediction increases, traffic prediction will improve with respect to computational overhead and accuracy. Furthermore, learning will enable the automation of various network operation and management activities, such as network planning, resource provisioning, routing optimization, and SLA/QoS management.

4 Traffic classification

Traffic classification is quintessential for network operators to perform a wide range of network operation and management activities. These include capacity planning, security and intrusion detection, QoS and service differentiation, performance monitoring, and resource provisioning, to name a few. For example, an operator of an enterprise network may want to prioritize traffic for business-critical applications, identify unknown traffic for anomaly detection, or perform workload characterization for designing efficient resource management schemes that satisfy the performance and resource requirements of diverse applications.

Traffic classification requires the ability to accurately associate network traffic with pre-defined classes of interest. These classes of interest can be classes of applications (e.g. HTTP, FTP, WWW, DNS, and P2P), applications (e.g. Skype [ 310 ], YouTube [ 488 ], and Netflix [ 331 ]), or classes of service [ 390 ]. A class of service, for instance based on QoS, encompasses all applications or classes of applications that have the same QoS requirements. Therefore, it is possible that applications that apparently behave differently belong to the same class of service [ 462 ].

Generally, network traffic classification methodologies can be decomposed into four broad categories, leveraging port numbers, packet payload, host behavior, or flow features [ 31 , 244 ]. The classical approach to traffic classification simply associates Internet Assigned Numbers Authority (IANA) [ 207 ] registered port numbers with applications. Since it is no longer the de facto standard, nor does it lend itself to learning (being a trivial lookup), it is not in the scope of this survey. Furthermore, relying solely on port numbers has been shown to be ineffective [ 125 , 228 , 320 ], largely due to the use of dynamic port negotiation, tunneling, and the misuse of port numbers assigned to well-known applications for obfuscating traffic and avoiding firewalls [ 54 , 109 , 176 , 286 ]. Nevertheless, various classifiers leverage port numbers in conjunction with other techniques [ 31 , 56 , 244 , 417 ] to improve performance. In the following subsections, we discuss the various traffic classification techniques that leverage ML and summarize them in Tables  4 , 5 , 6 , 7 and 8 .

4.1 Payload-based traffic classification

Payload-based traffic classification is an alternative to port-based traffic classification. However, since it searches through the payload for known application signatures, it incurs higher computation and storage costs. It is also cumbersome to manually maintain and adapt the signatures to the ever-growing number of applications and their dynamics [ 138 ]. Furthermore, with the rise in security and privacy concerns, payload is often encrypted and its access is prohibited by privacy laws. This makes it non-trivial to infer a signature for an application class using payload [ 54 , 138 ].

Haffner et al. [ 176 ] reduce the computational overhead by using only the first few bytes of unidirectional, unencrypted TCP flows as binary feature vectors. For SSH and HTTPS encrypted traffic, they extract features from the unencrypted handshake that negotiates the encryption parameters of the TCP connection. They use NB, AdaBoost, and MaxEnt for traffic classification. AdaBoost outperforms NB and MaxEnt, and yields an overall precision of 99% with an error rate within 0.5%.

Their ML models are scalable due to the use of partial payloads and unidirectional flows, and robust due to diverse usage patterns. The unidirectional flows circumvent the challenges posed by asymmetric routing. In comparison to campus or enterprise networks, residential network data offers increased diversity with respect to social group, age, and interest, with less spatial and temporal correlation in usage patterns. Unfortunately, the performance of the AdaBoost traffic classifier deteriorates with noisy data [ 176 ], and their approach requires a priori knowledge about the protocols in the application classes.

Ma et al. [ 286 ] show that payload-based traffic classification can be performed without any a priori knowledge of the application classes, using unsupervised clustering. They train their classifiers based on the label of a single instance of a protocol and a list of partially correlated protocols, where a protocol is modeled as a distribution of sessions. Each session is a pair of unidirectional flow distributions, one from the source to the destination and another from the destination to the source. For tractability, the sessions are assumed to be finite, and a protocol model is derived as a distribution on n-byte flows, rather than on pairs of flows.

In the product distribution (PD) protocol model, the n-byte flow distribution is statistically represented as a product of n independent byte distributions, each describing the distribution of bytes at a particular offset in the flow. Similarly, in the Markov process (MP) protocol model, nodes are labeled with unique byte values and the edges are weighted with a transition probability, such that the sum of all egress transition probabilities from a node is one. A random walk through the directed graph identifies discriminator strings that are not tied to a fixed offset. In contrast, common substring graphs (CSG) capture structural information about the flows using the longest common subsequence. A subsequence is a series of common substrings that captures commonalities, including the fixed offsets used in statistical protocol modeling.
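The PD model described above can be sketched in a few lines: estimate one byte distribution per offset from training flows, then score a candidate flow as the product (here, sum of logs) of its per-offset probabilities. The function names, the smoothing constant, and the toy "protocol" are illustrative, not taken from [286].

```python
import math
from collections import Counter

def train_pd_model(flows, n=4, smoothing=1.0):
    """Estimate one byte distribution per offset over the first n bytes."""
    model = []
    for offset in range(n):
        counts = Counter(flow[offset] for flow in flows if len(flow) > offset)
        total = sum(counts.values()) + smoothing * 256
        model.append({b: (counts.get(b, 0) + smoothing) / total for b in range(256)})
    return model

def log_likelihood(model, flow):
    """Log-probability of a flow under the per-offset product model."""
    return sum(math.log(dist[flow[i]]) for i, dist in enumerate(model) if len(flow) > i)

# Toy "protocol": flows that always start with the ASCII bytes of "GET ".
http_like = [b"GET /index", b"GET /img/a", b"GET /css/b"]
model = train_pd_model(http_like)
assert log_likelihood(model, b"GET /foo") > log_likelihood(model, b"\x00\x01\x02\x03")
```

A real system would train one such model per protocol and assign a flow to the model with the highest likelihood.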

Finally, the authors perform agglomerative (bottom-up) hierarchical clustering analysis (HCA) to group the observed protocols and distinguish between the classes of interest. They employ weighted relative entropy for PD and MP, and approximate graph similarity for CSG, as the distance metric. In evaluation, the PD-based protocol models result in the lowest total misclassification error, under 5%, indicating high invariance at fixed offsets in binary and textual protocols, such as DNS and HTTP, respectively. Though CSG results in a higher misclassification error, approximately 7%, it performs best for SSH encrypted traffic. However, it is important to realize that encryption often introduces randomness in the payload. Hence, techniques such as that of Ma et al. [286], which search for keywords at fixed offsets, will suffer in performance.

Techniques that rely on capturing the beginning of flows [176, 286] are infeasible for links with high data rates where sampling is often employed. Finamore et al. [146] overcome this limitation by extracting signatures from any point in a flow. In light of the rise in streaming applications, they focus on analyzing packet payload to extract signatures of applications over long-lived UDP traffic. In essence, to extract application signatures, the authors employ Pearson's Chi-square (χ²) test to capture the level of randomness in the first N bytes of each packet, divided into G groups of b consecutive bits, within a window of C packets. The randomness is evaluated based on the distance between the observed values and a reference uniform distribution. The signatures are then used to train an SVM classifier that distinguishes between the classes of interest with an average TP rate of 99.6%. However, the FP rate is more sensitive to the window size, and drops below 5% only for window sizes over 80.
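A toy version of this chi-square signature can clarify the mechanics: for each group of b consecutive bits in the first N bytes, the distribution of observed values across a window of packets is compared against a uniform reference. The parameter names follow the text above; the sample payloads are invented. A constant protocol header byte yields a much larger statistic (low randomness) than random-looking payload bytes.

```python
import random
from collections import Counter

def chi_square_per_group(packets, N=2, b=4):
    """Return one chi-square statistic per b-bit group in the first N bytes."""
    groups_per_packet = (N * 8) // b
    stats = []
    for g in range(groups_per_packet):
        values = []
        for p in packets:
            bits = int.from_bytes(p[:N], "big")
            shift = N * 8 - (g + 1) * b
            values.append((bits >> shift) & ((1 << b) - 1))
        counts = Counter(values)
        expected = len(packets) / (1 << b)   # uniform reference distribution
        stats.append(sum((counts.get(v, 0) - expected) ** 2 / expected
                         for v in range(1 << b)))
    return stats

# First byte is a fixed header (0x17); second byte is pseudo-random.
random.seed(0)
window = [bytes([0x17, random.randrange(256)]) for _ in range(64)]
stats = chi_square_per_group(window)
assert stats[0] > stats[2] and stats[0] > stats[3]   # header nibbles are non-random
```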

Despite the disadvantages of payload-based classification techniques, the payload-based classifiers achieve high accuracy and are often employed to establish ground truth [ 55 ].

4.2 Host behavior-based traffic classification

This technique leverages the inherent behavioral characteristics of hosts on the network to predict the classes of interest. It overcomes the limitations of unregistered or misused port numbers and encrypted packet payload by moving the observation point to the edge of the network and examining traffic between hosts (e.g. how many hosts are contacted, with which transport protocol, and how many different ports are involved). These classifiers rely on the notion that applications generate different communication patterns. For example, a P2P host may contact several different peers, using a different port number for each peer, while a web server may be contacted by different clients on the same port.

Schatzmann et al. [404] exploit correlations across protocols and time to identify webmail traffic over HTTPS. They exploit the following features: (i) service proximity—webmail servers tend to reside in the same domain or subnet as SMTP, IMAP, and POP servers, which are identifiable using port numbers [243]; (ii) activity profiles—irrespective of the protocol (i.e. IMAP, POP, webmail), users of a mail server share distinct daily and weekly usage habits; (iii) session duration—users of a webmail service spend more time on emails than on other web pages and tend to keep the web client open for incoming messages; and (iv) periodicity—webmail traffic exhibits periodic patterns due to application timers (e.g. asynchronous checking for new messages by AJAX-based clients). The authors show that these features act as good discriminators for an SVM classifier to differentiate between webmail and non-webmail traffic. Using 5-fold cross-validation, the classifier achieves an average accuracy of 93.2% and a precision of 79.2%. The higher FN rate is attributed to the inability of the classifier to distinguish between VPN and webmail servers.

The data exchanged amongst P2P applications is highly discriminative [53]. For example, one P2P application may establish long flows to download video content from a few peers, whereas another may prefer short flows to download fixed-size video chunks from many peers in parallel. Therefore, Bermolen et al. [53] leverage this behavior to derive P2P application signatures from the packets and bytes exchanged between peers in small time windows. Formally, the application signature is the probability mass function (PMF) of the number of peers that send a given number of packets and bytes to a peer in the time interval ΔT.
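A simplified take on this behavioral signature can make it concrete: within a time window, count how many packets each peer sent, place the counts into exponential bins, and normalize the bin histogram into a PMF. The window length, bin layout, and sample events are invented for illustration and are not the exact construction in [53].

```python
from collections import Counter

def pmf_signature(events, window=1.0, n_bins=5):
    """events: list of (timestamp, peer_id). Returns a PMF over exponential bins."""
    in_window = [peer for t, peer in events if t < window]
    pkts_per_peer = Counter(in_window)
    bins = [0] * n_bins
    for count in pkts_per_peer.values():
        b = min(count.bit_length() - 1, n_bins - 1)  # exponential binning: 1, 2-3, 4-7, ...
        bins[b] += 1
    total = sum(bins) or 1
    return [x / total for x in bins]

events = [(0.1, "A"), (0.2, "A"), (0.3, "B"), (0.4, "C"), (0.5, "C"),
          (0.6, "C"), (0.7, "C"), (1.5, "D")]   # peer D falls outside the window
sig = pmf_signature(events)
assert abs(sum(sig) - 1.0) < 1e-9
```

Such per-window PMFs are the feature vectors on which the SVM described next is trained.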

These signatures are used to train an SVM classifier with a Gaussian kernel function and an exponential binning strategy, with a rejection threshold (distance metric) of 0.5, to discriminate between applications belonging to the P2P-TV class (i.e. PPLive, TVAnts, SopCast, Joost). The authors evaluate the sensitivity of the parameters to optimize their settings and guarantee the best performance, that is, a higher TPR and lower FPR. The classifier yields a worst-case TPR of about 95%, with an FPR well below 0.1%. Also, the temporal and spatial portability of the signatures is validated, with marginal degradation in performance.

However, the accuracy of the host behavior-based traffic classification strongly depends on the location of the monitoring system, especially since the observed communication pattern may be affected by routing asymmetries in the network core [ 229 ].

4.3 Flow Feature-based traffic classification

In contrast to payload-based and host behavior-based traffic classifiers, flow feature-based classifiers take a different perspective. They step back and consider a communication session, which consists of a pair of complete flows. A complete flow is a unidirectional exchange of consecutive packets on the network between a port at one IP address and a port at a different IP address using a particular application protocol [100]. It is identified by the quintuple ⟨srcIP, destIP, srcPort, destPort, protocol⟩. For example, a complete flow in an online game session would consist of all sequential packets sent from source s to destination d (e.g. host to game server). Therefore, a complete flow includes all packets pertaining to session setup, data exchange and session tear-down. A sub-flow is a subset of a complete flow, and can be collected over a time window in an ongoing session. A feature is an attribute representing a unique characteristic of a flow, such as packet length, packet inter-arrival time, flow duration, or number of packets in a flow. The flow feature-based technique uses flow features as discriminators to map flows to classes of interest.
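The five-tuple flow definition above can be sketched directly as a grouping key: packets sharing (srcIP, destIP, srcPort, destPort, protocol) belong to the same unidirectional flow. The dictionary field names and sample packets below are invented for illustration.

```python
from collections import defaultdict

def group_flows(packets):
    """Group packets into unidirectional flows keyed by the five-tuple."""
    flows = defaultdict(list)
    for pkt in packets:
        key = (pkt["src_ip"], pkt["dst_ip"], pkt["src_port"], pkt["dst_port"], pkt["proto"])
        flows[key].append(pkt)
    return flows

packets = [
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "src_port": 51000, "dst_port": 80, "proto": "TCP", "len": 60},
    {"src_ip": "10.0.0.2", "dst_ip": "10.0.0.1", "src_port": 80, "dst_port": 51000, "proto": "TCP", "len": 1500},
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "src_port": 51000, "dst_port": 80, "proto": "TCP", "len": 52},
]
flows = group_flows(packets)
assert len(flows) == 2   # the reverse direction is a distinct unidirectional flow
```

Per-flow features (packet count, mean length, duration, and so on) are then computed over each group.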

In essence, flow feature-based traffic classification exploits the diversity and distinguishable characteristics of the traffic footprint generated by different applications [31, 467]. It has the potential to overcome numerous limitations of other techniques, such as unregistered port numbers, encrypted packet payload, routing asymmetries, and high storage and computational overhead [55, 138, 176, 347]. However, it remains to be evaluated whether flow feature-based classifiers can achieve the accuracy of payload-based classifiers [176, 286]. The corresponding traffic classification problem can be defined as follows: given a set of flows X = {x1, x2, x3, …, x|X|}, such that X consists of either complete flows or sub-flows, and a set of classes of interest Y = {y1, y2, y3, …, y|Y|}, find the mapping g(X) → Y. This mapping can be used to classify previously unseen flows. ML is an ideal tool for finding this mapping automatically.

4.3.1 Supervised complete flow feature-based traffic classification

One of the earliest works on network traffic classification using ML is from Roughan et al. [390]. They employ k-NN and Linear Discriminant Analysis (LDA) to map network traffic into different classes of interest based on QoS requirements. Their traffic classification framework uses statistics that are insensitive to the application protocol. The authors employ both packet-level and flow-level features. However, they observe that the average packet size and flow duration act as good discriminators, and hence use these in their preliminary evaluation.

In their evaluation, k-NN outperforms LDA, with the lowest error rates of 5.1% and 9.4% for four- and seven-class classification, respectively. They notice that streaming applications often behave very similarly to bulk data transfer applications. Therefore, either a prioritization rule is necessary to break the tie, or extended/derivative features must be employed to act as good discriminators. In their extended evaluation, the authors employ inter-arrival variability to distinguish between streaming and bulk data transfer applications.
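A minimal k-NN classifier over the two discriminators named above (average packet size and flow duration) illustrates the approach in the spirit of [390]. The feature values and class labels below are fabricated for illustration only.

```python
import math

def knn_predict(train, query, k=1):
    """train: list of ((avg_pkt_size, duration), label). Returns the majority label."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], query))
    votes = [label for _, label in nearest[:k]]
    return max(set(votes), key=votes.count)

train = [((1400.0, 120.0), "bulk"), ((1350.0, 90.0), "bulk"),
         ((200.0, 0.4), "interactive"), ((180.0, 0.2), "interactive")]
assert knn_predict(train, (1300.0, 100.0)) == "bulk"
assert knn_predict(train, (150.0, 0.3)) == "interactive"
```

In practice the features would be normalized first, since packet sizes and durations live on very different scales.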

On the other hand, flow features were also leveraged by Moore and Zuev [321], who extend NB with Kernel Estimation (NBKE) to overcome the limitations that make NB impractical for network traffic classification. Though NB classifiers have been commonly used in classification, they make two fundamental assumptions: (i) the probability of occurrence of each feature is independent of the occurrence of the other features, and (ii) the probability distribution of each feature follows a Gaussian distribution. Both of these assumptions are unrealistic for traffic classification and lead to poor accuracy. However, NBKE is a feasible alternative that generalizes NB and overcomes the Gaussian distribution assumption.

Features are extracted from the headers of packets in TCP flows using the Fast Correlation-Based Filter (FCBF) to address the first assumption. In this way, NBKE with FCBF achieves a classification accuracy of up to 95% and a TPR of up to 99%. It also achieves temporal stability, classifying new flows collected twelve months later with an accuracy of 93% and a TPR of 98%. Moreover, it outperforms NB with respect to training time. However, it incurs increased inference time, especially for classifying unknown flows [347].
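The kernel-estimation idea behind NBKE can be sketched as follows: instead of fitting a single Gaussian per feature, each class-conditional density is estimated as an average of Gaussian kernels centered on the training samples. The fixed bandwidth and toy samples below are illustrative; [321] uses a more careful bandwidth choice.

```python
import math

def kde(samples, x, h=1.0):
    """Kernel density estimate at x, using Gaussian kernels of bandwidth h."""
    return sum(math.exp(-((x - s) / h) ** 2 / 2) for s in samples) / (
        len(samples) * h * math.sqrt(2 * math.pi))

def nbke_predict(per_class_samples, x):
    """per_class_samples: {label: {feature_idx: [values]}}; x: {feature_idx: value}."""
    best, best_score = None, -math.inf
    for label, feats in per_class_samples.items():
        # Naive Bayes: sum of per-feature log densities (independence assumption).
        score = sum(math.log(kde(vals, x[i]) + 1e-12) for i, vals in feats.items())
        if score > best_score:
            best, best_score = label, score
    return best

data = {"web": {0: [1.0, 1.2, 0.9, 1.1]}, "p2p": {0: [5.0, 5.5, 4.8, 5.2]}}
assert nbke_predict(data, {0: 1.05}) == "web"
assert nbke_predict(data, {0: 5.1}) == "p2p"
```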

Realizing the need for lightweight traffic classification, Jiang et al. [218] further reduce the complexity of KE by employing symmetric uncertainty and correlation measures for feature selection, derived from flows rather than packets. In this manner, NBKE can still be used to classify flows with an accuracy of 91.4%. Though this is lower than the accuracy of NBKE in [321], the techniques of varied sampling and application-aware feature selection increase its applicability for online classification. Generally, when training time is important, NBKE classifiers are preferred over tree-based approaches, such as C4.5 DT and NB tree [347, 476]. However, DT performs better than NBKE with respect to execution time and memory footprint [347]. Unfortunately, DT suffers from overfitting with noisy data, which deteriorates performance [347].

It is not always possible to collect bidirectional flows, due to routing asymmetries [138, 347]. However, it is possible to derive components of the feature vector for an application class given a priori knowledge [347], or to estimate the missing statistics [138]. In addition, filtering can be leveraged to reduce the dimensionality of the feature space and the training time. Park et al. [347] employ supervised tree-based classifiers on unidirectional flows and compare them against NBKE using WEKA [288]. They exploit the faster classification time and low memory requirements of DT by employing the Reduced Error Pruning Tree (REPTree) for classification. REPTree finds a sub-optimal tree that minimizes classification error. In addition, the Bagging ensemble is used to classify flows by aggregating multiple REPTree predictions with a majority rule. Recall that P2P bulk data transfer and streaming applications often behave similarly to each other [390]. Therefore, the authors in [347] employ a burst feature to better discriminate between such classes. The burst feature is based on packet inter-arrival statistics and a predetermined threshold that dictates whether packets exhibit “bursty” behavior. Evidently, bulk data transfer applications exhibit higher burstiness than streaming applications.
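The Bagging-over-REPTree idea reduces to a simple skeleton: train several weak learners on bootstrap resamples of the training set and aggregate their predictions by majority rule. Here a one-feature decision stump stands in for REPTree, and the data is invented; the point is the resample-and-vote structure, not the tree learner.

```python
import random

def train_stump(samples):
    """samples: list of (feature_value, label). Pick the threshold minimizing errors."""
    best = None
    for thr, _ in samples:
        for lo, hi in (("A", "B"), ("B", "A")):
            err = sum((lo if x <= thr else hi) != y for x, y in samples)
            if best is None or err < best[0]:
                best = (err, thr, lo, hi)
    _, thr, lo, hi = best
    return lambda x: lo if x <= thr else hi

def bagging_predict(samples, x, n_learners=5, seed=1):
    rng = random.Random(seed)
    votes = []
    for _ in range(n_learners):
        boot = [rng.choice(samples) for _ in samples]   # bootstrap resample
        votes.append(train_stump(boot)(x))
    return max(set(votes), key=votes.count)             # majority rule

samples = [(1.0, "A"), (1.5, "A"), (2.0, "A"), (8.0, "B"), (9.0, "B"), (9.5, "B")]
assert bagging_predict(samples, 1.0) == "A"
assert bagging_predict(samples, 9.5) == "B"
```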

Though it was presumed that Bagging would outperform REPTree, both classifiers exhibit similar performance. REPTree achieves over 90% accuracy in the classification of unidirectional flows and plateaus at seven features. This is in contrast to NBKE, where the classification accuracy deteriorates dramatically with an increasing number of features. Though the accuracy of REPTree is sensitive to packet sampling, the degradation is limited if the same sampling rate is used for both training and testing data.

Evidently, supervised learning yields high classification accuracy, due to a priori information about the characteristics of the classes of interest. However, it is infeasible to expect complete a priori information about applications, since network operators often do not even know all the applications running in their network. Therefore, Zhang et al. [496] present a traffic classification scheme suitable for a small supervised training dataset. The small labeled training set can be trivially hand-labeled based on the limited knowledge of the network operators. They use discretized statistical flow features and Bag-of-Flows (BoF)-based traffic classification. A BoF consists of discretized (i.e. correlated) flows that share the destination IP address, destination port and transport layer protocol.

Traditional NB classifiers are simple and effective in assigning the test data a posterior conditional probability of belonging to a class of interest. BoF-based traffic classification leverages NB to generate predictions for each flow and, since the flows are correlated, aggregates the predictions using rules such as sum, max, median, and majority. The F-measure, used to evaluate per-class performance, of BoF-NB outperforms NB irrespective of the aggregation rule. For example, the F-measure of BoF-NB with the sum rule is over 15% and 10% higher than NB for DNS and POP3, respectively. Evidently, the accuracy increases and the error sensitivity decreases as the size of the BoFs increases, due to the growth in BoF intra-diversity [496].
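The BoF aggregation rules can be sketched directly: each flow in a bag receives a per-class score vector from the flow-level classifier, and the bag label is derived by sum, max, median, or majority aggregation. The score values below are invented stand-ins for NB posteriors.

```python
from statistics import median

def aggregate(bag_scores, rule="sum"):
    """bag_scores: list of {class: probability}, one dict per correlated flow."""
    classes = bag_scores[0].keys()
    if rule == "majority":
        votes = [max(s, key=s.get) for s in bag_scores]
        return max(set(votes), key=votes.count)
    combine = {"sum": sum, "max": max, "median": median}[rule]
    return max(classes, key=lambda c: combine(s[c] for s in bag_scores))

bag = [{"dns": 0.7, "pop3": 0.3},
       {"dns": 0.4, "pop3": 0.6},   # one flow is individually misclassified...
       {"dns": 0.8, "pop3": 0.2}]
assert aggregate(bag, "sum") == "dns"        # ...but the bag-level decision recovers it
assert aggregate(bag, "majority") == "dns"
```

This is exactly why larger bags help: a few misclassified flows are outvoted by their correlated neighbors.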

Zhang et al. [497] propose a scheme called Robust Traffic Classification (RTC). They combine supervised and unsupervised ML techniques to classify previously unknown zero-day application traffic, and to improve the accuracy of known classes in the presence of zero-day application traffic. Their motivation is that unlabeled data contains zero-day traffic. The proposed RTC framework consists of three modules, namely unknown discovery, BoF-based traffic classification, and system update.

The unknown discovery module uses k-Means to identify zero-day traffic clusters, and RF to extract zero-day samples. The BoF module, which classifies correlated flows together, guarantees the purity of the zero-day samples, while the system update module complements existing knowledge by learning new classes in the identified zero-day traffic. RTC is novel in its ability to reflect realistic scenarios using correlated flows and to identify zero-day applications. Therefore, even with small labeled training datasets, RTC can achieve flow and byte accuracies up to 15% and 10% higher, respectively, than the second best technique.

The accuracy of traffic classification can be increased to over 99% by using a discriminative MLP-NN classifier to assign membership probabilities to flows. Auld et al. [26] employ the hyperbolic tangent as the activation function and a softmax filter to ensure that the output is a positive, normalized distribution over the classes of interest. Their MLP-NN with Bayesian-trained weights (BNN) also increases the temporal accuracy of the classifier to 95%. The increase in accuracy is primarily achieved through the ability to reject predictions. Though the NN with Bayesian weights attains very high performance, it comes at the cost of high compute and storage overhead. Furthermore, some of the employed features, such as effective bandwidth based on entropy and the Fourier transform of packet inter-arrival times, are computationally intensive, inhibiting its use for online classification. The authors purport that their Bayesian-trained weights are robust and efficient, requiring only zero or one hidden layer.

On the other hand, the Probabilistic Neural Network (PNN) uses Bayes inference theory for classification. Sun et al. [431] leverage PNN, which requires no learning process, no initial weights, and no relationship between learning and recall processes, and in which the difference between inference and target vectors is not used to modify weights. They employ an activation function typical of radial basis function networks and filter out mice flows. Elephant versus mice flows is a prevalent problem in traffic classification, since there is often a lack of representative data for the short-lived mice flows. Often, these flows are discarded for efficient classification. The authors define mice flows as flows that contain fewer than 10 packets and last less than 0.01 s. They show that PNN outperforms RBFNN, a feed-forward neural network with only two layers and a typical non-linear RBF activation function.

In contrast, Este et al. [140] use a single-class SVM followed by a multi-class SVM for traffic classification. They consider “semantically valid” bidirectional TCP flows, while ignoring short flows. A grid is maintained to keep track of the percentage of vectors in the training sets that are correctly and incorrectly classified per class. To reduce the overhead of the grid search, they randomly select a small number of flows from the training set, which suffices to train both the single- and multi-class classifiers to classify flows using the payload sizes of the first few packets. The multi-class stage is only resorted to if the single-class stage is unable to clearly identify the application class. The authors apply their technique to different datasets, with a TP rate of over 90% and a low FP rate for most classes. However, the performance is compromised for encrypted traffic, where ground truth is established using unreliable port-based labeling.

A traffic classification problem with more than two classes naïvely transforms the SVM into N one-vs-rest binary subproblems, resulting in a high computation cost for a large number of classes. Jing et al. [223] instead propose an SVM based on tournaments for multi-class traffic classification. In the tournament design, in each round the candidate classes are randomly organized into pairs, and one class of each pair is selected by a binary SVM classifier, reducing the candidate classes by half. This limits the number of support vectors, which is now based only on the two classes in each pair rather than the entire training dataset. This all-vs-all approach to multi-class SVM classification results in a much lower computational cost. The tournament in [223] terminates when only one candidate class is left, which becomes the classified class.

It is important to note that the most appropriate class may be eliminated early, resulting in higher misclassification. To overcome this, a fuzzy policy is used in the tournament. It allows competing classes to proceed to the next round without elimination if neither class has a clear advantage over the other. However, if two classes are continually paired against each other, the fuzzy rule breaks the tie. Unfortunately, this special handling results in higher computational cost. The authors compare their proposed basic tournament and fuzzy tournament (FT-SVM) schemes with the existing SVM schemes of [270] and [140]. The FT-SVM scheme achieves a high overall accuracy of up to 96%, reduces the classification error ratio by up to 2.35 times, and reduces the average computation cost by up to 7.65 times.
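The tournament structure itself is easy to sketch. Here a nearest-centroid test stands in for each pairwise binary SVM (a deliberate simplification): in every round the surviving classes are randomly paired off and one of each pair is eliminated, so a query needs only about log2 of the number of classes binary decisions. Centroids and class names are invented.

```python
import math
import random

def pairwise_winner(centroids, a, b, x):
    """Binary decision between classes a and b: the closer centroid wins."""
    return a if math.dist(centroids[a], x) <= math.dist(centroids[b], x) else b

def tournament_classify(centroids, x, seed=0):
    rng = random.Random(seed)
    survivors = list(centroids)
    while len(survivors) > 1:
        rng.shuffle(survivors)                       # random pairing each round
        nxt = [pairwise_winner(centroids, survivors[i], survivors[i + 1], x)
               for i in range(0, len(survivors) - 1, 2)]
        if len(survivors) % 2:                       # an odd class gets a bye
            nxt.append(survivors[-1])
        survivors = nxt
    return survivors[0]

centroids = {"web": (1.0, 1.0), "p2p": (8.0, 8.0), "voip": (1.0, 8.0), "dns": (8.0, 1.0)}
assert tournament_classify(centroids, (7.5, 7.8)) == "p2p"
```

With this stand-in learner the globally nearest class wins every pairing, so the result is pairing-independent; with real binary SVMs it is not, which is exactly why the fuzzy policy above is needed.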

Traditional SVM and multi-class SVM fall short in efficiency for large datasets. Therefore, Wang et al. [464] use a multi-class SVM along with an unbalanced binary SVM to perform statistics-based application-level classification of P2P traffic. Unlike the typical approach of decomposing a multi-class problem into multiple binary classification problems and using one-vs-all, the authors employ the all-together approach. They leverage NetFlow to collect TCP flows on the edge router of their campus network. In the classification process, unknown flows first go through the unbalanced binary model; only flows identified as P2P proceed to the weighted multi-class model. The unbalanced binary SVM model is built using non-P2P and N types of P2P traffic to help decrease FP (i.e. the misclassification of non-P2P traffic as P2P), whereas the weighted multi-class model is trained using the N types of P2P traffic, giving more weight to data traffic than to control/signaling traffic. The proposed scheme correctly classifies at least 75% and at most 99% of the P2P traffic, with generally low misclassification.

4.3.2 Unsupervised complete flow feature-based traffic classification

It is not always possible to apply supervised learning to network traffic, since information about all the applications running in the network is rarely available. An alternative is unsupervised learning, where the training data is not labeled and the classes of interest are therefore unknown. In this case, ML techniques are leveraged to learn similarities and patterns in the data and generate clusters that can be used to identify classes of interest. Clustering thus essentially identifies patterns in data and groups them together. In hard clustering, an unknown data point must belong to a single cluster, whereas in soft clustering, a data point can be mapped to multiple clusters.

Hard clustering often relies on distance and similarity metrics to select the cluster that most closely resembles a data point. Liu et al. [283] employ hard clustering using k-Means with unsupervised training and achieve an accuracy of up to 90%. They use complete TCP flows and apply a log transformation to the features to approximate a normal distribution and remove noise and abnormalities in the data. However, it is unrealistic to apply hard clustering for membership, since flow features from applications such as HTTP and FTP can exhibit high similarity [303]. Therefore, it is impractical to assume that a single cluster can accurately represent an application [492]. Hence, a fine-grained view of applications is often necessary, employing soft clustering and assigning an unknown data point to a set of clusters. EM is an iterative and probabilistic soft clustering technique, which guesses the expected cluster(s) and refines the guess using statistical characteristics, such as mean and variance.
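The log transformation mentioned above is worth making concrete: flow features such as byte counts are heavy-tailed, and taking log(1 + x) compresses outliers so that Euclidean distances used by k-Means are not dominated by a few elephant flows. The sample feature values are invented.

```python
import math

def log_transform(feature_vectors):
    """Apply log(1 + x) element-wise to compress heavy-tailed flow features."""
    return [[math.log1p(x) for x in v] for v in feature_vectors]

raw = [[100, 2], [120, 3], [1_000_000, 2]]       # one elephant flow dominates
scaled = log_transform(raw)
spread_raw = max(v[0] for v in raw) / min(v[0] for v in raw)
spread_log = max(v[0] for v in scaled) / min(v[0] for v in scaled)
assert spread_log < spread_raw                   # outlier influence is compressed
```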

McGregor et al. [303] employ EM to group flows with a certain probability and use cross-validation to find the optimal number of clusters. To refine the clusters, they generate classification rules from the clusters, use these rules to prune features that have insignificant impact on classification, and repeat the clustering process. Though promising preliminary results indicate stable clusters and an alternative way to disaggregate flows by traffic type, only very limited results are provided.

Similarly, AutoClass, an EM approximation approach employed by Zander et al. [492], automatically creates clusters from unlabeled training datasets. The boundaries of the classes are improved using an intra-class homogeneity metric, defined as the largest fraction of flows belonging to one application in the class. Their objective is to maximize the mean intra-class homogeneity and achieve a good separation between the application data. On average, the intra-class homogeneity improves as the number of features increases. It eventually plateaus at approximately 0.85-0.89, which implies that a good separation between classes is possible without using the full set of features. However, this is still computationally expensive.

Erman et al. [136] find that this computational overhead can be reduced by training with half of the clusters, since the majority of flows are grouped into these clusters. The trade-off between an intra-class homogeneity metric [492] and iterative clustering remains to be investigated. The suitability of unsupervised learning for traffic classification reached a milestone when AutoClass achieved an accuracy of over 91%, higher than the 82.5% accuracy of the supervised NBKE [136]. Thus, it became possible to identify previously unknown applications. Unfortunately, the training time of AutoClass is orders of magnitude higher than that of NBKE [321].

In contrast to EM-based unsupervised clustering, density-based clustering has been found to have a significantly lower training time. Furthermore, density-based spatial clustering of applications with noise (DBSCAN) has the ability to handle noisy data, in contrast to k-Means and AutoClass. It differs from conventional clustering in that it exploits the idea of a core object and the objects connected to it. An object that does not belong to the neighborhood of a core object and is not a core object itself is noise. Noisy objects are not assigned to any cluster. The neighborhood around an object is demarcated by epsilon (eps). An object is determined to be a core of a cluster when the number of objects in its neighborhood exceeds the minimum number of points (minPts). In this manner, data points are evaluated to be core points, neighbors of a core, or noise, and are assigned to clusters or discarded accordingly.

Erman et al. [135] leverage DBSCAN with transport layer statistics to identify application types. Flows are collected from TCP-based applications, identifying flow start and end based on the TCP three-way handshake and FIN/RST packets, respectively. They employ logarithms of the features, due to their heavy-tailed distribution, and deduce similarity based on Euclidean distance. The authors use WEKA [288] and Cluster 3.0 [194] to evaluate DBSCAN, AutoClass and k-Means clustering, and find that AutoClass outperforms DBSCAN. However, the training time of DBSCAN is orders of magnitude lower than that of AutoClass, 3 min vs. 4.5 h. Furthermore, its non-spherical clusters are more precise than the spherical clusters of k-Means. Uniquely, DBSCAN has the ability to classify data into the smallest number of clusters, while the accuracy of k-Means depends on the cluster size.
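A compact DBSCAN matching the eps / minPts description above makes the mechanics concrete: core points have at least min_pts neighbors within eps, clusters grow outward from cores, and unreachable points are labeled noise (-1). The toy two-dimensional "flow features" are invented.

```python
import math

def dbscan(points, eps, min_pts):
    labels = [None] * len(points)          # None = unvisited, -1 = noise
    neighbors = [[j for j, q in enumerate(points) if math.dist(p, q) <= eps]
                 for p in points]
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1                 # noise (may later join a cluster as a border point)
            continue
        labels[i] = cluster
        frontier = list(neighbors[i])
        while frontier:
            j = frontier.pop()
            if labels[j] in (None, -1):
                labels[j] = cluster
                if len(neighbors[j]) >= min_pts:   # expand only through core points
                    frontier.extend(neighbors[j])
        cluster += 1
    return labels

pts = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1), (20, 20)]
labels = dbscan(pts, eps=0.5, min_pts=3)
assert labels[6] == -1                     # the outlier is noise
assert labels[0] == labels[1] == labels[2] != labels[3]
```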

Erman et al. [138] extend their previous work [136] to classify network traffic from unidirectional TCP flows using k-Means. The motivation for unidirectional flows is justified, since it may not always be possible to collect bidirectional flows due to routing asymmetries [138, 347]. Therefore, the authors analyze the effectiveness of flows in one direction. To this end, they divide their dataset into three sets, consisting of server-to-client flows, client-to-server flows, and random flows that have a combination of both. The beginning and end of the TCP flows are identified using SYN/SYNACK packets and FIN/RST packets, respectively. A cluster is labeled with the traffic class of the majority of its flows. As the number of clusters increases, it is possible to identify applications at a finer granularity.

It is observed that server-to-client flows consistently exhibit the highest average classification accuracy, 95% and 79% for flows and bytes, respectively. The random flows attain an average accuracy of 91% and 67% for flows and bytes, respectively, whereas the client-to-server flows show average classification accuracies of 94% and 57% for flows and bytes, respectively. Also, the ML model using the server-to-client dataset has precision and recall values of 97% for Web traffic, while P2P traffic has a precision of 82% and a recall of 77%. It is apparent that features from traffic in the server-to-client direction act as good discriminators for classifying unidirectional TCP flows. Furthermore, for many network applications, the payload size is much higher in the server-to-client direction.

4.3.3 Early and sub-flow-based traffic classification

Relying on the completion of a flow for traffic classification not only amounts to extensive classifier training time and memory overhead, but also delays time-sensitive classification decisions. Therefore, Bernaille et al. [55] leverage the size and direction of the first few packets P from an application's negotiation phase (i.e. during TCP connection setup) for early traffic classification. They inspect packet payload to establish flow labels and employ unsupervised k-Means clustering to model the application classes. The authors empirically deduce the optimal P and number of clusters (k) that strike a balance between behavior separation and complexity. They represent a flow in a P-dimensional space, where the p-th coordinate is the size of the p-th packet in the flow, and compute a behavioral similarity between flows using the Euclidean distance between their spatial representations.

In the online classification phase, the distance between the spatial representation of a new flow and the centroid of each cluster is computed. The flow is assigned to the cluster with the minimum distance, and hence to the dominant application class of that cluster. This approach achieves a classification accuracy exceeding 80% with P = 5 and k = 50 for most application classes. However, if different application classes exhibit similar behavior during the application's negotiation phase, their flows can easily be misclassified. For example, POP3 is always misclassified as NNTP or SMTP, the dominant application classes in the corresponding clusters. This issue can be resolved by leveraging the port number as a feature during cluster composition [56], which increases the classification accuracy for POP3 to over 90%.
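The online step above can be sketched directly: each flow becomes a P-dimensional vector of its first P packet sizes (sign encoding direction), and a new flow is assigned to the nearest centroid's dominant class. The centroids and labels below are invented; a real system would learn them with k-Means over labeled flows as in [55].

```python
import math

def classify_early(flow_sizes, centroids, P=5):
    """flow_sizes: sizes of the first P packets (negative = server-to-client)."""
    vec = tuple(flow_sizes[:P])
    label, _ = min(centroids.items(), key=lambda c: math.dist(c[1], vec))
    return label

centroids = {                    # cluster's dominant class -> centroid (invented)
    "http": (120, -1460, -1460, 80, -1460),
    "smtp": (40, -90, 60, -50, 45),
}
assert classify_early((110, -1400, -1450, 90, -1380), centroids) == "http"
```

Since only the first P packet sizes are needed, classification can happen within a few round trips of connection setup.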

A key advantage of the traffic classification approach in [55] is the ability to classify the same set of application classes in another network, since packet sizes are similar across different traces. Furthermore, as the approach does not depend on packet payload, it is suitable for classifying encrypted traffic. Though the packet size may increase due to the encryption method used, this can be adjusted for with a simple heuristic [56]. However, a fundamental requirement is to capture the first few packets of the negotiation phase of a TCP connection. Though simple, this is not always possible, especially in networks that employ packet sampling. Furthermore, it is impractical for networks that have high load or miss packet statistics.

The classification accuracy of stand-alone classifiers that perform early classification can be significantly improved by combining the outputs of multiple classifiers using combination algorithms [108, 121]. Donato et al. [121] propose the Traffic Identification Engine (TIE), a platform that allows the development and evaluation of ML (and non-ML) classification techniques as plugins. Furthermore, TIE capitalizes on the complementarity between classifiers to achieve higher accuracy in online classification. This is realized through classifier output fusion algorithms, called combiners, including NB, majority voting (MV), weighted majority voting (WMV), Dempster-Shafer (D-S) [484], the Behavior-Knowledge Space (BKS) method [199], and the Wernecke (WER) method [473]. Note that BKS-based combiners overcome the independent-classifier assumption of the other combiners [108]. However, due to the reliance of the classifiers on the first few packets, this approach inherits the limitations of [56].

The authors [ 108 , 121 ] evaluate the accuracy of the stand-alone classifiers and the combiners. They extract features for ML from flows in a proprietary dataset, which is split into a 20% classifier training set, a 40% classifier and combiner validation set, and a 40% classifier and combiner test set. The authors label the dataset using a ground truth classifier, e.g. a payload-based (PL) classifier. In stand-alone mode, the J48 classifier achieves the highest overall accuracy of 97.2%. Combining the output of J48 with other classifiers (i.e. Random Tree, RIPPER, PL) using the BKS method increases the overall accuracy to 98.4%, when considering the first 10 packets per bidirectional flow. Most notably, an average gain in accuracy of 42% is achieved when extracting features from only the first packet, which is significant for online classification. However, the higher accuracies are achieved when the PL classifier is included in the combiners' pool of classifiers, which increases computational overhead.
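
For intuition, the simplest combiners reduce to counting votes. The following is a minimal sketch of MV and WMV over hypothetical per-classifier labels; the actual TIE combiners (e.g. BKS) additionally model the joint behavior of the classifiers:

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-classifier labels by simple majority voting (MV).
    Ties break in favor of the classifier listed first."""
    return Counter(predictions).most_common(1)[0][0]

def weighted_majority_vote(predictions, weights):
    """WMV: each classifier's vote is weighted, e.g. by its validation accuracy."""
    scores = {}
    for label, w in zip(predictions, weights):
        scores[label] = scores.get(label, 0.0) + w
    return max(scores, key=scores.get)

print(majority_vote(["HTTP", "HTTP", "SMTP"]))                            # HTTP
print(weighted_majority_vote(["HTTP", "SMTP", "SMTP"], [0.9, 0.6, 0.5]))  # SMTP
```

Note how WMV can overturn a raw majority when the dissenting classifier is far more reliable, which is the premise of weighting votes by validation accuracy.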

The objective of Nguyen et al. [ 337 ] is to design a traffic classifier that performs well, irrespective of missing statistics, using a small number of the most recent packets, called sub-flows [ 336 ]. These sub-flows are derived by sliding a small window of N consecutive packets over each flow, starting at packet 0 and moving across the training dataset in steps of N / S , where S is a step fraction. Parameters N and S are chosen carefully based on memory limitations and the trade-off between classification time and accuracy.
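
The sub-flow derivation can be sketched as follows, assuming a flow is simply a list of packets; the parameter names N and S follow the text, while everything else is illustrative:

```python
def extract_subflows(flow, N, S):
    """Slide a window of N consecutive packets across a flow in steps of
    N // S, starting at packet 0 (sketch of the sub-flow derivation in
    [337]; the representation of a 'packet' is a stand-in)."""
    step = max(1, N // S)
    subflows = []
    start = 0
    while start + N <= len(flow):
        subflows.append(flow[start:start + N])
        start += step
    return subflows

packets = list(range(10))  # a toy flow of 10 "packets"
print(extract_subflows(packets, N=4, S=2))
# windows of 4 packets starting at offsets 0, 2, 4, 6
```

Larger N captures more of each flow's statistical variation but raises memory use and classification delay, which is the trade-off the text describes.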

To ensure high accuracy of the classifier, it is imperative to identify and select sub-flows that best capture the distinctive statistical variations of the complete flows. To this end, the authors initially identify sub-flow starting positions manually based on protocol knowledge. They then leverage the Assistance of Clustering Technique (ACT) [ 338 ] to automate the selection of sub-flows, using unsupervised EM clustering to establish ground truth. To account for directionality, a Synthetic Sub-flow Pair (SSP) is created for every recorded sub-flow, with forward and backward features swapped and both labeled as the same application class [ 335 ].

Finally, the authors in [ 337 ] use NetMate [ 21 ] to compute feature values and employ supervised NB and C4.5 DT for traffic classification using WEKA [ 288 ]. Both classifiers perform well when evaluated with a missing flow start, missing directionality, or 5% independent random packet loss. However, the C4.5 DT classifier performs better, achieving 99.3% median recall with 97% median precision for Enemy Territory traffic, and 95.7% median recall with 99.2% median precision for VoIP traffic, at a sub-flow size of N =25 packets. The authors also evaluate a real-world implementation of their approach, called DIFFUSE, for online traffic classification. DIFFUSE achieves a high accuracy of 98-99% for Enemy Territory (an online game) and VoIP traffic replayed across the network, while monitoring one or more 1 Gb/s links. Despite the high accuracy, this technique lacks flexibility, since the classifier can only recognize the application classes that were known a priori.

Erman et al. [ 137 ] propose a semi-supervised TCP traffic classification technique that partially overcomes a limitation of [ 55 ] and [ 337 ]—a priori knowledge of application class or protocol. They have the following objectives: (i) use a small number of labeled flows mixed with a large number of unlabeled flows, (ii) accommodate both known and unknown applications and allow iterative development of classifiers, and (iii) integrate with flow statistics collection and management solutions, such as Bro [ 350 ] and NetFlow [ 100 ], respectively.

To accomplish this, the authors employ backward greedy feature selection (BGFS) [ 173 ] and k -Means clustering to group similar training flows together, using Euclidean distance as the similarity metric. Here, the objective is to use the patterns hidden in flows to group them into clusters without pre-determined labels. A small set of pre-determined labels is then assigned to clusters using maximum likelihood, mapping clusters to application classes. The remaining clusters stay unlabeled, accommodating new or unknown applications; unknown flows are thus assigned to an unlabeled cluster. This gives network operators the flexibility to add new unlabeled flows and improve classifier performance by enabling identification of application classes that were previously unknown. The authors establish ground truth using payload-based signature matching, with hand classification for validation.
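
The maximum-likelihood mapping from clusters to classes can be sketched as follows, assuming cluster assignments have already been produced by k -Means; the flow IDs, cluster IDs, and labels are hypothetical:

```python
from collections import Counter

def label_clusters(assignments, known_labels):
    """Map each cluster to its most likely application class using the few
    labeled flows it contains; clusters with no labeled flows stay
    'unknown' (illustrative sketch of the mapping step in [137])."""
    per_cluster = {}
    for flow_id, cluster_id in assignments.items():
        if flow_id in known_labels:
            per_cluster.setdefault(cluster_id, []).append(known_labels[flow_id])
    mapping = {}
    for cluster_id in set(assignments.values()):
        found = per_cluster.get(cluster_id)
        mapping[cluster_id] = Counter(found).most_common(1)[0][0] if found else "unknown"
    return mapping

# Flows 0-5 assigned to three clusters; only flows 0 and 3 are labeled
assignments = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2}
known = {0: "HTTP", 3: "P2P"}
print(label_clusters(assignments, known))  # {0: 'HTTP', 1: 'P2P', 2: 'unknown'}
```

The "unknown" cluster is exactly where new or zero-day applications accumulate, which is what lets the operator later label them and retrain.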

In offline classification, the authors in [ 137 ] achieve over 94% flow accuracy with just two randomly labeled flows per cluster, among a mix of 64,000 unlabeled flows and k =400. For real-time classification, the authors leverage a layered classification approach, where each layer represents a packet milestone, that is, the number of packets a flow has sent or received within a pre-defined sliding window. Each layer uses an independent model to classify ongoing flows based on the statistics available at the given milestone.

Though the model is trained with flows that have reached each specific packet milestone, previously assigned labels are disregarded upon reclassification. A significant increase in the average distance of new flows to their nearest cluster mean is indicative of the need for retraining, which could be achieved through incremental learning. This approach not only has a small memory footprint, it also allows the model to be updated, potentially improving classification performance [ 137 ]. The authors integrate their layered online classification in Bro and achieve byte accuracies in the 70-90% range. Furthermore, the classifier remains fairly robust over time for different traces.

Similar to [ 137 , 337 ], Li et al. [ 270 ] classify TCP traffic into application classes, including unknown application classes, using a few packets in a flow. Their approach uniquely trains the ML model for the application class “Attack”, which enables early detection and classification of anomalous traffic. They employ C4.5 DT to achieve high accuracy in online classification, and reduce complexity in the number of features by using correlation-based filtering. They perform their evaluations in WEKA [ 288 ], and find C4.5 DT to outperform both C4.5 DT with AdaBoost and NBKE.

The classification accuracy of C4.5 DT with 0.5% randomly selected training flows exceeds 99% for most classes except Attack, which exhibits moderate-to-high recall. This is because Attack is a complex application class that shows no temporal stability, and its characteristics change dynamically over time. It may be possible to overcome this by iteratively retraining the classifier, either using an approach similar to [ 137 ] or by introducing rules (e.g. based on port numbers or flow metrics) in the DT to increase the temporal stability of the classifier. Furthermore, the use of port numbers in conjunction with other features results in slightly higher accuracy. However, this leaves the classifier vulnerable to the issues of port-based classification.

In contrast to the semi-supervised and unsupervised techniques for TCP and UDP traffic classification, Jin et al. [ 222 ] employ a supervised approach. They classify network traffic using complete flows, while achieving high accuracy, temporal and spatial stability, and scalability. For accuracy and scalability, their system offers two levels of modularity: partitioning flows and classifying each partition. In the first level, domain knowledge is exploited to partition flows into m non-overlapping partitions based on flow features, such as protocol or flow size. Second, each partition can be classified in parallel, leveraging a series of k binary classifiers. Each binary classifier i assigns a likelihood score reflecting the probability of the flow belonging to the i-th traffic class. Eventually, the flow is assigned to the class with the highest score.
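
The per-partition scoring step can be sketched as a one-vs-rest decision; the score functions below are hypothetical stand-ins for the trained binary classifiers in [ 222 ]:

```python
def classify_max_score(flow_features, binary_classifiers):
    """Run k binary classifiers and assign the flow to the class whose
    classifier returns the highest likelihood score (illustrative
    sketch; real scores would come from trained, calibrated models)."""
    scores = {cls: clf(flow_features) for cls, clf in binary_classifiers.items()}
    return max(scores, key=scores.get), scores

# Hypothetical score functions keyed on a single flow feature
classifiers = {
    "Web":  lambda f: 0.7 if f["avg_pkt_size"] > 500 else 0.2,
    "VoIP": lambda f: 0.8 if f["avg_pkt_size"] < 300 else 0.1,
}
label, scores = classify_max_score({"avg_pkt_size": 120}, classifiers)
print(label)  # VoIP
```

Because each binary score is computed independently, the k classifiers can run in parallel per partition, which is where the scalability of the design comes from.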

They design and leverage weighted threshold sampling and logistic calibration to overcome the imbalance of training and testing data across classes. Though non-uniform weighted threshold sampling creates smaller balanced training sets, it can distort the distribution of the data. This may violate the independent and identically distributed assumption held by most ML algorithms, invalidating the results of the binary classifiers. Therefore, logistic calibrators are trained for each binary classifier and used at runtime to adjust the binary classifiers' predictions.

The authors in [ 222 ] evaluate their system with respect to spatial and temporal stability, classification accuracy, and training and runtime scalability. With training and testing data collected two months apart at two different locations, the system achieves low error rates of 3% and 0.4% for TCP and UDP traffic, respectively. However, with a larger time difference between training and testing data collection, the error rates increase to 5.5% and 1.2% for TCP and UDP traffic, respectively. Employing collective traffic statistics [ 221 ] via colored traffic activity graphs (TAGs) improves the accuracy for all traffic classes, reducing the overall error rate by 15%.

This diminishes the need for frequent retraining of the classifiers. Their system also provides flexible training configuration, that is, given a training time budget it can find a suitable amount of training data and number of iterations of the ML algorithm. It took the system about two hours to train the classifiers that produced the reported error rates. Furthermore, running multi-threaded on a multi-core machine, the system was able to handle 6.5 million new flows arriving per minute.

4.3.4 Encrypted traffic classification

Various applications employ encryption, obfuscation, and compression techniques that make it difficult to detect the corresponding traffic. Bonfiglio et al. [ 69 ] perform controlled experiments to reverse engineer the structure of Skype messages between two Skype clients (E2E) and between a Skype client and a traditional PSTN phone (E2O). The proposed framework uses three techniques to identify Skype traffic, with a focus on voice calls, regardless of the transport layer protocol, TCP or UDP. The first technique uses a classifier based on Pearson's χ² test that leverages the randomness in message content bits, introduced by the Skype encryption process, as a signature to identify Skype traffic. The second technique is based on an NB classifier that relies on the stochastic characteristics of traffic, such as message size and average inter-packet gap, to classify Skype traffic over IP. The third technique uses DPI to create a baseline payload-based classifier.
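
The intuition behind the χ²-based classifier can be sketched as follows: encrypted payload bytes look uniformly random, so their χ² statistic against a uniform distribution is low, while plaintext is far from uniform. This illustrates the idea only, not the authors' exact bit-group test over Skype message fields:

```python
import os

def chi_square_uniform(byte_values, bins=256):
    """Pearson's chi-square statistic of observed byte values against a
    uniform distribution over `bins` symbols. Low values suggest
    random-looking (e.g. encrypted) content."""
    counts = [0] * bins
    for b in byte_values:
        counts[b] += 1
    expected = len(byte_values) / bins
    return sum((c - expected) ** 2 / expected for c in counts)

random_bytes = os.urandom(4096)                       # encrypted-looking payload
texty_bytes = b"GET /index.html HTTP/1.1\r\n" * 160   # plaintext-like payload
print(chi_square_uniform(random_bytes) < chi_square_uniform(texty_bytes))  # True
```

A threshold on this statistic then separates random-looking payloads from structured ones; the framework in [ 69 ] applies the same idea to specific bit groups within Skype messages.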

In the evaluation, the NB classifier is effective in identifying all voice traffic, while the χ² classifier accurately identifies all Skype traffic over UDP and all encrypted or compressed traffic over TCP. Jointly, the NB and χ² classifiers outperform the classifiers in isolation by detecting Skype voice traffic over UDP and TCP with nearly zero FPs. However, higher FNs are noticeable in comparison to the isolated classifiers, as the combination disregards video and data transfers and correctly identifies only those Skype flows that actually carry voice traffic.

The identification of Skype traffic at the flow level is also addressed by Alshammari et al. [ 17 ], who employ supervised AdaBoost, Repeated Incremental Pruning to Produce Error Reduction (RIPPER), SVM, NB, and C4.5 DT classifiers. Additionally, these classifiers are used to identify Secure Shell (SSH) encrypted traffic. The authors use flow-based statistical features extracted with NetMate [ 21 ] and leverage WEKA [ 288 ] to train the classifiers on a sampled dataset of SSH, non-SSH, Skype, and non-Skype traffic. The trained models are applied to the complete datasets to label flows as SSH, non-SSH, Skype, or non-Skype.

In the evaluation, C4.5 DT outperforms the other classifiers on the majority of datasets. For SSH traffic, it achieves 95.9% DR and 2.8% FPR on the Dalhousie dataset, 97.2% DR and 0.8% FPR on the AMP dataset, and 82.9% DR and 0.5% FPR on the MAWI dataset. Furthermore, when trained and tested across datasets (i.e. across networks), it achieves 83.7% DR and 1.5% FPR; hence, it generalizes well from one network to another. The C4.5 DT classifier also performs well for Skype traffic, with 98.4% DR and 7.8% FPR on the Dalhousie dataset. However, secure communication in SSH and HTTPS sessions can contain a variety of applications, which may need to be identified for finer granularity. Unfortunately, Alshammari et al. [ 17 ] do not detect the precise applications within a secure session.

This problem is addressed by Shbair et al. [ 409 ], who adopt a hierarchical classification to identify the service provider (e.g. google.com, dropbox.com), followed by the type of service (e.g. maps.google.com, drive.google.com), encapsulated in TLS-based HTTPS sessions. They start with the reconstruction of TLS connections from the HTTPS traces and label them using the Server Name Indication (SNI) field, creating the service provider-service hierarchy. These labeled connections are used to build: (i) a classifier to differentiate between service providers, and (ii) a classifier for each service provider to differentiate between their corresponding services. This hierarchical approach reduces the effort required to retrain the classifiers when a new service is added. They use statistical features extracted over the encrypted payload with CFS, and employ C4.5 DT and RF classifiers.
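
The two-level dispatch can be sketched as follows; the feature names, thresholds, and stand-in classifiers are hypothetical, with real ones being trained C4.5 DT or RF models:

```python
def hierarchical_classify(features, provider_clf, service_clfs):
    """Two-level classification in the style of [409]: first predict the
    service provider, then dispatch to that provider's own service
    classifier (all classifiers here are illustrative stand-ins)."""
    provider = provider_clf(features)
    service = service_clfs[provider](features)
    return provider, service

# Hypothetical stand-in classifiers keyed on simple flow statistics
provider_clf = lambda f: "google.com" if f["mean_size"] > 400 else "dropbox.com"
service_clfs = {
    "google.com":  lambda f: "maps.google.com" if f["duration"] > 10 else "drive.google.com",
    "dropbox.com": lambda f: "dropbox.com",
}
print(hierarchical_classify({"mean_size": 600, "duration": 30},
                            provider_clf, service_clfs))
# ('google.com', 'maps.google.com')
```

The retraining benefit follows directly from this structure: adding a new service under one provider only requires retraining that provider's second-level classifier.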

In the evaluation, RF performs better than C4.5 DT, with a precision of 92.6%, recall of 92.8%, and F-measure of 92.6% when classifying service providers with the selected features. Furthermore, the accuracy of service classification is between 95-100% for the majority of the providers, asserting the benefit of a hierarchical approach to traffic classification. The overall accuracy of the system across both levels is 93.1%, with a degradation of less than 20% over a period of 23 weeks without retraining.

4.3.5 NFV and SDN for traffic classification

Recent advances in network paradigms, such as Network Functions Virtualization (NFV) and SDN enable flexible and adaptive techniques for traffic classification. The efforts discussed in this subsection present contrasting approaches for traffic classification using ML in softwarized and virtualized networks.

It is well known that the performance of classifiers varies significantly based on the type of flow features used. Furthermore, flows inherently exhibit specific characteristics of network applications and protocols. Therefore, finding the ideal set of features is fundamental to achieving efficiency in traffic classification. In a preliminary effort, He et al. [ 182 ] propose an NFV-based traffic-driven learning framework for traffic classification, called vTC. vTC consists of a controller, and a set of ML classifiers and feature collectors implemented as virtual network functions (VNFs). Their objective is to dynamically select the most effective ML classifiers and the most cost-efficient flow features for traffic classification, by leveraging a controller and a group of VNFs. The vTC framework strives to achieve a balance between classification accuracy and speed, and the choice of features has a significant impact on these criteria. Therefore, it is critical to determine the most suitable classifier and dynamically adjust feature collection for a given flow protocol (e.g. TCP, UDP, ICMP).

The cost of extracting different features varies from one feature to another, and the same holds for executing different classifiers. Therefore, it is important to: (i) identify whether a feature should be collected in the data plane or the control plane, and (ii) have a centralized view of network resources while selecting the appropriate classifier. The controller in vTC is responsible for maintaining the ML models from offline training, and for selecting the most suitable classifier and flow features to collect by chaining the corresponding VNFs at runtime. It also monitors the load on the VNFs to scale resources, if necessary. The adaptive selection of features and classifiers in vTC, based on the flow protocol, results in an accuracy of 95.6%. However, the performance overhead of adaptively selecting classifiers and features, and of chaining the corresponding VNFs, is not discussed. Furthermore, fine-grained classification and corresponding results are missing.

SDN offers built-in mechanisms for data collection via the OpenFlow (OF) protocol. Amaral et al. [ 19 ] harness SDN and OF to monitor and classify TCP enterprise network traffic, leveraging ML to extract knowledge from the collected data. In their architecture, an SDN application collects flow statistics from controlled switches and pro-actively installs a flow entry to direct all packets to the controller. For TCP traffic, the controller skips the TCP control packets and stores the sizes, timestamps, MAC and IP addresses, and port numbers of the first five packets, along with their inter-arrival times. Then, the controller installs a flow entry with an idle timeout for local processing at the switch. Upon timeout, the flow features of packet count, byte count, and duration are collected at the controller.

The collected features are pruned using PCA and adjusted to eliminate high variability and scaling effects. However, the use of port numbers as a feature leaves the classifiers susceptible to the issues of port-based classification. Nevertheless, the authors evaluate three ensemble ML classifiers, namely RF, Stochastic Gradient Boosting (SGBoost), and Extreme Gradient Boosting (XGBoost). The results exhibit high accuracy for some application classes (e.g. Web Browsing), but poor performance for others (e.g. LinkedIn). The authors do not provide justifications for the performance of the classifiers; however, it can be attributed to the fairly small training dataset used in their evaluation.

In contrast, Wang et al. [ 462 ] propose a framework to classify network traffic into QoS classes rather than applications. They assume that applications with similar QoS requirements exhibit similar statistical properties, which allows for equal treatment of different applications with similar QoS requirements. Their framework consists of two components: (i) a traffic identification component that resides in switches at the network edge to detect QoS-significant (i.e. elephant or long-lived) flows, and (ii) a QoS-aware traffic classification engine in the SDN controller that leverages DPI (for offline labeling) and semi-supervised ML to map long-lived flows to QoS classes. A significant number of flows remain unlabeled due to limited information on all possible/existing applications, thus calling for semi-supervised learning.

Similar to [ 137 ], periodic retraining of the classifier is required to cater to new applications. The Laplacian-SVM classifier employed uses flow features from the first twenty packets to classify flows into QoS classes. Furthermore, the authors employ forward feature selection to reduce the number of features from the initial sixty down to nine. In the evaluation, the accuracy of classifying long-lived flows into QoS classes exceeds 90%. However, the performance of the proposed framework is not evaluated, especially in light of the entropy-based features used for traffic classification.

4.4 Summary

Traditionally, Internet traffic has been classified using port numbers, payload, and host-based techniques. Port-based techniques are unreliable and antiquated, largely due to the use of dynamic port negotiation, tunneling, and the misuse of port numbers assigned to well-known applications for obfuscating traffic and avoiding firewalls [ 54 , 109 , 176 , 286 ]. In contrast, payload-based techniques are designed to inspect the application payload. Though they are computationally intensive and complicated by encryption, supervised and unsupervised ML have been successfully applied to payload-based traffic classification with high accuracy. Generally, the unencrypted handshake payload is used for traffic classification, which is infeasible for high data rate links. On the other hand, long-lived UDP traffic lends itself to supervised payload-based traffic classification, where the payload is inspected randomly within an observation window [ 146 ]. However, this is not widely applicable and is highly sensitive to the observation window size. Similarly, host-based traffic classification is highly susceptible to routing asymmetries.

In contrast to these myopic approaches, flow feature-based traffic classification techniques inspect the complete communication session, which includes all consecutive, unidirectional packets in the network. This is the most widely studied technique for traffic classification, leveraging both supervised and unsupervised ML. In supervised learning, various kernel estimation, NN, and SVM-based ML techniques have been employed to achieve high accuracy. Though traditional kernel estimation techniques are simple and effective, their underlying assumptions are unrealistic and infeasible for traffic classification. In this light, NBKE has been explored for traffic classification, but NN-based traffic classification has shown higher accuracy with probabilistic and/or Bayesian-trained weights. Similarly, traditional and multi-class SVM have been applied jointly to increase the accuracy of traffic classification and its applicability to large datasets [ 464 ].

Rarely do network operators have complete information about all the applications in their network, so it is impractical to expect complete a priori knowledge of all applications for traffic classification. Therefore, unsupervised ML techniques have been explored for practical traffic classification using flow features. For traffic classification with unsupervised ML, both hard and soft clustering techniques have been investigated. Since flow features from different applications can exhibit high similarity, it is unrealistic to apply hard clustering for fine-grained traffic classification. On the other hand, soft clustering achieves the required granularity with density-based clustering techniques, which also have a lower training time than EM-based soft clustering techniques.

Complete flow feature-based traffic classification has been shown to achieve high spatial and temporal stability, high classification accuracy, and training and runtime scalability [ 222 ]. However, it requires extensive memory for storage and delays time-sensitive classifier decisions. Flow feature-based traffic classification can also be achieved using only a small number of packets in a flow, rather than the complete flow. These sub-flows can be extended synthetically [ 55 ] or derived from a small sliding window over each flow [ 337 ]. Sub-flow-based traffic classification achieves high accuracy using the fast and efficient C4.5 DT classifier with correlation-based filtering. As with payload-based traffic classification, encryption can also complicate flow feature-based traffic classification. However, it is possible to circumvent these challenges. For instance, a hierarchical method that identifies the service provider followed by the type of service, using statistical features from the encrypted payload, has proven highly accurate and temporally stable [ 409 ].

Undoubtedly, supervised ML lends itself to accuracy in traffic classification, while unsupervised techniques are more robust. Consequently, the joint application of supervised and unsupervised ML for traffic classification [ 137 , 497 ] has demonstrated success. Not only are semi-supervised classifiers resilient, they can be easily adapted for zero-day traffic or retrained for increased accuracy against previously unknown applications. Recent advances in networking increase the opportunities in traffic classification with SDN- and NFV-based identification of applications and QoS classes. Though some preliminary work in this area has achieved high accuracy, more scrutiny is required with respect to resilience, temporal and spatial stability, and computational overhead. Most importantly, it is imperative to assess the feasibility of these technologies for time-sensitive traffic classification decisions.

5 Traffic routing

Network traffic routing is fundamental in networking and entails selecting a path for packet transmission. Selection criteria are diverse and depend primarily on operation policies and objectives, such as cost minimization, maximization of link utilization, and QoS provisioning. Traffic routing demands challenging abilities from ML models, such as the ability to cope and scale with complex and dynamic network topologies, the ability to learn the correlation between the selected path and the perceived QoS, and the ability to predict the consequences of routing decisions. In the existing literature, one family of ML techniques has dominated research in traffic routing: Reinforcement Learning.

Recall, RL employs learning agents to explore, with no supervision, the surrounding environment, usually represented as a MDP with finite states, and learn from trial-and-error the optimal action policy that maximizes a cumulative reward. RL models are as such defined based on a set of states \(\mathcal S\) , a set of actions per state \(\mathcal A(s_{t})\) , and the corresponding rewards (or costs) r t . When \(\mathcal S\) is associated with the network, a state s t represents the status at time t of all nodes and links in the network. However, when it is associated with the packet being routed, s t represents the status of the node holding the packet at time t . In this case, \(\mathcal A(s_{t})\) represents all the possible next-hop neighbor nodes, which may be selected to route the packet to a given destination node. To each link or forwarding action within a route may be associated an immediate static or dynamic reward (respectively cost) r t according to a single or multiple reward (respectively cost) metrics, such as queuing delay, available bandwidth, congestion level, packet loss rate, energy consumption level, link reliability, retransmission count, etc.

At routing time, the cumulative reward, i.e. the total reward accumulated by the time the packet reaches its destination, is typically unknown. In Q-learning, a simple yet powerful model-free technique in RL, an estimate of the remaining cumulative reward, also known as the Q-value , is associated with each state-action pair. A Q-learning agent learns the best action-selection policy by greedily selecting at each state the action a t with the highest expected Q-value \(\max _{a\in \mathcal A(s_{t})}Q(s_{t},a)\) . Once the action a t is executed and the corresponding reward r t is known, the node updates the Q-value Q ( s t , a t ) accordingly as follows:

\[Q(s_{t},a_{t}) \leftarrow (1-\alpha)\,Q(s_{t},a_{t}) + \alpha \left (r_{t} + \gamma \max _{a\in \mathcal A(s_{t+1})}Q(s_{t+1},a)\right)\]

where α (0< α ≤1) and γ (0≤ γ ≤1) denote the learning rate and discount factor, respectively. The closer α is to 1, the higher the impact of the most recently learned Q-value, while higher γ values make the learning agent aim for longer-term rewards. Indeed, the greedy action-selection approach is only optimal if the learning agent knows the current Q-values of all possible actions; the agent can then exploit this knowledge to select the most rewarding action. If not, an ε-greedy approach may be used, such that with probability ε the agent explores a random action rather than deterministically choosing the one with the highest Q-value.
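
A minimal tabular sketch of this update and of ε-greedy selection; the state and action names and the values α = 0.5, γ = 0.9 are arbitrary toy choices:

```python
import random

def q_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    """One tabular Q-learning update:
    Q(s,a) <- (1-alpha)*Q(s,a) + alpha*(r + gamma * max_a' Q(s',a'))."""
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions(s_next))
    Q[(s, a)] = (1 - alpha) * Q.get((s, a), 0.0) + alpha * (r + gamma * best_next)

def epsilon_greedy(Q, s, actions, eps=0.1):
    """Explore a random action with probability eps, else exploit the
    action with the highest current Q-value."""
    if random.random() < eps:
        return random.choice(actions(s))
    return max(actions(s), key=lambda a: Q.get((s, a), 0.0))

Q = {}
actions = lambda s: ["left", "right"]  # toy action set, same in every state
q_update(Q, "s0", "right", r=1.0, s_next="s1", actions=actions)
print(Q[("s0", "right")])  # 0.5
```

With α = 0.5 and all Q-values initialized to 0, a reward of 1.0 moves the estimate halfway toward the target, illustrating how α trades off new evidence against the old estimate.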

Though RL is gaining a lot of attention these days, its application to network traffic routing dates back to the early 1990s. Boyan and Littman's seminal work [ 71 , 280 ] introduced Q-routing, a straightforward application of the Q-learning algorithm to packet routing. In Q-routing, a router x learns to map a routing policy, such as routing to destination d via neighbor y , to its Q-value. The Q-value is an estimate of the time it will take for the packet to reach d through y , including any time the packet would have to spend in node x 's queue plus the transmission time over the link ( x , y ). Upon reception of the packet, y sends back to x the new estimated remaining routing delay, and x adjusts its Q-value accordingly based on a learning rate. After the algorithm converges, optimal routing policies are learned.
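
A sketch of the Q-routing estimate update under these definitions; the node names, delay values, and learning rate are illustrative, not taken from [ 71 ]:

```python
def q_routing_update(Qx, dest, y, queue_time, tx_time, y_estimate, alpha=0.5):
    """Q-routing update at node x after forwarding a packet for `dest`
    via neighbor y: move the delivery-time estimate toward
    (queue time + transmission time + y's reported remaining delay)."""
    old = Qx.get((dest, y), 0.0)
    Qx[(dest, y)] = old + alpha * (queue_time + tx_time + y_estimate - old)

# Node x's current delivery-time estimates (ms) to destination "d"
Qx = {("d", "y1"): 10.0, ("d", "y2"): 12.0}
# y1 reports 6 ms remaining; the packet waited 1 ms and took 2 ms on the link
q_routing_update(Qx, "d", "y1", queue_time=1.0, tx_time=2.0, y_estimate=6.0)
print(Qx[("d", "y1")])  # 9.5
best = min(("y1", "y2"), key=lambda y: Qx[("d", y)])
print(best)  # y1
```

Since Q-values here are delay estimates, routing greedily means choosing the neighbor with the minimum Q-value rather than the maximum, as in the reward formulation above.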

Q-routing does not require any prior knowledge of the network topology or traffic patterns. Experiments on a 36-node network demonstrated that Q-routing outperforms the shortest-path-first routing algorithm in terms of average packet delivery time. It was also found that, although Q-routing does no exploration or fine-tuning after policies and Q-values are learned, in a dynamically changing network topology it still outperforms a full-echo Q-routing algorithm, in which the policy is dynamically adjusted to the current estimated time to destination. In fact, under heavy load, the full-echo Q-routing algorithm constantly changes the routing policy, creating bottlenecks in the network. On the contrary, the original Q-routing shows better stability and robustness to topology changes under higher loads.

Since then, the application of Q-learning to packet routing has attracted immense attention. A number of research efforts from the late 1990s and early 2000s built on and proposed improvements to Q-learning, resulting in three main research directions: (a) improving the performance of Q-routing to increase learning and convergence speed [ 96 , 254 ], (b) leveraging the low complexity of Q-learning and devising Q-learning-inspired algorithms adapted to the specificities of the network (e.g. energy-constrained networks) and/or routing paradigm (e.g. multicast routing [ 430 ]), and (c) enforcing further collaboration between the routing learning agents to achieve complex global performance requirements [ 424 , 479 ].

In 1996, a memory-based Q-learning algorithm called predictive Q-routing (PQ-routing) was proposed to retain past experiences and increase learning speed. PQ-routing keeps the past best estimated delivery times to a destination via each neighboring node y and reuses them in tandem with more current estimates. In 1997, Kumar et al. applied dual reinforcement Q-routing (DRQ-routing) to minimize packet delivery time [ 254 ]. DRQ-routing integrates dual reinforcement learning [ 167 ] with Q-routing, so that nodes along the route between the source and the destination receive feedback in both directions (i.e. from both the up-stream and down-stream nodes). Both PQ-routing and DRQ-routing are fully distributed, as in Q-routing, and use only local information plus the feedback received from neighboring nodes. While PQ-routing shows better performance than Q-routing at lower loads, DRQ-routing converges faster and shows better overall performance, at the cost of slightly increased communication overhead due to backward rewards.

The problem of ML-based multicast routing was first addressed by Sun et al. [ 430 ] in the context of MANETs. Q-MAP, a Q-learning-based algorithm, was proposed to find and build the optimal multicast tree in MANETs. In Q-MAP, Q-values are associated with the different upstream nodes, and the best Q-values are disseminated directly from the sinks to the nodes, making route exploration unnecessary and speeding up the convergence of the learning process. Indeed, an exploration-free approach eventually yields maximum routing performance, since only actions with maximum Q-values are selected; however, it reduces the protocol to a static approach that is insensitive to topology changes.

The traditional single-agent RL model, which is greedy in nature, provides local optimizations regardless of the global performance. Therefore, it is not sufficient to achieve global optimizations, such as network lifetime maximization or network-wide QoS provisioning. Multi-Agent Reinforcement Learning (MARL) entails that, in addition to learning information from the environment, each node exchanges local knowledge (i.e. state, Q-value, reward) and decisions (i.e. actions) with other nodes in the network in order to achieve global optimizations. This helps the nodes to consider not only their own performance, but also that of their neighbors and eventually others, in selecting the routing policy. Generally, this comes at the price of increased complexity, as the state is a joint state of all the learning agents, and the transitions are the result of the joint actions of all the agents in the system. Q-routing and Q-routing-inspired approaches like PQ-routing and DRQ-routing do use a form of MARL, as Q-values are exchanged between neighboring nodes. This form of MARL is soft, in that it is easy to implement and has low communication and computational complexity, as opposed to the general, more complex forms of MARL as in [ 124 , 425 ].

Team-partitioned opaque-transition reinforcement learning (TPOT-RL), proposed by Stone and Veloso for the RoboCup-1998 (Robot Soccer World Cup II) [ 425 ], is the first fully collaborative MARL technique to be applied to packet routing [ 424 ]. Routing was used by the authors as a proof-of-concept of the applicability of their algorithm to real world problems. However, in practice this algorithm has high computational complexity considering the very large number of states to be explored, and high communication overhead as every routed packet is acknowledged back by the sink along the path from the source for reward computation.

These early works paved the way to a decade of continued and prolific research in the area. While existing research studies predominantly consider routing as a decentralized operation function, and as such distribute the learning function across the routing nodes, works like [ 276 , 461 ] take a centralized and a partially decentralized approach, respectively. In the following we discuss representative works in the area and summarize them in Table  9 .

5.1 Routing as a decentralized operation function

RL, when applied in a fully distributed fashion, turns each routing node into a learning agent that makes local routing decisions based on information learned from the environment. Routing nodes can take their decisions either independently or through collaboration, in a multi-agent system fashion.

In [ 151 ], Forster et al. use a Q-learning approach in a multicast routing protocol called FROMS (Feedback Routing for Optimizing Multiple Sinks). The goal of FROMS is to route data efficiently, in terms of hop count, from one source to many mobile sinks in a WSN by finding the optimal shared tree. Like in [ 430 ], a FROMS node is a learning agent that runs Q-learning to incrementally learn the real costs of different possible routes. Its state is updated with every data packet that needs to be routed, and the set of actions is defined by the possible next-hop neighbors \(n_{i}\) and their routes to the sinks \(\left (hops^{n_{i}}_{D_{p}}\right)\) . Rewards are received back from the upstream nodes and used to update the Q-values of the corresponding actions. However, unlike [ 430 ], next-hop neighbors are selected using a variant of the ε-greedy algorithm, such that the routing algorithm alternates between an exploration phase and a greedy exploitation phase. FROMS shows up to 5 times higher delivery rates than the popular directed diffusion algorithm [ 205 ] in the presence of node failures, and 20% less network overhead per packet due to route aggregation.
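
The ε-greedy rule that FROMS varies can be sketched as follows. This is the textbook form, assuming cost-valued Q-values where lower is better; it is not the exact variant used in FROMS.

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """With probability epsilon pick a random next hop (exploration),
    otherwise pick the hop with the lowest estimated cost (exploitation).
    q_values maps each candidate next hop to its Q-value."""
    if random.random() < epsilon:
        return random.choice(list(q_values))
    return min(q_values, key=q_values.get)
```

Alternating between a high and a (near-)zero `epsilon` reproduces the exploration/exploitation phases described above.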

Arroyo-Valles et al. [ 24 ] propose Q-probabilistic routing (Q-PR), a localization-aware routing scheme for WSNs that applies Q-learning to achieve a trade-off between packet delivery rate, expected transmission count (ETX), and network lifetime. A node's decision whether to drop a packet or forward it to one of its neighbors is a function of the energy cost of transmission and reception, packet priority, and the ETX to the sink through the neighbor. A node greedily chooses, among its next-hop candidate neighbors, the one that minimizes the cost of the route to the sink, which is estimated by the Q-values of the nodes. It updates its Q-value every time it relays a packet, and broadcasts it so that it is received by its neighbors. Experimental evaluations are carried out through simulations with over 50 different topologies of connected networks.

Q-PR is compared to the greedy perimeter stateless routing algorithm (GPSR) and the Expected progress-Face-Expected progress (EFE) algorithm, both localization-aware routing algorithms. Results show that Q-PR and EFE outperform GPSR in terms of successful delivery rate (over 98% against 75.66%). Moreover, Q-PR shows a lower number of retransmission retries and acknowledgements (on average over 50% and 40% less than GPSR and EFE, respectively). Thus, the Q-PR algorithm better preserves the lifetime of the WSN (3× and 4× more than GPSR and EFE, respectively). However, the algorithm requires each node to locally maintain several pieces of information about each of its neighbors: the distance between the nodes, the neighbor's distance to the sink, the delivery probability between nodes, the neighbor's estimated residual energy, and the 2-hop neighbors. This hampers the scalability of the approach.

Hu and Fei [ 197 ] propose QELAR, a model-based variant of the Q-routing algorithm, to provide faster convergence, route cost reduction, and energy preservation in underwater WSNs. In QELAR, rewards account for both the packet transmission energy (incurred for forwarding the packet to the neighbor node) and the neighbor node's residual energy. Taking into account the residual energy helps achieve a balanced energy distribution among nodes by avoiding highly utilized routes (hotspots). A model representation for each packet is adopted such that the state is defined by which node currently holds the packet. Next-hop nodes are selected greedily based on their expected Q-values. The latter are maintained by the node along with the corresponding transition probabilities learned at runtime. Each time a node forwards a packet, it appends its Q-value along with its energy level.
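
A reward of this flavor, combining transmission cost and residual energy, can be illustrated as follows. The linear form and the weights are our simplification for illustration, not the exact formulation of [ 197 ].

```python
def energy_aware_reward(tx_energy, residual_energy, max_energy,
                        alpha=0.5, beta=0.5):
    """Illustrative QELAR-style reward: penalize the energy spent on
    transmission, and favor relays with high residual energy so that
    hotspots are avoided. alpha/beta weights are assumptions."""
    return -alpha * tx_energy + beta * (residual_energy / max_energy)
```

A relay that is cheap to reach but nearly depleted thus scores lower than a slightly costlier relay with a full battery.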

QELAR is evaluated and compared against the vector-based forwarding protocol (VBF) through simulations with 250 mobile sensor nodes uniformly deployed in a 3D space. Results show that QELAR is 25% more energy efficient than VBF. The lifetime of the network is 25%∼30% higher with QELAR in the presence of failures and network partitions compared with VBF with a comparable transmission range, which makes QELAR more robust to faults. Both protocols show comparable routing efficiency and delivery rates. Further research could be pursued to study the convergence speed of QELAR's model-based learning algorithm compared to model-free Q-learning when appropriate learning rates and discount factors are used.

In [ 277 ] Lin and Schaar address the problem of routing delay-sensitive applications in the more general context of multi-hop wireless ad hoc networks. They rely on an n-step temporal difference (TD) [ 433 ] learning method, and aim at reducing the frequency of message exchanges, and thus the communication overhead, without jeopardizing the convergence speed. The routing protocol is evaluated in a simulated multi-hop network with two sources transmitting videos to the same destination node. Results show that by reducing the frequency of message exchange by 95% (from every 1 ms to every 20 ms), the goodput and effective data rate are increased by over 40%, and the video quality, measured in terms of peak signal-to-noise ratio (PSNR), is increased by 10%. The convergence time seems to be only slightly affected (1∼2 s). This is an interesting finding considering the bandwidth that can be saved and the interference that can be avoided by spacing information exchanges.

Bhorkar et al. also address the problem of routing in multi-hop wireless ad hoc networks. They propose d-AdaptOR [ 59 ], a distributed adaptive opportunistic routing protocol which minimizes the average packet routing cost. d-AdaptOR is based on Q-learning with adaptive learning rate.

In opportunistic routing, instead of pre-selecting a specific relay node at each packet transmission as in traditional routing, a node broadcasts the data packet so that it is overheard by multiple neighbors. Neighbors that successfully acknowledge the packet form the set of candidate relays. The node then chooses among the candidate relays the one that will forward the packet toward the destination. This property gives the Q-learner the opportunity to receive up-to-date Q-values from the candidate relays, whereas traditionally, in Q-learning, action selection is based on older, previously received Q-values.

Routing in d-AdaptOR consists of four main steps: (1) the sender transmits the data packet; (2) neighbors acknowledge the packet, each sending its Q-value, i.e. the estimated cumulative cost-aware packet delivery reward; (3) the sender selects a routing action, either a next-hop relay or the termination of the packet transmission, based on the outcome of the previous step using an ε-greedy selection rule; (4) after the packet is transmitted, the sender updates its own Q-value at a learning rate that is specific to the selected next-hop relay. The learning rate is adjusted using a counter that keeps track of the number of packets received from that neighbor node. The higher the value of the counter, the higher the convergence rate, though at the expense of Q-value fluctuations. Indeed, the value of the counter also depends on the frequency of exploration. Further research could be pursued to investigate the optimal exploration-exploitation strategy and the effects of different strategies on the convergence rate.
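
The node-specific learning-rate schedule of step (4) can be sketched as follows. The 1/(1+n) decay and the reward structure are illustrative assumptions, not the exact schedule of d-AdaptOR.

```python
class AdaptiveQ:
    """Sketch of a Q-update with a per-relay decaying learning rate,
    in the spirit of d-AdaptOR."""
    def __init__(self):
        self.Q = {}       # Q[relay]: estimated delivery reward via relay
        self.count = {}   # packets routed through each relay

    def update(self, relay, reward, relay_q):
        # relay_q: the up-to-date Q-value received from the relay's ACK
        n = self.count.get(relay, 0)
        alpha = 1.0 / (1 + n)      # learning rate decays per relay
        old = self.Q.get(relay, 0.0)
        self.Q[relay] = old + alpha * (reward + relay_q - old)
        self.count[relay] = n + 1
```

Keeping a separate counter per relay lets frequently used relays settle to stable estimates while rarely used ones still adapt quickly.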

d-AdaptOR performance was investigated on the QualNet simulator using a random network benchmark consisting of 36 randomly placed wireless nodes. Simulations show that d-AdaptOR consistently outperforms existing adaptive routing algorithms, in terms of number of retransmissions per packet. Further study could be pursued to investigate the added value of node-specific learning rates in Q-value computation, compared to the traditional node-oblivious learning rate that is more efficient in terms of storage and computation.

Xia et al. [ 482 ] apply a spectrum-aware DRQ-routing approach in cognitive radio networks. In CRNs, the availability of a channel is dynamic, and depends on the activity level of the primary user (PU). The purpose of the routing scheme is to enable a node to select a next-hop neighbor node with a higher estimate of the total number of available channels up to the destination. Indeed, a higher number of available channels reduces channel contention, and hence the MAC-layer delay. However, relying on the total number of available channels along the path to the destination can lead to very poor results in practice. The DRQ-routing approach was tested through simulations on a tailored stationary (non-mobile) multi-hop network topology with 10 cognitive radio nodes and 2 PUs operating on different channels. DRQ-routing was also compared against spectrum-aware Q-routing and spectrum-aware shortest path routing (SP-routing) at different activity levels. Simulation results show that after convergence, DRQ-routing minimizes end-to-end delay, is faster to converge than Q-routing (50% faster at a lower activity level), and significantly reduces end-to-end delay compared to SP-routing at higher activity levels. However, although the nodes are not mobile and the topology is fixed, the convergence time at a 2 packets/s activity level is around 700 s, which implies that 1400 periods have elapsed before DRQ-routing converged. As the activity level reaches 2.75 packets/s, over 3000 periods are necessary for DRQ-routing to converge. These numbers are quite significant, but not surprising considering that a discount factor of 1 was used.

Elwhishi et al. [ 133 ] propose a Collaborative Reinforcement Learning (CRL)-based routing scheme for delay tolerant networks. CRL is an extension to RL introduced by Dowling et al. in 2004 for solving system-wide optimization problems in decentralized multi-agent systems with no global state [ 123 ], and was first applied to routing in MANETs by the same authors in [ 124 ]. Routing schemes for delay tolerant networks are characterized by the lack of end-to-end connectivity: each node explores network connectivity by finding a new link to a next-hop neighbor node when a new packet arrives, and the packet must be kept in the buffer while a link is formed. SAMPLE, the proposed routing mechanism, selects a reliable next-hop neighbor node while taking into account three factors: two relevant to channel availability (node mobility and congestion level), and one relevant to buffer utilization (remaining buffer space). These are learned through feedback exchange among agents. Tested with different network topologies and mobility models, SAMPLE shows better performance than the traditional AODV and DSR routing algorithms in terms of packet delivery ratio and throughput.

5.2 Routing as a partially decentralized operation function

In [ 461 ] Wang et al. present AdaR, a routing mechanism for WSNs based on a centralized implementation of the model-free Least Squares Policy Iteration (LSPI) RL technique [ 258 ]. AdaR uses an offline learning procedure, and is claimed to converge to the fixed point routing policy faster than the traditional Q-learning. The algorithm takes into account the node’s load, its residual energy, and hop count to the sink, as well as the reliability of the links. The algorithm runs in learning episodes. The base station is the learning agent, while the routing nodes are passive in terms of learning. However, actions are selected by the routing nodes in a decentralized fashion based on the Q-values assigned by the base station, and the ε -greedy selection algorithm. During each episode, the current Q-values are used to select a route to the base station. At each hop, the full hop information is appended to the packet and is used by the base station to calculate immediate rewards. When the base station has received enough information (the required number of packets is undefined), it calculates the new Q-values of the nodes offline, and disseminates them via a network-wide broadcast.

AdaR is tested on a simulated WSN with varying node residual energy and link reliability. Results show that the algorithm converges faster than Q-learning; a routing success rate of ∼95% with a low deviation was reached even before the 5th learning episode, whereas it took 40 episodes for Q-learning to reach comparable success rates. This can be explained by Q-learning's initial Q-values and the selected learning rate (α = 0.5); appropriate initial Q-values and a higher learning rate would have helped Q-learning converge faster. In fact, the authors show that Q-learning is more sensitive to the initial choice of Q-values than AdaR. AdaR indeed has some useful properties, such as taking into account different routing cost metrics and faster convergence. However, this comes at the price of higher computational complexity and communication overhead, due to the growing size of the packets at each hop and the broadcasting of Q-values, which also makes it more sensitive to link failures and node mobility.

5.3 Routing as a centralized control function

More recently, a centralized SARSA with a softmax policy selection algorithm has been applied by Lin et al. [ 276 ] to achieve QoS-aware adaptive routing (QAR) in SDN. Although a multi-layer hierarchical SDN control plane is considered by the authors, the proposed SARSA-based routing algorithm is not specific to such an architecture, and is meant to run on any controller that has global visibility of the different paths and links in the network.

For each new flow, the first packet is transmitted by the switch to the controller. The controller implicitly recognizes the QoS requirements of the flow, calculates the optimal route using the SARSA-based QAR algorithm, and accordingly updates the forwarding tables of the switches along the path. The QoS requirements consist of which metric to minimize/maximize (delay, loss, throughput, etc.), and are used to control the weight of each metric in the reward function.
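
The building blocks of such a scheme, an on-policy SARSA update combined with softmax (Boltzmann) action selection, can be sketched as follows. State/action encodings and parameter values are illustrative, not those of QAR.

```python
import math
import random

def softmax_action(q_values, temperature=1.0):
    """Boltzmann (softmax) policy: paths with higher Q-values are chosen
    more often, but every path keeps a nonzero selection probability."""
    weights = {a: math.exp(q / temperature) for a, q in q_values.items()}
    r = random.random() * sum(weights.values())
    acc = 0.0
    for action, w in weights.items():
        acc += w
        if r <= acc:
            return action
    return action  # numerical safety net

def sarsa_update(Q, s, a, reward, s2, a2, alpha=0.1, gamma=0.9):
    """On-policy SARSA: the target uses the action a2 actually selected
    (e.g. by the softmax policy) in the next state s2."""
    old = Q.get((s, a), 0.0)
    target = reward + gamma * Q.get((s2, a2), 0.0)
    Q[(s, a)] = old + alpha * (target - old)
```

Unlike Q-learning's max-based target, SARSA's target follows the softmax policy itself, so the learned route values reflect the exploration actually performed.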

It is suggested that the controller iterate the SARSA algorithm until convergence, which in practice results in delayed routing. This raises the questions of how long the delay is and how suitable the solution is for real-time traffic. The impact of routing new flows on the QoS of other flows in the network is also overlooked: if a new flow is an elephant flow, it may congest the links and severely impact the QoS of flows with tight delay requirements.

5.4 Summary

The low computational and communication requirements of traditional RL algorithms, in particular Q-learning, and their ability to perform well at finding an optimal solution and adapting to changes in the environment, have motivated their—reportedly successful—application to traffic routing in a variety of network settings, as shown in Table  9 .

Different approaches have been considered in applying RL to the traffic routing problem. These approaches vary in terms of: (i) the level of distribution of the learning capability, and (ii) the level of collaboration among multiple learners. Clearly, different approaches lend themselves more naturally to different network topologies and utility functions. For instance, in SDN [ 276 ] as well as WSN, the existence of a central node—the controller in SDN and the sink in WSN, respectively—allows for centralized learning. In contrast, routing in wireless ad hoc networks calls for decentralized RL [ 59 , 277 ], where the learning capability is distributed among the routing nodes.

For the nodes to select the optimal routing policy, they need to evaluate different routing policies (actions) against a given utility function (reward). Rewards can be calculated in a central node, such as a sink or base station like in AdaR [ 461 ]. Alternatively, rewards are locally estimated by the nodes, which requires the nodes to exchange information. The nature and the amount of information, as well as the dissemination process, vary according to the utility function, as shown in Table  9 . Indeed utility functions such as QoS provisioning, load balancing and network lifetime maximization, as in Q-PR [ 24 ], QELAR [ 197 , 277 ], require more information to be disseminated at the cost of an increased complexity and communication overhead.

It is also important to note that learners are very loosely coupled in most recently adopted decentralized RL approaches, where routers tend to select routing policies in an asynchronous, independent, very soft MARL fashion. Clearly, MARL aims at coordinating learning agents in order to achieve optimal network-wide performance, which should further enhance routing performance. However, several challenges arise from MARL: the difficulty of defining a good global learning goal, the overhead for an agent to coherently coordinate with other learning agents, and the longer convergence time can be prohibitive when applying MARL to realistic problem sizes. Hence, there is a need to understand the trade-off between benefits and overhead when applying MARL, particularly in resource-constrained and dynamic wireless networks where coordination potentially has much to offer.

6 Congestion control

Congestion control is fundamental to network operations and is responsible for throttling the number of packets entering the network. It ensures network stability, fairness in resource utilization, and acceptable packet loss ratio. Different network architectures deploy their own set of congestion control mechanisms. The most well-known congestion control mechanisms are those implemented in TCP, since TCP along with IP constitute the basis of the current Internet [ 13 ]. TCP congestion control mechanisms operate in the end-systems of the network to limit the packet sending rate when congestion is detected. Another well-known congestion control mechanism is queue management [ 72 ] that operates inside the intermediate nodes of the network (e.g. switches and routers) to complement TCP. There have been several improvements in congestion control mechanisms for the Internet and evolutionary network architectures, such as Delay-Tolerant Networks (DTN) and Named Data Networking (NDN). Despite these efforts, there are various shortcomings in areas such as packet loss classification, queue management, Congestion Window (CWND) update, and congestion inference.

This section describes several research works that demonstrate the potential of applying ML to enhance congestion control in different networks. The majority of these techniques have been applied to TCP/IP networks. It is important to note that the first ML-based approaches for congestion control were proposed in the context of asynchronous transfer mode (ATM) networks [ 175 , 264 , 284 , 437 ]. However, we exclude these works from the survey because, to the best of our knowledge, this type of network has a low impact on present and future networking research interests [ 177 ].

6.1 Packet loss classification

In theory, TCP works well regardless of the underlying transmission medium, such as wired, wireless, or optical. In practice, the standard TCP congestion control mechanism has been optimized for wired networks. A major problem in TCP is that it recognizes and handles all packet losses as network congestion, that is, buffer overflow. Hence, it performs unjustified congestion control when a loss is due to other reasons, such as packet reordering [ 150 ], fading and shadowing in wireless networks [ 130 ], or wavelength contention in optical networks [ 214 ]. As a consequence, TCP unnecessarily reduces its transmission rate at each detected packet loss, lowering the end-to-end throughput.

Therefore, the TCP throughput for wireless networks can be improved by accurately identifying the cause of packet loss [ 34 , 62 , 490 ] and reducing the TCP transmission rate only when congestion is detected. However, TCP congestion control has no mechanism for identifying the cause of packet loss. We term this problem packet loss classification, and various efforts have been made to solve it. In general, the solutions for packet loss classification fall into two broad categories, depending on where the solution is implemented in the network, that is, at intermediate nodes or in end-systems. The former requires additional implementation at the intermediate nodes that either hides the error losses from the sender [ 32 , 33 ], or communicates to the sender extra statistics about the network state, such as congestion notification [ 483 ] and burst acknowledgment (ACK) [ 490 ]. It is important to mention that hiding error losses may violate the TCP end-to-end principle, as it may require splitting the TCP connection by sending an ACK to the sender before the packet arrives at the receiver [ 129 ].

In the latter approach, end-systems are complemented with solutions such as TCP-Veno [ 156 ] and TCP-Westwood [ 463 ]. These leverage information available at end-systems, such as inter-arrival time (IAT), round-trip time (RTT), and one-way delay, to distinguish the causes of packet loss and aid the TCP congestion control mechanism. However, it has been shown that it is difficult to perform a good classification using simple tests on these metrics, such as the ones implemented by TCP-Veno and TCP-Westwood, since the metrics lack correlation to the cause of packet loss [ 60 ].

Therefore, various ML-based solutions have been proposed for packet loss classification in end-systems for different networks, such as hybrid wired-wireless [ 38 , 129 , 130 , 163 , 282 ], wired [ 150 ], and optical networks [ 214 ]. Generally, the classifier is trained offline, leveraging diverse supervised and unsupervised ML algorithms for binary classification. The majority of these techniques use the metrics readily available at end-systems, and evaluate their classifier on synthetic data on network simulators, such as ns-2 [ 203 ]. We delineate the proposed ML-based solutions for packet loss classification in Table  10 and discuss these techniques in this subsection.

Liu et al. [ 282 ] proposed, to the best of our knowledge, the first approach using ML for inferring the cause of packet loss in hybrid wired-wireless networks. Particularly, they distinguish between losses due to congestion and losses due to errors in wireless transmission. They employ EM to train a 4-state HMM based on loss-pair RTT values, that is, the RTT measured before a packet loss. The Viterbi algorithm [ 455 ] is applied on the trained HMM to infer the cause of packet loss. The resulting ML-based packet loss classifier exhibits greater flexibility and superiority over TCP-Vegas [ 73 ]. Since TCP-Vegas has been shown to outperform non-ML-based packet loss classifiers [ 60 ], the ML-based solution of [ 282 ] was fundamental in creating a niche and establishing the feasibility of ML-based solutions for packet loss classification problems. However, the authors assume that the RTT values never change during measurement. This is an unrealistic assumption, since a modification in the return path changes the RTT values without affecting the cause of packet loss, thus weakening the correlation between RTT and the cause of packet loss.
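
The decoding step of this approach, i.e. running Viterbi on the trained HMM, can be sketched as follows. The two-state model in the test and the parameter values are illustrative, not the 4-state model of [ 282 ], and all probabilities are assumed nonzero.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Standard Viterbi decoding in log space: returns the most likely
    hidden-state sequence (e.g. congestion vs. wireless-error states)
    for an observed sequence (e.g. discretized loss-pair RTTs)."""
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]])
          for s in states}]
    path = {s: [s] for s in states}
    for o in obs[1:]:
        V.append({})
        new_path = {}
        for s in states:
            # Best predecessor for state s given observation o
            prob, prev = max(
                (V[-2][p] + math.log(trans_p[p][s]) + math.log(emit_p[s][o]), p)
                for p in states)
            V[-1][s] = prob
            new_path[s] = path[prev] + [s]
        path = new_path
    best = max(states, key=lambda s: V[-1][s])
    return path[best]
```

The loss cause is then read off the hidden state decoded at the time of the loss.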

Barman and Matta [ 38 ] use EM on a 2-state HMM and consider discrete delay values to improve the accuracy of the above packet loss classifier, though at the expense of a higher computational cost. This work substitutes the Viterbi algorithm with a Bayesian binary test that provides comparable accuracy while being computationally efficient. However, this ML-based packet loss classifier, unlike others, requires support from the network to obtain one of its input features, the estimated probability of wireless loss. Furthermore, [ 38 , 282 ] evaluate their packet loss classifiers on simple linear topologies, which are far from realistic network topologies.
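
At its core, a Bayesian binary test of this kind is a maximum a posteriori decision between the two loss causes; a minimal sketch, with illustrative names and numbers:

```python
def bayes_classify(obs_likelihoods, priors):
    """Bayesian binary test sketch: pick the loss cause with the larger
    posterior score (likelihood x prior). obs_likelihoods maps each
    candidate cause to P(observation | cause)."""
    scores = {cause: obs_likelihoods[cause] * priors[cause]
              for cause in priors}
    return max(scores, key=scores.get)
```

This avoids decoding an entire state sequence as Viterbi does, which is where the computational saving comes from.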

In contrast, El Khayat et al. [ 129 , 130 , 163 ] simulate more than one thousand random hybrid wired-wireless topologies to collect a dataset of congestion and wireless error losses. The authors compute 40 input features from this dataset using information that is only available at end-systems, including the one-way delay and IAT of packets preceding and succeeding a packet loss. Several supervised ML algorithms are leveraged to build packet loss classifiers using these features. All the classifiers achieve much higher classification accuracy than non-ML solutions, such as TCP-Veno and TCP-Westwood. In particular, Boosting DT with 25 trees provides the highest accuracy and the second fastest training time. It is important to note that DT has the fastest training time, with a reduction in accuracy of less than 4% compared to Boosting DT. Therefore, under computational constraints, DT achieves the best balance between accuracy and training time.

The authors go on to improve TCP with the Boosting DT classifier, which exhibits throughput gains over the standard TCP-NewReno [ 185 ] and TCP-Veno. The results also show that the improved TCP can maintain a fair link share with legacy protocols (i.e. it is TCP-friendly). Their ML-based packet loss classifier is flexible and enables the selection between TCP throughput gain and fairness without retraining the classifier.

On the other hand, Fonseca and Crovella [ 150 ] focus on detecting the presence of packet loss by differentiating duplicated ACKs (DUPACK) caused by congestion losses from those caused by reordering events. Similar to [ 282 ], they employ loss-pair RTT as an input feature, however, to infer the network state and not the state of a single TCP connection, thus avoiding the poor correlation between RTT and the cause of packet loss. The authors construct a Bayesian packet loss classifier that achieves up to 90% detection probability with a false alarm rate of 20% on real wired network datasets from Boston University (BU) and Passive Measurement and Analysis (PMA) [ 1 ]. The performance is superior for the BU dataset due to the poor quality of RTT measurements in the PMA dataset. In addition, the authors adapt an analytic Markov model to evaluate a TCP variant enhanced with the Bayesian packet loss classifier, resulting in a throughput improvement of up to 25% over the standard TCP-Reno.

In the context of optical networks, Jayaraj et al. [ 214 ] tackle the classification of congestion losses and contention losses in Optical Burst Switching (OBS) networks. The authors collect data by simulating the National Science Foundation Network (NSFNET) with OBS modules and derive a new feature from the observed losses, called the number of bursts between failures (NBBF). They construct two ML-based packet loss classifiers by applying EM, once for an HMM and once for clustering. These classifiers are integrated into two TCP variants that keep a low control overhead while providing better performance (e.g. higher throughput and fewer timeouts) than the standard TCP-NewReno [ 185 ] and TCP-SACK [ 299 ], and Burst-TCP [ 490 ] for OBS networks. The TCP variant using EM for clustering performs slightly better than the one using EM for the HMM, as the former produces states (clusters) with a higher degree of similarity, while requiring a similar training time.

6.2 Queue management

Queue management is a mechanism in the intermediate nodes of the network that complements TCP congestion control mechanisms. Specifically, queue management is in charge of dropping packets when appropriate, to control the queue length in the intermediate nodes [ 72 ]. The conventional technique for queue management is Drop-tail, which adopts the First-In-First-Out (FIFO) scheme to handle packets entering a queue. In Drop-tail, each queue establishes a maximum length for accepting incoming packets. When the queue becomes full, subsequent incoming packets are dropped until queue space becomes available again. However, the combination of Drop-tail with the TCP congestion avoidance mechanism leads to TCP synchronization, which may cause serious problems [ 68 , 72 ]: (i) inefficient link utilization and excessive packet loss due to a simultaneous decrease in TCP rates; (ii) unacceptable queuing delay due to a continuously full queue; and (iii) TCP unfairness due to a few connections monopolizing the queue space (i.e. the lock-out phenomenon).
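
The Drop-tail mechanism itself reduces to a bounded FIFO; a minimal sketch:

```python
from collections import deque

class DropTailQueue:
    """FIFO queue with a hard capacity: arrivals are dropped (tail drop)
    once the queue is full."""
    def __init__(self, capacity):
        self.q = deque()
        self.capacity = capacity

    def enqueue(self, pkt):
        if len(self.q) >= self.capacity:
            return False   # tail drop: packet discarded
        self.q.append(pkt)
        return True

    def dequeue(self):
        return self.q.popleft() if self.q else None
```

Because every drop happens only at the moment the queue is already full, all sharing TCP flows tend to observe loss simultaneously, which is the root of the synchronization problems listed above.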

Active Queue Management (AQM) is a proactive approach that mitigates the limitations of Drop-tail by dropping packets (or marking them for drop) before a queue becomes full [ 72 ]. This allows end-systems to respond to congestion before the queue overflows and intermediate nodes to manage packet drops. Random Early Detection (RED) [ 148 ] is the earliest and most well known AQM scheme. RED continually adjusts a dropping (marking) probability according to a predicted congestion level. This congestion level is based on a pre-defined threshold and a computed average queue length. However, RED suffers from poor responsiveness, fails to stabilize the queue length to a target value, and its performance (w.r.t. link utilization and packet drop) greatly depends on its parameter tuning, which has not been successfully addressed [ 269 ]. Many AQM schemes have been proposed to improve these shortcomings [ 4 ]. However, they rely on fixed parameters that are insensitive to the time-varying and nonlinear network conditions.
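
The core of RED, an EWMA of the queue length driving a linear dropping probability, can be sketched as follows. Parameter values are illustrative, and the count-based probability refinement of full RED is omitted.

```python
class RED:
    """Minimal RED sketch: an exponentially weighted moving average of
    the queue length drives a dropping probability that rises linearly
    between min_th and max_th."""
    def __init__(self, min_th=5, max_th=15, max_p=0.1, w=0.002):
        self.min_th, self.max_th, self.max_p, self.w = min_th, max_th, max_p, w
        self.avg = 0.0

    def drop_probability(self, queue_len):
        # EWMA smooths transient bursts out of the congestion estimate
        self.avg = (1 - self.w) * self.avg + self.w * queue_len
        if self.avg < self.min_th:
            return 0.0
        if self.avg >= self.max_th:
            return 1.0
        return self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)
```

The fixed `min_th`, `max_th`, `max_p`, and `w` are exactly the parameters whose tuning RED's critics target, and which the ML-based AQM schemes below try to adapt automatically.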

For this reason, significant research has been conducted to apply ML for building an effective and reliable AQM scheme, capable of intelligently managing the queue length and tuning its parameters based on network and traffic conditions. The proposals presented in this survey conduct online training in the intermediate nodes of the network and evaluate their solutions by simulating diverse network topologies, mostly in ns-2, using characteristics of wired networks. As highlighted in Table  11 , these AQM schemes apply different supervised techniques for time series forecasting (TSF) [ 160 , 179 , 212 , 498 ] and reinforcement-based methods for deducing the increment in the packet drop probability [ 298 , 427 , 428 , 485 , 499 ]. It is important to note that in this section we use the term increment to refer to a small positive or negative change in the value of the packet drop probability. The accuracy results depict the quality of the ML technique, for either correctly predicting future time series values or stabilizing the queue length. In addition, the computational complexity of these AQM schemes depends on the learning algorithm employed and the elements that constitute the ML component, for example, the NN structure and its complementing components. In the following, we discuss these ML-based AQM schemes.

PAQM [160] is, to the best of our knowledge, the first approach using ML to improve AQM. Specifically, PAQM used OLS on time series of traffic samples (in bytes) to predict future traffic volume. Based on such predictions, PAQM dynamically adjusted the packet dropping probability. The proposed OLS method relies on the normalized least mean square (NLMS) algorithm to calculate the linear minimum mean square error (LMMSE). Through simulations, the authors demonstrated that their linear predictor achieves good accuracy, enabling PAQM to enhance the stability of the queue length when compared to RED-based schemes. PAQM is therefore capable of providing high link utilization while incurring low packet loss. Similarly, APACE [212] configures the packet dropping probability by using a similar NLMS-based OLS on time series of queue lengths to predict the current queue length. Simulations show that APACE is comparable to PAQM in terms of prediction accuracy and queue stability, while providing better link utilization with lower packet loss and delay under multiple bottleneck links. However, these NLMS-based predictors have a high computational overhead that is unjustified in comparison to a simpler predictor based on, for instance, a low-pass filter.
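The NLMS recursion behind these predictors can be sketched as a one-step-ahead linear predictor over a sliding window; the filter order and step size `mu` below are illustrative choices, not the values used in the papers.

```python
def nlms_series_predictor(series, order=4, mu=0.5, eps=1e-8):
    """Run an NLMS one-step-ahead linear predictor over a time series.

    Returns the sequence of predictions and the final filter weights.
    """
    w = [0.0] * order
    preds = []
    for t in range(order, len(series)):
        x = series[t - order:t]                       # most recent samples
        y_hat = sum(wi * xi for wi, xi in zip(w, x))  # linear prediction
        e = series[t] - y_hat                         # prediction error
        power = sum(xi * xi for xi in x) + eps        # input energy
        # LMS step normalized by input power (hence "normalized" LMS)
        w = [wi + mu * e * xi / power for wi, xi in zip(w, x)]
        preds.append(y_hat)
    return preds, w
```

The normalization by input power is what distinguishes NLMS from plain LMS (used by NN-RED below): it makes convergence less sensitive to the scale of the traffic samples, at the cost of extra per-step arithmetic.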

To address these shortcomings, α_SNFAQM [498] was proposed to predict future traffic volume by applying the BP algorithm to train a neuro-fuzzy hybrid model combining NN and fuzzy logic, called α_SNF. This α_SNF predictor uses time series of traffic samples and the predicted traffic volume as features. α_SNFAQM then leverages the predicted traffic volume and the instantaneous queue length to classify the network congestion as either severe or light. On this basis, α_SNFAQM decides to drop all packets, drop packets with a certain probability, or drop none. Simulations demonstrate that the α_SNF predictor slightly exceeds the accuracy of the NLMS-based predictor and incurs lower computational overhead. Furthermore, α_SNFAQM achieves a smaller and more stable queue length than PAQM and APACE, while providing comparable goodput. However, α_SNFAQM produces more packet drops in order to signal congestion earlier.

Similarly, to keep a low computational overhead, NN-RED [179] applies an SLP-NN on time series of queue length to predict a future queue length. The predicted queue length is compared to a threshold to decide whether packet dropping is needed to prevent severe congestion. The SLP-NN is trained using the least mean square (LMS) algorithm (a.k.a. the delta rule), which is marginally less complex than NLMS. Basic simulations show that NN-RED outperforms RED and Drop-tail in terms of queuing delay, dropped packets, and queue stability. However, this work lacks a comparison of NN-RED with similar approaches, such as PAQM, APACE, and α_SNFAQM, in terms of performance and computational overhead.

On the other hand, DEEP BLUE [298] focuses on addressing the limitations of BLUE [144], an AQM scheme proposed to improve RED. BLUE suffers from inaccurate parameter setting and is highly dependent on its parameters. DEEP BLUE addresses these problems by introducing a fuzzy Q-learning (FQL) approach that learns to select the appropriate increment (action) for achieving the optimal packet drop probability. The features for inferring the FQL states are the current queue length and packet drop probability, while the reward signal is a linear combination of the throughput and queuing delay. The authors use the OPNET simulator to show that DEEP BLUE improves on BLUE in terms of queue stabilization and dropping policy. In addition, the authors mention that DEEP BLUE generates only a slight surplus of storage and computational overhead over BLUE, though no evaluation results are reported.
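As an illustration of the reinforcement idea behind DEEP BLUE (omitting its fuzzy state inference), a plain tabular Q-learning loop for selecting the drop-probability increment could look as follows; the action set, reward shaping, and hyperparameters are hypothetical simplifications.

```python
import random

ACTIONS = [-0.01, 0.0, 0.01]          # decrease, keep, increase drop prob.

def choose_action(Q, state, epsilon=0.1):
    """ε-greedy selection over the increment actions."""
    if random.random() < epsilon:
        return random.randrange(len(ACTIONS))   # explore
    q_row = Q.setdefault(state, [0.0] * len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda a: q_row[a])  # exploit

def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard one-step Q-learning update of the state-action value."""
    q_row = Q.setdefault(state, [0.0] * len(ACTIONS))
    next_row = Q.setdefault(next_state, [0.0] * len(ACTIONS))
    q_row[action] += alpha * (reward + gamma * max(next_row) - q_row[action])
```

In DEEP BLUE the state would be inferred by fuzzy membership over queue length and current drop probability, and the reward would combine throughput and queuing delay as described above.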

Other NN-based AQM schemes adopt an RL approach for deciding the proper increment of the packet drop probability. Neuron PID [428] uses a Proportional-Integral-Derivative (PID) controller that incorporates an SLP-NN to tune the controller parameters. Specifically, the SLP-NN receives three terms from the PID component and updates the corresponding weights by applying associative Hebbian learning. The three terms of this PID-based SLP-NN (PIDNN) are computed from the queue length error, i.e. the difference between the target and current queue lengths, which serves as the reward signal of the PID control loop. Note that the PID component includes a transfer function that increases the computational overhead of the PIDNN compared to a simple SLP-NN.

AN-AQM [427] extends Neuron PID by including a second PID component. The SLP-NN of AN-AQM therefore receives three more terms for updating its weights, generated from the sending rate error, i.e. the mismatch between the bottleneck link capacity and the queue input rate, which serves as the reward signal for this PID control loop. This modification improves the performance of the PIDNN in more realistic scenarios. However, it incurs a higher computational overhead, due to an additional PID transfer function and the increased number of input neurons. Similarly, FAPIDNN [485] adopts a fuzzy controller to dynamically tune the learning rate of a PIDNN. As in Neuron PID, FAPIDNN includes only one PID component, which calculates the three terms from the queue length error. However, the fuzzy controller of FAPIDNN also adds computational complexity. Alternatively, NRL [499] directly uses an SLP-NN, without a PID or fuzzy component, that relies on a reward function to update the learning parameters. This reward function is computed from the queue length error and the sending rate error.
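To clarify what these PIDNN schemes feed into their neural component, the sketch below computes the three PID terms from the queue-length error; the weights combining them stand in for what the SLP-NN would learn, and all values are illustrative rather than taken from the papers.

```python
def pid_terms(error, prev_error, integral, dt=1.0):
    """Return the (P, I, D) terms for a queue-length error sample."""
    integral += error * dt             # accumulate error over time
    p = error                          # proportional: current error
    i = integral                       # integral: accumulated error
    d = (error - prev_error) / dt      # derivative: error trend
    return (p, i, d), integral

def drop_increment(terms, weights):
    """Weighted combination of the PID terms into a drop-probability
    increment; in Neuron PID/AN-AQM the weights are learned online."""
    return sum(w * t for w, t in zip(weights, terms))
```

AN-AQM would compute a second such triple from the sending-rate error, doubling the number of inputs to the SLP-NN.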

Li et al. [269] carry out extensive simulations in ns-2 to perform a comparative evaluation of the above NN-based AQM schemes (i.e. Neuron PID, AN-AQM, FAPIDNN, and NRL) and AQM schemes based on RED and Proportional-Integral (PI) controllers. For a single bottleneck link, the results demonstrate that the NN-based schemes outperform the RED/PI schemes in terms of queue length accuracy (QL_Acc) and queue length jitter (QL_Jit) under different network settings, where QL_Acc is the difference between the average queue length and a target value, and QL_Jit is the standard deviation of the average queue length. However, the NN-based schemes produce a higher packet drop than the PI schemes. For multiple bottleneck links, one of the PI schemes (i.e. IAPI [429]) presents better QL_Acc and packet drop, at the cost of a higher QL_Jit. Comparing only the NN-based schemes, FAPIDNN provides the best QL_Acc, while Neuron PID has the least QL_Jit and packet drop. Nevertheless, AN-AQM is superior in these performance metrics for realistic scenarios involving UDP traffic noise.

6.3 Congestion window update

CWND is one of the TCP per-connection state variables; it limits the amount of data a sender can transmit before receiving an ACK. The other state variable is the Receiver Window (RWND), a limit advertised by the receiver to communicate the amount of data it can accept. TCP congestion control mechanisms use the minimum of these two state variables to manage the amount of data injected into the network [13]. However, TCP was designed under specific network assumptions and attributes all losses to congestion (cf. Section 6.1). As a result, over lossy wireless links TCP unnecessarily lowers its rate by reducing CWND at each packet loss, negatively affecting end-to-end performance. Furthermore, the CWND update mechanism of TCP is not suitable for the diverse characteristics of different network technologies [30, 122]. For example, networks with a high Bandwidth-Delay Product (BDP), such as satellite networks, require a more aggressive CWND increase, whereas networks with a low BDP, such as Wireless Ad hoc Networks (WANETs), call for a more conservative one.

Properly updating CWND in resource-constrained wireless networks, such as WANETs and IoT, is particularly difficult due to their limited bandwidth, processing, and battery power, and their dynamic network conditions [271, 380]. In fact, the deterministic nature of TCP is more prone to causing higher contention losses and CWND synchronization problems in WANETs, because node mobility continuously modifies the wireless multi-hop paths [29, 379]. Several TCP variants, such as TCP-Vegas and TCP-Westwood, have been proposed to overcome these shortcomings. However, the fixed-rule strategies used by such TCP variants are inadequate for adapting CWND to a rapidly changing wireless environment. For example, TCP-Vegas fails to fully utilize the available bandwidth in WANETs, as its RTT-based rate estimate is incorrect under unstable network conditions [219]. Furthermore, methods for improving TCP-Vegas (e.g. Vegas-W [119]) are still insufficient to account for such variability, as their operation relies on past network conditions rather than present or future ones.

As summarized in Table  12 , this survey reviews several approaches based on RL that have been proposed to cope with the problems of properly updating CWND (or sending rate) according to the network conditions. Some of these approaches are particularly designed for resource-constrained networks, including WANETs [ 29 , 219 , 379 , 380 ] and IoT [ 271 ], while others address a wider range of network architectures [ 30 , 122 , 477 ], such as satellite, cellular, and data center networks. Unless otherwise stated, the RL component conducts online training in the end-systems of the network to decide the increment for updating CWND. Although some approaches may apply the same RL technique, they differ in either the defined action-set (i.e. finite or continuous) or the utilization of a function approximation.

The evaluation of these RL-based approaches relies on synthetic data generated from multiple network topologies simulated in tools such as GloMoSim, ns-2, and ns-3. A couple of these approaches [29, 122] also include an experimental evaluation. The performance results show the improvement ratio of each RL-based approach against the best TCP implementation baseline. For example, if an approach is compared to TCP-Reno and TCP-NewReno, we present the improvement over the latter, as it is an enhancement of the former. Note that an optimal CWND update reduces the number of packets lost and the delay, and increases the throughput and fairness. Therefore, the selected improvement metrics make it possible to assess the quality of the RL component in deciding the best set of actions to update CWND.

To the best of our knowledge, TCP-FALA [380] is the first RL-based TCP variant that focuses on CWND adaptation in wireless networks, particularly in WANETs. TCP-FALA introduces a CWND update mechanism that applies FALA to learn the congestion state of the network. On receipt of a packet, it computes five states using IATs of ACKs, computing the states differently for DUPACKs. Each state corresponds to a single action that defines the increment for updating CWND. The probabilities of the possible actions are continually updated and used by TCP-FALA to stochastically select the action to be executed. Such stochastic decisions facilitate adaptation to changing network conditions and prevent the CWND synchronization problem. Simulations in GloMoSim demonstrate that TCP-FALA experiences lower packet loss and higher throughput than standard TCP-Reno under different network conditions. However, the limited size of the action-set makes it difficult to map the range of responses provided by the network to the appropriate actions. In addition, WANETs require a much finer update of the CWND due to their constrained bandwidth.
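The core FALA probability update can be illustrated with the classic linear reward-inaction scheme; the learning rate and the assumption of a normalized reward in [0, 1] are simplifications for exposition, not details taken from TCP-FALA itself.

```python
def fala_update(probs, chosen, reward, lr=0.1):
    """Linear reward-inaction update for a finite-action learning automaton.

    probs  : current action-probability vector (sums to 1)
    chosen : index of the action that was executed
    reward : normalized feedback in [0, 1] from the environment
    """
    # shrink every action's probability in proportion to the reward...
    new = [p - lr * reward * p for p in probs]
    # ...and move the freed mass onto the action that was rewarded
    new[chosen] = probs[chosen] + lr * reward * (1 - probs[chosen])
    return new
```

Because the update is stochastic and gradual, different senders sharing a bottleneck are unlikely to converge to identical CWND actions at the same instant, which is the intuition behind avoiding CWND synchronization.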

To overcome this limitation, Learning-TCP [29, 379] extends TCP-FALA by employing CALA to enable a finer and more flexible CWND update. Instead of separately calculating probabilities for each action, Learning-TCP continually updates an action probability distribution, which follows a normal distribution and requires less time to compute. Like TCP-FALA, Learning-TCP uses IATs of ACKs for computing the states, though without distinguishing DUPACKs, and it reduces the number of states to two. Several simulations in ns-2 and GloMoSim show that both Learning-TCP and TCP-FALA outperform standard TCP-NewReno in terms of packet loss, goodput, and fairness. Furthermore, the simulations demonstrate that Learning-TCP is superior to TCP-FALA and TCP-FeW [329] (a non-ML TCP variant enhanced for WANETs) with respect to these performance metrics, whereas TCP-FALA only achieves better fairness than TCP-FeW. The authors also provide experimental results that are consistent with the simulations.

TCP-GVegas [219] also focuses on updating CWND in WANETs. It improves TCP-Vegas by combining a grey model and a Q-learning model. The grey model predicts the real throughput of the next stage, while the Q-learning model adapts CWND to network changes. This Q-learning model uses the three stages of CWND changes (defined by TCP-Vegas) as the states and the throughput as the reward. The state is determined from CWND, RTT, and the actual and predicted throughput. The action-set is continuous and limited to a range computed from RTT, throughput, and a pre-defined span factor. TCP-GVegas adopts an ε-greedy strategy for selecting the optimal action that maximizes the quality of the state-action pair. Simulations in ns-2 reveal that TCP-GVegas outperforms TCP-NewReno and TCP-Vegas in terms of throughput and delay for different wireless topologies and varying network conditions. However, TCP-GVegas has a higher computational and storage overhead than standard TCP and TCP-Vegas, and even than TCP-FALA and Learning-TCP, though a more thorough performance evaluation is required to determine the trade-off between these RL-based TCP variants.

In the similar context of resource-constrained networks, TCPLearning [271] applies Q-learning to update CWND in IoT networks. Its Q-learning model computes the states by using a 10-interval discretization of each of the following four features: IAT of ACKs, IAT of packets sent, RTT, and Slow Start Threshold (SSThresh). TCPLearning defines a finite action-set that provides five increments for updating CWND and a selection strategy based on ε-greedy. The reward for each state-action pair is calculated from the throughput and RTT. To cope with the memory restrictions of IoT devices, the authors use two function approximation methods: tile coding [435] and Fuzzy Kanerva (FK) [481]. The latter significantly reduces the memory requirements and is hence incorporated in a modification of TCPLearning, called FK-TCPLearning. Specifically, FK-TCPLearning with a set of 100 prototypes needs only 1.2% (2.4 KB) of the memory used by TCPLearning based on pure Q-learning (200 KB) for storing 50,000 state-action pairs. Furthermore, basic simulations in ns-3 reveal that FK-TCPLearning improves the throughput and delay of TCP-NewReno, while being marginally inferior to TCPLearning.
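The 10-interval state discretization used by TCPLearning can be sketched as follows; the feature ranges are hypothetical normalizations, and packing the per-feature bins into one composite index is just one simple way to form a discrete state.

```python
def discretize(value, lo, hi, bins=10):
    """Map a feature value onto one of `bins` equal-width intervals."""
    if value <= lo:
        return 0
    if value >= hi:
        return bins - 1
    return int((value - lo) / (hi - lo) * bins)

def state_id(features, ranges, bins=10):
    """Combine per-feature bins into a single discrete state index.

    features : e.g. (ack_iat, send_iat, rtt, ssthresh)
    ranges   : per-feature (lo, hi) normalization bounds (assumed known)
    """
    state = 0
    for value, (lo, hi) in zip(features, ranges):
        state = state * bins + discretize(value, lo, hi, bins)
    return state
```

With four features and ten bins each, the raw state space has 10^4 entries, which illustrates why function approximation (tile coding or Fuzzy Kanerva) matters on memory-constrained IoT devices.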

The approaches above are specifically designed for resource-constrained networks, which restricts their applicability. For example, TCP-FALA and Learning-TCP estimate the congestion state from IATs of ACKs, which are prone to fluctuations in single-hop wireless networks with high and moderate BDP, such as satellite networks, cellular networks, and WLANs. UL-TCP [30] addresses this gap by modifying Learning-TCP to compute the two congestion states from three different network features: RTT, throughput, and CWND at retransmission timeout (RTO). Simulations of single-hop wireless networks in ns-2 show that UL-TCP achieves significantly better packet loss and fairness than TCP-NewReno and TCP-ATL [10] (a non-ML TCP variant designed for single-hop wireless networks). However, UL-TCP is slightly inferior to TCP-ATL in terms of goodput. Note that, unlike UL-TCP, TCP-ATL requires additional implementation in the intermediate nodes of the network. For multi-hop wireless networks (i.e. WANETs), UL-TCP is compared to TCP-NewReno and TCP-FALA, and exhibits results similar to Learning-TCP. However, UL-TCP is slightly more complex than Learning-TCP due to the storage and use of more parameters for computing the states.

Remy [ 477 ] and PCC [ 122 ] went further by introducing congestion control mechanisms that learn to operate in multiple network architectures. Remy designed an RL-based algorithm that is trained offline under many simulated network samples. It aims to find the best rule map (i.e. RemyCC) between the network state and the CWND updating actions to optimize a specific objective function. The simulated samples are constructed based on network assumptions (e.g. number of senders, link speeds, traffic model) given at design time—along with the objective function—as prior knowledge to Remy. The generated RemyCC is deployed in the target network without further learning to update CWND according to the current network state and the rule map. Several tests on simulated network topologies, such as cellular and data center networks, reveal that most of the generated RemyCCs provide a better balance between throughput and delay, in comparison to the standard TCP and its many enhanced variants, including TCP-Vegas, TCP-Cubic [ 174 ], and TCP-Cubic over Stochastic Fair Queuing [ 304 ] with Controlled-Delay AQM [ 340 ] (SFQ-CD). However, if the target network violates the prior assumptions or if the simulated samples incompletely consider the parameters of the target network, the performance of the trained RemyCC may degrade.

To tackle this uncertainty, PCC [ 122 ] avoids network assumptions and proposes an online RL-based algorithm that continually selects the increment for updating the sending rate, instead of CWND, based on a utility function. This utility function aggregates performance results (i.e. throughput, delay, and loss rate) observed for new sending rates during short periods of time. The authors emulate various network topologies, such as satellite and data center networks, on experimental testbeds for evaluating their proposal. The results demonstrate that PCC outperforms standard TCP and other variants specially designed for particular networks, such as TCP-Hybla [ 83 ] for satellite networks and TCP-SABUL [ 172 ] for inter-data center networks.

6.4 Congestion inference

Network protocols adapt their operation based on estimated network parameters that allow them to infer the congestion state. For example, some multicast and multipath protocols rely on predictions of TCP throughput to adjust their behavior [238, 316], and TCP computes the retransmission timeout based on RTT estimations [22]. However, the conventional mechanisms for estimating these network parameters remain inaccurate, primarily because the relationships between the various parameters are not clearly understood. This is the case for analytic and history-based models for predicting TCP throughput, and for the Exponential Weighted Moving Average (EWMA) algorithm used by TCP to estimate RTT.
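For reference, TCP's EWMA-based RTT estimation follows the standard SRTT/RTTVAR recurrences specified in RFC 6298, with the usual recommended constants alpha = 1/8 and beta = 1/4; a compact sketch:

```python
def rtt_update(srtt, rttvar, sample, alpha=1/8, beta=1/4):
    """Update smoothed RTT and RTT variance from a new RTT sample."""
    if srtt is None:                     # first measurement
        return sample, sample / 2
    rttvar = (1 - beta) * rttvar + beta * abs(srtt - sample)
    srtt = (1 - alpha) * srtt + alpha * sample
    return srtt, rttvar

def rto(srtt, rttvar, g=0.0, min_rto=1.0):
    """Retransmission timeout: smoothed RTT plus a variance margin,
    clamped to a minimum (g is the clock granularity)."""
    return max(min_rto, srtt + max(g, 4 * rttvar))
```

The fixed exponential weights are exactly what the ML-based RTT predictors discussed below try to improve upon: a single smoothing constant cannot track both stable and abruptly changing paths.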

For the aforementioned reasons, several ML-based approaches have addressed the limitations of congestion inference in various network architectures by estimating different network parameters: throughput [238, 316, 371], RTT [22, 128], and mobility [309] in TCP-based networks, the rate of table entries in NDNs [230], and the congestion level in DTNs [412]. As depicted in Table 13, the majority of these proposals apply diverse supervised learning techniques, mostly for prediction, while the one focused on DTNs uses Q-learning to build a congestion control mechanism. The location of the final solution and the training type (i.e. online or offline) differ across the proposals, as do the dataset and tools used for evaluation. Similarly, the accuracy column shows a variety of metrics, mainly due to the lack of consistency among authors in evaluating the quality of their ML-based components for correctly predicting a specific parameter.

El Khayat et al. [238] apply multiple supervised learning techniques for predicting TCP throughput in a wired network. Among the different features used in the learning phase for building the ML models, the authors find that the Timeout Loss Rate (TLR) adds a significant improvement in prediction accuracy. This is because TLR helps to discriminate between two types of losses: triple duplicates and timeouts. The ML models are trained and tested using synthetic data collected from ns-2 simulations. The MLP-NN achieves the lowest accuracy error, followed by MART with 25 trees; both methods are recommended by the authors when learning from the full feature set. In addition, the authors demonstrate that all their ML-based predictors are more TCP-friendly than conventional analytic throughput models, like SQRT [300] and PFTK [343], which are generally used by multicast and real-time protocols.

Mirza et al. [316] also focus on throughput prediction, but for multi-path wired networks, such as multi-homed and wide-area overlay networks. The authors train and test a supervised time series regression model using SVR on synthetic data collected from two distinct testbeds: the authors' laboratory deployment and the Resilient Overlay Networks (RON) project [332]. They also include a confidence interval estimator that triggers retraining if the predicted throughput falls outside the interval. The results reveal that the SVR model yields more predictions within a relative prediction error (RPE) of 10% than a simple history-based predictor. Moreover, the evaluation shows that using active measurement tools to compute the model features provides predictions as accurate as relying on ideal passive measurements. This is important because it is difficult to correctly collect passive measurements on real wide-area paths.

Another approach for predicting TCP throughput is proposed by Quer et al. [371]. Their ML solution resides in the access point of a WLAN, instead of the end-systems as in the above approaches. The authors apply a BN for constructing a DAG that captures the probabilistic structure among the multiple features that allow predicting the throughput. A simplified probabilistic model is derived from the constructed DAG by using a subset of the features for inference. The training and testing of the BN model rely on synthetic data collected from ns-3 simulations. The results demonstrate that for a sufficient number of training samples (≥ 1000), this model provides a low prediction error. Furthermore, the authors show that a prediction based only on the number of MAC Transmissions (MAC-TX) achieves a comparable, and sometimes even lower, error than using the full set of features.

A similar BN-based approach is proposed by Mezzavilla et al. [309] for classifying the mobility of nodes in WANETs as either static or mobile. The DAG is built from fewer features, thereby reducing its number of vertices and edges. As in [371], the authors derive a simplified probabilistic model from the DAG by using two features for inference: MAC-TX and MAC Retransmissions (MAC-RTX). The results reveal that the simplified BN model achieves good accuracy for classifying mobility in WANETs when the radio propagation stability varies. This mobility classifier was used to implement a TCP variant that outperforms TCP-NewReno in terms of throughput and outage probability.

Fixed-Share Experts [22] and SENSE [128] concentrate on a different challenge, i.e. predicting RTT to estimate the congestion state at the end-systems of the network. Both are based on the WMA ensemble method and conduct online training for TSF. Note that WMA uses the term experts to refer to the algorithms or hypotheses that form the ensemble model. In particular, SENSE extends Fixed-Share Experts by adding: (i) EWMA equations with different weights as experts, (ii) a meta-learning step that modifies the experts' penalty based on recent history, and (iii) a level-shift mechanism that adapts to sudden changes by restarting parameter learning. The two RTT predictors are trained and tested on real data collected from file transfers in a hybrid wired-wireless network. Only Fixed-Share Experts is evaluated on synthetic data collected from QualNet simulations. The results on real data show that SENSE achieves a lower prediction error, measured in ticks of 4 ms, than Fixed-Share Experts for predicting RTT. On synthetic data, Fixed-Share Experts provides a lower prediction error than on real data, even with a higher tick value of 500 ms. In terms of complexity, SENSE requires more computational resources than Fixed-Share Experts, due to the adoption of EWMA equations as experts and the meta-learning step.
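The fixed-share mechanism underlying both predictors can be sketched as an exponential-loss weight update followed by uniform weight sharing, which lets an expert that starts performing well regain weight quickly after a regime change; `eta` and `alpha` are hypothetical tuning parameters.

```python
import math

def fixed_share_update(weights, losses, eta=1.0, alpha=0.05):
    """One round of the fixed-share experts algorithm.

    weights : current normalized expert weights
    losses  : per-expert prediction loss on the latest RTT sample
    """
    # penalize each expert exponentially in its prediction loss
    updated = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    total = sum(updated)
    updated = [w / total for w in updated]          # renormalize
    # redistribute a fraction alpha of the total weight uniformly,
    # so no expert's weight can decay to zero permanently
    n = len(updated)
    return [(1 - alpha) * w + alpha / n for w in updated]

def ensemble_prediction(weights, expert_preds):
    """Weighted-average RTT prediction of the ensemble."""
    return sum(w * p for w, p in zip(weights, expert_preds))
```

SENSE's extensions would plug in here as EWMA-based experts and a meta-learning step that adjusts the per-expert penalty.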

Finally, other works apply ML techniques to build novel congestion control mechanisms for non-TCP-based networks. ACCPndn [230] proposes to include a TLFN in a controller node for predicting the rate of entries arriving at the Pending Interest Table (PIT) of NDN routers. The controller node gathers historical PIT entry rates from contributing routers and sends the prediction back to the corresponding router. The defined TLFN consists of two hidden layers between the input and output layers; the number of neurons in each layer corresponds to the number of contributing routers. To improve the parameter tuning of the TLFN trained using BP, the authors introduce a hybrid training algorithm that combines two optimization methods: PSO and GA. Various tests on synthetic data collected from ns-2 simulations demonstrate that the TLFN trained with PSO-GA provides a lower prediction error than the TLFN with other training algorithms, such as GA-PSO, GA or PSO alone, and BP. Additionally, ACCPndn incorporates fuzzy decision-making in each router, which uses the predicted PIT entry rate to proactively respond to network congestion. This congestion control mechanism considerably outperforms other NDN congestion control protocols, such as NACK [487] and HoBHIS [392], in terms of packet drop and link utilization.

Smart-DTN-CC [412] is another ML-based congestion control mechanism, designed for DTN nodes. In particular, Smart-DTN-CC applies Q-learning to adjust the congestion control behavior to the operating dynamics of the environment. Four congestion states are computed from information locally available at each node. The actions are selected from a finite set of 12 actions based on Boltzmann or Win-or-Learn-Fast (WoLF) strategies. The reward of a state-action pair depends on the transition between states caused by the specific action. Simulations in the Opportunistic Network Environment (ONE) tool show that Smart-DTN-CC achieves a higher delivery ratio and significantly lower delay than existing DTN congestion control mechanisms, such as CCC [265] and SR [406].

6.5 Summary

In the current Internet, TCP implements the most prevalent congestion control mechanism. TCP degrades the throughput of the network when packet losses are due to causes other than congestion. Therefore, identifying the cause of packet loss can improve TCP throughput. Table 10 summarizes various solutions that have leveraged ML for classifying packet losses at the end-systems of different network technologies. In hybrid wired-wireless networks, the unsupervised EM algorithm for HMMs and various supervised techniques were used to differentiate wireless losses (e.g. fading and shadowing) from congestion. In wired networks, a supervised Bayesian classifier was proposed to distinguish DUPACKs caused by reordering from those due to congestion. In optical networks, the unsupervised EM algorithm was employed for HMM training and clustering to classify contention and congestion losses.

TCP variants built upon these ML-based classifiers outperform standard and diverse non-ML TCP versions (e.g. TCP-Veno and Burst-TCP). The majority of the ML-based classifiers were tested using synthetic data collected from simulations; the EM-based classifiers were evaluated on simpler simulated topologies. Only the Bayesian classifier was evaluated on real data, though the small number of losses in the data negatively affects the results. In addition, all the classifiers perform binary classification of packet losses. Therefore, it would be interesting to explore an ML-based classifier that distinguishes between multiple causes of packet loss.

The other well-known congestion control mechanism is queue management. Several variations of AQM schemes (e.g. RED) have been proposed to overcome the TCP synchronization problem. However, these schemes suffer from poor responsiveness to time-varying and nonlinear network conditions. Therefore, different AQM schemes have integrated ML for better queue length stabilization and parameter tuning in changing network traffic conditions. As depicted in Table  11 , half of the ML-based AQM schemes apply supervised OLS and NN for predicting future time series values of either traffic volume or queue length. The predicted values are used to dynamically adjust the packet drop probability. The other half of ML-based schemes employ reinforcement-based methods for deducing the increment in the packet drop probability.

All these ML-based AQM schemes improve and speed up queue stabilization over non-ML AQM schemes under varying network conditions. However, the evaluation was based only on simulations of wired networks, though including single and multiple bottleneck topologies. Additionally, none of the ML-based AQM schemes considers providing a fair link share among senders, and none has been tested with legacy schemes coexisting at other bottleneck links in the network.

Another shortcoming of TCP is that its CWND update mechanism does not fit the distinct characteristics of different networks. For example, while satellite networks demand an aggressive CWND increase, WANETs perform better under a conservative approach. Table 12 outlines several solutions that have used RL techniques to appropriately update CWND according to the network conditions. Half of these ML-based approaches apply FALA, CALA, or Q-learning (including the FK function approximation) to resource-constrained networks (i.e. WANETs and IoT), whereas the other half use either CALA or a custom RL design on a wider range of network architectures, including satellite, cellular, and data center networks.

TCP variants built upon these ML-based CWND update mechanisms perform better in terms of throughput and delay than standard and non-ML TCP versions specifically enhanced for particular network conditions (e.g. TCP-FeW and TCP-Cubic). Some of the ML-based TCP variants also show improvements in packet loss and fairness. The evaluation has been based only on synthetic data collected from simulations and experimental testbeds. Here, it would be interesting to explore ML techniques other than RL for properly updating CWND.

TCP, as well as some multicast and multipath protocols, infers the congestion state from estimated network parameters (e.g. RTT and throughput) to adapt its behavior. However, such estimation remains imprecise, mainly because of the difficulty in modeling the relationships between the various parameters. As summarized in Table 13, several solutions have leveraged ML for inferring congestion in different network architectures by estimating diverse network parameters. In the context of TCP-based networks, various supervised techniques were employed to predict the throughput in wired and WLAN networks. A supervised BN was also built to classify node mobility in WANETs, while the ensemble WMA was used for predicting RTT in WANETs and hybrid wired-wireless networks. In the context of evolutionary network architectures, a supervised TLFN predicted the rate of entries arriving at the PIT of NDN routers, whereas Q-learning was employed in DTN nodes to select the proper action for the corresponding congestion state.

All these ML-based estimators outperform conventional estimation mechanisms (e.g. EWMA and history-based estimators) in accuracy. The evaluation was mostly performed using synthetic data collected from simulations and laboratory testbeds; only the WMA-based estimators were tested on real collected data.
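For reference, the conventional EWMA estimator that several of these ML-based approaches are compared against can be written as (the smoothing factor 1/8 is TCP's classic SRTT default; the function itself is an illustrative sketch):

```python
def ewma_rtt(samples, alpha=0.125):
    """Smooth a series of RTT samples with an EWMA (alpha = 1/8 is TCP's
    classic SRTT weight). Each new sample nudges the estimate toward
    itself; ML-based estimators aim to beat this baseline by modeling the
    relationships between parameters instead of simple smoothing."""
    estimate = samples[0]
    for sample in samples[1:]:
        estimate = (1 - alpha) * estimate + alpha * sample
    return estimate
```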

As final remarks, note that the majority of the ML-based solutions rely on synthetic data for evaluation. However, synthetic data come from simulations that may differ from real conditions. Therefore, there is a need to collect data from real networks to successfully apply and evaluate ML-based solutions. In some cases, such as queue management, even real collected data might not be enough for a realistic evaluation, because the solutions themselves affect the immediate network conditions. Recent networking technologies, like SDN and NFV, might support such evaluation in real networks. In addition, although some ML-based solutions address the same problem and report similar evaluation metrics, there is still a need to establish a common set of metrics, data, and conditions to facilitate their comparison in terms of performance and complexity.

7 Resource management

Resource management in networking entails controlling the vital resources of the network, including CPU, memory, disk, switches, routers, bandwidth, APs, and radio channels and their frequencies. These are leveraged collectively or independently to offer services. Naïvely, network service providers can provision a fixed amount of resources that satisfies an expected demand for a service. However, demand is non-trivial to predict, and over- and under-estimation lead to poor utilization and loss in revenue, respectively. Therefore, a fundamental challenge in resource management is predicting demand and dynamically provisioning and reprovisioning resources, such that the network is resilient to variations in service demand. Despite the widespread application of ML for load prediction and resource management in cloud data centers [ 367 ], various challenges still prevail for different networks, including cellular networks, wireless networks and ad hoc networks. Though there are various challenges in resource management, in this survey we consider two broad categories: admission control and resource allocation.

Admission control is an indirect approach to resource management that does not need demand prediction. The objective of admission control is to optimize the utilization of resources by monitoring and managing the resources in the network. For example, new requests for compute and network resources are initiated for a VoIP call or connection setup. In this case, admission control dictates whether the new incoming request should be granted or rejected, based on the available network resources, the QoS requirements of the new request, and its consequences for the existing services utilizing the resources in the network. Evidently, accepting a new request generates revenue for the network service provider. However, it may degrade the QoS of existing services due to scarcity of resources and consequently violate SLAs, incurring penalties and loss in revenue. Therefore, there is an inherent trade-off between accepting new requests and maintaining QoS. Admission control addresses this challenge and aims to maximize the number of requests accepted and served by the network without violating SLAs.

In contrast, resource allocation is a decision problem that actively manages resources to maximize a long-term objective, such as revenue or resource utilization. The underlying challenge in resource allocation is to adapt resources for long-term benefits in the face of unpredictability. General model-driven approaches for resource allocation have fallen short in keeping up with the velocity and volume of resource requests in the network. Resource allocation is thus an exemplar for highlighting the advantages of ML, which can learn and manage resource provisioning in various ways.

7.1 Admission control

As shown in Table  14 , admission control has leveraged ML extensively in a variety of networks, including ATM networks [ 95 , 189 , 190 ], wireless networks [ 8 , 36 , 359 ], cellular networks [ 66 , 67 , 281 , 372 , 458 ], ad hoc networks [ 452 ], and next generation networks [ 311 ]. To the best of our knowledge, Hiramatsu [ 189 ] was the first to propose NN based solutions for controlling the admission of a service requesting resources for a basic call setup in ATM networks. He demonstrated the feasibility of NN based approaches for accepting or rejecting requests and their resilience to dynamic changes in network traffic characteristics. However, the approach relied on unrealistic assumptions. First, all calls had similar traffic characteristics, that is, a single bitrate or multiple bitrates. Second, cell loss rate was the sole QoS parameter.

Later, Hiramatsu [ 190 ] overcame these limitations by integrating admission control for calls and link capacity control in ATM networks using distributed NNs. The NNs could now handle a number of bit-rate classes with unknown characteristics and adapt to changes in the traffic characteristics of each class. Cheng and Chang [ 95 ] use a congestion-status parameter, a cell-loss probability, and three traffic parameters, namely peak bitrate, average bitrate, and mean peak-rate duration, to achieve a 20% improvement over Hiramatsu. To reduce the dimensionality of the feature space, they transform the peak bitrate, average bitrate, and mean peak-rate duration of a call into a unified metric.

Piamrat et al. [ 359 ] propose an admission control mechanism for wireless networks based on the subjective QoE perceived by end-users, in contrast to leveraging quantitative parameters such as bandwidth, loss, and latency. To do so, they first choose configuration parameters, such as codec, bandwidth, loss, delay, and jitter, along with their value ranges. Then, the authors synthetically distort a number of video samples by varying the chosen parameters. These distorted video samples are evaluated by human observers who provide a mean opinion score (MOS) for each sample. The configurations and corresponding MOSs are used to construct the training and testing datasets for a Random Neural Network (RandNN), which predicts MOSs in real-time without human interaction. Though they evaluate their admission control mechanism on user satisfaction and throughput based metrics, no accuracy or error analysis is reported for the RandNN.

Baldo et al. [ 36 ] propose an ML-based solution using an MLP-NN to address the problem of user driven admission control for VoIP communications in a WLAN. In their solution, a mobile device gathers measurements on the link congestion and the service quality of past voice calls. These measurements are used to train the MLP-NN to learn the relationship between the VoIP call quality and the underlying link layer, thus inferring whether an access point can satisfactorily sustain a new VoIP call. The authors report 98.5% and 92% accuracy for offline and online learning, respectively.

On the other hand, Liu et al. [ 281 ] propose a self-learning call admission control mechanism for Code Division Multiple Access (CDMA) cellular networks that carry both voice and data services. Their admission control mechanism is built atop a novel learning control architecture (i.e. adaptive critic design) that has only one controller module, namely, a critic network. The critic network is trained with a 3:6:1 MLP-NN that uses inputs such as the network environment (e.g. total interference received at the base station), user behavior (e.g. call type, new or hand-off), call class (e.g. voice, data), and the action to accept or reject calls. The output is the Grade of Service (GoS) measure. The MLP-NN is retrained to adapt to changes in the admission control requirements, user behaviors and usage patterns, and the underlying network itself. Through simulation of cellular networks with two classes of services, the authors demonstrate that their admission control mechanism outperforms non-adaptive admission control mechanisms, with respect to GoS, in CDMA cellular networks.

In contrast, Bojovic et al. [ 66 ] design an ML-based radio admission control mechanism to guarantee QoS for various services, such as voice, data, video and FTP, while maximizing radio resource utilization in long term evolution (LTE) networks. In their mechanism, an MLP-NN is trained using features such as application throughput, average packet error rate, and average payload size. The MLP-NN is then used to predict how the admission of a new session would affect the QoS of all sessions. Using an LTE simulator, it is shown that the MLP-NN can achieve up to 86% accurate decisions, provided it has been trained over a relatively long period of time. Despite its high accuracy, a critical disadvantage of the MLP-NN is over-fitting, which causes it to fail to generalize to partially new inputs.

Vassis et al. [ 452 ] propose an adaptive and distributed admission control mechanism for variable bitrate video sessions over ad hoc networks with heterogeneous video and HTTP traffic. Unlike previous admission control approaches that only consider the new request, this mechanism takes into account the QoS constraints of all the services in the network. The authors evaluate five different NNs, namely MLP, probabilistic RBFNN, learning vector quantization (LVQ) network (a precursor to SOM), HNN, and SVM. Using the network throughput and packet generation rates of all nodes prior to starting each session as training data, and the average packet delays of those sessions as validation data, they found the probabilistic RBFNN to always converge, with a success rate between 77% and 88%.

Similarly, Ahn et al. [ 8 ] propose a dynamic admission control algorithm for multimedia wireless networks based on an unsupervised HNN. In this mechanism, new or hand-off connections requesting admission are only granted admission if the bandwidth of the corresponding cell is sufficient to meet the bandwidth required for their best QoS level. Otherwise, the QoS levels of the existing connections are degraded to free up bandwidth for the new or hand-off connection requesting admission. The compromised QoS levels of existing connections and the QoS levels of the new or hand-off connections are then computed using a hardware-based HNN that permits real-time admission control. Most importantly, the HNN does not require any training, and can easily adapt to dynamic network conditions. This admission control mechanism achieves significant gains in ATM networks, in terms of minimizing the blocking and dropping probabilities and maximizing fairness in resource allocation.

Recently, Blenk et al. [ 63 ] employ an RNN for admission control in the online virtual network embedding (VNE) problem. Before running a VNE algorithm to embed a virtual network request (VNR), the RNN predicts the probability that the VNR will be accepted by the VNE algorithm, based on the current state of the substrate network and the request. This allows the VNE algorithm to process only those requests that the RNN predicts will be accepted, thus reducing the overall runtime and improving system performance. The RNN is trained with new representations of substrate networks and VNRs that are based on topological and network resource features. To obtain a compact representation, the authors apply PCA on a set of feature vectors and select features that are sensitive to high load, including number of nodes, spectral radius, maximum effective eccentricity, average neighbor degree, number of eigenvalues, average path length, and number of edges. A total of 18 different RNNs are trained offline using a supervised learning algorithm and a dataset generated through simulation of two VNE algorithms, namely Shortest Distance Path and Load Balanced. These RNNs achieve accuracies between 89% and 98%, demonstrating that this admission control mechanism can learn from the historical performance of VNE algorithms.

However, a potential disadvantage of NN based systems is that the confidence of the predicted output is unknown. As a remedy, a BN can predict the probability distribution of certain network variables for better performance in admission control [ 67 , 372 ]. Specifically, Bojovic et al. [ 67 ] compare NN and BN models by applying them to admission control of calls in LTE networks. Both models are trained to learn the network behavior from observations of the selected features. Upon arrival of an incoming VoIP call, and assuming that the call is accepted, the two models are used to estimate the R-factor [ 206 ] QoS metric. A major difference between NN and BN is that the NN directly predicts the value of the R-factor, while the BN provides a distribution over its possible values. With the NN, the call is accepted if the estimated R-factor is greater than a QoS threshold and rejected otherwise. In contrast, the BN model accepts a call if the probability of the R-factor exceeding the QoS threshold is greater than a probability threshold, and drops it otherwise. This gives the admission control mechanism additional flexibility, as the probability threshold can be tuned to meet different system requirements. Through ns-3 simulation of a macro cell LTE admission control scenario, the BN model shows fewer FPs and FNs than the NN.
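The two decision rules can be contrasted in a few lines (an illustrative sketch; the threshold values and the discrete distribution format are our assumptions):

```python
def nn_admit(predicted_r_factor, qos_threshold=70.0):
    # NN path: a point estimate is compared directly against the QoS threshold.
    return predicted_r_factor >= qos_threshold

def bn_admit(r_factor_distribution, qos_threshold=70.0, prob_threshold=0.9):
    """BN path: r_factor_distribution is a list of (value, probability)
    pairs. Accept only if the probability mass above the QoS threshold
    exceeds the probability threshold, which can be tuned to trade false
    positives against false negatives."""
    p_ok = sum(p for value, p in r_factor_distribution if value >= qos_threshold)
    return p_ok >= prob_threshold
```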

Similarly, Quer et al. [ 372 ] develop an admission control mechanism for VoIP calls in a WLAN. They employ a BN to predict the voice call quality as a function of link layer conditions in the network, including the fraction of channel time occupied by voice and background best effort traffic, the estimated frame error probabilities of voice and background traffic, and the R-factor representing the a posteriori performance. The BN model is built in four phases: (i) a structure learning phase to find qualitative relationships among the variables, (ii) a parameter learning phase to find quantitative relationships, (iii) the design of an inference engine to estimate the most probable value of the variable of interest, and (iv) an accuracy verification to obtain the desired level of accuracy in the estimation of the parameter of interest. The authors evaluate the BN model via ns-3 based simulation of a WLAN carrying both VoIP and TCP traffic, and show an accuracy of 95%.

Besides NN and BN, the admission control problem has also been formulated as an MDP [ 311 , 458 ]. Traditionally, dynamic programming (DP) is used to solve an MDP. However, DP suffers from two limitations in the context of admission control. First, it expects the number of states in the MDP to be polynomial, which is seldom the case in real networks. Second, DP requires explicit state transition probabilities, which are non-trivial to determine a priori. Therefore, RL, which can handle MDPs with very large state spaces and unknown state transition probabilities, has been successfully applied to solve MDP-based admission control problems in networking.

Mignanti et al. [ 311 ] employ Q-learning to address admission control for connections in next generation networks. In their approach, when a connection request arrives, the Q-values of accepting and rejecting the request are computed. The request is accepted or rejected depending on whether the Q-value for acceptance or rejection is higher. Similarly, Q-learning has been used to allocate guard channels as part of the admission control mechanism for new calls in the LTE femtocell networks [ 458 ]. It is important to realize that allocating a guard channel for a new or hand-off call can raise the blocking probability. Therefore, Q-learning has to find the optimal policy that minimizes the cumulative blocking probability.
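The Q-learning admission decision described above can be sketched as a tabular agent (illustrative only; the state encoding, reward signal, and hyperparameters are our assumptions rather than those of [ 311 ] or [ 458 ]):

```python
import random

class QAdmissionController:
    """Tabular Q-learning over accept/reject admission decisions."""

    ACTIONS = ("accept", "reject")

    def __init__(self, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.q = {}  # (state, action) -> learned value
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def decide(self, state):
        if random.random() < self.epsilon:       # explore occasionally
            return random.choice(self.ACTIONS)
        # exploit: take whichever action currently has the higher Q-value
        return max(self.ACTIONS, key=lambda a: self.q.get((state, a), 0.0))

    def update(self, state, action, reward, next_state):
        # Standard Q-learning backup; no transition model is required,
        # which is exactly why RL sidesteps DP's limitations here.
        best_next = max(self.q.get((next_state, a), 0.0) for a in self.ACTIONS)
        old = self.q.get((state, action), 0.0)
        self.q[(state, action)] = old + self.alpha * (
            reward + self.gamma * best_next - old)
```

The reward would encode revenue for successful calls and penalties for blocking or SLA violations, so the learned policy balances acceptance against congestion.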

RL has also been leveraged for more complex problems that pertain to admission control with routing [ 295 , 446 ]. In such problems, when a request is admitted, a route has to be established such that each link in the route meets the QoS requirements of the request. Therefore, RL-based solutions discussed earlier for admission control, with only two possible actions, are infeasible for admission control with routing. Here, the action space consists of selecting a route from a predefined set of routes in the network. Tong et al. [ 446 ] formulate this problem as a semi-MDP, and leverage Q-learning to define policies for route selection, such that the revenue is maximized and QoS requirements of the requests are met. In the formulation, they consider two important classes of QoS constraints, (i) state dependent constraint (e.g. capacity constraint) that is a function of only the current state, and (ii) past dependent constraint (e.g. fairness constraint) that depends on statistics over the past history. Since a detailed specification of a network state is computationally intractable [ 446 ], they exploit statistical independence of the links in the network for developing a decentralized RL training and decision making algorithm. In this approach, each link in the network performs Q-learning locally using only the link state information, instead of network state information. The authors evaluate their approach to admission control with routing via simulation of a network with 4 nodes and 12 links. The results show significant improvement over heuristic based algorithms.
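Under the statistical-independence assumption, the decentralized decision rule reduces to aggregating per-link values over candidate routes, roughly as follows (a simplified sketch; the additive aggregation and the zero-valued reject baseline are our assumptions, not the exact formulation of [ 446 ]):

```python
def admit_with_routing(routes, link_q, request_class):
    """Admission with routing via per-link Q-values.

    Each link keeps its own learned value per request class; a route's
    value is the sum over its links (exploiting the assumed statistical
    independence), and the request is admitted on the best route only if
    that value beats outright rejection (valued at 0 here).
    """
    def route_value(route):
        return sum(link_q[link][request_class] for link in route)

    best = max(routes, key=route_value)
    if route_value(best) > 0:
        return best, True      # admit the request on the best route
    return None, False         # reject the request
```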

Similarly, Marbach et al. [ 295 ] use RL to construct a dynamic admission control with routing policy for new calls in integrated service networks. As the traditional DP-based models for admission control with routing are computationally intractable, Marbach et al. [ 295 ] propose an approximation architecture consisting of an MLP with internal tunable weights that can be adjusted using TD(0). However, TD(0) has a slow rate of convergence, hence the authors integrate it with a decomposition approach that represents the network as a set of decoupled link processes. This allows decentralized training and decision making, which not only significantly reduces training time, but also achieves sophisticated admission control with routing policies that are otherwise difficult to obtain via heuristic approaches.

7.2 Resource allocation

Recall that the challenge in resource allocation lies in predicting demand variability and future resource utilization. ML-based techniques can be leveraged to learn the indicators that can aid in resource allocation, as summarized in Table  15 . The most suitable ML approach for the resource allocation decision problem is RL. The primary advantage of RL is that it can be deployed without any initial policies, and it can learn to adapt to dynamic demands for reactive resource allocation. For instance, Tesauro [ 442 ] uses decompositional RL to allocate and reallocate data center server resources to two different workloads, a web-based time-varying transactional workload and a non-web-based batch workload. Since the impact of a resource allocation decision is Markovian, the resource allocation problem benefits largely from an MDP-based formulation. However, the state and action spaces of an MDP grow exponentially, which leads to the curse of dimensionality.

To address this problem, the authors in [ 442 ] propose a decompositional formulation of RL for composite MDPs. The decompositional RL uses a localized version of the SARSA(0) algorithm to learn a local value function based on the local state and local resource allocation of a request, instead of global knowledge. Vengerov [ 454 ] goes further in applying RL to the allocation of multiple resource types (e.g. CPU, memory, bandwidth), using fuzzy rules where some or all of the fuzzy categories can overlap. Mao et al. [ 294 ] use a DNN to approximate functions in a large-scale RL task in order to develop a multi-resource cluster scheduler. Most recently, Pietrabissa et al. [ 361 ] propose a scalable RL based solution to the MDP problem for resource allocation, using the policy reduction mechanism proposed in [ 360 ] and state aggregation that combines lightly loaded states into a single state.
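The localized SARSA(0) backup underlying the decompositional scheme can be sketched as follows (the dict-based Q-table and hyperparameter values are illustrative assumptions):

```python
def sarsa0_update(q, state, action, reward, next_state, next_action,
                  alpha=0.1, gamma=0.9):
    """One SARSA(0) backup on a local Q-table (a dict keyed by
    (state, action)). Unlike Q-learning, the target uses the action
    actually taken next, so learned values follow the behavior policy.
    In the decompositional scheme, each agent runs this on its local
    state and local allocation only, instead of on the global state."""
    target = reward + gamma * q.get((next_state, next_action), 0.0)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (target - old)
    return q
```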

Baldo et al. [ 35 ] and Bojovic et al. [ 65 ] optimize network resource allocation in wireless networks. Baldo et al. [ 35 ] use a supervised MLP-NN for real-time characterization of the communication performance in wireless networks and optimize resource allocation accordingly. On the other hand, Bojovic et al. [ 65 ] use an MLP-NN to select the AP that will provide the best performance to a mobile user in an IEEE 802.11 WLAN. In their proposal, each user collects measurements from each AP, such as signal to noise ratio (SNR), probability of failure, business ratio, average beacon delay, and number of detected stations. These metrics are used to describe different APs and train a two layer MLP-NN. The output of the MLP-NN is the downlink throughput, a standard performance metric used by mobile clients. The MLP-NN is trained rigorously with different configuration parameters to achieve the lowest normalized RMSE (NRMSE). Finally, the MLP-NN is deployed to select the AP that will yield the optimal throughput in different scenarios and evaluated on the EXTREME testbed [ 364 ]. The ML-based AP selection outperforms AP selection mechanisms based on SNR, load, and beacon delay, especially in dynamic environments.
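The AP selection step reduces to ranking APs by predicted throughput; the sketch below uses a hand-written linear stand-in for the trained MLP-NN (feature names and weights are invented for illustration):

```python
def select_ap(ap_measurements, predict_throughput):
    """Return the AP whose measured features yield the highest predicted
    downlink throughput. predict_throughput stands in for the trained
    two layer MLP-NN regressor."""
    return max(ap_measurements,
               key=lambda ap: predict_throughput(ap_measurements[ap]))

def toy_predictor(features):
    # Hand-written linear stand-in: higher SNR helps, failures and a busy
    # channel hurt. All weights here are invented for illustration.
    return (0.5 * features["snr"]
            - 20.0 * features["failure_prob"]
            - 10.0 * features["busy_ratio"])

aps = {
    "ap1": {"snr": 30.0, "failure_prob": 0.05, "busy_ratio": 0.7},
    "ap2": {"snr": 25.0, "failure_prob": 0.01, "busy_ratio": 0.2},
}
# select_ap(aps, toy_predictor) returns the AP with the best predicted throughput.
```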

Similarly, Adeel et al. [ 6 ] leverage an RNN to build an intelligent LTE-Uplink system that can optimize radio resource allocation based on user requirements, the surrounding environment, and equipment capabilities. In particular, their system can allocate the optimal radio parameters to serving users and suggest an acceptable transmit power to users served by adjacent cells for inter-cell-interference coordination. To analyze the performance of the RNN, three learning algorithms are compared, namely GD, adaptive inertia weight particle swarm optimization (AIWPSO), and differential evolution (DE). One RNN is trained and validated using each of these learning algorithms with a dataset of 6000 samples. The dataset is synthetically generated by executing multiple simulations of the LTE environment using the SEAMCAT simulator. Evaluation results show that AIWPSO outperforms the other learning algorithms with respect to accuracy (based on MSE). However, AIWPSO’s better accuracy comes at the expense of longer convergence time due to extra computational complexity. Unfortunately, [ 6 ] does not evaluate the effectiveness of resource allocation for the proposed LTE system. Nevertheless, the analysis of the learning algorithms can provide valuable insights for applying ML to similar networking problems.

Though admission control and resource allocation have been studied separately, Testolin et al. [ 443 ] leverage ML to address them jointly for QoE-based video requests in wireless networks. They combine unsupervised learning, using a stochastic RNN also known as an RBM, with supervised classification using a linear classifier, to estimate video quality in terms of the average Structural SIMilarity (SSIM) index. The corresponding module uses the video frame size, which is readily available at the network layer, to control admission and resource provisioning. The relationship between video frame size and SSIM is non-linear, and the RBM extracts an abstract representation of the features that describe the video. The linear classifier maps the abstractions to the SSIM coefficients, which are leveraged to accept or reject new video requests, and to adapt resource provisioning to meet the network resource requirements. The authors report an RMSE below 3% using videos from a pool of 38 video clips with different data rates and durations.

Virtualization of network resources through NFV and virtual networks brings a new dimension to the resource allocation problem, that is, provisioning virtual resources sitting on top of physical resources. To leverage the benefits of virtualization, Mijumbi et al. [ 312 ] propose a dynamic resource management approach for virtual networks (VNs) using distributed RL that dynamically and opportunistically allocates resources to virtual nodes and links. The substrate network is modeled as a decentralized system, where multiple agents use Q-learning on each substrate node and link. These agents learn the optimal policy to dynamically allocate substrate network resources to virtual nodes and links. The percentages of allocated and unused resources (e.g. queue size, bandwidth) in substrate nodes or links represent the states of Q-learning, with two explicit actions to increase or decrease the percentage of allocated resources. A biased learning policy with an initialization phase is exploited to improve the convergence rate of Q-learning. This Q-learning based action selection approach for resource allocation outperforms ε-greedy and softmax in ns-3 simulations with real Internet traffic traces. Furthermore, in comparison to static allocation, the proposed method improves the ratio of accepted VNs without affecting their QoS.
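A per-node agent in this scheme might look as follows (a sketch; the state discretization, action step, and the particular biased initialization are our assumptions about the general idea, not the exact design of [ 312 ]):

```python
class NodeAgent:
    """Per-substrate-node Q-learning agent (sketch).

    States discretize the percentage of the allocated resource that sits
    unused; the two actions shrink or grow the allocation by 10%. The
    biased initialization seeds Q-values so agents start near a sensible
    policy (shrink when mostly idle, grow otherwise) instead of all
    zeros, which speeds up convergence.
    """
    ACTIONS = (-10, 10)  # percentage-point change in allocated resource

    def __init__(self, alpha=0.5, gamma=0.8):
        self.alpha, self.gamma = alpha, gamma
        # Bias: value 1.0 for "shrink" when >= 50% is unused, else for "grow".
        self.q = {(unused, a): (1.0 if (unused >= 50) == (a < 0) else 0.0)
                  for unused in range(0, 101, 10) for a in self.ACTIONS}

    def act(self, unused_pct):
        # Greedy action for the current discretized state.
        return max(self.ACTIONS, key=lambda a: self.q[(unused_pct, a)])

    def learn(self, state, action, reward, next_state):
        best_next = max(self.q[(next_state, a)] for a in self.ACTIONS)
        self.q[(state, action)] += self.alpha * (
            reward + self.gamma * best_next - self.q[(state, action)])
```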

In addition, Mijumbi et al. [ 312 ] use FNNs to predict future resource requirements for each VNF component (VNFC) in a service function chain [ 313 ]. Each VNFC is modeled using a pair of supervised FNNs that learn the trend of resource requirements for the VNFC by combining historical local VNFC resource utilization information with the information collected from its neighbors. The first FNN learns the dependence of the resource requirements for each of the VNFCs, which is used by the second FNN to forecast the resource requirements for each VNFC. The predictions are leveraged to spin up and configure new VNFCs, or to deallocate resources and turn off VNFCs. Evaluation based on real-time VoIP traffic traces on a virtualized IP Multimedia Subsystem (IMS) reveals a prediction accuracy of approximately 90%.

In contrast, Shi et al. [ 410 ] use a BN to predict the future resource reliability of NFV components, that is, the ability of a resource to ensure constant system operation without disruption, based on the historical resource usage of VNFCs. The learning algorithm is triggered when an NFV component is initially allocated to resources. As time evolves, the BN is continuously trained with resource reliability responses and the transition probabilities of the BN are updated, resulting in improved prediction accuracy. The predictions are leveraged in an MDP to dynamically allocate resources for VNFCs. Using the WorkflowSim simulator, the authors demonstrate that the proposed method outperforms greedy methods in terms of overall cost.

7.3 Summary

As evident from Tables  14 and 15 , the ML-based resource management schemes studied in this paper can be broadly classified into two groups, supervised learning-based and RL-based. Application of unsupervised techniques in resource management is rather unexplored, with the exception of a few works. In addition, MLP-NN, though applied with a variety of parameter settings, is the most popular supervised technique, while Q-learning dominates the choice of RL-based approaches. Furthermore, other works have leveraged BN techniques to introduce the flexibility of a probability distribution rather than the point estimates produced by NN-based approaches. However, MLP-NNs offer better scalability than BN and RL, since the number of neurons in the different layers of an MLP-NN can be tuned based on the problem dimension, whereas the number of states in RL can explode very quickly even in a moderately sized network. In the past, several techniques, such as decomposition, decentralization, and approximation, have been used to deal with the dimensionality issue of applying RL. Recently, RL combined with deep learning has been shown to be a promising alternative [ 294 ] that can be leveraged to tackle various resource management problems in practical settings. Nonetheless, NN-based supervised approaches exhibit steady performance in terms of both accuracy and convergence.

Although the ML-based resource management schemes studied in this paper differ in terms of their feature sets, they either predict one or more QoS metrics of interest or generate an acceptance/rejection decision for an incoming request based on a QoS estimation. The ML-based resource management approaches also exhibit a similarity regarding the network and dataset. The majority of approaches focus on wireless networks, where resource contention is more profound than in wired networks. Due to the lack of real-life traces, these approaches adopt different methods to simulate the network of interest and produce training and testing data. Therefore, more research is needed to evaluate the performance of the proposed ML techniques in real networks and with real data.

8 Fault management

Fault management involves detection, isolation, and correction of abnormal conditions of a network. It requires network operators and administrators to have a thorough knowledge of the entire network, its devices and all the applications running in the network, which is an unrealistic expectation. Furthermore, recent advances in technology, such as virtualization and softwarization, make today’s networks monumental in size and complexity, and highly dynamic. Therefore, fault management is becoming increasingly challenging in today’s networks.

Naïve fault management is reactive and can be perceived as a cyclic process of detection, localization and mitigation of faults. First, fault detection correlates various network symptoms to determine whether one or more network failures or faults have occurred. For example, faults can occur due to reduced switch capacity, increased rate of packet generation for a certain application, disabled switches, and disabled links [ 37 ]. Therefore, the next step in fault management is localization of the root cause of the fault(s), which requires pinpointing the physical location of the faulty network hardware or software element, and determining the reason for the fault. Lastly, fault mitigation aims to repair or correct the network behavior. In contrast, fault prediction is proactive and aims to prevent faults or failures by predicting them and initiating mitigation procedures to minimize performance degradation. ML-based techniques have been proposed to address these challenges and promote cognitive fault management in the areas of fault prediction, detection, localization of root cause, and mitigation. In the following subsections, we describe the role ML has played in these prominent challenges for fault management.

8.1 Fault prediction

One of the fundamental challenges in fault management is fault prediction to circumvent upcoming network failures and performance degradation. One of the first ML-based approaches for detecting anomalous events in communication networks is [ 301 ]. This approach performs fault prediction by continuously learning to distinguish between normal and abnormal network behaviors and triggering diagnostic measures upon the detection of an anomaly. The continuous learning enables adaptation of the fault prediction and diagnostic measure to the network dynamics without explicit control. Although the work in [ 301 ] leverages ML in fault prediction, it does not mention any specific technique. On the other hand, BNs have been widely used in communication and cellular networks to predict faults [ 193 , 247 , 248 ].

In a BN, the normal behavior of a network and deviations from it are combined in a probabilistic framework to predict future faults in communication and cellular networks. However, one shortcoming of the system in [ 193 ] is that it cannot predict the impact on network service deterioration. A more general drawback of BNs is that they are not sensitive to temporal factors and fail to model IP networks that dynamically evolve over time. For such networks, Ding et al. [ 118 ] apply dynamic BNs to model both static and dynamic changes in managed entities and their dependencies. The dynamic BN model is robust in predicting faults of a network element, localizing a fault and its cause, and assessing its effect on network performance.

Snow et al. [ 414 ] use a NN to estimate the dependability of a 2G wireless network, characterizing its availability, reliability, maintainability, and survivability. Though the NN is trained rigorously with analytical and empirical datasets, it is limited by the assumption of a fixed wireless network topology, which is far from reality. Furthermore, network fault predictions are tightly coupled with wireless link quality. Therefore, Wang et al. [ 466 ] formulate the estimation of link quality in WSNs as a classification problem, and solve it by leveraging supervised DT, rule learners, SVM, BN, and ensemble methods. The results reveal that DTs and rule learners achieve the highest accuracy and result in significant improvements in data delivery rates.

A daunting and fundamental prerequisite for fault prediction is feature selection. It is non-trivial to extract appropriate features from the enormous volume of event logs of a large-scale or distributed network system [ 285 ]. Therefore, feature selection and dimensionality reduction are imperative for accurate fault prediction. Wang et al. [ 466 ] propose employing local features over global ones, as local features can be collected without costly communication in a wireless network. In contrast, Lu et al. [ 285 ] use a manifold learning technique called Supervised Hessian Locally Linear Embedding (SHLLE) to automatically extract failure features and generate failure predictions. Based on an empirical experiment, the authors show that SHLLE outperforms feature extraction algorithms, such as PCA, and classification methods, including k -NN and SVM.
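
As a concrete illustration of this dimensionality-reduction step, the sketch below reduces synthetic event-log feature vectors with PCA (the baseline that SHLLE is compared against, computed here via SVD) and classifies in the reduced space with nearest centroids. The data, dimensions, and class separation are assumptions made purely for illustration, not the SHLLE algorithm itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 50-dimensional event-log feature vectors for two classes
# (normal vs. failure-prone); the shift on three latent features is an
# assumption made purely for illustration.
normal = rng.normal(0.0, 1.0, size=(100, 50))
faulty = rng.normal(0.0, 1.0, size=(100, 50))
faulty[:, :3] += 4.0
X = np.vstack([normal, faulty])
y = np.array([0] * 100 + [1] * 100)

# PCA via SVD: project onto the top-2 principal components.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:2].T

# Nearest-centroid classification in the reduced space.
centroids = np.array([Z[y == c].mean(axis=0) for c in (0, 1)])
pred = ((Z[:, None, :] - centroids) ** 2).sum(-1).argmin(axis=1)
accuracy = (pred == y).mean()
```

Because the class separation dominates the variance, the top principal components retain it, and the reduced 2-D representation classifies almost perfectly.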

Pellegrini et al. [ 355 ] propose an ML-based framework to predict the remaining time to failure (RTTF) of applications. Their framework is application-agnostic, that is, it is applicable to scenarios where a sufficient number of observations of the monitored phenomena can be collected in advance. The framework uses different ML techniques for building prediction models, namely linear regression, M5P, REPTree, LASSO, SVM, and Least-Squares SVM, allowing network operators to select the most suitable technique based on their needs. In addition, other ML techniques can be easily integrated into the framework. The ML techniques in the framework are compared for a multi-tier e-commerce web application running on a virtualized testbed, and the results show that REPTree and M5P outperform the other ML techniques for predicting RTTF. It is essential to note that the model has a high prediction error when the network system is temporally far from the occurrence of the failure. However, as the network system approaches the time of the failure, the number of accumulated anomalies increases and the model is able to predict the RTTF with high accuracy.
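
The regression step of such a framework can be sketched as follows: a least-squares model maps an anomaly-count feature to RTTF. The synthetic monitoring data and the linear decay of RTTF with accumulated anomalies are assumptions for illustration, standing in for the M5P/REPTree models used in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical monitoring samples: the single feature is the number of
# accumulated anomalies, and RTTF (in minutes) shrinks roughly linearly
# as anomalies pile up -- an assumption made for illustration.
anomalies = rng.uniform(0, 50, size=200)
rttf = 120.0 - 2.0 * anomalies + rng.normal(0, 5, size=200)

# Ordinary least squares fit: rttf ~= w * anomalies + b.
A = np.column_stack([anomalies, np.ones_like(anomalies)])
(w, b), *_ = np.linalg.lstsq(A, rttf, rcond=None)

# Predicted RTTF when 30 anomalies have accumulated.
predicted = w * 30.0 + b
```
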

Wang et al. [ 469 ] present a mechanism for predicting equipment failure in optical networks using ML-based techniques and TSF. The operational states of the equipment are characterized by physical indicators, such as input optical power, laser bias current, laser temperature offset, output optical power, environmental temperature, and unusable time. A double-exponential smoothing time series algorithm uses the historical data from time t − n to time t − 1 to predict the values of the physical indicators at a future time instance t + T . A kernel function and a penalty factor are used in an SVM to model non-linear relationships and reduce misclassification, respectively. The enhanced SVM achieves an accuracy of 95% in predicting equipment failure based on real data from an optical network operator.
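
The forecasting step can be illustrated with Holt's double exponential smoothing, which tracks a level and a trend and extrapolates T steps ahead. The indicator values and smoothing constants below are assumptions for illustration:

```python
def double_exponential_forecast(series, T, alpha=0.5, beta=0.5):
    """Holt's double exponential smoothing: maintain a smoothed level
    and trend, then extrapolate T steps past the last observation."""
    level, trend = series[0], series[1] - series[0]
    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + T * trend

# Hypothetical laser-bias-current readings drifting upward over time.
history = [50.0 + 0.4 * t for t in range(20)]
forecast = double_exponential_forecast(history, T=5)
```

On a perfectly linear drift the forecast simply extends the trend; in practice a crossed alarm threshold on the forecast value would trigger the failure classifier.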

Most recently, Kumar et al. [ 255 ] explore the applicability of a wide range of regression and analytical models to predict the inter-arrival time of faults in a cellular network. They analyze time-stamped faults over a period of one month from multiple base stations of a national mobile operator in the USA. The authors observe that current networks barely reside in a healthy state and that patterns of fault occurrence are non-linear. In a comparison of the different ML-based techniques for fault prediction, they show that a DNN with autoencoders outperforms other ML techniques, including autoregressive NN, linear and non-linear SVM, and exponential and linear regression. An autoencoder is a variant of NN that consists of an encoder and a decoder, and is used for dimensionality reduction. The autoencoder is pre-trained on the testing data and then converted into a traditional NN for computing the prediction error. Pre-training each layer in an unsupervised manner allows for better initial weights, and results in higher prediction accuracy.
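
The pre-training idea can be sketched with a minimal linear autoencoder trained by gradient descent; the toy features, dimensions, and learning rate are assumptions, and a real system would stack such layers and fine-tune them as a traditional NN:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy fault-log feature vectors (8-D) that actually live on a 2-D
# latent subspace -- an assumption made for illustration.
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 8))

# Linear autoencoder: encode 8 -> 2, decode 2 -> 8.
W_enc = rng.normal(scale=0.1, size=(8, 2))
W_dec = rng.normal(scale=0.1, size=(2, 8))

def loss(X, W_enc, W_dec):
    """Mean squared reconstruction error."""
    return float(np.mean((X @ W_enc @ W_dec - X) ** 2))

initial = loss(X, W_enc, W_dec)
lr = 0.01
for _ in range(1000):
    H = X @ W_enc                      # codes
    R = H @ W_dec - X                  # reconstruction residual
    W_dec -= lr * 2 * H.T @ R / len(X)
    W_enc -= lr * 2 * X.T @ (R @ W_dec.T) / len(X)

final = loss(X, W_enc, W_dec)
```

Driving the reconstruction error down forces the 2-D code to capture the dominant structure, which is what makes the learned weights a good initialization.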

8.2 Detecting faults

Unlike fault prediction, fault detection is reactive; it identifies and/or classifies a failure after it has occurred, using network symptoms, performance degradation, and other parameters. Rao [ 382 ] proposes fault detection for cellular networks that can detect faults at different levels: base station, sector, carrier, and channel. They employ a statistical hypothesis testing framework that combines parametric, semi-parametric, and non-parametric test statistics to model expected behavior. In parametric and semi-parametric statistical tests, a fault is detected when significant deviations from the expected activity are observed. In the case of non-parametric statistical tests, where the expected distribution is not known a priori, the authors use a combination of empirical data and statistical correlations to conduct the hypothesis test. The test depends on a threshold value that is initially set through statistical analysis of traffic patterns. However, improper threshold settings may lead to high FPs and FNs. Hence, the threshold should be adapted to changing traffic patterns due to spatial, temporal, and seasonal effects. In the background, an open loop routine continuously learns and updates the threshold over an adjustable period of time. However, the learning time may be large for certain applications, which may impact fault detection time.
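
A minimal sketch of such an adaptive threshold test is given below: an exponentially weighted baseline of mean and variance is updated in the background, and a fault is flagged when a sample deviates by more than k estimated standard deviations. The traffic trace, smoothing constant, and k are assumptions for illustration, not the statistics used in [ 382 ]:

```python
def ewma_detector(samples, alpha=0.1, k=4.0):
    """Flag samples that deviate from an exponentially weighted moving
    baseline by more than k estimated standard deviations; the baseline
    keeps adapting in the background on normal samples."""
    mean, var, alarms = float(samples[0]), 1.0, []
    for i, x in enumerate(samples[1:], start=1):
        if abs(x - mean) > k * var ** 0.5:
            alarms.append(i)          # fault suspected: freeze baseline
        else:
            d = x - mean              # normal sample: adapt baseline
            mean += alpha * d
            var = (1 - alpha) * (var + alpha * d * d)
    return alarms

# Hypothetical per-carrier traffic counts: a stable cyclic pattern with
# a fault-like spike injected at index 60.
traffic = [100 + (i % 5) for i in range(100)]
traffic[60] = 180
alarms = ewma_detector(traffic)
```

Because the baseline adapts only on samples deemed normal, the injected spike is flagged without polluting the threshold estimate.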

Baras et al. [ 37 ] implement a reactive system to detect and localize the root cause of faults for the X.25 protocol, by combining an NN with an expert system. Performance data, such as packet blocking, queue sizes, packet throughput from all applications, utilization of links connecting subnetworks, and packet end-to-end delays, are used to train an RBFNN for various faults. The output of the NN is a fault code that represents one of the various fault scenarios. A classifier leverages the aggregated output of the NN to determine whether the current status of the network is normal or faulty. The detection phase is repeated until a confidence of K out of M is achieved, which activates the expert system to collect evidence and deduce the location and cause of the fault.

Recently, Adda et al. [ 5 ] build a real-time fault detection and classification model using k -Means, Fuzzy C-Means (FCM), and EM. They leverage SNMP to collect information from the routers, switches, hubs, printers and servers in the IP network of a college campus. The authors select 12 features that exhibit sensitivity to the behavior of network traffic [ 370 ], and use the traffic patterns to form clusters that represent normal traffic, link failure, server crash, broadcast storm and protocol error. Their evaluation results reveal that though k -Means and EM are faster than FCM, FCM is more accurate.
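
The clustering step can be sketched with a plain k-Means (Lloyd's algorithm) over two hypothetical SNMP-derived traffic features; the feature choice, cluster shapes, and deterministic seeding are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# Two hypothetical SNMP-derived features per sample (packets/s,
# broadcast ratio): normal traffic plus a broadcast-storm-like cluster.
normal = rng.normal([200.0, 0.05], [20.0, 0.01], size=(80, 2))
storm = rng.normal([900.0, 0.70], [50.0, 0.05], size=(20, 2))
X = np.vstack([normal, storm])

def kmeans(X, k=2, iters=20):
    """Plain Lloyd's algorithm; for determinism with k=2, seed the
    centroids with the points having the smallest and largest first
    feature."""
    centroids = X[[int(X[:, 0].argmin()), int(X[:, 0].argmax())]]
    for _ in range(iters):
        dists = ((X[:, None, :] - centroids) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(axis=0)
    return labels

labels = kmeans(X)
```

In the surveyed system each resulting cluster would be mapped to a traffic class such as "normal" or "broadcast storm" by inspecting its centroid.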

Moustapha and Selmic [ 324 ] detect faulty nodes in a WSN using an RNN. The nodes in the RNN hidden layers model sensor nodes in the WSN, while the weights on the edges are based on confidence factors of the received signal strength indicators (RSSI). The output of the RNN is an approximation of the operation of the WSN, and fault detection is achieved by identifying discrepancies between approximated and real WSN values. The RNN successfully detects faults, without early false alarms, for a small-scale WSN with 15 sensors and synthetically introduced faults.

Recall that supervised fault detection requires models to be trained with normal and failure-prone datasets. In contrast, Hajji [ 178 ] proposes an unsupervised fault detection mechanism for fast detection of anomalies in LANs through traffic analysis. They design a parametric model of network traffic and a method for baselining normal network operations using successive parameter identification, instead of EM. The fault detection problem is formulated as a change point problem that observes the baseline random variable and raises an alarm as soon as the variable exceeds an expected value. Experimental evaluation validates the fault detection mechanism in real-time on a real network with high detection accuracy.
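
The change point formulation can be illustrated with a one-sided CUSUM test, a standard change detection scheme (not necessarily the exact statistic used in [ 178 ]); the utilization trace, reference value, and thresholds are assumptions:

```python
def cusum(samples, target, k=0.5, h=8.0):
    """One-sided CUSUM: accumulate deviations above target + k and
    raise an alarm once the cumulative sum crosses threshold h."""
    s, alarm_at = 0.0, None
    for i, x in enumerate(samples):
        s = max(0.0, s + (x - target - k))
        if s > h:
            alarm_at = i
            break
    return alarm_at

# Hypothetical LAN utilization: baseline around 10 with a persistent
# upward shift to around 13 starting at index 50.
baseline = [10 + (i % 3) - 1 for i in range(50)]
shifted = [13 + (i % 3) - 1 for i in range(30)]
alarm = cusum(baseline + shifted, target=10.0)
```

The slack k absorbs the normal fluctuation, so the statistic stays near zero until the shift begins and then grows steadily, triggering a few samples after the change.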

Recently, Hashmi et al. [ 181 ] use different unsupervised algorithms, such as k -Means, FCM, Kohonen's SOM, Local Outlier Factor, and Local Outlier Probabilities, to detect faults in a broadband service provider network that serves about 1.3 million customers. For this purpose, they analyze a real network failure log (NFL) dataset that contains the status of customer complaints, along with network generated alarms affecting a particular region during a certain time. The selected data spans a duration of 12 months and contains about 1 million NFL data points from 5 service regions of the provider. The collected NFL dataset has 9 attributes, out of which 5 are selected for the analysis: (i) fault occurrence date, (ii) time of the day, (iii) geographical region, (iv) fault cause, and (v) resolution time. At first, the k -Means, FCM and Kohonen's SOM clustering techniques are applied to cluster the NFL dataset, which is completely unlabeled. Afterwards, density-based outlier determination algorithms, such as Local Outlier Factor and Local Outlier Probabilities, are used on the clustered data to determine the degree of anomalous behavior for every SOM node. The evaluation results show that SOM outperforms k -Means and FCM in terms of the error metric. Furthermore, the Local Outlier Probabilities algorithm applied on SOM is more reliable in identifying the spatio-temporal patterns linked with high fault resolution times.
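
A density-based outlier score can be sketched as follows; for brevity this uses the mean distance to the k nearest neighbours, a simplified stand-in for the Local Outlier Factor and Local Outlier Probabilities algorithms, with hypothetical fault-record points:

```python
import math

def knn_outlier_scores(points, k=3):
    """Score each point by its mean distance to its k nearest
    neighbours -- a simplified stand-in for density-based detectors
    such as Local Outlier Factor."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q)
                       for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

# Hypothetical (region index, resolution time) points: a tight cluster
# of quickly resolved faults plus one fault with an unusually long
# resolution time.
points = [(1.0, 2.0), (1.1, 2.1), (0.9, 1.9), (1.0, 2.2), (1.2, 2.0),
          (8.0, 9.0)]
scores = knn_outlier_scores(points)
```

Points in dense regions receive low scores; the isolated record stands out with a score an order of magnitude higher.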

8.3 Localizing the root cause of faults

The next step in fault management is to identify the root cause and physically locate the fault to initiate mitigation. This minimizes the mean time to repair in a network that does not deploy a proactive fault prediction mechanism. Chen et al. [ 91 , 92 ] use DTs and clustering to diagnose faults in large network systems. The DTs are trained using a new learning algorithm, MinEntropy [ 91 ], on datasets of failure-prone network traces. To minimize convergence time and computational overhead, MinEntropy uses an early stopping criterion and follows the most suspicious path in the DT. Chen et al. [ 91 ] complement the DT with heuristics that correlate features with the number of detected failures, to aid in feature selection and fault localization. MinEntropy is validated against actual failures observed over several months on eBay [ 127 ]. For single-fault cases, the algorithm identifies more than 90% of the faults with low FPRs. In contrast, Chen et al. [ 92 ] employ clustering to group the successes and failures of requests. A faulty component is detected and located by analyzing the components that are only used in the failed requests. In addition to single-fault cases, the clustering approach can also locate faults occurring due to interactions amongst multiple components, with high accuracy and a relatively low number of false positives.

Ruiz et al. [ 393 ] use a BN to localize and identify the most probable cause of two types of failures, tight filtering and inter-channel interference, in optical networks. They discretize the continuous real-valued features of Quality of Transmission (QoT), such as received power and pre-forward error correction bit error rate (pre-FEC BER), into categories. The authors use these categories and the type of failure to train the BN, which can identify the root cause of a failure at the optical layer when a service experiences excessive errors. The BN achieves a high accuracy of 99.2% on synthetically generated datasets.

Similarly, Khanafer et al. [ 237 ] develop an automated diagnosis model for Universal Mobile Telecommunications System (UMTS) networks using a BN. The core elements of the diagnosis model are the causes and symptoms of faults. The authors consider two types of symptoms, i.e., alarms and Key Performance Indicators (KPI). To automatically specify KPI thresholds, they investigate two different discretization methods: an unsupervised method called Percentile-based Discretization (PBD) and a supervised method called Entropy Minimization Discretization (EMD). The performances of the two discretization methods are evaluated on a semi-dynamic UMTS simulator that allows the generation of the large amount of cause and symptom data required to construct the diagnosis model. As the EMD technique outperforms PBD by a large margin in the simulation study, the authors analyze the diagnosis model consisting of the BN and EMD in a real UMTS network, utilizing alarms and KPIs extracted from an operations and maintenance center. Using a 3-fold cross-validation test, the correct faults are diagnosed in 88.1% of the cases. In the remaining cases, the diagnosis is incorrect for the first cause but correct for the second, and the diagnosis model converges with around 100 data points.
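
The idea behind EMD can be sketched with a single entropy-minimizing cut point over a KPI (full EMD recurses and uses an MDL-based stopping criterion); the KPI values and fault-cause labels are assumptions for illustration:

```python
import math

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n)
                for c in (labels.count(l) for l in set(labels)))

def best_split(values, labels):
    """Entropy-minimization discretization, simplified to one cut:
    try every boundary and keep the one with the lowest weighted
    class entropy."""
    pairs = sorted(zip(values, labels))
    best = (float("inf"), None)
    for i in range(1, len(pairs)):
        left = [l for _, l in pairs[:i]]
        right = [l for _, l in pairs[i:]]
        w = (len(left) * entropy(left)
             + len(right) * entropy(right)) / len(pairs)
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        if w < best[0]:
            best = (w, cut)
    return best[1]

# Hypothetical KPI samples (e.g. a dropped-call rate) labeled with a
# fault cause: values above ~5 co-occur with the "congestion" cause.
kpi = [1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0]
cause = ["ok", "ok", "ok", "ok", "cong", "cong", "cong", "cong"]
threshold = best_split(kpi, cause)
```

The chosen cut separates the two causes perfectly, so the weighted entropy is zero and the KPI threshold falls midway between the two groups.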

Kiciman and Fox [ 241 ] propose PinPoint for fault detection and localization, which requires no a priori knowledge of the faults. The model captures the runtime path of each request served by the network and delineates it as a causal path. It exploits these paths to extract two low-level behaviors of the network: the path shape and the interaction of components. Using the set of previous path shapes modeled as a Probabilistic Context-Free Grammar (PCFG), it builds a dynamic and self-adapting reference model of the network. Fault detection is then a search for anomalies against the reference model. PinPoint uses a DT with ID3 to correlate an anomaly to its probable cause in the network. The DT is converted to an equivalent set of rules by generating a rule for each path from the root of the tree to a leaf. PinPoint ranks the rules, based on the number of paths classified as anomalous, to identify the hardware and/or software components that are correlated with the failures.

Johnsson et al. [ 225 ] use discrete state-space particle filtering to determine the locations of performance degradations in packet-switched networks. Their approach is based on active network measurements, probabilistic inference, and change detection in the network. They define a PMF over the locations of faulty components in the network. The result is a lightweight fault detection and isolation mechanism, capable of automatically detecting and identifying the location of the fault in simulations of different-sized tree topologies. It is important to realize that the time to fault localization depends on the precise position of the fault in the topology: links closer to the root are measured more often than links close to the leaf nodes, so the filter learns positions close to the root faster. In addition, the algorithm minimizes false positives and false negatives for the chosen parameter values.

Barreto et al. [ 40 ] develop an unsupervised approach to monitor the condition of cellular networks using competitive neural algorithms, including Winner-Take-All (WTA), Frequency-Sensitive Competitive Learning (FSCL), SOM, and Neural-Gas algorithm (NGA). The model is trained on state vectors that represent the normal functioning of a CDMA2000 wireless network. Global and local normality profiles (NPs) are built from the distribution of quantization errors of the training state vectors and their components, respectively. The overall state of the cellular network is evaluated using the global NP and the local NPs are used to identify the causes of faults. Evidently, the joint use of global and local NPs is more accurate and robust than applying these methods in isolation.

8.4 Automated mitigation

Automated mitigation improves fault management by minimizing and/or eliminating human intervention, and by reducing downtime. For proactive fault prediction, automated mitigation involves gathering information from the suspected network elements to help find the origin of the predicted fault. To build this information base, a fault manager may either actively poll selected network elements, or rely on passive submission of alarms from them. In both cases, actions should be selected carefully, since frequent polling wastes network resources, while too many false alarms diminish the effectiveness of automated mitigation. On the other hand, in the case of reactive fault detection, automated mitigation selects a workflow for troubleshooting the fault. Therefore, the fundamental challenge in automated mitigation is to select the optimal set of actions or workflow in a stochastic environment.

He et al. [ 183 ] address this fundamental challenge for proactive fault management using a POMDP to formulate the trade-off between monitoring, diagnosis, and mitigation. They assume partial observability to account for the fact that some monitored observations might be missing or delayed in a communication network. They propose an RL algorithm to obtain approximate solutions to the POMDP with a large number of states, representing real-life networks. The authors devise a preliminary policy in which the states are completely observable. Then, they fine-tune this policy by updating the belief space and transition probabilities in the real world, where the states are incompletely observed.
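
The flavor of this approach can be sketched with tabular Q-learning on a tiny, fully observable simplification of the monitoring/mitigation trade-off; the states, actions, costs, and transition probabilities are all assumptions for illustration, whereas the actual work operates on a POMDP over belief states:

```python
import random

random.seed(0)

# Tiny, fully observable stand-in for the mitigation decision problem;
# states, actions, costs, and transitions are assumptions.
STATES = ["healthy", "degraded", "failed"]
ACTIONS = ["wait", "repair"]

def step(state, action):
    """Hypothetical dynamics: waiting is free but lets faults worsen,
    repairing costs 5 but restores health, staying failed costs 20."""
    if action == "repair":
        return "healthy", -5.0
    if state == "healthy":
        return ("degraded" if random.random() < 0.1 else "healthy"), 0.0
    if state == "degraded":
        return ("failed" if random.random() < 0.5 else "degraded"), -1.0
    return "failed", -20.0

# Tabular Q-learning with an epsilon-greedy behavior policy.
Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
alpha, gamma, eps = 0.2, 0.9, 0.1
state = "healthy"
for _ in range(20000):
    if random.random() < eps:
        action = random.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=lambda a: Q[(state, a)])
    nxt, reward = step(state, action)
    target = reward + gamma * max(Q[(nxt, a)] for a in ACTIONS)
    Q[(state, action)] += alpha * (target - Q[(state, action)])
    state = nxt

policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in STATES}
```

The learned policy waits while the network is healthy and repairs once it has failed, mirroring the trade-off between acting too eagerly and letting costly failures persist.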

In contrast, for reactive fault detection, Watanabe et al. [ 470 ] propose a method for automatically extracting a workflow from unstructured trouble tickets to troubleshoot a network fault. A trouble ticket contains free-format text that provides a complete history of troubleshooting a failure. The authors use a supervised NB classifier to automatically assign the correct label to each sentence of a trouble ticket and remove unrelated sentences. They propose an efficient algorithm that aligns the same actions described in different sentences using multiple sequence alignment. Furthermore, clustering is used to find actions that have different mitigation steps depending on the situation. This aids operators in selecting the appropriate next action. Using real trouble tickets obtained from an enterprise service, the authors report a precision of over 83%.
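
The sentence-labeling step can be sketched with a small multinomial NB classifier with add-one smoothing; the toy ticket sentences and the two labels are assumptions for illustration:

```python
import math
from collections import Counter, defaultdict

# Hypothetical labeled ticket sentences: "action" lines describe a
# troubleshooting step, "noise" lines are unrelated chatter.
train = [
    ("restart the edge router", "action"),
    ("replace faulty line card", "action"),
    ("check interface counters", "action"),
    ("customer called again today", "noise"),
    ("ticket assigned to team b", "noise"),
]

class_counts = Counter(label for _, label in train)
word_counts = defaultdict(Counter)
vocab = set()
for text, label in train:
    for w in text.split():
        word_counts[label][w] += 1
        vocab.add(w)

def classify(text):
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""
    best_label, best_score = None, -math.inf
    for label in class_counts:
        total = sum(word_counts[label].values())
        score = math.log(class_counts[label] / len(train))
        for w in text.split():
            score += math.log((word_counts[label][w] + 1)
                              / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

label = classify("restart the faulty card")
```

Sentences labeled "noise" would be dropped before the alignment and clustering stages build the workflow.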

8.5 Summary

As summarized in Tables  16 , 17 and 18 , most of the ML-based fault management approaches use different supervised learning techniques that depend on training data to predict/detect/locate faults in the network. However, a common challenge faced by these techniques is the scarcity of fault data generated in a production network. While both normal and fault data are easily available for a test or simulated network, only normal data with infrequent faults are routinely available for a production network. Although injecting faults can help produce the required data [ 285 ], it is unrealistic to inject faults in a production network just for the sake of generating training data. On the other hand, synthetic data generated in a test or simulated network may not perfectly mimic the behavior of a production network. Such limitations increase the probability of ML techniques being ill-trained in an unfamiliar network setting. As a remedy, some approaches leverage unsupervised techniques that rely on detecting changes in network states instead of using labeled fault data. However, unsupervised techniques can take longer to converge than supervised approaches, potentially missing faults that occur before convergence. Therefore, a potential research direction is to explore the applicability of semi-supervised and RL-based techniques for fault management.

The ML-based fault management approaches surveyed in this paper focus on a variety of networks. Consequently, the fault scenarios studied in these approaches vary greatly, as they depend both on the layer (e.g. physical, link, or IP layer) and the type (e.g. cellular, wireless, local area network) of the network. The same holds for the feature set and output of these schemes, as both depend on the fault scenario of a particular network. In addition, the evaluation settings adopted by these approaches lack uniformity. Therefore, a pairwise comparison between the evaluation results of two approaches in any of the Tables  16 , 17 and 18 may be misleading. Nonetheless, it is clear that ML techniques can aid the cumbersome and human-centric fault management process, by either predicting faults in advance, or narrowing down the cause or location of a fault that could not be avoided in the first place.

9 QoS and QoE management

Knowledge of the impact of network performance on user experience is crucial, as it determines the success, degradation or failure of a service. User experience assessment has attracted a lot of attention. In early works, there was no differentiation between user experience and network QoS. User experience was then measured in terms of network parameters (e.g. bandwidth, packet loss rate, delay, jitter), and application parameters, such as bitrate for multimedia services. While monitoring and controlling QoS parameters is essential for delivering high service quality, it is more crucial, especially for service providers, to evaluate service quality from the user's perspective.

User QoE assessment is complex, as individual experience depends on individual expectation and perception. Both are subjective in nature, and hard to quantify and measure. QoE assessment methods have gone through different stages over the last decade, from subjective testing, through objective quality modeling, to engagement measurement. Subjective testing, where users are asked to rate a service with opinion scores averaged into a mean opinion score (MOS), has been and is still widely used. Subjective testing is simple and easy to implement, and the MOS metric is easy to compute. However, because users cannot be forced to rate a service, let alone rate it objectively, MOS scores can be unfair and biased, and are subject to outliers. Objective quality models, such as the video quality metric (VQM) [ 362 ], the perceptual evaluation of speech quality (PESQ) metric [ 386 ] and the E-model [ 51 ] for voice and video services, were proposed to objectively assess service quality as perceived by human beings and infer more "fair" and unbiased MOS. Full-reference (FR) quality models, like PESQ and VQM, compute quality distortion by comparing the original signal against the received one. They are as such accurate, but at the expense of a high computational effort. On the contrary, no-reference (NR) models like the E-model try to assess the quality of a distorted signal without any reference to the original signal. They are more efficient to compute, but may be less accurate. More recently, measurable user engagement metrics, such as service time and probability of return, have emerged from data-driven QoE analysis. Such metrics are found to more directly capture the impact of user quality perception on content providers' business objectives.

Statistical and ML techniques have been found useful in linking QoE to network- and application-level QoS, and in understanding the impact of the latter on the former. Linear and non-linear regression (e.g. exponential, logarithmic, power regression) have been used to quantify the individual and collective impact of network- and application-level QoS parameters (e.g. packet loss ratio, delay, throughput, round-trip time, video bitrate, frame rate, etc.) on the user's QoE. In the literature, simple-regression models with a single feature are the most common [ 145 , 240 , 383 , 408 ], although the collective impact of different QoS parameters has also been considered [ 23 , 132 ].

Simple regression:

In [ 408 ], Shaikh et al. study the correlation between different network-level QoS parameters and MOS in the context of web surfing. They show that a correlation does exist and that, among three forms of regression (linear, exponential, and logarithmic), linear regression yields the best correlation coefficient between QoE and packet loss rate, exponential regression captures the correlation between QoE and file download time with the highest accuracy, whereas logarithmic regression is the best fit for linking QoE to throughput.

Reichl et al. [ 383 ], in alignment with the Weber-Fechner law from the field of psychophysics, use logarithmic regression to quantify the correlation between available bandwidth and mobile broadband service users’ MOS.

In [ 145 ], Fiedler et al. test the IQX hypothesis according to which QoE and QoS parameters are connected through an exponential relationship. Their experiment validates the IQX hypothesis for VoIP services, where PESQ-generated MOS is expressed as a function of packet loss, and reordering ratio caused by jitter. For web surfing, exponential mappings are shown to outperform a previously published logarithmic function.
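
An IQX-style mapping can be sketched as follows: assuming the additive offset of the exponential law is zero, QoE = α·e^(−β·QoS) log-linearizes and can be fitted by ordinary least squares. The loss/MOS pairs below are synthetic values chosen for illustration:

```python
import math

# Hypothetical (packet-loss %, MOS) pairs roughly following
# QoE = alpha * exp(-beta * loss); the offset gamma is assumed zero
# so that the model log-linearizes.
loss = [0.5, 1.0, 2.0, 4.0, 8.0]
mos = [4.2, 3.9, 3.3, 2.4, 1.3]

# Least squares on log(MOS) = log(alpha) - beta * loss.
n = len(loss)
x_mean = sum(loss) / n
y = [math.log(m) for m in mos]
y_mean = sum(y) / n
slope = (sum((x - x_mean) * (v - y_mean) for x, v in zip(loss, y))
         / sum((x - x_mean) ** 2 for x in loss))
beta = -slope
alpha = math.exp(y_mean + beta * x_mean)

# Interpolated MOS at 3% packet loss.
predicted_mos = alpha * math.exp(-beta * 3.0)
```

The fitted β captures how quickly perceived quality decays with loss, which is exactly the sensitivity the IQX hypothesis postulates.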

Steven’s power law from the field of psychophysics, according to which there is a power correlation between the magnitude of a physical stimulus and the intensity or strength that people feel, was applied by Khorsandroo et al. [ 239 , 240 ] to find a power mapping function between MOS and packet loss ratio. A comparative study shows that the proposed power correlation is outperformed by the logarithmic correlation from [ 383 ].

Multi-parameter regression:

In order to grasp the impact of the global network condition on QoE, Elkotob et al. [ 132 ] propose mapping MOS to a set of QoS parameters (e.g. packet loss rate, frame rate, bandwidth, round-trip time and jitter) as opposed to a single one. This idea was further promoted by Aroussi et al. [ 23 ], who propose a generic exponential correlation model between QoE and several QoS parameters based on the IQX hypothesis.

More complex regression and classification models based on supervised and unsupervised ML techniques (including deep learning) have also been proposed and tested against real-life and trace-driven datasets. We report below on the characteristics of the surveyed models and their performance in terms of accuracy, generally measured by MSRE, and linearity, generally measured by the Pearson correlation coefficient (PCC), all summarized in Tables  19 and 20 .

9.1 QoE/QoS correlation with supervised ML

In [ 235 , 236 ], Khan et al. propose an Adaptive Neural Fuzzy Inference System (ANFIS)-based model to predict streamed video quality in terms of MOS. They also investigate the impact of QoS on end-to-end video quality for H.264 encoded video, and in particular the impact of radio link loss models in UMTS networks. A combination of physical and application layer parameters is used to train both models. Simulation results show that both models give good prediction accuracy. However, the authors conclude that the choice of parameters is crucial in achieving good performance. The proposed models in this paper need to be validated by more subjective testing. Other works like [ 501 ] have also used the ANFIS approach to identify the causal relationship between the QoS parameters that affect the QoE and the overall perceived QoE.

MLP-NNs are also reported to efficiently estimate the QoE by Machado et al. [ 287 ], who adopt a methodology that is similar to Khan et al. [ 235 ]. In this work, QoE is estimated by applying an MLP over network-related features (delay, jitter, packet loss, etc.) as well as video-related features (type of video, e.g. news, football, etc.). Different MLP models are generated for different program-generated QoE metrics, including Peak-Signal-to-Noise-Ratio (PSNR), MOS, Structural SIMilarity (SSIM) [ 468 ], and VQM. A synthetic video streaming dataset of 565 data points is created with EvalVid integrated into NS-2, and the models are trained over 70% of the database for parameter fine-tuning. It is observed that different QoE metrics can lead to very different model parameters. For instance, for the estimated MOS metric, the best results are achieved by a single hidden-layer MLP with 10 neurons trained over 2700 epochs, whereas for SSIM, two hidden layers with 12 and 24 neurons, respectively, trained over 1800 epochs are needed to achieve similar results. With an MSE of ≈0.01, the MOS-MLP model outperforms the other models. Nevertheless, with appropriate configuration, all the models are able to predict the QoE with very high accuracy.

In [ 328 ], Mushtaq et al. apply six ML classifiers to model the QoE/QoS correlation, namely NB, SVM, k -NN, DT, RF and NN. A dataset is generated from a controlled network environment where streamed video traffic flows through a network emulator and different delay, jitter, and packet loss ratios are applied. Opinion scores are collected from a panel of viewers and MOS are calculated. The ML models are fed with nine features related to the viewers, the network condition and the video itself, namely: viewer gender, frequency of viewing, interest, delay, jitter, loss, conditional loss, motion complexity and resolution. A 4-fold cross-validation is performed to estimate the performance of the models. Results show that DT and RF perform slightly better than the other models, with mean absolute errors of 0.126 and 0.136, and TPRs of 74% and 74.8%, respectively. The parameters of the models are not disclosed, and neither is the significance of the selected features, in particular the viewer-related ones, whose usefulness and practicality in real-life deployment are questionable.

In [ 89 ], Charonyktakis et al. develop MLQoE, a modular user-centric algorithm based on supervised learning to correlate QoE and network QoS metrics for VoIP services. The algorithm is modular in that it trains several supervised learning models based on SVR, single hidden-layer MLP-NN, DT, and GNB, and, after cross-validation, selects the most accurate model. It is user-centric in that a model is generated for each user, which makes it computationally costly and time consuming. Three datasets are generated synthetically, with calls established in three different testbeds under different network conditions: during handover (dataset 1), in a network with heavy UDP traffic (dataset 2), and in a network with heavy TCP traffic (dataset 3); OMNET++ and a VoIP tool are used for this purpose. The QoE of the received calls is assessed through both subjective testing (user-generated MOS) and objective measurement (PESQ and E-model). The no-reference ML models are trained with 10 network metrics (including average delay, packet loss, and average jitter) to output predicted MOS. The accuracy of the MLQoE model in predicting MOS, and the accuracies of pure SVR, NN, DT and GNB models, are further compared against the full-reference PESQ, the no-reference E-model, as well as the predictive accuracies of the single-feature simple-regression WFL [ 383 ] and IQX [ 145 ] models. Experiments show that, in terms of mean absolute error (MAE), the supervised learning models generally outperform the E-model and even the full-reference PESQ; only one exception is observed with dataset 2. They also show that no single ML model outperforms all others: while the SVR model has the lowest MAE on dataset 1 (0.66), DT achieves the best result on dataset 2 (0.55) and GNB on dataset 3 (0.43). MLQoE further outperforms the WFL and IQX models with a MAE improvement of 18∼42%. Indeed, this motivates the need for a modular ML-based QoE prediction algorithm. However, further research could be pursued to study the correlation between the performance of the different ML models and the way the QoS parameters evolve in each of the three datasets.

Another subset of ML techniques is considered by Demirbilek et al. [ 114 ] to develop no-reference models that predict QoE for audiovisual services. These techniques include decision tree ensemble methods (RF and BG) and deep learning (DNN). Genetic programming (GP) is also considered and compared against the ML techniques. All models are trained and validated through 4 ∼ 10-fold cross-validation on the INRS dataset [ 113 ]. The dataset includes user-generated MOS on audiovisual sequences encoded and transmitted with varying video frame rates, quantization parameters, filters, and network packet loss rates. A total of 34 no-reference application- and network-level features are considered. Experiments with different feature sets show that, apart from the DNN model, all models perform better with the complete set of features and hence do not require feature processing. In contrast, the DNN model performs better when trained with only 5 independent features, namely: video frame rate, quantization, noise reduction, video packet loss rate, and audio packet loss rate. Also, the one-hidden-layer DNN model outperforms the model with 20 hidden layers in terms of RMSE (0.403 vs. 0.437) and PCC (0.909 vs. 0.894). The conducted experiments also show that all models perform quite well and that the RF model with the complete set of features performs best (lowest RMSE of 0.340 and highest PCC of 0.930). The video packet loss rate appears to be the most influential feature for the RF model. The model is further trained on other publicly available audiovisual datasets and still performs well. However, it is not compared to the other models on these datasets, which would be useful to confirm or refute the superiority of RF.
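RMSE and PCC, the two metrics used above to rank the models, are simple to compute; a minimal pure-Python sketch follows (the toy MOS values are invented for illustration):

```python
import math

def rmse(pred, true):
    """Root-mean-square error between predicted and subjective MOS."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))

def pcc(pred, true):
    """Pearson correlation coefficient, the second metric used in [114]."""
    n = len(true)
    mp, mt = sum(pred) / n, sum(true) / n
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, true))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in true))
    return cov / (sp * st)

mos_true = [4.2, 3.1, 2.5, 3.8, 1.9]   # subjective scores (toy)
mos_pred = [4.0, 3.3, 2.4, 3.6, 2.2]   # model outputs (toy)
```

A good predictor pushes RMSE toward 0 and PCC toward 1, which is exactly the direction of the RF results reported above (0.340 and 0.930).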

9.2 QoE prediction under QoS impairments

In [ 453 ], Vega et al. propose an unsupervised deep learning model based on Restricted Boltzmann Machines (RBMs) for real-time quality assessment of video streaming services. More precisely, the model is intended to infer the no-reference features of the received video from only a subset of those features, which the client extracts in real time. Ten video-related features are extracted: one related to the bit stream, five to the frame, two to the inter-frame relations, and the last two to the content. Network QoS parameters are not included in the feature set; however, the impact of network conditions is studied in the paper using two synthetic network-impaired video datasets, namely ReTRiEVED (for general network conditions) and LIMP (for extremely lossy networks). It is observed that the PCC between the VQM of the received video and the bit rate feature is the highest among the ten features under network delay, jitter, and throughput impairments. Under severe packet loss, however, it is the blur ratio that correlates most with VQM. A discrepancy between video types was also recorded. This eventually motivated the need for one RBM model (with a different feature set) per video type and network impairment, raising the number of devised models to 32. According to the authors, the video-type- and network-condition-specific RBMs (with 100 hidden neurons) show better performance than the single video-type- and network-condition-oblivious model on the ReTRiEVED dataset, which contradicts the results shown in the tables. Assuming that there is an improvement, the practicality and overhead of the multi-RBM solution are yet to be evaluated. In fact, delay, jitter, and throughput impairments are treated as if they were independent conditions, and a condition-specific model is created for each. In practice, however, impairments are correlated and occur together.
Therefore, if the client has to assess the quality of the streamed video, it will also have to find out what impairment there is prior to selecting the appropriate predictor.
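To make the RBM mechanics concrete, the sketch below trains a tiny binary RBM with one-step contrastive divergence (CD-1) and uses it to reconstruct a visible vector from its hidden representation. This is only a toy stand-in for the models of [ 453 ], which use 100 hidden units and real video features; the data, sizes, and learning rate here are our own assumptions:

```python
import math, random

rng = random.Random(0)
sigmoid = lambda x: 1.0 / (1.0 + math.exp(-x))

class TinyRBM:
    """Minimal binary RBM trained with CD-1 (illustrative only)."""
    def __init__(self, n_vis, n_hid, lr=0.1):
        self.W = [[rng.gauss(0, 0.1) for _ in range(n_hid)] for _ in range(n_vis)]
        self.b = [0.0] * n_vis        # visible biases
        self.c = [0.0] * n_hid        # hidden biases
        self.lr = lr

    def _hid_probs(self, v):
        return [sigmoid(self.c[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.c))]

    def _vis_probs(self, h):
        return [sigmoid(self.b[i] + sum(h[j] * self.W[i][j] for j in range(len(h))))
                for i in range(len(self.b))]

    def train_step(self, v0):
        h0 = self._hid_probs(v0)
        hs = [1.0 if rng.random() < p else 0.0 for p in h0]
        v1 = self._vis_probs(hs)                  # one-step reconstruction
        h1 = self._hid_probs(v1)
        for i in range(len(v0)):                  # CD-1 weight update
            for j in range(len(h0)):
                self.W[i][j] += self.lr * (v0[i] * h0[j] - v1[i] * h1[j])
        self.b = [b + self.lr * (a - r) for b, a, r in zip(self.b, v0, v1)]
        self.c = [c + self.lr * (a - r) for c, a, r in zip(self.c, h0, h1)]

    def reconstruct(self, v):
        return self._vis_probs(self._hid_probs(v))

# Learn two binary "feature profiles" and reconstruct one of them.
data = [[1, 1, 0, 0], [0, 0, 1, 1]]
rbm = TinyRBM(4, 3)
for _ in range(500):
    rbm.train_step(rng.choice(data))
r = rbm.reconstruct([1, 1, 0, 0])
```

The reconstruction step is the operation [ 453 ] exploits: feed the RBM the features the client can afford to extract, and read the remaining feature estimates off the reconstruction.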

9.3 QoS/QoE prediction for HAS and DASH

Recently, the widespread adoption of HTTP adaptive streaming (HAS) has driven increasing interest in developing QoE/QoS-aware HAS clients. Data-driven approaches, in particular ML, have been employed mainly in two ways: (1) to predict changes in network QoS, namely throughput, and trigger adaptation mechanisms that reduce rebuffering time [ 432 ], and (2) to select the appropriate adaptation action [ 102 ].

It has been shown in recent work [ 432 ] that accurate throughput prediction can significantly improve the QoE for adaptive video streaming. ML has been widely used for throughput prediction in general, as shown in Section 3 . In the particular context of adaptive video streaming, Sun et al. propose in [ 432 ] the Cross Session Stateful Predictor (CS2P), a throughput prediction system to help with bitrate selection and adaptation in HAS clients. CS2P uses an HMM to model the state-transition evolution of throughput, with one model per session cluster, where sessions are clustered according to common features (e.g. ISP, region). The system is tested with a video provider (iQIYI) dataset consisting of 20 million sessions covering 3 million unique client IPs, 18 server IPs, and 87 ISPs. The HMM is trained offline via the expectation maximization algorithm, and 4-fold cross-validation is used for tuning the number of states (6 states in total). Online prediction provides an estimate of the throughput 1 ∼ 10 epochs ahead using maximum likelihood estimation. Throughput is continuously monitored and the model is updated online accordingly. Midstream throughput prediction experiments show that the model achieves a 7% median absolute normalized prediction error ( ∼ 20% 75th-percentile error), reducing the median prediction error by up to 50% compared to history-based predictors (last sample, harmonic mean, AR) as well as other ML-based predictors (SVR, GBR, and an HMM trained on all sessions as opposed to the session cluster). It is also shown that CS2P achieves a 3.2% improvement in overall QoE and a 10.9% higher average bitrate over the state-of-the-art Model Predictive Control (MPC) approach, which uses the harmonic mean for throughput prediction. The authors note that SVR and GBR perform poorly when trained on the session cluster. This might be due to the smaller size of the session-cluster dataset, but requires further investigation.
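CS2P's HMM is trained with EM over hidden throughput states; as a deliberately simplified, fully observed stand-in, a count-based Markov chain over discretized throughput states illustrates the same idea of maximum-likelihood prediction several epochs ahead. The state boundaries and the toy trace below are our own assumptions, not CS2P's:

```python
from collections import defaultdict

def train_transitions(states):
    """Count-based maximum-likelihood transition estimates over
    discretized throughput states (a fully observed simplification
    of CS2P's 6-state HMM)."""
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1
    return counts

def predict_ahead(counts, state, epochs=1):
    """Most likely state `epochs` steps ahead (greedy ML chain walk)."""
    for _ in range(epochs):
        nxt = counts.get(state)
        if not nxt:
            return state          # unseen state: predict persistence
        state = max(nxt, key=nxt.get)
    return state

def discretize(mbps, edges=(1, 3, 5)):
    """Map raw throughput (Mbps) to a small state index."""
    return sum(mbps >= e for e in edges)

trace = [0.5, 2.0, 4.0, 2.1, 4.2, 2.2, 4.1, 6.0]   # toy session (Mbps)
states = [discretize(t) for t in trace]
model = train_transitions(states)
```

In CS2P the analogous model is learned per session cluster, and the predicted state feeds the bitrate-selection logic of the HAS client.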

In [ 102 ] (which extends [ 103 , 357 ]), a Q-learning-based HAS client is proposed to dynamically adjust to current network conditions while optimizing the QoE. Adaptation is assumed at the segment level: the quality level (e.g. bitrate) of the next video segment may go higher or lower depending on network conditions. States are defined as a combination of the client buffer filling level and the throughput level. B_max/T_seg + 1 different buffer filling levels are considered, where B_max denotes the maximum client buffer size in seconds and T_seg the segment duration in seconds, while N + 1 throughput levels are considered, ranging between 0 and the client link capacity, where N is the number of quality levels. The reward function to be maximized is a measure of the QoE, calculated on the basis of the targeted segment quality level, the span between the current and targeted quality levels, and the rebuffering level (which may result in video freezes).
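The state/action/reward structure above maps directly onto a tabular Q-learning update. The sketch below is a minimal illustration: the reward weights w_q, w_s, w_r and the learning parameters are our own assumptions, not the exact coefficients used in [ 102 ]:

```python
def q_update(Q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.95):
    """One tabular Q-learning step for a HAS client."""
    best_next = max(Q.get((next_state, a), 0.0) for a in actions)
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + alpha * (reward + gamma * best_next - old)

def reward_fn(quality, prev_quality, rebuffer_s,
              w_q=1.0, w_s=0.5, w_r=4.0):
    """Illustrative QoE reward: favour high quality, penalise quality
    switches and rebuffering (weights are hypothetical)."""
    return w_q * quality - w_s * abs(quality - prev_quality) - w_r * rebuffer_s

# State: (buffer-fill level, throughput level); action: next segment quality.
Q = {}
actions = range(7)                       # N = 7 quality levels
state = (3, 4)
r = reward_fn(quality=5, prev_quality=3, rebuffer_s=0.0)
q_update(Q, state, action=5, reward=r, next_state=(4, 4), actions=actions)
```

Over many streaming episodes the Q-table converges toward segment-quality choices that trade off bitrate, switching smoothness, and rebuffering, which is the behavior evaluated in the experiments below.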

The model is trained and tested in NS-3 with six different 10-minute video sequences, each split into 2-second segments encoded at N = 7 different bitrates. The algorithm is trained for 400 episodes of streaming each of the video sequences over a publicly available 3G network bandwidth trace [ 384 , 385 ]. The authors report that the Q-learning client achieves on average a 9.12 % higher estimated MOS (program-generated), with a 16.65 % lower standard deviation, than the traditional Microsoft IIS Smooth Streaming (MSS) client. Similar performance is recorded when alternating between 2 video sequences every 100 streaming episodes. However, shifting to a random new video sequence after convergence was not investigated.

9.4 Summary

Research in QoS/QoE provisioning has been leveraging ML for both prediction and adaptation, as shown in Tables  19 and 20 . Clearly, research has been dominated by works on predicting QoE based on video-level and network-level features. As such, a number of different QoS/QoE correlation models have been proposed in the literature for different media types (e.g. voice, video and image) ranging from simple regression models to NNs, including SVM, DT, RF, etc. For each media type, different QoE assessment methods and metrics have been used (e.g. MOS, VQM), each with its own set of computational and operational requirements. The lack of a common, standard QoE measure makes it difficult to compare the efficiency of different QoS/QoE prediction and correlation models. In addition, there is a lack of a clear quantitative description of the impact of network QoS on QoE. This impact is poorly understood and varies from one scenario to another. While some find it sufficient to correlate the QoE to a single network QoS parameter [ 145 , 240 , 383 , 408 ], e.g. delay or throughput, others argue that multiple QoS parameters impact the QoE and need to be considered in tandem as features in a QoE/QoS correlation model [ 89 , 102 , 114 , 235 , 287 , 328 , 432 ]. Still others consider QoS as a confounding parameter and build different QoE assessment models for different network QoS conditions [ 453 ].

This motivates the need for an efficient methodology for QoE/QoS correlation, based on a combination of quantifiable subjective and objective QoS measures and outcomes of service usage. This calls for identifying the influential factors of QoE for a given type of service and understanding their impact on users' expectations and satisfaction. QoE measures, such as MOS, and user engagement metrics are very sensitive to contextual factors. Though contextual information undoubtedly influences QoE and is necessary to develop relevant QoE optimization strategies, it can raise privacy concerns.

Results depicted in Table  19 show that supervised learning techniques, such as NNs, SVR, DT and RF, achieve consistently low MOS prediction errors. According to [ 89 ], RF is a better classifier than NN when it comes to predicting MOS. Table  20 also shows that applying ML, both supervised learning and RL, for prediction and adaptation in HAS and DASH can improve QoE. However, this still needs to be validated in a real-world testbed.

10 Network security

Network security consists of protecting the network against cyber-threats that may compromise the network’s availability, or yield unauthorized access to or misuse of network-accessible resources. Undoubtedly, businesses are constantly under security threats [ 231 ], which not only cost billions of dollars in damage and recovery [ 227 ], but can also have a detrimental impact on their reputation. Therefore, network security is a fundamental cornerstone of network operations and management.

It is undeniable that we are now facing a cyber arms race, where attackers are constantly finding clever ways to attack networks, while security experts are developing new measures to shield the network from known attacks, and most importantly zero-day attacks. Examples of such security measures include:

Encryption of network traffic , especially the payload, to protect the integrity and confidentiality of the data in the packets traversing the network.

Authorization using credentials , to restrict access to authorized personnel only.

Access control , for instance, using security policies to grant different access rights and privileges to different users based on their roles and authorities.

Anti-viruses , to protect end-systems against malware, e.g. Trojan horses, ransomware, etc.

Firewalls , hardware or software-based, to allow or block network traffic based on pre-defined set of rules.

However, encryption keys and login credentials can be breached, exposing the network to all kinds of threats. Furthermore, the prevention capabilities of firewalls and anti-viruses are limited by their prescribed sets of rules and patches. Hence, it is imperative to include a second line of defense that can detect early symptoms of cyber-threats and react quickly enough before any damage is done. Such systems are commonly referred to as Intrusion Detection/Prevention Systems (IDS/IPS). IDSs monitor the network for signs of malicious activities and can be broadly classified into two categories: misuse-based and anomaly-based systems. While the former rely on signatures of known attacks, the latter are based on the notion that intrusions exhibit behavior that is quite distinct from normal network behavior. Hence, the general objective of anomaly-based IDSs is to define the “normal behavior” in order to detect deviations from this norm.

When it comes to the application of ML for network security, through our literature survey we have found that the majority of works have focused on the application of ML for intrusion detection. Here, intrusion detection refers to detecting any form of attacks that may compromise the network e.g. probing, phishing, DoS, DDoS, etc. This can be seen as a classification problem. While there is a body of work on host-based intrusion detection (e.g. malware and botnet detection), we do not delve into this topic, as most of these works utilize traces collected from the end-host (sometimes in correlation with network traces). Concretely, in our discussion, we focus on network-based intrusion detection and we classify the works into three categories, namely misuse, anomaly, and hybrid network IDSs.

Previous surveys [ 82 , 161 , 447 ] looked at the application of ML for cyber-security. However, [ 161 , 447 ] cover the literature between 2000 and 2008, leaving out a decade of work. More recently, [ 82 ] looked at the application of data mining and ML for cyber-security intrusion detection. The proposed taxonomy consists of the different ML techniques with a sample of efforts that apply the corresponding technique. Our discussion is different, as we focus on ML-based approaches with a quantitative analysis of existing works (Tables  21 , 22 , 23 , 24 and 25 ). Furthermore, we survey recently published efforts related to SDN and reinforcement learning.

10.1 Misuse-based intrusion detection

Misuse-based IDSs consist of monitoring the network and matching the network activities against the expected behavior of an attack. The key component of such a system is the comprehensiveness of the attack signatures. Typically, the signatures fed to a misuse-IDS rely on expert knowledge [ 84 ]. The source of this knowledge can either be human experts, or it can be extracted from data. However, the huge volume of generated network traces renders manual inspection practically impossible. Furthermore, attack signatures extracted by sequentially scanning network traces will fail to capture advanced persistent threats or complex attacks with intermittent symptoms. Intruders can easily evade detection if the signatures rely on a stream of suspicious activities by simply inserting noise in the data.

In light of the above, ML became the tool of choice for misuse-based IDSs. Its ability to find patterns in big datasets fits the need to learn signatures of attacks from collected network traces. Hence, it comes as no surprise to see a fair amount of literature [ 20 , 84 , 90 , 252 , 322 , 344 , 354 , 402 , 421 ] relying on ML for misuse detection. These efforts are summarized in Table  21 . Naturally, all existing works employ supervised learning, and the majority perform the detection offline. Note that we classify all works that use normal and attack data in their training sets as misuse detection.

The earliest work to employ ML for misuse detection is [ 84 ]. It was among the first to highlight the limitations of rule-based expert systems, namely that they (i) fail to detect variants of known attacks, (ii) require constant updating, and (iii) fail to correlate multiple individual instances of suspicious activities if they occur in isolation. Following the success of NNs in the detection of computer viruses, the application of NN for misuse detection is proposed as an alternative to rule-based systems. The advantages of a NN are its ability to analyze network traces in a less structured manner (as opposed to rule-based systems), and to provide predictions in the form of probabilities. The latter can enable the detection of variants of known attacks. For evaluation, training and testing datasets are generated using RealSecure™, a tool that monitors network data and compares it against signatures of known attacks. For the attack dataset, the Internet Scanner™ [ 368 ] and Satan Scanner [ 143 ] tools are used to generate port scans and SYN-flood attacks on the monitored host. Results show that the NN is able to correctly identify normal and attack records 89-91% of the time.

In 1999, the KDD Cup was launched in conjunction with the KDD’99 conference. The objective of the contest was the design of a classifier capable of distinguishing between normal and attack connections in a network. A dataset was publicly provided for this contest [ 257 ], and has since become the primary dataset used in the ML-based intrusion detection literature. It consists of 4 attack categories, namely DoS, probing, user-to-root (U2R) and root-to-local (R2L), in addition to normal connections. The top three contestants employed DT-based solutions [ 421 ]. The winner of the contest [ 358 ] used an ensemble of 50 × 10 C5 DTs with a mixture of bagging and boosting [ 377 ]. The results of the proposed method are presented in Table  21 . Clearly, the proposed approach performs poorly for the U2R and R2L attack categories. The authors do mention that many of their decisions were pragmatic, and encouraged more scientific endeavors. Subsequently, an extensive body of literature on ML-based intrusion detection using the KDD’99 dataset emerged in an effort to improve on these results, with some using the winners’ results as a benchmark.

For instance, Moradi et al. [ 322 ] investigate the application of NN for multi-class classification using the KDD’99 dataset. Specifically, the authors focused on DoS and probing attacks. As opposed to the work of [ 84 ], two NNs were trained: one with a single hidden layer and the second with two hidden layers, to increase the precision of attack classification. They leverage the Early Stopping Validation Method [ 366 ] to reduce training and validation time of the NN to less than 5 hours. As expected, the NN with 2 hidden layers achieves a higher accuracy of 91%, compared to the 87% accuracy of the NN with a single hidden layer.

Amor et al. [ 20 ] compare NB and DT, also using the KDD’99 dataset, and promote NB’s linear training and classification times as a competitive alternative to DT. NB is found to be 7 times faster than DT in learning and classification. Over all attack classes, DT shows slightly higher accuracy than NB. However, NB achieves better accuracy for DoS, R2L, and probing attacks. Both NB and DT perform poorly for R2L and U2R attacks. In fact, Sabhnani and Serpen [ 398 ] show that no classifier can be trained successfully on the KDD dataset to perform misuse detection for the U2R or R2L attack categories. This is due to the deficiencies and limitations of the KDD dataset rather than the inadequacies of the proposed algorithms.

The authors found, via multiple analysis techniques, that the training and testing datasets represent dissimilar hypotheses for the U2R and R2L attack categories; hence, any algorithm that attempts to learn the signatures of these attacks from the training dataset is bound to perform poorly on the testing dataset. Yet, the work in [ 344 ] reports surprisingly impressive detection accuracy for U2R and R2L. Here, a hybrid of a BP NN with C4.5 is proposed, where the BP NN is used to detect DoS and probing attacks, and C4.5 for U2R and R2L. For U2R and R2L, only a subcategory of attacks is considered (yielding a total of 11 U2R connections out of more than 200 in the original dataset, and ∼ 2000 out of more than 15000 R2L connections). After-the-event analysis is also performed to feed C4.5 with new rules in the event of misclassification.

Other seminal works consider hybrid and ensemble methods for misuse detection [ 90 , 354 , 421 ]. The goal of ensemble methods is to integrate different ML techniques so as to leverage their benefits and overcome their individual limitations. When applied to misuse detection, and more specifically to the KDD’99 dataset, these works focus on identifying which ML technique works best for each class of connections. For instance, Peddabachigari et al. [ 354 ] propose an IDS that leverages an ensemble of DT, SVM with a polynomial kernel function, and a hybrid DT-SVM to detect different cases of misuse. Through empirical evaluation, the resulting IDS consists of DT for U2R, SVM for DoS, and DT-SVM to detect normal traffic. The ensemble of the 3 methods together (with a voting mechanism) is used to detect probing and R2L attacks. The resulting accuracy for each class is presented in Table  21 .

Stein et al. [ 421 ] employ DT with a GA. The goal of the GA is to pick the best feature set out of the 41 features provided in the KDD’99 dataset. DT with GA is performed for every category of attacks, yielding a total of 4 DTs. The average error rate achieved by each DT at the end of 20 runs is reported in Table  21 . Another interesting ensemble learning approach is proposed in [ 90 ], where the ensemble is composed of pairs of feature set and classification technique. More specifically, the BN and CART classification techniques are evaluated on the KDD’99 dataset with different feature sets. Markov blanket [ 353 ] and Gini [ 76 ] are adopted as feature selection techniques for BN and CART, respectively. The Markov blanket identifies the only knowledge needed to predict the behavior of a particular node, where a node here refers to a category of attacks. The Gini coefficient measures how well the splitting rules in CART separate the different categories of attacks; this is achieved by pruning away branches with high classification error. For BN, 17 features out of 41 are chosen during the data reduction phase; for CART, 12 variables are selected. CART and BN are trained on the 12- and 17-feature sets, as well as on a 19-feature set from [ 326 ]. The final ensemble method is described using pairs (#features, classification technique) that delineate the reduced feature set and the classification technique exhibiting the highest accuracy for each category of attacks and for normal traffic. The ensemble model achieves 100% accuracy for normal traffic (12-feature set, CART), probe (17-feature set, CART), and DoS (17-feature set, ensemble), 84% accuracy for U2R (19-feature set, CART), and 99.47% accuracy for R2L (12-feature set, ensemble).

Miller et al. [ 314 ] also devise an ensemble method but based on NB classifiers, denoted as Multi-perspective Machine Learning (MPML). The key idea behind MPML is that an attack can be detected by looking at different network characteristics or “perspective”. These characteristics in turn are represented by a subset of network features. Hence, they group the features of a perspective together, and train a classifier using each feature set. The intuition behind this approach is to consider a diverse and rich set of network characteristics (each represented by a classifier), to enhance the overall prediction accuracy. The predictions made by each classifier are then fed to another NB model to reach a consensus.

A limitation of the aforementioned approaches is that they are all employed offline, which inhibits their application in real life. A few related works focused on the training and detection times of their IDSs. Most classifiers (e.g., image and text recognition systems) require re-training from time to time. However, for IDSs this retraining may need to be performed daily (or even hourly) due to the fast and ever-changing nature of cyber-threats [ 180 ]. Hence, fast training times are critical for an adaptable and robust IDS. The work in [ 198 ] tackled the challenge of devising an IDS with fast training time using an AdaBoost algorithm. The proposed algorithm consists of an ensemble of weak classifiers (decision stumps), whose decisions are fed to a strong classifier to make the final decision. The fast training time achieved (73 s) is attributed to the use of weak classifiers. Another advantage of decision stumps is the ability to combine weak classifiers for categorical features with weak classifiers for continuous features, without the forced conversion typically performed in most works. During the evaluation, a subset of attack types is omitted from the training set in order to evaluate the algorithm’s ability to detect unknown attacks. While the reported accuracy is not particularly high (90%), the training time is promising for real-time deployment. Clearly, there is still a need for a model that can achieve fast training time without sacrificing detection accuracy.
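The decision-stump AdaBoost idea can be sketched compactly in pure Python. The two toy flow features and the labeled examples below are invented for illustration; the real system of [ 198 ] operates on the full feature set of its IDS dataset:

```python
import math

def stump(feature, threshold, polarity):
    """Weak learner: predicts +1/-1 by thresholding one feature."""
    return lambda x: polarity if x[feature] >= threshold else -polarity

def best_stump(X, y, w):
    """Pick the stump with the lowest weighted classification error."""
    best, best_err = None, float("inf")
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for pol in (1, -1):
                h = stump(f, t, pol)
                err = sum(wi for xi, yi, wi in zip(X, y, w) if h(xi) != yi)
                if err < best_err:
                    best, best_err = h, err
    return best, best_err

def adaboost(X, y, rounds=10):
    """AdaBoost over decision stumps, in the spirit of [198]."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        h, err = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)     # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, h))
        w = [wi * math.exp(-alpha * yi * h(xi)) for wi, xi, yi in zip(w, X, y)]
        s = sum(w)
        w = [wi / s for wi in w]                  # renormalize weights
    return lambda x: 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1

# Toy flows: [packet_rate, avg_pkt_len]; +1 = attack, -1 = benign.
X = [[900, 60], [800, 64], [950, 70], [50, 800], [30, 900], [60, 700]]
y = [1, 1, 1, -1, -1, -1]
clf = adaboost(X, y, rounds=5)
```

The weak learners train in time linear in the data, which is the property [ 198 ] exploits to keep overall training time low.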

Sangkatsanee et al. [ 402 ] propose a real-time misuse-based IDS. Information gain is applied to reduce the number of features used (for faster detection), resulting in 12 features. Different ML techniques were assessed, among which DT provided the best empirical results. The authors developed a tool that runs on traces collected in 2 s time intervals and achieves a detection accuracy of 98%. A post-processing technique is also proposed to reduce FPs, which consists of flagging an attack only if 3 out of 5 consecutive records belonging to the same connection were classified as an attack. While this work is indeed promising, given that it operates in real time, it suffers from a few limitations: (i) it can only detect two types of attacks (DoS and probe), (ii) it is not compared against other real-time signature-based IDSs (e.g. Snort [ 87 ]), (iii) it only looks at attacks in windows of 2 s, and (iv) its post-processing approach correlates records between 2 IPs, making it vulnerable to persistent threats and distributed attacks.
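The 3-out-of-5 post-processing rule is a simple sliding-window vote over per-record classifications; a minimal sketch (the function name and binary encoding are our own):

```python
def flag_attack(labels, window=5, threshold=3):
    """Post-processing rule in the spirit of [402]: raise an alarm only
    if at least `threshold` of `window` consecutive per-connection
    classifications (1 = attack, 0 = normal) are attacks, suppressing
    isolated false positives."""
    for i in range(len(labels) - window + 1):
        if sum(labels[i:i + window]) >= threshold:
            return True
    return False

isolated_hits = [0, 1, 0, 0, 1, 0, 0]    # scattered misclassifications
sustained_hits = [0, 1, 1, 0, 1, 0, 0]   # 3 attacks within 5 records
```

Because the rule only correlates records of one connection (between 2 IPs), a distributed attacker who spreads malicious records across many source addresses never accumulates 3 hits in any single window, which is exactly limitation (iv) above.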

A final effort that merits discussion here is [ 272 ]. This work employs the Transductive Confidence Machine for k-NN (TCM-KNN), a supervised classification algorithm with a strangeness measure. A high strangeness measure indicates that the given instance is an outlier in the particular class for which the measure is being computed. The strangeness measure is calculated for every instance against each possible class: it is the ratio of the sum of the k nearest distances from the given class to the sum of the k nearest distances from all other classes. The strangeness measure is also employed for active learning. Since obtaining labeled attack data is a cumbersome task, active learning can relieve part of this tedious process by indicating the subset of data points that should be labeled to improve the confidence of the classifier. TCM-KNN is evaluated on the KDD’99 dataset and the results are reported in Table  21 . The benefits of active learning are also evaluated. Starting with a training set of just 12 instances, TCM-KNN requires the labeling of only 40 additional, actively selected instances to reach a TP of 99.7%, whereas random sampling requires the labeling of 2000 instances to attain the same accuracy.
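The strangeness ratio at the heart of TCM-KNN is easy to state in code. The one-dimensional toy data and the absolute-distance metric below are simplifying assumptions; the real system works on multi-dimensional KDD'99 features:

```python
def strangeness(x, same, others, k=3, dist=lambda a, b: abs(a - b)):
    """TCM-KNN strangeness of point x for a candidate class [272]:
    the sum of x's k nearest distances within that class divided by
    the sum of its k nearest distances to all other classes.
    High values mark x as an outlier for the candidate class."""
    d_same = sorted(dist(x, s) for s in same)[:k]
    d_other = sorted(dist(x, o) for o in others)[:k]
    return sum(d_same) / (sum(d_other) or 1e-12)

# Toy 1-D feature values for a "normal" class and an "attack" class.
normal = [1.0, 1.2, 0.9, 1.1, 1.3]
attack = [9.0, 9.5, 10.0, 8.8]
```

A point near the normal cluster gets strangeness well below 1 for the normal class, while a point near the attack cluster gets strangeness far above 1 for it; ranking unlabeled points by how uninformative their strangeness is underpins the active-learning selection described above.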

10.2 Anomaly-based intrusion detection

Though misuse-based IDSs are very successful at detecting known attacks, they fail to identify new ones. Network cyber-threats are constantly changing and evolving, making it crucial to identify “zero-day” attacks. This is where anomaly-based intrusion detection comes in. Anomaly-based IDSs model normal network behavior and identify anomalies as deviations from the expected behavior. A big issue with anomaly-based IDSs is false alarms, since it is difficult to obtain a complete representation of normality. ML for anomaly detection has received significant attention, due to the autonomy and robustness it offers in learning and adapting profiles of normality as they change over time. With ML, the system can learn patterns of normal behavior across environments, applications, groups of users, and time. In addition, it offers the ability to find complex correlations in the data that cannot be deduced from mere observation. Though anomaly detection can be broadly divided into flow feature-based and payload-based detection, deep learning and reinforcement learning have recently been aptly exploited, primarily due to their intrinsic ability to extrapolate from limited knowledge. We delineate and summarize the seminal and state-of-the-art ML-based techniques for anomaly detection in Tables  22 , 23 and 24 .

10.2.1 Flow feature-based anomaly detection

Flow-based anomaly detection techniques rely on learning the expected (benign) network activities from flow features. The immediate observation, in contrast to misuse detection, is the application of unsupervised learning and hybrid supervised/unsupervised learning, though some works employ supervised learning for anomaly detection as well. The main difference is that, instead of teaching the model the expected behavior, unsupervised learning feeds the model an unlabeled training set in which it must find a structure, or hidden pattern. In anomaly detection, the notion is that benign network behavior is more common and will naturally group together, whereas anomalous behavior is sparser and will appear as outliers in the dataset. Hence, the larger and denser clusters indicate normal connections, while smaller, more distant data points (or clusters of data points) indicate malicious behavior. A quick glance at Tables  22 , 23 , and 24 reveals that the KDD’99 dataset is the dataset of choice in most of the anomaly-based intrusion detection literature, where some works have also employed the improved version of the dataset, NSL-KDD [ 438 ], released in 2009. In the sequel, we elucidate the most influential work in the application of flow feature-based ML for anomaly detection.

We start off our discussion by looking at supervised learning techniques. KOAD [ 7 ] is an online kernel-function-based anomaly detection IDS. The key feature of KOAD is its ability to model normal behavior in the face of variable traffic characteristics. It leverages a real-time anomaly detection algorithm that incrementally constructs and maintains a dictionary of input vectors defining the region of normal behavior. This dictionary is built using time series of the number of packets and IP flows. In the evaluation, the authors use a dataset collected by monitoring 11 core routers in the Abilene backbone network for a week. It comprises two multi-variate time series: the number of packets and the number of individual IP flows. KOAD is evaluated against PCA and the One-Class Neighbor Machine (OCNM). On the packet time series, OCNM flags 26 out of 34 anomalies but generates 14 FPs, while KOAD yields different TP and FP rates under different parameters; for instance, it can detect 30 anomaly records with 17 FPs, or 26 anomaly records with 1 FP. PCA, in comparison, detects 25 anomalies with 0 FPs. On the other hand, for the flow-count time series, KOAD outperforms PCA and OCNM in terms of detection rate, but at the cost of a higher FP rate.

More recently, Boero et al. [ 64 ] leverage an SVM with a radial basis function kernel (RBF-SVM) to devise an IDS for SDN-based malware detection. A reduced feature set is evaluated, based on features that are collectible via OF and commercial SDN switches. This limits the number of features to 7: the number of packets, number of bytes, flow duration, byte rate, packet rate, length of the first packet, and average packet length. The dataset used for evaluation consists of normal traffic traces from a university campus and malware traffic traces from [ 126 , 292 , 348 , 351 ]. For a dataset with known attacks, the RBF-SVM with both the limited and the full feature sets returns a TP above 98% for the malware traces, while its TP is 86.2% for normal traces.

However, when detecting new attacks, the RBF-SVM with limited and full features achieves comparable TPs with a high FP of approximately 18% for normal traces. This shows that restricting the feature set to those features that can be collected via SDN switches only slightly impacts the TP rate; however, it comes at the cost of a higher FP. Hence, there is a need to enlarge the feature set that SDN switches monitor and collect. As we will see in the following, the battle between FP and TP will constantly resurface throughout our discussion. This is expected, since guaranteeing the ground truth is difficult and requires manual labeling. Furthermore, obtaining a complete representation of normal behavior is extremely challenging. Thus, any future legitimate behavior that was not part of the training set might be flagged as an anomaly.

The main application of unsupervised learning for anomaly detection is clustering, on the basis that normal data connections will create larger, denser clusters. Jiang et al. [ 220 ] challenge this notion by showing that the size of a cluster is not sufficient to detect anomalies and has to be coupled with the cluster's distance from other clusters to increase detection accuracy. To this end, the authors propose an Improved Nearest Neighbor (IMM) technique for calculating the cluster radius threshold. The KDD dataset is used for evaluation and shows that IMM outperforms three related works [ 131 , 139 , 363 ] in terms of detection rate and FP. A snippet of their reported results is presented in Table  22 .
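The size-plus-distance criterion can be sketched as follows: cluster the flow records, then flag only the clusters that are both small and far from the remaining centroids. The plain k-Means implementation, thresholds, and 2-D feature vectors below are illustrative, not those of the original paper.

```python
import math

def _farthest_first(points, k):
    # Deterministic farthest-point initialization to avoid degenerate seeds.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points,
            key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids

def kmeans(points, k, iters=20):
    """Plain k-Means: returns final centroids and per-point cluster labels."""
    centroids = _farthest_first(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = tuple(sum(xs) / len(members)
                                     for xs in zip(*members))
    return centroids, labels

def anomalous_clusters(centroids, labels, size_frac=0.1, dist_factor=1.3):
    """Flag clusters that are BOTH small (size criterion) and far from the
    other centroids (inter-cluster distance criterion)."""
    k, n = len(centroids), len(labels)
    sizes = [labels.count(c) for c in range(k)]
    # mean distance from each centroid to every other centroid
    dists = [sum(math.dist(centroids[c], centroids[o])
                 for o in range(k) if o != c) / (k - 1) for c in range(k)]
    avg_dist = sum(dists) / k
    return [c for c in range(k)
            if sizes[c] < size_frac * n and dists[c] > dist_factor * avg_dist]
```

A large, dense cluster alone passes neither test; only a cluster that is simultaneously sparse and isolated is reported, which is the refinement over the size-only criterion.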

Kayacik et al. [ 232 ] leverage unsupervised NNs with SOM and investigate their detection capabilities when trained with only 6 of the most basic TCP features, including protocol type, service type, status flag, connection duration, and total bytes sent to the destination/source host. They evaluate their work on the KDD dataset, and observe that SOM-based anomaly detection achieves an average DR (ADR) of 89% with FP in the range of [1.7%-4.6%]. Other interesting applications of unsupervised learning for anomaly detection include RF [ 495 ] and an ensemble of single-class classifiers [ 165 ]. Giacinto et al. [ 165 ] train a single-class classifier, based on ν-SVC [ 405 ], for each individual protocol and network service, e.g. ICMP, HTTP, FTP, and Mail. This ensures that each classifier is specialized in detecting normal and abnormal characteristics for one of these protocols and services. The application of one-class classifiers is particularly interesting for cases where there is skewness in the data. This is in line with the fact that normal traffic traces are more common than malicious network activities. Thus, the one-class classifier learns the behavior of the dominant class, and dissimilar traffic patterns are then flagged as anomalies. Results of the evaluation can be found in Table  22 .
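The idea of an ensemble of per-service one-class classifiers can be illustrated with the sketch below, which substitutes a simple per-feature Gaussian envelope (mean ± z·std learned from normal traffic) for the ν-SVC used in [ 165 ]; the services, features, and threshold are hypothetical.

```python
import statistics
from collections import defaultdict

class ServiceEnvelope:
    """One-class detector for a single service: learns per-feature mean/std
    from normal traffic only, then flags samples that stray too far. This is
    a drastically simplified stand-in for a nu-SVC one-class classifier."""
    def __init__(self, z=3.0):
        self.z = z
        self.mu, self.sigma = [], []

    def fit(self, samples):
        # samples: list of feature vectors of normal traffic for one service
        cols = list(zip(*samples))
        self.mu = [statistics.fmean(c) for c in cols]
        self.sigma = [statistics.pstdev(c) or 1e-9 for c in cols]

    def is_anomalous(self, x):
        return any(abs(v - m) > self.z * s
                   for v, m, s in zip(x, self.mu, self.sigma))

def train_ensemble(flows):
    """flows: list of (service, feature_vector) pairs of normal traffic.
    Returns one specialized detector per service (e.g. HTTP, FTP, ICMP)."""
    by_service = defaultdict(list)
    for service, x in flows:
        by_service[service].append(x)
    ensemble = {}
    for service, xs in by_service.items():
        detector = ServiceEnvelope()
        detector.fit(xs)
        ensemble[service] = detector
    return ensemble
```

Each detector only ever sees its own service's (dominant, benign) class, mirroring how the one-class formulation sidesteps the skewness between normal and malicious traffic.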

The majority of works in anomaly-based IDS employed a hybrid of supervised/unsupervised learning techniques. Panda et al. [ 345 ] evaluate several hybrid approaches to identify the best combination of supervised and unsupervised data filtering and base classifiers for detecting anomalies. The authors evaluate DT, PCA, the stochastic primal estimated sub-gradient solver for SVM (SPegasos), ensembles of balanced nested dichotomies (END), Grading, and RF. They show that RF with nested dichotomies (ND) and END achieves the best results, with a detection rate of 99.5% and an FP of 0.1%. It is also the fastest, requiring 18.13 s to build, and provides F-measure, precision, and recall of 99.7%, 99.9%, and 99.9%, respectively. Enhanced SVM [ 411 ] combines a supervised SVM variant (soft-margin SVM) with an unsupervised one (one-class SVM). The intuition here is that this combination will yield the best of both worlds: a low FP with the ability to detect zero-day attacks. Enhanced SVM consists of four phases:

1. Create a profile of normal packets using a Self-Organized Feature Map.

2. Filter packets, using p0f [ 491 ], based on passive TCP/IP fingerprinting to reject incorrectly formed TCP/IP packets.

3. Apply a GA to perform feature selection.

4. Temporally correlate packets during packet processing.

Enhanced-SVM is only trained with normal traffic. The normal-to-abnormal ratio in the dataset is 98.5-99% to 1-1.5%. Compared to two popular IDSs, Bro and Snort, Enhanced-SVM slightly improves anomaly detection accuracy on a real dataset with unknown traffic traces. However, for known attacks, Snort and Bro significantly outperform Enhanced-SVM.

Wagner et al. [ 456 ] also leverage a hybrid supervised and unsupervised single-class SVM to detect anomalies in IP NetFlow records. A new kernel function is proposed to measure the similarity between two windows of IP flow records of n seconds. The hybrid SVM is evaluated on a normal dataset obtained from an ISP, with synthetically generated attacks using Flame [ 74 ], and with n = 5 s. Results show that the hybrid SVM can achieve an ADR of 92%, an FP in the range [0-0.033], and a TN in the range [0.967-1]. Finally, Muniyandi et al. [ 327 ] propose a hybrid anomaly detection mechanism that combines k-Means with the C4.5 DT. They build k clusters using k-Means and employ a DT for each cluster. The DT overcomes the forced assignment problem in k-Means, where k is too small and a class dominates due to a skewed dataset. The authors evaluate the hybrid detection on the KDD dataset and show that it outperforms k-Means, ID3, NB, k-NN, SVM, and TCM-KNN over six metrics, including TP, FP, precision, accuracy, F-measure, and ROC. However, TCM-KNN achieves better results in terms of TPR and FPR.

10.2.2 Payload-based anomaly detection

Payload-based anomaly detection systems learn patterns of normality from the packet payload. This provides the ability to detect attacks injected inside the payload, which can easily evade flow feature-based IDSs. In this subsection, we discuss ML techniques that have been employed to detect anomalies using the packet payload, alone or in conjunction with flow features.

PAYL [ 459 ] uses the 1-gram method to model packet payloads. The n-gram method is widely used for text analysis; it consists of a sliding window of size n that scans the payload while counting the occurrence/frequency of each n-gram. In addition to counting the frequency of each byte in the payload, the mean and the standard deviation are computed. As the payload exhibits different characteristics for different services, PAYL generates a payload model for each service, port, direction of payload, and payload length range. Once the models are generated, the Mahalanobis distance is used to measure the deviation between incoming packets and the payload models. The larger the distance, the higher the likelihood that the newly arrived packet is abnormal. The authors leverage incremental learning to keep the model up to date, by updating the Mahalanobis distance to include new information gathered from new packets. PAYL’s ability to detect attacks on TCP connections is evaluated using the KDD dataset and data traces collected from the Columbia University Computer Science (CUCS) web server. PAYL is able to detect 60% of the attacks on ports 21 and 80 with an FP of 1%. However, it performs poorly when the attacks target applications running on ports 23 and 25. This is due to the fact that attacks on ports 21 and 80 exhibit distinctive patterns in the format of the payload, making them easier to detect than attacks on ports 23 and 25. PAYL can also be used as an unsupervised learning technique, under the assumption that malicious payloads are a minority and will have a larger distance to the profile than average normal samples. Hence, by running the learned model on the training set, malicious packets in the set can be detected and omitted, and the models retrained on the cleaned training set.
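A minimal sketch of the 1-gram modeling and the simplified Mahalanobis distance described above might look as follows; the smoothing factor added to the standard deviation is an assumption, and the per-service/port/length partitioning of the real PAYL is omitted.

```python
import math

def payload_model(payloads):
    """1-gram model: mean and standard deviation of the relative frequency
    of each of the 256 byte values, computed over normal payloads."""
    freqs = []
    for p in payloads:
        n = len(p)
        counts = [0] * 256
        for b in p:
            counts[b] += 1
        freqs.append([c / n for c in counts])
    mean = [sum(f[i] for f in freqs) / len(freqs) for i in range(256)]
    std = [math.sqrt(sum((f[i] - mean[i]) ** 2 for f in freqs) / len(freqs))
           for i in range(256)]
    return mean, std

def simplified_mahalanobis(payload, mean, std, smooth=0.001):
    """Simplified Mahalanobis distance: sum over the 256 byte values of
    |freq - mean| / (std + smoothing). Larger means more anomalous."""
    n = len(payload)
    counts = [0] * 256
    for b in payload:
        counts[b] += 1
    return sum(abs(counts[i] / n - mean[i]) / (std[i] + smooth)
               for i in range(256))
```

A payload dominated by byte values never seen in the normal profile (e.g. a NOP sled) accumulates a large distance, while a payload with a byte distribution close to the profile scores low.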

Perdisci et al. [ 356 ] design the Multiple-Classifier Payload-based Anomaly Detector (McPAD) to infer shell-code and polymorphic shell-code attacks. Shell-code attacks inject malicious executable code in the packet payload. As opposed to the 1-gram analysis performed by PAYL, McPAD runs a 2ν-gram analysis technique to model the payload (ν = [0-10]). It measures the occurrence of pairs of bytes that are ν positions apart. By varying ν and applying feature reduction, different compact representations of the payload are obtained. Each of these representations is then fed to a one-class classifier model, and a majority vote is used to make the final prediction. For evaluation, normal traffic is extracted from two datasets: the first week of the KDD dataset and 7 weeks of HTTP traffic collected from the College of Computing at Georgia Tech (GATECH). Attack traffic is collected from a generic dataset in [ 204 ], in addition to synthetically generated polymorphic attacks [ 117 ] and Polymorphic Blending Attacks (PBAs). In comparison to PAYL, McPAD achieves a DR of 60%, 80%, and 90% for generic, polymorphic CLET, and shell-code attacks, respectively, with an FP of 10^-5 for all attacks, whereas PAYL reports very low DRs at the same FP. However, the computational overhead of McPAD is much higher than that of PAYL, with an average processing time of 10.92 ms over KDD and 17.11 ms over GATECH, whereas PAYL runs in 0.039 ms and 0.032 ms, respectively.
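The 2ν-gram feature extraction at the heart of McPAD can be sketched as follows; only the feature computation is shown, not the feature reduction or the one-class classifier ensemble that consumes it.

```python
def two_nu_gram(payload: bytes, nu: int):
    """Relative frequency of byte pairs that are nu positions apart.
    nu = 0 reduces to counting adjacent byte pairs (standard 2-grams)."""
    gap = nu + 1                      # index offset between the paired bytes
    total = len(payload) - gap        # number of pairs in this payload
    if total <= 0:
        return {}
    counts = {}
    for i in range(total):
        pair = (payload[i], payload[i + gap])
        counts[pair] = counts.get(pair, 0) + 1
    return {pair: c / total for pair, c in counts.items()}
```

In McPAD, one such histogram is computed per value of ν; after dimensionality reduction, each compact representation feeds its own one-class classifier, and the ensemble votes on the final label.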

Zanero et al. [ 493 ] propose a two-tier architecture for anomaly detection. The first tier consists of an unsupervised outlier detection algorithm that classifies each packet. This tier provides a form of feature reduction, as the result of the classification “compresses” each packet into a single byte of information. The results from the first tier are fed into the second-tier anomaly detection algorithm. In the first tier, both the packet header and payload are used for outlier detection. The authors compare three different techniques: SOM, the Principal Direction Divisive Partitioning (PDDP) algorithm, and k-Means, with SOM outperforming PDDP and k-Means in terms of classification accuracy at a reasonable computational cost. A preliminary prototype that combines a first-tier SOM with a second-tier SOM is evaluated over the Nessus [ 44 ] vulnerability scans. The results show a 75% improvement in DR over an IDS that does not include the first tier.

Gornitz et al. [ 171 ] leverage semi-supervised Support Vector Data Description (SVDD) and active learning to build the active SVDD (ActiveSVDD) model for payload-based anomaly detection. It is first trained with unlabeled examples, and subsequently refined by incorporating labeled data that has been queried by active learning rules. The empirical evaluation compares an unsupervised SVDD with random sampling against ActiveSVDD. The dataset used for the evaluation is HTTP traffic recorded within 10 days at the Fraunhofer Institute, with attack data generated using the Metasploit [ 307 ] framework. In addition, mimicry attacks are added in the form of cloaked data to evaluate the ability to detect adversarial attacks. On the cloaked data, ActiveSVDD achieves a high 96% DR with a very low FP, compared to a 64% DR for the SVDD with random sampling.

10.3 Deep and reinforcement learning for intrusion detection

As we contemplate the applications of ML for misuse and anomaly detection, we observe that all applications of NNs were restricted to networks with at most two hidden layers. DNNs are attractive for the ability to train large NNs with several hidden layers. As we survey the literature on DL for intrusion detection, we will observe much larger and deeper NNs, in terms of both the number of nodes in each layer and the number of hidden layers. Generally, the results of DNNs get better with more data and larger models.

10.3.1 Deep learning for anomaly detection

Over the past decade, anomaly detection has particularly benefited from self-taught learning (STL) [ 213 ], DBN [ 14 , 273 ], and RNN [ 245 ]. Once more, all these works have been evaluated using the KDD dataset and its enhanced version, the NSL-KDD [ 438 ] dataset. Their results are summarized in Table  24 .

In 2007, STL [ 378 ] emerged as an improvement over semi-supervised learning. STL uses unlabeled data from a different, but relevant, object class to enhance a supervised classification task, e.g. using random unlabeled images from the Internet to enhance the accuracy of a supervised classification task for cat images. This is achieved by learning a good feature representation from the unlabeled data and then applying this representation to the supervised classifier. The potential benefit of STL for anomaly detection is clear: intrusion detection suffers from the lack of a sufficient amount of labeled data, specifically for attacks. To this end, the work in [ 213 ] explores the application of STL for anomaly detection. The proposed model consists of two stages: an Unsupervised Feature Learning (UFL) stage using a sparse auto-encoder, followed by a classification stage that uses the learned features with soft-max regression (SMR). The authors evaluate their solution using the NSL-KDD Cup dataset for 2-class and 5-class classification, and compare against an SMR technique that is not preceded by a UFL stage. The 2-class classification achieves a higher accuracy of 88.39%, compared to 78.06% for SMR, and outperforms SMR with respect to recall and F-measure. However, SMR outperforms STL in precision.

Li et al. [ 273 ] and Alom et al. [ 14 ] explore the use of DBNs for anomaly detection. The DBN is an interesting class of NN: when trained using unlabeled data it works as a feature selector, and when trained with labeled data it acts as a classifier. In [ 273 ], the DBN is used to perform both of these tasks. More specifically, an auto-encoder is first used for dimensionality reduction. The proposed DBN is composed of multiple layers of RBMs and a layer of BP NN. Unsupervised training is performed on every RBM layer, and the final output is fed to the BP NN for classification. Pre-training and fine-tuning the DBN with the auto-encoder over 10 iterations results in an accuracy of 92.10%, an FP of 1.58%, and a TP of 92.20%. The DBN without the auto-encoder achieves an accuracy, FP, and TP of 91.4%, 9.02%, and 95.34%, respectively.

In [ 14 ], the authors perform a comparative analysis to evaluate the performance of a DBN (composed of two RBM layers) against SVM and a hybrid DBN-SVM, using the NSL-KDD dataset. The results show that the DBN trains in 0.32 s and achieves an accuracy of 97.5% when trained with only 40% of the NSL-KDD dataset, outperforming SVM and DBN-SVM. This exceeds the training-time performance of similar existing work [ 400 ]. In contrast, Tang et al. [ 436 ] use a DNN for flow-based anomaly detection in SDNs. They extract six features from the SDN switches and evaluate the accuracy of anomaly detection using the NSL-KDD Cup dataset. As the learning rate is varied, the DNN achieves an accuracy, precision, recall, and F-measure in the range [72.05-75.75%], [79-83%], [72-76%], and [72-75%], respectively. It is important to note that for the highest learning rate, the DNN achieves the highest accuracy on the training dataset; however, its accuracy, recall, and F-measure on the test datasets drop. The authors note that such an accuracy drop occurs with a high learning rate because the model becomes trained “too accurately”, i.e. it over-fits. Nevertheless, the accuracy of the DNN is lower than that of the winner of the KDD Cup, RF, which has an accuracy of 81.59%.

10.3.2 Reinforcement learning (RL) for intrusion detection

MARL [ 407 ] is a Multi-Agent Reinforcement Learning system for the detection of DoS and DDoS attacks. MARL is based on Q-learning, and the system consists of a set of heterogeneous sensor agents (SAs) and a hierarchy of decision agents (DAs). In the proposed setup, three SAs and one DA are used. Each SA is responsible for collecting either congestion, delay, or flow-based network metrics. These collected metrics represent the local state of each SA. Every SA runs a local RL mechanism to match its local state to a particular communication action-signal. These signals are received by the DA, which, given a global view of the state of the network, triggers a final action-signal that is forwarded to a human in the loop. If the DA (or the higher-layer agent in case of a hierarchy of DAs) makes the appropriate call, all the agents in the system are rewarded; otherwise, they are all penalized. MARL is evaluated on NS-2 with 7 nodes, where two nodes generate normal FTP and UDP traffic and one generates the UDP attacks. The remaining four nodes host the three SAs and the single DA. There is a single baseline run, and seven tests are conducted, where each test differs in the normal traffic, attack patterns, or both. The corresponding accuracy, recall, and FPR for each test are presented in Table  24 . MARL is also tested on a dataset that contains mimicry attacks, where it achieves a recall and accuracy of ∼30% and ∼70%, respectively. When little change is inflicted on the traffic pattern, MARL can achieve a high 99% accuracy and recall with 0 FP.
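The per-SA learning loop can be illustrated with a generic tabular Q-learning agent; the states, action-signals, and shared reward scheme below are illustrative stand-ins, not the authors' actual design.

```python
import random
from collections import defaultdict

class SensorAgent:
    """Tabular Q-learning sketch of a sensor agent: maps its local traffic
    state to a communication action-signal and updates Q-values from the
    shared reward (+1 if the decision agent's final call was appropriate,
    -1 otherwise). States and actions here are illustrative."""
    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.q = defaultdict(float)            # (state, action) -> value
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def act(self, state):
        if self.rng.random() < self.epsilon:   # occasional exploration
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Standard Q-learning temporal-difference update.
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td = reward + self.gamma * best_next - self.q[(state, action)]
        self.q[(state, action)] += self.alpha * td
```

In MARL, each SA runs such a loop on its own metric stream, and the global reward/penalty broadcast by the DA hierarchy plays the role of `reward` in the update.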

A less conventional application of RL is [ 85 ], an online IDS based on an adaptive NN with modified RL, where RL takes the form of a feedback mechanism. The focus is to detect DoS attacks using a Cerebellar Model Articulation Controller (CMAC) NN. The learning algorithm incorporates feedback from the protected system in the form of system state (i.e. response rate, heartbeat). The objective is to leverage the system state to assist in detecting attacks earlier, since the responsiveness of the system degrades under attack. The authors evaluate the CMAC NN using a prototype application that simulates ping flooding and UDP packet storm attacks. First, they assess the system’s ability to autonomously learn attacks. They find that when the system is trained with gradual ping flood attack vectors, the error rate is 2.199%, which reduces to 1.94×10^-7% as the training progresses. The authors also evaluate the system’s ability to learn new attacks and recognize learned attacks; the error rate results are presented in Table  24 . Finally, illustrating the benefit of the feedback mechanism, as attacks progress, the system state’s responsiveness approaches 0 and the error rate reaches 8.53×10^-14%.

10.4 Hybrid intrusion detection

We conclude our survey of ML for intrusion detection by looking at hybrid IDSs that apply both misuse and anomaly-based intrusion detection. Such a hybrid system can make the best of both worlds, i.e. high accuracy in detecting patterns of known attacks, along with the ability to detect new attacks. Every time a new attack is detected, it can then be fed to the misuse-detection system to enhance the comprehensiveness of its database. We start off our discussion by looking at the work of Depren et al. [ 116 ], which leverages the J.48 DT and SOM for misuse and anomaly detection, respectively. Three SOM modules are trained, one for each of TCP, UDP, and ICMP traffic. The outputs of the misuse and anomaly detection modules are combined using a simple decision support system that raises an alarm if either one of the modules detects an attack. The authors evaluate their work over the KDD Cup dataset and find that their hybrid IDS achieves a DR of 99.9%, with a miss rate of 0.1% and an FP of 1.25%.

Similarly, Mukkamala et al. [ 325 ] compare an SVM-based with an NN-based hybrid misuse and anomaly detection model. Their models are trained with normal and attack data, and evaluated using the KDD Cup dataset. The SVM-based hybrid model achieves 99.5% accuracy, with training and testing times of 17.77 s and 1.63 s, respectively. Three different NNs, each with a different hidden-layer structure, are trained and tested; they achieve accuracies of 99.05%, 99.25%, and 99%, respectively, with a training time of 18 min. Therefore, SVM outperforms NN slightly in accuracy and significantly in runtime.

Zhang et al. [ 494 ] develop a hierarchical IDS framework based on RBF to detect both misuse and anomaly attacks in real time. Their hierarchical approach is modular and decreases the complexity of the system. It enables different modules to be retrained separately, instead of retraining the entire system, which is particularly useful in the event of a change that only affects a subset of the modules. A serial hierarchical IDS (SHIDS) is compared against a parallel hierarchical IDS (PHIDS). SHIDS begins by training a classifier with only normal data; as the classifier detects abnormal packets, it logs them in a database. c-Means clustering [ 58 ] groups the data based on their statistical distributions, and as the number of attack records in the largest group exceeds a pre-defined threshold, a new classifier is trained with that specific attack data and appended to the end of the SHIDS. PHIDS, on the other hand, consists of three layers: the anomaly and misuse classifiers occupy the first two layers, while the third layer is dedicated to the different attack categories. Over time, the data in each attack category is updated as new attacks are identified. The performance of RBF is evaluated using the KDD dataset against a Back-Propagation learning algorithm (BPL). Though BPL achieves a higher DR for misuse detection, RBF has a far smaller training time of 5 min, compared to 2 h for BPL; training time is critical for online IDSs. Further, when training the model with just normal data for anomaly detection, RBF outperforms BPL for each attack category, with respect to DR and FP. Overall, RBF achieves a DR of 99.2% and an FP of 1.2%, compared to BPL with a DR of 93.7% and an FP of 7.2%. The evaluations of SHIDS and PHIDS are presented in Table  25 .

10.5 Summary

Our survey on the application of ML for network security focused on network-based intrusion detection. We grouped the work into misuse, anomaly, and hybrid network IDSs. In each category, we exposed the different ML techniques that were applied, including recent applications of DL and RL. One clear take-away message is the significant benefit that ML has brought to misuse-based intrusion detection. It has markedly improved on rule-based systems, allowed the extraction of more complex attack patterns from audit data, and even enabled the detection of variants of known attacks. In the field of misuse detection, a preference is given to white-box models (e.g. DT), as their decision rules can be extracted, as opposed to black-box models (e.g. NN). Ensemble-based methods were also heavily employed, by training ML models on different subsets of the dataset or with different feature sets; they have been particularly useful in achieving very fast training times.

While the benefits of ML for IDSs are clear, there is a lot of speculation on the application of ML for anomaly detection. Despite the extensive literature on ML-based anomaly detection, it has not received the same traction in real deployments [ 415 ]. Indeed, the most widely deployed IDS, Snort [ 45 ], is in fact misuse-based [ 101 ]. The main culprit for this aversion is not only the susceptibility of anomaly detection to high FPs, but also the high cost of misclassification in the event of FNs. In contrast to the cost of misclassification in an ad recommender system, a missed malicious activity can bring down the system or cause a massive data breach. Another main weakness that we observe in the literature is that most works simply raise an alarm when an anomaly is detected, without giving any hints or leads on the observed malicious behavior (e.g. the attack target). Providing such semantics can be extremely valuable to network analysts [ 415 ], and can even help reduce FPs.

The dataset of choice in the majority of the surveyed literature has been based on KDD’99, an outdated dataset. On one hand, this has provided the community with the ability to compare and contrast different methods and techniques. On the other hand, it does not reflect the recent, more relevant types of attacks. Moreover, even the normal connection traces represent basic applications (e.g. email and file transfer), without including the more recent day-to-day applications that swarm the network (e.g. social media and video streaming). This is further aggravated by the several limitations and flaws reported about this dataset [ 438 ]. Indeed, there is a dire need for a new dataset for intrusion detection.

To conclude, most works on the application of ML for intrusion detection are offline, and amongst the few real-time IDSs, there is no consideration for early detection (i.e. detecting a threat from the first few packets of a flow). Moreover, there is a gap in the ML for intrusion detection literature with regard to detecting persistent threats, or correlating isolated anomaly instances over time. Finally, only a handful of works have actually evaluated the robustness of their algorithms against mimicry attacks, an aspect of critical importance, as attackers are constantly looking for ways to evade detection.

11 Lessons learned, insights and research opportunities

We have discussed the existing efforts in employing ML techniques to address various challenges and problems in networking. The success of ML primarily lies in the availability of data, coupled with improved and resilient ML algorithms that can solve complex problems. Future networks are envisaged to support an explosive growth in traffic volume and connected devices, with unprecedented access to information. In addition, these capabilities will have to be achieved without significantly increasing CAPEX, OPEX, or customer tariffs.

In order to be sustainable in a competitive environment, network operators must adopt efficient and affordable deployment, operations, and management. Enabling technologies for future networks include SDN, network slicing, NFV, and multi-tenancy, which reduce CAPEX and increase resource utilization and sharing. Similarly, autonomic network management frameworks coupled with SDN are envisioned to reduce OPEX. The aforementioned technologies will allow future networks to host a wide variety of applications and services, and a richer set of use cases, including massive broadband, ultra-low-latency and highly reliable services, machine-to-machine communications, the tactile Internet, industrial applications, autonomous vehicles, and real-time monitoring and control.

In this section, we describe and delineate prominent challenges and open research opportunities pertaining to the application of ML in current and future networks, from the network, system, and knowledge-acquisition perspectives.

11.1 Network perspective

11.1.1 Cost of predictions

The accuracy of network monitoring data comes at the cost of increased monitoring overhead (e.g. consumed network bandwidth and switch memory). This raises the need for network monitoring schemes that are both accurate and cost-effective. Monitoring applications in traditional networks rely on a predefined set of monitoring probes built into the hardware/firmware, which limits their flexibility. With SDN, customizable software-based monitoring probes can be deployed on demand to collect more diverse monitoring data. However, in many instances, e.g. when monitoring traffic volume over a given switch interface, these probes need to operate at line rate, which is very expensive over high-speed links and difficult to achieve in software. This makes TSF-based approaches to traffic prediction prohibitive.

Recently, two solutions have been investigated to overcome this issue: (i) traffic sampling and interpolation [ 274 ], and (ii) leveraging features other than traffic volume for traffic prediction [ 365 ]. Indeed, various flow sampling techniques (stochastic/deterministic, spatial/temporal, etc.) to reduce monitoring overhead have been proposed in the literature. Unfortunately, the current ML-based solution proposed in [ 274 ] is not conclusive and shows contradicting prediction accuracy results. Instead, Poupart et al. [ 365 ] use classifiers to identify elephant flows. Classifiers, however, operate at a coarser granularity; therefore, their accuracy cannot be compared to that of a regression model operating on the same set of features. Using features other than traffic volumes for accurate traffic prediction remains an open research direction.

11.1.2 Cost of errors and detailed reports

ML for anomaly detection has received significant interest in the literature, without gaining traction in the industry. This is primarily due to the high FPR [ 27 , 415 ], which makes these techniques inapplicable in an operational setting. High FPRs waste expensive analyst time on investigating false alarms, and reduce trust and confidence in the IDS. Another major concern with anomaly detection techniques is the lack of detailed reports on detected anomalies [ 415 ]. Typically, a flag is raised and an alarm is triggered whenever there is a deviation from the norm. An efficient IDS is not only responsible for detecting attacks and intrusions in the network; it must also provide a detailed log of anomalies for historical data collection and model retraining.

11.1.3 Complexity matters

When performing traffic prediction, classification, routing, and congestion control on intermediate nodes in the network, it is crucial that these tasks consume little time and few computing resources, to avoid degrading network performance. This requirement is non-trivial, especially in resource-constrained networks, such as WANETs and IoT. Though performance metrics for ML evaluation are well-defined, it is difficult to evaluate the complexity of ML-based approaches a priori. Unlike for traditional algorithms, the complexity of ML algorithms also depends on the size and quality of the data, and on the performance objectives. The issue is further exacerbated if the model is adaptive and relearning is intermittently triggered by varying network conditions over time. Traditional complexity metrics fail to cover these aspects. Therefore, it is important to identify well-rounded evaluation metrics that will help in assessing the complexity of a given ML technique, to strike a trade-off between performance improvement and computational cost.

11.1.4 ML in the face of the new Web

In an effort to improve security and QoE for end-users, new application protocols (e.g. HTTP/2 [ 48 ], SPDY [ 47 ], QUIC [ 211 ]) have emerged that overcome various limitations of HTTP/1.1. For instance, HTTP/2 offers payload encryption, multiplexing and concurrency, resource prioritization, and server push. Though Web applications over HTTP/2 enjoy the benefits of these enhancements, they further complicate traffic classification by introducing unpredictability in the data used for ML. For example, if we employ flow feature-based traffic classification, the feature statistics can be skewed, as several requests can be initiated over the same TCP connection and responses can be received out of order. Therefore, the challenge lies in exploring the behavior and performance of ML techniques when confronted with such unpredictability in a single TCP connection, and even in parallel TCP connections [ 293 ], in HTTP/2. Similarly, the prioritization requested by different Web clients diminishes the applicability of a generic ML-based classification technique for identifying Web applications.

11.1.5 Rethinking evaluation baseline

Often, proposed ML-based networking solutions are assessed and evaluated against existing non-ML frameworks. The latter act as baselines and are used to demonstrate the benefits, if any, of using ML. Unfortunately, these baseline solutions are often deprecated and outdated. For instance, ML-based congestion control mechanisms are often compared against default TCP implementations, e.g. CTCP, CUBIC, or BIC, with typical loss recovery mechanisms, such as Reno, NewReno, or SACK. However, Yang et al. [ 486 ] applied supervised learning techniques to identify the precise TCP protocol used in Web traffic and uncovered that, though the majority of servers employ the default, a small amount of web traffic employs non-default TCP implementations for congestion control and loss recovery. Therefore, it is critical to consider as comparison baselines the TCP variants that have taken the lead and are prominently employed for congestion control and loss recovery.

ML-based congestion control mechanisms should be designed and evaluated under the consideration that the standard TCP is no longer the de facto protocol, and that current networks implement heterogeneous TCP variants that are TCP-friendly. Furthermore, it is good practice to consider TCP variants enhanced for specific network technologies, such as TCP-FeW for WANETs and Hybla for satellite networks. ML-based approaches, such as Learning-TCP [29] and PCC [122], have already taken these considerations into account and provide an enhanced evaluation of their proposed solutions. Finally, it is imperative to design a standardized set of performance metrics to enable a fair comparison between the various ML-based approaches to different problems in networking.

11.1.6 RL in the face of network (in)stability and QoS

There are various challenges in finding the right balance between exploration and exploitation in RL. When it comes to traffic routing, various routes must be explored before the system can converge to the optimal routing policy. However, exploring new routes can lead to performance instability and fluctuation in network delay, throughput, and other parameters that impact QoS. On the other hand, exploiting the same “optimal” route to forward all the traffic may lead to congestion and performance degradation, which would also impact QoS. Different avenues can be explored to overcome these challenges. For example, increasing the learning rate can help detect early signs of performance degradation, while load balancing can be achieved with selective routing, implemented by assigning different reward functions to different types of flows (elephant vs. mice, ToS, etc.). Furthermore, instability-awareness at exploration time can be implemented by limiting the scope of exploration to the routes with the highest rewards. Indeed, this requires an in-depth study to gauge the impact of such solutions on network performance and their convergence time to optimal routing.
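As a concrete illustration of this exploration/exploitation trade-off, the following sketch runs epsilon-greedy Q-routing on a toy four-node topology; the node names, link delays, and hyperparameters are illustrative assumptions, not a protocol from the survey:

```python
import random

random.seed(0)

# Illustrative topology: (node, next_hop) -> per-link delay (e.g. ms).
links = {
    ('A', 'B'): 1.0, ('A', 'C'): 3.0,
    ('B', 'D'): 1.0, ('C', 'D'): 1.0,
}
neighbors = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D']}
Q = {k: 0.0 for k in links}      # estimated delay to 'D' via each next hop
alpha, epsilon = 0.5, 0.2        # learning rate, exploration probability

def route(src, dst):
    """Forward one packet, updating the Q estimates hop by hop."""
    node, total = src, 0.0
    while node != dst:
        if random.random() < epsilon:                      # explore a route
            nxt = random.choice(neighbors[node])
        else:                                              # exploit best estimate
            nxt = min(neighbors[node], key=lambda n: Q[(node, n)])
        delay = links[(node, nxt)]
        downstream = 0.0 if nxt == dst else min(Q[(nxt, n)] for n in neighbors[nxt])
        Q[(node, nxt)] += alpha * (delay + downstream - Q[(node, nxt)])
        total += delay
        node = nxt
    return total

for _ in range(200):             # learn from repeated packet deliveries
    route('A', 'D')

best = min(neighbors['A'], key=lambda n: Q[('A', n)])
print(best)                      # learned next hop from A
```

The epsilon parameter captures the tension discussed above: a larger value explores more routes (and perturbs QoS more often), a smaller one risks overloading the single exploited path.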

Another direction worth pursuing is to correlate the reward function of RL-based routing with a desired level of QoS. This involves finding ways to answer questions such as: which reward function can guarantee that the network delay does not exceed a given threshold? Or, given a reward function, what would be the expected delay in the network?

11.1.7 Practicality and applicability of ML

Benchmarks used in the literature for the training and validation of proposed ML-based networking solutions are often far from realistic. For instance, ML-based admission control mechanisms are based on simulations that consider traffic from only a small set of applications or services. Furthermore, they disregard the diversity of QoS parameters when performing admission control. However, in practice, networks carry traffic from heterogeneous applications and services, each having its own QoS requirements with respect to throughput, loss rate, latency, jitter, reliability, availability, and so on. Hence, the optimal decision in the context of a simulated admission control mechanism may not be optimal for a practical network. Furthermore, synthetic network datasets are often used in training and validation. Although ML models perform well in such settings, their applicability in practical settings remains questionable. Therefore, more research is needed to develop practical ML-based network solutions.

11.1.8 SDN meets ML

Though there has been a growing interest in leveraging ML to realize autonomic networks, there is little evidence of its application to date. Prohibiting factors include the distributed control and vendor-specific nature of legacy network devices. Several technological advances have been made in the last decade to overcome these limitations: the advent of network softwarization and programmability through SDN and NFV offers centralized control and alleviates vendor lock-in.

SDN can facilitate adaptive and intelligent network probing. Probes are test transactions used to monitor network behavior and obtain measurements from network elements. Finding the optimal probe rate will be prohibitively expensive in future networks, due to the large number of devices, the variety of parameters to measure, and the small time intervals for logging data. Aggressive probing can exponentially increase traffic overhead, resulting in network performance degradation; conversely, conservative probing risks missing significant anomalies or critical network events. Hence, it is imperative to adapt probing rates so as to keep traffic overhead within a target value while minimizing performance degradation. By leveraging ML techniques, SDN offers an ideal platform for realizing adaptive probing. For example, upon predicting a fault or detecting an anomaly, the SDN controller can probe suspected devices at a faster rate. Similarly, during network overload, the controller may reduce the probing rate and rely on regression to predict the values of the measured parameters.
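A minimal sketch of such a controller-side adaptation rule, assuming illustrative thresholds and interval bounds:

```python
# Illustrative bounds on the probing interval (seconds between probes).
MIN_INTERVAL, MAX_INTERVAL = 1.0, 60.0

def next_probe_interval(current, anomaly_score, link_utilization):
    """Return the probe interval for the next monitoring cycle."""
    if anomaly_score > 0.8:          # suspected fault: probe aggressively
        interval = current / 2
    elif link_utilization > 0.9:     # overload: back off, rely on prediction
        interval = current * 2
    else:                            # steady state: slowly relax probing
        interval = current * 1.1
    return max(MIN_INTERVAL, min(MAX_INTERVAL, interval))

print(next_probe_interval(10.0, 0.95, 0.5))  # -> 5.0  (anomaly suspected)
print(next_probe_interval(10.0, 0.1, 0.95))  # -> 20.0 (network overloaded)
```

In a deployed system the anomaly score and utilization would come from the ML models and telemetry mentioned above; here they are simply function arguments.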

11.1.9 Virtualization meets ML

Due to the anticipated rise in the number of devices and the expansion of network coverage, future networks will be exposed to a higher number of network faults and security threats. If not promptly addressed, such failures and/or attacks can be detrimental, as a single instance may affect many users and violate the QoS requirements of a number of applications and services. Thus, there is a dire need for an intelligent and responsive fault and security management framework. This framework will have to deal with new faults and attacks across the different administrative and technological domains within a single network, introduced by the concepts of network slicing, NFV, and multi-tenancy. For instance, any failure in an underlying physical resource can propagate to the hosted virtual resources, though the reverse is not always true. Hence, it will be nearly impossible for traditional approaches to locate the root cause of a fault, or the compromised elements of an attack, in such a complex network setting.

On the other hand, existing ML-based approaches to fault and security management focus mostly on a single tenant in single-layer networks. To develop a fault and security management framework for future networks, these approaches need to be extended or re-engineered to take into account the notion of multi-tenancy in multi-layer networks. Given the multi-faceted nature of the problem, DNNs can be explored to model its complex multi-dimensional state spaces.

11.1.10 ML for smart network policies

The unprecedented scale and degree of uncertainty in future networks will amplify the complexity of traffic engineering tasks, such as congestion control, traffic prediction, classification, and routing. Although ML-based solutions have shown promising results in addressing many traffic engineering challenges, their time complexity needs to be evaluated against the envisioned dynamics, volume of data, number of devices, and stringent application requirements of future networks. To address this, smart policy-based traffic engineering approaches can be adopted, where operators can efficiently and quickly apply adaptive traffic engineering policies. Policy-based traffic classification using SDN has shown promising results in the treatment of QoS requirements based on operator-engineered policies [334]. Incorporating ML to assist in developing and extracting adaptive policies for policy-based traffic engineering solutions remains rather unexplored. One possible avenue is to apply RL to generate policies for traffic engineering in future networks.

11.1.11 ML in support of autonomy

Networks are experiencing massive growth in traffic, which will accelerate with the advent of IoT devices, the tactile Internet, virtual/augmented reality, high definition media delivery, etc. Furthermore, Cisco reports a substantial difference between busy hour and average Internet traffic: in 2016, busy hour Internet traffic increased by 51%, compared to a 32% growth in average Internet traffic [99]. This difference is expected to grow further over the next half-decade, with Cisco predicting that the growth rate of busy hour traffic will be almost 1.5 times that of average Internet traffic.

To accommodate such dynamic traffic, network operators can no longer afford the CAPEX of static resource provisioning dimensioned for peak traffic. Therefore, network operators must employ dynamic resource allocation that can scale with the varying traffic demand. ML is an integral part of dynamic resource allocation, enabling demand prediction and facilitating the proactive provisioning and release of network resources. In addition, ML can leverage contextual information to anticipate exceptional resource demand and reserve emergency resources in highly volatile environments.
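As a minimal sketch of prediction-driven provisioning, the following forecasts the next demand with an exponentially weighted moving average and provisions with a safety margin; the smoothing factor and the 20% headroom are illustrative assumptions:

```python
def forecast(history, alpha=0.5):
    """EWMA one-step-ahead forecast of traffic demand."""
    estimate = history[0]
    for x in history[1:]:
        estimate = alpha * x + (1 - alpha) * estimate
    return estimate

def provision(history, headroom=1.2):
    """Capacity to allocate for the next interval: forecast plus margin."""
    return forecast(history) * headroom

demand = [100, 120, 90, 150]     # observed demand per interval (e.g. Gbps)
print(forecast(demand))          # predicted demand for the next interval
print(provision(demand))         # capacity to provision, with 20% headroom
```

A production system would replace the EWMA with a richer learned predictor and fold in the contextual signals mentioned above, but the provisioning loop keeps the same shape: predict, add headroom, allocate, release.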

Networks are also experiencing exponential growth in the number and diversity of supported applications and services, which have stringent and heterogeneous QoS requirements in terms of latency, jitter, reliability, availability, and mobility. It is likely that network operators are not only unaware of all the devices in their network, but also of all the applications and their QoS requirements. Therefore, it is challenging to devise efficient admission control and resource management mechanisms with such limited knowledge. Existing works have demonstrated that both admission control and resource management can be formulated as learning problems, where ML can help improve performance and increase efficiency. A further step would be to explore whether admission control and resource management strategies can be learned directly from network operation experience. Considering the intricate relationship between network experience and management strategies, DL can be leveraged to characterize the inherent relationship between the inputs and outputs of a network.

11.2 System perspective

11.2.1 Support for adaptive, incremental learning in dynamic network environments

Networks are dynamic in nature. Traffic volume, network topology, and security attack signatures are some of the many aspects that may change, often in an unexpected and previously unobserved way. Thus, it is fundamental to constantly retrain the ML model to account for these changes. Most ML models are trained offline, and retraining a model from scratch can be computationally intensive, time consuming, and prohibitive. The ability to retrain the model as new data is generated is fundamental to achieve fast incremental learning, which remains an open research direction. Indeed, incremental learning comes with special system needs. In the particular case of RL applied to routing in SDN, a number of simulations are required before the model can converge to the optimal observation-to-action mapping policy. Every time a new flow is injected into the network, the SDN controller is required to find the optimal routing policy for that flow, and a number of simulations are performed as changes are observed in the link status. This calls for a system that fully exploits data and model parallelism to provide millisecond-level training convergence time.
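The idea of incremental updates can be sketched with an online perceptron that is updated one sample at a time, in the spirit of scikit-learn's partial_fit interface, so the model tracks new data without retraining from scratch; the toy flow features and labels are illustrative assumptions:

```python
class OnlinePerceptron:
    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        s = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1 if s >= 0 else 0

    def partial_fit(self, x, y):
        """Update the weights from a single new (features, label) pair."""
        err = y - self.predict(x)
        if err:
            self.w = [wi + self.lr * err * xi for wi, xi in zip(self.w, x)]
            self.b += self.lr * err

# Stream of (toy flow features, label) pairs arriving over time.
stream = [([1.0, 0.0], 1), ([0.0, 1.0], 0),
          ([1.0, 1.0], 1), ([0.0, 0.5], 0)] * 20

model = OnlinePerceptron(n_features=2)
for x, y in stream:
    model.partial_fit(x, y)      # one cheap update per arrival, no retraining

print(model.predict([1.0, 0.2]))
```

Each update is O(number of features), which is what makes the per-arrival cost compatible with the tight convergence budgets discussed above; parallelizing such updates across data shards is the remaining systems challenge.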

11.2.2 Support for secure learning

ML is prone to adversarial attacks [39], also known as mimicry attacks, that aim to confuse learning. For instance, when employing ML for intrusion detection, an adversarial attack can trick the model into misclassifying malicious events as benign by poisoning the training data. Hence, it is fundamental to train robust ML models that are capable of detecting mimicry attacks. An interesting initiative worth mentioning is Cleverhans [346], a useful library for crafting adversarial examples. It provides training datasets that can be used to build robust ML models, capable of distinguishing legitimate datasets from poisoned ones, in the particular area of image recognition. There is indeed an urgent need for a system capable of generating adversarial use cases for training robust models. Secure learning also demands a system that protects the training data from leakage and tampering, enforces privacy, data confidentiality, and integrity, and supports the secure sharing of data across domains.
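A minimal hand-rolled sketch of such an adversarial (evasion) example against a linear classifier, in the spirit of the fast gradient sign method; this is not the Cleverhans API, and the weights, input, and perturbation budget are illustrative assumptions:

```python
# Illustrative trained linear detector: flag as malicious if w.x + b > 0.
w = [2.0, -1.0]
b = -0.5

def classify(x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def sign(v):
    return (v > 0) - (v < 0)

def fgsm_evade(x, eps=0.4):
    """Perturb an input against the gradient sign of the detector's score."""
    return [xi - eps * sign(wi) for xi, wi in zip(x, w)]

x_malicious = [0.6, 0.2]
x_adv = fgsm_evade(x_malicious)

print(classify(x_malicious))  # 1: detected as malicious
print(classify(x_adv))        # 0: the perturbed variant evades detection
```

For a linear model the attack is trivial to write down, which is exactly why systems that generate such adversarial cases during training are needed to harden deployed detectors.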

11.2.3 Architectures for ML-driven networking

Modern networks generate massive volumes of different types of data (e.g. logs, traffic flow records, network performance metrics). At hundreds of Gbps, even with high sampling rates, a single large network infrastructure element can easily generate hundreds of millions of flow records per day. Recently, the availability of massive data has driven rapid advancement in computer hardware and software systems for storage, processing, and analytics. This is evidenced by the emergence of massive-scale datacenters with tens of thousands of servers and EB storage capacity, the widespread deployment of large-scale software systems like Hadoop MapReduce and Apache Spark, and the increasing number of ML and, in particular, deep learning libraries built on top of these systems, such as TensorFlow, Torch, Caffe, Chainer, Nvidia's CUDA, and MXNet. Mostly open source, these libraries are capable of scaling out their workloads on CPU clusters as well as on specialized hardware, such as GPUs and TPUs.

GPUs are anticipated to be a key enabler for next generation SDN [166, 465]. GPU-accelerated SDN routers are reported to have much improved packet processing capability. Furthermore, GPUs on SDN controllers may be particularly useful for executing ML and DL algorithms, learning various networking scenarios and acting according to the acquired knowledge. On the other hand, smaller, resource-constrained, smart networked devices are more likely to benefit from a cloud-edge ML system. Such a system would leverage the large processing and memory resources, robust networks, and massive storage capabilities of the cloud for training computationally intensive models and sharing these with edge devices. Data collection and analytics that require immediate or near-immediate response times would be handled by the edge devices. Lightweight ML software systems, such as Caffe2Go and TensorFlow Lite, would eventually enable edge devices to bypass the cloud and build leaner models locally.

11.3 Knowledge perspective

11.3.1 Lack of real-world data

As we surveyed the literature, we observed that numerous works relied on synthetic data, particularly in resource and fault management, network security, and QoE/QoS correlation. Synthetic datasets are usually simplistic and do not truly reflect the complexity of real-world settings. This is not surprising, since obtaining real-world data traces is difficult due to the critical and private nature of network traffic, especially the payload. Furthermore, establishing the ground truth is particularly challenging, given that the voluminous amount of traffic makes any manual inspection intractable. Although injecting faults and/or attacks into the network can help produce the required data, as adopted by [285], it is unrealistic to jeopardize a production network for the sake of generating training data. Such limitations increase the probability of ML techniques being ill-trained and inapplicable in real-world network settings. Thus, it remains unclear how the numerous works in the literature would perform over real data traces. Therefore, a combined effort from both academia and industry is needed to create public repositories of data traces annotated with ground truth from various real networks.

11.3.2 The need for standard evaluation metrics

As we surveyed existing works, it became apparent that comparing them within each networking domain is not possible. This is due to the adoption of non-standardized performance metrics, evaluation environments, and datasets [109]. Furthermore, even when the same dataset is adopted, different portions of the data are used for training and testing, thereby inhibiting any possibility of comparative analysis. Standardization of the metrics, data, and environments used for evaluating similar approaches is fundamental to enable contrasting and comparing the different techniques, and to evaluate their suitability for different networking tasks. To fulfill this need, standards bodies such as the Internet Engineering Task Force (IETF) can play a pivotal role by promoting the standardization of evaluation procedures, performance metrics, and data formats through Requests for Comments (RFCs).
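As a small illustration of what a standardized report could include, the following computes precision, recall, and F1 from confusion-matrix counts on a shared held-out test split; the counts themselves are illustrative:

```python
def prf(tp, fp, fn):
    """Precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# e.g. a traffic classifier evaluated on an agreed, held-out test split
p, r, f1 = prf(tp=90, fp=10, fn=30)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Agreeing on the metric definitions, the dataset, and the train/test split is precisely what would let two such reports from different works be compared directly.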

11.3.3 Theory and ML techniques for networking

As the compute and data storage barriers that thwarted the application of ML in networking are no longer an issue, what is now preventing an ML-for-networking success story like those in games, vision, and speech recognition? The lack of a theoretical model is one obstacle that ML faces in networking. This concern was raised by David Meyer during his talk at IETF 97 on machine intelligence and networking [308]. Without a unified theory, each network has to be learned separately, which could truly hinder the speed of adoption of ML in networking. Furthermore, the ML techniques currently employed in networking were designed with other applications in mind. An open research direction in this realm is to design ML algorithms tailored for networks [306]. Another key issue is the lack of expertise: ML and networking are two different fields, and there is currently a scarcity of people who are experts in both domains. This mandates more cross-domain collaborations involving experts from both the networking and ML communities.

12 Conclusion

Over the past two decades, ML has been successfully applied in various areas of networking. This survey provides a comprehensive body of knowledge on the applicability of ML techniques in support of network operation and management, with a focus on traffic engineering, performance optimization, and network security. We review representative works from the literature, and explore and discuss the feasibility and practicality of the proposed ML solutions in addressing challenges pertaining to the autonomic operation and management of future networks.

Clearly, future networks will have to support an explosive growth in traffic volume and connected devices, and provide exceptional capabilities for accessing and sharing information. The unprecedented scale and degree of uncertainty will amplify the complexity of traffic engineering tasks, such as congestion control, traffic prediction, classification, and routing, as well as the exposure to faults and security attacks. Although ML-based solutions have shown promising results in addressing many traffic engineering challenges, their scalability needs to be evaluated against the envisioned volume of data, number of devices, and applications. On the other hand, existing ML-based approaches for fault and security management focus mostly on single-tenant and single-layer networks. To develop the fault and security management framework for future networks, existing ML approaches should be extended or re-architected to take into account the notion of multi-tenancy in multi-layer networks.

In this survey, we discuss the above issues along with several other challenges and opportunities. Our findings motivate the need for more research to advance the state-of-the-art, and finally realize the long-time vision of autonomic networking.

References

Aben E. NLANR PMA data. 2010. https://labs.ripe.net/datarepository/data-sets/nlanr-pma-data . Accessed 27 Dec 2017.

Ackley DH, Hinton GE, Sejnowski TJ. A learning algorithm for boltzmann machines. Cogn Sci. 1985; 9(1):147–69.

ACM Special Interest Group on KDD. KDD cup archives. 2016. http://www.kdd.org/kdd-cup . Accessed 22 Nov 2017.

Adams R. Active queue management: A survey. IEEE Commun Surv Tutor. 2013; 15(3):1425–76.

Adda M, Qader K, Al-Kasassbeh M. Comparative analysis of clustering techniques in network traffic faults classification. Int J Innov Res Comput Commun Eng. 2017; 5(4):6551–63.

Adeel A, Larijani H, Javed A, Ahmadinia A. Critical analysis of learning algorithms in random neural network based cognitive engine for lte systems. In: Vehicular Technology Conference (VTC Spring), 2015 IEEE 81st. IEEE: 2015. p. 1–5.

Ahmed T, Coates M, Lakhina A. Multivariate online anomaly detection using kernel recursive least squares. IEEE; 2007. pp. 625–33.

Ahn CW, Ramakrishna RS. Qos provisioning dynamic connection-admission control for multimedia wireless networks using a hopfield neural network. IEEE Trans Veh Technol. 2004; 53(1):106–117.

Aizenberg IN, Aizenberg NN, Vandewalle JP. Multi-Valued and Universal Binary Neurons: Theory, Learning and Applications. Norwell: Kluwer Academic Publishers; 2000.

Akan OB, Akyildiz IF. ATL: an adaptive transport layer suite for next-generation wireless internet. IEEE J Sel Areas Commun. 2004; 22(5):802–17.

Albus JS. A new approach to manipulator control: the cerebellar model articulation controller (cmac). J Dyn Syst Meas. Control. 1975; 97:220–7.

Alhoniemi E, Himberg J, Parhankangas J, Vesanto J. S.o.m toolbox; 2000. http://www.cis.hut.fi/projects/somtoolbox/ . Accessed 28 Dec 2017.

Allman M, Paxson V, Blanton E. TCP Congestion Control. RFC 5681, Internet Engineering Task Force. 2009. https://tools.ietf.org/html/rfc5681 .

Alom MZ, Bontupalli V, Taha TM. Intrusion detection using deep belief networks. In: Aerospace and Electronics Conference (NAECON), 2015 National. IEEE: 2015. p. 339–44.

Alpaydin E. Introduction to Machine Learning, 2nd ed. Cambridge: MIT Press; 2010.

Alpaydin E. Introduction to Machine Learning, 3rd ed. Cambridge: MIT Press; 2014.

Alshammari R, Zincir-Heywood AN. Machine learning based encrypted traffic classification: Identifying ssh and skype. In: Computational Intelligence for Security and Defense Applications, 2009. CISDA 2009. IEEE Symposium on. IEEE: 2009. p. 1–8.

Alsheikh MA, Lin S, Niyato D, Tan HP. Machine learning in wireless sensor networks: Algorithms, strategies, and applications. IEEE Commun Surv Tutor. 2014; 16(4):1996–2018.

Amaral P, Dinis J, Pinto P, Bernardo L, Tavares J, Mamede HS. Machine learning in software defined networks: Data collection and traffic classification. In: Network Protocols (ICNP), 2016 IEEE 24th International Conference on. IEEE: 2016. p. 1–5.

Amor NB, Benferhat S, Elouedi Z. Naive bayes vs decision trees in intrusion detection systems. In: Proceedings of the 2004 ACM symposium on Applied computing. ACM: 2004. p. 420–4.

Arndt D. HOW TO: Calculating Flow Statistics Using NetMate. 2016. https://dan.arndt.ca/nims/calculating-flow-statistics-using-netmate/ . Accessed 01 Aug 2017.

Arouche Nunes BA, Veenstra K, Ballenthin W, Lukin S, Obraczka K. A machine learning framework for tcp round-trip time estimation. EURASIP J Wirel Commun Netw. 2014; 2014(1):47.

Aroussi S, Bouabana-Tebibel T, Mellouk A. Empirical QoE/QoS correlation model based on multiple parameters for VoD flows. In: Global Communications Conference (GLOBECOM), 2012 IEEE. IEEE: 2012. p. 1963–8.

Arroyo-Valles R, Alaiz-Rodriguez R, Guerrero-Curieses A, Cid-Sueiro J. Q-probabilistic routing in wireless sensor networks. In: Intelligent Sensors, Sensor Networks and Information, 2007. ISSNIP 2007. 3rd International Conference on. IEEE: 2007. p. 1–6.

Astrom K. Optimal control of markov processes with incomplete state information. J Mathl Anal Appl. 1965; 10(1):174–205.

Auld T, Moore AW, Gull SF. Bayesian neural networks for internet traffic classification. IEEE Trans neural networks. 2007; 18(1):223–39.

Axelsson S. The base-rate fallacy and the difficulty of intrusion detection. ACM Trans Inf Syst Secur (TISSEC). 2000; 3(3):186–205.

Ayoubi S, Limam N, Salahuddin MA, Shahriar N, Boutaba R, Estrada-Solano F, Caicedo OM. Machine learning for cognitive network management. IEEE Commun Mag. 2018; 1(1):1.

Badarla V, Murthy CSR. Learning-tcp: A stochastic approach for efficient update in tcp congestion window in ad hoc wireless networks. J Parallel Distrib Comput. 2011; 71(6):863–78.

Badarla V, Siva Ram Murthy C. A novel learning based solution for efficient data transport in heterogeneous wireless networks. Wirel Netw. 2010; 16(6):1777–98.

Bakhshi T, Ghita B. On internet traffic classification: A two-phased machine learning approach. J Comput Netw Commun. 2016; 2016:2016.

Bakre A, Badrinath BR. I-tcp: Indirect tcp for mobile hosts. In: Proceedings of the 15th International Conference on Distributed Computing Systems. ICDCS ’95. Washington: IEEE Computer Society: 1995. p. 136.

Balakrishnan H, Seshan S, Amir E, Katz RH. Improving tcp/ip performance over wireless networks. In: Proceedings of the 1st Annual International Conference on Mobile Computing and Networking. MobiCom ’95. New York: ACM: 1995. p. 2–11.

Balakrishnan H, Padmanabhan VN, Seshan S, Katz RH. A comparison of mechanisms for improving tcp performance over wireless links. IEEE/ACM Trans Netw. 1997; 5(6):756–69.

Baldo N, Zorzi M. Learning and adaptation in cognitive radios using neural networks. In: Consumer Communications and Networking Conference, 2008. CCNC 2008. 5th IEEE. IEEE: 2008. p. 998–1003.

Baldo N, Dini P, Nin-Guerrero J. User-driven call admission control for voip over wlan with a neural network based cognitive engine. In: Cognitive Information Processing (CIP), 2010 2nd International Workshop on. IEEE: 2010. p. 52–6.

Baras JS, Ball M, Gupta S, Viswanathan P, Shah P. Automated network fault management. In: MILCOM 97 Proceedings. IEEE: 1997. p. 1244–50.

Barman D, Matta I. Model-based loss inference by tcp over heterogeneous networks. In: Proceedings of WiOpt 2004 Modeling and Optimization in Mobile, Ad Hoc and Wireless Networks. Cambridge: 2004. p. 364–73.

Barreno M, Nelson B, Sears R, Joseph AD, Tygar JD. Can machine learning be secure? In: Proceedings of the 2006 ACM Symposium on Information, Computer and Communications Security, ACM, ASIACCS ’06. New York: ACM: 2006. p. 16–25.

Barreto GA, Mota JCM, Souza LGM, Frota RA, Aguayo L. Condition monitoring of 3g cellular networks through competitive neural models. IEEE Trans Neural Netw. 2005; 16(5):1064–75.

Baum LE, Petrie T. Statistical inference for probabilistic functions of finite state markov chains. Ann Math Statist. 1966; 37(6):1554–63.

Bay SD, Kibler D, Pazzani MJ, Smyth P. The uci kdd archive of large data sets for data mining research and experimentation. SIGKDD Explor Newsl. 2000; 2(2):81–5.

Bayes M, Price M. An essay towards solving a problem in the doctrine of chances. By the late Rev. Mr. Bayes, F.R.S., communicated by Mr. Price, in a letter to John Canton, A.M.F.R.S. Philos Trans. 1763; 53(1683-1775):370–418.

Beale J, Deraison R, Meer H, Temmingh R, Walt CVD. Nessus network auditing. Burlington: Syngress Publishing; 2004.

Beale J, Baker AR, Esler J. Snort: IDS and IPS toolkit. Burlington: Syngress Publishing; 2007.

Bellman R. Dynamic Programming, 1st ed. Princeton: Princeton University Press; 1957.

Belshe M, Peon R. SPDY Protocol. Tech. rep., Network Working Group. 2012. https://tools.ietf.org/pdf/draft-mbelshe-httpbis-spdy-00.pdf .

Belshe M, Peon R, Thomson M. Hypertext Transfer Protocol Version 2 (HTTP/2). RFC 7540, IETF. 2015. http://www.rfc-editor.org/info/rfc7540.txt .

Bengio Y, Lamblin P, Popovici D, Larochelle H. Greedy layer-wise training of deep networks. In: Proceedings of the 19th International Conference on Neural Information Processing Systems. NIPS’06. Cambridge: MIT Press: 2006. p. 153–60.

Benson T. Data Set for IMC 2010 Data Center Measurement. 2010. http://pages.cs.wisc.edu/tbenson/IMC10_Data.html . Accessed 28 Dec 2017.

Bergstra JA, Middelburg C. ITU-T Recommendation G.107: The E-model, a computational model for use in transmission planning. ITU; 2003.

Bermolen P, Rossi D. Support vector regression for link load prediction. Comput Netw. 2009; 53(2):191–201.

Bermolen P, Mellia M, Meo M, Rossi D, Valenti S. Abacus: Accurate behavioral classification of P2P-tv traffic. Comput Netw. 2011; 55(6):1394–411.

Bernaille L, Teixeira R. Implementation issues of early application identification. Lect Notes Compu Sci. 2007; 4866:156.

Bernaille L, Teixeira R, Akodkenou I, Soule A, Salamatian K. Traffic classification on the fly. ACM SIGCOMM Comput Commun Rev. 2006a; 36(2):23–6.

Bernaille L, Teixeira R, Salamatian K. Early application identification. In: Proceedings of the 2006 ACM CoNEXT Conference. ACM: 2006b. p. 61–6:12.

Bernstein L, Yuhas CM. How technology shapes network management. IEEE Netw. 1989; 3(4):16–9.

Bezdek JC, Ehrlich R, Full W. Fcm: The fuzzy c-means clustering algorithm. Comput Geosci. 1984; 10(2-3):191–203.

Bhorkar AA, Naghshvar M, Javidi T, Rao BD. Adaptive opportunistic routing for wireless ad hoc networks. IEEE/ACM Trans Netw. 2012; 20(1):243–56.

Biaz S, Vaidya NH. Distinguishing congestion losses from wireless transmission losses: a negative result. In: Proceedings 7th International Conference on Computer Communications and Networks (Cat. No.98EX226). Piscataway: IEEE: 1998. p. 722–31.

Bkassiny M, Li Y, Jayaweera SK. A survey on machine-learning techniques in cognitive radios. IEEE Commun Surv Tutor. 2013; 15(3):1136–59.

Blanton E, Allman M. On making tcp more robust to packet reordering. SIGCOMM Comput Commun Rev. 2002; 32(1):20–30.

Blenk A, Kalmbach P, van der Smagt P, Kellerer W. Boost online virtual network embedding: Using neural networks for admission control. In: Network and Service Management (CNSM), 2016 12th International Conference on. Piscataway: IEEE: 2016. p. 10–8.

Boero L, Marchese M, Zappatore S. Support vector machine meets software defined networking in ids domain. In: Proceedings of the 29th International Teletraffic Congress (ITC), vol. 3. New York: IEEE: 2017. p. 25–30.

Bojovic B, Baldo N, Nin-Guerrero J, Dini P. A supervised learning approach to cognitive access point selection. In: GLOBECOM Workshops (GC Wkshps), 2011 IEEE. Piscataway: IEEE: 2011. p. 1100–5.

Bojovic B, Baldo N, Dini P. A cognitive scheme for radio admission control in lte systems. In: Cognitive Information Processing (CIP), 2012 3rd International Workshop on. Piscataway: IEEE: 2012. p. 1–3.

Bojovic B, Quer G, Baldo N, Rao RR. Bayesian and neural network schemes for call admission control in lte systems. In: Global Communications Conference (GLOBECOM), 2013 IEEE. Piscataway: IEEE: 2013. p. 1246–52.

Bonald T, May M, Bolot JC. Analytic evaluation of red performance. In: Proceedings IEEE INFOCOM 2000. Conference on Computer Communications. Nineteenth Annual Joint Conference of the IEEE Computer and Communications Societies (Cat. No.00CH37064), vol. 3. 2000. p. 1415–24.

Bonfiglio D, Mellia M, Meo M, Rossi D, Tofanelli P. Revealing skype traffic: when randomness plays with you. In: ACM SIGCOMM Computer Communication Review. ACM: 2007. p. 37–48.

Boser BE, Guyon IM, Vapnik VN. A training algorithm for optimal margin classifiers. In: Proceedings of the Fifth Annual Workshop on Computational Learning Theory, ACM, New York, NY, USA, COLT ’92. New York: ACM: 1992. p. 144–52.

Boyan JA, Littman ML. Packet routing in dynamically changing networks: A reinforcement learning approach. In: Advances in neural information processing systems: 1994. p. 671–8.

Braden B, Clark D, Crowcroft J, Davie B, Deering S, Estrin D, Floyd S, Jacobson V, Minshall G, Partridge C, Peterson L, Ramakrishnan K, Shenker S, Wroclawski J, Zhang L. Recommendations on queue management and congestion avoidance in the internet. RFC 2309, Internet Engineering Task Force. 1998. https://tools.ietf.org/html/rfc2309 .

Brakmo LS, O’Malley SW, Peterson LL. Tcp vegas: New techniques for congestion detection and avoidance. In: Proceedings of the Conference on Communications Architectures, Protocols and Applications, ACM, New York, NY, USA, SIGCOMM ’94. New York: ACM: 1994. p. 24–35.

Brauckhoff D, Wagner A, May M. FLAME: A flow-level anomaly modeling engine. In: Proceedings of the conference on Cyber security experimentation and test (CSET). Berkeley: USENIX Association: 2008. p. 1.

Breiman L. Bagging predictors. Mach Learn. 1996; 24(2):123–40.

Breiman L, Friedman J, Stone C, Olshen R. Classification and Regression Trees. The Wadsworth and Brooks-Cole statistics-probability series. New York: Taylor & Francis; 1984.

Brill E, Lin JJ, Banko M, Dumais ST, Ng AY, et al. Data-intensive question answering. In: TREC, vol. 56. 2001. p. 90.

Broomhead DS, Lowe D. Radial basis functions, multi-variable functional interpolation and adaptive networks. Memorandum No. 4148. Malvern: Royal Signals and Radar Establishment; 1988.

Brownlee J. Practical Machine Learning Problems. 2013. https://machinelearningmastery.com/practical-machine-learning-problems/ . Accessed 28 Dec 2017.

Bryson A, Ho Y. Applied optimal control: optimization, estimation and control. Blaisdell book in the pure and applied sciences. Waltham: Blaisdell Pub. Co; 1969.

Bryson AE. A gradient method for optimizing multi-stage allocation processes. In: Harvard University Symposium on Digital Computers and their Applications. Boston: 1961. p. 72.

Buczak AL, Guven E. A survey of data mining and machine learning methods for cyber security intrusion detection. IEEE Commun Surv Tutor. 2016; 18(2):1153–76.

Caini C, Firrincieli R. Tcp hybla: a tcp enhancement for heterogeneous networks. Int J Satell Commun Netw. 2004; 22(5):547–66.

Cannady J. Artificial neural networks for misuse detection. In: Proceedings of the 21st National information systems security conference, vol. 26. Virginia: 1998. p. 368–81.

Cannady J. Next generation intrusion detection: Autonomous reinforcement learning of network attacks. In: Proceedings of the 23rd national information systems security conference. Baltimore: 2000. p. 1–12.

Chabaa S, Zeroual A, Antari J. Identification and prediction of internet traffic using artificial neural networks. J Intell Learn Syst Appl. 2010; 2(03):147.

Chakrabarti S, Chakraborty M, Mukhopadhyay I. Study of snort-based ids. In: Proceedings of the International Conference and Workshop on Emerging Trends in Technology. New York: ACM: 2010. p. 43–7.

Chang CC, Lin CJ. Libsvm: a library for support vector machines. ACM Trans Intell Syst Technol (TIST). 2011; 2(3):27.

Charonyktakis P, Plakia M, Tsamardinos I, Papadopouli M. On user-centric modular qoe prediction for voip based on machine-learning algorithms. IEEE Trans Mob Comput. 2016; 15(6):1443–56.

Chebrolu S, Abraham A, Thomas JP. Feature deduction and ensemble design of intrusion detection systems. Comput Secur. 2005; 24(4):295–307.

Chen M, Zheng AX, Lloyd J, Jordan MI, Brewer E. Failure diagnosis using decision trees. In: Autonomic Computing, 2004. Proceedings. International Conference on. Piscataway: IEEE: 2004. p. 36–43.

Chen MY, Kiciman E, Fratkin E, Fox A, Brewer E. Pinpoint: Problem determination in large, dynamic internet services. In: Dependable Systems and Networks, 2002. DSN 2002. Proceedings. International Conference on. Piscataway: IEEE: 2002. p. 595–604.

Chen T, Guestrin C. XGBoost: A Scalable Tree Boosting System. In: Proceedings of the 22Nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM: 2016. p. 785–94.

Chen Z, Wen J, Geng Y. Predicting future traffic using hidden markov models. In: Proceedings of 24th IEEE International Conference on Network Protocols (ICNP). IEEE: 2016. p. 1–6.

Cheng RG, Chang CJ. Neural-network connection-admission control for atm networks. IEE Proc-Commun. 1997; 144(2):93–8.

Choi SP, Yeung DY. Predictive q-routing: A memory-based reinforcement learning approach to adaptive traffic control. In: Advances in Neural Information Processing Systems. 1996. p. 945–51.

Chow CK. An optimum character recognition system using decision functions. IRE Trans Electron Comput EC. 1957; 6(4):247–54.

Chun-Feng W, Kui L, Pei-Ping S. Hybrid artificial bee colony algorithm and particle swarm search for global optimization. Math Probl Eng. 2014;2014.

Cisco. The Zettabyte Era: Trends and Analysis. 2017. https://www.cisco.com/c/en/us/solutions/collateral/service-provider/visual-networking-index-vni/vni-hyperconnectivity-wp.html . Accessed 28 Dec 2017.

Cisco Systems. Cisco IOS Netflow. 2012. http://www.cisco.com/go/netflow . Accessed 01 Aug 2017.

Cisco Systems. Snort: The world’s most widely deployed IPS technology. 2014. https://www.cisco.com/c/en/us/products/collateral/security/brief_c17-733286.html . Accessed 25 Apr 2018.

Claeys M, Latre S, Famaey J, De Turck F. Design and evaluation of a self-learning http adaptive video streaming client. IEEE commun lett. 2014a; 18(4):716–9.

Claeys M, Latré S, Famaey J, Wu T, Van Leekwijck W, De Turck F. Design and optimisation of a (fa) q-learning-based http adaptive streaming client. Connect Sci. 2014b; 26(1):25–43.

Cortez P, Rio M, Rocha M, Sousa P. Internet traffic forecasting using neural networks. In: Proceedings of IEEE International Joint Conference on Neural Networks (IJCNN). IEEE: 2006. p. 2635–42.

Cox DR. The regression analysis of binary sequences. J R Stat Soc Ser B (Methodological). 1958; 20(2):215–42.

Cybenko G. Approximation by superpositions of a sigmoidal function. Math Control Signals Syst (MCSS). 1989; 2(4):303–14.

Dagum P, Galper A, Horvitz EJ. Temporal Probabilistic Reasoning: Dynamic Network Models for Forecasting. Stanford: Knowledge Systems Laboratory, Medical Computer Science, Stanford University; 1991.

Dainotti A, Pescapé A, Sansone C. Early classification of network traffic through multi-classification. In: International Workshop on Traffic Monitoring and Analysis. Springer: 2011. p. 122–35.

Dainotti A, Pescape A, Claffy KC. Issues and future directions in traffic classification. IEEE Netw. 2012; 26(1):35–40.

Dean T, Kanazawa K. A model for reasoning about persistence and causation. Comput Intell. 1989; 5(2):142–50.

Dechter R. Learning while searching in constraint-satisfaction-problems. In: Proceedings of the Fifth AAAI National Conference on Artificial Intelligence, AAAI’86. Palo Alto: AAAI Press: 1986. p. 178–83.

Demirbilek E. The INRS Audiovisual Quality Dataset. 2016. https://github.com/edipdemirbilek/TheINRSAudiovisualQualityDataset . Accessed 28 Dec 2017.

Demirbilek E, Grégoire JC. INRS audiovisual quality dataset. ACM; 2016. p. 167–71.

Demirbilek E, Grégoire JC. Machine learning–based parametric audiovisual quality prediction models for real-time communications. ACM Trans Multimed Comput Commun Appl (TOMM). 2017; 13(2):16.

Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the em algorithm. J R Stat Soc Ser B (Method). 1977; 39(1):1–38.

Depren O, Topallar M, Anarim E, Ciliz MK. An intelligent intrusion detection system (ids) for anomaly and misuse detection in computer networks. Expert Syst Appl. 2005; 29(4):713–22.

Detristan T, Ulenspiegel T, Malcom Y, Underduk M. Polymorphic shellcode engine using spectrum analysis. 2003. http://www.phrack.org/show.php?p=61&a=9 . Accessed 25 May 2018.

Ding J, Kramer B, Xu S, Chen H, Bai Y. Predictive fault management in the dynamic environment of ip networks. In: IP Operations and Management, 2004. Proceedings IEEE Workshop on. Piscataway: IEEE: 2004. p. 233–9.

Ding L, Wang X, Xu Y, Zhang W. Improve throughput of tcp-vegas in multihop ad hoc networks. Comput Commun. 2008; 31(10):2581–8.

Domingos P. The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World. Basic Books; 2015.

Donato W, Pescapé A, Dainotti A. Traffic identification engine: an open platform for traffic classification. IEEE Netw. 2014; 28(2):56–64.

Dong M, Li Q, Zarchy D, Godfrey PB, Schapira M. Pcc: Re-architecting congestion control for consistent high performance. In: Proceedings of the 12th USENIX Conference on Networked Systems Design and Implementation, NSDI’15. Berkeley: USENIX Association: 2015. p. 395–408.

Dowling J, Cunningham R, Curran E, Cahill V. Collaborative reinforcement learning of autonomic behaviour. In: Proceedings. 15th International Workshop on Database and Expert Systems Applications, 2004.2004. p. 700–4. https://doi.org/10.1109/DEXA.2004.1333556 .

Dowling J, Curran E, Cunningham R, Cahill V. Using feedback in collaborative reinforcement learning to adaptively optimize manet routing. IEEE Trans Syst Man Cybern Part A Syst Hum. 2005; 35(3):360–72.

Dreger H, Feldmann A, Mai M, Paxson V, Sommer R. Dynamic application-layer protocol analysis for network intrusion detection. In: USENIX Security Symposium. Berkeley: USENIX Security Symposium: 2006. p. 257–72.

Contagio Malware Dump. Dde command execution malware samples. 2017. http://contagiodump.blogspot.it . Accessed 1 Mar 2017.

eBay Inc. eBay. 2017. https://www.ebay.com/ . Accessed 01 Aug 2017.

Edalat Y, Ahn JS, Obraczka K. Smart experts for network state estimation. IEEE Trans Netw Serv Manag. 2016; 13(3):622–35.

El Khayat I, Geurts P, Leduc G. Improving TCP in Wireless Networks with an Adaptive Machine-Learnt Classifier of Packet Loss Causes. Berlin, Heidelberg: Springer Berlin Heidelberg; 2005, pp. 549–60.

El Khayat I, Geurts P, Leduc G. Enhancement of tcp over wired/wireless networks with packet loss classifiers inferred by supervised learning. Wirel Netw. 2010; 16(2):273–90.

Elkan C. Results of the kdd’99 classifier learning. ACM SIGKDD Explor Newsl. 2000; 1(2):63–4.

Elkotob M, Granlund D, Andersson K, Ahlund C. Multimedia qoe optimized management using prediction and statistical learning. In: Local Computer Networks (LCN), 2010 IEEE 35th Conference on. IEEE: 2010. p. 324–7.

Elwhishi A, Ho PH, Naik K, Shihada B. Arbr: Adaptive reinforcement-based routing for dtn. In: Wireless and Mobile Computing, Networking and Communications (WiMob), 2010 IEEE 6th International Conference on. IEEE: 2010. p. 376–85.

Erickson BJ, Korfiatis P, Akkus Z, Kline TL. Machine learning for medical imaging. RadioGraphics. 2017; 37(2):505–15.

Erman J, Arlitt M, Mahanti A. Traffic classification using clustering algorithms. In: Proceedings of the 2006 SIGCOMM workshop on Mining network data. ACM: 2006a. p. 281–6.

Erman J, Mahanti A, Arlitt M. Internet traffic identification using machine learning. In: Global Telecommunications Conference, 2006. GLOBECOM’06. IEEE. IEEE: 2006b. p. 1–6.

Erman J, Mahanti A, Arlitt M, Cohen I, Williamson C. Offline/realtime traffic classification using semi-supervised learning. Perform Eval. 2007a; 64(9):1194–213.

Erman J, Mahanti A, Arlitt M, Williamson C. Identifying and discriminating between web and peer-to-peer traffic in the network core. In: Proceedings of the 16th international conference on World Wide Web. ACM: 2007b. p. 883–92.

Eskin E, Arnold A, Prerau M, Portnoy L, Stolfo S. A geometric framework for unsupervised anomaly detection: Detecting intrusions in unlabeled data. Appl data min comput secur. 2002; 6:77–102.

Este A, Gringoli F, Salgarelli L. Support vector machines for tcp traffic classification. Comput Netw. 2009; 53(14):2476–90.

Eswaradass A, Sun XH, Wu M. Network bandwidth predictor (nbp): A system for online network performance forecasting. In: Proceedings of 6th IEEE International Symposium on Cluster Computing and the Grid (CCGRID). IEEE: 2006. p. 4–pp.

Fadlullah Z, Tang F, Mao B, Kato N, Akashi O, Inoue T, Mizutani K. State-of-the-art deep learning: Evolving machine intelligence toward tomorrow’s intelligent network traffic control systems. IEEE Commun Surv Tutor. 2017; PP(99):1.

Farmer D, Venema W. Satan: Security administrator tool for analyzing networks. 1993. http://www.porcupine.org/satan/ . Accessed 28 Dec 2017.

Feng W-C, Shin KG, Kandlur DD, Saha D. The blue active queue management algorithms. IEEE/ACM Trans Netw. 2002; 10(4):513–28.

Fiedler M, Hossfeld T, Tran-Gia P. A generic quantitative relationship between quality of experience and quality of service. IEEE Netw. 2010;24(2).

Finamore A, Mellia M, Meo M, Rossi D. Kiss: Stochastic packet inspection classifier for udp traffic. IEEE/ACM Trans Netw. 2010; 18(5):1505–15.

Fix E, Hodges JL. Discriminatory analysis-nonparametric discrimination: consistency properties. Report No. 4, Project 21-49-004, USAF School of Aviation Medicine. 1951.

Floyd S, Jacobson V. Random early detection gateways for congestion avoidance. IEEE/ACM Trans Netw. 1993; 1(4):397–413.

Fogla P, Sharif MI, Perdisci R, Kolesnikov OM, Lee W. Polymorphic blending attacks. In: USENIX Security Symposium. Berkeley: USENIX Association: 2006. p. 241–56.

Fonseca N, Crovella M. Bayesian packet loss detection for tcp. In: Proceedings IEEE 24th Annual Joint Conference of the IEEE Computer and Communications Societies, vol. 3. 2005. p. 1826–37.

Forster A, Murphy AL. Froms: Feedback routing for optimizing multiple sinks in wsn with reinforcement learning. In: Intelligent Sensors, Sensor Networks and Information, 2007. ISSNIP 2007. 3rd International, Conference on. IEEE: 2007. p. 371–6.

Fraleigh C, Diot C, Lyles B, Moon S, Owezarski P, Papagiannaki D, Tobagi F. Design and deployment of a passive monitoring infrastructure. In: Thyrrhenian Internatinal Workshop on Digital Communications. Springer: 2001. p. 556–75.

Freund Y, Schapire RE. Experiments with a new boosting algorithm. In: Proceedings of the Thirteenth International Conference on International Conference on Machine Learning, ICML’96. San Francisco: Morgan Kaufmann Publishers Inc.: 1996. p. 148–56.

Friedman JH. Greedy function approximation: A gradient boosting machine. Ann Statist. 2001; 29(5):1189–232.

Friedman JH. Stochastic gradient boosting. Comput Stat Data Anal. 2002; 38(4):367–78.

Fu CP, Liew SC. Tcp veno: Tcp enhancement for transmission over wireless access networks. IEEE J Sel Areas Commun. 2003; 21(2):216–28.

Fukushima K. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol Cybern. 1980; 36(4):193–202.

Funahashi KI. On the approximate realization of continuous mappings by neural networks. Neural netw. 1989; 2(3):183–92.

Gagniuc P. Markov Chains: From Theory to Implementation and Experimentation. Hoboken: Wiley; 2017.

Gao Y, He G, Hou JC. On exploiting traffic predictability in active queue management. In: Proceedings. Twenty-First Annual Joint Conference of the IEEE Computer and Communications Societies, vol. 3. Piscataway: IEEE: 2002. p. 1630–9.

Garcia-Teodoro P, Diaz-Verdejo J, Maciá-Fernández G, Vázquez E. Anomaly-based network intrusion detection: Techniques, systems and challenges. Comput Secur. 2009; 28(1-2):18–28.

Gartner Inc. Gartner Says 8.4 Billion Connected “Things” Will Be in Use in 2017, Up 31 Percent From 2016. 2017. https://www.gartner.com/newsroom/id/3598917 . Accessed 01 Aug 2017.

Geurts P, Khayat IE, Leduc G. A machine learning approach to improve congestion control over wireless computer networks. In: Data Mining, 2004. ICDM ’04. Fourth IEEE International Conference on: 2004. p. 383–6.

Geurts P, Ernst D, Wehenkel L. Extremely randomized trees. Mach Learn. 2006; 63(1):3–42.

Giacinto G, Perdisci R, Del Rio M, Roli F. Intrusion detection in computer networks by a modular ensemble of one-class classifiers. Inf Fusion. 2008; 9(1):69–82.

Go Y, Jamshed MA, Moon Y, Hwang C, Park K. Apunet: Revitalizing gpu as packet processing accelerator. In: NSDI. 2017. p. 83–96.

Goetz P, Kumar S, Miikkulainen R. On-line adaptation of a signal predistorter through dual reinforcement learning. In: ICML. 1996. p. 175–81.

Goldberger AS. Econometric computing by hand. J Econ Soc Meas. 2004; 29(1-3):115–7.

Goodfellow I, Bengio Y, Courville A. Deep Learning. MIT Press; 2016. http://www.deeplearningbook.org .

Google. Cloud TPUs - ML accelerators for TensorFlow, Google Cloud Platform. 2017. https://cloud.google.com/tpu/ . Accessed 01 Aug 2017.

Görnitz N, Kloft M, Rieck K, Brefeld U. Active learning for network intrusion detection. In: Proceedings of the 2nd ACM workshop on Security and artificial intelligence. New York: ACM: 2009. p. 47–54.

Gu Y, Grossman R. Sabul: A transport protocol for grid computing. J Grid Comput. 2003; 1(4):377–86.

Guyon I, Elisseeff A. An introduction to variable and feature selection. J Mach Learn Res. 2003; 3:1157–82.

Ha S, Rhee I, Xu L. Cubic: A new tcp-friendly high-speed tcp variant. SIGOPS Oper Syst Rev. 2008; 42(5):64–74.

Habib I, Tarraf A, Saadawi T. A neural network controller for congestion control in atm multiplexers. Comput Netw ISDN Syst. 1997; 29(3):325–34.

Haffner P, Sen S, Spatscheck O, Wang D. Acas: automated construction of application signatures. In: Proceedings of the 2005 ACM SIGCOMM workshop on Mining network data. New York: ACM: 2005. p. 197–202.

Haining W. An economical broadband network with high added value. In: Evolving the Access Network, International Engineering Consortium. Chicago: International Engineering Consortium: 2006. p. 67–70.

Hajji H. Statistical analysis of network traffic for adaptive faults detection. IEEE Trans Neural Netw. 2005; 16(5):1053–63.

Hariri B, Sadati N. Nn-red: an aqm mechanism based on neural networks. Electron Lett. 2007; 43(19):1053–5.

Harrison V, Pagliery J. Nearly 1 million new malware threats released every day. 2015. http://money.cnn.com/2015/04/14/technology/security/cyber-attack-hacks-security/index.html . Accessed 28 Dec 2017.

Hashmi US, Darbandi A, Imran A. Enabling proactive self-healing by data mining network failure logs. In: Computing, Networking and Communications (ICNC), 2017 International Conference on. Piscataway: IEEE: 2017. p. 511–7.

He L, Xu C, Luo Y. vtc: Machine learning based traffic classification as a virtual network function. In: Proceedings of the 2016 ACM International Workshop on Security in Software Defined Networks & Network Function Virtualization. ACM: 2016. p. 53–56.

He Q, Shayman MA. Using reinforcement learning for proactive network fault management. In: Proceedings of the International Conference on Communication Technologies.1999.

Hebb D. The Organization of Behavior: A Neuropsychological Theory. New York: Wiley; 1949.

Henderson T, Floyd S, Gurtov A, Nishida Y. The newreno modification to tcp’s fast recovery algorithm. RFC 6582, Internet Engineering Task Force. 2012. https://tools.ietf.org/html/rfc6582 .

Hinton GE. Training products of experts by minimizing contrastive divergence. Neural Comput. 2002; 14(8):1771–800.

Hinton GE, McClelland JL, Rumelhart DE. Parallel distributed processing: Explorations in the microstructure of cognition, vol. 1 In: Rumelhart DE, McClelland JL, PDP Research Group C, editors. Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Cambridge, MA, USA, chap Distributed Representations: MIT Press: 1986. p. 77–109.

Hinton GE, Osindero S, Teh YW. A fast learning algorithm for deep belief nets. Neural Comput. 2006; 18(7):1527–54.

Hiramatsu A. Atm communications network control by neural networks. IEEE Trans Neural Netw. 1990; 1(1):122–30.

Hiramatsu A. Integration of atm call admission control and link capacity control by distributed neural networks. IEEE J Sel Areas Commun. 1991; 9(7):1131–8.

Ho TK. Random decision forests. In: Proceedings of the Third International Conference on Document Analysis and Recognition, ICDAR ’95, vol. 1. Piscataway: IEEE: 1995. p. 278.

Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997; 9(8):1735–80.

Hood CS, Ji C. Proactive network-fault detection. IEEE Trans Reliab. 1997; 46(3):333–41.

de Hoon M, Imoto S, Nolan J, Miyano S. Open source clustering software. Bioinformatics. 2004; 20(9):1453–4.

Hopfield JJ. Neural networks and physical systems with emergent collective computational abilities. Proc Natl Acad Sci. 1982; 79(8):2554–8. http://www.pnas.org/content/79/8/2554.full.pdf .

Hornik K. Approximation capabilities of multilayer feedforward networks. Neural Netw. 1991; 4(2):251–7.

Hu T, Fei Y. Qelar: a machine-learning-based adaptive routing protocol for energy-efficient and lifetime-extended underwater sensor networks. IEEE Trans Mob Comput. 2010; 9(6):796–809.

Hu W, Hu W, Maybank S. Adaboost-based algorithm for network intrusion detection. IEEE Trans Syst Man Cybern Part B (Cybern). 2008; 38(2):577–83.

Huang YS, Suen CY. A method of combining multiple experts for the recognition of unconstrained handwritten numerals. IEEE Trans Pattern Anal Mach Intell. 1995; 17(1):90–4.

Hubel DH, Wiesel TN. Receptive fields of single neurones in the cat’s striate cortex. J Physiol. 1959; 148(3):574–91.

Hubel DH, Wiesel TN. Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. J Physiol. 1962; 160(1):106–54.

IMPACT Cyber Trust. Information Marketplace for Policy and Analysis of Cyber-risk and Trust. 2017. https://www.impactcybertrust.org . Accessed 01 Aug 2017.

Information Sciences Institute. The network simulator ns-2. 2014. http://www.isi.edu/nsnam/ns/ . Accessed 10 Oct 2017.

Ingham KL, Inoue H. Comparing anomaly detection techniques for http. In: International Workshop on Recent Advances in Intrusion Detection (RAID). Berlin: Springer: 2007. p. 42–62.

Intanagonwiwat C, Govindan R, Estrin D, Heidemann J, Silva F. Directed diffusion for wireless sensor networking. IEEE/ACM Trans Netw (ToN). 2003; 11(1):2–16.

International Telecommunications Union. G.107 : The E-model: a computational model for use in transmission planning. 2008. https://www.itu.int/rec/T-REC-G.107 . Accessed 28 Dec 2017.

Internet Assigned Numbers Authority. IANA. 2017. https://www.iana.org/ . Accessed 01 Aug 2017.

Internet Engineering Task Force. An Architecture for Describing Simple Network Management Protocol (SNMP) Management Frameworks. 2002. https://tools.ietf.org/html/rfc3411 . Accessed 01 Aug 2017.

Internet Engineering Task Force. Specification of the IP Flow Information Export (IPFIX) Protocol for the Exchange of IP Traffic Flow Information. 2008. https://tools.ietf.org/html/rfc5101 . Accessed 01 Aug 2017.

Ivakhnenko A, Lapa V. Cybernetic Predicting Devices. Purdue University School of Electrical Engineering; 1965.

Iyengar J, Swett I. QUIC: A UDP-Based Secure and Reliable Transport for HTTP/2. Tech. rep. Network Working Group; 2015. https://tools.ietf.org/pdf/draft-tsvwg-quic-protocol-00.pdf .

Jain A, Karandikar A, Verma R. An adaptive prediction based approach for congestion estimation in active queue management (apace). In: Global Telecommunications Conference, 2003. GLOBECOM ’03. IEEE, vol. 7. Piscataway: IEEE: 2003. p. 4153–7.

Javaid A, Niyaz Q, Sun W, Alam M. A deep learning approach for network intrusion detection system. In: Proceedings of the 9th EAI International Conference on Bio-inspired Information and Communications Technologies (formerly BIONETICS), ICST, (Institute for Computer Sciences Social-Informatics and Telecommunications Engineering). Brussels: 2016. p. 21–6.

Jayaraj A, Venkatesh T, Murthy CSR. Loss classification in optical burst switching networks using machine learning techniques: improving the performance of tcp. IEEE J Sel Areas Commun. 2008; 26(6):45–54.

Jaynes ET. Information theory and statistical mechanics. Phys Rev. 1957a; 106:620–30.

Jaynes ET. Information theory and statistical mechanics. ii. Phys Rev. 1957b; 108:171–90.

Jennings A. A learning system for communications network configuration. Eng Appl Artif Intell. 1988; 1(3):151–60.

Jiang H, Moore AW, Ge Z, Jin S, Wang J. Lightweight application classification for network management. In: Proceedings of the 2007 SIGCOMM workshop on Internet network management. ACM: 2007. p. 299–304.

Jiang H, Luo Y, Zhang Q, Yin M, Wu C. Tcp-gvegas with prediction and adaptation in multi-hop ad hoc networks. Wirel Netw. 2017; 23(5):1535–48.

Jiang S, Song X, Wang H, Han JJ, Li QH. A clustering-based method for unsupervised intrusion detections. Pattern Recog Lett. 2006; 27(7):802–10.

Jin Y, Duffield N, Haffner P, Sen S, Zhang ZL. Inferring applications at the network layer using collective traffic statistics. In: Teletraffic Congress (ITC), 2010 22nd International. IEEE: 2010. p. 1–8.

Jin Y, Duffield N, Erman J, Haffner P, Sen S, Zhang ZL. A modular machine learning system for flow-level traffic classification in large networks. ACM Trans Knowl Discov Data (TKDD). 2012; 6(1):4.

Jing N, Yang M, Cheng S, Dong Q, Xiong H. An efficient svm-based method for multi-class network traffic classification. In: Performance Computing and Communications Conference (IPCCC), 2011 IEEE 30th International. IEEE: 2011. p. 1–8.

Joachims T. SVMlight. 1999. http://svmlight.joachims.org/ . Accessed 28 Dec 2017.

Johnsson A, Meirosu C. Towards automatic network fault localization in real time using probabilistic inference. In: Integrated Network Management (IM 2013), 2013 IFIP/IEEE International Symposium on. Piscataway: IEEE: 2013. p. 1393–8.

Jordan MI. Serial order: A parallel distributed processing approach. Tech. Rep. ICS Report 8604. San Diego: University of California; 1986.

Juniper Research. Cybercrime will cost businesses over $2 trillion by 2019. 2015. https://www.juniperresearch.com/press/press-releases/cybercrime-cost-businesses-over-2trillion . Accessed 10 Nov 2017.

Karagiannis T, Broido A, Brownlee N, Claffy KC, Faloutsos M. Is p2p dying or just hiding? [P2p traffic measurement]. In: IEEE Global Telecommunications Conference (GLOBECOM), vol. 3. 2004. p. 1532–8.

Karagiannis T, Papagiannaki K, Faloutsos M. BLINC: multilevel traffic classification in the dark. ACM SIGCOMM Comput Commun Rev. 2005; 35(4):229–40.

Karami A. Accpndn: Adaptive congestion control protocol in named data networking by learning capacities using optimized time-lagged feedforward neural network. J Netw Comput Appl. 2015; 56(Supplement C):1–18.

Kaspersky Lab. Damage control: The cost of security breaches. IT security risks special report series. 2015. https://media.kaspersky.com/pdf/it-risks-survey-report-cost-of-security-breaches.pdf . Accessed 10 Nov 2017.

Kayacik HG, Zincir-Heywood AN, Heywood MI. On the capability of an SOM-based intrusion detection system. In: Proceedings of the International Joint Conference on Neural Networks. New York: IEEE: 2003. p. 1808–13.

Kelley HJ. Gradient theory of optimal flight paths. ARS J. 1960; 30(10):947–54.

Kennedy J, Eberhart R. Particle swarm optimization. In: Neural Networks, 1995. Proceedings., IEEE International Conference on, vol. 4. 1995. p. 1942–8.

Khan A, Sun L, Ifeachor E. Content-based video quality prediction for mpeg4 video streaming over wireless networks. J Multimedia. 2009a;4(4).

Khan A, Sun L, Ifeachor E. Content clustering based video quality prediction model for mpeg4 video streaming over wireless networks. IEEE; 2009b. p. 1–5.

Khanafer RM, Solana B, Triola J, Barco R, Moltsen L, Altman Z, Lazaro P. Automated diagnosis for umts networks using bayesian network approach. IEEE Trans veh technol. 2008; 57(4):2451–61.

Khayat IE, Geurts P, Leduc G. Machine-learnt versus analytical models of tcp throughput. Comput Netw. 2007; 51(10):2631–44.

Khorsandroo S, Noor RM, Khorsandroo S. The role of psychophysics laws in quality of experience assessment: a video streaming case study. In: Proceedings of the International Conference on Advances in Computing, Communications and Informatics. ACM: 2012. p. 446–52.

Khorsandroo S, Md Noor R, Khorsandroo S. A generic quantitative relationship to assess interdependency of qoe and qos. KSII Trans Internet Inf Syst. 2013;7(2).

Kiciman E, Fox A. Detecting application-level failures in component-based internet services. IEEE Trans Neural Netw. 2005; 16(5):1027–41.

Kim DS, Nguyen HN, Park JS. Genetic algorithm to improve svm based network intrusion detection system. New York: IEEE; 2005. p. 155–8.

Kim H, Fomenkov M, claffy kc, Brownlee N, Barman D, Faloutsos M. Comparison of internet traffic classification tools. In: IMRG Workshop on Application Classification and Identification: 2007. p. 1–2.

Kim H, Claffy KC, Fomenkov M, Barman D, Faloutsos M, Lee K. Internet traffic classification demystified: myths, caveats, and the best practices. ACM; 2008. p. 11.

Kim J, Kim J, Thu HLT, Kim H. Long short term memory recurrent neural network classifier for intrusion detection. IEEE; 2016. p. 1–5.

Klaine PV, Imran MA, Onireti O, Souza RD. A survey of machine learning techniques applied to self organizing cellular networks. IEEE Commun Surv Tutor. 2017; PP(99):1.

Kogeda OP, Agbinya JI, Omlin CW. A probabilistic approach to faults prediction in cellular networks. IEEE; 2006. p. 130.

Kogeda P, Agbinya J. Prediction of faults in cellular networks using bayesian network model. UTS ePress; 2006.

Kohonen T. Self-organized formation of topologically correct feature maps. Biol Cybern. 1982; 43(1):59–69.

Kohonen T, Hynninen J, Kangas J, Laaksonen J. SOM_PAK: The self-organizing map program package. Report A31, Helsinki University of Technology, Laboratory of Computer and Information Science. 1996.

Kotz D, Henderson T, Abyzov I, Yeo J. CRAWDAD dataset dartmouth/campus (v. 2009-09-09). 2009. https://crawdad.org/dartmouth/campus/20090909 . Accessed 28 Dec 2017.

Kruegel C, Toth T. Using decision trees to improve signature-based intrusion detection. In: Recent Advances in Intrusion Detection. Springer: 2003. p. 173–91.

Kumano Y, Ata S, Nakamura N, Nakahira Y, Oka I. Towards real-time processing for application identification of encrypted traffic. In: International Conference on Computing, Networking and Communications (ICNC): 2014. p. 136–40.

Kumar S, Miikkulainen R. Dual reinforcement q-routing: An on-line adaptive routing algorithm. In: Proceedings of the artificial neural networks in engineering Conference.1997. p. 231–8.

Kumar Y, Farooq H, Imran A. Fault prediction and reliability analysis in a real cellular network. In: Wireless Communications and Mobile Computing Conference (IWCMC), 2017 13th International. IEEE: 2017. p. 1090–1095.

Labs ML. Kdd cup 1998 data. 1998. https://kdd.ics.uci.edu/databases/kddcup98/kddcup98.html . Accessed 28 Dec 2017.

Labs ML. Kdd cup 1999 data. 1999. http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html . Accessed 28 Dec 2017.

Lagoudakis MG, Parr R. Model-free least-squares policy iteration. In: Advances in neural information processing systems. 2002. p. 1547–54.

Lal TN, Chapelle O, Weston J, Elisseeff A. Embedded methods. In: Feature extraction. Springer: 2006. p. 137–65.

Lapedes AS, Farber RM. How neural nets work. 1987.

Laplace PS. Théorie analytique des probabilités. Paris: Courcier; 1812.

Lawrence Berkeley National Laboratory and ICSI. LBNL/ICSI Enterprise Tracing Project. 2005. http://www.icir.org/enterprise-tracing/ . Accessed 01 Aug 2017.

Le Cun Y. Learning Process in an Asymmetric Threshold Network. Berlin, Heidelberg: Springer Berlin Heidelberg; 1986, pp. 233–40.

Lee SJ, Hou CL. A neural-fuzzy system for congestion control in atm networks. IEEE Trans Syst Man Cybern Part B (Cybern). 2000; 30(1):2–9.

Leela-Amornsin L, Esaki H. Heuristic congestion control for message deletion in delay tolerant network. In: Proceedings of the Third Conference on Smart Spaces and Next Generation Wired, and 10th International Conference on Wireless Networking, Springer-Verlag, Berlin, Heidelberg, ruSMART/NEW2AN’10: 2010. p. 287–98.

Legendre A. Nouvelles méthodes pour la détermination des orbites des comètes. Paris: F. Didot; 1805.

Lemaître G, Nogueira F, Aridas CK. Imbalanced-learn: A python toolbox to tackle the curse of imbalanced datasets in machine learning. J Mach Learn Res. 2017; 18(17):1–5.

Leonardi E, Mellia M, Horvath A, Muscariello L, Niccolini S, Rossi D, Young K. Building a cooperative p2p-tv application over a wise network: the approach of the european fp-7 strep napa-wine. IEEE Commun Mag. 2008; 46(4):20–2.

Li F, Sun J, Zukerman M, Liu Z, Xu Q, Chan S, Chen G, Ko KT. A comparative simulation study of tcp/aqm systems for evaluating the potential of neuron-based aqm schemes. J Netw Comput Appl. 2014; 41(Supplement C):274–99.

Li W, Moore AW. A machine learning approach for efficient traffic classification. In: Modeling, Analysis, and Simulation of Computer and Telecommunication Systems, 2007. MASCOTS’07. 15th International, Symposium on. IEEE: 2007. p. 310–7.

Li W, Zhou F, Meleis W, Chowdhury K. Learning-based and data-driven tcp design for memory-constrained iot. In: 2016 International Conference on Distributed Computing in Sensor Systems (DCOSS).2016a. p. 199–205.

Li Y, Guo L. An active learning based tcm-knn algorithm for supervised network intrusion detection. Comput Secur. 2007; 26(7):459–67.

Li Y, Ma R, Jiao R. A hybrid malicious code detection method based on deep learning. Int J Secur Appl. 2015;9(5).

Li Y, Liu H, Yang W, Hu D, Xu W. Inter-data-center network traffic prediction with elephant flows: IEEE; 2016b, pp. 206–13.

Lin LJ. Reinforcement learning for robots using neural networks. PhD thesis. Pittsburgh, PA, USA: Carnegie Mellon University; 1992. UMI Order No. GAX93-22750.

Lin SC, Akyildiz IF, Wang P, Luo M. Qos-aware adaptive routing in multi-layer hierarchical software defined networks: a reinforcement learning approach. In: Services Computing (SCC) 2016, IEEE International Conference on. IEEE: 2016. p. 25–33.

Lin Z, van der Schaar M. Autonomic and distributed joint routing and power control for delay-sensitive applications in multi-hop wireless networks. IEEE Trans Wirel Commun. 2011; 10(1):102–13.

Lincoln Laboratory MIT. DARPA Intrusion Detection Evaluation. 1999. https://www.ll.mit.edu/ideval/data/1999data.html . Accessed 01 Aug 2017.

Littlestone N, Warmuth MK. The weighted majority algorithm. In: 30th Annual Symposium on Foundations of Computer Science.1989. p. 256–61.

Littman M, Boyan J. A distributed reinforcement learning scheme for network routing. In: Proceedings of the international workshop on applications of neural networks to telecommunications. Psychology Press: 1993. p. 45–51.

Liu D, Zhang Y, Zhang H. A self-learning call admission control scheme for cdma cellular networks. IEEE Trans Neural Netw. 2005; 16(5):1219–28.

Liu J, Matta I, Crovella M. End-to-end inference of loss nature in a hybrid wired/wireless environment.2003.

Liu Y, Li W, Li YC. Network traffic classification using k-means clustering: IEEE; 2007. pp. 360–5.

Liu YC, Douligeris C. Static vs. adaptive feedback congestion controller for atm networks. In: Global Telecommunications Conference, 1995. GLOBECOM ’95., vol 1. IEEE: 1995. p. 291–5.

Lu X, Wang H, Zhou R, Ge B. Using hessian locally linear embedding for autonomic failure prediction. In: Nature & Biologically Inspired Computing, 2009. NaBIC 2009. World Congress on. IEEE: 2009. p. 772–6.

Ma J, Levchenko K, Kreibich C, Savage S, Voelker GM. Unexpected means of protocol inference. In: Proceedings of the 6th ACM SIGCOMM conference on Internet measurement.2006. p. 313–26.

Machado VA, Silva CN, Oliveira RS, Melo AM, Silva M, Francês CR, Costa JC, Vijaykumar NL, Hirata CM. A new proposal to provide estimation of qos and qoe over wimax networks: An approach based on computational intelligence and discrete-event simulation. IEEE; 2011, pp. 1–6.

Machine Learning Group, University of Waikato. WEKA. 2017. http://www.cs.waikato.ac.nz/ml/weka/ . Accessed 01 Aug 2017.

Macleish KJ. Mapping the integration of artificial intelligence into telecommunications. IEEE J Sel Areas Commun. 1988; 6(5):892–8.

MacQueen J. Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics, University of California Press, Berkeley, 1: 1967. p. 281–97.

Mahmoud Q. Cognitive Networks: Towards Self-Aware Networks.Wiley-Interscience; 2007.

Malware-Traffic-Analysis.net. A source for pcap files and malware samples. 2017. http://www.malware-traffic-analysis.net . Accessed 15 Dec 2017.

Manzoor J, Drago I, Sadre R. The curious case of parallel connections in http/2. In: 12th International Conference on Network and Service Management (CNSM).2016. p. 174–80.

Mao H, Alizadeh M, Menache I, Kandula S. Resource management with deep reinforcement learning. In: HotNets.2016. p. 50–6.

Marbach P, Mihatsch O, Tsitsiklis JN. Call admission control and routing in integrated services networks using neuro-dynamic programming. IEEE J Sel Areas Commun. 2000; 18(2):197–208.

Markov AA. An example of statistical investigation in the text of eugene onegin illustrating coupling of tests in chains. In: Proceedings of the Royal Academy of Sciences of St. Petersburg, St. Petersburg, Russia, vol 1.1913. p. 153.

Maron ME. Automatic indexing: An experimental inquiry. J ACM. 1961; 8(3):404–17.

Masoumzadeh SS, Taghizadeh G, Meshgi K, Shiry S. Deep blue: A fuzzy q-learning enhanced active queue management scheme. In: 2009 International Conference on Adaptive and Intelligent Systems: 2009. p. 43–8.

Mathis M, Mahdavi J, Floyd S, Romanow A. Tcp selective acknowledgment options. RFC 2018, Internet Engineering Task Force.1996. https://tools.ietf.org/html/rfc2018 .

Mathis M, Semke J, Mahdavi J, Ott T. The macroscopic behavior of the tcp congestion avoidance algorithm. SIGCOMM Comput Commun Rev. 1997; 27(3):67–82.

Maxion RA. Anomaly detection for diagnosis. In: Fault-Tolerant Computing, 1990. FTCS-20. Digest of Papers, 20th International Symposium. IEEE: 1990. p. 20–7.

McCulloch WS, Pitts W. A logical calculus of the ideas immanent in nervous activity. Bull Math Biophys. 1943; 5(4):115–33.

McGregor A, Hall M, Lorier P, Brunskill J. Flow clustering using machine learning techniques. Passive and Active Netw Meas. 2004;205–14.

McKenney PE. Stochastic fairness queueing. In: INFOCOM ’90, Ninth Annual Joint Conference of the IEEE Computer and Communication Societies. The Multiple Facets of Integration. Proceedings. IEEE: 1990. p. 733–40. vol.2.

Messenger R, Mandell L. A modal search technique for predictive nominal scale multivariate analysis. J Am Stat Assoc. 1972; 67(340):768–72.

Mestres A, Rodriguez-Natal A, Carner J, Barlet-Ros P, Alarcón E, Solé M, Muntés-Mulero V, Meyer D, Barkai S, Hibbett MJ, et al. Knowledge-defined networking. ACM SIGCOMM Comput Commun Rev. 2017; 47(3):2–10.

Metasploit L. The metasploit framework. 2007. http://www.metasploit.com . Accessed 28 Dec 2017.

Meyer D. Machine Intelligence and Networks. 2016. https://www.youtube.com/watch?v=XORRw6Sqi9Y . Accessed 28 Dec 2017.

Mezzavilla M, Quer G, Zorzi M. On the effects of cognitive mobility prediction in wireless multi-hop ad hoc networks. In: 2014 IEEE International Conference on Communications (ICC): 2014. p. 1638–44.

Microsoft Corporation. Skype. 2017. https://www.skype.com/ . Accessed 01 Aug 2017.

Mignanti S, Di Giorgio A, Suraci V. A model based rl admission control algorithm for next generation networks. In: Networks, 2009. ICN’09. Eighth International Conference on. IEEE: 2009. p. 191–6.

Mijumbi R, Gorricho JL, Serrat J, Claeys M, De Turck F, Latré S. Design and evaluation of learning algorithms for dynamic resource management in virtual networks. IEEE; 2014, pp. 1–9.

Mijumbi R, Hasija S, Davy S, Davy A, Jennings B, Boutaba R. A connectionist approach to dynamic resource management for virtualised network functions. In: Network and Service Management (CNSM) 2016 12th International Conference on. IEEE: 2016. p. 1–9.

Miller ST, Busby-Earle C. Multi-perspective machine learning a classifier ensemble method for intrusion detection. In: Proceedings of the 2017 International Conference on Machine Learning and Soft Computing. ACM: 2017. p. 7–12.

Minsky M, Papert S. Perceptrons: An Introduction to Computational Geometry. Cambridge: MIT Press; 1972.

Mirza M, Sommers J, Barford P, Zhu X. A machine learning approach to tcp throughput prediction. IEEE/ACM Trans Netw. 2010; 18(4):1026–39.

Mitchell TM. Machine Learning, 1st ed. New York: McGraw-Hill, Inc.; 1997.

Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fidjeland AK, Ostrovski G, Petersen S, Beattie C, Sadik A, Antonoglou I, King H, Kumaran D, Wierstra D, Legg S, Hassabis D. Human-level control through deep reinforcement learning. Nature. 2015; 518(7540):529–33.

Montana DJ, Davis L. Training feedforward neural networks using genetic algorithms. In: Proceedings of the 11th International Joint Conference on Artificial Intelligence - Volume 1, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, IJCAI’89.1989. p. 762–7.

Moore AW, Papagiannaki K. Toward the accurate identification of network applications. In: PAM. Springer: 2005. p. 41–54.

Moore AW, Zuev D. Internet traffic classification using bayesian analysis techniques. In: ACM SIGMETRICS Performance Evaluation Review, ACM, vol 33.2005. p. 50–60.

Moradi M, Zulkernine M. A neural network based system for intrusion detection and classification of attacks. In: Proceedings of the IEEE International Conference on Advances in Intelligent Systems-Theory and Applications.2004. p. 15–8.

Morgan JN, Sonquist JA. Problems in the analysis of survey data, and a proposal. J Am Stat Assoc. 1963; 58(302):415–34.

Moustapha AI, Selmic RR. Wireless sensor network modeling using modified recurrent neural networks: Application to fault detection. IEEE Trans Instrum Meas. 2008; 57(5):981–8.

Mukkamala S, Janoski G, Sung A. Intrusion detection using neural networks and support vector machines.2002, pp. 1702–7.

Mukkamala S, Sung AH, Abraham A. Intrusion detection using ensemble of soft computing paradigms. In: Intelligent systems design and applications. Springer: 2003. p. 239–48.

Muniyandi AP, Rajeswari R, Rajaram R. Network anomaly detection by cascading k-means clustering and c4. 5 decision tree algorithm. Procedia Eng. 2012; 30:174–82.

Mushtaq MS, Augustin B, Mellouk A. Empirical study based on machine learning approach to assess the qos/qoe correlation. In: Networks and Optical Communications (NOC), 2012 17th European Conference on. IEEE: 2012. p. 1–7.

Nahm K, Helmy A, Jay Kuo CC. Tcp over multihop 802.11 networks: Issues and performance enhancement. In: Proceedings of the 6th ACM International Symposium on Mobile Ad Hoc Networking and Computing, ACM, New York, NY, USA, MobiHoc ’05: 2005. p. 277–87.

Narendra KS, Thathachar MAL. Learning automata - a survey. IEEE Trans Syst Man Cybern. 1974; SMC-4(4):323–34.

Netflix Inc. Netflix. 2017. https://www.netflix.com/ . Accessed 01 Aug 2017.

Networks and Mobile Systems Group. Resilient overlay networks RON. 2017. http://nms.csail.mit.edu/ron/ . Accessed 27 Dec 2017.

Ng AY, Jordan MI. On discriminative vs. generative classifiers: A comparison of logistic regression and naive bayes. In: Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic, MIT Press, Cambridge, MA, USA, NIPS’01.2001. p. 841–8. http://dl.acm.org/citation.cfm?id=2980539.2980648 .

Ng B, Hayes M, Seah WKG. Developing a traffic classification platform for enterprise networks with sdn: Experiences & lessons learned. In: IFIP Networking Conference.2015. p. 1–9.

Nguyen T, Armitage G. Synthetic sub-flow pairs for timely and stable ip traffic identification. In: Proc. Australian Telecommunication Networks and Application Conference.2006a.

Nguyen TT, Armitage G. Training on multiple sub-flows to optimise the use of machine learning classifiers in real-world ip networks. In: Local Computer Networks, Proceedings 2006 31st IEEE Conference on. IEEE: 2006b. p. 369–76.

Nguyen TT, Armitage G, Branch P, Zander S. Timely and continuous machine-learning-based classification for interactive ip traffic. IEEE/ACM Trans Netw (TON). 2012; 20(6):1880–94.

Nguyen TTT, Armitage G. Clustering to assist supervised machine learning for real-time ip traffic classification. In: IEEE International Conference on Communications.2008a. p. 5857–62.

Nguyen TTT, Armitage G. A survey of techniques for internet traffic classification using machine learning. IEEE Commun Surv Tutor. 2008b; 10(4):56–76.

Nichols K, Jacobson V. Controlling queue delay. Queue. 2012; 10(5):20:20–20:34.

NMapWin. Nmap security scanner. 2016. http://nmapwin.sourceforge.net/ . Accessed 1 Mar 2017.

NVIDIA. Graphics Processing Unit (GPU). 2017. http://www.nvidia.com/object/gpu.html . Accessed 01 Aug 2017.

Padhye J, Firoiu V, Towsley D, Kurose J. Modeling tcp throughput: A simple model and its empirical validation. In: Proceedings of the ACM SIGCOMM ’98 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication, ACM, New York, SIGCOMM ’98.1998. p. 303–14.

Pan ZS, Chen SC, Hu GB, Zhang DQ. Hybrid neural network and c4.5 for misuse detection.2003, pp. 2463–7.

Panda M, Abraham A, Patra MR. A hybrid intelligent approach for network intrusion detection. Procedia Eng. 2012; 30:1–9.

Papernot N, Goodfellow I, Sheatsley R, Feinman R, McDaniel P. cleverhans v1.0.0: an adversarial machine learning library. 2016. arXiv preprint arXiv:1610.00768.

Park J, Tyan HR, Kuo CCJ. Internet traffic classification for scalable qos provision. In: Multimedia and Expo, 2006 IEEE International Conference on. IEEE: 2006. p. 1221–4.

Parkour M. Pcap traffic patterns. 2013. http://www.mediafire.com/?a49l965nlayad . Accessed 1 Mar 2017.

Parzen E. On estimation of a probability density function and mode. Ann Math Statist. 1962; 33(3):1065–76.

Paxson V. Bro: A system for detecting network intruders in real-time. Comput Netw. 1999; 31(23-24):2435–63.

Pcap-Analysis. Malware. 2017. http://www.pcapanalysis.com . Accessed 1 Mar 2017.

Pearl J. Bayesian networks: A model of self-activated memory for evidential reasoning. Irvine: University of California; 1985, pp. 329–34.

Pearl J. Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann; 2014.

Peddabachigari S, Abraham A, Grosan C, Thomas J. Modeling intrusion detection system using hybrid intelligent systems. J Netw Comput Appl. 2007; 30(1):114–32.

Pellegrini A, Di Sanzo P, Avresky DR. A machine learning-based framework for building application failure prediction models. In: Parallel and Distributed Processing Symposium Workshop (IPDPSW), 2015 IEEE International. IEEE: 2015. p. 1072–81.

Perdisci R, Ariu D, Fogla P, Giacinto G, Lee W. Mcpad: A multiple classifier system for accurate payload-based anomaly detection. Comput Netw. 2009; 53(6):864–81.

Petrangeli S, Claeys M, Latré S, Famaey J, De Turck F. A multi-agent q-learning-based framework for achieving fairness in http adaptive streaming. IEEE; 2014, pp. 1–9.

Pfahringer B. Winning the kdd99 classification cup: bagged boosting. ACM SIGKDD Explor Newsl. 2000; 1(2):65–6.

Piamrat K, Ksentini A, Viho C, Bonnin JM. Qoe-aware admission control for multimedia applications in ieee 802.11 wireless networks. In: Vehicular Technology Conference, 2008. VTC 2008-Fall. IEEE 68th. IEEE: 2008. p. 1–5.

Pietrabissa A. Admission control in umts networks based on approximate dynamic programming. Eur J Control. 2008; 14(1):62–75.

Pietrabissa A, Priscoli FD, Di Giorgio A, Giuseppi A, Panfili M, Suraci V. An approximate dynamic programming approach to resource management in multi-cloud scenarios. Int J Control. 2017; 90(3):492–503.

Pinson MH, Wolf S. A new standardized method for objectively measuring video quality. IEEE Trans broadcast. 2004; 50(3):312–22.

Portnoy L, Eskin E, Stolfo S. Intrusion detection with unlabeled data using clustering. In: Proceedings of ACM CSS Workshop on Data Mining Applied to Security (DMSA-2001). Citeseer: 2001.

Portoles-Comeras M, Requena-Esteso M, Mangues-Bafalluy J, Cardenete-Suriol M. Extreme: Combining the ease of management of multi-user experimental facilities and the flexibility of proof of concept testbeds. In: Testbeds and Research Infrastructures for the Development of Networks and Communities, 2006. TRIDENTCOM 2006 2nd International Conference on. IEEE: 2006. p. 10.

Poupart P, Chen Z, Jaini P, Fung F, Susanto H, Geng Y, Chen L, Chen K, Jin H. Online flow size prediction for improved network routing. IEEE; 2016, pp. 1–6.

Prechelt L. Early stopping - but when? Neural Netw: Tricks of the trade. 1998;553.

Prevost JJ, Nagothu K, Kelley B, Jamshidi M. Prediction of cloud data center networks loads using stochastic and neural models. IEEE: 2011. p. 276–281.

Proact.co.uk. ISS Internet Scanner. 2017. http://www.tech.proact.co.uk/iss/iss_system_scanner.htm . Accessed 28 Dec 2017.

Puget JF. What is machine learning? (IT best kept secret is optimization). 2016. https://www.ibm.com/developerworks/community/blogs/jfp/entry/What_Is_Machine_Learning . Accessed 01 Aug 2017.

Qader K, Adda M. Fault classification system for computer networks using fuzzy probabilistic neural network classifier (fpnnc). In: International Conference on Engineering Applications of Neural Networks. Springer: 2014. p. 217–26.

Quer G, Meenakshisundaram H, Tamma B, Manoj BS, Rao R, Zorzi M. Cognitive network inference through bayesian network analysis.2010, pp. 1–6.

Quer G, Baldo N, Zorzi M. Cognitive call admission control for voip over ieee 802.11 using bayesian networks. In: Global Telecommunications Conference (GLOBECOM 2011), 2011 IEEE. IEEE: 2011. p. 1–6.

Quinlan J. Discovering rules from large collections of examples: A case study. Edinburgh: Expert Systems in the Micro Electronic Age, Edinburgh Press; 1979.

Quinlan JR. Simplifying decision trees. Int J Man-Mach Stud. 1987; 27(3):221–34.

Quinlan JR. Learning with continuous classes. In: Proceedings of Australian Joint Conference on Artificial Intelligence, World Scientific.1992. p. 343–8.

Quinlan JR. C4.5: Programs for Machine Learning. San Francisco: Morgan Kaufmann Publishers Inc.; 1993.

Quinlan JR. Bagging, boosting, and c4.5. In: Proceedings of the Thirteenth National Conference on Artificial Intelligence (AAAI), vol 1; 1996. p. 725–30.

Raina R, Battle A, Lee H, Packer B, Ng AY. Self-taught learning: transfer learning from unlabeled data. In: Proceedings of the 24th international conference on Machine learning. ACM: 2007. p. 759–66.

Ramana BV, Murthy CSR. Learning-tcp: A novel learning automata based congestion window updating mechanism for ad hoc wireless networks. In: Proceedings of the 12th International Conference on High Performance Computing, Springer-Verlag, Berlin, Heidelberg, HiPC’05.2005. p. 454–464.

Ramana BV, Manoj BS, Murthy CSR. Learning-tcp: a novel learning automata based reliable transport protocol for ad hoc wireless networks. In: 2nd International Conference on Broadband Networks 2005.2005. p. 484–493. Vol. 1.

Ranzato M, Poultney C, Chopra S, LeCun Y. Efficient learning of sparse representations with an energy-based model. In: Proceedings of the 19th International Conference on Neural Information Processing Systems, MIT Press, Cambridge, MA, USA, NIPS’06: 2006. p. 1137–44.

Rao S. Operational fault detection in cellular wireless base-stations. IEEE Trans Netw Serv Manag. 2006; 3(2):1–11.

Reichl P, Egger S, Schatz R, D’Alconzo A. The logarithmic nature of qoe and the role of the weber-fechner law in qoe assessment. IEEE; 2010, pp. 1–5.

Riiser H, Endestad T, Vigmostad P, Griwodz C, Halvorsen P. DATASET: HSDPA-bandwidth logs for mobile HTTP streaming scenarios. 2011. http://home.ifi.uio.no/paalh/dataset/hsdpa-tcp-logs/ . Accessed 28 Dec 2017.

Riiser H, Endestad T, Vigmostad P, Griwodz C, Halvorsen P. Video streaming using a location-based bandwidth-lookup service for bitrate planning. ACM Trans Multimedia Comput Commun Appl. 2012; 8(3):24:1–24:19. https://doi.org/10.1145/2240136.2240137 , http://doi.acm.org/10.1145/2240136.2240137 .

Rix AW, Beerends JG, Hollier MP, Hekstra AP. Perceptual evaluation of speech quality (pesq)-a new method for speech quality assessment of telephone networks and codecs. In: Acoustics, Speech, and Signal Processing, 2001. Proceedings.(ICASSP’01). 2001 IEEE International Conference on, IEEE, vol 2.2001. p. 749–52.

Rosenblatt F. The perceptron, a perceiving and recognizing automaton. Report No. 85-460-1, Project PARA. Cornell Aeronautical Laboratory; 1957.

Rosenblatt M. Remarks on some nonparametric estimates of a density function. Ann Math Statist. 1956; 27(3):832–837.

Ross DA, Lim J, Lin RS, Yang MH. Incremental learning for robust visual tracking. Int J Comput Vision. 2008; 77(1-3):125–141.

Roughan M, Sen S, Spatscheck O, Duffield N. Class-of-service mapping for qos: a statistical signature-based approach to ip traffic classification. In: Proceedings of the 4th ACM SIGCOMM conference on Internet measurement. ACM: 2004a. p. 135–148.

Roughan M, Zhang Y, Ge Z, Greenberg A. Abilene network; 2004b. http://www.maths.adelaide.edu.au/matthew.roughan/data/Abilene.tar.gz . Accessed 28 Dec 2017.

Rozhnova N, Fdida S. An effective hop-by-hop interest shaping mechanism for ccn communications. In: 2012 Proceedings IEEE INFOCOM Workshops: 2012. p. 322–327.

Ruiz M, Fresi F, Vela AP, Meloni G, Sambo N, Cugini F, Poti L, Velasco L, Castoldi P. Service-triggered failure identification/localization through monitoring of multiple parameters. In: ECOC 2016; 42nd European Conference on Optical Communication. VDE: 2016. p. 1–3.

Rumelhart DE, Hinton GE, Williams RJ. Parallel distributed processing: Explorations in the microstructure of cognition, vol. 1 In: Rumelhart DE, McClelland JL, PDP Research Group C, editors. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol 1. Cambridge, MA, USA, chap Learning Internal Representations by Error Propagation: MIT Press: 1986. p. 318–62.

Rummery GA, Niranjan M. On-line Q-learning using connectionist systems. CUED/F-INFENG/TR 166, Cambridge University Engineering Department. 1994.

Rüping S. mySVM. 2004. http://www-ai.cs.uni-dortmund.de/SOFTWARE/MYSVM/ . Accessed 28 Dec 2017.

Russell S, Norvig P. Artificial Intelligence: A Modern Approach, 3rd ed. Upper Saddle River: Prentice Hall Press; 2009.

Sabhnani M, Serpen G. Why machine learning algorithms fail in misuse detection on kdd intrusion detection data set. Intell data anal. 2004; 8(4):403–15.

Salakhutdinov R, Hinton G. Deep boltzmann machines. In: Artificial Intelligence and Statistics: 2009. p. 448–55.

Salama M, Eid H, Ramadan R, Darwish A, Hassanien A. Hybrid intelligent intrusion detection scheme. Soft Comput Ind Appl. 2011;293–303.

Samuel AL. Some studies in machine learning using the game of checkers. IBM J Res Dev. 1959; 3(3):210–29.

Sangkatsanee P, Wattanapongsakorn N, Charnsripinyo C. Practical real-time intrusion detection using machine learning approaches. Comput Commun. 2011; 34(18):2227–35.

Schapire RE. The strength of weak learnability. Mach Learn. 1990; 5(2):197–227.

Schatzmann D, Mühlbauer W, Spyropoulos T, Dimitropoulos X. Digging into https: Flow-based classification of webmail traffic. In: Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement: 2010. p. 322–27.

Schölkopf B, Platt JC, Shawe-Taylor J, Smola AJ, Williamson RC. Estimating the support of a high-dimensional distribution. Neural comput. 2001; 13(7):1443–71.

Seligman M, Fall K, Mundur P. Alternative custodians for congestion control in delay tolerant networks. In: Proceedings of the 2006 SIGCOMM Workshop on Challenged Networks, CHANTS ’06.New York: ACM: 2006. p. 229–36.

Servin A, Kudenko D. Multi-agent reinforcement learning for intrusion detection: A case study and evaluation. In: German Conference on Multiagent System Technologies. Springer: 2008. p. 159–70.

Shaikh J, Fiedler M, Collange D. Quality of experience from user and network perspectives. Ann Telecommun. 2010; 65(1-2):47–57.

Shbair WM, Cholez T, Francois J, Chrisment I. A multi-level framework to identify https services. In: IEEE/IFIP Network Operations and Management Symposium (NOMS). 2016. p. 240–8.

Shi R, Zhang J, Chu W, Bao Q, Jin X, Gong C, Zhu Q, Yu C, Rosenberg S. Mdp and machine learning-based cost-optimization of dynamic resource allocation for network function virtualization. In: Serv Comput (SCC) 2015 IEEE International Conference on. IEEE: 2015. p. 65–73.

Shon T, Moon J. A hybrid machine learning approach to network anomaly detection. Inf Sci. 2007; 177(18):3799–821.

Silva AP, Obraczka K, Burleigh S, Hirata CM. Smart congestion control for delay- and disruption tolerant networks. In: 2016 13th Annual IEEE International Conference on Sensing, Communication, and Networking (SECON).2016. p. 1–9.

Smolensky P. Parallel distributed processing: Explorations in the microstructure of cognition, vol. 1 In: Rumelhart DE, McClelland JL, PDP Research Group C, editors. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1, MIT Press, Cambridge, MA, USA, chap Information Processing in Dynamical Systems: Foundations of Harmony Theory.1986. p. 194–281.

Snow A, Rastogi P, Weckman G. Assessing dependability of wireless networks using neural networks. In: Military Communications Conference, 2005. MILCOM 2005. IEEE. IEEE: 2005. p. 2809–15.

Sommer R, Paxson V. Outside the closed world: On using machine learning for network intrusion detection. In: Security and Privacy (SP), 2010 IEEE Symposium on, IEEE: 2010. p. 305–16.

Sondik EJ. The optimal control of partially observable markov decision processes. PhD thesis. California: Stanford University; 1971.

Soysal M, Schmidt EG. Machine learning algorithms for accurate flow-based network traffic classification: Evaluation and comparison. Perform Eval. 2010; 67(6):451–67.

Sprint. IP network performance. 2017. https://www.sprint.net/performance/ . Accessed 28 Dec 2017.

Srihari SN, Kuebert EJ. Integration of hand-written address interpretation technology into the united states postal service remote computer reader system. In: Proceedings of the 4th International Conference on Document Analysis and Recognition, ICDAR ’97. Washington: IEEE Computer Society: 1997. p. 892–6.

Stanfill C, Waltz D. Toward memory-based reasoning. Commun ACM. 1986; 29(12):1213–28.

Stein G, Chen B, Wu AS, Hua KA. Decision tree classifier for network intrusion detection with ga-based feature selection. In: Proceedings of the 43rd annual Southeast regional conference-Volume 2. ACM: 2005. p. 136–41.

Steinhaus H. Sur la division des corps matériels en parties. Bull Acad Polon Sci. 1956; 1:801–4.

Stigler SM. Gauss and the invention of least squares. Ann Statist. 1981; 9(3):465–74.

Stone P. Tpot-rl applied to network routing. In: ICML.2000. p. 935–42.

Stone P, Veloso M. Team-partitioned, opaque-transition reinforcement learning. In: Proceedings of the third annual conference on Autonomous Agents. ACM: 1999. p. 206–12.

Stratonovich RL. Conditional markov processes. Theory Probab Appl. 1960; 5(2):156–78.

Sun J, Zukerman M. An adaptive neuron aqm for a stable internet. In: Proceedings of the 6th International IFIP-TC6 Conference on Ad Hoc and Sensor Networks, Wireless Networks, Next Generation Internet, Springer-Verlag, Berlin, Heidelberg, NETWORKING’07.2007. p. 844–54.

Sun J, Chan S, Ko KT, Chen G, Zukerman M. Neuron pid: A robust aqm scheme. In: Proceedings of the Australian Telecommunication Networks and Applications Conference (ATNAC) 2006.2006. p. 259–62.

Sun J, Chan S, Zukerman M. Iapi: An intelligent adaptive pi active queue management scheme. Comput Commun. 2012; 35(18):2281–93.

Sun R, Tatsumi S, Zhao G. Q-map: A novel multicast routing method in wireless ad hoc networks with multiagent reinforcement learning. In: TENCON’02. Proceedings. 2002 IEEE Region 10 Conference on Computers, Communications, Control and Power Engineering, vol 1. IEEE: 2002. p. 667–670.

Sun R, Yang B, Peng L, Chen Z, Zhang L, Jing S. Traffic classification using probabilistic neural networks. In: Natural computation (ICNC), 2010 sixth international conference on, vol. 4. IEEE: 2010. p. 1914–9.

Sun Y, Yin X, Jiang J, Sekar V, Lin F, Wang N, Liu T, Sinopoli B. Cs2p: Improving video bitrate selection and adaptation with data-driven throughput prediction. In: Proceedings of the 2016 conference on ACM SIGCOMM 2016 Conference. ACM: 2016. p. 272–85.

Sutton RS. Learning to predict by the methods of temporal differences. Mach learn. 1988; 3(1):9–44.

Sutton RS, Barto AG. A temporal-difference model of classical conditioning. In: Proceedings of the ninth annual conference of the cognitive science society. Seattle, WA: 1987. p. 355–78.

Sutton RS, Barto AG. Reinforcement Learning: An Introduction, 2nd ed. MIT Press; 2016.

Tang TA, Mhamdi L, McLernon D, Zaidi SAR, Ghogho M. Deep learning approach for network intrusion detection in software defined networking. In: Wireless Networks and Mobile Communications (WINCOM), 2016 International Conference on. IEEE: 2016. p. 258–263.

Tarraf AA, Habib IW, Saadawi TN. Congestion control mechanism for atm networks using neural networks. In: Communications 1995. ICC ’95 Seattle, ’Gateway to Globalization’ 1995 IEEE International Conference on, vol 1.1995. p. 206–10.

Tavallaee M, Bagheri E, Lu W, Ghorbani AA. A detailed analysis of the kdd cup 99 data set. IEEE; 2009, pp. 1–6.

Telecommunication Networks Group - Politecnico di Torino. Skype testbed traces, TSTAT - TCP Statistic and Analysis Tool. 2008. http://tstat.tlc.polito.it/traces-skype.shtml . Accessed 01 Aug 2017.

Tesauro G. Practical issues in temporal difference learning. Mach Learn. 1992; 8(3):257–77.

Tesauro G. Reinforcement learning in autonomic computing: A manifesto and case studies. IEEE Internet Comput. 2007; 11(1):22–30.

Tesauro G, et al. Online resource allocation using decompositional reinforcement learning.2005. pp. 886–91.

Testolin A, Zanforlin M, De Grazia MDF, Munaretto D, Zanella A, Zorzi M, Zorzi M. A machine learning approach to qoe-based video admission control and resource allocation in wireless systems. In: Ad Hoc Networking Workshop (MED-HOC-NET), 2014 13th Annual, Mediterranean. IEEE: 2014. p. 31–38.

TFreak. Smurf tool. 2003. www.phreak.org/archives/exploits/denial/smurf.c .

Tibshirani R. Regression shrinkage and selection via the lasso. J R Stat Soc Ser B (Methodol). 1996; 58(1):267–288.

Tong H, Brown TX. Adaptive call admission control under quality of service constraints: a reinforcement learning solution. IEEE J Sel Areas Commun. 2000; 18(2):209–21.

Tsai CF, Hsu YF, Lin CY, Lin WY. Intrusion detection by machine learning: A review. Expert Systems with Applications. 2009; 36(10):11994–12000.

Tsetlin M. Automaton Theory and Modeling of Biological Systems. Academic Press; 1973.

Turing AM. Computing machinery and intelligence. Mind. 1950; 59(236):433–60.

UCI KDD Archive. 2005. https://kdd.ics.uci.edu/ . Accessed 01 Aug 2017.

University of California San Diego Supercomputer Center. CAIDA: Center for Applied Internet Data Analysis. 2017. http://www.caida.org . Accessed 01 Aug 2017.

Vassis D, Kampouraki A, Belsis P, Skourlas C. Admission control of video sessions over ad hoc networks using neural classifiers. IEEE; 2014, pp. 1015–20.

Vega MT, Mocanu DC, Liotta A. Unsupervised deep learning for real-time assessment of video streaming services. Multimedia Tools Appl. 2017;1–25.

Vengerov D. A reinforcement learning approach to dynamic resource allocation. Eng Appl Artif Intell. 2007; 20(3):383–90.

Viterbi A. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Trans Inf Theory. 1967; 13(2):260–9.

Wagner C, François J, Engel T, et al. Machine learning approach for ip-flow record anomaly detection. In: International Conference on Research in Networking. Springer: 2011. p. 28–39.

WAND Network Research Group. WITS: Waikato Internet Traffic Storage. 2017. https://wand.net.nz/wits . Accessed 01 Aug 2017.

Wang J, Qiu Y. A new call admission control strategy for lte femtocell networks. In: 2nd international conference on advances in computer science and engineering.2013.

Wang K, Stolfo SJ. Anomalous payload-based network intrusion detection. In: RAID, vol 4. Springer: 2004. p. 203–22.

Wang M, Cui Y, Wang X, Xiao S, Jiang J. Machine learning for networking: Workflow, advances and opportunities. IEEE Netw. 2018a; 32(2):92–9.

Wang P, Wang T. Adaptive routing for sensor networks using reinforcement learning. In: Computer and Information Technology, 2006. CIT’06. The Sixth IEEE International Conference on. IEEE: 2006. p. 219.

Wang P, Lin SC, Luo M. A framework for qos-aware traffic classification using semi-supervised machine learning in sdns. In: Services Computing (SCC), 2016 IEEE International Conference on. IEEE: 2016. p. 760–5.

Wang R, Valla M, Sanadidi MY, Ng BKF, Gerla M. Efficiency/friendliness tradeoffs in TCP westwood. In: Proceedings ISCC 2002 Seventh International Symposium on Computers and Communications.2002. p. 304–11.

Wang R, Liu Y, Yang Y, Zhou X. Solving the app-level classification problem of p2p traffic via optimized support vector machines. In: Intelligent Systems Design and Applications, 2006. ISDA’06. Sixth International Conference on, IEEE, vol 2: 2006. p. 534–9.

Wang X, Zhang Q, Ren J, Xu S, Wang S, Yu S. Toward efficient parallel routing optimization for large-scale sdn networks using gpgpu. J Netw Comput Appl. 2018b.

Wang Y, Martonosi M, Peh LS. Predicting link quality using supervised learning in wireless sensor networks. ACM SIGMOBILE Mob Comput Commun Rev. 2007; 11(3):71–83.

Wang Y, Xiang Y, Yu S. Internet traffic classification using machine learning: a token-based approach. IEEE: 2011. p. 285–9.

Wang Z, Bovik AC, Sheikh HR, Simoncelli EP. Image quality assessment: from error visibility to structural similarity. IEEE trans image process. 2004; 13(4):600–12.

Wang Z, Zhang M, Wang D, Song C, Liu M, Li J, Lou L, Liu Z. Failure prediction using machine learning and time series in optical network. Optics Express. 2017; 25(16):18,553–18,565.

Watanabe A, Ishibashi K, Toyono T, Kimura T, Watanabe K, Matsuo Y, Shiomoto K. Workflow extraction for service operation using multiple unstructured trouble tickets.IEEE; 2016, pp. 652–8.

Watkins CJ. Models of delayed reinforcement learning. PhD thesis: Psychology Department, Cambridge University; 1989.

Werbos P. Beyond regression: New tools for prediction and analysis in the behavioral sciences. PhD thesis: Harvard University; 1975.

Wernecke KD. A coupling procedure for the discrimination of mixed data. Biometrics. 1992; 48(2):497–506.

WIDE Project. MAWI Working Group Traffic Archive. 2017. http://mawi.wide.ad.jp/mawi . Accessed 01 Aug 2017.

Williams M. Net tools 5. 2011. https://www.techworld.com/download/networking-tools/net-tools-5-3248881/ . Accessed 1 Mar 2017.

Williams N, Zander S, Armitage G. A preliminary performance comparison of five machine learning algorithms for practical ip traffic flow classification. ACM SIGCOMM Comput Commun Rev. 2006; 36(5):5–16.

Winstein K, Balakrishnan H. Tcp ex machina: Computer-generated congestion control. In: Proceedings of the ACM SIGCOMM 201 Conference on SIGCOMM, SIGCOMM ’13. New York: ACM: 2013. p. 123–34.

Witten IH. An adaptive optimal controller for discrete-time markov environments. Inf Control. 1977; 34(4):286–95.

Wolpert DH, Tumer K, Frank J. Using collective intelligence to route internet traffic. Adv neural inf process syst. 1999;952–60.

Wolski R. Dynamically forecasting network performance using the network weather service. Cluster Comput. 1998; 1(1):119–32.

Wu C, Meleis W. Fuzzy kanerva-based function approximation for reinforcement learning. In: Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS).2009. p. 1257–8.

Xia B, Wahab MH, Yang Y, Fan Z, Sooriyabandara M. Reinforcement learning based spectrum-aware routing in multi-hop cognitive radio networks. In: Cognitive Radio Oriented Wireless Networks and Communications, 2009. CROWNCOM’09. 4th International, Conference on. IEEE: 2009. p. 1–5.

Xu K, Tian Y, Ansari N. Tcp-jersey for wireless ip communications. IEEE J Sel Areas Commun. 2004; 22(4):747–56.

Xu L, Krzyzak A, Suen CY. Methods of combining multiple classifiers and their applications to handwriting recognition. IEEE Transactions on Systems. Man Cybern. 1992; 22(3):418–35.

Yan Q, Lei Q. A new active queue management algorithm based on self-adaptive fuzzy neural-network pid controller. In: 2011 International Conference on, Internet Technology and Applications: 2011. p. 1–4.

Yang P, Luo W, Xu L, Deogun J, Lu Y. Tcp congestion avoidance algorithm identification. In: 2011 31st International Conference on Distributed Computing Systems.2011. p. 310–21.

Yi C, Afanasyev A, Moiseenko I, Wang L, Zhang B, Zhang L. A case for stateful forwarding plane. Comput Commun. 2013; 36(7):779–791.

YouTube LLC. YouTube. 2017. https://www.youtube.com/ . Accessed 01 Aug 2017.

Yu E, Chen CR. Traffic prediction using neural networks. In: Proceedings of IEEE GLOBECOM. IEEE: 1993. p. 991–5.

Yu X, Qiao C, Liu Y. Tcp implementations and false time out detection in obs networks. In: IEEE INFOCOM 2004, vol 2.2004. p. 774–84.

Zalewski M, Stearns W. p0f. 2014. http://lcamtuf.coredump.cx/p0f . Accessed 1 Mar 2017.

Zander S, Nguyen T, Armitage G. Automated traffic classification and application identification using machine learning.IEEE; 2005, pp. 250–7.

Zanero S, Savaresi SM. Unsupervised learning techniques for an intrusion detection system: ACM; 2004, pp. 412–9.

Zhang C, Jiang J, Kamel M. Intrusion detection using hierarchical neural networks. Pattern Recogn Lett. 2005; 26(6):779–91.

Zhang J, Zulkernine M. Anomaly based network intrusion detection with unsupervised outlier detection. In: Communications, 2006. ICC’06. IEEE International Conference on. IEEE: 2006. p. 2388–93.

Zhang J, Chen C, Xiang Y, Zhou W, Xiang Y. Internet traffic classification by aggregating correlated naive bayes predictions. IEEE Trans Inf Forensic Secur. 2013; 8(1):5–15.

Zhang J, Chen X, Xiang Y, Zhou W, Wu J. Robust network traffic classification. IEEE/ACM Trans Netw (TON). 2015; 23(4):1257–70.

Zhani MF, Elbiaze H, Kamoun F. α _snfaqm: an active queue management mechanism using neurofuzzy prediction. In: 2007 12th IEEE Symposium on Computers and Communications.2007. p. 381–6.

Zhou C, Di D, Chen Q, Guo J. An adaptive aqm algorithm based on neuron reinforcement learning. In: 2009 IEEE International Conference on Control and Automation: 2009. p. 1342–6.

Zhu Y, Zhang G, Qiu J. Network traffic prediction based on particle swarm bp neural network. JNW. 2013; 8(11):2685–91.

Zineb AB, Ayadi M, Tabbane S. Cognitive radio networks management using an anfis approach with qos/qoe mapping scheme. IEEE; 2015, pp. 1–6.

Download references

Acknowledgments

We thank the anonymous reviewers for their insightful comments and suggestions that helped us improve the quality of the paper. This work is supported in part by the Royal Bank of Canada, NSERC Discovery Grants Program, the Quebec FRQNT postdoctoral research fellowship, the ELAP scholarship, and the COLCIENCIAS Scholarship Program No. 647-2014, Colombia.

Author information

Authors and Affiliations

David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Canada

Raouf Boutaba, Mohammad A. Salahuddin, Noura Limam, Sara Ayoubi, Nashid Shahriar & Felipe Estrada-Solano

Department of Telematics, University of Cauca, Popayan, Colombia

Felipe Estrada-Solano & Oscar M. Caicedo


Contributions

All authors read and approved the final manuscript.

Corresponding author

Correspondence to Raouf Boutaba .

Ethics declarations

Competing interests.

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/ ), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


About this article

Cite this article.

Boutaba, R., Salahuddin, M.A., Limam, N. et al. A comprehensive survey on machine learning for networking: evolution, applications and research opportunities. J Internet Serv Appl 9 , 16 (2018). https://doi.org/10.1186/s13174-018-0087-2


Received : 03 January 2018

Accepted : 04 May 2018

Published : 21 June 2018

DOI : https://doi.org/10.1186/s13174-018-0087-2

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

  • Machine learning
  • Traffic prediction
  • Traffic classification
  • Traffic routing
  • Congestion control
  • Resource management
  • Fault management
  • QoS and QoE management
  • Network security


Understanding the role of networking in organizations

Career Development International

ISSN : 1362-0436

Article publication date: 6 May 2014

Purpose

The purpose of this paper is to review and synthesize research and theory on the definition, antecedents, outcomes, and mechanisms of networking in organizations.

Design/methodology/approach

Descriptions of networking are reviewed and an integrated definition of networking in organizations is presented. Approaches for measuring and studying networking are considered and the similarities and differences of networking with related constructs are discussed. A theoretical model of the antecedents and outcomes of networking is presented with the goal of integrating existing networking research. Mechanisms through which networking leads to individual and organizational outcomes are also considered.

Findings

Networking is defined as goal-directed behavior which occurs both inside and outside of an organization, focussed on creating, cultivating, and utilizing interpersonal relationships. The current model proposes that networking is influenced by a variety of individual, job, and organizational level factors and leads to increased visibility and power, job performance, organizational access to strategic information, and career success. Access to information and social capital are proposed as mechanisms that facilitate the effects of networking on outcomes.

Originality/value

Networking is held to be of great professional value for ambitious individuals and organizations. However, much of the research on networking has been spread across various disciplines. Consequently, consensus on many important topics regarding networking remains elusive. This paper reviews and integrates existing research on networking in organizations and proposes directions for future study. A comprehensive definition and model of networking are presented and suggestions for researchers are provided.

  • Social network

Gibson, C., Hardy III, J.H. and Buckley, M.R. (2014), "Understanding the role of networking in organizations", Career Development International, Vol. 19 No. 2, pp. 146-161. https://doi.org/10.1108/CDI-09-2013-0111

Emerald Group Publishing Limited

Copyright © 2014, Emerald Group Publishing Limited



A Better Approach to Networking

  • Christie Hunter Arscott


Move from “small talk” to “deep talk.”

Meeting strangers — especially in the context of work — is uncomfortable for most people. Just the thought of networking can provoke discomfort and anxiety. As humans, we have an innate need to be liked. Research shows we even have a tendency to connect our self-worth to the number of people who like us.

  • Because of this, many struggle with performance anxiety when it comes to networking. You’re afraid you might say the wrong thing, forget what you were going to say, or stumble over your words. The result would be the other person not liking you. But there’s a better way to network.
  • Focus on what you’re going to ask, not what you’re going to say. Instead of preparing what you’d say when meeting someone new or how you’d respond to questions from a stranger, focus on what you’d ask in those same scenarios.
  • Then, practice moving from small talk to deep talk. We tend to underestimate how much other people, and especially first-time contacts, might enjoy and find satisfaction in meaningful conversation. Instead of asking “Where are you from?” ask, “What places have you lived in and traveled to during your educational and career journey that have shaped who you are?”

Do you shy away from talking to new people at networking events? Have you ever walked into the room and felt a tightness in your chest as you stood there, sweating, wondering what to say? Most of us can relate to some version of this experience. Meeting strangers — especially in the context of work — is uncomfortable. Just the thought of networking can provoke discomfort and anxiety.


  • Christie Hunter Arscott is an award-winning advisor, speaker, and author of the book Begin Boldly: How Women Can Reimagine Risk, Embrace Uncertainty, and Launch A Brilliant Career . A Rhodes Scholar, Christie has been named by Thinkers50 as one of the top management thinkers likely to shape the future of business.



Sensors (Basel)

Study and Investigation on 5G Technology: A Systematic Review

Ramraj Dangi

1 School of Computing Science and Engineering, VIT University Bhopal, Bhopal 466114, India; [email protected] (R.D.); [email protected] (P.L.)

Praveen Lalwani

Gaurav Choudhary

2 Department of Applied Mathematics and Computer Science, Technical University of Denmark, 2800 Lyngby, Denmark; moc.liamg@7777yrahduohcvaruag

3 Department of Information Security Engineering, Soonchunhyang University, Asan-si 31538, Korea

Giovanni Pau

4 Faculty of Engineering and Architecture, Kore University of Enna, 94100 Enna, Italy; [email protected]

Associated Data

Not applicable.

In wireless communication, Fifth Generation (5G) technology is the most recent generation of mobile networks. In this paper, the evolution of mobile communication technology is presented. Each evolution faced multiple challenges that were addressed by the next generation of mobile networks. Among all previously existing mobile networks, 5G provides high-speed internet access anytime, anywhere, for everyone. 5G differs through novel features such as interconnecting people and controlling devices, objects, and machines. The 5G mobile system will bring diverse levels of performance and capability, which will serve new user experiences and connect new enterprises. Therefore, it is essential to know where an enterprise can utilize the benefits of 5G. In this research article, extensive research and analysis unfold different aspects, namely, millimeter wave (mmWave), massive multiple-input and multiple-output (Massive MIMO), small cells, mobile edge computing (MEC), beamforming, different antenna technologies, etc. This article’s main aim is to highlight some of the most recent enhancements made to the 5G mobile system and discuss its future research objectives.

1. Introduction

Over the most recent three decades, rapid growth has marked the field of wireless communication through the transition from 1G to 4G [ 1 , 2 ]. The main drivers behind this research were the requirements of high bandwidth and very low latency. 5G provides a high data rate, improved quality of service (QoS), low latency, high coverage, high reliability, and economically affordable services. 5G delivers services in three categories: (1) Enhanced mobile broadband (eMBB), a nonstandalone architecture that offers high-speed internet connectivity, greater bandwidth, moderate latency, UltraHD streaming video, virtual and augmented reality (AR/VR) media, and much more. (2) Massive machine-type communication (eMTC), introduced by 3GPP in Release 13, which provides long-range and broadband machine-type communication at a very cost-effective price with low power consumption. eMTC brings high data rates, low power, and extended coverage via reduced device complexity through mobile carriers for IoT applications. (3) Ultra-reliable low-latency communication (URLLC), which offers low latency, ultra-high reliability, and rich quality of service (QoS) that is not possible with traditional mobile network architecture. URLLC is designed for on-demand real-time interaction such as remote surgery, vehicle-to-vehicle (V2V) communication, Industry 4.0, smart grids, intelligent transport systems, etc. [ 3 ].
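As a toy illustration of this three-way taxonomy, the following Python sketch maps an application's requirements onto one of the service categories. The function name and the numeric thresholds are illustrative assumptions, not 3GPP-defined values.

```python
# A toy decision rule mapping application requirements onto the three 5G
# service categories described above. Thresholds are illustrative
# assumptions, not 3GPP-defined values.

def classify_5g_service(latency_ms: float, devices_per_km2: int) -> str:
    if latency_ms <= 1:
        return "URLLC"      # e.g., remote surgery, V2V communication
    if devices_per_km2 >= 100_000:
        return "eMTC"       # massive IoT: many low-power devices
    return "eMBB"           # high-throughput broadband: AR/VR, UltraHD video

print(classify_5g_service(20, 500))        # typical broadband user -> eMBB
print(classify_5g_service(50, 1_000_000))  # dense sensor field -> eMTC
print(classify_5g_service(0.5, 100))       # real-time control -> URLLC
```

The point of the sketch is only that the three categories partition the requirement space along latency and device density, not that real network slicing works by such a simple rule.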

1.1. Evolution from 1G to 5G

First Generation (1G): 1G cell phones were launched between the 1970s and 1980s, based on analog technology, and worked much like landline phones. They suffered from poor battery life, poor voice quality, and frequent dropped calls. In 1G, the maximum achievable speed was 2.4 Kbps.

Second Generation (2G): 2G, the first digital system, was introduced in 1991, providing improved mobile voice communication over 1G. In addition, the Code-Division Multiple Access (CDMA) and Global System for Mobile Communications (GSM) concepts were introduced. In 2G, the maximum achievable speed was 1 Mbps.

Third Generation (3G): When technology ventured from 2G GSM frameworks into the 3G Universal Mobile Telecommunications System (UMTS) framework, users encountered higher system speeds and faster downloads, enabling real-time video calls. 3G was the first mobile broadband system, formed to provide voice along with some multimedia. The technology behind 3G was High-Speed Packet Access (HSPA/HSPA+). 3G used MIMO to multiply the capacity of the wireless network, and it also used packet switching for fast data transmission.

Fourth Generation (4G): 4G is a purely mobile broadband standard. In digital mobile communication, data rates upgraded from 20 to 60 Mbps in 4G [ 4 ]. It works on LTE and WiMAX technologies and provides bandwidth up to 100 MHz. It was launched in 2010.

Fourth Generation LTE-A (4.5G): LTE-A is an advanced version of standard 4G LTE. LTE-A uses MIMO technology to combine multiple antennas at both the transmitter and the receiver. Using MIMO, multiple signals and multiple antennas can work simultaneously, making LTE-A three times faster than standard 4G. LTE-A offered improved system capacity, reduced delay in the application server, and access to triple traffic (data, voice, and video) wirelessly, anytime and anywhere in the world. LTE-A delivers speeds of over 42 Mbps and up to 90 Mbps.

Fifth Generation (5G): 5G is a pillar of digital transformation and a real improvement over all previous mobile generations. 5G brings three different services for the end user: enhanced mobile broadband (eMBB), massive machine-type communication (eMTC), and ultra-reliable low-latency communication (URLLC), as described above. 5G is faster than 4G and offers remote-controlled operation over a reliable network with nearly zero delay. It provides a maximum downlink throughput of up to 20 Gbps. In addition, 5G also supports the WWWW (World Wide Wireless Web) vision [ 5 ] and is based on the Internet Protocol version 6 (IPv6). 5G provides unlimited internet connectivity at your convenience, anytime, anywhere, with extremely high speed, high throughput, low latency, higher reliability and scalability, and energy-efficient mobile communication [ 6 ]. 5G is mainly divided into two parts: sub-6 GHz 5G and millimeter-wave (mmWave) 5G.
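The peak rates quoted for each generation above can be collected into a small table; the values below are those stated in the text (3G is omitted because the text gives no figure), and the snippet simply computes the relative jump from 1G to 5G.

```python
# Peak data rates per generation, as quoted in the text above.
peak_rate_bps = {
    "1G": 2.4e3,           # 2.4 Kbps, analog voice era
    "2G": 1e6,             # 1 Mbps
    "4G": 60e6,            # upper end of the quoted 20-60 Mbps range
    "4.5G (LTE-A)": 90e6,  # up to 90 Mbps
    "5G": 20e9,            # 20 Gbps peak downlink
}

for gen, rate in peak_rate_bps.items():
    print(f"{gen:>12}: {rate / 1e6:>12.3f} Mbps")

# 5G's quoted peak is several million times the 1G rate:
print(f"5G / 1G = {peak_rate_bps['5G'] / peak_rate_bps['1G']:,.0f}x")
```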

6 GHz is a mid-band frequency that works as a midpoint between capacity and coverage, offering an ideal environment for 5G connectivity. The 6 GHz spectrum will provide high bandwidth with improved network performance. It offers contiguous channels that reduce the need for network densification when mid-band spectrum is unavailable, making 5G connectivity affordable anytime, anywhere, for everyone.
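The capacity benefit of wider channels follows directly from the Shannon capacity formula C = B·log2(1 + SNR); the channel bandwidths and the 20 dB SNR in the sketch below are illustrative assumptions chosen only to show the linear scaling with bandwidth, not figures from the text.

```python
import math

# Shannon capacity C = B * log2(1 + SNR): at a fixed SNR, capacity
# scales linearly with channel bandwidth.
def shannon_capacity_bps(bandwidth_hz: float, snr_linear: float) -> float:
    return bandwidth_hz * math.log2(1 + snr_linear)

snr = 10 ** (20 / 10)  # assumed 20 dB SNR -> linear factor of 100
for label, bw_hz in [("100 MHz channel", 100e6), ("400 MHz channel", 400e6)]:
    print(f"{label}: {shannon_capacity_bps(bw_hz, snr) / 1e9:.2f} Gbps")
```

Quadrupling the channel width quadruples the ceiling, which is why access to wide contiguous channels matters as much as raw frequency.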

mmWave is an essential 5G technology for building high-performance networks. 5G mmWave offers diverse services, which is why network providers should include this technology in their 5G deployment planning. Many service providers have deployed 5G mmWave, and their results show that mmWave is a far less used spectrum. It provides very high-speed wireless communication and offers ultra-wide bandwidth for next-generation mobile networks.
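One reason mmWave cells have limited range can be seen from the free-space path-loss (Friis) formula. The sketch below compares a 6 GHz carrier with a hypothetical 28 GHz mmWave carrier at the same distance; both frequencies and the 200 m distance are chosen for illustration only.

```python
import math

# Free-space path loss in dB: FSPL = 20 * log10(4 * pi * d * f / c).
def fspl_db(distance_m: float, freq_hz: float) -> float:
    c = 299_792_458.0  # speed of light, m/s
    return 20 * math.log10(4 * math.pi * distance_m * freq_hz / c)

d = 200.0  # link distance in metres
loss_6 = fspl_db(d, 6e9)
loss_28 = fspl_db(d, 28e9)
print(f"FSPL at 200 m,  6 GHz: {loss_6:.1f} dB")
print(f"FSPL at 200 m, 28 GHz: {loss_28:.1f} dB")
# The gap is 20*log10(28/6) ~ 13.4 dB at any distance, which is why mmWave
# cells are smaller and lean on beamforming to close the link budget.
print(f"extra loss at 28 GHz : {loss_28 - loss_6:.1f} dB")
```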

The evolution of wireless mobile technologies is presented in Table 1 . The abbreviations used in this paper are listed in Table 2 .

Summary of Mobile Technology.

Table of Notations and Abbreviations.

1.2. Key Contributions

The objective of this survey is to provide a detailed guide to 5G key technologies and methods for researchers, and to help with understanding how recent works have addressed 5G problems and developed solutions to tackle 5G challenges, i.e., what new methods must be applied and how they can solve these problems. The highlights of this research article are as follows.

  • This survey focuses on recent trends and developments in the 5G era and novel contributions by the research community, and discusses technical details on essential aspects of 5G advancement.
  • In this paper, the evolution of the mobile network from 1G to 5G is presented. In addition, the growth of mobile communication under different attributes is also discussed.
  • This paper covers the emerging applications and research groups working on 5G and the different research areas in the 5G wireless communication network with a descriptive taxonomy.
  • This survey discusses the current vision of 5G networks, advantages, applications, key technologies, and key features. Furthermore, machine learning prospects are explored along with the emerging requirements of the 5G era. The article also focuses on technical aspects of 5G IoT-based approaches and optimization techniques for 5G.
  • We provide an extensive overview of recent advancements in emerging 5G technologies, namely, MIMO, Non-Orthogonal Multiple Access (NOMA), mmWave, the Internet of Things (IoT), Machine Learning (ML), and optimization. A technical summary is also given, highlighting the context of current approaches and corresponding challenges.
  • Security challenges and considerations in developing 5G technology are discussed.
  • Finally, the paper concludes with future directives.

Existing surveys focused on architecture, key concepts, and implementation challenges and issues. In contrast, this survey covers state-of-the-art techniques as well as corresponding recent novel developments by researchers. Various recent significant papers are discussed along with the key technologies accelerating the development and production of 5G products.

2. Existing Surveys and Their Applicability

In this paper, a detailed survey of various 5G network technologies is presented. Various researchers have worked on different 5G technologies. In this section, Table 3 gives a tabular representation of existing surveys of 5G networks. Massive MIMO, NOMA, small cells, mmWave, beamforming, and MEC are the six main pillars that helped implement 5G networks in real life.

A comparative overview of existing surveys on different technologies of 5G networks.

2.1. Limitations of Existing Surveys

The numerous current surveys focused on various 5G technologies with different parameters, and their authors did not cover all 5G network technologies in detail with challenges and recent advancements. A few authors worked on MIMO, Non-Orthogonal Multiple Access (NOMA), MEC, and small-cell technologies, while others worked on beamforming and millimeter wave (mmWave). However, the existing surveys did not cover all 5G network technologies from a research and advancement perspective, and no detailed survey is available covering all 5G network technologies and currently published research trade-offs. Our main aim, therefore, is to give a detailed study of all the technologies at work in the 5G network, covering state-of-the-art techniques as well as corresponding recent novel developments by researchers, with key information about 5G technology and recent advancements that can serve as a guide for the reader. This survey provides an umbrella approach that brings multiple solutions and recent improvements together in a single place to accelerate 5G research with the latest key enabling solutions and reviews. A systematic layout representation of the survey is shown in Figure 1 . We provide a comparative overview of the existing surveys on different 5G network technologies in Table 3 .


Systematic layout representation of survey.

2.2. Article Organization

This article is organized as follows. Section 2 presents existing surveys and their applicability. In Section 3 , the preliminaries of 5G technology are presented. In Section 4 , recent advances in 5G technology based on Massive MIMO, NOMA, millimeter wave, 5G with IoT, machine learning for 5G, and optimization in 5G are provided. In Section 5 , a description of novel 5G features over 4G is provided. Section 6 covers the security concerns of the 5G network. Section 7 summarizes 5G technology with respect to the above-stated challenges in tabular form. Finally, Section 8 and Section 9 conclude the study, paving the path for future research.

3. Preliminary Section

3.1. Emerging 5G Paradigms and Their Features

5G provides very high speed, low latency, and highly scalable connectivity between multiple devices and IoT deployments worldwide. 5G will provide a very flexible model for developing a modern generation of applications and meeting industry goals [ 26 , 27 ]. The main services offered by the 5G network architecture are stated below:

Massive machine-to-machine communications: 5G offers novel, massive machine-to-machine communications [ 28 ], also known as the IoT [ 29 ], providing connectivity between large numbers of machines without any human involvement. This service enhances the applications of 5G and provides connectivity for agriculture, construction, and industry [ 30 ].

Ultra-reliable low-latency communications (URLLC): This service offers real-time management of machines, high-speed vehicle-to-vehicle connectivity, industrial connectivity and security principles, a highly secure transport system, and multiple autonomous actions. Low-latency communication also opens up areas where remote medical care, procedures, and operations all become achievable [ 31 ].

Enhanced mobile broadband: Enhanced mobile broadband is an important use case of the 5G system, which uses massive MIMO antennas, mmWave, and beamforming techniques to offer very high-speed connectivity across a wide range of areas [ 32 ].

For communities: 5G provides a very flexible internet connection between many machines to enable smart homes, smart schools, smart laboratories, safer and smarter automobiles, and good health care centers [ 33 ].

For businesses and industry: 5G works on higher spectrum, ranging from 24 to 100 GHz. This higher frequency range provides secure, low-latency communication and high-speed wireless connectivity between IoT devices and Industry 4.0, which opens a market for end users to enhance their business models [ 34 ].

New and emerging technologies: As 5G came with many new technologies such as beamforming, massive MIMO, mmWave, small cells, NOMA, MEC, and network slicing, it introduced many new features to the market. With virtual reality (VR), for example, users can experience the physical presence of people who are thousands of kilometers away. Many new applications such as smart homes, smart workplaces, smart schools, and smart sports academies also came to the market with the 5G mobile network model [ 35 ].

3.2. Commercial Service Providers of 5G

5G provides high-speed internet browsing, streaming, and downloading with very high reliability and low latency. The 5G network will change working styles, increase new business opportunities, and enable innovations that we cannot yet imagine. This section covers the top service providers of the 5G network [ 36 , 37 ].

Ericsson: Ericsson is a Swedish multinational networking and telecommunications company investing around 25.62 billion USD in the 5G network, which makes it one of the biggest telecommunication companies. It claims to be the only company working on all continents to make the 5G network a global standard for next-generation wireless communication. Ericsson developed the first 5G radio prototype that enables operators to set up live field trials in their networks, helping operators understand how 5G behaves. It plays a vital role in the development of 5G hardware. It currently provides 5G services in over 27 countries with content providers such as China Mobile, GCI, LGU+, AT&T, Rogers, and many more. It had 100 commercial agreements with different operators as of 2020.

Verizon: Verizon is an American multinational telecommunications company founded in 1983. Verizon started offering 5G services in April 2020, and by December 2020 it was actively providing 5G services in 30 cities in the USA, with plans to deploy 5G in 30 more cities by the end of 2021. Verizon deployed its 5G network on mmWave, a very high-band spectrum between 30 and 300 GHz. As this spectrum is far less used, it provides very high-speed wireless communication, and mmWave offers ultra-wide bandwidth for next-generation mobile networks, although this faster, high-band spectrum has a limited range. Verizon planned to increase its number of 5G cells by 500% by 2020. Verizon also has an ultra-wideband flagship 5G service, its best 5G offering, which increases Verizon's market price.

Nokia: Nokia is a Finnish multinational telecommunications company founded in 1865. Nokia adopted 5G technology very early and is developing, researching, and building partnerships with various 5G vendors to offer 5G communication as soon as possible. Nokia collaborated with Deutsche Telekom and the Hamburg Port Authority, providing an 8000-hectare site for their 5G MoNArch project. Nokia is the only company that supplies 5G technology to all of the major operators across several countries: AT&T, Sprint, T-Mobile US, and Verizon in the USA; Korea Telecom, LG U+, and SK Telecom in South Korea; and NTT DOCOMO, KDDI, and SoftBank in Japan. Presently, Nokia has around 150+ agreements and 29 live networks all over the world, and it continues working hard to expand 5G networks across the globe.

AT&T: AT&T is an American multinational company that was the first to deploy a real-world 5G network, in 2018. To achieve this, it built gigabit 5G network connections in Waco, TX, Kalamazoo, MI, and South Bend, IN. It was also the first company to achieve 1–2 gigabit-per-second speeds, in 2019. AT&T claims that it provides 5G network connections to 225 million people by using sub-6 GHz spectrum bands.

T-Mobile: T-Mobile US (TMUS) is an American wireless network operator and was the first service provider to offer a real nationwide 5G network. The company knew that high-band 5G was not feasible nationwide, so it used 600 MHz spectrum to build a significant portion of its 5G network. TMUS plans that by 2024 it will double the total capacity and triple the full 5G capacity of T-Mobile and Sprint combined. The Sprint buyout helped push T-Mobile's share price to 129.98 USD.

Samsung: Samsung started its research into 5G technology in 2011. In 2013, Samsung successfully developed the world's first adaptive array transceiver technology operating in the millimeter-wave Ka band for cellular communications. For core 5G mobile communication systems, Samsung provides data transmission several hundred times faster than standard 4G. The company has achieved considerable success in this next generation of technology and is considered one of the leading companies in the 5G domain.

Qualcomm: Qualcomm is an American multinational corporation based in San Diego, California, and one of the leading companies working on 5G chips. Qualcomm's first 5G modem chip was announced in October 2016, and a prototype was demonstrated in October 2017. While other companies talk about 5G, Qualcomm mainly focuses on building the underlying technologies and products. According to one magazine, Qualcomm was working on three main areas of 5G networks: firstly, radios that would use bandwidth from any network they have access to; secondly, creating more extensive ranges of spectrum by combining smaller pieces; and thirdly, a set of services for internet applications.

ZTE Corporation: ZTE Corporation, founded in 1985, is a partially Chinese state-owned technology company working in telecommunications. It was a leading company in 4G LTE, and it is still maintaining its position by researching and testing 5G. It was the first company to propose Pre5G technology along with a series of solutions.

NEC Corporation: NEC Corporation is a Japanese multinational information technology and electronics corporation headquartered in Minato, Tokyo. NEC has also started its research on 5G and introduced a new business concept. NEC's main aim is to develop 5G NR for the global mobile system and create secure and intelligent technologies to realize 5G services.

Cisco: Cisco is a US networking hardware company that is also gearing up for the 5G network. Cisco's primary focus is to support 5G in three ways: services—enable 5G services faster so service providers can grow their business; infrastructure—build 5G-oriented infrastructure to implement 5G more quickly; and automation—make the 5G network more scalable, flexible, and reliable. The company understands the importance of 5G and wants to connect more than 30 billion devices in the next couple of years. Cisco intends to work on network hardening, a vital part of the 5G network, and has used AI with deep learning to develop a 5G security architecture enabling secure network transformation.

3.3. 5G Research Groups

Many research groups from all over the world are working on the 5G wireless mobile network [ 38 ], continuously addressing various aspects of 5G. These research groups include: 5GNOW (5th Generation Non-Orthogonal Waveform for Asynchronous Signaling), NEWCOM (Network of Excellence in Wireless Communication), 5GIC (5G Innovation Centre), NYU (New York University) Wireless, 5GPPP (5G Infrastructure Public-Private Partnership), EMPHATIC (Enhanced Multi-carrier Technology for Professional Ad-hoc and Cell-Based Communication), ETRI (Electronics and Telecommunications Research Institute), and METIS (Mobile and wireless communication Enablers for the Twenty-twenty Information Society) [ 39 ]. The various research groups, along with their research areas, are presented in Table 4 .

Research groups working on 5G mobile networks.

3.4. 5G Applications

5G is faster than 4G and offers remote-controlled operation over a reliable network with almost zero delay. It provides a maximum downlink throughput of up to 20 Gbps. In addition, 5G also supports the WWWW (World Wide Wireless Web) [ 5 ] and is based on the Internet Protocol version 6 (IPv6). 5G provides unlimited internet connectivity anytime, anywhere, with extremely high speed, high throughput, low latency, higher reliability, greater scalability, and energy-efficient mobile communication [ 6 ].

The main applications of the 5G mobile network are as follows:

  • High-speed mobile network: 5G is an advancement on all previous mobile network technologies, offering very high download speeds of up to 10–20 Gbps. The 5G wireless network works like a fiber-optic internet connection. 5G differs from all conventional mobile transmission technologies in that it offers both voice and high-speed data connectivity efficiently. 5G offers very low-latency communication of less than a millisecond, useful for autonomous driving and mission-critical applications. 5G will use millimeter waves for data transmission, providing higher bandwidth and a massive data rate compared to lower LTE bands. As 5G is a fast mobile network technology, it will enable virtual access to high processing power and secure, safe access to cloud services and enterprise applications. The small cell is one of the best features of 5G, bringing advantages such as high coverage, high-speed data transfer, power saving, and easy, fast cloud access [ 40 ].
  • Entertainment and multimedia: One analysis in 2015 found that more than 50 percent of mobile internet traffic was used for video downloading. This trend will surely grow in the future, making video streaming ever more common. 5G will offer high-speed streaming of 4K video with crystal-clear audio, bringing a high-definition virtual world to mobile devices. 5G will benefit the entertainment industry, as it supports 120 frames per second with high-resolution, high-dynamic-range video streaming, and HD TV channels can be accessed on mobile devices without interruption. 5G provides low-latency, high-definition communication, so augmented reality (AR) and virtual reality (VR) will be easy to implement in the future. Virtual reality games are trendy these days, and many companies are investing in HD virtual reality games; the 5G network will offer high-speed internet connectivity with a better gaming experience [ 41 ].
  • Smart homes: Smart home appliances and products are in demand these days. The 5G network makes smart homes more practical, as it offers high-speed connectivity and monitoring of smart appliances. Smart home appliances can easily be accessed and configured from remote locations using the 5G network, since it offers very high-speed, low-latency communication.
  • Smart cities: The 5G wireless network also helps develop smart city applications such as automatic traffic management, weather updates, local area broadcasting, energy saving, efficient power supply, smart lighting systems, water resource management, crowd management, emergency control, etc.
  • Industrial IoT: 5G wireless technology will provide many features for future industries, such as safety, process tracking, smart packing, shipping, energy efficiency, automation of equipment, predictive maintenance, and logistics. 5G smart sensor technology also offers smarter, safer, cost-effective, and energy-saving industrial IoT operations.
  • Smart farming: 5G technology will play a crucial role in agriculture and smart farming. 5G sensors and GPS technology will help farmers track live attacks on crops and manage them quickly. These smart sensors can also be used for irrigation, pest, insect, and electricity control.
  • Autonomous driving: The 5G wireless network offers very low-latency, high-speed communication, which is significant for autonomous driving; self-driving cars may become practical soon with 5G wireless networks. Using 5G, autonomous cars can easily communicate with smart traffic signs, objects, and other vehicles on the road. 5G's low latency makes self-driving more practical, since every millisecond matters for autonomous vehicles: decisions must be made in microseconds to avoid accidents.
  • Healthcare and mission-critical applications: 5G technology will bring modernization to medicine, where doctors and practitioners can perform advanced medical procedures. The 5G network will provide connectivity between classrooms, so attending seminars and lectures will be easier. Through 5G technology, patients can connect with doctors and take their advice. Scientists are building smart medical devices that can help people with chronic medical conditions. The 5G network will boost the healthcare industry with smart devices, the internet of medical things, smart sensors, HD medical imaging technologies, and smart analytics systems. 5G will ease access to cloud storage, so healthcare data will be accessible from any location worldwide, and doctors and medical practitioners will be able to store and share large files, like MRI reports, within seconds.
  • Satellite internet: In many remote areas, ground base stations are not available, so 5G will play a crucial role in providing connectivity there. The 5G network will provide connectivity via satellite systems, which use constellations of multiple small satellites to serve urban and rural areas across the world.

4. 5G Technologies

This section describes recent advances in 5G massive MIMO, 5G NOMA, 5G millimeter wave, 5G IoT, 5G with machine learning, and 5G optimization-based approaches. In addition, each subsection presents a summary that points researchers toward future research directions.

4.1. 5G Massive MIMO

Multiple-input multiple-output (MIMO) is a very important technology for wireless systems. It is used for sending and receiving multiple signals simultaneously over the same radio channel. MIMO plays a very big role in Wi-Fi, 3G, 4G, and 4G LTE-A networks. MIMO is mainly used to achieve high spectral and energy efficiency, but conventional MIMO fell short of the mark, providing low throughput and unreliable connectivity. To resolve this, several MIMO variants, such as single-user MIMO (SU-MIMO), multi-user MIMO (MU-MIMO), and network MIMO, were introduced; however, these still did not fulfill end-user demand. Massive MIMO is an advancement of MIMO technology used in the 5G network in which hundreds or thousands of antennas are attached to base stations to increase throughput and spectral efficiency. Multiple transmit and receive antennas are used in massive MIMO to increase the transmission rate and spectral efficiency. When multiple UEs generate downlink traffic simultaneously, massive MIMO achieves higher capacity. Massive MIMO uses its extra antennas to focus energy into smaller regions of space, increasing spectral efficiency and throughput [ 43 ]. In traditional systems, data collection from smart sensors is a complex task, with increased latency, reduced data rates, and reduced reliability, while massive MIMO with beamforming and large-scale multiplexing can gather data from different sensors with low latency, high data rates, and higher reliability. Massive MIMO will help transmit data collected from different sensors in real time to central monitoring locations for smart sensor applications like self-driving cars, healthcare centers, smart grids, smart cities, smart highways, smart homes, and smart enterprises [ 44 ].
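To make the array-gain intuition above concrete, the following toy Python sketch (an illustrative model, not taken from the cited works) shows how per-user spectral efficiency grows as base-station antennas are added, assuming idealized maximum-ratio combining with perfect CSI and no interference:

```python
import math

def mrc_spectral_efficiency(num_antennas, snr_linear):
    """Per-user spectral efficiency (bit/s/Hz) under an idealized
    array-gain model: combining across M antennas scales the
    effective SNR by M (perfect CSI, no interference assumed)."""
    return math.log2(1 + num_antennas * snr_linear)

# Effective SNR grows linearly with the antenna count, so spectral
# efficiency grows roughly logarithmically in M at 0 dB per-antenna SNR.
for m in (1, 64, 256):
    print(m, round(mrc_spectral_efficiency(m, snr_linear=1.0), 2))
```

The point of the sketch is qualitative: hundreds of antennas buy a large effective-SNR (and hence throughput) gain per user, which is the motivation for massive MIMO.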

Highlights of 5G Massive MIMO technology are as follows:

  • Data rate: Massive MIMO is regarded as one of the dominant technologies for providing high-speed wireless communication with data rates in the gigabits per second.
  • Relationship between wave frequency and antenna size: The two are inversely proportional to each other; lower-frequency signals need a bigger antenna, and vice versa.


Pictorial representation of multi-input and multi-output (MIMO).

  • MIMO's role in 5G: Massive MIMO will play a crucial role in the deployment of future 5G mobile communication, as it enables greater spectral and energy efficiency.
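The inverse relationship between frequency and antenna size noted above follows directly from the free-space wavelength formula λ = c/f; a quick sketch with illustrative frequency values (not taken from the source):

```python
def wavelength_mm(frequency_hz):
    """Free-space wavelength in millimeters: lambda = c / f."""
    c = 299_792_458.0  # speed of light in m/s
    return c / frequency_hz * 1000.0

# A 900 MHz signal has a ~333 mm wavelength (hence larger antennas),
# while a 28 GHz mmWave signal is only ~10.7 mm, which is what makes
# packing hundreds of antenna elements into one base station feasible.
print(round(wavelength_mm(900e6)))    # 333
print(round(wavelength_mm(28e9), 1))  # 10.7
```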

State-of-the-Art Approaches

Plenty of approaches have been proposed to resolve the issues of conventional MIMO [ 7 ].

A MIMO multirate, feed-forward controller was suggested by Mae et al. [ 46 ]. In simulation, the proposed model generates smooth control inputs, unlike conventional MIMO, which generates oscillating control inputs; it also outperforms conventional MIMO with respect to error rate. However, a combination of multirate and single-rate control could be used for better results.

The performance of stand-alone MIMO and distributed MIMO, with and without cooperation, was investigated by Panzner et al. [ 47 ]. In addition, an idea for the integration of large-scale MIMO into 5G technology was presented. In the experimental analysis, different MIMO configurations were considered, with the ratio of total transmit antennas to spatial streams varied step-wise from one to ten.

The simulation of massive MIMO noncooperative and cooperative systems for downlink behavior was performed by He et al. [ 48 ]. It builds on present LTE systems, which deal with various antennas in the base station set-up. It was observed that cooperation among different BSs improves system behavior, while throughput is reduced slightly in this approach; a new method could be developed to enhance both system behavior and throughput.

In [ 8 ], different approaches to increasing the energy efficiency benefits provided by massive MIMO were presented. The authors analyzed massive MIMO technology and described a detailed design of the energy consumption model for massive MIMO systems. The article explored several techniques to enhance massive MIMO systems' energy efficiency (EE) gains, reviewing standard EE-maximization approaches for conventional massive MIMO systems, namely, scaling the number of antennas, implementing low-complexity operations at the base station (BS) in real time, minimizing power amplifier losses, and minimizing radio frequency (RF) chain requirements. In addition, open research directions were identified.

In [ 49 ], various existing approaches based on antenna selection and scheduling, user selection and scheduling, and joint antenna and user scheduling methods adopted in massive MIMO systems were presented. The objective of this survey article was to raise awareness of current research and future research directions in massive MIMO systems. The authors found that complete utilization of resources and bandwidth was the most crucial factor in enhancing the sum rate.

In [ 50 ], the authors discussed the development of various techniques for handling pilot contamination. To calculate the impact of pilot contamination in time division duplex (TDD) massive MIMO systems, both TDD and frequency division duplex (FDD) patterns in massive MIMO techniques were used. They discussed different issues of pilot contamination in TDD massive MIMO systems, along with possible future research directions, and classified various techniques for generating channel information for both pilot-based and subspace-based approaches.

In [ 19 ], the authors described the uplink and downlink services for a massive MIMO system and maintained a performance matrix that measures the impact of pilot contamination on different aspects of performance. They also examined various applications of massive MIMO, such as small cells, orthogonal frequency-division multiplexing (OFDM) schemes, massive MIMO in IEEE 802 standards, 3rd Generation Partnership Project (3GPP) specifications, and higher frequency bands. They considered their work crucial for cutting-edge massive MIMO and covered many issues, such as system throughput performance and channel state acquisition at higher frequencies.

In [ 13 ], various approaches were suggested for MIMO in future-generation wireless communication. The authors made a comparative study based on performance indicators such as peak data rate, energy efficiency, latency, and throughput. The key findings of this survey are as follows: (1) spatial multiplexing improves energy efficiency; (2) the design of MIMO plays a vital role in enhancing throughput; (3) mMIMO enhancement should focus on energy and spectral performance; and (4) future challenges remain in improving system design.

In [ 51 ], a study of large-scale MIMO systems for an energy-efficient system-sharing method was presented. For resource allocation, circuit energy and transmit energy expenditures were taken into consideration, and optimization techniques were applied to an energy-efficient resource-sharing system to maximize energy efficiency under individual QoS and energy constraints. The authors also examined BS configurations including homogeneous and heterogeneous UEs. Their simulations showed that the total number of transmit antennas plays a vital role in boosting energy efficiency; the highest energy efficiency was obtained when the BS was set up with 100 antennas serving 20 UEs.

This section has covered various works on 5G MIMO technology by different authors. Table 5 shows how different authors worked on improving various parameters, such as throughput, latency, energy efficiency, and spectral efficiency, with 5G MIMO technology.

Summary of massive MIMO-based approaches in 5G technology.

4.2. 5G Non-Orthogonal Multiple Access (NOMA)

NOMA is a very important radio access technology for next-generation wireless communication. Compared to previous orthogonal multiple access techniques, NOMA offers many benefits, such as high spectrum efficiency, low latency with high reliability, and high-speed massive connectivity. NOMA's basic principle is to serve multiple users with the same resources in terms of time, space, and frequency. NOMA is divided into two main categories: code-domain NOMA and power-domain NOMA. Code-domain NOMA can improve the spectral efficiency of mMIMO, which improves connectivity in 5G wireless communication; it is further divided into multiple access techniques such as sparse code multiple access, lattice-partition multiple access, multi-user shared access, and pattern-division multiple access [ 52 ]. Power-domain NOMA is widely used in 5G wireless networks, as it performs well with various wireless communication techniques such as MIMO, beamforming, space-time coding, network coding, full-duplex, and cooperative communication [ 53 ]. The conventional orthogonal frequency-division multiple access (OFDMA) used by 3GPP in the 4G LTE network provides very low spectral efficiency when bandwidth resources are allocated to users with poor channel state information (CSI). NOMA resolves this issue by enabling users to access all subcarrier channels, so bandwidth resources allocated to users with weak CSI can still be accessed by users with strong CSI, which increases spectral efficiency. The 5G network will support a heterogeneous architecture in which small cells and macro base stations work together for spectrum sharing.
NOMA is a key technology of the 5G wireless system and is very helpful for heterogeneous networks, as multiple users can share their data in a small cell using the NOMA principle. NOMA is useful in various applications such as ultra-dense networks (UDN), machine-to-machine (M2M) communication, and massive machine-type communication (mMTC). Alongside its many features, NOMA also has some challenges: first, it needs substantial computational power to run the SIC algorithms for a large number of users at high data rates; second, when users move across networks, managing power allocation optimization is a challenging task [ 54 ]. Hybrid NOMA (HNOMA) is a combination of power-domain and code-domain NOMA that uses both power differences and orthogonal resources for transmission among multiple users. As HNOMA uses both power-domain and code-domain NOMA, it can achieve higher spectral efficiency than either alone. In HNOMA, multiple groups can transmit signals simultaneously, using a message-passing algorithm (MPA) and successive interference cancellation (SIC)-based detection at the base station for these groups [ 55 ].
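As a concrete illustration of power-domain NOMA with SIC, the following Python sketch computes the Shannon-achievable rates for a two-user downlink under hypothetical power-split and channel-gain values (a textbook-style model, not a result from the cited papers):

```python
import math

def noma_two_user_rates(p_total, alpha_near, g_near, g_far, noise=1.0):
    """Achievable rates (bit/s/Hz) for two-user downlink power-domain NOMA.
    The far (weak-channel) user gets the larger power share and decodes
    its own signal while treating the near user's signal as noise; the
    near (strong-channel) user first cancels the far user's signal via
    SIC and then decodes its own signal interference-free."""
    p_near = alpha_near * p_total          # small power share, near user
    p_far = (1.0 - alpha_near) * p_total   # large power share, far user
    # Far user: the near user's superposed signal acts as interference.
    r_far = math.log2(1 + p_far * g_far / (p_near * g_far + noise))
    # Near user: far user's signal removed by SIC, decoded interference-free.
    r_near = math.log2(1 + p_near * g_near / noise)
    return r_near, r_far

# Hypothetical values: 10 units of transmit power, 20% to the near user.
r_near, r_far = noma_two_user_rates(10.0, 0.2, g_near=4.0, g_far=0.5)
print(round(r_near, 2), round(r_far, 2))  # 3.17 1.58
```

Both users occupy the same time-frequency resource; the superposed transmission plus SIC at the stronger receiver is what distinguishes this from OMA, where each user would get only a fraction of the band.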

Highlights of 5G NOMA technology are as follows:


Pictorial representation of orthogonal and Non-Orthogonal Multiple Access (NOMA).

  • NOMA provides higher data rates and resolves the shortcomings of OMA, making the 5G mobile network more scalable and reliable.
  • Since multiple users use the same frequency band simultaneously, the performance of the whole network increases.
  • NOMA uses nonorthogonal transmission at the transmitter end, which introduces intracell and intercell interference that must be managed.
  • The primary principle of NOMA is to improve spectrum efficiency at the cost of increased receiver complexity.

State-of-the-Art Approaches

Plenty of approaches have been developed to address the various issues in NOMA.

A novel approach to handling multiple received signals at the same frequency is proposed in [ 22 ]. In NOMA, multiple users use the same subcarrier, which improves the fairness and throughput of the system. As a nonorthogonal method is shared among multiple users, joint processing is required when retrieving each user's signal at the receiver's end. The authors proposed solutions to optimize the receiver and the radio resource allocation of uplink NOMA. First, they proposed an iterative MUDD that utilizes the information produced by the channel decoder to improve the performance of the multiuser detector; they then suggested a novel subcarrier and power allocation scheme that enhances the users' weighted sum rate for the NOMA scheme. Their model showed that NOMA performs well compared to OFDM in terms of fairness and efficiency.

In [ 53 ], the authors reviewed power-domain NOMA, which uses superposition coding (SC) at the transmitter and successive interference cancellation (SIC) at the receiver. Extensive analyses showed that NOMA effectively satisfies the user data rate demands and network-level requirements of 5G technologies. The paper presented a complete review of recent advances in the 5G NOMA system, with comparative analysis regarding allocation procedures, user fairness, state-of-the-art efficiency evaluation, user pairing patterns, etc. The study also analyzed NOMA's behavior when working with other wireless communication techniques, namely, beamforming, MIMO, cooperative communication, network coding, and space-time coding.

In [ 9 ], the authors proposed NOMA with MEC, which improves QoS and reduces the latency of the 5G wireless network. This model improves uplink NOMA by decreasing the users' uplink energy consumption. They formulated an optimized NOMA framework that reduces the energy consumption of MEC through computing and communication resource allocation, user clustering, and transmit power control.

In [ 10 ], the authors proposed a model that investigates outage probability under average channel state information (CSI) and data rate under full CSI to solve the optimal power allocation problem for the NOMA downlink system among users. They developed simple low-complexity algorithms to provide the optimal solution. The simulation results demonstrated NOMA's efficiency, achieving higher performance fairness compared to TDMA configurations, and showed that NOMA, with appropriate power amplifiers (PAs), ensures the high performance-fairness requirement of future 5G wireless communication networks.

In [ 56 ], the researchers observed that NOMA and non-orthogonal waveform modulation techniques are being used in the 5G mobile network. The work gives a detailed survey of non-orthogonal waveform modulation techniques and NOMA schemes for next-generation mobile networks and, by analyzing and comparing multiple access technologies, considers their future evolution for 5G mobile communication.

In [ 57 ], the authors surveyed non-orthogonal multiple access (NOMA) from its development phase to recent advances. They compared NOMA techniques with traditional OMA techniques from an information-theoretic perspective and discussed NOMA schemes categorically, in the power and code domains, including their design principles, operating principles, and features. The comparison is based on system performance, spectral efficiency, and receiver complexity. Also discussed are future challenges, open issues, and expectations for NOMA, including how it will support the key 5G requirements of massive connectivity and low latency.

In [ 17 ], the authors first review an elementary two-user NOMA model to clarify its central principles. After that, a general design with multicarrier support and a random number of users on each subcarrier is analyzed. In the performance evaluation against existing approaches, resource sharing and multiple-input multiple-output NOMA are examined. Furthermore, they identify the key elements of NOMA and its open research demands. Finally, they review the two-user SC-NOMA design and a multi-user MC-NOMA design to highlight NOMA's basic approaches and conventions, and they present research on performance evaluation, resource assignment, and MIMO in NOMA.

In this section, various works by different authors on 5G NOMA technology are covered. Table 6 shows how these authors worked on improving various parameters, such as spectral efficiency, fairness, and computing capacity, with 5G NOMA technology.

Summary of NOMA-based approaches in 5G technology.

4.3. 5G Millimeter Wave (mmWave)

Millimeter wave is an extremely high-frequency band that is very useful for 5G wireless networks. MmWave uses the 30 GHz to 300 GHz spectrum band for transmission. This band is known as mmWave because its waves have wavelengths between 1 and 10 mm. Until recently, only radar systems and satellites used mmWave; these very high frequency bands provide very high-speed wireless communication, and many mobile network providers have also started using mmWave for transmitting data between base stations. The speed of data transmission can be improved in two ways: by increasing spectrum utilization or by increasing spectrum bandwidth, of which increasing bandwidth is the easier and better option. The frequency band below 5 GHz is very crowded, as many technologies use it, so to boost the data transmission rate, the 5G wireless network uses mmWave technology, which increases the spectrum bandwidth instead of increasing spectrum utilization [ 58 ]. Since the usable signal bandwidth is roughly proportional to the carrier frequency (on the order of 5% of it), raising the carrier frequency also raises the available signal bandwidth. The band between 28 GHz and 60 GHz is very useful for 5G wireless communication: the 28 GHz band offers up to 1 GHz of spectrum bandwidth, and the 60 GHz band offers 2 GHz. 4G LTE uses a 2 GHz carrier frequency, which offers only 100 MHz of spectrum bandwidth; the use of mmWave thus increases the spectrum bandwidth tenfold, leading to better transmission speeds [ 59 , 60 ].
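The bandwidth arithmetic above can be sketched numerically. The 5% fractional-bandwidth figure is a rule of thumb, so the outputs only approximate the quoted 1 GHz and 2 GHz values:

```python
def usable_bandwidth_ghz(carrier_ghz, fraction=0.05):
    """Rule-of-thumb usable signal bandwidth, taken as ~5% of the
    carrier frequency (an approximation, not an exact standard)."""
    return carrier_ghz * fraction

# A 2 GHz LTE carrier yields ~0.1 GHz (100 MHz) of bandwidth, while
# mmWave carriers yield an order of magnitude more.
for f in (2, 28, 60):
    print(f, round(usable_bandwidth_ghz(f), 2))
```

This is why moving the carrier into the mmWave range, rather than squeezing more bits per hertz out of crowded sub-5 GHz spectrum, is the easier route to higher data rates.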

Highlights of 5G mmWave are as follows:


Pictorial representation of millimeter wave.

  • The 5G mmWave offers three advantages: (1) mmWave is a new, lightly used band; (2) mmWave signals carry more data than lower-frequency waves; and (3) mmWave can be incorporated with MIMO antennas, with the potential to offer capacity higher by an order of magnitude compared to current communication systems.

In [ 11 ], the authors presented a survey of mmWave communications for 5G. An advantage of mmWave communications is adaptability: it supports upgrades of architectures and protocols, including integrated circuits and systems. The authors overviewed present solutions and examined them with respect to effectiveness, performance, and complexity. They also discussed open research issues of mmWave communications in 5G concerning software-defined network (SDN) architecture, network state information, efficient regulation techniques, and heterogeneous systems.

In [ 61 ], the authors presented recent work on the design issues and demands of mmWave 5G antennas for cellular handsets. They then designed a small, low-profile 60 GHz antenna array containing 3D planar mesh-grid antenna elements. Looking forward, a framework was designed in which these antenna components operate cellular handsets on mmWave 5G smartphones. In addition, they cross-checked the mesh-grid antenna array with a polarized beam against upcoming hardware challenges.

In [ 12 ], the authors considered the suitability of the mmWave band for 5G cellular systems. They suggested a resource allocation system for concurrent D2D communications in mmWave 5G cellular systems that improves network efficiency and maintains network connectivity; the article can serve as guidance for simulating D2D communications in such systems. Many mmWave BSs may be set up to obtain a high delivery rate and aggregate efficiency, so wireless users may hand off frequently between mmWave base terminals, which creates the need to discover the neighbor with the best network connectivity.

In [ 62 ], the authors noted that the cellular spectrum, ranging from 1 GHz to 3 GHz, is very crowded. They presented various noteworthy factors for setting up mmWave communications in 5G, namely, channel characteristics regarding mmWave signal attenuation due to free-space propagation, atmospheric gases, and rain. In addition, hybrid beamforming architecture in the mmWave technique was analyzed, and methods were suggested for the blockage effect in mmWave communications due to penetration damage. Finally, the authors studied designing mmWave transmission with small beams in nonorthogonal device-to-device communication.

This section covered various works on 5G mmWave technology. Table 7 shows how different authors worked on improving various parameters, i.e., transmission rate, coverage, and cost, with 5G mmWave technology.

Table 7. Summary of existing mmWave-based approaches in 5G technology.

4.4. 5G IoT Based Approaches

The 5G mobile network plays a big role in developing the Internet of Things (IoT). IoT will connect many things to the internet, such as appliances, sensors, devices, objects, and applications, and these applications will collect large amounts of data from different devices and sensors. 5G will provide very high-speed internet connectivity for data collection, transmission, control, and processing. 5G is a flexible network with unused spectrum availability, and it offers very low-cost deployment, which is why it is the most efficient technology for IoT [ 63 ]. 5G benefits IoT in many areas; below are some examples:

Smart homes: smart home appliances and products are in demand these days. The 5G network makes smart homes more practical, as it offers high-speed connectivity and monitoring of smart appliances. Smart home appliances can be easily accessed and configured from remote locations over the 5G network, thanks to its very high speed and low latency.

Smart cities: the 5G wireless network also helps in developing smart city applications such as automatic traffic management, weather updates, local area broadcasting, energy saving, efficient power supply, smart lighting systems, water resource management, crowd management, and emergency control.

Industrial IoT: 5G wireless technology will provide many features for future industries, such as safety, process tracking, smart packaging, shipping, energy efficiency, automation of equipment, predictive maintenance, and logistics. 5G smart sensor technology also offers smarter, safer, more cost-effective, and energy-saving operations for industrial IoT.

Smart farming: 5G technology will play a crucial role in agriculture and smart farming. 5G sensors and GPS technology will help farmers track attacks on crops in real time and manage them quickly. These smart sensors can also be used for irrigation control, pest control, insect control, and electricity control.

Autonomous driving: the 5G wireless network offers very low-latency, high-speed communication, which is very significant for autonomous driving. It means self-driving cars will come to real life soon with 5G wireless networks. Using 5G, autonomous cars can easily communicate with smart traffic signs, objects, and other vehicles on the road. 5G's low-latency feature makes self-driving more realistic, as every millisecond is important for autonomous vehicles: decisions must be made in microseconds to avoid accidents [ 64 ].

Highlights of 5G IoT are as follows:

Figure 5. Pictorial representation of IoT with 5G.

  • 5G with IoT is a new feature of next-generation mobile communication, providing high-speed internet connections between connected devices. 5G IoT also brings smart homes, smart devices, sensors, smart transportation systems, smart industries, and more to end-users.
  • IoT connects a wide range of devices through the internet. IoT research has produced wearables, smartphones, sensors, smart transportation systems, smart devices, washing machines, tablets, and more, and these diverse systems are linked to a common interface with the intelligence to connect.
  • Significant IoT applications include private healthcare systems, traffic management, industrial management, and the tactile internet.

Plenty of approaches have been devised to address the issues of IoT [ 14 , 65 , 66 ].

In [ 65 ], the paper focuses on 5G mobile systems in light of emerging trends and developing technologies, which are driving exponential traffic growth in IoT. The authors surveyed the challenges and demands of deploying massive IoT applications, with a main focus on mobile networking. They reviewed the features of standard IoT infrastructure, covering cellular low-power wide-area (LPWA) technologies such as eMTC and extended coverage (EC)-GSM-IoT, as well as noncellular LPWA technologies such as SigFox and LoRa.

In [ 14 ], the authors presented how 5G technology copes with the various issues of IoT today and provided a brief review of existing and emerging 5G architectures. The survey indicates the role of 5G in the foundation of the IoT ecosystem: IoT and 5G can easily combine with improved wireless technologies to form an ecosystem that fulfills the current requirements of IoT devices. 5G can alter the nature of IoT and will help expand the development of IoT devices. As the rollout of 5G unfolds, global associations will find it essential to set up cross-industry engagement in defining and enlarging the 5G system.

In [ 66 ], the authors introduced an IoT authentication scheme for the 5G network with greater reliability and dynamicity. The scheme proposes a privacy-protected procedure for selecting slices and provides an additional fog node for proper data transmission according to subscribers' service types, along with service-oriented authentication and key agreement to maintain the secrecy and precision of users and the confidentiality of service factors. Users anonymously identify the IoT servers and establish a key channel for accessing services and data cached on local fog nodes and remote IoT servers. The authors performed a simulation to demonstrate the security and privacy preservation of the user over the network.

This section covered various works done on 5G IoT by multiple authors. Table 8 shows how different authors worked on the improvement of numerous parameters, i.e., data rate, security requirements, and performance, with 5G IoT.

Table 8. Summary of IoT-based approaches in 5G technology.

4.5. Machine Learning Techniques for 5G

Various machine learning (ML) techniques have been applied in 5G networks and mobile communication. They provide solutions to complex problems that would otherwise require extensive hand-tuning. ML techniques can be broadly classified as supervised, unsupervised, and reinforcement learning. We discuss each learning technique separately and where it impacts the 5G network.

In supervised learning, the user works with labeled data, and some 5G network problems can be categorized as classification or regression problems. Regression problems such as scheduling nodes in 5G and predicting energy availability can be handled with the Linear Regression (LR) algorithm. To accurately predict bandwidth and frequency allocation, Statistical Logistic Regression (SLR) is applied. Supervised classifiers are applied to predict network demand and allocate network resources based on connectivity performance, which informs the topology setup and bit rates. Support Vector Machine (SVM) and NN-based approximation algorithms are used for channel learning based on observable channel state information. Deep Neural Networks (DNNs) are also employed to predict beamforming vectors at the BSs by taking mapping functions and uplink pilot signals into consideration.
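As a minimal illustration of the regression use cases above, the following sketch fits ordinary least squares to a synthetic hour-of-day load series; the data and the "load" interpretation are invented for illustration, not a real 5G trace or any method from the cited works.

```python
# Hypothetical sketch: least-squares linear regression predicting cell load
# from hour-of-day samples (synthetic data, not a real 5G measurement trace).

def fit_linear(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

# Synthetic demand samples: load grows roughly linearly with the hour.
hours = [0, 2, 4, 6, 8, 10]
load = [1.0, 2.1, 2.9, 4.2, 5.0, 6.1]
a, b = fit_linear(hours, load)
predicted_noon = a * 12 + b  # extrapolated load at hour 12
```

The same closed-form fit generalizes to any scalar predictor mentioned above (e.g., energy availability versus time of day).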

In unsupervised learning, the user works with unlabeled data, and various clustering techniques are applied to enhance network performance and connectivity without interruptions. K-means clustering reduces data travel by storing data-center content in clusters; it also optimizes handover estimation based on mobility patterns and the selection of relay nodes in the V2V network. Hierarchical clustering reduces network failures by detecting intrusions in the mobile wireless network, and unsupervised soft clustering helps reduce latency by clustering fog nodes. The nonparametric Bayesian unsupervised learning technique reduces network traffic by proactively serving users' requests and demands. Other unsupervised learning techniques such as Adversarial Auto Encoders (AAE) and Affinity Propagation Clustering detect irregular behavior in the wireless spectrum and manage resources for ultradense small cells, respectively.
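A minimal k-means sketch shows the clustering idea applied above to fog nodes and content placement; the node coordinates and the choice of two clusters are synthetic assumptions for illustration only.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means: group fog-node (x, y) coordinates into k clusters."""
    rnd = random.Random(seed)
    centers = rnd.sample(points, k)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        groups = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: (p[0] - centers[c][0]) ** 2
                                + (p[1] - centers[c][1]) ** 2)
            groups[i].append(p)
        # Recompute each center as the mean of its assigned points.
        centers = [
            (sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g))
            if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers, groups

# Two obvious spatial clusters of hypothetical fog nodes.
nodes = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
centers, groups = kmeans(nodes, k=2)
```

Each resulting center could stand in for a fog-node group serving nearby users, which is the latency-reduction idea described above.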

For uncertain environments in the 5G wireless network, reinforcement learning (RL) techniques are employed. Actor-critic reinforcement learning is used for user scheduling and resource allocation in the network. The Markov decision process (MDP) and Partially Observable MDP (POMDP) are used for Quality of Experience (QoE)-based handover decision-making in HetNets; they also control packet-call admission in HetNets and the channel-access process for secondary users in a Cognitive Radio Network (CRN). Deep RL is applied to decide the communication channel and mobility, and it speeds up the secondary user's learning rate using an antijamming strategy. Deep RL is employed for various 5G network application parameters such as resource allocation and security [ 67 ]. Table 9 shows the state-of-the-art ML-based solutions for the 5G network.

Table 9. State-of-the-art ML-based solutions for the 5G network.
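The channel-access use case above can be sketched with tabular Q-learning; with a single state it degenerates to a multi-armed bandit, and the channel success probabilities below are invented for illustration, not values from any cited work.

```python
import random

def q_learn_channels(success_prob, episodes=5000, alpha=0.1, eps=0.1, seed=1):
    """Single-state tabular Q-learning for picking the best channel.
    success_prob[i] is the hypothetical ACK probability of channel i."""
    rnd = random.Random(seed)
    q = [0.0] * len(success_prob)
    for _ in range(episodes):
        if rnd.random() < eps:
            a = rnd.randrange(len(q))                   # explore
        else:
            a = max(range(len(q)), key=q.__getitem__)   # exploit
        reward = 1.0 if rnd.random() < success_prob[a] else 0.0
        q[a] += alpha * (reward - q[a])  # single-state update, no bootstrap
    return q

q = q_learn_channels([0.3, 0.9])  # channel 1 is clearly better
best = max(range(len(q)), key=q.__getitem__)
```

The learned Q-values converge toward each channel's success probability, so the agent ends up preferring the less-congested channel, which is the essence of RL-based channel access.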

Highlights of machine learning techniques for 5G are as follows:

Figure 6. Pictorial representation of machine learning (ML) in 5G.

  • In ML, a model is defined that fulfills the desired requirements and produces the desired results; at a later stage, its accuracy is examined against the obtained results.
  • ML plays a vital role in 5G network analysis for threat detection, network load prediction, and network formation, searching for a better balance between power, antenna length, coverage area, and network density, given the spontaneous use of services across many individual users and device types.

In [ 79 ], the authors first describe the demands on traditional authentication procedures and the benefits of intelligent authentication. The intelligent authentication method was established to improve security practice in 5G-and-beyond wireless communication systems. Thereafter, the machine learning paradigms for intelligent authentication were organized into parametric and non-parametric research methods, as well as supervised, unsupervised, and reinforcement learning approaches. As an outcome, machine learning techniques provide a new paradigm for authentication under diverse network conditions and unstable dynamics. In addition, they bring prompt intelligence to security management to achieve cost-effective, more reliable, model-free, continuous, and situation-aware authentication.

In [ 68 ], the authors proposed a machine learning-based model to predict the traffic load at a particular location. They used a mobile network traffic dataset to train a model that can calculate the total number of user requests at a time. Access and mobility management function (AMF) instances must be launched according to demand; without predictions of user requests, performance degrades because the AMF cannot handle all requests at once. Earlier, threshold-based techniques were used to predict the traffic load, but that approach took too much time; therefore, the authors proposed an RNN-based ML algorithm to predict the traffic load, which gives efficient results.
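The cited work uses an RNN; as a deliberately simplified stand-in, the sketch below shows only the time-series framing (sliding training windows plus a naive moving-average baseline) on invented request counts, not the authors' actual model or data.

```python
def windows(series, w):
    """Turn a load time series into (window, next_value) training pairs,
    the supervised framing an RNN (or any predictor) would train on."""
    return [(series[i:i + w], series[i + w]) for i in range(len(series) - w)]

def moving_average_forecast(series, w):
    """Naive baseline: predict the next load as the mean of the last w samples."""
    return sum(series[-w:]) / w

# Hypothetical per-interval AMF request counts (illustrative numbers only).
requests = [100, 120, 90, 110, 130, 125, 140]
pairs = windows(requests, w=3)
forecast = moving_average_forecast(requests, w=3)
```

A real deployment would feed the `(window, next_value)` pairs to a recurrent model and scale AMF instances from the forecast; the baseline here only fixes the data shape.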

In [ 15 ], the authors discussed the issues of network slice admission and resource allocation among subscribers, and how to maximize the profit of infrastructure providers. They proposed a network slice admission control algorithm based on an SMDP (semi-Markov decision process) that guarantees the best acceptance policies and satisfiability for subscribers (tenants). They also suggested N3AC, a novel neural network-based algorithm that optimizes performance under various configurations and significantly outperforms simple, practical approaches.

This section includes various works done on 5G ML by different authors. Table 10 shows state-of-the-art work on the improvement of various parameters such as energy efficiency, Quality of Service (QoS), and latency with 5G ML.

Table 10. State-of-the-art ML-based approaches in 5G technology.

4.6. Optimization Techniques for 5G

Optimization techniques may be applied to tackle NP-complete or NP-hard problems in 5G technology. This section briefly describes various research works on 5G technology based on optimization techniques.

In [ 80 ], massive MIMO technology is used in the 5G mobile network to make it more flexible and scalable. However, implementing MIMO in 5G requires a large number of radio-frequency (RF) chains in the RF circuit, which increases the cost and energy consumption of the 5G network. This paper provides a solution that improves cost and energy efficiency for a 5G wireless communication network with many RF chains: an optimized, energy-efficient technique for a 5G network based on MIMO antenna and mmWave technologies. The proposed Energy Efficient Hybrid Precoding (EEHP) algorithm increases the energy efficiency of the 5G wireless network while minimizing the cost of an RF circuit with a large number of RF chains.

In [ 16 ], the authors discussed the growing demand for energy efficiency (EE) in next-generation networks. Over the last decade, wireless transmission has shifted toward pursuing green communication for next-generation systems. The importance of adopting the correct EE metric is also reviewed. Further, the authors worked through the different approaches that can be applied in the future to increase the network's energy efficiency, and they summarized previous work on enhancing the energy productivity of the network using these capabilities. A system design for EE improvement using relay selection is also characterized, along with an overview of distinct algorithms applied for EE in relay-based ecosystems.

In [ 81 ], the authors presented how an AI-based approach is used to set up Self-Organizing Network (SON) functionalities for radio access network (RAN) design and optimization. They used a machine learning approach to predict the results of 5G SON functionalities. First, input was taken from various sources; then, prediction- and clustering-based machine learning models were applied to produce the results. Multiple AI-based tools were used to extract knowledge and execute SON functionalities smoothly. Based on the results, they tested how self-optimization, self-testing, and self-design are performed for SON. The authors also describe how the proposed mechanism is classified into different orders.

In [ 82 ], investigators examined the working of OFDM in various channel environments. They also examined changes to the frame duration in the 5G TDD frame design; larger subcarrier spacing is beneficial for obtaining a short frame length with low control overhead. They provided various techniques to reduce the growing guard period (GP) and cyclic prefix (CP), such as full utilization of multiple subcarrier spacings, handling the management and data parts of the frame at the receiver end, various uses of timing advance (TA), and full control of a flexible CP size.

This section includes various works done on 5G optimization by different authors. Table 11 shows how different authors worked on improving multiple parameters such as energy efficiency, power optimization, and latency with 5G optimization.

Table 11. Summary of optimization-based approaches in 5G technology.

5. Description of Novel 5G Features over 4G

This section presents descriptions of various novel features of 5G, namely, the concept of small cell, beamforming, and MEC.

5.1. Small Cell

Small cells are low-powered cellular radio access nodes that operate over a range of 10 meters to a few kilometers and play a very important role in the implementation of the 5G wireless network. They are low-power base stations that cover small areas and are quite similar to the cells used in previous wireless networks, but they can work at low power while supporting high data rates. Small cells help roll out the 5G network with ultra-high-speed, low-latency communication, and in 5G they use new technologies such as MIMO, beamforming, and mmWave for high-speed data transmission. The hardware design of small cells is very simple, so their implementation is easier and faster. Three types of small cell towers are available in the market: femtocells, picocells, and microcells [ 83 ], as shown in Table 12 .

Table 12. Types of small cells.

MmWave occupies a very high band of spectrum between 30 and 300 GHz. As it is a much less used spectrum, it provides very high-speed wireless communication, offering ultra-wide bandwidth for next-generation mobile networks. MmWave has many advantages, but it has some disadvantages too: mmWave signals are very high-frequency signals, so they collide more with obstacles in the air, causing the signals to lose energy quickly. Buildings and trees also block mmWave signals, so these signals cover a shorter distance. To resolve these issues, multiple small cell stations are installed to cover the gap between the end-user and the base station [ 18 ]. Small cells cover a very short range, so their installation depends on the population of a particular area; generally, in a populated place, the distance between small cells varies from 10 to 90 meters. In the survey [ 20 ], various authors implemented small cells together with massive MIMO. They also reviewed multiple technologies used in 5G, such as beamforming, small cells, massive MIMO, NOMA, and device-to-device (D2D) communication, and discussed problems such as interference management, spectral efficiency, resource management, energy efficiency, and backhauling. The authors also gave a detailed presentation of all the issues that occur while implementing small cells with various 5G technologies. As shown in Figure 7 , mmWave operates at a higher frequency, so it is easily blocked by obstacles, as shown in Figure 7 a. This is one of the key concerns of millimeter-wave signal transmission. To solve this issue, small cells can be placed at short distances to transmit the signals easily, as shown in Figure 7 b.

Figure 7. Pictorial representation of communication with and without small cells.
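The attenuation gap between mmWave and sub-6 GHz bands that motivates dense small-cell deployment can be quantified with the standard free-space path loss formula; the distance and frequency choices below are illustrative assumptions, not values from the survey.

```python
import math

def fspl_db(distance_m, freq_hz):
    """Free-space path loss in dB: 20*log10(4*pi*d*f/c)."""
    c = 3e8  # speed of light, m/s
    return 20 * math.log10(4 * math.pi * distance_m * freq_hz / c)

# Same 100 m link at an illustrative mmWave band vs. a sub-6 GHz band.
loss_28ghz_100m = fspl_db(100, 28e9)  # a typical mmWave small-cell distance
loss_2ghz_100m = fspl_db(100, 2e9)    # a sub-6 GHz macro band for comparison
extra_loss = loss_28ghz_100m - loss_2ghz_100m  # 20*log10(28/2) ~ 22.9 dB
```

The roughly 23 dB penalty at the same distance (before counting blockage and rain) is why mmWave coverage relies on closely spaced small cells rather than macro towers.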

5.2. Beamforming

Beamforming is a key technology in wireless networks that transmits signals directionally, forming a strong wireless connection toward the receiving end. In conventional systems, when small cells do not use beamforming, steering signals to particular areas is quite difficult. Beamforming counters this issue: using beamforming, small cells can transmit signals in a particular direction toward a device such as a mobile phone, laptop, autonomous vehicle, or IoT device. Beamforming improves efficiency and saves energy in the 5G network. It is broadly divided into three categories: digital, analog, and hybrid beamforming. Digital beamforming: multiuser MIMO is equivalent to digital beamforming, which is mainly used in LTE Advanced Pro and in 5G NR. In digital beamforming, the same frequency or time resources can be used to transmit data to multiple users simultaneously, which improves the cell capacity of wireless networks. Analog beamforming: in the mmWave frequency range of 5G NR, analog beamforming is a very important approach for improving coverage. With digital beamforming there is a risk of high path loss in mmWave because only one beam per antenna set is formed, whereas analog beamforming mitigates this high path loss. Hybrid beamforming: a combination of analog and digital beamforming; hybrid beamforming will be used in the mmWave implementation of the 5G network [ 84 ].
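The directionality described above comes from per-element phase shifts across an antenna array. The sketch below computes the normalized array factor of a uniform linear array; the element count, spacing, and angles are arbitrary illustrative choices, not parameters from the cited works.

```python
import cmath
import math

def array_factor(n, d_over_lambda, steer_deg, look_deg):
    """Normalized array-factor magnitude of an n-element uniform linear array
    whose per-element phase weights steer the beam toward steer_deg,
    evaluated in the direction look_deg."""
    k = 2 * math.pi * d_over_lambda
    psi_steer = k * math.sin(math.radians(steer_deg))
    psi_look = k * math.sin(math.radians(look_deg))
    # Sum the per-element phasors after applying the steering weights.
    total = sum(cmath.exp(1j * m * (psi_look - psi_steer)) for m in range(n))
    return abs(total) / n

# 8 elements at half-wavelength spacing, beam steered to 30 degrees.
gain_on_target = array_factor(8, 0.5, 30, 30)  # all elements add in phase
gain_off_axis = array_factor(8, 0.5, 30, -30)  # phasors cancel
```

On the steering angle the phasors add coherently (normalized gain 1); at -30 degrees they cancel almost completely, which is the "laser beam" behavior the text describes.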

Wireless signals in the 4G network spread over large areas and are omnidirectional rather than directional. Thus, energy depletes rapidly, and users accessing these signals also face interference problems. The beamforming technique is used in the 5G network to resolve this issue. In beamforming, signals are directional: they move like a laser beam from the base station to the user, so the signals appear to travel through an invisible cable. Beamforming helps achieve a faster data rate; as the signals are directional, it leads to less energy consumption and less interference. In [ 21 ], investigators developed techniques that reduce interference and increase the system efficiency of the 5G mobile network. In this survey article, the authors covered various challenges faced while designing an optimized beamforming algorithm, focusing mainly on design parameters such as performance evaluation and power consumption. In addition, they described various issues related to beamforming, such as CSI, computational complexity, and antenna correlation. They also covered research on how beamforming helps implement MIMO in next-generation mobile networks [ 85 ]. Figure 8 shows the pictorial representation of communication with and without using beamforming.

Figure 8. Pictorial representation of communication with and without using beamforming.

5.3. Mobile Edge Computing

Mobile Edge Computing (MEC) [ 24 ] is an extended version of cloud computing that brings cloud resources closer to the end-user. When we talk about computing, the first thing that comes to mind is cloud computing, a well-known technology that offers many services to end-users. Still, cloud computing has drawbacks: the services available in the cloud are far from end-users, which creates latency, and a cloud user needs to download the complete application before use, which adds to the device's burden [ 86 ]. MEC creates an edge between the end-user and the cloud server, bringing cloud computing closer to the end-user. All the services, namely video conferencing, virtual software, etc., are then offered by this edge, which improves cloud computing performance. Another essential feature of MEC is that the application is split into two parts: the first is available at the cloud server and the second on the user's device. Therefore, the user need not download the complete application onto the device, which increases the performance of the end-user's device. Furthermore, MEC provides cloud services with very low latency and less bandwidth. In [ 23 , 87 ], the authors' investigations proved that successful deployment of MEC in the 5G network increases the overall performance of the 5G architecture. A graphical comparison between cloud computing and mobile edge computing is presented in Figure 9 .

Figure 9. Pictorial representation of cloud computing vs. mobile edge computing.

6. 5G Security

Security is a key feature in the telecommunication network industry and is necessary at various layers to handle 5G network security in applications such as IoT, digital forensics, and intrusion detection systems (IDS) [ 88 , 89 ]. The authors of [ 90 ] discussed the background of 5G and its security concerns, challenges, and future directions. They also introduced blockchain technology, which can be combined with IoT to overcome IoT's challenges. The paper aims to create a security framework that can be incorporated with the LTE Advanced network and is effective in terms of cost, deployment, and QoS. In [ 91 ], the authors surveyed various forms of attacks, security challenges, and security solutions with respect to the affected technologies, such as SDN, Network Function Virtualization (NFV), mobile clouds, and MEC, as well as the 5G security standardization bodies, i.e., 3GPP, 5GPPP, the Internet Engineering Task Force (IETF), Next Generation Mobile Networks (NGMN), and the European Telecommunications Standards Institute (ETSI). In [ 92 ], the authors elaborated on various technological aspects, security issues, and their existing solutions, and also covered new emerging technological paradigms for 5G security such as blockchain, quantum cryptography, AI, SDN, CPS, MEC, and D2D. The authors aim to create new 5G security frameworks for the further use of this technology in the development of smart cities, transportation, and healthcare. In [ 93 ], the authors analyzed threats and dark threats, security aspects concerning SDN and NFV, Cisco's 5G vision, and new security innovations with respect to the evolving 5G architectures [ 94 ].

Authentication: the identification of the user in any network is made with the help of authentication. The different mobile network generations from 1G to 5G have used multiple techniques for user authentication. 5G uses the 5G Authentication and Key Agreement (AKA) method, in which a cryptographic key is shared between the user equipment (UE) and its home network and a mutual authentication process is established between the two [ 95 ].
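The shared-key idea behind 5G-AKA can be illustrated with a deliberately simplified challenge-response sketch. This is not the 3GPP protocol (which involves identity concealment, sequence numbers, and a multi-message key hierarchy); the HMAC construction and the names below are illustrative stand-ins for the RAND/RES exchange.

```python
import hashlib
import hmac
import os

# Simplified sketch inspired by 5G-AKA's shared long-term key; NOT the real
# 3GPP procedure. Key names and message flow are illustrative only.

SHARED_KEY = os.urandom(32)  # long-term key K, provisioned in UE and home network

def network_challenge():
    """Home network issues a fresh random challenge (stand-in for RAND)."""
    return os.urandom(16)

def ue_response(key, challenge):
    """UE proves knowledge of K by keyed-hashing the challenge (stand-in for RES)."""
    return hmac.new(key, challenge, hashlib.sha256).digest()

def network_verify(key, challenge, response):
    """Network recomputes the expected response and compares in constant time."""
    expected = hmac.new(key, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)

rand = network_challenge()
res = ue_response(SHARED_KEY, rand)
ok = network_verify(SHARED_KEY, rand, res)
bad = network_verify(SHARED_KEY, rand, b"\x00" * 32)
```

Because the challenge is fresh each time, a captured response cannot be replayed, which is the core property the real AKA exchange also provides.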

Access Control: to restrict accessibility in the network, 5G supports access control mechanisms, controlled by network providers, that give users a secure and safe environment. 5G uses simple public key infrastructure (PKI) certificates for authenticating access in the 5G network. PKI provides a secure and dynamic environment for the 5G network, and the simple PKI technique gives the network flexibility: it can scale up and down with user traffic [ 96 , 97 ].

Communication Security: 5G aims to provide high data bandwidth, low latency, and better signal coverage, so secure communication is a key concern in the 5G network. UE, mobile operators, the core network, and access networks are the main focal points for attackers on 5G communication. Common attacks on the various segments include botnets, message insertion, micro-cell attacks, distributed denial of service (DDoS), and transport layer security (TLS)/secure sockets layer (SSL) attacks [ 98 , 99 ].

Encryption: the confidentiality of the user and the network is maintained using encryption techniques. As 5G offers multiple services, end-to-end (E2E) encryption is the most suitable technique to apply across the various segments of the 5G network. Encryption forbids unauthorized access to the network and maintains the data privacy of the user. To encrypt radio traffic at the Packet Data Convergence Protocol (PDCP) layer, three 128-bit keys are applied at the user plane, nonaccess stratum (NAS), and access stratum (AS) [ 100 ].

7. Summary of 5G Technology Based on Above-Stated Challenges

In this section, various issues addressed by investigators in 5G technologies are presented in Table 13 . Different parameters are considered, such as throughput, latency, energy efficiency, data rate, spectral efficiency, fairness and computing capacity, transmission rate, coverage, cost, security requirements, performance, QoS, and power optimization, indexed from R1 to R13.

Table 13. Summary of 5G technology with respect to the above-stated challenges (R1: Throughput; R2: Latency; R3: Energy Efficiency; R4: Data Rate; R5: Spectral Efficiency; R6: Fairness and Computing Capacity; R7: Transmission Rate; R8: Coverage; R9: Cost; R10: Security Requirement; R11: Performance; R12: Quality of Service (QoS); R13: Power Optimization).

8. Conclusions

This survey article illustrates the emergence of 5G, its evolution from 1G to 5G, its applications, the different research groups and their work, and the key features of 5G. 5G is not just a mobile broadband network; unlike all previous mobile network generations, it offers services such as IoT, V2X, and Industry 4.0. This paper covers a detailed survey from multiple authors on different technologies in 5G, such as massive MIMO, Non-Orthogonal Multiple Access (NOMA), millimeter wave, small cells, Mobile Edge Computing (MEC), beamforming, optimization, and machine learning in 5G. Each section ends with a tabular comparison covering the state of the research in these technologies. This survey also shows the importance of these newly added technologies in building a flexible, scalable, and reliable 5G network.

9. Future Findings

This article covers a detailed survey of the 5G mobile network and its features. These features make 5G more reliable, scalable, efficient, and affordable. As discussed in the above sections, numerous technical challenges arise while implementing those features or providing services over a 5G mobile network. For future research directions, the research community can address the challenges of implementing these technologies (MIMO, NOMA, small cell, mmWave, beamforming, MEC) over a 5G network. 5G communication will bring new improvements over existing systems; still, current solutions will not be able to fulfill the requirements of autonomous systems and future intelligent engineering a decade from now. There is no doubt that 5G will provide better QoS and new features compared with 4G. But there is always room for improvement: with the considerable growth of centralized data and autonomous industry, 5G wireless networks will not be capable of fulfilling future demands. So, we need to move to a new wireless network technology, named 6G. The 6G wireless network will bring mobile generations to new heights, as it includes (i) massive human-to-machine communication, (ii) ubiquitous connectivity between local devices and cloud servers, (iii) data fusion technology for various mixed-reality experiences and multiverse maps, and (iv) a focus on sensing and actuation to control networks across the world. The 6G mobile network will offer new services together with other technologies, including 3D mapping, reality devices, smart homes, smart wearables, autonomous vehicles, artificial intelligence, and sensing. It is expected that 6G will provide ultra-long-range communication with a very low latency of 1 ms. The per-user bit rate in a 6G wireless network will be approximately 1 Tbps, and it will provide wireless communication that is 1000 times faster than 5G networks.

Acknowledgments

Author contributions.

Conceptualization: R.D., I.Y., G.C., P.L.; data gathering: R.D., G.C., P.L., I.Y.; funding acquisition: I.Y.; investigation: I.Y., G.C., G.P.; methodology: R.D., I.Y., G.C., P.L., G.P.; survey: I.Y., G.C., P.L., G.P., R.D.; supervision: G.C., I.Y., G.P.; validation: I.Y., G.P.; visualization: R.D., I.Y., G.C., P.L.; writing, original draft: R.D., I.Y., G.C., P.L., G.P.; writing, review, and editing: I.Y., G.C., G.P. All authors have read and agreed to the published version of the manuscript.

This paper was supported by Soonchunhyang University.

Institutional Review Board Statement

Informed consent statement, data availability statement, conflicts of interest.

The authors declare no conflict of interest.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Toward Scalable Docker-based Emulations of Blockchain Networks for Research and Development

22 Feb 2024 · Diego Pennino, Maurizio Pizzonia

Blockchain, like any other complex technology, needs a strong testing methodology to support its evolution in both research and development contexts. Setting up meaningful tests for permissionless blockchain technology is a notoriously complex task for several reasons: the software is complex, a large number of nodes are involved, the network is non-ideal, and so on. Developers usually adopt small virtual laboratories or costly real devnets, based on real software. Researchers usually prefer simulations of a large number of nodes, based on simplified models. In this paper, we aim to obtain the advantages of both approaches, i.e., performing large, realistic, inexpensive, and flexible experiments, using real blockchain software within a virtual environment. To do that, we tackle the challenge of running large blockchain networks on a single physical machine, leveraging Linux and Docker. We analyze a number of problems that arise when large blockchain networks are emulated and provide technical solutions for all of them. Finally, we describe two experiences of emulating fairly large blockchain networks on a single machine, adopting both research-oriented and production-oriented software, and involving more than 3000 containers.
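A first-order version of the capacity question the paper addresses (how many nodes fit on one machine) can be sketched as a simple resource budget. The function name and the per-node figures below are illustrative assumptions, not values from the paper:

```python
def max_emulated_nodes(host_mem_gib, host_cores,
                       mem_per_node_gib, cpu_per_node, headroom=0.8):
    """Rough upper bound on how many blockchain nodes one host can emulate.

    Keeps 20% headroom for the OS and Docker itself. Real limits also
    involve disk I/O, open file descriptors, and network-namespace
    overhead, which are the kinds of problems the paper analyses.
    """
    by_mem = int(host_mem_gib * headroom / mem_per_node_gib)
    by_cpu = int(host_cores * headroom / cpu_per_node)
    return min(by_mem, by_cpu)

# e.g. a 512 GiB, 64-core server with lightweight node processes
# (~150 MiB and ~2% of a core each; assumed figures)
print(max_emulated_nodes(512, 64, 0.15, 0.02))  # 2560, CPU-bound here
```

Estimates at this scale land in the same ballpark as the 3000-plus containers reported in the paper, though in practice kernel limits and scheduling overhead usually bite before these naive bounds do.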


Computer Science > Machine Learning

Title: Reasoning Algorithmically in Graph Neural Networks

Abstract: The development of artificial intelligence systems with advanced reasoning capabilities represents a persistent and long-standing research question. Traditionally, the primary strategy to address this challenge involved the adoption of symbolic approaches, where knowledge was explicitly represented by means of symbols and explicitly programmed rules. However, with the advent of machine learning, there has been a paradigm shift towards systems that can autonomously learn from data, requiring minimal human guidance. In light of this shift, in recent years there has been increasing interest in, and effort towards, endowing neural networks with the ability to reason, bridging the gap between data-driven learning and logical reasoning. Within this context, Neural Algorithmic Reasoning (NAR) stands out as a promising research field, aiming to integrate the structured and rule-based reasoning of algorithms with the adaptive learning capabilities of neural networks, typically by tasking neural models to mimic classical algorithms. In this dissertation, we provide theoretical and practical contributions to this area of research. We explore the connections between neural networks and tropical algebra, deriving powerful architectures that are aligned with algorithm execution. Furthermore, we discuss and show the ability of such neural reasoners to learn and manipulate complex algorithmic and combinatorial optimization concepts, such as the principle of strong duality. Finally, in our empirical efforts, we validate the real-world utility of NAR networks across different practical scenarios. This includes tasks as diverse as planning problems, large-scale edge classification, and the learning of polynomial-time approximate algorithms for NP-hard combinatorial problems. Through this exploration, we aim to showcase the potential of integrating algorithmic reasoning into machine learning models.
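The connection between algorithms and tropical algebra mentioned in the abstract can be made concrete with a classic example: in the (min, +) semiring, where "addition" is min and "multiplication" is +, one step of Bellman-Ford shortest-path relaxation is literally a matrix-vector product. The toy graph below is purely illustrative:

```python
import math

INF = math.inf

def min_plus_step(dist, weights):
    """One Bellman-Ford relaxation as a (min, +) matrix-vector product.

    Because relaxation is a linear map over the tropical semiring,
    architectures aligned with this algebra can mimic the algorithm
    step by step -- the kind of alignment NAR work aims to exploit.
    """
    n = len(dist)
    return [min(dist[j] + weights[j][i] for j in range(n)) for i in range(n)]

# weights[j][i] = cost of edge j -> i (0 on the diagonal, INF if absent)
W = [[0, 2, 10],
     [INF, 0, 3],
     [INF, INF, 0]]

d = [0, INF, INF]        # distances from node 0
d = min_plus_step(d, W)  # [0, 2, 10]
d = min_plus_step(d, W)  # [0, 2, 5]: the path 0 -> 1 -> 2 beats the direct edge
```

After n − 1 steps the vector is a fixed point of the map, which is the shortest-path solution.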



A Columbia Surgeon’s Study Was Pulled. He Kept Publishing Flawed Data.

The quiet withdrawal of a 2021 cancer study by Dr. Sam Yoon highlights scientific publishers’ lack of transparency around data problems.

By Benjamin Mueller

Benjamin Mueller covers medical science and has reported on several research scandals.

Feb. 15, 2024

The stomach cancer study was shot through with suspicious data. Identical constellations of cells were said to depict separate experiments on wholly different biological lineages. Photos of tumor-stricken mice, used to show that a drug reduced cancer growth, had been featured in two previous papers describing other treatments.

Problems with the study were severe enough that its publisher, after finding that the paper violated ethics guidelines, formally withdrew it within a few months of its publication in 2021. The study was then wiped from the internet, leaving behind a barren web page that said nothing about the reasons for its removal.

As it turned out, the flawed study was part of a pattern. Since 2008, two of its authors — Dr. Sam S. Yoon, chief of a cancer surgery division at Columbia University’s medical center, and a more junior cancer biologist — have collaborated with a rotating cast of researchers on a combined 26 articles that a British scientific sleuth has publicly flagged for containing suspect data. A medical journal retracted one of them this month after inquiries from The New York Times.


Memorial Sloan Kettering Cancer Center, where Dr. Yoon worked when much of the research was done, is now investigating the studies. Columbia’s medical center declined to comment on specific allegations, saying only that it reviews “any concerns about scientific integrity brought to our attention.”

Dr. Yoon, who has said his research could lead to better cancer treatments , did not answer repeated questions. Attempts to speak to the other researcher, Changhwan Yoon, an associate research scientist at Columbia, were also unsuccessful.

The allegations were aired in recent months in online comments on a science forum and in a blog post by Sholto David, an independent molecular biologist. He has ferreted out problems in a raft of high-profile cancer research , including dozens of papers at a Harvard cancer center that were subsequently referred for retractions or corrections.

From his flat in Wales , Dr. David pores over published images of cells, tumors and mice in his spare time and then reports slip-ups, trying to close the gap between people’s regard for academic research and the sometimes shoddier realities of the profession.

When evaluating scientific images, it is difficult to distinguish sloppy copy-and-paste errors from deliberate doctoring of data. Two other imaging experts who reviewed the allegations at the request of The Times said some of the discrepancies identified by Dr. David bore signs of manipulation, like flipped, rotated or seemingly digitally altered images.

Armed with A.I.-powered detection tools, scientists and bloggers have recently exposed a growing body of such questionable research, like the faulty papers at Harvard’s Dana-Farber Cancer Institute and studies by Stanford’s president that led to his resignation last year.

But those high-profile cases were merely the tip of the iceberg, experts said. A deeper pool of unreliable research has gone unaddressed for years, shielded in part by powerful scientific publishers driven to put out huge volumes of studies while avoiding the reputational damage of retracting them publicly.

The quiet removal of the 2021 stomach cancer study from Dr. Yoon’s lab, a copy of which was reviewed by The Times, illustrates how that system of scientific publishing has helped enable faulty research, experts said. In some cases, critical medical fields have remained seeded with erroneous studies.

“The journals do the bare minimum,” said Elisabeth Bik, a microbiologist and image expert who described Dr. Yoon’s papers as showing a worrisome pattern of copied or doctored data. “There’s no oversight.”

Memorial Sloan Kettering, where portions of the stomach cancer research were done, said no one — not the journal nor the researchers — had ever told administrators that the paper was withdrawn or why it had been. The study said it was supported in part by federal funding given to the cancer center.

Dr. Yoon, a stomach cancer specialist and a proponent of robotic surgery, kept climbing the academic ranks, bringing his junior researcher along with him. In September 2021, around the time the study was published, he joined Columbia, which celebrated his prolific research output in a news release . His work was financed in part by half a million dollars in federal research money that year, adding to a career haul of nearly $5 million in federal funds.

The decision by the stomach cancer study’s publisher, Elsevier, not to post an explanation for the paper’s removal made it less likely that the episode would draw public attention or affect the duo’s work. That very study continued to be cited in papers by other scientists .

And as recently as last year, Dr. Yoon’s lab published more studies containing identical images that were said to depict separate experiments, according to Dr. David’s analyses.

The researchers’ suspicious publications stretch back 16 years. Over time, relatively minor image copies in papers by Dr. Yoon gave way to more serious discrepancies in studies he collaborated on with Changhwan Yoon, Dr. David said. The pair, who are not related, began publishing articles together around 2013.

But neither their employers nor their publishers seemed to start investigating their work until this past fall, when Dr. David published his initial findings on For Better Science, a blog, and notified Memorial Sloan Kettering, Columbia and the journals. Memorial Sloan Kettering said it began its investigation then.

None of those flagged studies was retracted until last week. Three days after The Times asked publishers about the allegations, the journal Oncotarget retracted a 2016 study on combating certain pernicious cancers. In a retraction notice , the journal said the authors’ explanations for copied images “were deemed unacceptable.”

The belated action was symptomatic of what experts described as a broken system for policing scientific research.

A proliferation of medical journals, they said, has helped fuel demand for ever more research articles. But those same journals, many of them operated by multibillion-dollar publishing companies, often respond slowly or do nothing at all once one of those articles is shown to contain copied data. Journals retract papers at a fraction of the rate at which they publish ones with problems.

Springer Nature, which published nine of the articles that Dr. David said contained discrepancies across five journals, said it was investigating concerns. So did the American Association for Cancer Research, which published 10 articles under question from Dr. Yoon’s lab across four journals.

It is difficult to know who is responsible for errors in articles. Eleven of the scientists’ co-authors, including researchers at Harvard, Duke and Georgetown, did not answer emailed inquiries.

The articles under question examined why certain stomach and soft-tissue cancers withstood treatment, and how that resistance could be overcome.

The two independent image specialists said the volume of copied data, along with signs that some images had been rotated or similarly manipulated, suggested considerable sloppiness or worse.

“There are examples in this set that raise pretty serious red flags for the possibility of misconduct,” said Dr. Matthew Schrag, a Vanderbilt University neurologist who commented as part of his outside work on research integrity.

One set of 10 articles identified by Dr. David showed repeated reuse of identical or overlapping black-and-white images of cancer cells supposedly under different experimental conditions, he said.

“There’s no reason to have done that unless you weren’t doing the work,” Dr. David said.

One of those papers , published in 2012, was formally tagged with corrections. Unlike later studies, which were largely overseen by Dr. Yoon in New York, this paper was written by South Korea-based scientists, including Changhwan Yoon, who then worked in Seoul.

An immunologist in Norway randomly selected the paper as part of a screening of copied data in cancer journals. That led the paper’s publisher, the medical journal Oncogene, to add corrections in 2016.

But the journal did not catch all of the duplicated data , Dr. David said. And, he said, images from the study later turned up in identical form in another paper that remains uncorrected.

Copied cancer data kept recurring, Dr. David said. A picture of a small red tumor from a 2017 study reappeared in papers in 2020 and 2021 under different descriptions, he said. A ruler included in the pictures for scale wound up in two different positions.

The 2020 study included another tumor image that Dr. David said appeared to be a mirror image of one previously published by Dr. Yoon’s lab. And the 2021 study featured a color version of a tumor that had appeared in an earlier paper atop a different section of ruler, Dr. David said.

“This is another example where this looks intentionally done,” Dr. Bik said.

The researchers were faced with more serious action when the publisher Elsevier withdrew the stomach cancer study that had been published online in 2021. “The editors determined that the article violated journal publishing ethics guidelines,” Elsevier said.

Roland Herzog, the editor of Molecular Therapy, the journal where the article appeared, said that “image duplications were noticed” as part of a process of screening for discrepancies that the journal has since continued to beef up.

Because the problems were detected before the study was ever published in the print journal, Elsevier’s policy dictated that the article be taken down and no explanation posted online.

But that decision appeared to conflict with industry guidelines from the Committee on Publication Ethics . Posting articles online “usually constitutes publication,” those guidelines state. And when publishers pull such articles, the guidelines say, they should keep the work online for the sake of transparency and post “a clear notice of retraction.”

Dr. Herzog said he personally hoped that such an explanation could still be posted for the stomach cancer study. The journal editors and Elsevier, he said, are examining possible options.

The editors notified Dr. Yoon and Changhwan Yoon of the article’s removal, but neither scientist alerted Memorial Sloan Kettering, the hospital said. Columbia did not say whether it had been told.

Experts said the handling of the article was symptomatic of a tendency on the part of scientific publishers to obscure reports of lapses .

“This is typical, sweeping-things-under-the-rug kind of nonsense,” said Dr. Ivan Oransky, co-founder of Retraction Watch, which keeps a database of 47,000-plus retracted papers. “This is not good for the scientific record, to put it mildly.”

Susan C. Beachy contributed research.

Benjamin Mueller reports on health and medicine. He was previously a U.K. correspondent in London and a police reporter in New York.



Effect of exercise for depression: systematic review and network meta-analysis of randomised controlled trials

Linked editorial: Exercise for the treatment of depression

  • Michael Noetel , senior lecturer 1 ,
  • Taren Sanders , senior research fellow 2 ,
  • Daniel Gallardo-Gómez , doctoral student 3 ,
  • Paul Taylor , deputy head of school 4 ,
  • Borja del Pozo Cruz , associate professor 5 6 ,
  • Daniel van den Hoek , senior lecturer 7 ,
  • Jordan J Smith , senior lecturer 8 ,
  • John Mahoney , senior lecturer 9 ,
  • Jemima Spathis , senior lecturer 9 ,
  • Mark Moresi , lecturer 4 ,
  • Rebecca Pagano , senior lecturer 10 ,
  • Lisa Pagano , postdoctoral fellow 11 ,
  • Roberta Vasconcellos , doctoral student 2 ,
  • Hugh Arnott , masters student 2 ,
  • Benjamin Varley , doctoral student 12 ,
  • Philip Parker , pro vice chancellor research 13 ,
  • Stuart Biddle , professor 14 15 ,
  • Chris Lonsdale , deputy provost 13
  • 1 School of Psychology, University of Queensland, St Lucia, QLD 4072, Australia
  • 2 Institute for Positive Psychology and Education, Australian Catholic University, North Sydney, NSW, Australia
  • 3 Department of Physical Education and Sport, University of Seville, Seville, Spain
  • 4 School of Health and Behavioural Sciences, Australian Catholic University, Strathfield, NSW, Australia
  • 5 Department of Clinical Biomechanics and Sports Science, University of Southern Denmark, Odense, Denmark
  • 6 Biomedical Research and Innovation Institute of Cádiz (INiBICA) Research Unit, University of Cádiz, Spain
  • 7 School of Health and Behavioural Sciences, University of the Sunshine Coast, Petrie, QLD, Australia
  • 8 School of Education, University of Newcastle, Callaghan, NSW, Australia
  • 9 School of Health and Behavioural Sciences, Australian Catholic University, Banyo, QLD, Australia
  • 10 School of Education, Australian Catholic University, Strathfield, NSW, Australia
  • 11 Australian Institute of Health Innovation, Macquarie University, Macquarie Park, NSW, Australia
  • 12 Children’s Hospital Westmead Clinical School, University of Sydney, Westmead, NSW, Australia
  • 13 Australian Catholic University, North Sydney, NSW, Australia
  • 14 Centre for Health Research, University of Southern Queensland, Springfield, QLD, Australia
  • 15 Faculty of Sport and Health Science, University of Jyvaskyla, Jyvaskyla, Finland
  • Correspondence to: M Noetel m.noetel{at}uq.edu.au (or @mnoetel on Twitter)
  • Accepted 15 January 2024

Objective To identify the optimal dose and modality of exercise for treating major depressive disorder, compared with psychotherapy, antidepressants, and control conditions.

Design Systematic review and network meta-analysis.

Methods Screening, data extraction, coding, and risk of bias assessment were performed independently and in duplicate. Bayesian arm based, multilevel network meta-analyses were performed for the primary analyses. Quality of the evidence for each arm was graded using the confidence in network meta-analysis (CINeMA) online tool.

Data sources Cochrane Library, Medline, Embase, SPORTDiscus, and PsycINFO databases.

Eligibility criteria for selecting studies Any randomised trial with exercise arms for participants meeting clinical cut-offs for major depression.

Results 218 unique studies with a total of 495 arms and 14 170 participants were included. Compared with active controls (eg, usual care, placebo tablet), moderate reductions in depression were found for walking or jogging (n=1210, κ=51, Hedges’ g −0.62, 95% credible interval −0.80 to −0.45), yoga (n=1047, κ=33, g −0.55, −0.73 to −0.36), strength training (n=643, κ=22, g −0.49, −0.69 to −0.29), mixed aerobic exercises (n=1286, κ=51, g −0.43, −0.61 to −0.24), and tai chi or qigong (n=343, κ=12, g −0.42, −0.65 to −0.21). The effects of exercise were proportional to the intensity prescribed. Strength training and yoga appeared to be the most acceptable modalities. Results appeared robust to publication bias, but only one study met the Cochrane criteria for low risk of bias. As a result, confidence in accordance with CINeMA was low for walking or jogging and very low for other treatments.

Conclusions Exercise is an effective treatment for depression, with walking or jogging, yoga, and strength training more effective than other exercises, particularly when intense. Yoga and strength training were well tolerated compared with other treatments. Exercise appeared equally effective for people with and without comorbidities and with different baseline levels of depression. To mitigate expectancy effects, future studies could aim to blind participants and staff. These forms of exercise could be considered alongside psychotherapy and antidepressants as core treatments for depression.

Systematic review registration PROSPERO CRD42018118040.


Introduction

Major depressive disorder is a leading cause of disability worldwide 1 and has been found to lower life satisfaction more than debt, divorce, and diabetes 2 and to exacerbate comorbidities, including heart disease, 3 anxiety, 4 and cancer. 5 Although people with major depressive disorder often respond well to drug treatments and psychotherapy, 6 7 many are resistant to treatment. 8 In addition, access to treatment for many people with depression is limited, with only 51% treatment coverage for high income countries and 20% for low and lower-middle income countries. 9 More evidence based treatments are therefore needed.

Exercise may be an effective complement or alternative to drugs and psychotherapy. 10 11 12 13 14 In addition to mental health benefits, exercise also improves a range of physical and cognitive outcomes. 15 16 17 Clinical practice guidelines in the US, UK, and Australia recommend physical activity as part of treatment for depression. 18 19 20 21 But these guidelines do not provide clear, consistent recommendations about dose or exercise modality. British guidelines recommend group exercise programmes 20 21 and offer general recommendations to increase any form of physical activity, 21 the American Psychiatric Association recommends any dose of aerobic exercise or resistance training, 20 and Australian and New Zealand guidelines suggest a combination of strength and vigorous aerobic exercises, with at least two or three bouts weekly. 19

Authors of guidelines may find it hard to provide consistent recommendations on the basis of existing, mainly pairwise, meta-analyses, that is, those assessing a specific modality versus a specific comparator in a distinct group of participants. 12 13 22 These meta-analyses have come under scrutiny for pooling heterogeneous treatments and heterogeneous comparisons, leading to ambiguous effect estimates. 23 Reviews also face the opposite problem, excluding exercise treatments such as yoga, tai chi, and qigong because grouping them with strength training might be inappropriate. 23 Overviews of reviews have tried to deal with this problem by combining pairwise meta-analyses on individual treatments. A recent such overview found no differences between exercise modalities. 13 Comparing effect sizes between different pairwise meta-analyses can also lead to confusion because of differences in the analytical methods used between meta-analyses, such as the choice of control to use as the referent. Network meta-analyses are a better way to precisely quantify differences between interventions, as they simultaneously model the direct and indirect comparisons between interventions. 24

Network meta-analyses have been used to compare different types of psychotherapy and pharmacotherapy for depression. 6 25 26 For exercise, they have shown that dose and modality influence outcomes for cognition, 16 back pain, 15 and blood pressure. 17 Two network meta-analyses explored the effects of exercise on depression: one among older adults 27 and the other for mental health conditions. 28 Because of the inclusion criteria and search strategies used, these reviews might have been under-powered to explore moderators such as dose and modality (κ=15 and κ=71, respectively). To resolve conflicting findings in existing reviews, we comprehensively searched randomised trials on exercise for depression to ensure our review was adequately powered to identify the optimal dose and modality of exercise. For example, a large overview of reviews found effects on depression to be proportional to intensity, with vigorous exercise appearing to be better, 13 but a later meta-analysis found no such effects. 22 We explored whether recommendations differ based on participants’ sex, age, and baseline level of depression.

Given the challenges presented by behaviour change in people with depression, 29 we also identified autonomy support or behaviour change techniques that might improve the effects of intervention. 30 Behaviour change techniques such as self-monitoring and action planning have been shown to influence the effects of physical activity interventions in adults (>18 years) 31 and older adults (>60 years) 32 with differing effectiveness of techniques in different populations. We therefore tested whether any intervention components from the behaviour change technique taxonomy were associated with higher or lower intervention effects. 30 Other meta-analyses found that physical activity interventions work better when they provide people with autonomy (eg, choices, invitational language). 33 Autonomy is not well captured in the taxonomy for behaviour change technique. We therefore tested whether effects were stronger in studies that provided more autonomy support to patients. Finally, to understand the mechanism of intervention effects, such as self-confidence, affect, and physical fitness, we collated all studies that conducted formal mediation analyses.

Our findings are presented according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses-Network Meta-analyses (PRISMA-NMA) guidelines (see supplementary file, section S0; all supplementary files, data, and code are also available at https://osf.io/nzw6u/ ). 34 We amended our analysis strategy after registering our review; these changes were to better align with new norms established by the Cochrane Comparing Multiple Interventions Methods Group. 35 These norms were introduced between the publication of our protocol and the preparation of this manuscript. The largest change was using the confidence in network meta-analysis (CINeMA) 35 online tool instead of the Grading of Recommendations, Assessment, Development and Evaluation (GRADE) guidelines and adopting methods to facilitate assessments—for example, instead of using an omnibus test for all treatments, we assessed publication bias for each treatment compared with active controls. We also modelled acceptability (through dropout rate), which was not predefined but was adopted in response to a reviewer’s comment.

Eligibility criteria

To be eligible for inclusion, studies had to be randomised controlled trials that included exercise as a treatment for depression and included participants who met the criteria for major depressive disorder, either clinician diagnosed or identified through participant self-report as exceeding established clinical thresholds (eg, scored >13 on the Beck depression inventory-II). 36 Studies could meet these criteria when all the participants had depression or when the study reported depression outcomes for a subgroup of participants with depression at the start of the study.

We defined exercise as “planned, structured and repetitive bodily movement done to improve or maintain one or more components of physical fitness.” 37 Unlike recent reviews, 12 22 we included studies with more than one exercise arm and multifaceted interventions (eg, health and exercise counselling) as long as they contained a substantial exercise component. These trials could be included because network meta-analysis methods allow for the grouping of those interventions into homogeneous nodes. Unlike the most recent Cochrane review, 12 we also included participants with physical comorbidities such as arthritis and participants with postpartum depression because the Diagnostic and Statistical Manual of Mental Disorders, fifth edition, removed the postpartum onset specifier after that analysis was completed. 23 Studies were excluded if interventions were shorter than one week, if depression was not reported as an outcome, or if data were insufficient to calculate an effect size for each arm. Any comparison condition was included, allowing us to quantify the effects against established treatments (eg, selective serotonin reuptake inhibitors (SSRIs), cognitive behavioural therapy), active control conditions (usual care, placebo tablet, stretching, educational control, and social support), or waitlist control conditions. Published and unpublished studies were included, with no restrictions on language applied.

Information sources

We adapted the search strategy from the most recent Cochrane review, 12 adding keywords for yoga, tai chi, and qigong, as they met our definition for exercise. We conducted database searches, without filters or date limits, in The Cochrane Library via CENTRAL, SPORTDiscus via Embase, and Medline, Embase, and PsycINFO via Ovid. Searches of the databases were conducted on 17 December 2018 and 7 August 2020 and last updated on 3 June 2023 (see supplementary file section S1 for full search strategies). We assessed full texts of all included studies from two systematic reviews of exercise for depression. 12 22

Study selection and data collection

To select studies, we removed duplicate records in Covidence 38 and then screened each title and abstract independently and in duplicate. Conflicts were resolved through discussion or consultation with a third reviewer. The same methods were used for full text screening.

We used the Extraction 1.0 randomised controlled trial data extraction forms in Covidence. 38 Data were extracted independently and in duplicate, with conflicts resolved through discussion with a third reviewer.

For each study, we extracted a description of the interventions, including the frequency, intensity, time, and type of each exercise intervention. Using the Compendium of Physical Activities, 39 we calculated the energy expenditure dose of exercise for each arm as metabolic equivalents of task (METs) min/week. Two authors evaluated each exercise intervention using the Behaviour Change Taxonomy version 1 30 for behaviour change techniques explicitly described in each exercise arm. They also rated the level of autonomy offered to participants, on a scale from 1 (no choice) to 10 (full autonomy). We also extracted descriptions of the other arms within the randomised trials, including other treatment or control conditions; participants’ age, sex, comorbidities, and baseline severity of depressive symptoms; and each trial’s location and whether or not the trial was funded.
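The dose calculation described above reduces to a product of intensity and weekly volume. The sketch below shows the arithmetic; the 7.0 MET value (general jogging in the Compendium) and the session pattern are illustrative figures, not data from the review:

```python
def met_min_per_week(met_value, minutes_per_session, sessions_per_week):
    """Energy-expenditure dose of an exercise arm in MET-min/week.

    met_value is looked up per activity in the Compendium of
    Physical Activities; the example below uses an assumed entry.
    """
    return met_value * minutes_per_session * sessions_per_week

# e.g. jogging (~7.0 METs) for 30 minutes, three times a week
dose = met_min_per_week(7.0, 30, 3)  # 630.0 MET-min/week
```

Expressing every arm in MET-min/week is what lets interventions of different frequency, duration, and intensity be compared on a single dose axis.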

Risk of bias in individual studies

We used Cochrane’s risk of bias tool for randomised controlled trials. 40 Risk of bias was rated independently and in duplicate, with conflicts resolved through discussion with a third reviewer.

Summary measures and synthesis

For main and moderation analyses, we used bayesian arm based multilevel network meta-analysis models. 41 All network meta-analytical approaches allow users to assess the effects of treatments against a range of comparisons. The bayesian arm based models allowed us to also assess the influence of hypothesised moderators, such as intensity, dose, age, and sex. Many network meta-analyses use contrast based methods, comparing post-test scores between study arms. 41 Arm based meta-analyses instead describe the population-averaged absolute effect size for each treatment arm (ie, each arm’s change score). 41 As a result, the summary measure we used was the standardised mean change from baseline, calculated as standardised mean differences with correction for small studies (Hedges’ g). In keeping with the norms from the included studies, effect sizes describe treatment effects on depression, such that larger negative numbers represent stronger effects on symptoms. Using National Institute for Health and Care Excellence guidelines, 42 we standardised change scores for different depression scales (eg, Beck depression inventory, Hamilton depression rating scale) using an internal reference standard for each scale (for each scale, the average of pooled standard deviations at baseline) reported in our meta-analysis. Because depression scores generally show regression to the mean, even in control conditions, we present effect sizes as improvements beyond active control conditions. This convention makes our results comparable to existing, contrast based meta-analyses.
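The summary measure can be illustrated with a small sketch (this uses one common form of the small sample correction; the example numbers are invented, and sd_reference stands in for the scale’s internal reference standard):

```python
def hedges_g_change(mean_baseline: float, mean_post: float,
                    sd_reference: float, n: int) -> float:
    """Standardised mean change from baseline: the change score divided
    by the scale's internal reference standard deviation, multiplied by
    a common small sample correction factor. Negative values indicate a
    reduction in depressive symptoms."""
    d = (mean_post - mean_baseline) / sd_reference
    j = 1 - 3 / (4 * (n - 1) - 1)  # small sample correction
    return j * d

# Hypothetical arm: scores drop from 30 to 20 on a scale with a
# reference standard deviation of 10, with 50 participants.
print(round(hedges_g_change(30, 20, 10, 50), 3))  # -0.985
```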

Active control conditions (usual care, placebo tablet, stretching, educational control, and social support) were grouped to increase power for moderation analyses, for parsimony in the network graph, and because they all showed similar arm based pooled effect sizes (Hedges’ g between −0.93 and −1.00 for all, with no statistically significant differences). We separated waitlist control from these active control conditions because it typically shows poorer effects in treatment for depression. 43

Bayesian meta-analyses were conducted in R 44 using the brms package. 45 We preregistered informative priors based on the distributional parameters of our meta-analytical model. 46 We nested effects within arms to manage dependency between multiple effect sizes from the same participants. 46 For example, if one study reported two self-reported measures of depression, or reported both self-report and clinician rated depression, we nested these effect sizes within the arm to account for both pieces of information while controlling for dependency between effects. 46 Finally, we compared absolute effect sizes against a standardised minimum clinically important difference, 0.5 standard deviations of the change score. 47 From our data, this corresponded to a large change in before and after scores (Hedges’ g −1.16), a moderate change compared with waitlist control (g −0.55), or a small benefit when compared with active controls (g −0.20). For credibility assessments comparing exercise modalities, we used the netmeta package 48 and CINeMA. 49 We also used netmeta to model acceptability, comparing the odds ratio for drop-out rate in each arm.
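The standardised minimum clinically important difference described above, half a standard deviation of the change score, can be sketched as follows (illustrative only; the change scores in the example are invented):

```python
from statistics import stdev

def mcid_from_change_scores(change_scores) -> float:
    """Standardised minimum clinically important difference: half a
    standard deviation of the observed change scores."""
    return 0.5 * stdev(change_scores)

# Hypothetical change scores pooled from a set of arms.
print(mcid_from_change_scores([0, 2, 4]))  # 1.0
```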

Additional analyses

All prespecified moderation and sensitivity analyses were performed. We moderated for participant characteristics, including participants’ sex, age, baseline symptom severity, and presence or absence of comorbidities; duration of the intervention (weeks); weekly dose of the intervention; duration between completion of treatment and measurement, to test robustness to remission (in response to a reviewer’s suggestion); amount of autonomy provided in the exercise prescription; and presence of each behaviour change technique. As preregistered, we moderated for behaviour change techniques in three ways: through meta-regression, including all behaviour change techniques simultaneously for primary analysis; including one behaviour change technique at a time (using 99% credible intervals to somewhat control for multiple comparisons) in exploratory analyses; and through meta-analytical classification and regression trees (metaCART), which allowed for interactions between moderating variables (eg, if goal setting combined with feedback had synergistic effects). 50 We conducted sensitivity analyses for risk of bias, assessing whether studies with low versus unclear or high risk of bias on each domain showed statistically significant differences in effect sizes.

Credibility assessment

To assess the credibility of each comparison against active control, we used CINeMA. 35 49 This online tool was designed by the Cochrane Comparing Multiple Interventions Methods Group as an adaptation of GRADE for network meta-analyses. 35 In line with recommended guidelines, for each comparison we made judgements for within study bias, reporting bias, indirectness, imprecision, heterogeneity, and incoherence. Similar to GRADE, we initially considered the evidence for comparisons to show high confidence and then downgraded on the basis of concerns in each domain, as follows:

Within study bias —Comparisons were downgraded when most of the studies providing direct evidence for comparisons were at unclear or high risk of bias.

Reporting bias —Publication bias was assessed in three ways. For each comparison with at least 10 studies 51 we created funnel plots, including estimates of effect sizes after removing studies with statistically significant findings (ie, worst case estimates) 52 ; calculated an s value, representing how strong publication bias would need to be to nullify meta-analytical effects 52 ; and conducted a multilevel Egger’s regression test, indicative of small study bias. Given these tests are not recommended for comparisons with fewer than 10 studies, 51 those comparisons were considered to show “some concerns.”
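The small study bias check can be illustrated with a simple, non-multilevel version of Egger’s regression: regress each study’s standardised effect on its precision and inspect the intercept. This is a sketch of the idea only, not the multilevel model used in the analysis, and the data are fabricated:

```python
def egger_intercept(effects, standard_errors):
    """Ordinary least squares intercept from regressing standardised
    effects (g / SE) on precision (1 / SE). An intercept far from zero
    suggests funnel plot asymmetry (small study bias)."""
    y = [g / se for g, se in zip(effects, standard_errors)]
    x = [1.0 / se for se in standard_errors]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    return mean_y - slope * mean_x

# Fabricated example: effects grow with the standard error (the classic
# small study pattern), so the intercept is far from zero.
ses = [0.1, 0.2, 0.4]
print(round(egger_intercept([-0.3 - 2.0 * s for s in ses], ses), 2))  # -2.0
```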

Indirectness — Our primary population of interest was adults with major depression. Studies were considered to be indirect if they focused on one sex only (>90% male or female), participants with comorbidities (eg, heart disease), adolescents and young adults (14-20 years), or older adults (>60 years). We flagged these studies as showing some concerns if one of these factors was present, and as “major concerns” if two of these factors were present. Evidence from comparisons was classified as some concerns or major concerns using majority rating for studies directly informing the comparison.

Imprecision — As per CINeMA, we used the clinically important difference of Hedges’ g=0.2 to ascribe a zone of equivalence, where differences were not considered clinically significant (−0.2<g<0.2). Studies were flagged as some concerns for imprecision if the bounds of the 95% credible interval extended across that zone, and they were flagged as major concerns if the bounds extended to the other side of the zone of equivalence (such that effects could be harmful).
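For an effect estimate that favours treatment (g < 0), the decision rule described above can be sketched as follows (a simplified illustration of the logic, not the CINeMA implementation):

```python
ZONE_LOW, ZONE_HIGH = -0.2, 0.2  # zone of equivalence for Hedges' g

def imprecision_rating(ci_lower: float, ci_upper: float) -> str:
    """Rate imprecision for a beneficial effect (more negative g is
    better) from the bounds of its 95% credible interval."""
    if ci_upper <= ZONE_LOW:
        return "no concerns"    # interval stays clear of the zone
    if ci_upper <= ZONE_HIGH:
        return "some concerns"  # interval extends into the zone
    return "major concerns"     # interval crosses to possible harm

print(imprecision_rating(-0.80, -0.46))  # no concerns
```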

Heterogeneity — Prediction intervals account for heterogeneity differently from credible intervals. 35 As a result, CINeMA accounts for heterogeneity by assessing whether the prediction intervals and the credible intervals lead to different conclusions about clinical significance (using the same zone of equivalence as for imprecision). Comparisons are flagged as some concerns if the prediction interval crosses into, or out of, the zone of equivalence once (eg, from helpful to no meaningful effect), and as major concerns if the prediction interval crosses the zone twice (eg, from helpful to harmful).

Incoherence — Incoherence assesses whether the network meta-analysis provides similar estimates when using direct evidence (eg, randomised controlled trials on strength training versus SSRI) compared with indirect evidence (eg, randomised controlled trials where either strength training or SSRI uses waitlist control). Incoherence provides some evidence that the network may violate the assumption of transitivity: that the only systematic difference between arms is the treatment, not other confounders. We assessed incoherence using two methods: first, a global design-by-treatment interaction test to assess incoherence across the whole network, 35 49 and second, separating indirect from direct evidence (SIDE method) for each comparison through net-splitting to see whether differences between those effect estimates were statistically significant. We flagged comparisons as some concerns if either no direct comparisons were available or direct and indirect evidence gave different conclusions about clinical significance (eg, from helpful to no meaningful effect, as per imprecision and heterogeneity). Again, we classified comparisons as major concerns if the direct and indirect evidence changed the sign of the effect or changed both limits of the credible interval. 35 49

Patient and public involvement

We discussed the aims and design of this study with members of the public, including those who had experienced depression. Several of our authors have experienced major depressive episodes, but beyond that we did not include patients in the conduct of this review.

Study selection

The PRISMA flow diagram outlines the study selection process ( fig 1 ). We used two previous reviews to identify potentially eligible studies for inclusion. 12 22 Database searches identified 18 658 possible studies. After 5505 duplicates had been removed, two reviewers independently screened 13 115 titles and abstracts. After screening, two reviewers independently reviewed 1738 full text articles. Supplementary file section S2 shows the consensus reasons for exclusion. A total of 218 unique studies described in 246 reports were included, totalling 495 arms and 14 170 participants. Supplementary file section S3 lists the references and characteristics of the included studies.

Fig 1

Flow of studies through review

Network geometry

As preregistered, we removed nodes with fewer than 100 participants. Using this filter, most interventions contained comparisons with at least four other nodes in the network geometry ( fig 2 ). The results of the global design-by-treatment interaction test were not statistically significant, supporting the assumption of transitivity (χ²=94.92, df=75, P=0.06). When net-splitting was used on all possible combinations in the network, for two out of the 120 comparisons we found statistically significant incoherence between direct and indirect evidence (SSRI v waitlist control; cognitive behavioural therapy v tai chi or qigong). Overall, we found little statistical evidence that the model violated the assumption of transitivity. Qualitative differences were, however, found for participant characteristics between different arms (see supplementary file, section S4). For example, some interventions appeared to be prescribed more frequently among people with severe depression (eg, 7/16 studies using SSRIs) compared with other interventions (eg, 1/15 studies using aerobic exercise combined with therapy). Similarly, some interventions appeared more likely to be prescribed for older adults (eg, mean age, tai chi=59 v dance=31) or women (eg, per cent female: dance=88% v cycling=53%). Given that plausible mechanisms exist for these systematic differences (eg, the popularity of tai chi among older adults), 53 there are reasons to believe that allocation to treatment arms would be less than perfectly random. We have factored these biases into our certainty estimates through indirectness ratings.

Fig 2

Network geometry indicating number of participants in each arm (size of points) and number of comparisons between arms (thickness of lines). SSRI=selective serotonin reuptake inhibitor

Risk of bias within studies

Supplementary file section S5 provides the risk of bias ratings for each study. Few studies explicitly blinded participants and staff ( fig 3 ). As a result, overall risk of bias for most studies was unclear or high, and effect sizes could include expectancy effects, among other biases. However, sensitivity analyses suggested that effect sizes were not influenced by any risk of bias criteria, owing to wide credible intervals (see supplementary file, section S6). Nevertheless, certainty ratings for all treatment arms were downgraded owing to high risk of bias in the studies informing the comparison.

Fig 3

Risk of bias summary plot showing percentage of included studies judged to be low, unclear, or high risk across Cochrane criteria for randomised trials

Synthesis of results

Supplementary file section S7 presents a forest plot of Hedges’ g values for each study. Figure 4 shows the predicted effects of each treatment compared with active controls. Compared with active controls, large reductions in depression were found for dance (n=107, κ=5, Hedges’ g=−0.96, 95% credible interval −1.36 to −0.56) and moderate reductions for walking or jogging (n=1210, κ=51, g=−0.63, −0.80 to −0.46), yoga (n=1047, κ=33, g=−0.55, −0.73 to −0.36), strength training (n=643, κ=22, g=−0.49, −0.69 to −0.29), mixed aerobic exercises (n=1286, κ=51, g=−0.43, −0.61 to −0.25), and tai chi or qigong (n=343, κ=12, g=−0.42, −0.65 to −0.21). Moderate, clinically meaningful effects were also present when exercise was combined with SSRIs (n=268, κ=11, g=−0.55, −0.86 to −0.23) or aerobic exercise was combined with psychotherapy (n=404, κ=15, g=−0.54, −0.76 to −0.32). All these treatments were significantly stronger than the standardised minimum clinically important difference compared with active control (g=−0.20), equating to an absolute g value of −1.16. Dance, exercise combined with SSRIs, and walking or jogging were the treatments most likely to perform best when modelling the surface under the cumulative ranking curve ( fig 4 ). For acceptability, the odds of participants dropping out of the study were lower for strength training (n=247, direct evidence κ=6, odds ratio 0.55, 95% credible interval 0.31 to 0.99) and yoga (n=264, κ=5, 0.57, 0.35 to 0.94) than for active control. The rate of dropouts was not significantly different from active control in any other arms (see supplementary file, section S8).

Fig 4

Predicted effects of different exercise modalities on major depression compared with active controls (eg, usual care), with 95% credible intervals. The estimate of effects for the active control condition was a before and after change of Hedges’ g of −0.95 (95% credible interval −1.10 to −0.79), n=3554, κ=113. Colour represents SUCRA from most likely to be helpful (dark purple) to least likely to be helpful (light purple). SSRI=selective serotonin reuptake inhibitor; SUCRA=surface under the cumulative ranking curve

Consistent with other meta-analyses, effects were moderate for cognitive behaviour therapy alone (n=712, κ=20, g=−0.55, −0.75 to −0.37) and small for SSRIs (n=432, κ=16, g=−0.26, −0.50 to −0.01) compared with active controls ( fig 4 ). These estimates are comparable to those of reviews that focused directly on psychotherapy (g=−0.67, −0.79 to −0.56) 7 or pharmacotherapy (g=−0.30, −0.34 to −0.26). 25 However, our review was not designed to find all studies of these treatments, so these estimates should not supersede those from the directly focused systematic reviews.

Despite the large number of studies in the network, confidence in the effects was low ( fig 5 ). This was largely due to the high within study bias described in the risk of bias summary plot. Reporting bias was also difficult to robustly assess because direct comparison with active control was often only provided in fewer than 10 studies. Many studies focused on one sex only, older adults, or those with comorbidities, so most arms had some concerns about indirect comparisons. Credible intervals were seldom wide enough to change decision making, so concerns about imprecision were few. Heterogeneity did plausibly change some conclusions around clinical significance. Few studies showed problematic incoherence, meaning direct and indirect evidence usually agreed. Overall, walking or jogging had low confidence, with other modalities being very low.

Fig 5

Summary table for credibility assessment using confidence in network meta-analysis (CINeMA). SSRI=selective serotonin reuptake inhibitor

Moderation by participant characteristics

The optimal modality appeared to be moderated by age and sex. Compared with models that only included exercise modality (R²=0.65), R² was higher for models that included interactions with sex (R²=0.71) and age (R²=0.69). R² showed no substantial increase for models including baseline depression (R²=0.67) or comorbidities (R²=0.66; see supplementary file, section S9).

Effects appeared larger for women than men for strength training and cycling ( fig 6 ). Effects appeared to be larger for men than women when prescribing yoga, tai chi, and aerobic exercise alongside psychotherapy. Yoga and aerobic exercise alongside psychotherapy appeared more effective for older participants than younger people ( fig 7 ). Strength training appeared more effective when prescribed to younger participants than older participants. Some estimates were associated with substantial uncertainty because some modalities were not well studied in some groups (eg, tai chi for younger adults), and mean age of the sample was only available for 71% of the studies.

Fig 6

Effects of interventions versus active control on depression (lower is better) by sex. Shading represents 95% credible intervals

Fig 7

Effects of interventions versus active control on depression (lower is better) by age. Shading represents 95% credible intervals

Moderation by intervention and design characteristics

Across modalities, a clear dose-response curve was observed for intensity of exercise prescribed ( fig 8 ). Although light physical activity (eg, walking, hatha yoga) still provided clinically meaningful effects (g=−0.58, −0.82 to −0.33), expected effects were stronger for vigorous exercise (eg, running, interval training; g=−0.74, −1.10 to −0.38). This finding did not appear to be due to increased weekly energy expenditure: credible intervals were wide, which meant that the dose-response curve for METs min/week prescribed was unclear (see supplementary file, section S10). Weak evidence suggested that shorter interventions (eg, 10 weeks: g=−0.53, −0.71 to −0.35) worked somewhat better than longer ones (eg, 30 weeks: g=−0.37, −0.79 to 0.03), with wide credible intervals again indicating high uncertainty (see supplementary file, section S11). We also moderated for the lag between the end of treatment and the measurement of the outcome. We found no indication that participants were likely to relapse within the measurement period (see supplementary file, section S12); effects remained steady when measured either directly after the intervention (g=−0.59, −0.80 to −0.39) or up to six months later (g=−0.63, −0.87 to −0.40).

Fig 8

Dose-response curve for intensity (METs) across exercise modalities compared with active control. METs=metabolic equivalents of task

Supplementary file section S13 provides coding for the behaviour change techniques and autonomy for each exercise arm. None of the behaviour change techniques significantly moderated overall effects. Contrary to expectations, studies describing a level of participant autonomy (ie, choice over frequency, intensity, type, or time) tended to show weaker effects (g=−0.28, −0.78 to 0.23) than those that did not (g=−0.75, −1.17 to −0.33; see supplementary file, section S14). This effect was consistent whether or not we included studies that used physical activity counselling (usually high autonomy).

Use of group exercise appeared to moderate the effects: although the overall effects were similar for individual (g=−1.10, −1.57 to −0.64) and group exercise (g=−1.16, −1.61 to −0.73), some interventions were better delivered in groups (yoga) and some were better delivered individually (strength training, mixed aerobic exercise; see supplementary file, section S15).

As preregistered, we tested whether study funding moderated effects. Models that included whether a study was funded explained more variance (R²=0.70) than models that included treatment alone (R²=0.65). Funded studies showed stronger effects (g=−1.01, −1.19 to −0.82) than unfunded studies (g=−0.77, −1.09 to −0.46). We also moderated for the type of measure (self-report v clinician report). This did not explain a substantial amount of variance in the outcome (R²=0.66).

Sensitivity analyses

Evidence of publication bias was found for overall estimates of exercise on depression compared with active controls, although not enough to nullify effects. The multilevel Egger’s test showed significance (F(1,98)=23.93, P<0.001). Funnel plots showed asymmetry, but the result of pooled effects remained statistically significant when only including non-significant studies (see supplementary file, section S16). No amount of publication bias would be sufficient to shrink effects to zero (s value=not possible). To reduce effects below clinical significance thresholds, studies with statistically significant results would need to be reported 58 times more frequently than studies with non-significant results.

Qualitative synthesis of mediation effects

Only a few of the studies used explicit mediation analyses to test hypothesised mechanisms of action. 54 55 56 57 58 59 One study found that both aerobic exercise and yoga led to decreased depression because participants ruminated less. 54 The study found that the effects of aerobic exercise (but not yoga) were mediated by increased acceptance. 54 “Perceived hassles” and awareness were not statistically significant mediators. 54 Another study found that the effects of yoga were mediated by increased self-compassion, but not rumination, self-criticism, tolerance of uncertainty, body awareness, body trust, mindfulness, and attentional biases. 55 One study found that the effects from an aerobic exercise intervention were not mediated by long term physical activity, but instead were mediated by exercise specific affect regulation (eg, self-control for exercise). 57 Another study found that neither exercise self-efficacy nor depression coping self-efficacy mediated effects of aerobic exercise. 56 Effects of aerobic exercise were not mediated by the N2 amplitude from electroencephalography, hypothesised as a neuro-correlate of cognitive control deficits. 58 Increased physical activity did not appear to mediate the effects of physical activity counselling on depression. 59 It is difficult to draw strong conclusions about mechanisms on the basis of this small number of low powered studies.

Summary of evidence

In this systematic review and meta-analysis of randomised controlled trials, exercise showed moderate effects on depression compared with active controls, either alone or in combination with other established treatments such as cognitive behaviour therapy. In isolation, the most effective exercise modalities were walking or jogging, yoga, strength training, and dancing. Although walking or jogging were effective for both men and women, strength training was more effective for women, and yoga or qigong was more effective for men. Yoga was somewhat more effective among older adults, and strength training was more effective among younger people. The benefits from exercise tended to be proportional to the intensity prescribed, with vigorous activity being better. Benefits were similar for different weekly doses, for people with different comorbidities, and for different baseline levels of depression. Although confidence in many of the results was low, treatment guidelines may be overly conservative by conditionally recommending exercise as a complementary or alternative treatment for patients in whom psychotherapy or pharmacotherapy is either ineffective or unacceptable. 60 Instead, guidelines for depression ought to include prescriptions for exercise and consider adapting the modality to participants’ characteristics and recommending more vigorous intensity exercises.

Our review did not uncover clear causal mechanisms, but the trends in the data are useful for generating hypotheses. It is unlikely that any single causal mechanism explains all the findings in the review. Instead, we hypothesise that a combination of social interaction, 61 mindfulness or experiential acceptance, 62 increased self-efficacy, 33 immersion in green spaces, 63 neurobiological mechanisms, 64 and acute positive affect 65 combine to generate outcomes. Meta-analyses have found each of these factors to be associated with decreases in depressive symptoms, but no single treatment covers all mechanisms. Some may more directly promote mindfulness (eg, yoga), be more social (eg, group exercise), be conducted in green spaces (eg, walking), provide a more positive affect (eg, “runner’s high”), or be more conducive to acute adaptations that may increase self-efficacy (eg, strength). 66 Exercise modalities such as running may satisfy many of the mechanisms, but they are unlikely to directly promote the mindful self-awareness provided by yoga and qigong. Both these forms of exercise are often practised in groups with explicit mindfulness but seldom have fast and objective feedback loops that improve self-efficacy. Adequately powered studies testing multiple mediators may help to focus more on understanding why exercise helps depression and less on whether exercise helps. We argue that understanding these mechanisms of action is important for personalising prescriptions and better understanding effective treatments.

Our review included more studies than many existing reviews on exercise for depression. 13 22 27 28 As a result, we were able to combine the strengths of various approaches to exercise and to make more nuanced and precise conclusions. For example, even taking conservative estimates (ie, the least favourable end of the credible interval), practitioners can expect patients to experience clinically significant effects from walking, running, yoga, qigong, strength training, and mixed aerobic exercise. Because we simultaneously assessed more than 200 studies, credible intervals were narrower than those in most existing meta-analyses. 13 We were also able to explore non-linear relationships between outcomes and moderators, such as frequency, intensity, and time. These analyses supported some existing findings—for example, our study and the study by Heissel et al 22 found that shorter interventions had stronger effects, at least for six months; our study and the study by Singh et al 13 both found that effects were stronger with vigorous intensity exercise compared with light and moderate exercise. However, most existing reviews found various treatment modalities to be equally effective. 13 27 In our review, some types of exercise had stronger effect sizes than others. We attribute this to the study level data available in a network meta-analysis compared with an overview of reviews 24 and higher power compared with meta-analyses with smaller numbers of included studies. 22 28 Overviews of reviews have the ability to more easily cover a wider range of participants, interventions, and outcomes, but also risk double counting randomised trials that are included in separate meta-analyses. They often include heterogeneous studies without having as much control over moderation analyses (eg, Singh et al included studies covering both prevention and treatment 13 ). 
Some of those reviews grouped interventions such as yoga with heterogeneous interventions such as stretching and qigong. 13 This practice of combining different interventions makes it harder to interpret meta-analytical estimates. We used methods that enabled us to separately analyse the effects of these treatment modalities. In so doing, we found that these interventions do have different effects, with yoga being an intervention with strong effects and stretching being better described as an active control condition. Network meta-analyses revealed the same phenomenon with psychotherapy: researchers once concluded there was a dodo bird verdict, whereby “everybody has won, and all must have prizes,” 67 until network meta-analyses showed some interventions were robustly more effective than others. 6 26

Predictors of acceptability and outcomes

We found evidence to suggest good acceptability of yoga and strength training, although the measurement of study drop-out is an imperfect proxy of adherence. Participants may complete the study without doing any exercise or may continue exercising and drop out of the study for other reasons. Nevertheless, these are useful data when considering adherence.

Behaviour change techniques, which are designed to increase adherence, did not meaningfully moderate the effect sizes from exercise. This may be due to several factors. It may be that the modality explains most of the variance between effects, such that behaviour change techniques (eg, presence or absence of feedback) did not provide a meaningful contribution. Many forms of exercise potentially contain therapeutic benefits beyond just energy expenditure. These characteristics of a modality may be more influential than coexisting behaviour change techniques. Alternatively, researchers may have used behaviour change techniques such as feedback or goal setting without explicitly reporting them in the study methods. Given the inherent challenges of behaviour change among people with depression, 29 and the difficulty in forecasting which strategies are likely to be effective, 68 we see the identification of effective techniques as important.

We did find that autonomy, as provided in the methods of included studies, predicted effects, but in the opposite direction to our hypotheses: more autonomy was associated with weaker effects. Physical activity counselling, which usually provides a great deal of patient autonomy, showed among the weakest effect sizes in our meta-analysis. Higher autonomy judgements were associated with weaker outcomes regardless of whether physical activity counselling was included in the model. One explanation for these data is that people with depression benefit from the clear direction and accountability of a standardised prescription. When provided with more freedom, the low self-efficacy that is symptomatic of depression may stop patients from setting an appropriate level of challenge (eg, they may be less likely to choose vigorous exercise). Alternatively, participants were likely autonomous when self-selecting into trials with exercise modalities they enjoyed, or those that fit their social circumstances. After choosing something value aligned, autonomy within the trial may not have been helpful. Either way, these data should be interpreted with caution. Our judgement of the autonomy provided in the methods may not reflect how much autonomy support patients actually felt. The patient’s perceived autonomy is likely determined by a range of factors not described in the methods (eg, the social environment created by those delivering the programme, or their social identity), so other studies that rely on patient reports of the motivational climate are likely to be more reliable. 33 Our findings reiterate the importance of considering these patient reports in future research of exercise for depression.

Our findings suggest that practitioners could advocate for most patients to engage in exercise. Those patients may benefit from guidance on intensity (ie, vigorous) and types of exercise that appear to work well (eg, walking, running, mixed aerobic exercise, strength training, yoga, tai chi, qigong) and be well tolerated (eg, strength training and yoga). If social determinants permit, 66 engaging in group exercise or structured programmes could provide support and guidance to achieve better outcomes. Health services may consider offering these programmes as an alternative or adjuvant treatment for major depression. Specifically, although the confidence in the evidence for exercise is less strong than for cognitive behavioural therapy, the effect sizes seem comparable, so it may be an alternative for patients who prefer not to engage in psychotherapy. Previous reviews on those with mild-moderate depression have found similar effects for exercise or SSRIs, or the two combined. 13 14 In contrast, we found some forms of exercise to have stronger effects than SSRIs alone. Our findings are likely related to the larger power in our review (n=14 170) compared with previous reviews (eg, n=2551), 14 and our ability to better account for heterogeneity in exercise prescriptions. Exercise may therefore be considered a viable alternative to drug treatment. We also found evidence that exercise increases the effects of SSRIs, so offering exercise may act as an adjuvant for those already taking drugs. We agree with consensus statements that professionals should still account for patients’ values, preferences, and constraints, ensuring there is shared decision making around what best suits the patient. 66 Our review provides data to help inform that decision.

Strengths, limitations, and future directions

Based on our findings, dance appears to be a promising treatment for depression, with large effects compared with other interventions in our review. However, the small number of studies, low number of participants, and biases in the study designs prohibit us from recommending dance more strongly. Given that most research on this intervention has been in young women (88% female participants, mean age 31 years), future research should also assess the generalisability of the effects to other populations, using robust experimental designs.

The studies we found may be subject to a range of experimental biases. In particular, researchers seldom blinded participants or staff delivering the intervention to the study’s hypotheses. Blinding for exercise interventions may be harder than for drugs 23 ; however, future studies could attempt to blind participants and staff to the study’s hypotheses to avoid expectancy effects. 69 Some of our ratings are for studies published before the proliferation of reporting checklists, so the ratings might be too critical. 23 For example, before CONSORT, few authors explicitly described how they generated a random sequence. 23 Therefore, our risk of bias judgements may be too conservative. Similarly, we planned to use the Cochrane risk of bias (RoB) 1 tool 40 so we could use the most recent Cochrane review of exercise and depression 12 to calibrate our raters, and because RoB 2 had not yet been published. 70 Although assessments of bias between the two tools are generally comparable, 71 the RoB 1 tool can be more conservative when assessing open label studies with subjective assessments (eg, unblinded studies with self-reported measures for depression). 71 As a result, future reviews should consider using the latest risk of bias tool, which may lead to different assessments of bias in included studies.

Most of the main findings in this review appear robust to risks from publication bias. Specifically, pooled effect sizes decreased when accounting for risk of publication bias, but no degree of publication bias could nullify effects. We did not exclude grey literature, but our search strategy was not designed to systematically search grey literature or trial registries. Doing so can detect additional eligible studies 72 and reveal the numbers of completed studies that remain unpublished. 73 Future reviews should consider more systematic searches for this kind of literature to better quantify and mitigate risk of publication bias.
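The claim that no degree of publication bias could nullify the effects can be illustrated with a simple selection-model sensitivity check: assume studies with significant, favourable results were some factor η more likely to be published, downweight them accordingly, and see whether the pooled effect crosses zero. The effect sizes and standard errors below are hypothetical, for illustration only; this is a sketch of the general technique, not the review's actual analysis:

```python
import numpy as np

# Hypothetical study effects (SMD; negative = fewer depression symptoms)
# and standard errors -- illustrative only, not the review's data.
smd = np.array([-0.8, -0.6, -0.5, -0.4, -0.3, -0.1])
se  = np.array([0.30, 0.25, 0.20, 0.20, 0.15, 0.10])
w   = 1 / se**2                       # inverse-variance weights

# A study is "affirmative" if significant in the favoured direction.
z = smd / se
affirmative = z < -1.96

def corrected_pooled(eta):
    """Pooled SMD if affirmative results were eta times more likely
    to be published (downweight them by 1/eta)."""
    w_adj = np.where(affirmative, w / eta, w)
    return float(np.sum(w_adj * smd) / np.sum(w_adj))

for eta in [1, 2, 5, 20]:
    print(f"eta={eta:>2}: pooled SMD = {corrected_pooled(eta):.3f}")
```

In this toy example the corrected estimate shrinks towards zero as η grows but stays negative even at an extreme η, which is the shape of robustness the paragraph describes.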

Similarly, our review was able to integrate evidence that directly compared exercise with other treatment modalities such as SSRIs or psychotherapy, while also informing estimates using indirect evidence (eg, comparing the relative effects of strength training and SSRIs when each was tested against a waitlist control). Our review did not, however, include all possible sources of indirect evidence. Network meta-analyses exist that focus directly on psychotherapy 7 and pharmacotherapy, 25 and on the two combined, for treating depression. 6 Those reviews include more than 500 studies comparing psychological or drug interventions with controls. Harmonising the findings of those reviews with ours would provide stronger data on indirect effects.
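The indirect comparison described here, contrasting two treatments through a shared comparator (a Bucher-style adjusted indirect comparison), can be sketched numerically. The effect sizes below are made up for the sketch; only the method is real:

```python
import math

# Illustrative direct estimates against a shared waitlist comparator
# (SMD with standard error); numbers are invented for this sketch.
d_strength_vs_wait, se_strength = -0.55, 0.10
d_ssri_vs_wait,     se_ssri     = -0.25, 0.08

# Bucher indirect comparison: the difference of the two direct effects;
# variances add because the two estimates come from independent trials.
d_indirect  = d_strength_vs_wait - d_ssri_vs_wait
se_indirect = math.sqrt(se_strength**2 + se_ssri**2)

lo = d_indirect - 1.96 * se_indirect
hi = d_indirect + 1.96 * se_indirect
print(f"strength vs SSRI (indirect): {d_indirect:.2f} [{lo:.2f}, {hi:.2f}]")
```

Note how the indirect standard error is larger than either direct one, which is why harmonising additional direct evidence from the psychotherapy and pharmacotherapy networks would strengthen these contrasts.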

Our review found some interesting moderators by age and sex, but these were at the study level rather than the individual level: rather than being able to determine whether women engaging in a strength intervention benefit more than men, we could only conclude that studies with more women showed larger effects than studies with fewer women. These studies may have been tailored towards women, so effects may be subject to confounding, as both sex and intervention may have changed. The same applies to age, where studies on older adults were likely adapted specifically to this age group. These between study differences may explain the heterogeneity in the effects of interventions, and confounding means our moderators for age and sex should be interpreted cautiously. Future reviews should consider individual patient data meta-analyses to allow for more detailed assessments of participant level moderators.
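A study-level moderator analysis of this kind is essentially a weighted meta-regression of effect sizes on study characteristics. The sketch below (hypothetical data, not the review's) fits one: the slope on proportion of female participants is a between-study association, so it cannot separate a true sex effect from interventions having been tailored to women, which is exactly the confounding the paragraph warns about:

```python
import numpy as np

# Hypothetical study-level data: each row is one trial.
prop_female = np.array([0.2, 0.4, 0.5, 0.7, 0.9])   # moderator
smd         = np.array([-0.3, -0.4, -0.5, -0.6, -0.8])  # effect size
se          = np.array([0.20, 0.15, 0.15, 0.12, 0.10])  # its SE

W = np.diag(1 / se**2)                      # inverse-variance weights
X = np.column_stack([np.ones_like(prop_female), prop_female])

# Weighted least squares meta-regression: beta = (X'WX)^-1 X'Wy
beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ smd)
print(f"intercept = {beta[0]:.2f}, slope per unit prop_female = {beta[1]:.2f}")
```

An individual patient data meta-analysis would instead regress each participant's outcome on their own sex, removing this ecological layer.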

Finally, for many modalities, the evidence is derived from small trials (eg, the median number of participants in each walking or jogging arm was 17). In addition to reducing risks from bias, primary research may benefit from deconstruction designs or from larger, head-to-head trials of exercise modalities to better identify what works best for each patient.

Clinical and policy implications

Our findings support the inclusion of exercise as part of clinical practice guidelines for depression, particularly vigorous intensity exercise. Doing so may help bridge the gap in treatment coverage by increasing the range of first line options for patients and health systems. 9 Globally there has been an attempt to reduce stigma associated with seeking treatment for depression. 74 Exercise may support this effort by providing patients with treatment options that carry less stigma. In low resource or funding constrained settings, group exercise interventions may provide relatively low cost alternatives for patients with depression and for health systems. When possible, ideal treatment may involve individualised care with a multidisciplinary team, where exercise professionals could take responsibility for ensuring the prescription is safe, personalised, challenging, and supported. In addition, those delivering psychotherapy may want to direct some time towards tackling cognitive and behavioural barriers to exercise. Exercise professionals might need to be trained in the management of depression (eg, managing risk) and to be mindful of the scope of their practice while providing support to deal with this major cause of disability.

Conclusions

Depression imposes a considerable global burden. Many exercise modalities appear to be effective treatments, particularly walking or jogging, strength training, and yoga, but confidence in many of the findings was low. We found preliminary data that may help practitioners tailor interventions to individuals (eg, yoga for older men, strength training for younger women). The World Health Organization recommends physical activity for everyone, including those with chronic conditions and disabilities, 75 but not everyone can access treatment easily. Many patients may have physical, psychological, or social barriers to participation. Still, some interventions with few costs, side effects, or pragmatic barriers, such as walking and jogging, are effective across people with different personal characteristics, severity of depression, and comorbidities. Those who are able may want to choose more intense exercise in a structured environment to further decrease depression symptoms. Health systems may want to provide these treatments as alternatives or adjuvants to other established interventions (cognitive behavioural therapy, SSRIs), while also attenuating risks to physical health associated with depression. 3 Therefore, effective exercise modalities could be considered alongside those interventions as core treatments for depression.

What is already known on this topic

Depression is a leading cause of disability, and exercise is often recommended alongside first line treatments such as pharmacotherapy and psychotherapy

Treatment guidelines and previous reviews disagree on how to prescribe exercise to best treat depression

What this study adds

Various exercise modalities are effective (walking, jogging, mixed aerobic exercise, strength training, yoga, tai chi, qigong) and well tolerated (especially strength training and yoga)

Effects appeared proportional to the intensity of exercise prescribed and were stronger for group exercise and interventions with clear prescriptions

Preliminary evidence suggests interactions between types of exercise and patients’ personal characteristics

Ethics statements

Ethical approval

Not required.

Acknowledgments

We thank Lachlan McKee for his assistance with data extraction. We also thank Juliette Grosvenor and another librarian (anonymous) for their review of our search strategy.

Contributors: MN led the project, drafted the manuscript, and is the guarantor. MN, TS, PT, MM, BdPC, PP, SB, and CL drafted the initial study protocol. MN, TS, PT, BdPC, DvdH, JS, MM, RP, LP, RV, HA, and BV conducted screening, extraction, and risk of bias assessment. MN, JS, and JM coded methods for behaviour change techniques. MN and DGG conducted statistical analyses. PP, SB, and CL provided supervision and mentorship. All authors reviewed and approved the final manuscript. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.

Funding: None received.

Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/disclosure-of-interest/ and declare: no support from any organisation for the submitted work; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.

Data sharing: Data and code for reproducing analyses are available on the Open Science Framework (https://osf.io/nzw6u/).

The lead author (MN) affirms that the manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned (and, if relevant, registered) have been explained.

Dissemination to participants and related patient and public communities: We plan to disseminate the findings of this study to lay audiences through mainstream and social media.

Provenance and peer review: Not commissioned; externally peer reviewed.

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/ .

References

  • World Health Organization. Depression. Geneva: WHO; 2020. https://www.who.int/news-room/fact-sheets/detail/depression
  • Birkjær M, Kaats M, Rubio A. Wellbeing adjusted life years: a universal metric to quantify the happiness return on investment. Happiness Research Institute; 2020. https://www.happinessresearchinstitute.com/waly-report
  • National Collaborating Centre for Mental Health (UK). Depression: the treatment and management of depression in adults (updated edition). Leicester (UK): British Psychological Society. https://www.ncbi.nlm.nih.gov/pubmed/22132433
  • American Psychiatric Association. Practice guideline for the treatment of patients with major depressive disorder. 3rd ed. Washington, DC: American Psychiatric Association; 2010. https://psychiatryonline.org/pb/assets/raw/sitewide/practice_guidelines/guidelines/mdd-1410197717630.pdf
  • National Institute for Health and Care Excellence. Depression in adults: treatment and management. NICE; 2022. https://www.nice.org.uk/guidance/ng222/resources
  • Chaimani A, Caldwell DM, Li T, Higgins JPT, Salanti G. Undertaking network meta-analyses. In: Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, et al, editors. Cochrane handbook for systematic reviews of interventions. Cochrane; 2022. www.training.cochrane.org/handbook
  • Veritas Health Innovation. Covidence systematic review software. Melbourne, Australia; 2023. www.covidence.org
  • Dias S, Welton NJ, Sutton AJ, Ades AE. NICE DSU technical support document 2: a generalised linear modelling framework for pairwise and network meta-analysis of randomised controlled trials. London: National Institute for Health and Care Excellence; 2011. https://www.ncbi.nlm.nih.gov/books/NBK310366/
  • R Core Team. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2022. https://www.R-project.org/
  • American Psychological Association. Clinical practice guideline for the treatment of depression across three age cohorts. APA; 2019. https://www.apa.org/depression-guideline/
  • Richter B, Hemmingsen B. Comparison of the Cochrane risk of bias tool 1 (RoB 1) with the updated Cochrane risk of bias tool 2 (RoB 2). Cochrane; 2021. https://community.cochrane.org/sites/default/files/uploads/inline-files/RoB1_2_project_220529_BR%20KK%20formatted.pdf


5G, 6G, and Beyond: Recent advances and future challenges

  • Published: 20 January 2023
  • Volume 78, pages 525–549 (2023)


  • Fatima Salahdine (ORCID: 0000-0003-4330-906X)
  • Tao Han
  • Ning Zhang


With the high demand for advanced services and the growing number of connected devices, current wireless communication systems must expand to meet users’ needs in terms of quality of service, throughput, latency, connectivity, and security. 5G, 6G, and beyond (xG) aim to bring radical changes to wireless communication networks, in which everything will be fully connected, fulfilling the requirement of ubiquitous connectivity. This rapid evolution will transform the world of communication with more intelligent and sophisticated services and devices, leading to new technologies operating over very high frequencies and broader bands. To achieve the objectives of xG networks, several key technology enablers must be deployed, including massive MIMO, software-defined networking, network function virtualization, vehicle-to-everything communication, mobile edge computing, network slicing, terahertz communication, visible light communication, virtualization of the network infrastructure, and intelligent communication environments. In this paper, we investigated recent advances in 5G/6G and beyond systems. We highlighted and analyzed their key technology enablers and use cases, and discussed potential issues and future challenges facing these new wireless networks.



Niu Y, Li Y, Jin D, Su L, Vasilakos AV (2015) A survey of millimeter wave communications (mmwave) for 5g: opportunities and challenges. Wireless Netw 21(8):2657–2676

Giordani M, Mezzavilla M, Zorzi M (2016) Initial access in 5g mmwave cellular networks. IEEE Commun Mag 54(11):40– 47

Giordani M, Polese M, Roy A, Castor D, Zorzi M (2018) A tutorial on beam management for 3gpp nr at mmwave frequencies. IEEE Commun Surveys Tutor 21(1):173–196

Akyildiz IF, Lee W-Y, Chowdhury KR (2009) Crahns: cognitive radio ad hoc networks. AD hoc networks 7(5):810–836

Ahmad I, Kumar T, Liyanage M, Okwuibe J, Ylianttila M, Gurtov A (2018) Overview of 5G security challenges and solutions. IEEE Commun Standards Magazine 2(1):36–43

Li Y, Phan LTX, Loo BT (2016) Network functions virtualization with soft real-time guarantees. In: IEEE INFOCOM 2016-The 35th annual IEEE international conference on computer communications. IEEE, pp 1–9

Siddique U, Tabassum H, Hossain E, Kim DI (2015) Wireless backhauling of 5g small cells: challenges and solution approaches. IEEE Wirel Commun 22(5):22–31

Dong Y, Chawla NV, Swami A (2017) Metapath2vec: scalable representation learning for heterogeneous networks. In: Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining, pp 135–144

Wang N, Hossain E, Bhargava VK (2015) Backhauling 5g small cells: a radio resource management perspective. IEEE Wirel Commun 22(5):41–49

Afolabi I, Taleb T, Samdanis K, Ksentini A, Flinck H (2018) Network slicing and softwarization: a survey on principles, enabling technologies, and solutions. IEEE Commun Surveys Tutorials 20 (3):2429–2453

Moreno Y, Pastor-Satorras R, Vespignani A (2002) Epidemic outbreaks in complex heterogeneous networks. European Phys J B-Condensed Matter Complex Syst 26(4):521–529

Mogensen P, Pajukoski K, Tiirola E, Vihriala J, Lahetkangas E, Berardinelli G, Tavares FM, Mahmood NH, Lauridsen M, Catania D et al (2014) Centimeter-wave concept for 5g ultra-dense small cells. In: 2014 IEEE 79th vehicular technology conference (VTC Spring). IEEE, pp 1–6

Rao RS, Kumar Ashish, Srivastava N (2020) Full-duplex wireless communication in cognitive radio networks: a survey. In: Advances in VLSI, communication, and signal processing. Springer, pp 261–277

Zhao Y (2020) A survey of 6G wireless communications: emerging technologies, pp 1–10

Quadri A, Manesh MR, Kaabouch N (2017) Noise cancellation in cognitive radio systems: a performance comparison of evolutionary algorithms. In: 2017 IEEE 7th annual computing and communication workshop and conference (CCWC). IEEE, pp 1–7

Mishra PK, Pandey S, Biswash SK (2016) Efficient resource management by exploiting D2D communication for 5G networks. IEEE Access 4:9910–9922

Karachontzitis S, Timotheou S, Krikidis I, Berberidis K (2014) Security-aware max–min resource allocation in multiuser ofdma downlink. IEEE Trans Inf Forensics Security 10(3):529–542

Li Y, Zhou T, Xu J, Li Z, Wang H (2011) Adaptive tdd ul/dl slot utilization for cellular controlled d2d communications. In: 2011 Global mobile congress. IEEE, pp 1–6

Akpakwu GA, Silva BJ, Hancke GP, Abu-Mahfouz AM (2017) A survey on 5g networks for the internet of things: communication technologies and challenges. IEEE Access 6:3619–3647

Salahdine F, Kaabouch N (2020) Security threats, detection, and countermeasures for physical layer in cognitive radio networks: a survey. Phys Commun 39:101001

Zhao M, Kumar A, Ristaniemi T, Chong PHJ (2017) Machine-to-machine communication and research challenges: a survey. Wirel Pers Commun 97(3):3569–3585

Weyrich M, Schmidt J-P, Ebert C (2014) Machine-to-machine communication. IEEE Softw 31(4):19–23

Amodu OA, Othman M (2018) Machine-to-machine communication: an overview of opportunities. Comput Netw 145:255–276

Ali A, Shah GA, Farooq MO, Ghani U (2017) Technologies and challenges in developing machine-to-machine applications: a survey. J Netw Comput Appl 83:124–139

Wunder G, Jung P, Kasparick M, Wild T, Schaich F, Chen Y, Ten Brink S, Gaspar I, Michailow N, Festag A et al (2014) 5gnow: non-orthogonal, asynchronous waveforms for future mobile applications. IEEE Commun Mag 52(2):97–105

Ejaz W, Anpalagan A, Imran MA, Jo M, Naeem M, Qaisar SB, Wang W (2016) Internet of things (iot) in 5g wireless communications. IEEE Access 4:10310–10314

Dighriri M, Alfoudi ASD, Lee GM, Baker T (2016) Data traffic model in machine to machine communications over 5g network slicing. In: 2016 9th International conference on developments in eSystems engineering (deSE). IEEE, pp 239–244

Garcia-Roger D, González EE, Martín-Sacristán D, Monserrat JF (2020) V2x support in 3gpp specifications: from 4g to 5g and beyond. IEEE Access 8:190946–190963

Salahdine F, Aggarwal S, Nasipuri A (2022) Short-term traffic congestion prediction with deep learning for lora networks. In: SoutheastCon 2022, pp 261–268

Rahim A, Malik PK, Ponnapalli VS (2020) State of the art: a review on vehicular communications, impact of 5g, fractal antennas for future communication. In: Proceedings of First International Conference on Computing, Communications, and Cyber-Security (IC4s 2019). Springer, pp 3–153–15

Hussain R, Hussain F, Zeadally S (2019) Integration of vanet and 5g security: a review of design and implementation issues. Futur Gener Comput Syst 101:843–864

Lai C, Lu R, Zheng D, Shen XS (2020) Security and privacy challenges in 5g-enabled vehicular networks. IEEE Netw 34(2):37–45

El-Rewini Z, Sadatsharan K, Selvaraj DF, Plathottam SJ, Ranganathan P (2020) Cybersecurity challenges in vehicular communications. Vehicular Commun 23:100214

Arena F, Pau G (2019) An overview of vehicular communications. Future Internet 11(2):27

Mahmood A, Zhang WE, Sheng QZ (2019) Software-defined heterogeneous vehicular networking: the architectural design and open challenges. Future Internet 11(3):70

Sun X, Ansari N (2016) Edgeiot: Mobile edge computing for the internet of things. IEEE Commun Mag 54(12):22–29

Abbas N, Zhang Y, Taherkordi A, Skeie T (2017) Mobile edge computing: a survey. IEEE Internet Things J 5(1):450–465

Ahmed E, Rehmani MH (2017) Mobile edge computing: opportunities, solutions and challenges

Naughton L, Daly H (2020) Augmented humanity: data, privacy and security. In: Cyber Defence in the Age of AI, Smart Societies and Augmented Humanity. Springer, pp 73–93

Sharma SK, Woungang I, Anpalagan A, Chatzinotas S (2020) Toward tactile internet in beyond 5g era: recent advances, current issues, and future directions. IEEE Access 8:56948–56991

Wang H, Chen S, Xu H, Ai M, Shi Y (2015) Softnet: a software defined decentralized mobile network architecture toward 5g. IEEE Netw 29(2):16–22

Chen T, Matinmikko M, Chen X, Zhou X, Ahokangas P (2015) Software defined mobile networks: concept, survey, and research directions. IEEE Commun Mag 53(11):126–133

Mijumbi R, Serrat J, Gorricho J-L, Latré S, Charalambides M, Lopez D (2016) Management and orchestration challenges in network functions virtualization. IEEE Commun Mag 54(1):98–105

Damnjanovic A, Montojo J, Wei Y, Ji T, Luo T, Vajapeyam M, Yoo T, Song O, Malladi D (2011) A survey on 3gpp heterogeneous networks. IEEE Wireless Commun 18(3):10– 21

Han F, Zhao S, Zhang L, Wu J (2016) Survey of strategies for switching off base stations in heterogeneous networks for greener 5g systems. IEEE Access 4:4959–4973

Al-Qasrawi IS (2017) Proposed technologies for solving future 5G heterogeneous networks challenges. Int J Comput Appl 7(1):1–8

Khandekar A, Bhushan N, Tingfang J, Vanghi V (2010) Lte-advanced: heterogeneous networks. In: 2010 European wireless conference (EW). IEEE, pp 978–982

Cai S, Che Y, Duan L, Wang J, Zhou S, Zhang R (2016) Green 5g heterogeneous networks through dynamic small-cell operation. IEEE J Select Areas Commun 34(5):1103–1115

Salahdine F, Opadere J, Liu Q, Han T, Zhang N, Wu S (2021) A survey on sleep mode techniques for ultra-dense networks in 5g and beyond. Comput Netw 201:108567

Liu C, Natarajan B, Xia H (2015) Small cell base station sleep strategies for energy efficiency. IEEE Trans Veh Technol 65(3):1652–1661

Rost P, Mannweiler C, Michalopoulos DS, Sartori C, Sciancalepore V, Sastry N, Holland O, Tayade S, Han B, Bega D et al (2017) Network slicing to enable scalability and flexibility in 5g mobile networks. IEEE Commun Mag 55(5):72–79

Zhang H, Liu N, Chu X, Long K, Aghvami A-H, Leung VC (2017) Network slicing based 5g and future mobile networks: mobility, resource management, and challenges. IEEE commun Mag 55 (8):138–145

Galinina O, Pyattaev A, Andreev S, Dohler M, Koucheryavy Y (2015) 5G multi-rat lte-wifi ultra-dense small cells: performance dynamics, architecture, and trends. IEEE J Select Areas Commun 33(6):1224–1240

Li S, Xu LD, Zhao S (2018) 5G internet of things: a survey. J Industr Inf Integ 10:1–9

Busari SA, Huq KMS, Mumtaz S, Dai L, Rodriguez J (2018) Millimeter-wave massive MIMO communication for future wireless systems: a survey. IEEE Commun Surveys Tutorials 20(2):836–869

Ge X, Yang J, Gharavi H, Sun Y (2017) Energy efficiency challenges of 5g small cell networks. IEEE Commun Mag 55(5):184–191

Bai Q, Nossek JA (2015) Energy efficiency maximization for 5g multi-antenna receivers. Trans Emerging Telecommun Technol 26(1):3–14

Zi R, Ge X, Thompson J, Wang C-X, Wang H, Han T (2016) Energy efficiency optimization of 5g radio frequency chain systems. IEEE J Select Areas Commun 34(4):758–771

Akpakwu GA, Silva BJ, Hancke GP, Abu-Mahfouz AM (2017) A survey on 5G networks for the internet of things: communication technologies and challenges. IEEE Access 6:3619–3647

Hong X, Wang J, Wang C-X, Shi J (2014) Cognitive radio in 5g: a perspective on energy-spectral efficiency trade-off. IEEE Commun Mag 52(7):46–53

Wu G, Yang C, Li S, Li GY (2015) Recent advances in energy-efficient networks and their application in 5g systems. IEEE Wirel Commun 22(2):145–151

Buzzi S, Chih-Lin I, Klein TE, Poor HV, Yang C, Zappone A (2016) A survey of energy-efficient techniques for 5g networks and challenges ahead. IEEE J Select Areas Commun 34(4):697–709

Mousa SH, Ismail M, Nordin R, Abdullah NF (2020) Effective wide spectrum sharing techniques relying on CR technology toward 5G: a survey. J Commun 15(2):122–147

Salahdine F, El Ghazi H (2017) A real time spectrum scanning technique based on compressive sensing for cognitive radio networks. In: 2017 IEEE 8th annual ubiquitous computing, electronics and mobile communication conference, UEMCON 2017, vol 2018-Janua, pp 506–511

Salahdine F, Kaabouch N, El Ghazi H (2016) A survey on compressive sensing techniques for cognitive radio networks. Phys Commun 20:61–73

Reyes H, Subramaniam S, Kaabouch N, Hu WC (2016) A spectrum sensing technique based on autocorrelation and Euclidean distance and its comparison with energy detection for cognitive radio networks. Comput Electr Eng 52:319–327

Salahdine F (2018) Compressive spectrum sensing for cognitive radio networks, arXiv: 1802.03674

Sun S, Gong L, Rong B, Lu K (2015) An intelligent sdn framework for 5g heterogeneous networks. IEEE Commun Mag 53(11):142–147

Khan R, Kumar P, Jayakody DNK, Liyanage M (2020) A survey on security and privacy of 5G technologies: potential solutions, recent advancements, and future directions. IEEE Commun Surveys Tutorials 22(1):196–248

Chowdhury MZ, Shahjalal M, Ahmed S, Jang YM (2020) 6G wireless communication systems: applications, requirements, technologies, challenges, and research directions. IEEE Open Journal of the Communications Society 1:957–975

Zanzi L, Albanese A, Sciancalepore V, Costa-Pérez X (2020) Nsbchain: a secure blockchain framework for network slicing brokerage. ICC IEEE Int Conf Commun:1–7

Arabia-Obedoza MR, Rodriguez G, Johnston A, Salahdine F, Kaabouch N (2020) Social engineering attacks a reconnaissance synthesis analysis. In: 2020 11th IEEE annual ubiquitous computing, electronics & mobile communication conference (UEMCON). IEEE, pp 0843?0848

Liu Q, Han T, Moges E (2020) Edgeslice: slicing wireless edge computing network with decentralized deep reinforcement learning. arXiv: 2003.12911

Salahdine F, Liu Q, Han T (2022) Towards secure and intelligent network slicing for 5g networks. IEEE Open J Comput Soc

Download references

Author information

Authors and affiliations

Department of Electrical and Computer Engineering, University of North Carolina at Charlotte, Charlotte, North Carolina, USA

Fatima Salahdine

Helen and John C. Hartmann Department of Electrical and Computer Engineering, New Jersey Institute of Technology, Newark, NJ, 07102, USA

Department of Electrical and Computer Engineering, University of Windsor, Windsor, Ontario, Canada


Corresponding author

Correspondence to Fatima Salahdine.

Ethics declarations

Conflict of interest

Not applicable


About this article

Salahdine, F., Han, T. & Zhang, N. 5G, 6G, and Beyond: Recent advances and future challenges. Ann. Telecommun. 78, 525–549 (2023). https://doi.org/10.1007/s12243-022-00938-3


Received: 31 August 2021

Accepted: 01 December 2022

Published: 20 January 2023

Issue Date: October 2023

DOI: https://doi.org/10.1007/s12243-022-00938-3


Keywords

  • Massive MIMO
  • Cell-free MIMO
  • Millimeter waves
  • Full duplex
  • Network slicing
  • Spectrum sharing
  • Energy efficiency
  • Resource orchestration
  • Heterogeneous networks
  • Machine learning
  • Internet of things
  • Internet of Nano things
  • Internet of me
  • Tactile internet

OpenAI teases an amazing new generative video model called Sora

The firm is sharing Sora with a small group of safety testers, but the rest of us will have to wait to learn more.

By Will Douglas Heaven

OpenAI has built a striking new generative video model called Sora that can take a short text description and turn it into a detailed, high-definition film clip up to a minute long.

Based on four sample videos that OpenAI shared with MIT Technology Review ahead of today’s announcement, the San Francisco–based firm has pushed the envelope of what’s possible with text-to-video generation (a hot new research direction that we flagged as a trend to watch in 2024).

“We think building models that can understand video, and understand all these very complex interactions of our world, is an important step for all future AI systems,” says Tim Brooks, a scientist at OpenAI.

But there’s a disclaimer. OpenAI gave us a preview of Sora (which means “sky” in Japanese) under conditions of strict secrecy. In an unusual move, the firm would only share information about Sora if we agreed to wait until after news of the model was made public to seek the opinions of outside experts. [Editor’s note: We’ve updated this story with outside comment below.] OpenAI has not yet released a technical report or demonstrated the model actually working. And it says it won’t be releasing Sora anytime soon. [Update: OpenAI has now shared more technical details on its website.]

The first generative models that could produce video from snippets of text appeared in late 2022. But early examples from Meta, Google, and a startup called Runway were glitchy and grainy. Since then, the tech has been getting better fast. Runway’s Gen-2 model, released last year, can produce short clips that come close to matching big-studio animation in their quality. But most of these examples are still only a few seconds long.

The sample videos from OpenAI’s Sora are high-definition and full of detail. OpenAI also says it can generate videos up to a minute long. One video of a Tokyo street scene shows that Sora has learned how objects fit together in 3D: the camera swoops into the scene to follow a couple as they walk past a row of shops.

OpenAI also claims that Sora handles occlusion well. One problem with existing models is that they can fail to keep track of objects when they drop out of view. For example, if a truck passes in front of a street sign, the sign might not reappear afterward.  

In a video of a papercraft underwater scene, Sora has added what look like cuts between different pieces of footage, and the model has maintained a consistent style between them.

It’s not perfect. In the Tokyo video, cars to the left look smaller than the people walking beside them. They also pop in and out between the tree branches. “There’s definitely some work to be done in terms of long-term coherence,” says Brooks. “For example, if someone goes out of view for a long time, they won’t come back. The model kind of forgets that they were supposed to be there.”

Impressive as they are, the sample videos shown here were no doubt cherry-picked to show Sora at its best. Without more information, it is hard to know how representative they are of the model’s typical output.   

It may be some time before we find out. OpenAI’s announcement of Sora today is a tech tease, and the company says it has no current plans to release it to the public. Instead, OpenAI will today begin sharing the model with third-party safety testers for the first time.

In particular, the firm is worried about the potential misuses of fake but photorealistic video. “We’re being careful about deployment here and making sure we have all our bases covered before we put this in the hands of the general public,” says Aditya Ramesh, a scientist at OpenAI, who created the firm’s text-to-image model DALL-E.

But OpenAI is eyeing a product launch sometime in the future. As well as safety testers, the company is also sharing the model with a select group of video makers and artists to get feedback on how to make Sora as useful as possible to creative professionals. “The other goal is to show everyone what is on the horizon, to give a preview of what these models will be capable of,” says Ramesh.

To build Sora, the team adapted the tech behind DALL-E 3, the latest version of OpenAI’s flagship text-to-image model. Like most text-to-image models, DALL-E 3 uses what’s known as a diffusion model. These are trained to turn a fuzz of random pixels into a picture.

Sora takes this approach and applies it to videos rather than still images. But the researchers also added another technique to the mix. Unlike DALL-E or most other generative video models, Sora combines its diffusion model with a type of neural network called a transformer.

Transformers are great at processing long sequences of data, like words. That has made them the special sauce inside large language models like OpenAI’s GPT-4 and Google DeepMind’s Gemini. But videos are not made of words. Instead, the researchers had to find a way to cut videos into chunks that could be treated as if they were. The approach they came up with was to dice videos up across both space and time. “It’s like if you were to have a stack of all the video frames and you cut little cubes from it,” says Brooks.

The transformer inside Sora can then process these chunks of video data in much the same way that the transformer inside a large language model processes words in a block of text. The researchers say that this let them train Sora on many more types of video than other text-to-video models, varied in terms of resolution, duration, aspect ratio, and orientation. “It really helps the model,” says Brooks. “That is something that we’re not aware of any existing work on.”
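Brooks’s cube analogy maps neatly onto array code. Here is a minimal sketch of that spacetime-patching idea in NumPy; the patch sizes, array layout, and function name are illustrative assumptions, not details OpenAI has disclosed.

```python
import numpy as np

def patchify_video(video, patch_t=2, patch_h=4, patch_w=4):
    """Dice a video into spacetime "cubes", flattening each into a token.

    video: array of shape (T, H, W, C). Dimensions are assumed to be
    divisible by the patch sizes; real systems pad or resize instead.
    Returns an array of shape (num_patches, patch_t * patch_h * patch_w * C).
    """
    T, H, W, C = video.shape
    assert T % patch_t == 0 and H % patch_h == 0 and W % patch_w == 0
    # Split each axis into (num_blocks, block_size)...
    v = video.reshape(T // patch_t, patch_t,
                      H // patch_h, patch_h,
                      W // patch_w, patch_w, C)
    # ...group the block indices together, then flatten each cube.
    v = v.transpose(0, 2, 4, 1, 3, 5, 6)
    return v.reshape(-1, patch_t * patch_h * patch_w * C)

# A toy 8-frame, 16x16 RGB clip becomes a sequence of 64 tokens.
tokens = patchify_video(np.zeros((8, 16, 16, 3)))
print(tokens.shape)  # (64, 96)
```

Each row of the result plays the role a word token plays in a language model: a transformer can then attend across all the cubes of a clip at once.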

“From a technical perspective it seems like a very significant leap forward,” says Sam Gregory, executive director at Witness, a human rights organization that specializes in the use and misuse of video technology. “But there are two sides to the coin,” he says. “The expressive capabilities offer the potential for many more people to be storytellers using video. And there are also real potential avenues for misuse.” 

OpenAI is well aware of the risks that come with a generative video model. We are already seeing the large-scale misuse of deepfake images. Photorealistic video takes this to another level.

Gregory notes that you could use technology like this to misinform people about conflict zones or protests. The range of styles is also interesting, he says. If you could generate shaky footage that looked like something shot with a phone, it would come across as more authentic.

The tech is not there yet, but generative video has gone from zero to Sora in just 18 months. “We’re going to be entering a universe where there will be fully synthetic content, human-generated content and a mix of the two,” says Gregory.

The OpenAI team plans to draw on the safety testing it did last year for DALL-E 3. Sora already includes a filter that runs on all prompts sent to the model that will block requests for violent, sexual, or hateful images, as well as images of known people. Another filter will look at frames of generated videos and block material that violates OpenAI’s safety policies.

OpenAI says it is also adapting a fake-image detector developed for DALL-E 3 to use with Sora. And the company will embed industry-standard C2PA tags, metadata that states how an image was generated, into all of Sora’s output. But these steps are far from foolproof. Fake-image detectors are hit-or-miss. Metadata is easy to remove, and most social media sites strip it from uploaded images by default.

“We’ll definitely need to get more feedback and learn more about the types of risks that need to be addressed with video before it would make sense for us to release this,” says Ramesh.

Brooks agrees. “Part of the reason that we’re talking about this research now is so that we can start getting the input that we need to do the work necessary to figure out how it could be safely deployed,” he says.

Update 2/15: Comments from Sam Gregory were added.



Stanford Medicine study identifies distinct brain organization patterns in women and men

Stanford Medicine researchers have developed a powerful new artificial intelligence model that can distinguish between male and female brains.

February 20, 2024

Photo caption: “A key motivation for this study is that sex plays a crucial role in human brain development, in aging, and in the manifestation of psychiatric and neurological disorders,” said Vinod Menon. (Image credit: clelia-clelia)

A new study by Stanford Medicine investigators unveils a new artificial intelligence model that was more than 90% successful at determining whether scans of brain activity came from a woman or a man.

The findings, published Feb. 20 in the Proceedings of the National Academy of Sciences, help resolve a long-term controversy about whether reliable sex differences exist in the human brain and suggest that understanding these differences may be critical to addressing neuropsychiatric conditions that affect women and men differently.

“A key motivation for this study is that sex plays a crucial role in human brain development, in aging, and in the manifestation of psychiatric and neurological disorders,” said Vinod Menon, PhD, professor of psychiatry and behavioral sciences and director of the Stanford Cognitive and Systems Neuroscience Laboratory. “Identifying consistent and replicable sex differences in the healthy adult brain is a critical step toward a deeper understanding of sex-specific vulnerabilities in psychiatric and neurological disorders.”

Menon is the study’s senior author. The lead authors are senior research scientist Srikanth Ryali, PhD, and academic staff researcher Yuan Zhang, PhD.

“Hotspots” that most helped the model distinguish male brains from female ones include the default mode network, a brain system that helps us process self-referential information, and the striatum and limbic network, which are involved in learning and how we respond to rewards.

The investigators noted that this work does not weigh in on whether sex-related differences arise early in life or may be driven by hormonal differences or the different societal circumstances that men and women may be more likely to encounter.

Uncovering brain differences

The extent to which a person’s sex affects how their brain is organized and operates has long been a point of dispute among scientists. While we know the sex chromosomes we are born with help determine the cocktail of hormones our brains are exposed to — particularly during early development, puberty and aging — researchers have long struggled to connect sex to concrete differences in the human brain. Brain structures tend to look much the same in men and women, and previous research examining how brain regions work together has also largely failed to turn up consistent brain indicators of sex.


In their current study, Menon and his team took advantage of recent advances in artificial intelligence, as well as access to multiple large datasets, to pursue a more powerful analysis than has previously been employed. First, they created a deep neural network model, which learns to classify brain imaging data: As the researchers showed brain scans to the model and told it that it was looking at a male or female brain, the model started to “notice” what subtle patterns could help it tell the difference.

This model demonstrated superior performance compared with those in previous studies, in part because it used a deep neural network that analyzes dynamic MRI scans. This approach captures the intricate interplay among different brain regions. When the researchers tested the model on around 1,500 brain scans, it could almost always tell if the scan came from a woman or a man.
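As a rough illustration of that training loop — show the model labeled examples and nudge its parameters until it separates them — here is a tiny logistic-regression classifier fit by gradient descent on made-up “scan” features. The data, feature count, and model are invented for the sketch; the actual study used a deep neural network on dynamic fMRI.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for brain-scan features: two classes whose means
# differ slightly along a few dimensions (nothing like real fMRI data).
n, d = 400, 20
labels = rng.integers(0, 2, n)
X = rng.normal(size=(n, d)) + labels[:, None] * np.r_[np.full(4, 1.5), np.zeros(d - 4)]

def train_logreg(X, y, lr=0.1, steps=500):
    """Fit weights w and bias b by gradient descent on the logistic loss."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(y = 1)
        grad = p - y                            # dLoss/dlogit per example
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

w, b = train_logreg(X, labels)
acc = (((X @ w + b) > 0).astype(int) == labels).mean()
print(f"training accuracy: {acc:.2f}")
```

The classifier never sees which dimensions carry signal; like the study’s model, it “notices” the discriminative pattern purely from labeled examples.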

The model’s success suggests that detectable sex differences do exist in the brain but just haven’t been picked up reliably before. The fact that it worked so well in different datasets, including brain scans from multiple sites in the U.S. and Europe, makes the findings especially convincing, as it controls for many confounds that can plague studies of this kind.

“This is a very strong piece of evidence that sex is a robust determinant of human brain organization,” Menon said.

Making predictions

Until recently, a model like the one Menon’s team employed would help researchers sort brains into different groups but wouldn’t provide information about how the sorting happened. Today, however, researchers have access to a tool called “explainable AI,” which can sift through vast amounts of data to explain how a model’s decisions are made.

Using explainable AI, Menon and his team identified the brain networks that were most important to the model’s judgment of whether a brain scan came from a man or a woman. They found the model was most often looking to the default mode network, striatum, and the limbic network to make the call.
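One simple, model-agnostic way to get at that kind of attribution is occlusion importance: zero out one group of input features at a time and measure how much the model’s accuracy drops. The toy model and feature groups below are invented stand-ins; the study’s explainable-AI machinery is more sophisticated.

```python
import numpy as np

def occlusion_importance(predict, X, y, groups):
    """Accuracy drop when each named feature group is zeroed out.

    predict: function mapping a feature matrix to 0/1 labels.
    groups: dict of name -> column indices (a toy stand-in for brain
    networks such as the default mode network or the striatum).
    """
    base = (predict(X) == y).mean()
    drops = {}
    for name, cols in groups.items():
        X_occ = X.copy()
        X_occ[:, cols] = 0.0  # "occlude" this group of features
        drops[name] = base - (predict(X_occ) == y).mean()
    return drops

# Toy model: only the first two columns carry signal.
rng = np.random.default_rng(1)
y = rng.integers(0, 2, 300)
X = rng.normal(size=(300, 6))
X[:, :2] += y[:, None] * 2.0
predict = lambda X: (X[:, :2].sum(axis=1) > 2.0).astype(int)

drops = occlusion_importance(predict, X, y,
                             {"signal": [0, 1], "noise": [2, 3, 4, 5]})
print(drops)  # the "signal" group shows the larger accuracy drop
```

Groups whose removal hurts accuracy most are the ones the model leans on — the analogue of the “hotspot” networks the team identified.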

The team then wondered if they could create another model that could predict how well participants would do on certain cognitive tasks based on functional brain features that differ between women and men. They developed sex-specific models of cognitive abilities: One model effectively predicted cognitive performance in men but not women, and another in women but not men. The findings indicate that functional brain characteristics varying between sexes have significant behavioral implications.

“These models worked really well because we successfully separated brain patterns between sexes,” Menon said. “That tells me that overlooking sex differences in brain organization could lead us to miss key factors underlying neuropsychiatric disorders.”

While the team applied their deep neural network model to questions about sex differences, Menon says the model can be applied to answer questions regarding how just about any aspect of brain connectivity might relate to any kind of cognitive ability or behavior. He and his team plan to make their model publicly available for any researcher to use.

“Our AI models have very broad applicability,” Menon said. “A researcher could use our models to look for brain differences linked to learning impairments or social functioning differences, for instance — aspects we are keen to understand better to aid individuals in adapting to and surmounting these challenges.”

The research was sponsored by the National Institutes of Health (grants MH084164, EB022907, MH121069, K25HD074652 and AG072114), the Transdisciplinary Initiative, the Uytengsu-Hamilton 22q11 Programs, the Stanford Maternal and Child Health Research Institute, and the NARSAD Young Investigator Award.

About Stanford Medicine

Stanford Medicine is an integrated academic health system comprising the Stanford School of Medicine and adult and pediatric health care delivery systems. Together, they harness the full potential of biomedicine through collaborative research, education and clinical care for patients. For more information, please visit med.stanford.edu.


Figure 1. A, Flowchart of the Baby Bambi pilot. Timelines are as follows: time 0 to time 1, up to 1 day; time 1 to time 2, 1 to 5 working days; time 2 to time 3, 10 days. B, Flowchart depicting pilot referral, enrollment, and results. Of the 47 excluded patients, 8 could not provide biparental consent on account of religious beliefs. CNV indicates copy-number variation; MOH, Ministry of Health; NICU, neonatal intensive care unit; RF, requisition form; SNV, single-nucleotide variant; TASMC, Tel Aviv Sourasky Medical Center; UPD, uniparental disomy; VUS, variant of unknown significance.

Figure 2. A, Diagnostic efficacy of rtGS in the full Baby Bambi cohort (130 neonates). B, Types and proportions of disease-causing variants identified in the Baby Bambi cohort (65 neonates). LP indicates likely pathogenic; P, pathogenic; SNV, single-nucleotide variant; VUS, variant of unknown significance.

eMethods 1. Supplemental Methods and Translated Copies of Informed Consent, Clinical Utility, Outcome, and Requisition Forms

eTable 1. Study Inclusion and Exclusion Criteria

eMethods 2. Statistical Analyses

eReferences.

eAppendix. Supplemental Results

eFigure 1. Neonatal Age at Referral for rtGS

eFigure 2. Ethnic Backgrounds of Baby Bambi Pilot Population

eTable 3. Distribution of Secondary Category Inclusion Criteria (HPO Terms) Among Diagnosed, Possibly Diagnosed, and Undiagnosed Patients

eFigure 3. Variables of Diagnostic vs Negative rtGS Results

eTable 2. Baby Bambi Cohort Results

Data Sharing Statement

  • Enhancing Neonatal Intensive Care With Rapid Genome Sequencing JAMA Network Open Invited Commentary February 22, 2024 Shan Jiang, MSc; Bonny Parkinson, PhD; Yuanyuan Gu, PhD


Marom D, Mory A, Reytan-Miron S, et al. National Rapid Genome Sequencing in Neonatal Intensive Care. JAMA Netw Open. 2024;7(2):e240146. doi:10.1001/jamanetworkopen.2024.0146


National Rapid Genome Sequencing in Neonatal Intensive Care

  • 1 The Genetics Institute and Genomics Center, Tel Aviv Sourasky Medical Center, Tel Aviv, Israel
  • 2 Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
  • 3 Community Genetics Department, Public Health Services, Ministry of Health, Ramat Gan, Israel
  • 4 Ruth and Bruce Rappaport Faculty of Medicine, Technion-Israel Institute of Technology, Haifa, Israel
  • 5 Department of Neonatology, Baruch Padeh Medical Center, Tzafon Medical Center, Tiberias, Israel
  • 6 Azrieli Faculty of Medicine, Bar Ilan University, Ramat Gan, Israel
  • 7 Genetics Unit, Barzilai University Medical Center, Ashkelon, Israel
  • 8 Faculty of Health Sciences, Ben-Gurion University of the Negev, Be’er-Sheva, Israel
  • 9 Department of Neonatology, Barzilai University Medical Center, Ashkelon, Israel
  • 10 Genetics Institute, Bnai Zion Medical Center, Haifa, Israel
  • 11 Department of Neonatology, Bnai Zion Medical Center, Haifa, Israel
  • 12 Genetics Institute, Carmel Medical Center, Haifa, Israel
  • 13 Department of Neonatology, Carmel Medical Center, Haifa, Israel
  • 14 Department of Neonatology, Dana-Dwek Children’s Hospital, Tel Aviv Medical Center, Tel Aviv, Israel
  • 15 The Genetics Institute and Center of Rare Diseases, Emek Medical Center, Afula, Israel
  • 16 Department of Neonatology, Emek Medical Center, Afula, Israel
  • 17 Department of Neonatology, Galilee Medical Center, Naharia, Israel
  • 18 Genetics Institute, Galilee Medical Center, Naharia, Israel
  • 19 Department of Genetics, Hadassah Medical Organization, Jerusalem, Israel
  • 20 Faculty of Medicine, The Hebrew University of Jerusalem, Ein Kerem, Jerusalem, Israel
  • 21 Department of Neonatology, Hadassah Medical Organization, Jerusalem, Israel
  • 22 Department of Neonatology, Kaplan Medical Center, Rehovot, Israel
  • 23 Genetics Institute, Kaplan Medical Center, Rehovot, Israel
  • 24 Department of Neonatology, Laniado Hospital, Netanya, Israel
  • 25 Adelson School of Medicine, Ariel University, Ariel, Israel
  • 26 Genetics Institute, Meir Medical Center, Kefar-Sava, Israel
  • 27 Department of Neonatology, Meir Medical Center, Kefar-Sava, Israel
  • 28 Department of Neonatology, Saint Vincent Hospital (French Hospital), Nazareth, Israel
  • 29 Genetics Institute, Rambam Medical Center, Haifa, Israel
  • 30 Department of Neonatology, Rambam Medical Center, Haifa, Israel
  • 31 Genetics Institute, Samson Assuta University Medical Center, Ashdod, Israel
  • 32 Department of Neonatology, Samson Assuta University Medical Center, Ashdod, Israel
  • 33 Pediatric Genetics Unit, Schneider Children’s Medical Center of Israel, Petach Tikva, Israel
  • 34 Department of Neonatology, Shaare Zedek Medical Center, Jerusalem, Israel
  • 35 Medical Genetics Institute, Shaare Zedek Medical Center, Jerusalem, Israel
  • 36 Department of Neonatology, Shamir Medical Center, Zerifin, Israel
  • 37 Genetics Institute, Shamir Medical Center, Zerifin, Israel
  • 38 The Danek Gertner Institute of Human Genetics, Sheba Medical Center, Tel-Hashomer, Israel
  • 39 Neonatology Department, Sheba Medical Center, Tel-Hashomer, Israel
  • 40 Genetics Institute, Soroka University Medical Center, Be’er Sheva, Israel
  • 41 Department of Neonatology, Soroka University Medical Center, Be’er Sheva, Israel
  • 42 Department of Neonatology, The Hillel Yaffe Medical Center, Hadera, Israel
  • 43 Department of Neonatology, Wolfson Medical Center, Holon, Israel
  • 44 Genetics Institute, Wolfson Medical Center, Holon, Israel
  • 45 Department of Neonatology, Ziv Medical Center, Safed, Israel
  • 46 Genetics Institute, Ziv Medical Center, Safed, Israel

Question   Can rapid trio genome sequencing (rtGS) be deployed in a national public health care setting?

Findings   In this cohort study that included all neonatal intensive care units in Israel, rtGS in 130 neonates suspected of having a genetic disorder revealed a diagnosis in 50% (12 chromosomal and 52 monogenic disorders and 1 uniparental disomy). Immediate precision medicine was offered for 9% of diagnosed participants, and the mean turnaround time for rapid report was 7 days.

Meaning   These findings suggest that clinical rtGS can be implemented in the neonatal acute care setting in a national public health care system.

Importance   National implementation of rapid trio genome sequencing (rtGS) in a clinical acute setting is essential to ensure advanced and equitable care for ill neonates.

Objective   To evaluate the feasibility, diagnostic efficacy, and clinical utility of rtGS in neonatal intensive care units (NICUs) throughout Israel.

Design, Setting, and Participants   This prospective, public health care–based, multicenter cohort study was conducted from October 2021 to December 2022 with the Community Genetics Department of the Israeli Ministry of Health and all Israeli medical genetics institutes (n = 18) and NICUs (n = 25). Critically ill neonates suspected of having a genetic etiology were offered rtGS. All sequencing, analysis, and interpretation of data were performed in a central genomics center at Tel-Aviv Sourasky Medical Center. Rapid results were expected within 10 days. A secondary analysis report, issued within 60 days, focused mainly on cases with negative rapid results and actionable secondary findings. Pathogenic, likely pathogenic, and highly suspected variants of unknown significance (VUS) were reported.

Main Outcomes and Measures   Diagnostic rate, including highly suspected disease-causing VUS, and turnaround time for rapid results. Clinical utility was assessed via questionnaires circulated to treating neonatologists.

Results   A total of 130 neonates across Israel (70 [54%] male; 60 [46%] female) met inclusion criteria and were recruited. Mean (SD) age at enrollment was 12 (13) days. Mean (SD) turnaround time for rapid report was 7 (3) days. Diagnostic efficacy was 50% (65 of 130) for disease-causing variants, 11% (14 of 130) for VUS suspected to be causative, and 1 novel gene candidate (1%). Disease-causing variants included 12 chromosomal and 52 monogenic disorders as well as 1 neonate with uniparental disomy. Overall, the response rate for clinical utility questionnaires was 82% (107 of 130). Among respondents, genomic testing led to a change in medical management for 24 neonates (22%). Results led to immediate precision medicine for 6 of 65 diagnosed infants (9%), an additional 2 (3%) received palliative care, and 2 (3%) were transferred to nursing homes.

Conclusions and Relevance   In this national cohort study, rtGS in critically ill neonates was feasible and diagnostically beneficial in a public health care setting. This study is a prerequisite for implementation of rtGS for ill neonates into routine care and may aid in design of similar studies in other public health care systems.

Genetic disorders and birth defects account for 30% of morbidity and 40% of mortality in neonatal intensive care units (NICUs).1-5 Overlapping clinical features in this age group make reaching a diagnosis by standard-of-care testing challenging. It is hypothesized that early etiologic diagnosis, facilitated by next-generation sequencing, has the potential to revolutionize clinical care, improve prognosis, and offer precision life-saving therapy or aid in palliative care decisions in critically ill neonates.6 Next-generation sequencing has been proven superior to standard-of-care testing in providing an accurate diagnosis.7-9

A recent retrospective analysis10 of 60 diagnosed infants from 5 centers in the Netherlands demonstrated the clinical utility of rapid exome sequencing (rES) for critically ill neonates as defined by increased diagnostic yield, shorter time to diagnosis, and net health care savings. The authors recommended widespread implementation of rES as a first-tier genetic test in critically ill neonates with suspected genetic disorders.10

Earlier studies, including NSIGHT1 and NICUseq, which compared genome sequencing (GS) with standard-of-care testing, have revealed GS to have a higher diagnostic yield even when compared with ES.9,11-13 An additional advantage of GS over ES, shown in the NSIGHT2 study, is the possibility for rapid and ultrarapid turnaround times (TATs) of as soon as 3 days.7 Taken together with the ability to detect diverse genomic variants in a single test, including chromosomal copy-number abnormalities, single-nucleotide variants (SNVs), triplet repeat expansions, uniparental disomy (UPD), and variants in noncoding regions, GS is the preferred comprehensive bedside genomic testing in the critically ill, for whom rapid clinical decisions are needed.14 Furthermore, rapid GS (rGS) was shown to be economically beneficial mainly through significantly shorter lengths of hospital stay in infants with a diagnostic GS test compared with undiagnosed children.15

Prospective studies utilizing next-generation sequencing in NICUs have shown a mean diagnostic rate of approximately 36% in different cohorts. Common indications for ES in early studies included mainly neonates and young infants with congenital anomalies or neurologic phenotypes, while recent studies broadened the indications to include any critically ill child suspected of having a genetic disorder.11,13-17

Diagnostic results affected clinical management decisions in 25% to 65% of patients across cohorts and enabled reproductive planning for 27% of families of diagnosed patients.15,18,19 Although an ultrarapid sequencing and analysis system with scalable diagnosis in less than a day was recently published,20 it has been suggested that a 2- to 3-week diagnostic pipeline is sufficient to impact most clinical decision making.18

Most published studies have been performed in a single center or in a restricted region, such as the rGS Baby Bear and NICUseq projects, each including up to 5 medical centers.11,15 Diagnostic efficacy and change-of-management assessments were comparable in these studies.

In public health care systems, limited resources challenge the widespread implementation of advanced sequencing technologies into routine inpatient clinical practice. Israel has a universal health care system. To our knowledge, we have conducted the first prospective national pilot of trio rGS (rtGS) as a single test for genomic diagnosis in critically ill neonates. Our primary objective was to assess the feasibility and diagnostic efficacy of rtGS for critically ill neonates in the Israeli national health care system. A secondary objective was to assess the clinical utility defined as the outcomes of rtGS for precision medicine and recurrence risk reduction in families.

We performed a prospective national pilot of rtGS in critically ill neonates. The project was a collaboration between the Community Genetics Department in the Israeli Ministry of Health (MOH), the Israeli Association of Medical Genetics, and the Israeli Neonatal Society. All medical genetics institutes (n = 18) and NICUs (n = 25) belonging to the Israeli national health care system participated in the project. The study was approved by the MOH Medical Research Ethics Committee. Written informed consent for GS was obtained from parents (eMethods 1 in Supplement 1). The report follows the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guideline for cohort studies.

All sequencing and data analyses were performed at the Tel-Aviv Sourasky Medical Center (TASMC) Genomics Center. Each NICU was assigned a corresponding genetics institute, either in the same center (n = 18) or from an adjacent medical facility (n = 7). Results were reported to the medical genetics team caring for the infant. The medical genetics team was responsible for disclosing results to the NICU staff and families. Follow-up genetic counseling was planned at the discretion of the medical genetics teams. Clinical utility (Hebrew-translated Clinician-Reported Genetic Testing Utility Index [C-GUIDE]21) and outcome questionnaires were completed by the NICU staff within 14 days of receiving the secondary report (Figure 1A; eMethods 1 in Supplement 1).

Critically ill neonates (n = 130) were selected by the practicing neonatologist. A requisition form (eMethods 1 in Supplement 1) listing inclusion criteria (eTable 1 in Supplement 1) and a medical summary report were reviewed for eligibility by the MOH Community Genetics Department. Only 1 primary inclusion criterion could be selected for each patient, with unlimited secondary category criteria (eTable 1 and eMethods 1 in Supplement 1). Upon approval, written parental informed consent was obtained and trio samples were delivered to the TASMC Genomics Center (Figure 1A). Ethnicity was self-reported and subcategorized as Jewish, non-Jewish, or mixed. Mixed ethnicity was defined as both Jewish and non-Jewish background. Consanguinity was self-reported and defined by a shared common ancestor.

All samples were subjected to the Illumina DNA PCR-Free Library Prep and sequenced on the NovaSeq 6000 (Illumina) using S1/S2 reagent kit version 1.5, 150 × paired-end. Variant analysis was performed using a 2-step approach. Step 1 consisted of primary rapid analysis, which was performed on the TruSight Software Suite (TSS [Illumina]), relying on phenotype-driven variant prioritization using Human Phenotype Ontology (HPO) terms provided by the referring medical geneticist. Variants prioritized as related to the phenotype were manually reviewed by the bioinformatics and pediatric genetics team and classified according to the American College of Medical Genetics and Genomics (ACMG) criteria.22 Inconclusive or suspected candidate results were discussed on a case-by-case basis with the referring team at each site. Pathogenic and likely pathogenic (P/LP) variants and variants of unknown significance (VUS) highly suspected to be causative were reported to the referring geneticist via a rapid report. Step 2 included secondary analysis performed on Franklin data analysis software (Genoox). This step involved reanalysis for undiagnosed cases and analysis for ACMG actionable secondary findings (SFs),23 unless parents opted out. These were reported back to the referring geneticist via a comprehensive final report. Rapid and secondary analyses were compared for compatibility.

Dual diagnosis was defined as cases with more than 1 diagnosis related to the phenotype. Suspected dual diagnosis was defined as cases with 1 P/LP variant and an additional VUS deemed related to the phenotype. Samples were not analyzed for carrier state of autosomal recessive (AR) disorders.

Feasibility was assessed by calculating TAT, defined as the time from sample arrival at the sequencing laboratory to rapid report finalization (Figure 1A). Diagnostic efficacy was calculated for the proportion of diagnostic and negative results for the entire cohort and for the various indications for testing. rtGS results and inheritance patterns were further compared between Jewish and non-Jewish ethnicities. Cases with VUS in a gene associated with the patient’s phenotype, potentially expanding the phenotype or in a novel candidate gene, were categorized as highly suspected to be causative (possibly diagnosed).

All statistical analyses were conducted using R version 4.1.1 (R Project for Statistical Computing). Fisher exact test was used for categorical comparisons due to the presence of small sample sizes in certain categories. For continuous variables, the nonparametric Kruskal-Wallis test was used.24 Statistical significance was set at P < .05, and all tests were 2-tailed.
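The authors ran these tests in R; as an illustration of the underlying computation only (not the study pipeline), the two-sided Fisher exact test for a 2×2 table can be sketched with the Python standard library. This uses the convention of summing the probabilities of all tables, with the same margins, that are no more likely than the observed one, which matches R's fisher.test for 2×2 tables:

```python
from math import comb

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p value for the 2x2 table [[a, b], [c, d]].

    With the margins fixed, the top-left cell follows a hypergeometric
    distribution; the p value sums P(x) over all achievable tables whose
    probability does not exceed that of the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = comb(n, col1)

    def pmf(x: int) -> float:
        # Hypergeometric probability of x successes in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = pmf(a)
    lo, hi = max(0, col1 - row2), min(col1, row1)
    # The (1 + 1e-9) factor guards against floating-point ties.
    return min(1.0, sum(pmf(x) for x in range(lo, hi + 1)
                        if pmf(x) <= p_obs * (1 + 1e-9)))

# A perfectly balanced table is as unextreme as possible, so p is 1;
# a maximally skewed table yields the two most extreme tables only.
print(fisher_exact_two_sided(5, 5, 5, 5))
print(fisher_exact_two_sided(10, 0, 0, 10))
```

The test is preferred over chi-square here precisely because several cell counts in the study's subgroup comparisons are small, where the chi-square approximation breaks down.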

A binary logistic regression (LR) model was built to analyze associations between variables and diagnosis status, chosen for its robustness and suitability for categorical outcomes (eMethods 2 in Supplement 1).

A total of 130 patients were enrolled during a 15-month period (October 2021 to December 2022) (Figure 1B). There were 60 female neonates (46%) and 70 male neonates (54%). Mean (SD) age at referral was 12 (13) days (eFigure 1 in Supplement 1). There were 46 (35%) preterm neonates and 53 (41%) full-term neonates among the 99 with data available. Ethnic backgrounds included Jewish (67 [51%]), non-Jewish (58 [45%]), and mixed ethnicity (5 [4%]) (Table 1; eFigure 2A-C in Supplement 1). The mortality rate within 84 days of enrollment was 27% (29 of 107 for whom data were available). Among diagnosed cases, the genetic diagnosis explained the death in all but one infant, who died of sepsis. The outcome (deceased or alive) did not differ significantly across the diagnosed, possibly diagnosed, and undiagnosed subgroups (Table 1).

Abnormal results were found in 80 neonates (62%), including 65 with P/LP variants (the diagnosed group), 14 with VUS highly suspected as causative, and 1 patient with a candidate novel gene. Negative rtGS results were reported in 50 neonates (38%) (Table 1 and Figure 2A). SNVs were the most common type of disease-causing variant (51 of 65 [78%]), followed by chromosomal abnormalities (12 [18%]), 1 patient with a triplet repeat expansion (in DMPK), and 1 patient with UPD (Figure 2B). SNVs were all exonic, except for 1 case with 2 intronic VUSs (eTable 2 in Supplement 2). The distribution of abnormal variants did not differ significantly between full-term and preterm infants or between males and females. AR (22 [34%]) and de novo (23 [36%]) were the leading inheritance patterns in the diagnosed group, whereas dominant familial disorders were more likely to be classified as VUS (Table 1).

There was a modest but significant difference in the distribution of the diagnosed, undiagnosed, and VUS categories between Jewish and non-Jewish infants (Table 1). A total of 28 of 65 neonates with a diagnosis (43%) were Jewish and 36 (55%) were non-Jewish. Of 15 neonates with a possible diagnosis (ie, VUS), 7 (47%) were Jewish and 7 (47%) were non-Jewish. Of 50 neonates with no diagnosis, 32 (64%) were Jewish and 15 (30%) were non-Jewish (P = .04), indicating a slightly higher diagnostic rate in the non-Jewish group. Dominant de novo disorders predominated in Jewish neonates (16 of 28 [57%] vs 7 of 36 [19%]), in contrast to AR disorders in non-Jewish neonates (19 of 36 [53%] vs 3 of 28 [11%]) (P < .001) (eFigure 2D in Supplement 1). A nonsignificantly higher diagnostic yield was seen in neonates of consanguineous couples (14 of 23 [60%]) compared with those of nonconsanguineous couples (44 of 90 [49%]) (P = .11). None of the detected diagnostic variants or VUSs were known founder mutations in the Israeli population.

A definitive dual diagnosis was confirmed in 1 patient manifesting 2 de novo conditions, Treacher Collins and cleidocranial dysplasia syndromes (eTable 2 in Supplement 2), and a dual diagnosis was suspected in 13 additional cases (20%). Follow-up data were unavailable to confirm causality of the additional VUSs.

Mean (SD) TATs of the rapid and secondary analysis reports were 7.4 (2.7) and 67.6 (26.1) days, respectively. Rapid reports of causative variants (mean [SD] TAT, 6.5 [2.3] days) had significantly shorter TATs than negative cases (mean [SD] TAT, 8.5 [3.3] days) (P = .003) (Table 1).

All variants detected by TSS were also detected by Franklin. In 8 cases, rapid analysis was negative, but secondary analysis revealed 2 pathogenic variants, increasing the diagnostic rate by 2%. Six additional VUSs were also found (eTable 2 in Supplement 2).

Actionable secondary findings23 were detected in 7 families (5%), including known founder variants in BRCA1/2 (eTable 2 in Supplement 2). Incidental findings were reported in 3 families (2%), and familial segregation was offered: a maternally inherited COL4A5 variant led to a subsequent diagnosis of familial Alport syndrome; a paternally inherited premutation-range expanded DMPK allele was identified; and a homozygous VWF variant was found, for which preventive measures were offered.

The distribution of the various primary and secondary inclusion criteria categories among the 3 groups (diagnosed, possibly diagnosed, and undiagnosed) was analyzed. For the primary category, a neurologic phenotype and multiple congenital anomalies were the most common (82 [63%]) (Table 1). No significant differences were found in the distribution of diagnostic variants or VUSs across primary criteria (Table 1). Abnormality of prenatal development or birth (HP:0001197; 36 [28%]) and brain imaging abnormality (HP:0410263; 34 [26%]) were the leading secondary category indications for testing, although they did not characterize a positive or negative rtGS result (eTable 3 in Supplement 1).

The LR model identified 3 secondary category inclusion criteria with a significant association with a positive rtGS result: hepatic failure (HP:0001399), with a coefficient of 2.84 (P = .05); seizures (HP:0001250), with a coefficient of 2.20 (P = .03); and generalized hypotonia (HP:0001290), with a coefficient of 1.53 (P = .04). Conversely, abnormal kidney morphology (HP:0012210), with a coefficient of −3.43, and abnormality of the endocrine system (HP:0000818), with a coefficient of −1.81, each showed a significant correlation with a negative rtGS result (P = .01 and P = .04, respectively) (eAppendix and eFigure 3 in Supplement 1).
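Logistic-regression coefficients are log odds, so exponentiating a coefficient b gives an odds ratio, exp(b). The following sketch applies this standard conversion to the coefficients reported above; the odds ratios themselves are our illustration, not values reported by the authors:

```python
import math

# Coefficients as reported in the Results for the binary LR model.
coefficients = {
    "hepatic failure (HP:0001399)": 2.84,
    "seizures (HP:0001250)": 2.20,
    "generalized hypotonia (HP:0001290)": 1.53,
    "abnormal kidney morphology (HP:0012210)": -3.43,
    "abnormality of the endocrine system (HP:0000818)": -1.81,
}

# exp(b) > 1 means higher odds of a positive rtGS result when the HPO
# term is present; exp(b) < 1 means lower odds.
odds_ratios = {term: math.exp(b) for term, b in coefficients.items()}
for term, odds in odds_ratios.items():
    print(f"{term}: OR = {odds:.2f}")
```

For example, the seizures coefficient of 2.20 corresponds to an odds ratio of about 9, ie, roughly 9-fold higher odds of a diagnostic result, all else equal.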

A possible diagnosis was suggested in 15 cases with a VUS suspected of explaining the phenotype. We applied the model trained on the diagnosed cases to assess the causality of these VUSs. With this model, a calculated probability score greater than 0.5 was assigned a score of 1 and considered supportive of a clinical association between the reported VUS and the patient’s phenotype, whereas a calculated probability score of less than 0.5 was assigned a score of 0, suggesting a negative association (eAppendix in Supplement 1). This approach supported causality of the detected VUSs in nearly two-thirds of the possibly diagnosed group (Table 2; eAppendix in Supplement 1). These probability scores should nonetheless be interpreted cautiously.
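The thresholding rule described above amounts to binarizing the model's predicted probability. A minimal sketch follows; the example scores are hypothetical (the per-case probabilities are not given in the text), and the behavior at exactly 0.5 is our assumption, since the text only specifies the two open intervals:

```python
def classify_vus(probability: float, threshold: float = 0.5) -> int:
    """Binarize a model-predicted probability score.

    Returns 1 (supports a clinical association between the VUS and the
    phenotype) when the score exceeds the threshold, else 0 (suggests a
    negative association). Scores exactly at the threshold map to 0,
    an assumption: the source defines only > 0.5 and < 0.5.
    """
    return 1 if probability > threshold else 0

# Hypothetical probability scores, for illustration only.
example_scores = [0.82, 0.31, 0.67]
print([classify_vus(p) for p in example_scores])  # → [1, 0, 1]
```

In practice, a fixed 0.5 cutoff is the natural default for a balanced binary decision, though calibration on a larger cohort could justify a different threshold.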

The outcomes of rtGS for clinical management were evaluated via questionnaires completed by the NICU staff. The response rate was 82% (107 cases). Among respondents, a change of management was reported for 24 cases (22%), all in the diagnosed group, although in most cases the type of change was not disclosed. rtGS affected decisions regarding invasive procedures in 10 cases (9%). Among diagnosed patients, results led to early tailored medication in 6 of 65 (9%), transfer to nursing care facilities in 2 (3%), and supportive care in 2 (3%) (Table 3).

An increased recurrence risk in future offspring could be confirmed for all inherited disorders in the diagnosed group (30 [46%]), including 22 families with AR disorders, 5 with autosomal dominant disorders, 2 with X-linked disorders, and 1 family with a confirmed parental balanced reciprocal chromosomal translocation (Table 1). A parental balanced reciprocal translocation was highly suspected in an additional case (eTable 2 in Supplement 2), but parental studies were unavailable.

To our knowledge, this is the first national study evaluating the feasibility of rtGS in a public health care setting. Israel has a unique universal health care system, an annual birth rate of approximately 180 000, and an estimated 12% admission rate to NICUs.25 In this collaborative study, we evaluated the feasibility, diagnostic efficacy, and clinical utility of rtGS in the Israeli national public health care system, including 130 critically ill neonates enrolled by NICUs and geneticists throughout the country. A rapid mean TAT of 7 days demonstrated feasibility, with a TAT approximately 2 days shorter in the diagnosed group. This is compatible with the immediate reporting of a diagnosis once a definitive P/LP variant was detected and the tendency to reevaluate VUS and negative cases. We found that rapid return of diagnostic rtGS results is possible on a national scale and within the capacities of a public health care system, in a timeframe adequate for change-of-management decisions without compromising prognosis.11,19

Diagnostic efficacy in our cohort was relatively high at 50% (considering only P/LP variants), compared with the average of 30% to 40% published previously. Several earlier studies have reported high diagnostic rates of 57% to 73%,26,27 but these were conducted in small homogeneous cohorts. Indeed, a recent review of 21 prospective studies encompassing 1654 infants19 indicated a negative correlation between diagnostic rate and cohort size, noting rates of less than 30% in cohorts of 100 or more participants. Our high diagnostic yield may be explained by the unique Israeli population, which still shows high rates of consanguinity and endogamy, as reflected by a high risk of AR disorders and significantly higher diagnostic rates in the non-Jewish participants (eFigure 2D in Supplement 1). Interestingly, we did not detect any known Israeli founder variants; thus, national preconception carrier screening programs would have detected none of the carrier parents.

Most studies limit their reports to P/LP variants alone, although 21 VUSs in 184 participants (11%) were reported in Project Baby Bear. 15 In our cohort, VUSs were likewise suspected of being fully or partially related to the child’s phenotype in 12% of participants (15 of 130) ( Table 3 ). Using an LR model developed on the definitive diagnoses, we were able to support causality in 64% (9 of 15) of these VUSs. Long-term follow-up, including continued phenotyping and familial segregation studies, will shed light on these associations. Applying this novel approach in a larger cohort may aid in the interpretation of VUSs and raise diagnostic rates.
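The article does not detail the LR model's features or training procedure. As a minimal sketch only, assuming LR denotes logistic regression fitted to the definitive (P/LP) diagnoses, and using entirely hypothetical predictors (phenotype-match score, familial segregation, combined in silico score), such a causality-support score could be computed as follows:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, iters=3000):
    """Plain gradient-descent logistic regression (no regularization)."""
    n_feat = len(X[0])
    w = [0.0] * n_feat
    b = 0.0
    m = len(X)
    for _ in range(iters):
        grad_w = [0.0] * n_feat
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi
            for j in range(n_feat):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b

def predict_proba(w, b, x):
    """Probability that a variant is causative, per the fitted model."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical training rows (NOT the study's data): each row is
# (phenotype-match score, familial segregation, in silico score);
# label 1 marks variants from definitive P/LP diagnoses.
X = [
    [0.9, 1, 0.8], [0.8, 1, 0.9], [0.7, 1, 0.7], [0.9, 0, 0.9],
    [0.2, 0, 0.3], [0.1, 0, 0.2], [0.3, 0, 0.1], [0.2, 1, 0.2],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]

w, b = train_logistic(X, y)
# Score an unseen VUS with strong phenotype match and segregation.
score = predict_proba(w, b, [0.85, 1, 0.8])
```

The output is a probability in [0, 1], which could serve as the kind of causality-support score the text describes; the feature set and threshold for "supporting causality" are assumptions, not the authors' method.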

We used a rather narrow definition of clinical utility, focusing on precision medical treatment and family planning possibilities, which are critical for families with a severely affected or deceased child (46% of diagnosed patients) (eTable 2 in Supplement 2). The possibility of offering early tailored medical therapy is expected to substantially affect prognosis. Tailored management could be offered to 9% of our diagnosed patients (6 of 65), including the prompt inclusion of 1 patient in a clinical trial ( Table 3 ). Overall, 22% of utility questionnaire respondents reported a change of management for their patients. This rate is somewhat lower than the 25% to 65% reported in the literature; however, the definition of change of management is not uniform across studies, which affects the reported rate.

A causative genetic variant was not detected in 38% to 50% of our patients. Some of these infants’ phenotypes do not have a genetic etiology, while in others a causative variant may be identified in the future through clinical refinement, periodic reanalysis, new research findings, and/or advanced -omics technologies.

This study has limitations, including possible bias due to the lack of systematic evaluation of all newborns eligible for the study. Participation in the pilot was not mandatory; thus, eligible neonates could have been tested by other means. Referral bias could potentially affect diagnostic rates, eg, in centers serving populations with high rates of consanguinity. Despite this possible limitation, the diagnostic rate did not differ significantly between consanguineous (14 of 65) and nonconsanguineous (44 of 90) families ( P  = .11) but was slightly higher in non-Jewish vs Jewish infants. In addition, clinical utility may be underestimated, as it was assessed short term. Long-term follow-up data regarding outcomes in terms of survival, growth and development, medical procedures, and more will provide additional evidence of the clinical utility of rtGS in ill neonates. Furthermore, there are still limitations to the GS bioinformatics tools, as exhibited in a hypotonic infant with Prader-Willi syndrome (eTable 2 in Supplement 2). Neither TSS nor Franklin detected the maternal UPD, which was subsequently observed on methylation-specific multiplex ligation-dependent probe amplification ordered by the treating geneticist. Retrospective reanalysis revealed a complex mixed maternal UPD spanning approximately 81 megabases on chr15q11.2-q26.3. Following this experience, Franklin improved its algorithm to detect complex heterodisomy. Finally, sound clinical judgment is still required, even in the era of artificial intelligence. A neonate was found to be homozygous for a GBE variant (eTable 2 in Supplement 2) classified as a VUS according to ACMG guidelines 22 ; however, her phenotype and family history were highly compatible with the severe type of glycogen storage disease type IV, and the variant was reclassified as disease causing. The latter 2 cases highlight the importance of ongoing crosstalk between bioinformaticians and treating physicians.
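The consanguinity comparison above rests on a 2×2 test of diagnostic rates between two groups. As an illustration only (the article does not state which test was used, and the counts below are hypothetical, not the study's), a two-sided Fisher exact test can be computed with the Python standard library:

```python
from math import comb

def fisher_exact_two_sided(table):
    """Two-sided Fisher exact test for a 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed one.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row1, col1 = a + b, a + c

    def p_table(k):
        # P(first cell = k) under the hypergeometric null
        return comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)

    k_min = max(0, row1 + col1 - n)
    k_max = min(row1, col1)
    p_obs = p_table(a)
    # Small tolerance guards against float round-off in the comparison.
    return sum(p_table(k) for k in range(k_min, k_max + 1)
               if p_table(k) <= p_obs * (1 + 1e-9))

# Hypothetical 2x2 table: diagnosed vs not diagnosed in two groups.
p = fisher_exact_two_sided([[3, 1], [1, 3]])  # 0.4857 (34/70)
```

A chi-square test would behave similarly for large counts; Fisher's exact test is the usual choice when any expected cell count is small.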

In this study, we found that rapid TAT of approximately 7 days was feasible in a national public health care setting with a 50% to 62% diagnostic rate, engaging both NICU staff and medical geneticists in a collaborative model. Early precision medicine, made available via rtGS, holds promise for significant clinical utility in this patient group. Following the success of the Baby Bambi national pilot, rtGS in critically ill neonates has been implemented in routine clinical care.

Accepted for Publication: December 15, 2023.

Published: February 22, 2024. doi:10.1001/jamanetworkopen.2024.0146

Open Access: This is an open access article distributed under the terms of the CC-BY-NC-ND License. © 2024 Marom D et al. JAMA Network Open.

Corresponding Author: Daphna Marom, MD, The Genetics Institute and Genomics Center, Tel Aviv Sourasky Medical Center, Weitzman 6, 6423906 Tel Aviv, Israel ( [email protected] ).

Author Contributions: Drs Marom and Feldman had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.

Concept and design: Marom, J. G. Cohen, Morhi, Smolkin, L. Cohen, Zangen, Shalata, Peleg, Lavie-Nevo, Mandel, Felszer-Fisch, Fleisher Sheffer, Eventov-Friedman, Ben-Yehoshua, Omari, Globus, Yaron, Singer, Baris Feldman.

Acquisition, analysis, or interpretation of data: Marom, Mory, Reytan-Miron, Amir, Kurolap, J. G. Cohen, Riskin, Chervinsky, Falik-Zaccai, Rips, Ofek Shlomai, Shporen, Ben-Yehoshua, Simmonds, Yaacobi, Bauer-Rusek, Weiss, Hochwald, Koifman, Batzir, Segel, Morag, Reish, Eliyahu, Leibovitch, Schwartz, Abramsky, Hochberg, Oron, Banne, Portnov, Samra, Singer.

Drafting of the manuscript: Marom, Mory, L. Cohen, Zangen, Peleg, Lavie-Nevo, Fleisher Sheffer, Eventov-Friedman, Omari, Globus, Yaron, Morag, Leibovitch, Hochberg, Oron, Portnov.

Critical review of the manuscript for important intellectual content: Marom, Reytan-Miron, Amir, Kurolap, J. G. Cohen, Morhi, Smolkin, Zangen, Shalata, Riskin, Mandel, Chervinsky, Felszer-Fisch, Falik-Zaccai, Rips, Ofek Shlomai, Eventov-Friedman, Shporen, Ben-Yehoshua, Simmonds, Yaacobi, Bauer-Rusek, Weiss, Hochwald, Koifman, Batzir, Segel, Reish, Eliyahu, Schwartz, Abramsky, Hochberg, Banne, Samra, Singer, Baris Feldman.

Statistical analysis: Reytan-Miron, Amir, Ben-Yehoshua.

Obtained funding: Portnov, Baris Feldman.

Administrative, technical, or material support: Marom, Mory, Reytan-Miron, J. G. Cohen, Morhi, L. Cohen, Shalata, Peleg, Lavie-Nevo, Chervinsky, Felszer-Fisch, Falik-Zaccai, Rips, Eventov-Friedman, Simmonds, Yaacobi, Omari, Hochwald, Koifman, Globus, Leibovitch, Abramsky, Hochberg, Oron, Singer, Feldman.

Supervision: Marom, J. G. Cohen, Morhi, Smolkin, Zangen, Riskin, Mandel, Felszer-Fisch, Ofek Shlomai, Hochwald, Yaron, Segel, Reish, Baris Feldman.

Conflict of Interest Disclosures: Dr Marom reported receiving compensation for next-generation sequencing wet work from the Israeli Ministry of Health and receiving next-generation sequencing reagents, bioinformatics solution, editing assistance, and open access publication fee from Illumina during the conduct of the study and receiving speaking fees from Illumina Shine the Light Event and XPeer outside the submitted work. Dr Amir reported receiving compensation for next-generation sequencing wet work from the Israeli Ministry of Health and next-generation sequencing reagents, bioinformatics solution, editing assistance, and open access publication fee from Illumina during the conduct of the study. Dr Baris Feldman reported compensation for next-generation sequencing wet work from the Israeli Ministry of Health and next-generation sequencing reagents, bioinformatics solution, editing assistance, and open access publication fee from Illumina during the conduct of the study and receiving honorarium and travel expenses from Illumina and speaker’s fee from XPeer channel educational video. No other disclosures were reported.

Funding/Support: The study was sponsored by a collaboration between the Israeli Ministry of Health, Illumina Inc, and the Genomics Center at the Tel-Aviv Sourasky Medical Center. The study was partially supported by Illumina Inc, with reagents, bioinformatic solutions, and editorial assistance.

Role of the Funder/Sponsor: The funders had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.

Data Sharing Statement: See Supplement 3 .

Additional Contributions: We thank Raye Alford, PhD, and Stacie Taylor, PhD (Illumina, Inc), for their support in language editing and formatting according to journal requirements and style. Written permission to include their names was obtained, and they were not compensated for their time outside of their usual salary.

National Rapid Genome Sequencing in Neonatal Intensive Care

Key Points. Question: Can rapid trio genome sequencing (rtGS) be deployed in a national public health care setting? Findings: In this cohort study that included all neonatal intensive care units in Israel, rtGS in 130 neonates suspected of having a genetic disorder revealed a diagnosis in 50% (12 chromosomal and 52 monogenic disorders and 1 uniparental disomy).