# Research

## Results

#### Probabilistic analysis of systems

In 2021 the activities have been concentrated around different topics related to the analysis of distributed systems. We also continued the study of the transient regime of Markov models, and we also worked on some aspects of cellular networks in the area of this subsection.

Probabilistic analysis of population protocols. The computational model of population protocols is a formalism that allows the analysis of properties emerging from simple and pairwise interactions among a very large number of anonymous finite-state agents. We analyze in 38 a new asynchronous rumor spreading protocol to deliver a rumor to all the nodes of a large-scale distributed network. This protocol relies on successive pull operations involving $k$ different nodes, with $k=2$ or $k=3$ , and called $k$ -pull operations. Specifically during a $k$ -pull operation, an uninformed node $a$ contacts $k–1$ other nodes at random in the network, and if at least one of them knows the rumor, then node $a$ learns it. We perform a detailed study in continuous-time of ${\Theta }_{k,n}$ , the total time needed for all the $n$ nodes to learn the rumor. We obtain, for $k\in \left\{2,3\right\}$ , the mean value, the variance and the distribution of ${\Theta }_{k,n}$ together with their asymptotic behavior when the number of nodes $n$ tends to infinity.

These results are extended in the discrete-time case in 27, where we propose and analyze the general case of a $k$ -pull operation, with $k\ge 2$ . We perform a thorough study of the total number ${T}_{k,n}$ of $k$ -pull operations needed for all the $n$ nodes to learn the rumor. We compute the expected value and the variance of ${T}_{k,n}$ , together with their limiting values when $n$ tends to infinity. We also analyze the limiting distribution of $\left({T}_{k,n}–E\left({T}_{k,n}\right)\right)/n$ and prove that it has a double exponential distribution when $n$ tends to infinity. Finally, we show that when $k>2$ , our new protocol requires less operations than the traditional 2-push-pull and 2-push protocols by using stochastic dominance arguments. All these results generalize the standard case $k=2$ .

The average-based distributed algorithms, analysed in 24, rely on simple and pairwise random interactions among a large and unknown number of anonymous agents. This allows the characterization of global properties emerging from these local interactions. Agents start with an initial integer value, and at each interaction keep the average integer part of both values as their new value. The convergence occurs when, with high probability, all the agents possess the same value, which means that they all know a property of the global system. Using a well chosen stochastic coupling, we improve upon existing results by providing explicit and tight bounds of the convergence time. We apply these general results to both the proportion problem and the system size problem.

Collecting data in large scale systems. We propose and analyse in 46 the performance and the vulnerability to attacks of three algorithms for collecting longitudinal data in a large scale system. A monitoring device is in charge of continuously collecting measurements from end-devices. The communication graph is connected but not necessarily complete. For scalability reasons, at each collect, a single end-device is randomly selected among all the end-devices to send the content of its local buffer of data to the monitoring device. Once sent, the end-device resets its buffer, and resumes its measurement process. Two of the three algorithms are randomized algorithms while the third one is deterministic. We study the transient and stationary maximum load distribution at end-devices when collects are made using the first and third algorithm, and we provide bounds via a coupling argument when the second algorithm is used. While the third algorithm provides the best performance, it is highly vulnerable to attacks.

Data streaming and sampling. Distributed systems increasingly require the processing of large amounts of data, for metrology, safety or security purposes. We consider the problem of achieving uniform node sampling in large scale systems in presence of Byzantine nodes. The uniform node sampling service offers to applications using it a single simple primitive that returns, upon invocation, the identifier of a random node that belongs to the system. We first propose an omniscient strategy that processes on the fly an unbounded and arbitrarily biased input stream made of node identifiers exchanged within the system, and outputs a stream that preserves the uniformity property. Informally, uniformity states that any node in the system should have the same probability to appear in the sample of any correct node of the system. We show through a Markov chain analysis that this property holds despite any arbitrary bias introduced by the adversary. We then propose a strategy based on a sketch data structure that is capable of approximating the omniscient strategy without requiring any prior knowledge on the composition of the input stream. We show through both theoretical analysis and extensive simulations that this “knowledge-free” strategy accurately approximates the omniscient one. We evaluate the resilience of the knowledge-free strategy by studying two representative attacks (flooding and targeted attacks). We quantify the minimum number of identifiers that Byzantine nodes must insert in the input stream to prevent uniformity. Finally, we propose a new construction that processes each input stream with sketches put in series that allows to both increase the accuracy of a single sketch and decrease the time to converge to a uniform output stream. To our knowledge, such a work has never been proposed before 17.

Phenotypic characteristics of a plant specie refer to its physical properties as cataloged by plant biologists at different research centers around the world. Clustering species based upon their phenotypic characteristics is used to obtain diverse sets of parents that are useful in their breeding programs. The Hierarchical Clustering (HC) algorithm is the current standard in clustering of phenotypic data. This algorithm suffers from low accuracy and high computational complexity issues. To address the accuracy challenge, we propose the use of Spectral Clustering (SC) algorithm. To make the algorithm computationally cheap, we propose using sampling, specifically, Pivotal Sampling that is probability based. Since application of samplings to phenotypic data has not been explored much, for effective comparison, another sampling technique called Vector Quantization (VQ) is adapted for this data as well. VQ has recently given promising results for genotypic data. The novelty of our SC with Pivotal Sampling algorithm is in constructing the crucial similarity matrix for the clustering algorithm and defining probabilities for the sampling technique. Although our algorithm can be applied to any plant species, we test it on the phenotypic data obtained from about 2400 Soybean species. SC with Pivotal Sampling achieves substantially more accuracy (in terms of Silhouette Values) than all the other proposed competitive clustering with sampling algorithms (i.e. SC with VQ, HC with Pivotal Sampling, and HC with VQ) 28. The complexities of our SC with Pivotal Sampling algorithm and these three variants are almost the same because of the involved sampling. In addition to this, SC with Pivotal Sampling outperforms the standard HC algorithm in both accuracy and computational complexity. We experimentally show that we are up to 45% more accurate than HC in terms of clustering accuracy. The computational complexity of our algorithm is more than a magnitude less than that of HC.

Due to the large sizes of data, clustering algorithms often take too much time. Sampling this data before clustering is commonly used to reduce this time. In this work, we propose a probabilistic sampling technique called cube sampling along with K-Prototype clustering. Cube sampling is used because of its accurate sample selection. K-Prototype is most frequently used clustering algorithm when the data is numerical as well as categorical (very common in today’s time). The novelty of this work is in obtaining the crucial inclusion probabilities for cube sampling using Principal Component Analysis (PCA). Experiments on multiple datasets from the UCI repository demonstrate that cube sampled K-Prototype algorithm gives the best clustering accuracy among similarly sampled other popular clustering algorithms (K-Means, Hierarchical Clustering (HC), Spectral Clustering (SC)). When compared with unsampled K-Prototype, K-Means, HC and SC, it still has the best accuracy with the added advantage of reduced computational complexity (due to reduced data size) 51.

Blockchain and AI. Bitcoin system (or Bitcoin) is a peer-to-peer and decentralized payment system that uses a cryptocurrency named bitcoin (BTC) and was released as open-source software in 2009. Unlike fiat currencies, there is no centralized authority or any statutory recognition, backing, or regulation for BTCs. All transactions are confirmed for validity by a network of volunteer nodes (miners) and after that, a collective agreement is recorded into a distributed ledger “Blockchain”. The Bitcoin platform has attracted both social and anti-social elements. On the one hand, it is social as it ensures the exchange of values, maintaining trust in a cooperative, community-driven manner without the need for a trusted third party. At the same time, it is anti-social as it creates hurdles for law enforcement to trace suspicious transactions due to anonymity and privacy. To understand how the social and anti-social tendencies in the user base of BTC affect its evolution, there is a need to analyze the BTC system as a network. A first study has explored the local topology and geometry of the network during its first decade of existence. The characteristics, local and global network properties of the user’s graph were analyzed at ten intervals between 2009-2020 with a gap of one year 26. Afterwards, we focused on illegal activities using BTC systems. We thus utilize machine learning for identifying these illicit entities and addressed the issue by implementing an ensemble of decision trees for supervised learning. More parameters allow the ensemble model to learn discriminating features that can categorize multiple groups of illicit users from licit ones 11. The proposed model provided a reliable tool for forensic study. In parallel, we conducted an experiment on a dataset of 1216 real-life entities 25.

Transient analysis of Markovian models. Classic performance evaluation using queueing theory is usually done assuming a stable model in equilibrium. However, there are situations where we are interested in the transient phase. In this case, the main metrics are built around the model’s state distribution at an arbitrary point in time. In dependability, a significant part of the analysis is done in the transient phase. In previous work, we developed an approach to derive distributions of some continuous time Markovian models, built around uniformization (also called Jensen’s method), transforming the problem into a discrete time one, and the concept of stochastic duality. This combination of tools provides significant simplifications in many cases. However, stochastic duality does not always exist. Recently, we discovered that an idea of algebraic duality, formally similar to stochastic duality, can be defined and applied to any linear differential system (or equivalently, to any matrix). In this case, there is no limitation, the transformation is always possible. We call it the exponential-dual matrix method. In 56, we describe the limitations of stochastic duality and how the exponential-dual matrix method operates for any system, stochastic or not. These concepts are illustrated throughout our article with specific examples, including the case of infinite matrices. The heart of the content of this chapter was presented in the JMM conference 53. In the same forum, we presented some views on the discrete time case (matrix powers)analyzing specific particluar cases 52.

Cellular Networks with Delay-Tolerant Users. In 29, we analyze the impact of delaying delay-tolerant calls under certain conditions in cellular networks. We propose to queue the call if the user agrees when the terminal has bad radio conditions and the system is loaded. The call is served as soon as radio conditions become good or the current load goes below a given threshold. We model the system as a continuous-time Markov chain, which allows us to compute the blocking probability, the mean waiting time and the mean service time. For instance, numerical results show that when the proportion of users with delay tolerance is 20%, the system can bear 16% more calls with the same blocking probability, and 113% more calls if 80% of users are delay tolerant.

#### Machine learning

Our activities around machine learning tools continue to grow. We describe in this subsection only the approaches proposing new learning methodologies.

Monitoring with Machine Learning. The use of artificial intelligence techniques in monitoring is becoming an essential building block for the new techniques developed in recent years. In this context, we proposed an application in a specific use cases.

Multilabel scene classification has emerged as a critical research area in the domain of remote sensing. Contemporary classification models primarily emphasize on a single object or multiobject scene classification of satellite remote sensed images. These classification models rely on feature engineering from images, deep learning, or transfer learning. Comparatively, multilabel scene classification of Very High Resolution (VHR) images is a fairly unexplored domain of research. Models trained for single label scene classification are unsuitable for the application of recognizing multiple objects in a single remotely sensed VHR satellite image. To overcome this research gap, the current inquiry proposes to fine‐tune state of the art Convolutional Neural Network (CNN) architectures for multilabel scene classification. The proposed approach pre-trains CNNs on the ImageNet dataset and further fine‐tunes them to the task of detecting multiple objects in VHR images. To understand the efficacy of this approach, the final models are applied on a VHR data base: the UCMERCED image dataset containing 21 different terrestrial land use categories with a submeter resolution. The performance of the final models is compared with the graph convolutional network‐based model by Khan et al. From the results on performance metrics, it was observed that the proposed models achieve comparable results in significantly fewer epochs 22.

Mixing Reservoir Computing end Evolutionary algorithms. Reservoir Computing models are a class of recurrent neural networks that have enjoyed recent attention, in particular, their main family, Echo State Networks (ESNs). These models have a large number of hidden-hidden weights (in the so-called reservoir) forming a recurrent topology. The reservoir is randomly connected with fixed weights during learning: only readout parameters (from reservoir to output neurons) are trained; the reservoir weights are frozen after initialized. Since the reservoir structure is fixed during learning, only its initialization process has an impact on the model’s performance. In 32, we introduce an evolutionary method for adjusting the reservoir non-null weights. The evolutionary process runs on the frequency space corresponding to a Fourier transformation of the weights. This allows to reduce the dimension of the space where learning takes place. We combine an evolutionary search in the Fourier space with supervised learning for the readout weights. The resulting algorithm, called EvoESN (Evolutionary ESN), obtains promising results modeling two well-known problems of the chaotic time-series area.

Combining FFT/LSTM for Time-series forecasting. Over the last few years, networks’ infrastructures are experiencing a profound change initiated by Software Defined Networking (SDN) and Network Function Virtualization (NFV). In such networks, avoiding the risk of service degradation increasingly involves predicting the evolution of metrics impacting the Quality of Service (QoS), in order to implement appropriate preventive actions. Recurrent neural networks, in particular Long Short Term Memory (LSTM) networks, already demonstrated their efficiency in predicting time series, in particular in networking, thanks to their ability to memorize long sequences of data. In 35, we propose an improvement that increases their accuracy by combining them with filters, especially the Fast Fourier Transform (FFT), in order to better extract the characteristics of the time series to be predicted. The proposed approach allows improving prediction performance significantly, while presenting an extremely low computational complexity at run-time compared to classical techniques such as Auto-Regressive Integrated Moving Average (ARIMA), which requires costly online operations.

Collaborative Exploration and Exploitation in massively Multi-Player Bandits. In 57, we propose an approach to optimize the performance of Internet of Things (IoT) networks. We formulate the optimization problem as a massive multi-player multi-armed bandit problem, where the devices are the players and the radio channels are the arms, with collisions possibly preventing message reception. For handling a realistic IoT network, we do not assume that sensing information is available (i.e. that the collision are observed) or that the number of players is smaller than the number of arms. As the optimization problem is intractable, we propose two greedy policies: the first one focusing on the number of successful communications, while the second one also takes into account fairness between players. In order to implement an approximation of the targeted policies, we propose an explore-then-exploit approach, and establish a regret lower bound. For estimating the mean reward of arms, we propose a decentralized exploration algorithm with controlled information exchanges between players. Then we state that the regret of the estimated target policy is optimal with respect to the time horizon T. Finally, we provide some experimental evidences that the proposed algorithms outperform several baselines.

Federated learning in security management for IT and OT. The Internet of Things has begun to spread over a variety of domains, including industry and finance. It represents an increasing threat for both IT and OT. The lack of collaboration results in the same attacks targeting different organizations one after the other. Often employed as an answer to this problem, cyber threat-intelligence sharing induces its own set of challenges: trust, privacy, and traceability. This work 54 takes advantages of a distributed sharing-oriented architecture and to enhance the security of industrial infrastructures. We study Federated Learning algorithms to build a distributed, autonomic system for detecting and characterizing attacks, as well as providing countermeasures. Experiments on real-world testbeds at the chair Cyber CNI allow us to validate the theoretical assumptions against realistic infrastructures and scenarios, fitting industrial use-cases.

#### Future networks and architectures

The general field dealing with next generation networks and architectures analyzes the architectural evolutions of networks while addressing the need to develop new algorithms to support the new functions of the network.

In 2021, we still have had a particular focus on network resources’ orchestration, in particular through network slicing. We have also continued our work on the monitoring of these networks by trying to minimize the costs of monitoring while allowing better detection of failures.

Network slicing and resources’ orchestration. Resource allocation of 5G and beyond 5G (B5G) network slices is one of the most important challenges for network operators. Network slicing mainly consists in placing constrained services, which are typically expressed in the form of a Virtual Network Function-Forwarding Graph (VNF-FG). Several solutions have already been proposed in the literature to address these types of problems. However, in these approaches, past placement experiences yield no benefit to new placements (i.e., noting is learnt from the obeserved past).

In 47, we propose a highly reliable solution for systematically placing network services, touching the optimal results while maintaining the scalability, making it suitable for online scenarios with strict time constraints. We organized our solution as a Branch and Bound search structure, which leverages Artificial Intelligence (AI) search strategies (Especially A-Star) to address the placement problem, following the popular objective of Service Acceptance (SA). Extensive empirical analysis has been carried out and the results confirm a significant improvements compared to existing work.

In 62, we introduced a platform for dynamic virtual network embedding. The proposed solution is based on a combination of a deep reinforcement learning strategy and a Monte Carlo (MC) approach. The main idea is to learn to generate, using a Deep Q-Network (DQN), a distribution of the placement solution, on which a MC-based search technique is applied. This makes the agent’s exploration of the solution space more efficient.

Due to the exploration-exploitation dilemma, the solutions obtained by DRL-based approaches can be infeasible. This can lead to reliability concerns. To overcome this issue, we combine, in 43, DRL and relational Graph Convolutional Neural (GCN) networks in order to automatically learn how to improve the quality of heuristics in the placement of services. Simulation results show the effectiveness of our procedure. Starting with an initial solution given by the heuristics it can find an improvement of about 35% on average.

To address the diversity of use cases envisioned by the 5G technology, it is critical that the design of the future network allows maximum flexibility and cost effectiveness. This requires that network functions should be designed in a modular fashion to enable fast deployment and scalability. This expected efficiency can be achieved with the cloud native paradigm where network functions can be deployed as containers. Virtualization tools such as Kubernetes offer multiple functionalities for the automatic management of the deployed containers hosting the network functions. These tools must be applied efficiently to improve the network functions availability and resilience. Our paper 41 focuses on resource allocation in a Kubernetes infrastructure hosting different network services. The objective of the proposed solution is to avoid resource shortage in the cluster nodes while protecting the most critical functions. A statistical approach is followed for the modeling of the problem as well as for its resolution, given the random nature of the treated information.

Future networks monitoring. Network Slicing (NS) is a key technology that enables network operators to accommodate different types of services with varying needs on a single physical infrastructure. Despite the advantages it brings, NS raises some technical challenges, mainly ensuring the Service Level Agreements (SLA) for each slice. Hence, monitoring the state of these slices will be a priority for ISPs.

In 42, a new monitoring procedure for anomaly localization, customized for NFV-based network infrastructures deployed with the Service Function Chaining (SFC) mechanism, one of the most important key enablers for NFV networks. Our solution allows the deployment of efficient probing schemes that guarantee the localization of multiple simultaneously failed nodes with a minimum cost. This is formulated as a graph matching problem and solved with a max-flow approach. Simulations show that our solution localizes the failed nodes with a small rate of false positives and false negatives.

To overcome high measurements overhead, network tomography (NT) is a promising solution, in future virtualised networks. Indeed, network tomography consists of a set of methods to infer unmeasured network metrics using end-to-end measurements between monitors. In 45, we focus on inferring the additive metrics of slices such as delays or logarithms of loss rates. We model the inference task as a regression problem that we solve using neural networks. In our approach, we train the model on an artificial dataset. This not only avoids the costly process of collecting a large set of labeled data but has also a nice covering property useful for the procedure’s accuracy. Moreover, to handle a change on the topology or the slices we monitor, we propose a solution based on transfer learning in order to find a trade-off between the quality of the solution and the cost to get it. Simulation results with both, emulated and simulated traffic show the efficiency of our method compared to existing ones in terms of both accuracy and computation time.

NT-based solutions require constraining monitoring traffic to follow specific paths, which we can achieve by using segment-based routing (SR). This allows deploying customized probing scheme, such as cycles’ probing. A major challenge with SR is, however, the limited length of the monitoring path. In 44, we focus on the complexity of that task and propose MonGNN, a standalone solution based on Graph Neural Networks (GNNs) and genetic algorithms to find a trade-off between the quality of monitors’ placement and the cost to achieve it. Simulation results show the efficiency of our approach compared to existing methods.

Toward operative QoE models for streaming services. The measurement of the Quality of Experience (QoE) of multimedia streaming services (MMSS) over IP networks may be done using objective QoE models. These are mathematical functions transforming metrics from technical to user domains. An interesting category of QoE models predicts QoE scores of MMSS at runtime, letting use them for the monitoring operation. This requires integrating them inside the production environments, i.e., where the actual MMSS are consumed by end-users. This aspect is often neglected by QoE modelers that focus mainly on the accuracy and fitness of their designed models with respect to a given set of settings and conditions. As a consequence, a considerable technical effort should be made in order to bring them from the laboratory to the production environments. This obviously discourages MMSS providers to easily accept and adopt them.

For the sake of enhancing QoE models integration, in 37 we propose Mesukal, a software-layer ensuring portability of QoE models over a variety of underlying MMSS, e.g. YouTube or Netflix. Specifically, Mesukal acts as a Java Virtual Machine (JVM) enabling to build portable applications over different OS, e.g. Windows, Linux or MacOS. Mesukal can be considered as a virtual MMSS that is able to seamlessly interact with QoE models, on the one hand, and arbitrary real MMSS, on the other hand. Each considered MMSS over IP networks is appropriately virtualized by a dedicated Mesukal App. Besides real MMSS, Mesukal can be used to instantiate experimental MMSS where the accuracy and portability of QoE models may be inspected and checked under controlled conditions. The inputs needed by the concerned QoE models are fetched from each real MMSS using probes that are tailored following the technology used by the considered multimedia service. In addition, Mesukal includes a rich GUI dashboard that enables to inspect QoE results.

#### Wireless Networks

The general domain dealing with wireless networks covers the optimization of these networks at multiples levels.

In 2021, we have continued our activities on improving the support of a large (i.e., massive) number of wireless IoT devices. We also continued our activities on improving video streaming in these networks through the improvement of multi-homing support.

Optimizing wireless IoT Networks. Driven by various services and applications, Machine Type Communications (MTC) will become an integral part of our daily life over the next few years. Meeting the ITU-T requirements, in terms of density, battery longevity, coverage, price, and supported mechanisms and functionalities, Cellular IoT, and particularly Narrowband-Internet of Things (NB-IoT), is identified as a promising candidate to handle massive MTC accesses. However, this massive connectivity would pose a huge challenge for network operators in terms of scalability. Indeed, the connection to the network in cellular IoT passes through a random access procedure and a high concentration of IoT devices would, very quickly, lead to a bottleneck. The latter procedure needs, then, to be enhanced as the connectivity would be considerable. With this in mind, we propose, in 55, to apply the access class barring (ACB) mechanism to regulate the number of devices competing for the access. In order to derive the blocking factor, we formulate the access problem as a Markov decision process that we were able to solve using one of the most advanced deep reinforcement learning techniques. The evaluation of the proposed access control, through simulations, shows the effectiveness of our approach compared to existing techniques such as the adaptive one and the Proportional Integral Derivative (PID) controller. Indeed, it manages to keep the proportion of access attempts close to the optimum, despite the lack of accurate information on the number of access attempts.

Despite the multiple benefits that such technology offers, the quick depletion of sensors’ battery power represents a major concern, mainly due to the extensive computational tasks and communication operations performed by individual sensors. Indeed, the cost of replacing batteries can be prohibitively expensive, especially when sensors are deployed in areas where access is difficult, in urbanized cities. To extend sensors’ lifetime, we proposed in 30 a new variant of the LEACH protocol named LEACH enhanced with probabilistic cluster head selection (LEACH-PRO). LEACH-PRO introduces several measures to extend WSNs nodes’ lifetime such as cluster head node selection using a probabilistic function based on maximum residual energy and minimum distance to the sink. The obtained simulation results have proven the supremacy of LEACH-PRO over LEACH and direct transmission protocol in terms of the achieved network lifetime and the generated traffic overhead. Most importantly, LEACH-PRO will significantly extend the sensors’ lifetime, which would make this type of deployment more viable in smart city scenarios.

Enhancing dynamic adaptive streaming over HTTP for multi-homed users. Nowadays, multimedia streaming traffic reaches 71% of the mobile data traffic over the world and most of the multimedia services use Dynamic adaptive streaming over HTTP (DASH) to adjust video delivery to the dynamic network environment and achieve higher user Quality of Experience (QoE) levels. Moreover, 90% of the video traffic is consumed by smart devices equipped with multiple network interfaces (Wifi, 3G, and 4G) known as multi-homed devices. In 20, we proposed a survey analyzing the protocols, the mechanisms, and the latest standards proposed in the literature for improving the performance and quality of video content in multipath and multihomed overlay networks. Multipath is a broader term, but in the context of this survey, we define multipath as enhancing network routing technique by using various paths that are not necessarily completely disjoint. Most existing surveys are specialized in one specific domain area related to multipath or multihoming. This study covers the research proposals at the different layers/sublayers of an overlay network from transport to the application and extends to cover the latest technologies like machine learning, Fog and Mobile Edge computing, VR 360 video, and the Internet of Multimedia Things (IoMT). As such, our work tries to be as comprehensive as possible to relate multipath and multihoming research solutions for video streaming to the current and emerging video streaming technologies.

For enhancing the video quality received by “multi-homed client” (i.e., clients with multiple network interfaces), we proposed, in 21, a network selection algorithm based on the Multi-Armed Bandit heuristic on top of the DASH protocol. For choosing the best network at each progression powerfully, DASH gives mobile video quality dependent on the apparent exhibition from the pre-owned system association through the Adaptive Bitrate Rules (ABR) without investigating the system conditions through the other network(s) which could give better quality. Subsequently, a few alterations for DASH ABR is required to enhance the video quality. Two of the MAB algorithms (UCB and Epsilon Greedy) were embraced for improving MPEG-Dash. The investigations are performed through a proving ground execution to show that UCB surpasses Epsilon Greedy, in stable system conditions, regarding goodput received by the Dash customer. Additionally, UCB can discover the harmony between investigating new choices, and abusing the triumphant variation. Index Terms-DASH, Multi-homed, video streaming .

Indoor localization with FTM. Metric Multidimensional Scaling is commonly used to solve multi-sensor location problems in 2D or 3D spaces. In this paper, we show that such technique provides poor results in the case of indoor location problems based on 802.11 Fine Timing Measurements, because the number of anchors is small and the ranging error asymmetrically distributed. We then propose a two-step iterative approach based on geometric resolution of angle inaccuracies 49. The first step reduces the effect of poor ranging exchanges. The second step reconstructs the anchor positions, starting from the distances of highest likely-accuracy. We show that this geometric approach provides better location accuracy results than other Euclidean Distance Metric techniques based on Least Square Error logic. We also show that the proposed technique, with the input of one or more known location, can allow a set of fixed sensors to auto-determine their position on a floor plan 19.

Because it is unauthenticated and unprotected, our experiments indicate that an adversary can implement ranging and location attacks, causing an unsuspecting client to incorporate forged values into its location computation. FTM clients tend to range against a small set of responders (top 3 to 6 responders with strongest signal). Once ranges have been collected, the client can compute its location using various techniques, such as 3-sphere intersection, matrix error minimization techniques or Kalman filter. Irrespective of the technique, we show in this paper that an attacker can cause a ranging client to deviate from its intended path, which can have dire consequences in some settings (e.g., automatic shuttle in public venue causing damages). We also show that protection intended for attacks on comparable ranging techniques, like GPS, are ineffective in the case of FTM 36. We finally show that protection intended for attacks on comparable ranging techniques, like GPS, are ineffective in the case of FTM. However, we also show that a crowd-sourcing technique that confirms that one AP is known by the others can mitigate the attack exposure 50.

#### Network Economics

The general field of network economics, analyzing the relationships between all actors of the digital economy, has been an important subject for years in the team.

In 2021, we still have had a particular focus on network neutrality issues (basically saying that in the network all flows/packets should be “considered” equal), but trying to look at them from original perspectives, and investigating so-called grey zones not yet addressed in the debate. The problems have been addressed from the following angles.

Non neutrality can be pushed by content providers. There is a trend for big content providers such as Netflix and YouTube to give grades to ISPs, to incentivize those ISPs to improve at least the quality offered to their service; this behavior can be seen as an incentive to be favored by ISPs and therefore not stick to neutrality rules as usually pushed by content providers. We have designed in 23 a model analyzing ISPs’ optimal allocation strategies in a competitive context and in front of quality-sensitive users. We have showed that the optimal strategy is non-neutral, that is, it does not allocate bandwidth proportionally to the traffic share of content providers. On the other hand, we have showed that non-neutrality does not benefit ISPs but is surprisingly favorable to users’ perceived quality.

Apparent difficulties to apply neutrality principles in next generation networks. Slicing is seen as one of the key characteristics of 5G and beyond networks but seems in apparent contradiction with neutrality principles promoted worldwide. We have discussed in 59 the two contradictory but considered compulsory notions.

Heterogeneity of neutrality rules. Network neutrality has recently been repealed in the United States, leading to a worldwide Internet with different imposed policies. We have built and analyzed in 31 a game-theoretic model representing the interactions between users, network providers and content providers in this heterogeneous regulation context, and investigated the impact of two neutrality relaxation policies in a part of the world on all actors, compared with a fully-neutral network. Our results show that neutrality repeal may only favor the ISP in the differentiation-authorized zone, but no other actor, and that it can be worse off for everybody if the regulation procedures are very strict in the neutral area.

Monitoring neutrality. Network Neutrality is protected by law in many countries over the world, but monitoring has to be performed to ensure operators conform to the rules. The Wehe application, jointly developed by Northeastern University and the French regulator ARCEP, allows users to take measurements and analyze them to detect possible traffic differentiation. The paper 33 presents a test bed designed to evaluate the detection capacities of Wehe; by computing the detection probabilities and estimating the potential benefit of an operator willing to differentiate while avoiding detection, we fine-tune and compare the main differentiation types (throughput, packet loss and delay) that an operator could implement.

Neutrality of search engines. Neutrality should not be restricted to access providers: it concerns all other intermediaries between end users and content. Following this idea, the search neutrality debate has appeared from content or service providers complaining about being discriminated and therefore losing market shares due to an unfairly low ranking given by search engines. Those questions stress the need for methodologies and tools to verify bias in search engine rankings and analyze their potential impact. We have developed in 58 and the popularization paper 63 a simple yet effective framework comparing the results of existing search engines. We have presented statistical tests based on outlier detection pointing out potential biases, and introduce two meta engines aiming at reducing bias. All this is implemented in a publicly-available tool from which extensive comparisons and bias investigations are carried out.

Estimation of spectrum valuation for 5G dynamic frequency. The high data rates and diversity of services in 5G require a flexible and efficient use of all the available frequencies. In 5G networks, new approaches of dynamic spectrum sharing will be deployed, allowing Mobile Network Operators (MNOs) to access other incumbents’ spectrum, after obtaining a license from the regulator. The attribution of licenses will be made via auction mechanisms valid for given geographical areas and durations. To determine how to bid, each MNO has to estimate their valuation for spectrum i.e., how much they are willing to pay for it. In 34, we propose a model for estimating that valuation. The model is based on Markov chain modeling of user behavior, to compute the MNO satisfaction as a function of the obtained spectrum, and is illustrated by applying it to real operator data.

#### Monte Carlo

We maintain a research activity in different areas related to dependability, performability and vulnerability analysis of communication systems, using both the Monte Carlo and the Quasi-Monte Carlo approaches to evaluate the relevant metrics. Monte Carlo (and Quasi-Monte Carlo) methods often represent the only tool able to solve complex problems of these types.

Central Limit Theorem for Randomized Quasi-Monte Carlo. Randomized quasi-Monte Carlo (RQMC) can produce an estimator of a mean (i.e., integral) with root mean-square error that shrinks at a faster rate than Monte Carlo’s. While RQMC is often employed to provide a confidence interval (CI) for the mean, this approach implicitly assumes that the RQMC estimator obeys a central limit theorem (CLT), which has not been established for most RQMC settings. To address this, we have provided in 39, and extended in 60, various conditions that ensure an RQMC CLT, as well as an asymptotically valid CI, and examine the tradeoffs in our restrictions. Our sufficient conditions, based on the Lindeberg condition and depending on the regularity of the integrand, generally require that the number of randomizations grows sufficiently fast relative to the number of points used from the low-discrepancy sequence.

Gradient-based optimization and sensitivity analysis. The generalized likelihood ratio (GLR) method is a recently introduced gradient estimation method for handling discontinuities for a wide scope of sample performances. We have put in 40, 61 the GLR methods from previous work into a single framework, simplify regularity conditions for justifying unbiasedness of GLR, and relax some of those conditions that are difficult to verify in practice. Moreover, we have combineed GLR with conditional Monte Carlo methods and randomized quasi-Monte Carlo methods to reduce the variance. Numerical experiments show that the variance reduction could be significant in various applications.

Rare event simulation of regenerative systems. Consider the hitting time $T$ to a rarely visited set $A$ of a regenerative process $X$ . Let $F$ be the cumulative distribution function (CDF) of $T$ , and assume we want to estimate $F$ , along with its $q$ -quantile and conditional tail expectation (CTE). In various asymptotic settings, the distribution of $T/E\left[T\right]$ converges to an exponential as the set $A$ becomes rarer. Thus, we can approximate $F$ by an exponential with mean $E\left[T\right]$ . As the mean $E\left[T\right]$ is unknown, we estimate it via simulation with measure-specific importance sampling to calibrate the approximation. This leads to our so-called exponential estimators of $F$ and the corresponding risk measures. Moreover, as $X$ is regenerative, we can write $T=S+V$ , where $S$ is a geometric sum of lengths of cycles before the one that hits $A$ and $V$ is the time to hit $A$ in the first cycle that visits $A$ . In various asymptotic settings, we also have that $S/E\left[S\right]$ converges weakly to an exponential. As $S$ and $V$ are independent, we can then write the CDF $F$ of $T$ as the convolution of the CDF $G$ of $S$ and the CDF $H$ of $V$ . We then exploit this to construct so-called convolution estimators of $F$ and its corresponding risk measures. We have examined in 48 the behavior of the exponential and convolution estimators. Through simple models, we have shown that the weak convergence to an exponential may hold for $S/E\left[S\right]$ but not for $T/E\left[T\right]$ . Thus, the convolution estimator may be valid, but the exponential estimator may not be. We have also discussed the bias of estimators. Indeed, for moderately rare events, bias could potentially surpass variance. For this reason, we have proposed other estimators, still based on the same regenerative principle, and have compared all of them numerically.

#### A new Importance Sampling approach for large sums of i.i.d. variables.

In 18 we study a well known problem in the area, the estimation of the probability that the sum of  $N$ nonnegative independent and identically distributed random variables falls below a given threshold  $\gamma$ , using importance sampling (IS). We are particularly interested in the rare event regime when  $N$ is large and/or  $\gamma$ is small. To deal with this problem, we have the exponential twisting change of measure, a popular technique that, in most of the cases, compares favorably to other existing estimators. However, it has two main limitations: it assumes the knowledge of the moment generating function of the terms and sampling under the new measure is not straightforward and might be expensive. The aim of this work is to propose an alternative change of measure called Gamma IS, that yields, in the rare event regime, at least the same performance as the exponential twisting technique, without introducing previous limitations. For distributions whose probability density functions (PDFs) are $O\left({x}^{d}\right)$ , as $x\to 0$ and $d>–1$ , we prove that the Gamma IS PDF with appropriately chosen parameters retrieves asymptotically, in the rare event regime, the same performance of the estimator based on the use of the exponential twisting approach. Moreover, in the Log-normal setting, where the PDF at zero vanishes faster than any polynomial, we numerically show that a Gamma IS PDF with optimized parameters clearly outperforms the exponential twisting change of measure. Numerical experiments validate the efficiency of the proposed estimator in delivering a highly accurate estimate in the regime of large  $N$ and/or small  $\gamma$ .

Comments are closed.