Federated Learning: An Overview (Part 1)

What is federated machine learning, or simply federated learning? In this blog, we provide a detailed overview of federated learning, including current practices and bottlenecks. In the second part, we will see how it can be extended beyond the usual server-client approach.

Introduction

The exponential growth in data and the emergence of numerous data sources in recent years have opened up interesting research opportunities. At the same time, they have also led to many challenges in collecting, storing, processing, and communicating data.

In a typical machine learning environment, training data have to be transferred to a single machine or to a data center. This raises privacy and communication concerns, especially when dealing with a large number of users. See Federated Learning, Machine Learning, Decentralized Data.

  • First, the data may be personal, sensitive, or legally protected. Privacy laws such as HIPAA and GDPR restrict the seamless transfer of data from end users.
  • Second, frequent communication with the edges of the network is often slow, unreliable, and expensive, causing high latency and low throughput. Federated learning addresses these two issues in an efficient way.

Federated learning helps edge (client) devices collaboratively learn a shared model without transferring the training data to a central location.

  • The edge devices fetch the current model from the central entity, typically a server.
  • Then, they learn weights based on local data and share the changes as small, focused updates with the central entity.
  • The central entity calculates the average of weights obtained from the different edge devices and updates the existing model.
  • The updated model is communicated back to the edge devices.

These steps happen in an iterative fashion.
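
As a rough structural sketch of these four steps, the Python skeleton below shows one possible orchestration loop. The helper names (`initialize_model`, `select_clients`, `local_update`, `average_weights`) are placeholders introduced here for illustration; hedged sketches of each appear in the step-by-step walkthrough that follows.

```python
# Structural sketch only: each helper below is a placeholder that is
# sketched in the step-by-step walkthrough later in this post.
def run_federated_training(clients, num_rounds, k):
    weights = initialize_model()                       # the central entity initializes the model
    for _ in range(num_rounds):
        selected = select_clients(clients, k)          # pick K edge devices for this round
        updates = [local_update(weights, client)       # each device trains on its local data
                   for client in selected]
        weights = average_weights(updates)             # the central entity averages the updates
        # the updated model is then communicated back to the edge devices
    return weights
```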

An ideal federated learning environment ensures low latency and low power consumption, and addresses privacy efficiently. Also, to enable a more personalized experience, the improved local model can be applied immediately to applications on the edge devices. See Federated Learning: Collaborative Machine Learning without Centralized Training Data.

Client-Server approach

Federated learning provides a collaborative machine learning environment. Here, the training process is distributed among a selected number of edge devices, with the central entity playing a minimal role. The federation of edge devices performs most of the training tasks. The central entity is responsible for distributing the current model and updating it by averaging the focused updates supplied by the edge devices. The key feature is that the training data never leave the edge devices; only the learned weights are shared with the central entity. See Federated Learning.

Typical set-up

A typical federated learning environment involves the following steps. See Federated learning: distributed machine learning with data locality and privacy.

1. The central entity initializes the model and selects K edge devices

Figure (i). Federated Learning - first phase

In the first phase, the central node initializes a model. This can be an arbitrary initialization or follow the initialization strategies of particular model types such as a linear model, a support vector machine, or a neural network. The best practice is to use publicly available data. For example, when creating a model for predicting the next word in a sequence, it is a good idea to start from data such as Wikipedia. After initializing the model, the central entity shares it with a number of randomly selected edge devices, as illustrated in Figure (i).
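
As a minimal sketch of this phase, the snippet below (assuming a simple linear model stored as a NumPy weight vector) initializes the weights and randomly selects K participating devices; the initialization scale and the use of NumPy are illustrative choices, not part of any particular federated learning system.

```python
import numpy as np

def initialize_model(num_features, rng=np.random.default_rng(0)):
    # Arbitrary small random initialization for a simple linear model.
    return rng.normal(scale=0.01, size=num_features)

def select_clients(clients, k, rng=np.random.default_rng(1)):
    # Randomly select K edge devices to participate in this round.
    idx = rng.choice(len(clients), size=min(k, len(clients)), replace=False)
    return [clients[i] for i in idx]
```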

2. Edge devices update the weights and communicate with the central entity

Figure (ii). Federated Learning - second phase

Upon receiving the initial model from the central entity, each edge device trains the model using locally available data and calculates the weights, as illustrated in Figure (ii). Normally, many iterations of an algorithm such as gradient descent are required to obtain the optimal weights. But to keep the federated model generic, the central entity communicates the number of iterations the edge devices should run.

Once the iterations are over, the edge devices transfer the updated weights to the central entity.
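
A hedged sketch of the on-device step for a least-squares objective: the device runs the fixed number of gradient-descent iterations dictated by the central entity on its local data and returns the updated weights together with its local sample count (useful later for weighted averaging). The loss, learning rate, and iteration count are illustrative assumptions.

```python
import numpy as np

def local_update(global_weights, local_X, local_y, num_iters=5, lr=0.01):
    # Start from the model received from the central entity.
    w = global_weights.copy()
    n = len(local_y)
    for _ in range(num_iters):                          # iteration count set by the central entity
        grad = (2.0 / n) * local_X.T @ (local_X @ w - local_y)   # least-squares gradient
        w -= lr * grad
    return w, n                                         # updated weights and local sample count
```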

3. The central entity updates the weights of its model based on the weight updates from the edge devices

Figure (iii). Federated Learning - third phase

To build a federated model, the central entity uses the updated weights received from the edge devices, as represented in Figure (iii). A common way to combine the models is to calculate the average of the weights. Sometimes a weighted average is used instead.
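
Both aggregation variants can be sketched as below, assuming each edge device returned a `(weights, sample_count)` pair as in the previous snippet: a plain average, and a weighted average where each device contributes in proportion to its number of local training examples.

```python
import numpy as np

def average_weights(updates):
    # Plain average of the weight vectors returned by the edge devices.
    return np.mean([w for w, _ in updates], axis=0)

def weighted_average_weights(updates):
    # Average weighted by each device's number of local training examples.
    total = sum(n for _, n in updates)
    return sum(n * w for w, n in updates) / total
```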

4. The updated model is communicated to all the edge devices

Figure (iv). Federated Learning - final phase

The first three steps together form a communication round. This process may have to be repeated multiple times until the model parameters stabilize. The computed federated model is then shared among all collaborating edge devices (as in Figure (iv)), where it replaces their local models. After that, the next training iteration starts.
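
Tying the previous sketches together, the loop below runs repeated communication rounds on purely synthetic per-device data and stops once the averaged weights change by less than a small tolerance; the data, tolerance, and round limit are illustrative, and the helpers are the hedged sketches from steps 1 to 3 above.

```python
import numpy as np

rng = np.random.default_rng(42)
num_features, num_clients, k = 5, 20, 4

# Purely synthetic local datasets, one (X, y) pair per edge device.
clients = [(rng.normal(size=(50, num_features)), rng.normal(size=50))
           for _ in range(num_clients)]

weights = initialize_model(num_features)
for round_idx in range(100):                            # communication rounds
    selected = select_clients(clients, k)
    updates = [local_update(weights, X, y) for X, y in selected]
    new_weights = weighted_average_weights(updates)
    if np.linalg.norm(new_weights - weights) < 1e-4:    # parameters have stabilized
        weights = new_weights
        break
    weights = new_weights
# The final federated model is then shared with all collaborating edge devices.
```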

Best Practices

Some of the best practices and research efforts aimed at efficient execution of the aforementioned steps are:

  • The number of randomly selected edge devices to receive the initial model is a hyperparameter. It can be chosen based on factors such as the number of available nodes, the consistency of the nodes, the characteristics of the communication network, and so on. This hyperparameter influences the number of iterations needed for convergence, so it is usually a good idea to work with a small number of edge devices in each iteration.
  • As the edge devices often have less reliable network connections, some of the initially selected edge devices may not be present during later rounds of training. This should also be taken into consideration during the learning phase.
  • As we noted, the training data never leave the edge devices. However, it may still be possible to infer properties of the data from the weight updates, or even reconstruct the data [see Model inversion attacks that exploit confidence information and basic countermeasures]. To protect privacy, encryption can be used while sending the updated weights to the central entity, and the central entity can avoid storing individual weight updates. This can be achieved by immediate averaging, or by a policy of only decrypting the messages once a predefined number of edge device updates has been received [see Practical secure aggregation for privacy-preserving machine learning]; a toy sketch of this masking idea appears after this list.
  • Since upload speeds are usually slower than download speeds, efficient compression techniques can be employed [see Federated learning: Strategies for improving communication efficiency].
  • The number of iterations (steps 1 to 3 mentioned above) needed mainly depends on the quality of the updates obtained from the edge devices. One approach is to take several steps of an optimization algorithm such as stochastic gradient descent (SGD) before communicating the updated weights to the central entity [see Communication-Efficient Learning of Deep Networks from Decentralized Data].
  • Highly iterative optimization algorithms such as SGD demand low-latency, high-throughput connections to the training data. But a federated learning environment often has higher-latency, low-throughput connections, and data may not be available on a frequent basis. The Federated Averaging algorithm [see Communication-Efficient Learning of Deep Networks from Decentralized Data] developed by Google can train deep networks with 10-100 times less communication and takes fewer iterations to achieve higher-quality updates compared to SGD. It exploits the powerful processing capacity residing in edge devices such as mobile phones.
  • There are also newer research efforts such as federated optimization [see Federated optimization: Distributed machine learning for on-device intelligence] for improving optimization techniques in the context of federated learning, where data are unevenly distributed among a large number of nodes.
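
To make the secure-aggregation point above concrete, here is a toy sketch of the masking idea: each pair of devices shares a random mask that one adds and the other subtracts, so the server never sees a raw update, yet the masks cancel in the aggregate. This only illustrates the principle; the protocol in the cited paper additionally handles dropped devices and uses proper cryptographic key agreement.

```python
import numpy as np

def mask_updates(updates, rng=np.random.default_rng(0)):
    # For every pair of devices, generate a shared random mask; one device adds it,
    # the other subtracts it, so all masks cancel out when the updates are summed.
    masked = [u.astype(float).copy() for u in updates]
    for i in range(len(masked)):
        for j in range(i + 1, len(masked)):
            mask = rng.normal(size=masked[0].shape)
            masked[i] += mask
            masked[j] -= mask
    return masked

updates = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
masked = mask_updates(updates)
# The average of the masked updates matches the true average (up to rounding),
# even though no single masked update reveals a device's real weights.
print(np.mean(masked, axis=0), np.mean(updates, axis=0))
```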

Federated learning vs Distributed machine learning

Let’s look at some unique characteristics and challenges of federated learning that differentiate it from distributed machine learning (as in a data center) [see Federated Learning, Federated learning: Strategies for improving communication efficiency].

  • More distributed: A federated learning environment is far more distributed than a distributed machine learning one, with a huge number of edge nodes.
  • Non-identical data distribution: In a typical distributed learning environment, the weight updates from all the edge nodes will be similar, as they hold similar data. This is not the case in a federated learning environment, as edge devices may generate data from completely dissimilar distributions. Also, the number of training instances on each edge device usually varies significantly from device to device.
  • Slow and unreliable communication: In federated learning, the edge devices normally have slow connections, with upload speeds usually slower than download speeds. Also, many edge devices are not always connected to the central node. In a data center, a much faster and more optimized network is available.

Because of these properties, federated learning requires special approaches and algorithms.

Applications

Federated learning concepts can be applied to almost any scenario in which there is some kind of parameter update, such as gradient descent. In that respect, algorithms like linear regression, logistic regression, and neural networks fit well into a federated learning environment. Clustering also fits: with K-Means clustering, for example, the edge devices can update the cluster centers based on local data and the central entity can adjust the centers by averaging the updates received [see Federated Learning].
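
A sketch of that clustering case, under the assumption of a K-Means-style update: each device assigns its local points to the current centers and reports its local centers with per-cluster counts, and the central entity adjusts each center as a count-weighted average of the device updates.

```python
import numpy as np

def local_center_update(points, centers):
    # On-device: assign local points to the nearest center and compute local centers and counts.
    labels = np.argmin(((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1), axis=1)
    local_centers = centers.copy()
    counts = np.zeros(len(centers))
    for c in range(len(centers)):
        members = points[labels == c]
        counts[c] = len(members)
        if len(members) > 0:
            local_centers[c] = members.mean(axis=0)
    return local_centers, counts

def server_adjust_centers(updates, centers):
    # Central entity: adjust each center as a count-weighted average of the device updates.
    new_centers = centers.copy()
    total_counts = sum(counts for _, counts in updates)
    for c in range(len(centers)):
        if total_counts[c] > 0:
            new_centers[c] = sum(counts[c] * lc[c] for lc, counts in updates) / total_counts[c]
    return new_centers
```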

In general, federated learning is well suited to the following cases [see The future of machine learning is decentralized]:

  • Edge device data is too sensitive.
  • Edge device data is impractical to transfer.
  • Large scale personalization is needed.

One of the most promising applications is language modelling, in which the application tries to predict the next word based on the previously entered word sequence. For example, when a user types something into a search engine, the existing model (received from the central entity) predicts the next word. As soon as the user types the actual next word, the local model is updated and the update is communicated to the central entity. In this case, new data points are implicitly created by the user [see Federated Learning: Collaborative Machine Learning without Centralized Training Data]. There are also potential use cases in healthcare, banking, the military, insurance, and so on.

To achieve improved personalization, it is possible to extend federated learning into a two-phase training process. In the first phase, the central entity trains the model in collaboration with all the edge devices. In the second phase, individual devices adapt the model to local data or preferences. One problem with this approach is that once the edge devices personalize the model, they can no longer contribute to the central model by sending updated weights, because the local model is no longer generic. This may lead to a situation where the central model becomes outdated. One way to overcome this is to add a personalization vector to the model, through which individual devices can express their preferences [see Federated Learning].
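
One hedged way to read the personalization-vector idea is sketched below: the shared weights are trained federally as before, while each device keeps a small private vector that it appends to its input features and tunes only locally, so the private part never leaves the device. The class, its dimensions, and the linear model are illustrative assumptions, not a prescription from the cited work.

```python
import numpy as np

class PersonalizedLinearModel:
    # Shared weights participate in federated averaging; the personalization vector stays on-device.
    def __init__(self, num_features, personal_dim, rng=np.random.default_rng(0)):
        self.shared_w = rng.normal(scale=0.01, size=num_features + personal_dim)
        self.personal = np.zeros(personal_dim)          # local preferences, never uploaded

    def predict(self, x):
        return np.concatenate([x, self.personal]) @ self.shared_w

    def upload_update(self):
        return self.shared_w                            # only the shared part is sent to the server

    def apply_global_model(self, new_shared_w):
        self.shared_w = new_shared_w                    # federated update replaces the shared part
```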

Bottlenecks

Some of the disadvantages of the aforementioned federated learning model, considering the learning aspects, are:

  • Federated learning is not suitable for applications in which the whole training data set must be available in one place for training [see Federated Learning: Collaborative Machine Learning without Centralized Training Data].
  • The overall performance of the system depends on the central entity, which coordinates with the edge devices to update the federated model. There is no direct collaboration among the edge devices during learning.
  • The varying quality and behavior of data on the edge devices may lead to fluctuations in the performance of the federated model. That is, the weight updates may not always move closer to the optimum.
