Federated Learning for Data Security And Privacy in AI

Overview

This article introduces federated learning and explains how it can help address data ownership and privacy concerns at a time when the majority of applications and services are data driven, across healthcare, automotive, finance and consumer electronics. Although it is still in its nascent stages, federated learning is a potential way to ensure that the individual’s right to privacy is protected.

In today’s world, where the majority of applications and services are data driven, technologies like AI and machine learning play an important role across healthcare, automotive, finance and consumer electronics. Conventional machine learning models are often trained in a centralized manner: data is sent from the distributed nodes to a central server and aggregated there. This architecture requires the training and retraining data to be available on a central data server. The improved model can then be transferred to the nodes so that they can carry out predictions locally.

Since the edge nodes need to send data to the main server, there are concerns about data privacy, security and data ownership. Large-scale data collection at a single location, even on a powerful cloud-based server, brings other challenges such as single points of failure and data breaches. Complicated administrative procedures and data protection regulations such as the General Data Protection Regulation (GDPR) also restrict how data can be moved and pooled, and the limited transparency of a centralized system creates a lack of trust.

Federated Learning: What it means

The concept of federated learning addresses the issues of data ownership and privacy by ensuring that the data never leaves the distributed node devices, while the central model is still updated and shared with all nodes in the network. Copies of the machine learning model are distributed to the sites or devices where the data is available, and the model is trained locally. The updated neural network weights are then sent back to the main repository, where they are used to update the central model. Thus multiple nodes contribute to building a common, robust machine learning model iteratively through randomized central model sharing, local optimization, local update sharing and secure model updates. The federated learning approach for training deep networks was first illustrated by AI researchers at Google in 2016.
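
The round-based loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any particular framework: the model is a one-dimensional linear regressor, the server combines updates with a sample-weighted average (in the style of federated averaging), half of the nodes are picked at random each round, and every name (local_train, federated_average, run_rounds, make_node_data) is hypothetical.

```python
import random

def local_train(weights, data, lr=0.1, epochs=5):
    """Train on one node's private data and return (new_weights, sample_count).

    The model is y = w * x + b. Only the weights leave this function;
    the raw (x, y) pairs stay on the node.
    """
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y
            grad_w += 2 * err * x
            grad_b += 2 * err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return (w, b), n

def federated_average(updates):
    """Server step: average the node weights, weighted by local sample count."""
    total = sum(n for _, n in updates)
    w = sum(p[0] * n for p, n in updates) / total
    b = sum(p[1] * n for p, n in updates) / total
    return (w, b)

def run_rounds(node_datasets, rounds=200, seed=0):
    """Repeat: share the central model, train locally, aggregate the updates."""
    rng = random.Random(seed)
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        # Assumption: half of the nodes are selected at random each round.
        k = max(1, len(node_datasets) // 2)
        selected = rng.sample(node_datasets, k)
        updates = [local_train(global_weights, data) for data in selected]
        global_weights = federated_average(updates)
    return global_weights

def make_node_data(rng, n):
    """Synthetic private data for one node, drawn around y = 3x + 1."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, 3.0 * x + 1.0 + rng.gauss(0, 0.01)) for x in xs]

data_rng = random.Random(42)
nodes = [make_node_data(data_rng, n) for n in (20, 35, 50, 15)]
w, b = run_rounds(nodes)
print(f"learned w={w:.2f}, b={b:.2f}")
```

In this sketch the server only ever sees weight pairs and sample counts. Weighting by sample count is one common design choice; an unweighted mean would also fit the description in the text.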

Given the rising concerns over privacy, the main repository or server is designed to be completely blind to a node’s local data and training process. The data thus resides with its owner, preserving data confidentiality, which is highly beneficial for industrial and medical AI applications where privacy is of utmost importance. Besides this server-coordinated setup, the topology for federated learning can be peer-to-peer or fully decentralized. Contribution tracking and audit trails can also be enforced.


Ensuring data privacy

Federated learning on its own does not guarantee security and privacy, so it may need to be combined with other technologies. The machine learning model parameters that are transferred between the nodes or parties in a federated learning system contain sensitive information and can be prone to privacy attacks. Attackers can steal personal data from the nodes, or during communication if the data is not encrypted.
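
One way to keep an individual node's parameters hidden from both the network and the server is additive masking, a building block of secure aggregation. The toy sketch below assumes fixed-point encoding and arithmetic modulo 2**32, and it uses a single seeded random generator as a stand-in for the pairwise key agreement a real protocol would need. It does not handle nodes that drop out, and all names are hypothetical.

```python
import random

MODULUS = 2**32   # all masked arithmetic is done modulo this value
SCALE = 10**4     # fixed-point scale used to turn floats into integers

def encode(values):
    """Convert floats to non-negative fixed-point integers modulo MODULUS."""
    return [round(v * SCALE) % MODULUS for v in values]

def decode(values):
    """Map integers modulo MODULUS back to signed floats."""
    out = []
    for v in values:
        if v >= MODULUS // 2:
            v -= MODULUS
        out.append(v / SCALE)
    return out

def pairwise_masks(num_nodes, dim, seed=0):
    """Build one mask per node such that all masks sum to zero mod MODULUS.

    Each pair (i, j) shares a random vector: node i adds it, node j subtracts it.
    Assumption: a seeded RNG replaces the real pairwise key exchange.
    """
    rng = random.Random(seed)
    masks = [[0] * dim for _ in range(num_nodes)]
    for i in range(num_nodes):
        for j in range(i + 1, num_nodes):
            shared = [rng.randrange(MODULUS) for _ in range(dim)]
            for d in range(dim):
                masks[i][d] = (masks[i][d] + shared[d]) % MODULUS
                masks[j][d] = (masks[j][d] - shared[d]) % MODULUS
    return masks

def mask_update(update, mask):
    """Node step: hide the encoded update behind the node's mask."""
    return [(u + m) % MODULUS for u, m in zip(encode(update), mask)]

def aggregate(masked_updates):
    """Server step: sum the masked updates; the masks cancel in the total."""
    dim = len(masked_updates[0])
    total = [sum(m[d] for m in masked_updates) % MODULUS for d in range(dim)]
    return decode(total)

updates = [[0.12, -0.40], [0.05, 0.31], [-0.22, 0.08]]
masks = pairwise_masks(len(updates), dim=2)
masked = [mask_update(u, m) for u, m in zip(updates, masks)]
summed = aggregate(masked)
print(summed)
```

The server recovers only the sum of the updates. Each masked vector on its own looks random, so an eavesdropper or the server learns little from any single message.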

Since the data is decentralized, there is no standardized approach to data labelling, which can affect the quality of the labels and the integrity of the model. Model inversion or reconstruction attacks can lead to data leakage. Malicious clients may use adversarial algorithms to perform targeted attacks such as model poisoning. These challenges can be addressed with techniques such as differential privacy using data perturbation, homomorphic encryption, secure multiparty computation and secure hardware implementations.
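
As a rough illustration of the perturbation idea behind differential privacy, a node can clip its model update to a bounded norm and add Gaussian noise before sharing it. Note that this sketch perturbs the update rather than the raw data, which is a common variant in federated settings. The parameter names max_norm and noise_multiplier and both function names are hypothetical, and the code does not compute a formal privacy budget, which would require a separate accounting step.

```python
import math
import random

def clip_update(update, max_norm):
    """Scale the update down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(u * u for u in update))
    if norm <= max_norm:
        return list(update)
    scale = max_norm / norm
    return [u * scale for u in update]

def privatize_update(update, max_norm=1.0, noise_multiplier=1.0, rng=None):
    """Clip the update, then add Gaussian noise to every coordinate.

    The noise standard deviation is noise_multiplier * max_norm, so the noise
    is calibrated to the largest change a single node can contribute.
    """
    rng = rng or random.Random()
    clipped = clip_update(update, max_norm)
    sigma = noise_multiplier * max_norm
    return [c + rng.gauss(0.0, sigma) for c in clipped]

raw_update = [3.0, 4.0]   # L2 norm is 5.0 before clipping
noisy = privatize_update(raw_update, max_norm=1.0, noise_multiplier=0.5,
                         rng=random.Random(7))
print(noisy)
```

Clipping also limits how far a single malicious client can pull the central model in one round, although it is not a complete defence against model poisoning.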

Strategy and governance

Every organization should have a strong data strategy and governance framework in place regarding the ownership and usage of end user data. Although it is still in its nascent stages, federated learning is a potential way to ensure that the individual’s right to privacy is protected. By being more transparent about how data is used, giving individuals more ownership and control, and ensuring secure storage of their data, federated learning can build smarter machine learning models without compromising data privacy and security.

Given the above, industry and research communities have started to identify, evaluate and document the security and privacy concerns, with the aim of enabling widespread adoption of federated learning approaches for preserving security and privacy.

Source: NASSCOM Communities
