Federated Learning in Practice: Training AI Models Without Centralizing Data
Flower framework, differential privacy, and production FL for mobile and edge devices
Practical guide to federated learning using the Flower framework, covering federation strategies, differential privacy, communication efficiency, and real-world deployment for healthcare and fintech.
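Federated learning (FL) trains models across distributed data sources without centralizing the data.

Core algorithm (FedAvg):
1) The server broadcasts the global model to the participating clients.
2) Each client trains on its local data for N local epochs.
3) Clients send model weight updates (not data) to the server.
4) The server aggregates the updates with an average weighted by each client's number of training examples, producing the next global model.

Flower (flwr) framework: a simple Python API for FL research and production.
Server: fl.server.start_server(server_address="0.0.0.0:8080", config=fl.server.ServerConfig(num_rounds=10), strategy=fl.server.strategy.FedAvg()).
Client: inherit from fl.client.NumPyClient and implement get_parameters, fit, and evaluate. A set_parameters method is a common helper for loading received weights into the local model, but it is not part of the NumPyClient interface.

Differential privacy: clip each gradient (or client update) to a bounded norm, then add Gaussian noise scaled to that bound before sharing. DP-SGD implementations are available in tensorflow-privacy (TensorFlow) and Opacus (PyTorch). Trade-off: stronger privacy (a smaller epsilon) generally means lower accuracy. The epsilon budget tracks cumulative privacy expenditure over training.

Communication efficiency: gradient compression (Top-K sparsification and quantization, which can cut communication by as much as roughly 100x in favorable settings) and asynchronous federation, where the server does not wait for every client before updating. FedProx is related but targets a different problem: it adds a proximal term to each client's local objective, so the server can use partial work from stragglers while local models stay close to the global model.

Real-world challenges: non-IID data (clients have different data distributions, which causes client drift), system heterogeneity (varying compute and network capacity), and client dropout (not all clients participate in every round).

Applications: Google Keyboard (Gboard) prediction, Apple Siri personalization, healthcare (hospital collaboration without sharing patient data), and finance (fraud detection without sharing transaction data).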
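
The FedAvg aggregation step (step 4 of the core algorithm) can be sketched in a few lines. This is a minimal illustration, not Flower's implementation: the function name fedavg and the flattened-list representation of model weights are assumptions made for the example, and real models would carry one array per layer.

```python
from typing import List, Tuple

Weights = List[float]  # assumption: each client's model is one flat vector


def fedavg(updates: List[Tuple[Weights, int]]) -> Weights:
    """Average client weight vectors, weighted by local example counts.

    `updates` holds one (weights, num_examples) pair per client.
    """
    if not updates:
        raise ValueError("need at least one client update")
    total = sum(n for _, n in updates)
    if total <= 0:
        raise ValueError("total number of examples must be positive")

    dim = len(updates[0][0])
    aggregated = [0.0] * dim
    for weights, n in updates:
        if len(weights) != dim:
            raise ValueError("all clients must send vectors of equal length")
        share = n / total  # this client's fraction of all training examples
        for i, w in enumerate(weights):
            aggregated[i] += share * w
    return aggregated


if __name__ == "__main__":
    # Two hypothetical clients: one with 1 example, one with 3.
    print(fedavg([([1.0, 2.0], 1), ([3.0, 4.0], 3)]))
```

Weighting by example count means a client holding three times as much data pulls the global model three times as hard, which is the usual motivation for a weighted rather than a plain average.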
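
The differential privacy step can be illustrated with a small sketch of the Gaussian mechanism applied to one client update. It assumes the common recipe of clipping the update to a fixed L2 norm before adding noise. The names clip_update, privatize_update, clip_norm, and noise_multiplier are hypothetical, and the sketch does not compute the resulting epsilon; libraries such as Opacus or tensorflow-privacy handle that accounting.

```python
import math
import random
from typing import List, Optional


def clip_update(update: List[float], clip_norm: float) -> List[float]:
    """Scale the update down so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def privatize_update(
    update: List[float],
    clip_norm: float,
    noise_multiplier: float,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Clip the update, then add Gaussian noise to every coordinate.

    The noise standard deviation is noise_multiplier * clip_norm, so the
    noise is scaled to the largest contribution any one client can make.
    """
    if rng is None:
        rng = random.Random()
    sigma = noise_multiplier * clip_norm
    clipped = clip_update(update, clip_norm)
    return [x + rng.gauss(0.0, sigma) for x in clipped]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only to make the demo repeatable
    print(privatize_update([3.0, 4.0], clip_norm=1.0, noise_multiplier=1.1, rng=rng))
```

A larger noise_multiplier gives stronger privacy at the cost of a noisier update, which is the privacy and accuracy trade-off noted above.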
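
Top-K sparsification, one of the compression techniques mentioned above, keeps only the k largest-magnitude entries of an update and sends them as index and value pairs. The sketch below is a minimal version under that assumption. The helper names top_k_sparsify and densify are hypothetical, and practical schemes often also accumulate the dropped residual locally for later rounds, which is omitted here.

```python
from typing import Dict, List


def top_k_sparsify(update: List[float], k: int) -> Dict[int, float]:
    """Keep the k entries with the largest absolute value.

    Returns a mapping from coordinate index to value; everything else is
    treated as zero and is not transmitted.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    ranked = sorted(range(len(update)), key=lambda i: abs(update[i]), reverse=True)
    return {i: update[i] for i in ranked[:k]}


def densify(sparse: Dict[int, float], dim: int) -> List[float]:
    """Rebuild a dense vector on the server from the sparse payload."""
    dense = [0.0] * dim
    for i, value in sparse.items():
        dense[i] = value
    return dense


if __name__ == "__main__":
    update = [0.1, -2.0, 0.05, 1.5]
    payload = top_k_sparsify(update, k=2)
    print(payload)
    print(densify(payload, dim=len(update)))
```

With k set to 1% of the coordinates, the payload shrinks to roughly 1% of the values plus their indices, which is where large communication savings come from.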
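
FedProx, listed above, is usually described as adding a proximal penalty (mu / 2) * ||w - w_global||^2 to each client's local loss. The sketch below shows a single local gradient step under that formulation. The function name fedprox_step and its parameters mu and lr are hypothetical, and the task gradient is passed in rather than computed from a model.

```python
from typing import List


def fedprox_step(
    w: List[float],
    task_grad: List[float],
    w_global: List[float],
    mu: float,
    lr: float,
) -> List[float]:
    """One local SGD step on: task loss + (mu / 2) * ||w - w_global||^2.

    The gradient of the proximal penalty is mu * (w - w_global), which
    pulls the local weights back toward the global model.
    """
    if not (len(w) == len(task_grad) == len(w_global)):
        raise ValueError("all vectors must have the same length")
    return [
        wi - lr * (gi + mu * (wi - gi_global))
        for wi, gi, gi_global in zip(w, task_grad, w_global)
    ]


if __name__ == "__main__":
    # With a zero task gradient, the step only shrinks the gap to w_global.
    print(fedprox_step([1.0], [0.0], [0.0], mu=0.5, lr=0.1))
```

Setting mu to 0 reduces this to a plain SGD step, so the same client code can cover both FedAvg-style and FedProx-style local training.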