0:00
Hello everyone, this is Mehreen Tahir and today we will be talking about federated learning
0:08
which is considered to be the new dawn for machine learning for IoT devices
0:14
I have mostly been introduced, but let me go through the introduction once again. I'm a PhD
0:20
student and a researcher at SFI Center for Research Training in Machine Learning
0:24
based in Dublin City University, Ireland. I'm also working as technical specialist and author for
0:32
Content Lab. And I've been recognized by CodeProject and C# Corner as an MVP. You can see
0:40
my shirt as well. And previously, I've also worked as an author for Packt Publishing. So that is me.
0:48
to give you an overview of what we will be talking about in today's session
0:55
Let's go through the agenda quickly. First, we will just scan the edge and IoT ecosystem
1:02
We will talk about the emergence of intelligent devices. Are they really new,
1:06
or have they been around for a long time? We will talk about the protocols for device communication.
1:14
Then we will move forward with federated learning and discuss the scenarios where we would want to apply federated learning
1:22
and the scenarios where we would not want to apply federated learning
1:26
We will also talk about the challenges that federated learning faces these days
1:31
and then I will leave the floor open for any questions that you might have
1:38
And this is going to be a high-level overview since it's an advanced concept
1:43
but perhaps we can follow up with the hands-on lab later on
1:47
contact Simon for that. Okay, let's begin. So, first of all, why are we even
1:54
interested in IoT devices and this market? To highlight why it's important, let me give you
2:02
a few numbers. McKinsey has reported that there will be an $11.1 trillion market
2:09
value by 2025. That is a huge number; we're talking about trillions. Bosch
2:15
has forecast 14 billion connected devices, while Cisco says there will be 50 billion
2:21
connected devices. Gartner came up with a figure of $309 billion in IoT supplier revenue and
2:28
$1.9 trillion in economic value, and IDC says there will be $7.1 trillion in IoT solutions revenue. This
2:38
is just the market number, all the cash we are talking about, so you know
2:45
there's going to be a huge scope. And let's see what MIT Technology Review said; this is from
2:54
2014. The only reason I wanted to highlight it is so you have an idea of where we were standing ten years ago and
3:01
where we are standing right now. It says the machines that would go online would exceed
3:07
PCs and smartphones. When we're talking about connected things, these include all the
3:15
sensors, all the smart devices, however small they might be. In 2010, if you see, there was a very
3:24
small number of connected devices, but if we move along to the right, to 2020,
3:32
the number of these connected things has certainly surpassed even the cell phones that
3:38
we are using right now. And we are talking about numbers in the billions:
3:44
around 30 billion devices. And that was 2020. Imagine what it would be like in 2021.
3:53
The global IoT market has increased by 38%. This is year-over-year growth we're talking about.
4:02
And by 2020, we had a $500 billion market for IoT devices.
4:13
And if we talk about the subsectors for this market share, we are mostly interested in developing smart cities, which comprises 26%
4:22
then there is industrial IoT, connected health, smart homes, self-driving cars, any type of wearable devices, and smart utilities. This is all part of this IoT market. So
4:38
industrial IoT and smart cities are topping the chart, but the world is now
4:44
also moving towards connected health and smart homes. Connected cars and self-driving cars
4:51
still have scope, so if any of you is interested, please go ahead. So what does this
4:59
imply? Like all those numbers. First, there's no denying that there is a huge IoT storm
5:06
right here. And we're talking about managing billions of connected devices. And they might
5:13
be trillions in the near future. A huge deluge of data was observed in 2020, which
5:21
means we're talking about around 1.5 GB of traffic per day from the average internet user. This is the
5:28
amount of data that you were producing on average per day. And then if we talk about smart hospitals,
5:37
around 3,000 GB of data per day; self-driving cars, we're talking about each one
5:44
producing 4,000 GB per day. Radars and sonars were producing about the same, 10 to 100 KB per second.
5:54
Note that we are now moving to per-second rates. GPS, around 50 KB per second;
6:02
LIDARs, 10 to 70 MB per second; cameras, 20 to 40 MB per second. And then connected aircraft were producing 40,000 GB
6:11
per day, but the connected factories, where mostly small sensors are installed,
6:19
were producing thousands and thousands of GB of data per day. Now we need to make use of this data,
6:27
and we have to manage all these connected devices. How do we do that? Before going there, let's
6:36
talk about how this market evolved. These connected devices, or intelligent systems, are not
6:45
new; people were using sensors in their factories even in the 1980s or before that. If any of you
6:53
is old enough, you might know PLCs, SCADA, automated sensors for temperature and humidity
7:02
and everything; they have been there since the 1980s. So we are talking about electrical and magnetic
7:08
sensors, which were connected to logic controls. These sensors were previously dumb, but they were
7:15
connected to intelligent hubs where mostly decision making was happening. And also these
7:22
control hubs got more distributed and miniaturized with microcontrollers. The biggest push that this
7:31
industry observed was after SoCs and ARM chips were made cheap. Why? Because it greatly reduced the cost of intelligent nodes,
7:42
and computing schemes changed totally, so you don't have to be a pro to interact with IoT devices now. If we look at IoT devices these days, most of them have a local CPU,
8:02
they have memory, they have GPS, GSM, LTE, although these might come in small packages,
8:09
but almost every IoT device has all of these. Again, like I said, small packages:
8:16
the CPU is not going to be very powerful like your laptop or computer
8:20
but it will be sufficient for the IoT device. Also, the memory is pretty low
8:26
We're talking a few MB to a few GB at best. I think two or three GB, that would be it.
8:32
And then the links are very slow and lossy. Most of these devices would be battery operated and mostly be installed at some remote locations
8:45
And these IoT devices include Arduinos, BeagleBoards, Raspberry Pis, and all those cheap boards from China, including the ESP8266, ESP32, etc.
8:58
Any one that you can think of. Okay, so how do we communicate with all these devices? Communication is the key.
9:09
Since these IoT endpoints are now intelligent, they have their own brain, a CPU, so we
9:17
need a communication protocol, and this communication protocol should be able to work over low-power and
9:25
lossy channels. Most of these protocols are modeled after web-style protocols, such as
9:33
HTTP and WebSockets with proxies, or direct use of message queues. The scale might be much smaller,
9:42
because we only need to transfer small packets and the transfer rates are lower, but still these
9:49
communication protocols are there. And what types of communication? We are talking about
9:55
device-to-gateway, meaning the communication happening between a device and a network
10:03
gateway. Then there is device-to-device communication, which is relay-type.
10:11
Gateway to cloud communication, which is the typical cloud interface these days. And there
10:17
will be device to cloud. So, you know, cloud to device and then device to cloud can also happen
10:24
But that is possible if the cloud is at the edge. We'll talk about that
10:31
So, if we look at the major IoT communication protocols these days: for direct messaging,
10:39
there is MQTT, Message Queuing Telemetry Transport. Most of you, if you have been working with IoT,
10:45
might have worked with it. It is lightweight, small-footprint, pub/sub type, and it requires a message
10:52
broker. It is data-agnostic, meaning there is no standard data structure; it's just a text-based
11:00
label and info. Messages will be delivered with or without confirmation or guarantee, but
11:09
many deliver just fine, exactly once. Then there is the Constrained Application Protocol,
11:16
CoAP. It's lightweight, a faster HTTP: tens of bytes over the internet versus thousands of bytes in
11:24
typical HTTP over TCP/IP. This protocol was specifically designed for constrained nodes
11:33
that have very limited resources, and it also has RESTful interfaces. It may connect device to
11:42
cloud, or via proxy to cloud, or device to device. And there are a few other protocols, which
11:51
include cloud- and web-based protocols, and there will be different communication mediums,
11:58
which means we could be transferring data over wire or wirelessly, or we could be using different
12:06
communication protocols. The reason we talked so much about these protocols and technologies
12:14
is that there is a huge number of protocols; but how do we decide on one?
12:23
That is the main question. So, if any of you have worked with machine learning, you would
12:31
know that classical machine learning works in a way that you train the model on a central server:
12:36
all the data goes there, the model is trained on the central server, and this is where
12:43
all the decision-making happens. How do we do that in an IoT environment? Are we going to send
12:50
all the data from each IoT device to the central server and then get the prediction? But if you are
12:57
sending all the data back to the central server, that means there would be a communication overhead,
13:04
a huge communication overhead, because these devices have limited resources. So what do we do?
13:11
So, another thing that concerns us is privacy. All these intelligent systems, like when you want to make an album of your kid to easily share with grandma,
13:29
or text completion tasks, or next-word predictors, or if you want to write better emails, and all those:
13:40
it all happens, but it just needs access to your data. And how do we know that data would not be leaked to some third party?
13:50
I hope you're aware of the controversy that has been happening with WhatsApp and Facebook
13:56
that they have been sharing data with other companies. So where do we stand on privacy?
14:03
What do we do again? Okay, so a solution could be to develop intelligent systems, meaning each system or subsystem can make decisions based on a variety of stimuli.
14:19
Distributed decision making is what we're talking about because we don't want to traverse the data through long distances with so many devices involved
14:29
So what we do is bring the intelligence to the device. We try to implement in-place decision making so the decision can happen right there. And this intelligence would be driven by data, IoT-enabled data collection.
14:51
But how do we do that? Can we use edge computing? Can we use distributed machine learning
14:58
Or where do we go from here? Okay, this is where the federated learning comes in
15:08
This is a very special case of distributed machine learning. What happens is we are interested in enabling devices to learn from each other
15:22
So there is IoT at the very base level, which are connected edge devices
15:27
Then there is AI, deriving intelligence from device data: we want to train models based on the data and then get predictions or do something intelligent. And on top of that is federated learning, which will help us improve performance at the edge. Okay, let me clarify one thing here.
15:52
The edge could be your cell phone. The edge could be any endpoint where you want the decision to happen.
15:58
It could be the access point for the cloud. It could be your cell phone. It could be any other
16:05
IoT device, which is intelligent enough. So, more specifically, what we are doing here is
16:15
we are talking about a network of nodes. There are all those nodes with the central server
16:24
but instead of sharing data with the central server, we share the model. We don't send data from
16:32
node to server; we send our model to the server, where it is aggregated and sent back to
16:40
the nodes. I know that is a lot to take in, so let's break it down and try to understand what is
16:49
happening here. First, imagine there are connected devices; let's say we have
16:58
cell phones from four of the people that are watching us live right now, and there is one
17:04
central server, and the server has an untrained model. There's just some random machine learning
17:11
model; let's say we want to predict the next word. So we have an untrained model sitting at the server,
17:17
and the cluster of IoT devices is comprised of four cell phones, which we are
17:23
just calling nodes at this point. What we will do is send a copy of that model, which was
17:31
sitting on the server, to each of the nodes. So if I am a node, I would receive a copy of that model
17:38
on my cell phone; if you're a node, you would receive a copy of that model on your cell phone.
17:44
So each device receives a copy. Now, all the nodes in the network have that untrained model. Okay? In the next step,
17:58
what we're doing is we are taking data from each node. But by "taking," I don't mean that we are
18:07
sharing data; every node has its own data on which it is going to train a model. So if I'm talking
18:16
about myself: the data from my cell phone, which words am I using, what am I writing, what do I want to
18:23
say? It will use that data, and your cell phone would use your data, to train the model on your
18:31
phone, not on the server. Okay, so we were talking about training: each
18:42
node is training the model to fit the data that it has. Previously it was the untrained model; just pay
18:50
attention to how the line looks on the nodes, and now each node will train the
18:58
model according to its own data. This is how the training happens on the nodes.
19:05
In the next step, each node would send a copy of its trained model back to the server
19:14
Now, again, sorry about that.
19:21
So the server now would combine all these models received from each node by taking an average
19:30
It will aggregate them to update that central model. This model, which is now built by aggregating the models from each node,
19:44
captures the patterns in the training data on all the nodes.
19:49
It is an aggregated one. And the last step, what happens is the server will now send the updated model to each node again
19:59
And the whole process would be repeated all over again. So the first step is: the server sends the model to each node.
20:08
Each node trains the model locally using its local data and sends a copy of that updated model to the server.
20:15
The server receives each copy from all the nodes and then aggregates the model.
20:21
Once it is aggregated, the server will send the copy of the updated model back to the nodes
20:29
And note that there is no data sharing happening. Everything is being achieved at the very edge
20:37
We're doing it on your end. So no data sharing means there is privacy preservation and also much less communication overhead.
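To make those rounds concrete, here is a toy sketch of the loop in NumPy. This is not the snippet from the slides: the "model" is just a line y = w*x + b, the four nodes and their private data are simulated, and names like `local_train` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 4 nodes, each holding private samples of y = 2x + 1 plus noise.
true_w, true_b = 2.0, 1.0
node_data = []
for _ in range(4):
    x = rng.uniform(-1, 1, size=50)
    y = true_w * x + true_b + rng.normal(0, 0.01, size=50)
    node_data.append((x, y))

def local_train(params, x, y, lr=0.1, epochs=20):
    """Train a copy of the global model on one node's private data."""
    w, b = params
    for _ in range(epochs):
        err = w * x + b - y
        w -= lr * np.mean(err * x)   # gradient step using local data only
        b -= lr * np.mean(err)
    return np.array([w, b])

# The server starts with an untrained model.
global_params = np.zeros(2)

for _ in range(20):
    # 1) Server sends a copy of the model (its parameters) to each node.
    # 2) Each node trains locally; only parameters travel back, never data.
    updates = [local_train(global_params.copy(), x, y) for x, y in node_data]
    # 3) Server aggregates by averaging: federated averaging.
    global_params = np.mean(updates, axis=0)

print(global_params)  # close to [2.0, 1.0]
```

Note that the per-node `(x, y)` arrays never leave `local_train`; only the two parameters cross the network in each direction.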
20:47
So, the model that I just explained is known as vanilla federated learning or federated averaging
20:58
Here on the slide is a snippet of the code. I didn't want to dive into the code at this moment, because this is
21:02
just a lot to take in. This is just to make clear what is happening. We are receiving
21:10
the model. Okay, one more thing I should clarify: we are not actually sending the
21:18
whole model; we are exchanging the parameters it is trained on, so we don't have to share
21:27
everything, which avoids communication overhead. So, what happens is the central server will
21:34
receive parameters from each node in the network, and then it will aggregate its central model
21:42
and that's where we get our updated central model. An example of this, like I had explained
21:51
previously, is the next-word predictor with the privacy concern. Actually,
21:59
the federated learning concept was introduced by Google, I think in 2017, and they have been
22:05
implementing it in Gboard to give a more private, but also more personalized,
22:14
keyboard experience to the users. Okay, I hope this is somewhat clear; if not, I will leave the floor open for questions later
22:28
on, so please feel free to ask. Here we will talk about when we would want to apply federated
22:35
learning. Of course, the foremost case is when we need privacy, where we don't want to share our
22:41
private data with anyone; it could be a retail company, it could be just the user. This is
22:50
where we would apply federated learning. Then, if there are bandwidth or power consumption
22:57
concerns, federated learning could come in, since we are not actually exchanging the data, so
23:04
we won't be using much bandwidth or power; we are just exchanging the parameters.
23:09
And if there is also a high cost of data transfer, then federated learning is a yes.
23:16
There are cases when we don't want to apply federated learning.
23:21
Those cases include when more data won't improve your model. When working with machine learning, you can actually construct a learning curve, and if you see that more data is helping improve your model's performance,
23:37
then you would want to apply federated learning, because we are talking about data from each node,
23:45
and that is a huge amount of data. This is all related to data:
23:52
federated learning is driven by data. If additional data is not correlated, or it's not helping improve our model, we don't want to
24:04
apply federated learning. Or, if we have already reached the ceiling; the ceiling means we
24:11
have achieved the desired result, we have the model the way we want it,
24:19
just the performance we wanted. Okay, a few considerations before
24:30
we begin. The first thing is that we want to benchmark our performance: how much accuracy
24:38
are we talking about, how much loss can we take? This performance should be benchmarked first,
24:46
because the to-and-fro training, where the server keeps sending parameters to the
24:51
nodes and the nodes keep sharing their parameters back to the central server, will keep happening
24:57
until we have achieved our desired result. So the foremost thing is that we want to benchmark the
25:03
performance. Then we want to make sure that performance is improving with more data, and that the
25:12
models can be combined meaningfully. So if you're talking about the
25:21
banking sector, there could be a meaningful scenario where we would want to combine models,
25:28
let's say from the statements and the purchasing history of users from their shopping, so we
25:34
could predict what a user would most likely be buying next, or something like that. There should be
25:40
meaning in the combination of the models that we are talking about. And then the nodes that
25:48
we're talking about in the network should also be able to train the model, and not just predict. And last
25:55
but not least, we want to determine how we will be transferring these models: which
26:02
protocol we would be using, how much bandwidth we would be consuming, and everything.
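Taken together, these considerations amount to a stopping rule around the federated loop: benchmark first, then keep exchanging parameters only until the benchmark is met or a round budget runs out. A minimal sketch, where every number and name is hypothetical and `run_round` stands in for a real send/train/aggregate cycle:

```python
TARGET_ACCURACY = 0.90   # the benchmark, decided before training starts
MAX_ROUNDS = 100         # cap on the to-and-fro parameter exchange
PARAMS_KB = 64           # assumed size of one parameter update

def run_round(accuracy):
    """One round: server sends params, nodes train, server aggregates.
    Here we just nudge a dummy accuracy upward to keep the sketch runnable."""
    return accuracy + (1.0 - accuracy) * 0.25

accuracy, rounds = 0.5, 0
while accuracy < TARGET_ACCURACY and rounds < MAX_ROUNDS:
    accuracy = run_round(accuracy)
    rounds += 1

# Each round costs bandwidth twice: parameters go out and come back.
bandwidth_kb = rounds * PARAMS_KB * 2
print(rounds, round(accuracy, 3), bandwidth_kb)
```

Tracking the bandwidth alongside the benchmark makes the last consideration, how much transfer the chosen protocol will cost, explicit in the same loop.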
26:10
Okay, so are there any tools that will help us build federated learning models
26:17
Well, yes, there are. First is OpenMined. It's an open-source community focused on researching, developing, and promoting tools for secure, privacy-preserving, value-aligned artificial intelligence.
26:31
I literally just Googled it and pasted it here. But it's an amazing community for the researchers
26:37
If anyone of you is interested in research or in privacy aspect of the machine learning
26:44
please go ahead and check them out. Then there is PySyft. This is a library for privacy-preserving deep learning.
26:53
And Google. Google has also introduced TensorFlow Federated, which offers both an API and the core;
27:01
both are available. What it does is take a TensorFlow model and wrap it in a federated learning implementation.
27:08
And of course this is high-level, so you don't have to worry about the nitty-gritty and
27:15
very small details; it will handle it all by itself. But it also pays attention to the separation
27:21
of concerns between model and communication and everything. So, okay, all sounds good, but are there any challenges? Well, yes, there are. First are system
27:36
issues. Power consumption is still a challenge because, like I was saying with the example
27:44
of training on your phone, we don't want to compromise the
27:50
performance of the edge device. We don't want to drain the battery of your cell phone. So how much
27:57
power would we be consuming on edge devices, and how would we do it when we are
28:05
training the model? This is all a question. Well, there are approaches; let's say we train
28:12
the model when you're not using it, when you're sleeping or something. But again, we don't want
28:16
to drain the battery or something. So that is still an open question. Another system issue is
28:25
dropped connections. How are the nodes and the server interacting? How is the communication
28:33
happening? If it is happening over the internet, maybe someone has turned off their internet. Like, I
28:39
usually just turn off my internet when I'm going to bed, which means I'm not giving it any
28:46
connection or any time to train the model. How are we going to deal with that situation? Another open
28:53
question. And then there are stragglers. Stragglers are basically the devices that fail to train,
29:01
that fail to respond in the given time window, or fail to complete the model training in
29:10
the given time. So this is another question. And privacy is still a concern. How? Okay, we are not
29:23
sharing data, but we are still sharing model updates, parameters. And, you know, adversaries
29:31
might not be able to inspect data, but they can inspect the model and they can poison your model
29:37
So if there is some poisonous node, some hacker, what are we going to do with it
29:45
How are we going to deal with it? How we will preserve the privacy
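One direction from the research literature is secure aggregation: nodes mask their parameter updates with pairwise random values that cancel out when the server sums them, so the server only ever sees the aggregate, never an individual update. Here is a toy sketch of that cancellation idea (plain Python, no real cryptography; the node names and values are invented):

```python
import random

random.seed(42)

# Each node's "model update" is a single number here, for simplicity.
updates = {"node_a": 3.0, "node_b": 5.0, "node_c": 4.0}
nodes = list(updates)

# Every pair of nodes agrees on a shared random mask. One adds it, the other
# subtracts it, so the masks vanish in the sum but hide individual updates.
masked = dict(updates)
for i, a in enumerate(nodes):
    for b in nodes[i + 1:]:
        mask = random.uniform(-100, 100)
        masked[a] += mask
        masked[b] -= mask

# The server sees only masked values, yet their sum equals the true sum.
server_sum = sum(masked.values())
true_avg = sum(updates.values()) / len(updates)
print(round(server_sum / len(updates), 6))  # 4.0, the true average
```

Real protocols also have to handle nodes that drop out mid-round, which ties this back to the straggler problem above.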
29:49
Well, there are approaches, which include, sorry, I just lost the word,
30:01
encryption, yes, got it. So there are encryption-based
30:11
techniques that we might be using, but again, those include a little overhead. So, is there a better way
30:18
to preserve the privacy and save the bandwidth at the same time?
30:24
That is an open question. Okay, so this is almost it. If you are interested in federated
30:34
learning, go check it out on Google; they have a pretty good blog on it, and there are communities
30:40
building around it. This is an advanced concept and it's still developing, so you might face a few challenges;
30:49
in that case, you can get in touch with me by email or on my Twitter, or find me on C# Corner.
30:56
Drop me a text or an email, wherever, if any of you is interested in getting to know more, or
31:03
wants to discuss, or wants to explore some research opportunities. Yep, that would be it from my side.