2:29
Hello, hello, hello. Welcome everyone. My name is Stephen Simon and we are back with yet another episode of Ask Me Anything series
2:48
If you are joining us for the very first time, we do this Ask Me Anything live show every Thursday at 7:30 PM IST and 10 AM Eastern Time
2:57
And if you are one of our fans joining us from Europe, we do it at 4 PM Central European Summer Time
3:04
So in today's session, we have a very exciting guest, and I'll just get to the introduction
3:09
He's a very exciting person. His name is Aiko. He works as a consultant and developer for ThoughtWorks
3:15
He's passionate about data science, software craftsmanship, clean code and infrastructure engineering
3:21
While working with clients, he focuses on improving the development processes and
3:26
the quality of the teams he's working with. Nowadays he's in Singapore. Aiko has
3:30
previously worked with clients in Germany, the UK, and India as well. This person is
3:35
really amazing and, I think, an adventurous person. He loves to go skydiving and hiking and whatnot, so
3:43
without wasting any more time, let's go and invite our guest to the show today:
3:46
Aiko, joining us all the way from Singapore. Thank you
4:19
Hi, Aiko. Welcome to the live show
4:34
Hello. Thanks for having me. You look to be someone very exciting, right
4:39
How about the skydiving? So first of all, what an amazing video
4:45
I did not expect that. Skydiving is very boring. Oh, really? I'm kidding
4:56
Actually, I mean, I'm kidding. Obviously, it's like your heart starts racing
5:00
minutes before you jump out. But especially the first one, the first jump
5:07
was actually like you have to practice a lot and like dry runs
5:12
And the whole first jump is really just you trying to follow the instructions that you were given so that you don't die. Thank God you had a parachute
5:21
right? Tell me, do they give you an extra parachute, or is it only one? Oh, you always have two. There's
5:26
always a backup one. I hope that backup works too. All right, so coming back to the show, right
5:33
We did have an introduction video, but as per the format of the show, I'll still go ahead and give a quick
5:38
introduction of who you are and what you do, as you're joining this chapter for the very first time;
5:42
people would definitely like to know who Aiko is and where he comes from. Sure, yeah. My name is Aiko
5:49
I've been working with ThoughtWorks for four and a half years now as a consultant and as a developer
5:57
I am particularly passionate about machine learning and AI, and I'll talk about that a little bit more
6:04
in detail later. And I'm also interested in the ethics of technology, and that is basically also
6:11
the general topic: I try to combine these two, and that is what we are talking about today
6:17
Yeah. And then, I mean, skydiving. I like hiking. I like nature. I like warm weather
6:25
Yeah, definitely. That made you move from Germany to this Southeast Asia region, right? So
6:31
I think you've traveled from Germany. How has life changed, moving almost to the other
6:38
part of the world, right? And how are you coping with this pandemic
6:44
Multiple tough questions in one. Life has changed a lot. So,
6:50
before I moved to Singapore, I actually spent one and a
6:54
half years in Bangalore in India, and I had traveled before. I like traveling, and I have
7:03
seen a couple of different countries, and one thing that I
7:06
wanted to understand is, because, I mean, growing up in Germany,
7:10
at some point I realized through the travel that I was quite, I mean, quite privileged
7:17
There were a lot of things that I thought were just given to me for free,
7:21
right? And having lived in India for one and a half years was really an eye-opening experience, where I saw how a lot of things are, I mean, the culture is very different from what I grew up with
7:38
I learned a lot about a different perspective on living a life, about lots of different
7:43
things. I learned a lot about traffic and cars, right? And now in Singapore,
7:53
nowadays, there are so many different cultures that all mix together, so much different,
7:59
fantastic food. I love the food options in Singapore. So I think it just opened
8:04
my mind up a little bit, so that I have more perspective on things and I see how other people
8:12
look at things, and that helps me understand others a little bit better
8:16
And the pandemic, I mean, I think we all were locked in for some time
8:23
That is stressful, I think that's very clear. And, I mean, I think I managed, but
8:33
I would like to go back to the office and talk to my team again,
8:39
have water cooler conversations. Oh, yeah. A couple of jokes, right?
8:43
Like, the personal interaction is missing a little bit nowadays. I'm looking
8:49
forward to going back to the office. Yeah, I think definitely we humans are very much
8:54
social beings, right? We like to communicate and talk to each other, and we have just somehow managed
8:59
to get through this via these online events, and definitely I would note that water
9:03
cooler conversations, right, many things happen over there, so we'll keep up to that, right? So
9:09
definitely, offices are a must. And what an amazing background, Aiko. The skyscrapers in the background are pretty cool
9:16
Yeah, it's pretty cool. So let's get back to business and talk about, we're going to add the title to the live stream,
9:27
"Artificial Intelligence, More Like Artificial Stupidity." What is it, and for the people watching us today, what are they going to expect from this session for the next 35 minutes
9:36
Right. So the title came to be because I noticed that there's a lot of AI now. There's a lot of
9:45
machine learning in a lot of systems that we use, and a lot of them work quite well and give us
9:52
opportunities that we didn't have before. But a lot of the news is also about AI failing, and there are
10:00
seemingly simple operations that fail. And then I tried to look a little bit deeper into why things
10:06
are failing and, yeah, what causes that, and basically I came to the conclusion that there are
10:14
often things where we think it would be artificial intelligence, but the outcome is
10:20
actually pretty stupid sometimes, and that is artificial stupidity. Yeah, yeah. And I think these
10:27
AI systems are quite biased too, right? I've seen that they're sometimes very biased and racist,
10:31
so that's also something very important to note. All right, so I won't take much of the time, right?
10:36
We're going to talk a lot about what you do and all of that after you finish your slides.
10:40
I'm going to bring your slides into the live show, right, and the next 30 minutes are all yours
10:47
Yeah, thanks. Yeah, I mean, as I mentioned, the title, it's not just clickbait.
10:55
It works because it attracts some attention, but there's actually something behind it
11:01
And I'll explain a little bit more about that in detail in the next couple of minutes
11:08
What are we doing in general? I'll be telling a couple of stories about AI and about ML and
11:14
about data. And some of them are success stories, as I mentioned, but some of them are also failures. And then
11:20
we can have a deeper look into those that failed, see why they failed, and look at what we can
11:26
do in the future when developing these systems, what we can do better so that it doesn't
11:31
fail that often or that hard. And when I say I'll be telling stories: one of the biggest
11:37
storytellers out there, at least from my perspective, is Douglas Adams. And I mean,
11:42
he's particularly popular in the tech community. I'm sure lots of viewers have
11:48
read this book, or at least heard of it. It has brought us a couple of famous memes: that 42 is the
11:55
answer to the ultimate question of life, the universe, and everything, and that we should not panic
12:02
But this book, The Hitchhiker's Guide to the Galaxy, has also brought us
12:07
a different thing that is not as popular, which is this thing called the Babelfish
12:13
And Douglas Adams describes this fish, this fictitious fish, as a tiny yellow fish
12:20
that exists in the universe and it has this crazy feature in a way
12:27
that if you take this fish and stick it in your ear
12:32
then it somehow picks up the brain waves of the organisms around you
12:37
And what it does, when someone speaks in whatever language they speak in
12:42
the Babelfish translates it directly into your ear in a language that you understand
12:48
And, I mean, as I mentioned, it's a fictitious fish, right? Douglas Adams writes a lot of crazy stuff, and there's definitely no way this is possible, right? But now, let me show you this other thing
13:10
Hallo und herzlich willkommen zu meinem Vortrag über künstliche Intelligenz. (German: "Hello and a warm welcome to my talk about artificial intelligence.") And what we can see here is this Babelfish behavior, where I say something in one language, and unless you speak German, having it in English is probably way more helpful
13:32
But I speak something in one language, in German in this case, and it automatically, in this case, Google Translate, translates it, picks up my voice and gets the text out of it and translates it into a different language
13:49
So admittedly, it is not as cool as this Babelfish that you pluck into your ear. But the general idea of this immediate translation is already pretty close. And it's done by machine learning, right? There's some machine learning system behind it
14:10
And I think that is already an example where functionality that we previously thought is impossible is already existing
14:25
And obviously you say it's not the fish. It's a different thing
14:33
but let me show you this other thing where Google has also these Google Pixel Buds
14:41
and these are in fact tiny, tiny things that you stick in your ear
14:45
I mean, it's earbuds, right? And they have this feature that they can translate
14:51
and live-translate into your ear in a language that you understand. You still need a phone, and the actual translation happens on the phone, and it's not a fish. But I think the point here is that machine learning is enabling us to do fantastic things that a couple of decades ago only appeared in the craziest fiction books
15:16
And I think that is one of these positive examples that I mentioned I'll be talking about
15:22
And another positive example is prediction of cancer based on image data
15:31
Like a couple of years ago, machine learning systems started to be better at predicting cancer based on pictures of cells than humans, than trained doctors whose job it is to identify that
15:46
And I think last year, I think it started a couple of years ago with breast cancer
15:51
And I think last year Google released a model that is better at identifying cancer cells in lung cancer patients
16:00
And that is literally a life-changing functionality that we now have in our hands due to the advances in machine learning
16:13
And that is, yeah, I think machine learning is an extremely powerful set of tools or technology in general that can help a lot of humans
16:31
But as I mentioned and as the title suggests, not everything is working perfectly all the time
16:40
I have here a couple of examples. On the right side, we see this tweet from a person who is tweeting to IndiGo, which is an Indian airline
16:53
And he's sarcastically saying, hey, thanks for flying me to Kolkata while at the same time sending my luggage to Hyderabad
17:02
So obviously that's sarcasm, and you as a person understand that. And IndiGo, and I make the assumption that there's a bot answering here,
17:13
And that bot replies with, hey, we're glad to hear that because of the sarcastic thank you tweet from that user
17:23
So there's definitely the sarcasm. The sarcastic tone was missed very clearly
17:28
And it seems a rather simple thing for us humans. And in the middle, we have an example where someone is texting the PayPal service and is saying, hey, I got scammed
17:42
And PayPal's reaction is, hey, great. So also clearly a misunderstanding, clearly a misinterpretation of the message and of the situation that was described
17:54
And then on the very left, another example where AI horribly failed
18:00
But these are just, I mean, there are countless of these. I have one particular example that is my
18:07
favorite example of machine learning failing, and if you speak Mandarin you could already try
18:15
to figure out what's going on, but for everyone else I'll give a little bit of background. So in a
18:20
couple of Chinese cities there are pedestrian crossings, like zebra crossings, right? And
18:26
there's a traffic light that shows you: hey, if it's green, you can go.
18:30
If it's red, you can't. And obviously, some people sometimes jaywalk. I think that is true for every place on our planet
18:41
But in a couple of these Chinese cities, they have installed cameras pointing at this pedestrian crossing
18:48
And what they do is they have image recognition and basically identify, okay, it is red for pedestrians
19:00
Nobody should cross. And if they then see a person crossing, they have this big screen
19:06
And this is what we see here in that photo. They have a big screen
19:10
And then they basically shame that person, showing everyone: hey, this person just jaywalked.
19:16
This person just crossed the street even though there was a red light. Now, if you look closely,
19:23
you can see that it's not actually a person crossing the pedestrian crossing
19:30
It's a bus that has the face of a person printed on it
19:35
In fact, in this case, I think it's the CEO of this company
19:39
and they advertise with her face printed on the bus. But obviously there's a misinterpretation here,
19:48
and the camera zoomed in and showed her on the big screen
19:52
and in fact at that particular moment she was in a different city and she got a notification on her phone
20:00
hey, we just saw you jaywalking in that city, at that crossing, and she was obviously very surprised
20:09
And that is one example where you can see, okay, there is AI that, again, horribly or pretty badly failed and accused someone of jaywalking
20:22
And I don't think it was the case for this particular example, but other cities and other zebra crossings not only have this image recognition and face recognition, so they know who that person is,
20:39
but they also have that linked to the WeChat pay account. And that is a very popular way of paying
20:49
And due to a couple of regulations, like you need a Chinese bank account
20:53
And in order to get a Chinese bank account, you need to identify yourself. So through this information, people can be identified
21:02
And what happens is they get charged the fee for jaywalking immediately
21:08
like within seconds they get identified: hey, okay, we saw you jaywalking, you now have to pay
21:14
the fine. I don't know what the fine is, but it gets immediately deducted from the WeChat Pay
21:21
account. And that is something, I mean, I'll be honest, from a technical perspective that is a very
21:27
interesting use case. There's image recognition, there's face recognition, there
21:33
are multiple systems, and this payment system is connected to that. That is, I mean, technically it is intriguing. But on a different level, maybe on an ethical
21:43
or on a moral level, I mean, even if everything were working perfectly, that would also
21:50
be a different conversation that we would be having. But we can clearly see here that it is not flawless and that it is in fact failing
21:58
And if we now give these machine learning algorithms so much power that they can decide
22:03
who gets fined and who does not. And if we think about other examples
22:08
where the punishment is not just a fine, but maybe jail time or something else
22:14
then having a machine learning system make this decision is, I think,
22:21
at least questionable, considering that it can fail. Right. And I've mentioned AI and machine learning a couple of times now. I think it is important, at least
22:36
for this presentation, to have a common understanding of what these terms mean,
22:41
at least in my understanding. And when I say artificial intelligence,
22:45
I like this definition a lot. This definition is from the Oxford
22:53
dictionaries, and if you type "define artificial intelligence" into Google, this is also the definition
22:56
that you get, and this definition reads: artificial intelligence is the theory and development of
23:02
computer systems able to perform tasks normally requiring human intelligence. And the key point here to note is that it is not talking about a particular implementation
23:15
It is not talking about a particular technical approach. It is solely talking about
23:22
something that is done by a machine where we would usually think a human needs
23:28
to have intelligence to do that. I can give you one example. It's a chess program
23:34
where if you play chess with someone, you expect that person has
23:40
some level of intelligence. At least I do. And if you now have a chess program
23:48
that can play chess, you would assume that it's artificial intelligence, at least according to this definition
23:55
But if we think about how we could implement such a chess program
23:59
it could be rather simple. The only thing that you need is to look at the state of the board,
24:04
where all the chess pieces are. And then you can calculate the next move
24:09
and all the possible next moves. And then, based on each of those moves,
24:14
you can calculate all the subsequent moves. And all you need is basically a loop
24:20
and then maybe a couple of if-elses. There's no inherently complex or fancy algorithm in place.
24:29
It is basically brute-forcing all the moves, and then you select the sequence of moves that leads to your goal, which is winning the chess match.
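To make that concrete, here is a minimal sketch of the brute-force idea in Python. It deliberately uses a much simpler game than chess (Nim: take 1 to 3 stones, whoever takes the last stone wins) so it stays self-contained and runnable; the game choice is an illustration of mine, not the speaker's example, but the principle is the same loop-plus-if-else search over every possible move.

```python
# A complete brute-force "game AI" for a tiny game (Nim: take 1-3 stones,
# the player who takes the last stone wins). No learning and no fancy
# algorithm: just a loop over every legal move and a recursive search of
# everything that can follow.
from functools import lru_cache

@lru_cache(maxsize=None)
def can_win(stones: int) -> bool:
    """True if the player to move can force a win from this position."""
    for take in (1, 2, 3):                      # loop over all legal moves
        if take == stones:                      # taking the last stone wins outright
            return True
        if take < stones and not can_win(stones - take):
            return True                         # this move leaves the opponent losing
    return False

def best_move(stones: int) -> int:
    """Pick a winning move if one exists, otherwise just take one stone."""
    for take in (1, 2, 3):
        if take <= stones and (take == stones or not can_win(stones - take)):
            return take
    return 1

print(best_move(21))   # -> 1, leaving the opponent on a losing multiple of 4
```

Chess works the same way in principle; only the move generation and the number of positions explode, which is why real engines add pruning on top of the same kind of search.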
24:35
And so even though it fits under this blanket of artificial intelligence,
24:41
which is nowadays often used to make things look shiny and fancy
24:47
and powerful, it is, at least based on this definition, it is merely a perspective on the problems that are being solved
24:59
and not particularly the implementation of the technical approach. Where if you think about machine learning
25:07
which is a subset of this artificial intelligence, I think that is where it becomes very interesting
25:12
because machine learning is in fact a pretty different approach than classical
25:21
software engineering, where you have an algorithm that develops itself instead of a programmer defining it.
25:41
It is defined by the data that you put in. Typically, with a machine learning model, you put in some data, and over time it basically adapts: hey, if I get this input data, I have to produce this output
25:58
And if you put in some other set of data, it produces some other output
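As a minimal, hedged illustration of that point, that the data defines the behavior, here is a sketch using scikit-learn (the library choice is an assumption of this write-up, the talk does not name one): the exact same training code, fed two different data sets, ends up implementing two different input-to-output mappings.

```python
# The same training code, given different data, learns a different mapping:
# the behavior comes from the data, not from hand-written rules.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3], [4]])

model_a = LinearRegression().fit(X, np.array([2, 4, 6, 8]))   # this data says "double the input"
model_b = LinearRegression().fit(X, np.array([6, 7, 8, 9]))   # this data says "add five"

print(model_a.predict([[10]]))   # ~[20.]
print(model_b.predict([[10]]))   # ~[15.]
```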
26:04
And that is something that is really, really interesting. And at least from my perspective,
26:10
that is also a little bit different from the classical software engineering. And then there are neural networks
26:16
which are a type of machine learning algorithm that is inspired by the biological brain,
26:29
where it has neurons and they are connected, and a couple of layers where all these neurons are then working together.
26:43
And then there's deep learning, and deep learning is more or less just neural
26:49
networks, but with a lot of these layers. So I think it is good to have this understanding
26:54
of these terms. And to give you an example of how a deep neural network could look, and this
27:02
is actually a very simple example, there are a couple of layers and all these nodes, which
27:07
are inspired by biological neurons, and they are all connected. And along all of these
27:13
connections there is some kind of data flowing, some kind of activation, some kind of number
27:19
moving from one neuron to the other, right? From one computational node to the other.
27:24
In a typical real-life situation, these networks are significantly bigger.
27:33
This one has like 10 nodes and three hidden layers; they can have hundreds and thousands of nodes and 50 or 100 layers.
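For a concrete picture of what is being described, here is a hedged sketch of a tiny fully connected network in Python (PyTorch is an assumption, and the layer sizes are illustrative, not the exact ones on the slide): a handful of nodes per layer, with a number flowing along every connection from one layer to the next.

```python
# A tiny fully connected network: a few layers of "neurons", each connection
# carrying a number (an activation) from one layer to the next.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(4, 5),   # input layer (4 features) -> first hidden layer (5 nodes)
    nn.ReLU(),
    nn.Linear(5, 5),   # second hidden layer
    nn.ReLU(),
    nn.Linear(5, 3),   # -> 3 output nodes
)

x = torch.rand(1, 4)   # one input example with 4 features
print(model(x))        # 3 output numbers; real networks have vastly more nodes and layers
```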
27:44
And then I think the important thing to understand here is that it is so much, it is so complex
27:52
that it becomes, from a human perspective, impossible to understand what is going on
28:00
Like, why, for this particular input, does my machine learning model,
28:07
why does my deep neural network, produce that output? There's so much going on in between
28:12
There's so much data flowing from one node to the other that it's impossible to understand
28:18
to comprehend what's going on, in contrast to the classical software engineering
28:23
where I can debug: if my condition is true, I go here, otherwise I go there; in a loop, I do this, then I do that
28:29
Like that simple debugging doesn't work anymore in these models. And that is one of the problems
28:38
why we see a lot of machine learning systems fail: because we simply don't know, for a given input,
28:45
what happens to produce the output. We have lost visibility into why this machine learning model
28:53
makes a decision the way it does. It's basically black-box behavior, oftentimes.
28:59
And that is one of the reasons why we see it failing so often: because we simply wouldn't know beforehand that it would fail
29:07
But there's another issue. This black-box behavior is one big issue,
29:12
and I'll talk about potential solutions later. But there's a second issue, and here is a really nice example that I like
29:21
It was in 2011. The city of Boston had a problem. They had potholes
29:27
They had, the streets were not that good. They had potholes and they wanted to fix that
29:33
And since it was 2011, they did what everyone was doing. They built an app
29:38
And they built this app called StreetBump. And the idea is actually quite smart
29:42
The idea was that people installed the app on their phone, and then they would drive around;
29:50
they would put the phone on the passenger seat, and whenever there was a pothole, the car would bump.
29:56
The phone would recognize through the accelerometer: ah, okay, there was a pothole.
30:02
It would then take the GPS coordinates and send those GPS coordinates to the server,
30:08
and then the city of Boston would know where all the potholes are in the city.
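A hedged sketch of that core idea follows; the threshold, the helper, and the sample readings are invented for illustration and are not StreetBump's actual code. A sharp accelerometer spike is treated as a probable pothole and the GPS position where it happened is recorded.

```python
# Sketch of the StreetBump idea: treat a sharp vertical-acceleration spike as a
# probable pothole and record the GPS position where it happened.
BUMP_THRESHOLD = 2.5  # vertical acceleration (in g) counted as a "bump" (illustrative value)

def detect_potholes(samples):
    """samples: list of (vertical_g, lat, lon) readings taken while driving."""
    return [(lat, lon) for vertical_g, lat, lon in samples
            if abs(vertical_g) > BUMP_THRESHOLD]

drive = [
    (1.0, 42.3601, -71.0589),   # smooth road
    (3.2, 42.3612, -71.0571),   # jolt -> report this location
    (0.9, 42.3620, -71.0550),
]
print(detect_potholes(drive))   # [(42.3612, -71.0571)]
```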
30:15
Now, what happened in reality was that some time after they had published the app, they found out
30:21
that mostly all the potholes were in areas in Boston with a high cost of living, in rich neighborhoods
30:30
And if you think about it, it doesn't really make sense that rich neighborhoods would be the only places where potholes are, right
30:37
Potholes are, I mean, probably evenly distributed or close to evenly distributed
30:45
But the data didn't show that. And here's the catch. It was 2011
30:52
Smartphones that could install these apps were very expensive. Not everyone could afford one, so not everyone had a smartphone that could run this app
31:02
And they were relatively new. So only the rich people had a smartphone and then could install this app
31:09
And that's why only the high-cost-of-living areas showed up on the map of where the potholes were
31:19
So the point to note here is there's no machine learning involved
31:26
There's no AI in that way. The only thing that is wrong here is the data itself
31:34
And that is the second part of why I often see machine learning and AI systems fail, because the data is simply wrong
31:44
And they fixed that problem later, by the way, which I think is equally smart.
31:50
They later put the app on phones and put those phones into public buses and garbage trucks that would drive across the city,
32:00
and that's how they then covered the whole city's street area.
32:07
So I think that's a pretty good idea. But let me give you another example where we have already used Google Translate. Let me try
32:23
something else. And maybe some of you remember this. "He is a nurse." I type "he is a
32:31
nurse" and "she is a doctor." And I have particularly used Malay here because Malay is a language
32:41
where the pronouns are gender-neutral. In English we have "he" and "she", which indicate
32:46
male or female; in Malay you don't have that. You can see here it is, I think, "dia". I actually don't know
32:53
which word is which, but you can see this is "doctor" and this is "nurse", and
32:58
there's no differentiation between the genders. And the thing to note here is that I write "he is a nurse",
33:07
the male person is a nurse, and she, the female person, is a doctor
33:11
And it translates to this in Malay. And if I now reverse this, now if I translate this back to English, let's see what happens
33:20
All of a sudden the sentence which was he is a nurse gets
33:27
translated back into "she is a nurse", and what I typed as "she is a doctor"
33:33
gets translated into "he is a doctor". And again, even though there is machine
33:40
learning in place here, the reason why this mistake happens is because of the data. These
33:49
translation systems, they basically just read a lot of text, a lot of text
33:55
in two different languages, and then over time learn, okay, this part of the text gets translated
34:01
to this part of the text in the other language. And they read a lot of text. I don't know how much
34:06
but a lot. And historically speaking, and that is what you get when you look at a lot of the
34:13
texts that were used, historically speaking, a doctor was male. I mean, nowadays we know that
34:19
there's no reason for a woman not to be a doctor, and there's also no reason for a man
34:24
not to be a nurse. But historically speaking, most of the texts did have male pronouns for the
34:31
doctor case and female pronouns for the nurse case. And that is why this sentence, which
34:39
does not have a gender, gets translated with this socially biased gender assignment, because
34:46
of the pre-existing data that was used, historical data that had this bias. And there's one
34:54
more thing. This whole example, especially this sentence, got popular a couple of years
35:01
ago with the Turkish language, which is similar in terms of gender neutral pronouns
35:08
And what Google then did is they offered this special translation where they said, hey
35:15
these translations are gender specific and it can mean she is a doctor or he is a doctor
35:22
But that is something that Google then had to set up individually for this particular case
35:32
Now there's another example and maybe some of you remember it was Tay
35:38
Tay was a bot developed by Microsoft and the idea was this bot knows how to type and now
35:45
it should learn how to speak with other people. And then it was trained on the data that it was given by Twitter users
35:53
So basically you could tweet to Tay, and that's how it would learn how to speak
36:02
And now if you think about it, if you use how humans interact on Twitter as a basis to teach someone how to speak, then it can only go downhill
36:13
And that's what happened. People found out, okay, they are training this model
36:19
And within 24 hours, Tay got shut down because it turned into a racist and sexist monster
36:25
because people on purpose just texted racist and sexist and other disgusting comments to Tay
36:34
But again, the reason why Tay turned out to be this racist Twitter bot, or chat bot,
36:43
is because of the data that was fed into it. Right, and I mean, it's a Twitter bot, I think it was on other platforms as well, but it's not that bad, right? But there are cases where we use systems with biased data and the outcome is significantly more important than what a Twitter bot says
37:07
And this COMPAS system is one of those, and it is used in the US
37:14
And basically what it does is it helps judges to decide how much bail they should set for
37:22
criminals. So a criminal commits a crime, gets caught, and then there's bail and there's also
37:27
jail time or prison time. And with this system, basically, the criminal
37:34
fills out a form, I think it's 200-something questions, and then the system predicts how likely it is
37:43
that that person re-offends, how likely it is that person commits the crime
37:47
again after the person had been punished. So if the system predicts, hey, this person is likely
37:55
to commit a crime again, then the judge would take that into account and set a higher bail or
38:03
even in some cases extend the prison or the jail time to not let the person out early and commit a crime again
38:15
But then there was ProPublica, which analyzed the data. And what they found was that there were a lot of cases where...
38:24
So in this prediction, you can be wrong in two cases, right
38:29
A false positive and a false negative. A false positive is when you predict the person will commit a crime again but the person does not, and a false negative is when you predict the person will not commit a crime again but then the person does
38:43
And if you look at the data and who these predictions were made for, you can see that the share of people who were labeled higher risk but did not re-offend is only 23% for white people, but 44.9% for African Americans.
38:58
So a significantly higher share were labeled higher risk when they were Black compared to white people.
39:06
And the opposite, when people were estimated to have a lower risk of re-offending
39:12
But they actually did re-offend again. That was the case for almost 50% of the white people
39:18
and only 28% of the African Americans. So we can clearly see here that there's a racial bias in this system, mainly due to the data that it was trained with.
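As a hedged illustration of the kind of check ProPublica ran, the snippet below compares error rates per group. The counts are invented placeholders chosen only so the resulting rates land near the rounded percentages quoted in the talk; they are not the real COMPAS data.

```python
# Group-wise error rates, the fairness check at the heart of the COMPAS analysis.
# counts[(predicted_high_risk, reoffended)] = number of people in that cell.
counts = {
    "white": {(True, False): 23, (False, False): 77, (False, True): 48, (True, True): 52},
    "black": {(True, False): 45, (False, False): 55, (False, True): 28, (True, True): 72},
}

for group, c in counts.items():
    fpr = c[(True, False)] / (c[(True, False)] + c[(False, False)])  # labeled high risk, did not re-offend
    fnr = c[(False, True)] / (c[(False, True)] + c[(True, True)])    # labeled low risk, did re-offend
    print(f"{group}: false positive rate {fpr:.0%}, false negative rate {fnr:.0%}")
# -> white: ~23% / ~48%, black: ~45% / ~28%; the errors fall very unevenly across groups.
```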
39:34
And here's the thing: the form did not include any racial information about skin color or ethnic
39:44
background. It was based on all the other data in there, about the neighborhood
39:52
people live in and things like that. And that system is still being used,
39:57
even though it is proven to have a racial bias against African-Americans. And there's another example,
40:05
where a study shows that with autonomous driving cars, so in these autonomous driving cars,
40:11
there are basically lots of sensors everywhere, like some infrared and some LIDAR and things,
40:17
but a lot of them have mostly cameras, or there's even one that only has cameras
40:22
And basically what it is, it scans, like it films the surroundings
40:27
and applies some image recognition. And if it identifies, oh, there's a pedestrian
40:32
there's a person, it does what it needs to do, like braking or like driving around it or whatever, right
40:38
to save that person, or to have the best possible outcome
40:51
of the situation. But what the study found out is that dark-skinned pedestrians
40:51
are less likely to be detected by these autonomous driving cars. And what that means in detail is that if the car is driving
41:00
and there is a pedestrian and it would lead to an accident, the chances of the car avoiding that accident are higher if you are white
41:12
And if you are dark-skinned, it is more likely that the car just drives over you, right
41:17
And potentially kills you. And that is a case where machine learning has so much power
41:28
and the data fed into the system has so much power over people
41:34
that we need to ask ourselves the question whether that is the correct way to deal with that
41:42
and whether the systems are mature enough to do that. So in summary, I remember the time when I was a child
41:53
and my friends were doing something and my mother said, you can't do that
41:56
and I would tell her, hey, but all my friends are doing it. My mother was not impressed, and I still couldn't do it
42:03
But for machine learning models, that's exactly how it works. Machine learning models look at the data, what everyone else is doing
42:12
and then do exactly the same thing. Now, you're looking at me, and I'm telling you, hey, machine learning is everywhere
42:20
and machine learning is failing a lot and has horrible consequences. I'm telling you the world is on fire
42:28
But are we lost? And luckily we are not lost. Otherwise it would be a very dark presentation
42:36
There are things that we can do. And one thing is, and that has become a little bit more popular
42:41
over the last couple of years in the industry, is that we need to have explainability
42:48
as a part of our model selection process. Having an understanding of how and why the model works the way it does is an important part of deciding which model we want
43:02
And I can give you an example. There's this deconvolution method that helps you understand how deep learning models, which for image recognition are mostly convolutional neural networks, actually work
43:16
And it shows you how these layers in that network actually identify images
43:22
And you can see here on the bottom, the first one is just some pixels
43:26
And then the second layer already identifies edges. And then in the third layer, you can see that
43:32
noses and eyes are identified, and so on. For each layer you get an idea of what is identified, and then you can better understand the decision-making of that model.
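Here is a hedged sketch of a simpler relative of that idea: plain feature-map inspection with forward hooks rather than the full deconvolution method. The tiny, untrained model and the random input are assumptions purely for illustration; in practice you would run this on a trained network and view the maps as images.

```python
# Look at what each convolutional layer produces for a given input: a simpler
# relative of the deconvolution-style visualizations described above.
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 8, kernel_size=3, padding=1)    # early layer: edge-like filters
        self.conv2 = nn.Conv2d(8, 16, kernel_size=3, padding=1)   # later layer: combinations of edges
        self.pool = nn.MaxPool2d(2)

    def forward(self, x):
        x = self.pool(torch.relu(self.conv1(x)))
        return self.pool(torch.relu(self.conv2(x)))

model = TinyCNN().eval()
activations = {}

def save(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()    # keep this layer's feature maps
    return hook

model.conv1.register_forward_hook(save("conv1"))
model.conv2.register_forward_hook(save("conv2"))

model(torch.rand(1, 3, 64, 64))                # stand-in for a real photo
for name, fmap in activations.items():
    print(name, tuple(fmap.shape))             # one feature map per learned filter
```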
43:52
In general, we need to move away from this typical machine learning approach where you just have data and your model and that's it, into a situation where we have interpretability and human inspection, where we can actually see, okay, this was the right decision,
44:03
and we need to understand why that decision happened. And then we need human interaction to actually see
44:11
okay, that was a good one, that was the right one, that was the wrong one, and then feed that information back into the system
44:17
so that it learns from that again so that it doesn't make the same error again
44:22
And there's one technology that I like a lot, which is this, it's called LIME
44:28
which stands for Local Interpretable Model-Agnostic Explanations. It basically applies to all machine learning models and explains why they work the way they do
44:36
And I'll show you here how that works. There's this picture, and it got classified as a tree frog with something like 0.54 probability
44:49
And what this LIME does is it looks at the data and looks at the prediction and then says
45:00
what if I throw away parts of the data? How does the prediction change
45:05
And we can see here in the middle that parts of the picture are taken away;
45:09
they're grayed out. And then this new image with the grayed out parts
45:13
gets classified again. And now it still says it's a tree frog
45:19
And that means that the data I removed was not important for this model
45:25
to classify that image as a tree frog. And you can see in the middle, in the center middle, that there are some other things are grayed out and you can see the model does not identify a tree fork anymore, which then means that the data, at least part of the data that got removed, was responsible for the model identifying this as a tree fork
45:50
And that is extremely helpful to understand why machine learning models work the way they do.
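A minimal, hedged sketch of that perturb-and-re-predict idea follows. It is not the LIME library itself, and classify() is a toy stand-in for a real model, so the numbers it prints are meaningless; only the mechanism, gray out a region and see how the score moves, is the point.

```python
# The core idea behind LIME-style explanations, in miniature: hide parts of the
# input, re-run the model, and see which parts the prediction depended on.
import numpy as np

def classify(image: np.ndarray) -> float:
    """Toy stand-in for a real classifier; returns a score for the target class."""
    return float(image.mean())

def region_importance(image: np.ndarray, patch: int = 8) -> np.ndarray:
    """Importance of each patch = drop in the score when that patch is grayed out."""
    base = classify(image)
    h, w = image.shape
    importance = np.zeros((h // patch, w // patch))
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = 0.5      # gray out one region
            importance[i // patch, j // patch] = base - classify(occluded)
    return importance

print(region_importance(np.random.rand(32, 32)).round(3))  # larger value = more important region
```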
45:56
Another thing is more diverse teams. The world is not uniform, and I've mentioned that before:
46:04
I've been in contact with a lot of different people and they have a lot of different perspectives
46:10
And I come back to the example of the autonomous driving car. Maybe if the team would have been more diverse
46:18
that trained these models, maybe the model would have been trained with a more diverse set of pictures and would be able to identify dark-skinned pedestrians
46:30
equally well compared to white-skinned pedestrians. Then there's another thing that is typically helpful when you start a data-centric project
46:42
or an artificial intelligence project. This is the Data Ethics Canvas that gets published by the Open Data Institute
46:51
which basically guides you through a couple of questions, like: what are the limitations of my data source?
46:56
Who am I sharing it with? These are a couple of questions that you should ask yourself
47:02
Removing data can be helpful. It's not as simple as I make it sound here
47:08
But if you are building a system where racial or ethnic background does not matter,
47:16
then don't train your system with that data. I think that is a very obvious thing
47:20
It's not that simple, though. As we saw with the COMPAS example, none of the questions was "What is your skin color?"
47:27
And yet it discriminated against African-Americans, because there were quasi-identifiers. And that basically means that if you pick other data points,
47:39
you can figure out more about the data set. Like if I don't tell you my name
47:45
but I give you my GPS coordinates where I live and where I go to work, then you can pretty quickly figure out
47:51
that the data is about me, right? These are quasi-identifiers. So it's a little more tricky to figure these out
47:57
But the general idea is: if you have data that you are pretty sure you don't need, maybe don't include it
48:06
And then there is this tool from Google, Facets, which helps you visualize your data set
48:12
And you can quickly see, okay, I only have 20% female, then you clearly know that this group
48:20
is underrepresented in your data and you would need to change that
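Even without Facets, a quick check along these lines surfaces that kind of imbalance; pandas, the column name, and the 30% threshold below are assumptions for illustration only.

```python
# A quick class-balance check: the kind of imbalance a tool like Facets makes visible.
import pandas as pd

df = pd.DataFrame({"gender": ["male"] * 80 + ["female"] * 20})   # invented example data

shares = df["gender"].value_counts(normalize=True)
print(shares)                      # male 0.8, female 0.2
if shares.min() < 0.3:             # illustrative threshold, not a standard
    print("Warning: at least one group is underrepresented in this data set.")
```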
48:24
And then there's this What-If Tool, which also helps you understand machine learning models
48:29
significantly better. Right, I hope that explained a little bit the problem that we are facing nowadays,
48:37
where machine learning gets deployed widely and does a lot of things but often fails
48:42
And I hope I showed the two main reasons why it fails and I hope I showed a couple of ways
48:47
so we can prevent that in the future. What an amazing session, Aiko
48:54
I mean, there was so much to take away from this 40-minute session, right? Let me go ahead and quickly ask you:
48:59
you had so many examples to give, right, with so many of these real-world scenarios,
49:04
how long did it take you to build this presentation? Yeah, so I mentioned that I'm passionate about ethics, right
49:13
And I'm passionate about it. So I had built this up over a couple of years actually
49:18
and I just converted it all into one slide deck. And also, at
49:25
the company I work for, ThoughtWorks, there are a lot of people who share the understanding
49:32
that ethics is an important part, especially of technology. I have a lot of colleagues, and
49:37
there are lots of discussions around these topics. So there's typically a lot of information
49:43
on our internal channels as well about these topics. So yes. I think there were so many examples, but I
49:50
do remember a couple of them: the PayPal one, right, the "I got scammed" message where PayPal said that's
49:55
great; the IndiGo one, right, you were talking about India, right, so definitely that was good too;
50:02
and I think the self-driving car, where you did talk about why it's been very racist. And you
50:07
also talked about the examples where you have the problem, and towards the end you also
50:11
gave the solutions: hey, you can remove the data, you can be much more diverse.
50:16
So it definitely looks like you have dedicated a great amount of time to this type of content, right?
50:22
We don't have many questions coming from the viewers, but definitely an amazing session
50:27
What do you suggest, Aiko? I mean, for the people who are watching us now, or maybe the recording later, how would people go and connect with you and maybe find you on social media platforms? Is there a way to do that
50:40
I think you can most likely find me on LinkedIn, actually. I'm not very much on any other social media platform
50:49
But I'm on Medium and LinkedIn. So on LinkedIn, you can contact me and on Medium, you can follow me
50:55
I sometimes post blog posts when I find the time to write something. Yeah, yeah, definitely. So I think Medium is good, right? Medium would be the place to go and read what you write. And you did share some of the resources towards the end, right? I mean, like the What-If Tool and all that. What would you suggest to someone who is just getting started? You did also mention things like the breast cancer detection work that people would want to go and start working on, right
51:23
What are some of the things, or maybe the best practices, you would suggest to the folks who are moving into the field of AI?
51:29
There are also developers with like 10 to 15 years of experience who are moving into it, right? So what would you suggest to them, what would be some of the best practices coming from you
51:38
I think, yeah, I think the most important thing, and the industry, especially the machine learning industry, is already doing this a little bit or is moving in that direction:
51:48
the first and most important thing is to realize these two issues, to realize that a lot of the data has social biases and that this needs to be addressed.
52:00
Otherwise, the model we're building will be just as biased. And the second thing is having this explainability, really understanding why your model is behaving the way it does
52:14
I think these are the two things that you need to, when you want to build an ethical model
52:20
or actually when you build any machine learning model, you should always consider these
52:25
These should always be part of your development. And I mean, you can build up some metrics
52:32
You can have diverse testing. I think there are, I showcased a couple of tools
52:38
There's LIME, for example. You can definitely use that; there's no issue with that.
52:42
There are other examples, and probably if you watch this in one year, maybe there are better
52:49
examples; the industry is moving quickly, right, there are more things coming up. I think the
52:54
important part is to realize that there are issues and that you need to address them, and then
53:00
read up on that, do your own research, find the tools that help you do that, and then you
53:05
can build a sustainable machine learning model. That's perfect, Aiko. I'll quickly
53:11
ask one last question, right? Maybe a little one, I know it's very late at your place, right, almost
53:16
like 11 o'clock. So we talked about how one should be very thoughtful
53:21
in building machine learning models, but when you also talk about AI in the cloud, there
53:28
are many of these cloud platforms and other platforms available that provide these, you
53:32
know, AI APIs, like face recognition and all that stuff. It makes it very easy for
53:39
developers from the entire ecosystem to go and just call an API and make their application
53:44
intelligent. But how should one manage all this bias in that case? Because then they don't
53:50
have the capability of changing the data, right? What should one do, and is it okay to just
53:55
rely on these resources when you just cannot manipulate the data? Great question. So
54:03
I would hope that the people building these systems do have some kind of understanding
54:11
And I think, I mean, that is something general:
54:18
there was always bias, right? Humans are also biased. The issue comes up when you have one
54:25
biased system that you can, and that is the world we live in, deploy to a billion users
54:29
with the click of a button. And that's where the scale that we have nowadays
54:33
at our hands with the technology is where it becomes dangerous. And I would assume that if you look at the cloud providers
54:41
that have these features, I would assume they are smart people and I would assume they try to do their best
54:49
They're probably not perfect. I mean, nobody is. But I think they are working hard on making that good
54:57
But the thing is, if you don't control the data, there's not much you can do
55:04
If you do control that, then you should definitely have a look at that
55:09
Right. But then, yeah, I would argue that if you use machine learning as a service or AI as a service and you see where it goes wrong, it would probably be good to share that with the developers or
55:24
something. Yeah, definitely. I think, yeah, I mean, as he said, no one can be perfect, and we
55:29
have seen all of this in the past, you know, just to mention the Apple one, I mean,
55:36
it has seen a lot of accusations of bias, right? So you did mention that in your presentation, and
55:42
hey, if there's anyone watching from there, please check out this session; we need to have some ethics
55:47
even in the AI services part. That was an amazing session. I think you're a very fun person, right,
55:52
your session was really amazing, one of its very own kind, and we are glad to have had it on this
55:57
live show. We are really glad, honored, and humbled to have you. I, Stephen Simon, on behalf of the entire
56:03
C# Corner community, would like to thank you for your valuable time and your contribution to
56:07
the community, and we would literally love to have you back with some amazing presentation in a couple
56:11
of months or whenever you're available. I know you are running very busy these days and you've got
56:16
a lot going on, so thanks so much, and I'll just allow you to go and sleep, it's too late for you. Thank
56:20
Thank you so much. Yeah, thank you very much for having me and letting me sleep soon
56:26
All right. We're going to send you those videos, by the way, the intro video, you loved it so much.
56:31
We're going to drop it to you in an email. Bye-bye, Aiko. Good night. Take care. Bye. Thanks. Bye. Bye