Build AI-Enabled Apps with .NET MAUI and Azure Cognitive Services by Aditya Oberai
Nov 1, 2023
Code Repository: https://github.com/adityaoberai/CSharpCon22Demo
C# Corner - Community of Software and Data Developers: https://www.c-sharpcorner.com
#CSharpCorner #CSharpConf22
0:00
I am Aditya. I am currently a developer advocate at Appwrite. I also work as a hackathon coach
0:19
with Major League Hacking, and I have been a part of Twilio's Champions program for a
0:23
while now. And you know, aside from that, in my free time I love working with .NET, I love learning
0:28
more, exploring, working with APIs, working with cross-platform apps. And yeah, that's a little bit about me
0:36
Now, before I jump in, I do want to mention, if anyone wants to click pictures and post
0:40
about them, clicking and sharing is highly encouraged, so if anyone wants to do that
0:44
please feel free to do so. That's going to be totally all right. Now, if I talk about the session that we have here and the content we're going to discuss
0:52
I think the session title was rather self-explanatory, but just to cover the agenda once
1:00
we're going to be talking a little bit about cognitive services, the suite of services that this offers
1:07
the sort of APIs it offers. Aside from that, we will bring a special focus
1:11
towards the computer vision API, and that is primarily because the demo we have
1:16
will comprise the computer vision API. So I'll give a little brief introduction to that as well
1:22
Aside from that, we will discuss what .NET MAUI is, what it enables you to do
1:29
I will cover that in brief rather than in detail because I know there is a dedicated session on .NET MAUI in the C-sharp and .NET track as well tomorrow if I'm not wrong
1:38
So I will give a brief introduction to that. And after that, we will go ahead and explore the example app that I have that you can look at as well
1:47
So, yeah. First up, let's talk about Azure Cognitive Services. Just before I jump into this, is there anyone here who is aware of what Cognitive Services is?
1:58
We have a few folks here. Have you tried it out in the past as well?
2:04
Have you tried it in the past as well? Okay, nice. What about you, sir?
2:11
Have you had a chance to work with Cognitive Services in the past as well? Yeah, I've been able to work with Cognitive Services
2:19
Oh, wow. That's amazing. That's actually wonderful
2:31
Thank you so much for sharing that. Now, it's great to hear about your experiences
2:36
but I do believe for a lot of folks, this might be their first introduction to what Cognitive Services is
2:41
Let me hop ahead and tell you a little bit about that. So Azure Cognitive Services is, well
2:48
a cloud-based set of services with REST APIs and SDKs that lets you build in cognitive intelligence
2:55
into your applications. In simple words, what that means is you can add cognitive features
3:01
cognitive features meaning abilities around speech, vision, and so on. And you can do this and add these features
3:11
into your applications without actually having any AI or data science skills
3:16
which is actually really interesting, right? Because for me, I like working with APIs
3:20
but I can't figure out the statistics behind a lot of this work
3:24
That's beyond me. But, you know, that's exactly what Cognitive Services enables
3:29
As I've already talked about one of the benefits, let me jump into a few more that Cognitive Services has
3:35
First thing, it is customizable. You can customize the pre-trained models that it offers
3:42
You know, it's built on the foundations of the AI research that Microsoft has been doing for a while
3:47
It's really, really convenient because you can deploy cognitive services anywhere across the world
3:54
whether it's, you know, from the cloud, whether it's on the edge. The good thing is, like, for example
4:00
if you ever look at Cognitive Services or other Azure services, you would notice they've got
4:04
different regions available, right? Yeah, you can essentially deploy cognitive services resources
4:10
across any of those. Next up, as I've been touting from the beginning
4:16
you need absolutely no ML expertise. I, for example, have absolutely none
4:21
and yet I have been able to build ML-enabled applications, AI-enabled applications using cognitive services
4:27
because that's how easy it makes life for you if you ever want to do that. And last but by no means the least
4:35
Cognitive Services is developed with strict ethical standards, and it does prioritize empowering
4:41
and enabling responsible usage of AI across the industry. And that has been a priority that Microsoft has had for a long while
4:49
So it does follow very, very strict ethical standards that way. Now, I did mention that, you know
4:56
cognitive services is a suite of cognitive features, right? Let's learn a little more about what these categories
5:04
what these types of services are. Like, for example, first up, we've got vision
5:11
And, you know, you can imagine that we'll be talking about vision because I already mentioned we'll be touching the Computer Vision API, right?
5:17
But, you know, even aside from the Computer Vision API, which you can use to analyze content in images and video files
5:23
you've also got Custom Vision, which lets you actually create customizable image recognition models
5:30
You've got the Face API, which is used to detect emotions in faces of people
5:35
And, you know, aside from that, it can be used to check whether, you know, you've got the same face across images
5:40
so fun fact, Uber actually used the Face API for onboarding of their drivers
5:47
for their authentication and recognition and that's a case study you can actually check out
5:51
on the Microsoft website right now as well. Then, of course, we've got Speech,
5:55
which includes speech-to-text, transcribing audible speech. You've got text-to-speech similarly.
6:00
Speech translation is available. Speaker recognition is also now available, where you can actually identify people based on their voice.
6:08
that's a service that has been made available now as well. They've got the language APIs
6:13
which features Language Understanding, or LUIS, which lets you build natural language understanding
6:18
into your applications. What that means is you could essentially extract specific pieces of data from natural language input.
6:24
There is a content moderator, which detects offensive or unwanted content. There is a personalizer
6:30
which is used for creating personalized experiences across user bases. If there's anyone working in the customer data space as well
6:37
that might be something that you've heard of. The search APIs are there, which actually feature
6:42
a variety of Bing-based APIs, ranging from News Search to Image Search
6:49
Web Search, AutoSuggest, and so on. So Cognitive Services is actually a really, really vast suite
6:57
and it's very, very handy to use. Coming to the API that I mentioned, the API in focus
7:05
that is, the Computer Vision API. This is one that we're going to be looking at later on as well.
7:10
And Azure's computer vision service here, it does give you access to algorithms that process images
7:16
and return information based on them. One of the biggest features of, you know
7:22
like, of the Computer Vision API is their optical character recognition API, or OCR, which lets you extract text from images. And you know, with the new Read API they have, you know, within their OCR offering,
7:37
you can not just detect printed text, but also handwritten text, which is incredibly difficult
7:42
because, just for context, is there anyone who knows why you can never get a perfect
7:48
handwritten text match anywhere? If you ever try OCR on handwritten text, it's impossible to get
7:55
a perfect match 100% of the time. Exactly. You can't have all of this text
8:03
available across the internet, right? It's going to be different. So, you know
8:09
the read API is surprisingly accurate with the handwritten text as well. And it doesn't just support English
8:15
but multiple languages beyond as well. Then, you know, after that, we've got image analysis,
8:20
which lets you extract features from images such as objects, faces, text descriptions and so on
8:28
which is, again, very, very convenient if, you know, you're working with an application
8:32
Let's say, for example, for folks who might be differently abled, right? Something like image analysis
8:38
could really, really help you build tools to work with data in real time
8:42
and extract and share that information. The Face API does technically fall under
8:47
their computer vision offering as well, and it allows you to detect, recognize
8:52
and analyze faces in images. And, you know, as we mentioned, Uber has used this for their drivers as well.
9:03
And, you know, even aside from that, facial recognition software is pretty important across scenarios
9:08
such as identity verification, touchless access, face blurring for privacy, right? That's something that you could enable with the face API as well
9:18
if you're working on any privacy-centric applications. And then last, but not the least
9:23
We've got Spatial Analysis, which lets you analyze the presence and movement of people,
9:28
distancing and so on across video feeds. And, you know, based on analysis of the space, essentially,
9:36
it will give you responses and produce events that you can have other systems respond to
9:42
A few things about the computer vision API are that if you are working with images
9:47
the image does need to be one of the conventional formats like JPEG, PNG, GIF,
9:53
or bitmap BMP files. At this moment, you do need the file to be less than 4 MB
9:59
And the minimum size of the image should be 50 by 50 pixels
10:05
For the Read API, the maximum is capped at 10,000 by 10,000 pixels as well,
10:09
which probably most of us aren't actually hitting in most cases, if I'm honest
10:14
So I think it's a fairly good range for you to work with if you plan to use it.
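If it helps, here is a minimal sketch of what a client-side pre-flight check against those limits could look like. This is my own illustration, not code from the demo; the method name is made up, and the 50-by-50-pixel minimum is left out because verifying dimensions would need an imaging library.

```csharp
using System.IO;

// Illustrative pre-flight check against the documented input limits:
// a conventional image format and a file under 4 MB.
static bool IsAcceptableImage(string path)
{
    string ext = Path.GetExtension(path).ToLowerInvariant();
    bool formatOk = ext is ".jpg" or ".jpeg" or ".png" or ".gif" or ".bmp";
    bool sizeOk = new FileInfo(path).Length < 4 * 1024 * 1024; // under 4 MB
    return formatOk && sizeOk;
}
```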
10:21
Having talked about Cognitive Services, let's talk about MAUI. Quick question: is there anyone who's aware of what .NET MAUI is?
10:30
Is there anyone aware of what Xamarin was, by any chance? Yeah, yeah
10:40
So Xamarin basically was a cross-platform framework that let you build applications across platforms.
10:47
And the thing with Xamarin was, it was initially its own organization, its own entity
10:52
and it used Mono, but Xamarin's internal runtime and solution was basically built on .NET Framework 4
11:00
And, till last year, that was what Xamarin used. This year though, Xamarin has entirely been merged into the .NET SDK
11:11
and is now called MAUI, or, you know, .NET Multi-platform App UI; that's the full form.
11:23
And in its essence, it is a cross-platform framework that lets you build mobile and desktop applications with C Sharp and XAML
11:31
So, if anyone's aware of Flutter or React Native here, chances are that there might be a few folks aware of that
11:37
MAUI is another solution you could work with, which lets you work with a C Sharp code base
11:42
which of course has a lot of its benefits, primarily being that it has a C Sharp
11:47
and a .NET-based code base, right? Because the C Sharp community, the C Sharp ecosystem is so extensive
11:55
it lets you work across systems and solutions, and it will definitely enable your team a lot more
12:00
if you're working with .NET already. And with MAUI right now, you do have support for Android, iOS
12:08
Tizen, Samsung's Tizen, you've got macOS, you've got Windows, so you could essentially build applications across all of these platforms
12:16
and essentially get a write-once, run-anywhere experience, right? Currently, MAUI does support .NET 6, and I know .NET 7 is in preview,
12:28
so by the time .NET Conf happens later this year, .NET 7 support should also be rolled out
12:34
and, you know, they've got individual frameworks across, you know, the MAUI offering
12:40
So just for example, we had Xamarin.iOS to build iOS apps. You've got .NET for iOS now
12:47
Xamarin.Android is now .NET for Android. You've got .NET for macOS as well, which uses Mac Catalyst.
12:53
And for Windows applications, it is using WinUI 3. So all of these frameworks use the .NET 6 base class library.
13:02
And this will allow you to abstract away the underlying platform entirely from your code
13:10
And it's going to use the .NET runtime to provide an execution environment. I'm not going to go into too much more detail
13:16
because I know there's going to be a dedicated session on this as well. But that being said, .NET MAUI is definitely very handy to work with
13:23
If you've worked with Xamarin, you will like this a lot more. There has been a massive focus on developer experience with MAUI's upgrades
13:31
Resource management has become far, far easier as well. You can now use declarative UIs as well
13:37
That's entirely an option. So MAUI is definitely a really good solution for you to work with
13:43
if you want to work on cross-platform applications. Now, I did mention that, you know
13:51
after covering all of these topics, we're going to have a little example app
13:55
that we'll work with as well, right? So for this application, we're going to basically be building an application
14:03
that clicks a picture of any piece of text and extracts the text and shows it in the app.
14:07
Now, for this, like, I do have a GitHub link that I will be sharing with you just after this
14:13
so if you want to follow along, you can do that, and you can work on it later on in your own free time as well
14:18
In the meantime, though, in order to build this sort of an app, the prerequisites that you would need would be an Azure account, of course
14:25
and you would have to create a computer vision resource on your Azure portal
14:29
It could be a free resource. You could go with paid. Either way is fine. Whatever works for you
14:35
We, at the end of the day, essentially just need the API key and the endpoint that it offers
14:40
Next up, the MAUI application that we have is .NET 6 based, so it would be good to have
14:46
.NET 6 installed on your systems as well. And we would need the latest versions of Visual Studio IDE for that
14:54
Just to clarify once again, it is the Visual Studio IDE and not VS Code
14:58
I know that is a question that comes up a lot, but at the moment, in order to build MAUI applications, you will need the IDE; VS Code will not do. So yeah, the IDE is going to be necessary.
15:08
And within that, so if anyone's worked with Visual Studio IDE, you would probably know that it offers quite a variety of workloads
15:15
whether to build internet-based applications, whether to work with Azure Cloud Services and build those SDKs
15:22
whether to build desktop applications. There is similarly a workload for MAUI applications as well
15:28
And that's another thing you will need because it's going to not just add all the necessary dependencies for you to build a MAUI application
15:34
It will also give you a vast variety of templates that you can use across all of these platforms
15:41
So those are the prerequisites if you want to work with a MAUI application and, you know, at least build this particular one
15:49
So next up, firstly, of course, we need to create a Computer Vision resource, right?
15:54
You can do that by just going ahead to your Azure portal and searching for it.
15:59
Once you do that, you'll probably come across one of these pages where you've got to mention a resource name.
16:03
It'll ask you for your resource group, your region, and so on. It's a very simple setup in general,
16:10
pretty straightforward. The only thing I will, you know, recommend you all to keep an eye out on is that whenever you're creating a Computer Vision resource,
16:19
You could just directly go ahead and review and create it after this first page
16:23
after you've checked the responsible AI notice. But make sure to give a look at the networking settings once
16:29
because some of your applications you will want to expose across public networks
16:35
In some cases you might not, right? And it's good to be wary about that. If you're working on a private network
16:40
and you want to make sure that your computer vision resource is not exposed publicly, just be mindful about that
16:48
That being said, I've got the GitHub link here before we talk about the applications
16:53
So if anyone wants to get that open, I've got the QR as well and the GitHub link
16:58
So whatever works for you should be good. I'm going to take 15 seconds or so
17:07
just to make sure everyone's got this captured and ready. Yeah, so once that is done
17:21
once you've got this repository open, you should see an application there
17:25
which is built and ready. Just an easy way to understand the application
17:31
is if you enter the CSharpCon22Demo directory within that repository,
17:37
you will come across quite a few different files, one of them being MainPage.xaml
17:44
and MainPage.xaml.cs. That is where we've written the code for running these functionalities.
17:51
That is the landing page, the first page that we come across
17:55
So at the moment, I would recommend you folks to take a look at that first
18:00
before anything else, because that's going to help you follow along with the examples and the code that I show you after this
18:07
So XAML stands for Extensible Application Markup Language. It is an extension of the XML format you might have come across,
18:18
and it is a markup language that lets you design UIs for applications using MAUI or WinUI or in the past UWP as well
18:29
You also used XAML across WPF and its later editions and so on
18:34
So XAML is a markup language that lets you design a UI. For this application, the UI I designed was pretty simple
18:40
It's just a picture along with a capture image and an extract text button
18:45
The picture is the picture you will be clicking through the application, and the extract text
18:50
button will extract the text and give you the response. Right.
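For a rough idea, the markup for that page could look something like this; the element and handler names are illustrative, not necessarily the repository's exact ones.

```xml
<ContentPage xmlns="http://schemas.microsoft.com/dotnet/2021/maui"
             xmlns:x="http://schemas.microsoft.com/winfx/2009/xaml"
             x:Class="CSharpCon22Demo.MainPage">
    <VerticalStackLayout Padding="30" Spacing="20">
        <!-- Shows the photo captured through the app -->
        <Image x:Name="capturedImage" HeightRequest="300" />
        <!-- Handler names are hypothetical; the repo's may differ -->
        <Button Text="Capture Image" Clicked="CaptureImage_Clicked" />
        <Button Text="Extract Text" Clicked="ExtractText_Clicked" />
    </VerticalStackLayout>
</ContentPage>
```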
19:01
Along with that, we also have a helper file in this application under the Helpers folder, which is basically where you would add your API key and endpoint.
19:05
And this is just for reference, because we're going to be looking at code later on and referencing this,
19:09
but I wanted to not hard-code my API key into the functions directly
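A helper along these lines is all it takes; the class and member names below are illustrative, not necessarily the repository's exact ones.

```csharp
namespace CSharpCon22Demo.Helpers;

// Illustrative settings holder so the key and endpoint stay out of the
// functions themselves. Never commit real keys to source control.
public static class ApiSettings
{
    public const string SubscriptionKey = "<your-computer-vision-key>";
    public const string Endpoint =
        "https://<your-resource-name>.cognitiveservices.azure.com/";
}
```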
19:17
In the future, what I do recommend, though, is that in any case, wherever you have API keys or private information,
19:25
at least with .NET-based applications or any mobile applications or so on
19:32
please do use Web API backends. That is something that I highly recommend
19:35
because you can keep this information a lot more secure, with environment variables and so on.
19:42
That's not necessarily as easy to do when you're on a mobile application on a local device
19:46
where you've got all the information stored, right? So, yeah, that's a recommendation from a security perspective
19:54
Coming to MainPage.xaml.cs, the first major function you will see
19:59
is the capture image function, which uses the media picker in MAUI to capture a file
20:06
If anyone was a Xamarin developer, you may have been aware of the media plugin by James Montemagno
20:12
which basically let you capture pictures or pick out files. Now, with MAUI, they actually merged that functionality
20:20
into MAUI's Essentials. So you can actually do that directly with MAUI's inbuilt libraries.
20:29
So it is pretty simple that way. The media picker lets you capture a file
20:34
and after that, what we have just done is we have created a local path
20:38
and saved the byte stream of this image locally to ensure that this image can be accessed later on as well.
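Here is a minimal sketch of that capture-and-save flow, assuming MAUI's built-in MediaPicker; the method and variable names are mine, not the repository's exact code.

```csharp
using System.IO;
using System.Threading.Tasks;
using Microsoft.Maui.Media;   // MediaPicker
using Microsoft.Maui.Storage; // FileSystem

// Capture a photo and save its byte stream to the local cache so the
// extract step can read it again later.
async Task<string?> CaptureImageAsync()
{
    FileResult? photo = await MediaPicker.CapturePhotoAsync();
    if (photo is null)
        return null; // the user cancelled, or capture isn't supported

    string localFilePath = Path.Combine(FileSystem.CacheDirectory, photo.FileName);
    using Stream source = await photo.OpenReadAsync();
    using FileStream destination = File.OpenWrite(localFilePath);
    await source.CopyToAsync(destination);

    return localFilePath; // the demo keeps this in a class-level field
}
```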
20:47
The one thing I will note is the file path variable at the bottom, that is a global variable across this class
20:53
and that is just there so that we can store this file path
20:57
and use it later on across another function because we don't have a direct call from this function
21:02
into this extract text function. I kept a dedicated button and an event for that
21:08
So we are keeping the local file path stored separately in another variable in the meantime
21:16
But otherwise, this code here is majorly boilerplate. You can actually go to the .NET MAUI documentation
21:22
and find this as well. Next up, we've got the extract text functionality
21:28
which is using the computer vision SDK here, which is available for .NET
21:33
So what we're going to do here is use the credentials of our Cognitive Services resource,
21:39
and we will authenticate our client first with that. The SDK has a provision for that, as you can see at the top
21:46
We've got the Computer Vision client, which is being authenticated with these credentials
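In code, that authentication step looks roughly like this, using the Computer Vision SDK for .NET (the Microsoft.Azure.CognitiveServices.Vision.ComputerVision package); the ApiSettings names refer back to the illustrative helper sketched earlier.

```csharp
using Microsoft.Azure.CognitiveServices.Vision.ComputerVision;

// Authenticate the client once with the resource's key and endpoint.
var client = new ComputerVisionClient(
    new ApiKeyServiceClientCredentials(ApiSettings.SubscriptionKey))
{
    Endpoint = ApiSettings.Endpoint
};
```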
21:51
Now, after that, the Computer Vision API has a two-step flow. The first step of this flow is to send the image data over to the API
22:03
It is then processed, and the API sends back a header which contains the location details,
22:09
which you can then again hit later on to get back your response
22:13
So there is a two-step approach here with the Computer Vision API, and that is basically because it needs more processing time.
22:20
So it chooses to keep this function a little more asynchronous, and the response is stored at a separate location.
22:29
We've got our thread going off to sleep for a small period of time, and then we send a call back again. So that is something to keep in mind whenever you use the Computer Vision API: it is going to be a two-step approach.
22:41
It's not just going to be you hit the API and you get back the response. So in the first step, we essentially send the
22:49
file, the byte stream, you know, the stream
22:53
of this image that we've clicked, and we send that over to the Computer Vision API.
22:59
We get back a location which is stored in the operation location string there
23:05
We then put the thread to sleep for just a couple of seconds
23:09
while we wait for this to end. And after that, we again hit the API with this location in mind
23:15
to get back our response. After that, we've just gone through this entire response
23:20
that we've received in the text URL file results. And then we just iterate through this
23:27
to get back all the different lines of text, you know, extracted from this image.
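Putting that two-step flow together, a sketch along the lines of the SDK's Read quickstart could look like this; `client` is the authenticated client from above, `filePath` is the locally saved capture, and the simple fixed delay mirrors the talk's sleep-then-poll approach.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;
using Microsoft.Azure.CognitiveServices.Vision.ComputerVision;
using Microsoft.Azure.CognitiveServices.Vision.ComputerVision.Models;

// Step 1: submit the image; the API responds with an Operation-Location header.
using FileStream imageStream = File.OpenRead(filePath);
ReadInStreamHeaders submitHeaders = await client.ReadInStreamAsync(imageStream);

// The operation ID is the trailing GUID of the Operation-Location URL.
Guid operationId = Guid.Parse(submitHeaders.OperationLocation[^36..]);

// Step 2: wait briefly, then poll the results endpoint until processing ends.
ReadOperationResult readResult;
do
{
    await Task.Delay(1000); // give the service a moment to process
    readResult = await client.GetReadResultAsync(operationId);
}
while (readResult.Status is OperationStatusCodes.Running
                         or OperationStatusCodes.NotStarted);

// Walk the response and collect every recognized line of text.
var extracted = new StringBuilder();
foreach (ReadResult page in readResult.AnalyzeResult.ReadResults)
    foreach (Line line in page.Lines)
        extracted.AppendLine(line.Text);
```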
23:32
So if we look at this application, the way it's going to run is you capture an image
23:39
We've got the text there. You know, you can click a picture, and the moment you extract text,
23:45
just wait for a couple of seconds, and you will see a display alert pop up
23:50
which has our output there. So yeah, it's pretty simple that way, you know
23:56
It's a very comfortable experience, and just with a few lines of code, we've basically built an application
24:01
that can extract text from handwritten stuff. Another fun solution that I just wanted to showcase
24:10
aside from the Computer Vision API, was one that I built a little while back with Xamarin and the Face API,
24:16
and what this did is it clicked a picture of a person, you know, took their facial information
24:23
extracted the predominant emotion, and then placed an emoji on top. So that way you can actually build really, really fun and handy solutions with the face
24:32
API and with cognitive services in general. And by the way, the GitHub link for that is available there as well
24:37
So if you want to look at that, you can do so later on
24:42
So yeah, those were a few fun use cases of cognitive services and how you could use those
24:48
with MAUI and build AI-enabled applications. And that's about it for me
24:54
Thank you so much. By the way, if anyone wants to connect with me, I've got my Twitter handle there as well
25:07
You can look at that if you want. I've also got my website which has all my other socials
25:11
So feel free to connect with me over there. Does anyone have any questions in the meantime?
25:17
Yeah, I'll just take this and then you. I am Arun from Bharati Vidyapeeth.
25:24
So basically I have a question: how is Cognitive Services different from OpenCV and MediaPipe?
25:30
Like, we have a few inbuilt libraries also in Python, or MediaPipe. So like, for face recognition, there's a cascade classifier.
25:40
So we can easily detect the face using a cascade classifier. So how is Cognitive Services easier and more beneficial for us?
25:50
So just a question, because I'm not a Python developer myself: if you're working with OpenCV, would you have to define the logic for extraction of this information,
25:59
detection and so on, on your own? Just a question there.
26:03
No, no. Like, we can simply call out the function from the library
26:07
Inbuilt libraries we are having. So, as I told, like, for detecting out the face, we have cascade
26:14
So, we just have to call the library and we have to put the image path
26:19
and easily we can detect out the face. Okay. Are these libraries available across languages other than Python?
26:28
Yes. It's available in C#, Python. And I might guess it's also available in C.
26:35
And that's particular to, and dedicated to, computer vision-based use cases, right?
26:40
Yeah, yeah, absolutely. So with OpenCV, I personally haven't done a benchmark or, you know, a comparison across handwritten text; I think that would be something to compare with.
26:49
Beyond that, because Cognitive Services is a vast suite that goes beyond vision
26:54
and I think a lot of the services involved here, whether it's LUIS
26:58
which lets you enable and implement natural language processing, or if we're talking about the decision APIs
27:04
you probably wouldn't get as simple an experience as consuming REST APIs across all of these
27:10
OpenCV is one example, which is pretty cool, but you may not get that same experience across this entire suite
27:16
So that way, I think Cognitive Services will make the friction
27:23
of implementing AI within your applications a lot lower across a variety of use cases for a lot of people that way.
27:30
And one last question: could we change out the algorithm working
27:34
behind Cognitive Services? So I do believe Cognitive Services offers you
27:39
customizability to an extent, yes. There are a few different models that they have
27:44
worked on in the past. I would say that, you know, that is something that we can
27:49
explore once; it should be possible. And there are some services for sure that
27:53
let you train them a lot more manually, right? Like, when I talk about LUIS
27:58
and QnA Maker, I know that lets you take a lot more custom input. Custom
28:02
Vision purely takes your own information for you to build a custom model there. So
28:08
in quite a few services, yes, there is more customizability; in some, relatively
28:13
a little less. Okay, sir. Thank you so much. I think I
28:17
saw a question from you as well. Hello, I'm Akshay. So, my
28:23
question is, how are MAUI apps better than progressive web apps? Or
28:27
is this like a C++ versus Python or Java or something sort of question? So, with
28:31
a progressive web app, you're essentially taking the web experience and allowing that on your local systems
28:37
right? With MAUI, you're building fully native applications. So, it's a little closer to
28:43
building something like a dedicated native Android app and so on. So these aren't hybrid apps
28:48
Although you can do that with MAUI now. With Blazor support, you can actually build hybrid apps as well,
28:52
similar to what you would be doing with Electron. But using MAUI this way will enable you
28:58
and provide you a dedicated native experience. So if you want to work with native APIs on your system and so on
29:05
MAUI will enable you to do that. Yeah. Thank you. We had a question there as well
29:10
I think we can take this one and then probably move forward. So if anyone asks further questions, we can take those afterwards as well
29:16
But I'll take this one last question now. Thank you. So how is the support for this OCR cognitive service in Indian languages?
29:26
So does this work on Indian languages? I do believe that they have support beyond English as well
29:35
So there should be support across quite a few vernaculars. But I will have to check which ones
29:39
I'm not 100% sure on that. But yes, I do know that they do claim
29:45
that the Read API works beyond English as well. So we can definitely explore that
29:49
I'll actually look into that and get back to you a little later today as well, if that's all right.
29:53
Thanks, that works. Yeah. Awesome. Well, thank you so much. And hope you all have a fun conference