Create Data Pipeline in Data Factory of Microsoft Fabric
Oct 30, 2023
This video shows how to copy the European Centre for Disease Prevention and Control (ECDC) COVID-19 cases dataset and ingest it into a Fabric Lakehouse to perform analysis.
0:00
Hello everyone! In this video, I'm going to show you how to create a data pipeline in Data Factory of Microsoft Fabric
0:14
So let's get started! Basically, a data pipeline allows us to copy, move or transform data from its source to its destination
0:23
Let's see how to do this in Data Factory. So this is the Microsoft Fabric welcome page
0:30
I'm going to click on this Power BI and create a workspace
0:34
So click on my workspaces and choose New workspace. I'm just going to call it new data pipeline
0:43
And then click Apply. That's going to create the workspace. Alright, so I'm going to switch from the Power BI experience to the Data Factory
0:54
Click on that, and then we can see we have Dataflow Gen2, which is Power Query Online, and the data pipeline
1:03
So I'm going to click on data pipeline, and then I need to give a name for the pipeline. Let's just call it
1:11
first data pipeline and then click Create. It saves, and we can start building our data pipeline
1:21
Now we can add a pipeline activity, copy data, or choose a task to start.
1:27
We just want to copy data from a source to a destination, so click on Copy data,
1:32
and then in the Copy data dialog box, we can choose the source.
1:37
Now, there are so many sources here, from sample data in Parquet, CSV, JSON files and so on and so forth, and of course you can even connect to different sources like Amazon RDS for SQL Server, Amazon S3, Azure Blob Storage and so on. Anyway, we want
1:58
to connect to this COVID-19 Data Lake, so click on that and choose Next. Among the COVID-19 data files,
2:08
we can see different kinds of datasets, such as Bing COVID-19, the COVID-19 tracker,
2:14
the European Centre for Disease Prevention and Control, and of course the Oxford COVID-19 Government
2:20
Response Tracker. We're going to connect to this European Centre for Disease Prevention and Control,
2:26
and then we can choose the format, either CSV, which is 1.5 megabytes, or we can choose from the
2:33
JSON, JSON Lines, or Parquet file. We want to go with the CSV, so click on that and choose Next. And then
2:40
we need to choose the data destination. We can see the first step is the data source, connecting to the actual
2:46
data source, and then we choose the data destination. Now, I want to load into a lakehouse, so
2:52
I'm going to choose Lakehouse and then click Next. Now we can choose to load the data into an existing
2:59
lakehouse, if we have any, or we can choose to create a new lakehouse. I'm just going to create a new
3:04
lakehouse and just call it covid19_dataset, and then click Next. On the next page we have the root
3:15
folder. Now I can choose Tables or Files; I want to go with Tables, and of course I can optionally
3:21
load to a new table or an existing table. I want to load it into a new table, and I can rename
3:27
this default name, but it's fine as is. Very importantly, we can see the column mapping, so we can make some adjustments here: these are the source columns and their types, and we can change some of them from string to date or other data types.
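As an aside, if a column is left as a string during mapping, it can still be converted at query time later in the SQL analytics endpoint. Here is a minimal sketch, assuming the ECDC column names date_rep, cases and deaths and a table named ecdc_cases; your copy may differ:

    -- Sketch only: converting an assumed string column to a date at query time.
    -- TRY_CAST returns NULL instead of failing when a value will not convert.
    SELECT
        TRY_CAST(date_rep AS date) AS report_date,
        cases,
        deaths
    FROM ecdc_cases;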
3:45
I'm just going to click on Next. And then we can see the steps:
3:50
I want to copy from this file, from this location, and then this is going to be the destination, the lakehouse.
3:57
And then we can see the source and the sample dataset, the destination, the connection name, as well as the table name.
4:05
So we can choose to start data transfer immediately or maybe later
4:10
But this is fine, so just click on Save and run. And then we can see it gathering your new changes and starting.
4:19
We can see it successfully running first data pipeline. And then, under the Output tab at the bottom here,
4:26
we can see the pipeline run ID, we can see the pipeline status and then we can
4:33
see the activity name, which is Copy data. We can see the status, the run
4:39
start, the duration, and the input and output. So we're just going to wait a few seconds
4:46
to see the activity status. There we go, we can see Succeeded. We successfully ran the first pipeline; that's quite amazing.
4:59
So we've been able to transfer the data from the source into the lakehouse, and then we
5:04
can go back to the workspace. Yeah, you can come back here and double-click.
5:08
And there we go, we can see the pipeline that we established and the data is now in the
5:15
lakehouse. Click on it and wait for the data to load. Okay, so there we go: you can see covid19_dataset, and you can see all the tables, which are loading.
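Once the tables appear, a quick sanity check in the SQL analytics endpoint could be a simple row count. This is a sketch, assuming the table lands as ecdc_cases:

    -- Sketch only: confirm the copy landed by counting rows.
    SELECT COUNT(*) AS row_count
    FROM ecdc_cases;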
5:32
Okay, so there we go. I'm going to come back to the workspace, and we can even begin to write SQL in the
5:44
SQL analytics endpoint. So when I click on that, that's going to open the SQL
5:49
endpoint to write queries against the data. You can see loading your SQL
5:53
analytics endpoint metadata, and there we go. So I can click on New SQL query, so
6:03
I'm going to write SELECT * FROM ecdc_cases, and I can say WHERE
6:13
countries_and_territories equals, inside single quotes, Australia. Right, and then we
6:25
can terminate the statement and click on Run.
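Written out, the query from the video is:

    SELECT *
    FROM ecdc_cases
    WHERE countries_and_territories = 'Australia';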
6:39
Okay, so there we go: this is the output of the query, and we can see there are 350 records where
6:46
countries_and_territories is equal to Australia.
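From here you could go further with aggregates. As a sketch, again assuming the ECDC column names, something like the following would total the cases per country:

    -- Sketch only: total reported cases per country, highest first.
    SELECT countries_and_territories,
           SUM(cases) AS total_cases
    FROM ecdc_cases
    GROUP BY countries_and_territories
    ORDER BY total_cases DESC;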
6:54
So this is basically how we can connect to a data source and load the data into a destination using a data
7:02
pipeline in Data Factory. I hope you enjoyed this video. If you did, like, share
7:07
with friends, and comment. Thank you, and bye for now. Cheers!