This prject will give you some experience with Google BigQuery and analytical SQL queries. We will be using public datasets from NYC OpenData, a repository of free and public datasets provided by the city, from the city government’s agencies. In particular, we will be creating a data warehouse storing five tables, each uploaded from a different public dataset that concerns NYC Public Schools. You will need: Internet access, a coupon code for GCP (which you should have retrieved during assignment 0 before class began), and a GCP account (which you should have retrieved during assignment 0 before class began). Please read through this assignment and make sure you have these three requirements EARLY. Contact me if you have any trouble!
Setup: First, sign in to GCP, create a new project, and enable BigQuery. Please name your project dw-hw1- (for example, mine would be dw-hw1-erm).
Enable Bigquery
If you don't already have a google account (Gmail or Google Apps). you must create one.
Sign in google plateform console (console.cloud.google.com) and navigate to BigQuery, You can also open the BigQuery web UI directly by entering the following URL in you browser.
https://console.cloud.google.com/bigquery
Accept the term of services.
Before you use the BigQuery, you must create a project, follow the prompts to create your new project.
The project Id is a unique name across all google cloud projects, It will be referred to later in this codelab as Project_id.
Next we’re going to load 5 tables into our dataset.
In the lefthand sidebar, click on your project name and dataset name, so that you see a view like this:
In the navigation bar in the middle of the righthand side of the page, which says “data-warehousing-for-analytics:hw1” in the example image above, click “create table”.
You will see a pop up menu like below: select “Google Cloud Storage” as the data source.
C o p y t h i s G o o g l e C l o u d S t o r a g e ( G C S ) b u c k e t o b j e c t l i n k i n t o t h e “ S e l e c t f i l e f r o m G C S b u c k e t ” i n p u t l i n e :
g s : / / a n a l y t i c a l _ s q l _ h o m e w o r k / 2 0 1 8 - 2 0 1 9 _ D a i l y _A t t e n d a n c e . c s v
Under “Table name”, put Daily_Attendance
And check the box that is labeled “auto detect” to enable auto-detection of the schema of the table.
Then click “Create table” to begin the import job.
In a moment, the data will be copied from the public GCS bucket where the example data is stored (I put it there, it is a dataset open to the public) into your BigQuery instance. You can inspect the table by going to the lefthand sidebar in the BigQuery UI, expanding the project name, expanding the dataset name, and clicking on the table name. It will look like this:
You should see the table schema (at this point, you will only have one table). Run a query like:
select * from hw1.Daily_Attendance limit 1;
to test that your data was imported correctly.
Now, repeat the same steps for the rest of the five tables for this dataset:
Open the NYC Open Data link provided in the table above to inspect the data dictionary provided for each of these data sets. This will give you an idea of what each table holds
You can contact us for:
BigQuery Project Help
BigQuery Practice Exercise
BigQuery Tutorial
BigQuery Homework Help
Other services related to BigQuery Google Cloud Plateform
Send your query and get instant help at:
contact@codersarts.com
Comments