Learning Outcomes:
On successful completion of this module, you will be able to:
Demonstrate a knowledge and understanding of the theory and practice of large scale data driven application.
Apply skills to deal with the complex issues involved in the design and implementation of a reliable large scale data driven application.
Demonstrate competence by applying theoretical skills to practical problems.
Problem
You are required to build a web-based system for displaying and querying the World Wide COVID-19 data sets
A dataset consisting of COVID-19 attributes for 3718 cases can be found in the file COVID-
19.7z
This should be unpackable with archive manager and loadable into Excel/Open office
(use Unicode UTF-8).
You should examine and attempt to understand the data (there will be lab/tutorial help),
clean if needed.
Each case has the following attributes:
Field description
FIPS: US only. Federal Information Processing Standards code that uniquely identifies counties within the USA.
Admin2: County name. US only.
Province_State: Province, state or dependency name.
Country_Region: Country, region or sovereignty name. The names of locations included on the Website correspond with the official designations used by the U.S. Department of State.
Last Update: MM/DD/YYYY HH:mm:ss (24 hour format, in UTC).
Lat and Long: Dot locations on the dashboard. All points (except for Australia) shown on the map are based on geographic centroids, and are not representative of a specific address, building or any location at a spatial scale finer than a province/state. Australian dots are located at the centroid of the largest city in each state.
Confirmed: Confirmed cases include presumptive positive cases and probable cases, in accordance with CDC guidelines as of April 14.
Deaths: Death totals in the US include confirmed and probable, in accordance with CDC guidelines as of April 14.
Recovered: Recovered cases outside China are estimates based on local media reports, and state and local reporting when available, and therefore may be substantially lower than the true number. US state-level recovered cases are from COVID Tracking Project.
Active: Active cases = total confirmed - total recovered - total deaths.
Combined Key: Admin2 + Province_State + Country_Region.
Incidence_Rate: Incidence Rate = cases per 100,000 persons.
Case-Fatality Ratio (%): Case-Fatality Ratio (%) = Number recorded deaths Number cases.
Design and implement a database to store this, with a view that this is only a small sample. You need to choose (noSQL) type of database management systems and a system to implement it in. Implement a web-based system linked to the database to allow the querying and display of the data using PHP and MongoDB.
PART I:-
Requirements
DB design report to be handed in end of week 11 as a single pdf document via Canvas. This should include:
your analysis of the data, and data model diagram [25 Marks]
From your point view, which type of database (relational, noSQL) and database management systems you might think better to use with such application, and a database design. [15 Marks] (For formative feedback and evidence of work on assessment)
Apply the following queries on your database design
List the following details (FIPS, Provinces State, Lat, Long, Confirmed, Deaths, Recovered, Active) where the death cases are less than 850 and greater than 800 worldwide.
Find the number of records and the total numbers of the confirmed cases, the death cases, and the recovered cases within the country region of Denmark.
List all the details that feature COVID-19 cases in Greece and Cyprus
List 10 records’ details that feature COVID-19 cases where both Lat and Long have no values.
Find out the top 10 countries’ names with highest Confirmed cases and Death cased.
List all the confirmed cases in China including the Province State, the location (Lat and Long) details.
System Report
To be handed in Friday 11 th of December (tbc) as a final single pdf document via Canvas.
This should include:
Details of the database implementation, noting any differences from the initial design (include the original database design report as an appendix). [5 Marks]
Details of the system implementation, choice of language (PHP), overall architecture of the system (full code listings are not required)
Sample screen shots of the interface and sample query results. [5 Marks]
Discussion of the decisions made, a critical evaluation of the system and how it could be extended if required (assume further data will be added on a regular basis) (50% weighting)
PART II:-
Prototype System
Demo of the system, time slots to be arranged after hand in of System Report.
During the demo session, student will be asked on the following questions:-
Design GUI of the system (One web page written in php and MongoDB) [15 Marks]
MongoDB connectivity with PHP [10 Marks]
Importing the dataset into MongoDB [5 Marks]
MongoDB query statements [10 Marks]
Discuss the results of the queries above [10 Marks]
Using Google map to plot query statements’ results [Extra
Marks]
(50% weighting)
The coursework is the complete assessment for this module; reassessment will be by a resubmission of the report and/or revised implementation.
Field description
FIPS: US only. Federal Information Processing Standards code that uniquely identifies counties within the USA.
Admin2: County name. US only.
Province_State: Province, state or dependency name.
Country_Region: Country, region or sovereignty name. The names of location included on the Website correspond with the official designations used by the U.S. Department of State.
Last Update: MM/DD/YYYY HH:mm:ss (24 hour format, in UTC).
Lat and Long_: Dot locations on the dashboard. All points (except for Australia) shown on the map are based on geographic centroids, and are not representative of a specific address, building or any location at a spatial scale finer than a province/state. Australian dots are located at the centroid of the largest city in each state.
Confirmed: Counts include confirmed and probable (where reported).
Deaths: Counts include confirmed and probable (where reported).
Recovered: Recovered cases are estimates based on local media reports, and state and local reporting when available, and therefore may be substantially lower than the true number. US state-level recovered cases are from COVID Tracking Project.
Active: Active cases = total cases - total recovered - total deaths.
Incidence_Rate: Incidence Rate = cases per 100,000 persons.
Case-Fatality Ratio (%): Case-Fatality Ratio (%) = Number recorded deaths /Number cases.
All cases, deaths, and recoveries reported are based on the date of initial report.
Exceptions to this are noted in the "Data Modification" and "Retrospective reporting of (probable) cases and deaths" subsections below.
Comments