The following database project will create an educational attainment "demand" forecast for the state of California for years greater than 2010. The demand forecast is the expected number of population who have obtained a certain level of education. The population is divided into age groups and education attainment is divided into different levels. The population of each group is estimated for each year up to year 2050. Implement the following steps to obtain and educational demand forecast for the state of California. The files can be downloaded below.
1. Create a ca_pop schema in your MySQL database.
2. Using your ca_pop schema, create an "educational_attainment" table which columns match the columns in the Excel spreadsheet "ca_pop_educational_attainment.csv".
3. Using your ca_pop schema, create a "pop_proj" table which columns match the columns in the Excel spreadsheet "pop_proj_1970_2050.csv".
4. Using the data loading technique for a csv file you learned in Module 1, load the data in "ca_pop_educational_attainment.csv" into the table "educational_attainment".
5. Using the data loading technique for a csv file you learned in Module 1, load the data in "pop_proj_1970_2050.csv" into the table "pop_proj".
6. Write a query to select the total population in each age group.
7. Use the query from Step 6 as a subquery to find each type of education attained by the population in that age group and the fraction of the population of that age group that has that educational attainment. Label the fraction column output as "coefficient".
For instance, the fraction of the population in age group 00 - 17 who has an education attainment of Bachelor's degree or higher is 0.0015, which is the "coefficient".
8. Create a demographics table from the SQL query from Step 7.
9. Create a query on the "pop_proj" table which shows the population count by date_year and age.
10. Use that query from Step 9 as a subquery and join it to the demographics table using the following case statement:
demographics.age = case when temp_pop.age 18 then '00 to 17' when temp_pop.age 64 then '65 to 80+' else '18 to 64' end
"temp_pop" is an alias for the subquery. Use the following calculation for the demand
output:
round(sum(temp_pop.total_pop * demographics.coefficient)) as demand
Output the demand grouped by year and education level.
Write each query you used in Steps 1 - 8 in a text file. If a query produced a result set, then list the first ten rows of each row set after the query. Bundle your lessons learned report, your queries and your query results text file, and your MySQL query explanations from both before and after adding table indexes into a single zip file and submit that zip file as your final portfolio.
Comments