Description :
This dataset records a room's environmental conditions (temperature, humidity, light, CO2 and humidity ratio) together with its occupancy status. We can use it to predict occupancy in an office room. Three datasets are available: one for training and two for testing the models, covering the office door being open and closed during occupancy. The target variable, Occupancy, takes the values 0 and 1.
Recommended Model :
Suitable algorithms include random forest, SVMs, the GaussianNB classifier, the decision tree classifier, logistic regression, etc.
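As a rough illustration of how the algorithms listed above could be compared, the sketch below fits each one with scikit-learn and reports its test accuracy. The helper names compare_models and make_synthetic are hypothetical, the hyperparameters are untuned defaults, and the data is a randomly generated stand-in (not the real measurements) so that the snippet runs without the dataset files. Only the column names follow the dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

FEATURES = ["Temperature", "Humidity", "Light", "CO2", "HumidityRatio"]


def compare_models(train, test):
    """Fit each candidate classifier on train and return its accuracy on test."""
    models = {
        "Random forest": RandomForestClassifier(n_estimators=100, random_state=0),
        # SVM and logistic regression are sensitive to feature scale, so scale first.
        "SVM": make_pipeline(StandardScaler(), SVC()),
        "GaussianNB": GaussianNB(),
        "Decision tree": DecisionTreeClassifier(random_state=0),
        "Logistic regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
    }
    scores = {}
    for name, model in models.items():
        model.fit(train[FEATURES], train["Occupancy"])
        predictions = model.predict(test[FEATURES])
        scores[name] = accuracy_score(test["Occupancy"], predictions)
    return scores


def make_synthetic(n, seed):
    """Hypothetical stand-in data: random values, NOT the real measurements."""
    rng = np.random.default_rng(seed)
    occupied = rng.integers(0, 2, size=n)
    return pd.DataFrame({
        "Temperature": 20 + 2 * occupied + rng.normal(0, 0.5, n),
        "Humidity": 25 + rng.normal(0, 2, n),
        "Light": 450 * occupied + rng.normal(0, 30, n),
        "CO2": 450 + 400 * occupied + rng.normal(0, 50, n),
        "HumidityRatio": 0.004 + rng.normal(0, 0.0003, n),
        "Occupancy": occupied,
    })


scores = compare_models(make_synthetic(400, seed=0), make_synthetic(200, seed=1))
for name, accuracy in scores.items():
    print(f"{name}: {accuracy:.3f}")
```

With the real files, the training set and the two test sets would be passed in place of the synthetic frames. Accuracy is used here only for brevity; other metrics may be more informative if the classes are imbalanced.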
Recommended Projects :
Predicting room occupancy from environmental factors.
Dataset link
UCI Machine Learning Repository : https://archive.ics.uci.edu/ml/datasets/Occupancy+Detection+
Overview of data
Detailed overview of dataset
Records in the dataset = 8143 rows
Columns in the dataset = 7 columns
Each record holds a timestamp, five environmental measurements and the occupancy label, taken once per minute over multiple days:
date : Timestamp of the measurement (year-month-day hour:minute:second)
Temperature : Room temperature, in degrees Celsius
Humidity : Relative humidity, in percent
Light : Light level, in lux
CO2 : Carbon dioxide concentration, in parts per million (ppm)
HumidityRatio : Humidity ratio, a quantity derived from temperature and relative humidity, in kg of water vapour per kg of air
Target variable :
Occupancy : 0 for not occupied, 1 for occupied
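To make the HumidityRatio column more concrete, here is a minimal sketch of how a humidity ratio can be derived from temperature and relative humidity. It is not necessarily the formula used to build the dataset: it assumes standard sea-level pressure (101325 Pa), a Magnus-type approximation for the saturation vapour pressure, and expresses the result per kg of dry air. The function names are hypothetical.

```python
import math


def saturation_vapor_pressure_pa(temp_c: float) -> float:
    """Magnus-type approximation of saturation vapour pressure over water, in Pa."""
    return 610.94 * math.exp(17.625 * temp_c / (temp_c + 243.04))


def humidity_ratio(temp_c: float, rel_humidity_pct: float,
                   pressure_pa: float = 101_325.0) -> float:
    """Humidity ratio in kg of water vapour per kg of dry air.

    pressure_pa defaults to standard sea-level pressure, which is an assumption.
    """
    # Partial pressure of water vapour = relative humidity * saturation pressure
    p_w = (rel_humidity_pct / 100.0) * saturation_vapor_pressure_pa(temp_c)
    # 0.622 is the ratio of the molar masses of water vapour and dry air
    return 0.622 * p_w / (pressure_pa - p_w)


w = humidity_ratio(23.0, 27.0)
print(f"Humidity ratio at 23 C and 27 % RH: {w:.5f} kg/kg")
```

Because HumidityRatio is computed from Temperature and Humidity, it carries no independent measurement, which is worth keeping in mind when reading the correlation table.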
EDA [Code]
Dataset
import pandas as pd
#Load data
file_loc = "data\\datatraining.txt"
occupancy_data = pd.read_csv(file_loc)
occupancy_data.head()
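One detail worth checking after loading is the type of the date column, which read_csv leaves as text by default. The sketch below shows the parse_dates option on an in-memory sample. The two rows are made up for illustration, and the layout (a header with seven names, while each data row carries an extra leading row number) is an assumption about the file, so verify it against your copy.

```python
import io

import pandas as pd

# Made-up rows that only mimic the assumed file layout: seven header names,
# eight fields per data row (the first one is a row number).
sample = io.StringIO(
    '"date","Temperature","Humidity","Light","CO2","HumidityRatio","Occupancy"\n'
    '"1","2015-02-04 17:51:00",23.2,27.3,426,721,0.0048,1\n'
    '"2","2015-02-04 17:52:00",23.1,27.3,430,714,0.0048,1\n'
)

# With one more field per row than header names, pandas uses the first field
# as the index. parse_dates converts the "date" column to datetime64.
df = pd.read_csv(sample, parse_dates=["date"])
print(df.dtypes)
print(df.shape)
```

With the real file, the same option would be passed as pd.read_csv(file_loc, parse_dates=["date"]).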
Total number of rows and columns in the dataset
r_c = occupancy_data.shape
print("Total number of records in the dataset : ", r_c[0])
print("Total number of columns in the dataset : ", r_c[1])
Check Details
# Data information
occupancy_data.info()
Check the number of missing values in the dataset
# check missing values in each column
occupancy_data.isna().sum()
Statistical information
# statistical information
occupancy_data.describe()
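Alongside describe(), it can help to look at how the target classes are distributed and how a feature differs between them. The snippet below is a small sketch on a made-up four-row frame that stands in for occupancy_data; the values are illustrative only.

```python
import pandas as pd

# Tiny made-up frame standing in for occupancy_data
demo = pd.DataFrame({
    "Light": [0.0, 5.0, 10.0, 450.0],
    "Occupancy": [0, 0, 0, 1],
})

# Share of each class in the target variable
class_share = demo["Occupancy"].value_counts(normalize=True)
# Mean light level for unoccupied (0) and occupied (1) rows
light_by_class = demo.groupby("Occupancy")["Light"].mean()

print(class_share)
print(light_by_class)
```

On the real data, the same two lines can be run on occupancy_data directly.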
Data Visualization
Correlation
import seaborn as sns
import matplotlib.pyplot as plt
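# correlation between the numeric columns (the text 'date' column is excluded,
# because recent pandas versions raise an error on non-numeric columns)
corr = occupancy_data.select_dtypes('number').corr()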
corr.style.background_gradient(cmap='coolwarm')
Box plot
num_cols = occupancy_data.select_dtypes(exclude='object').columns
for col in num_cols:
    plt.boxplot(occupancy_data[col])
    plt.xlabel(col)
    plt.show()
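The points a box plot draws beyond its whiskers can also be counted directly. The sketch below uses the 1.5 x IQR rule, which matches matplotlib's default whisker setting. The function name iqr_outlier_count and the demo values are made up for illustration.

```python
import pandas as pd


def iqr_outlier_count(series: pd.Series, k: float = 1.5) -> int:
    """Count values lying more than k * IQR outside the quartiles."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    outside = (series < q1 - k * iqr) | (series > q3 + k * iqr)
    return int(outside.sum())


# Made-up CO2-like readings with one obvious spike
demo = pd.Series([400, 410, 420, 430, 440, 450, 2000])
print(iqr_outlier_count(demo))
```

On the real data this could be applied per column, for example with occupancy_data[num_cols].apply(iqr_outlier_count). A flagged value is not necessarily an error: readings such as light or CO2 may legitimately jump when the room becomes occupied.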
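Time series plots
# Parse the timestamps so the x-axis is a real time axis, then plot each column
occupancy_data['date'] = pd.to_datetime(occupancy_data['date'])
occupancy_data.set_index('date', inplace=True)
for col in num_cols:
    occupancy_data[[col]].plot(figsize=(15, 5))
    plt.xticks(rotation=45, size=10)
    plt.yticks(size=10)
    plt.show()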
Figures: time series plots of Temperature, Humidity, Light, CO2 and Humidity Ratio
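Per-minute series spanning several days can look noisy when plotted. One optional way to make the trend easier to read is to average the readings over fixed time windows before plotting. The sketch below resamples a synthetic per-minute CO2 series (made-up values, not the real data) to hourly means. It assumes the frame has a datetime index.

```python
import numpy as np
import pandas as pd

# Synthetic per-minute stand-in: 180 readings rising steadily from 500 to 679
idx = pd.date_range("2015-02-04 18:00", periods=180, freq="min")
demo = pd.DataFrame({"CO2": np.linspace(500, 679, 180)}, index=idx)

# Average the readings inside each 60-minute window
hourly = demo.resample("60min").mean()
print(hourly)
```

The window length is a judgment call: longer windows give smoother curves but hide short events such as someone briefly entering the room.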