Introduction
Welcome to our new blog post! Today, we are excited to share a new project requirement with you, titled “Analyzing Company Performance in EU Competition”.
In this project, our aim is to explore the dynamics of EU startup competition, harnessing the power of data and analytics to dissect company performance, funding patterns, and competition outcomes. We're driven by the belief that through meticulous analysis of these facets, we can unearth valuable insights poised to redefine our understanding of the startup ecosystem.
In the Solution Approach section of this blog post, we will discuss how we tackled this project requirement. We will walk you through our thought process, the methodologies we employed, and the tools we used. Our goal is to provide a comprehensive solution that is both effective and efficient.
Then, we will showcase the output of our analysis, including visualizations, key findings, and interpretations of the data.
Project Requirement :
DOMAIN
Startup ecosystem
CONTEXT:
Company X is an EU online publisher focusing on the startup industry. The company reports on technology business news, analyses emerging trends, and profiles new tech businesses and products. Its event, Startup Battlefield, is the world’s pre-eminent startup competition. Startup Battlefield features 15-30 top early-stage startups pitching to top judges in front of a vast live audience, present in person and online.
DATA DESCRIPTION
CompanyX_EU.csv - Each row in the dataset is a Start-up company and the columns describe the company.
DATA DICTIONARY:
Startup: Name of the company
Product: Actual product
Funding: Funds raised by the company in USD
Event: The event the company participated in
Result: Described by Contestant, Finalist, Audience choice, Winner or Runner up
OperatingState: Current status of the company: Operating, Closed, Acquired or IPO
Dataset has been downloaded from the internet. All the credit for the dataset goes to the original creator of the data.
PROJECT OBJECTIVE:
Analyse the data of the various companies in the given dataset and perform the tasks specified in the steps below. Draw insights from the attributes present in the dataset, plot distributions, state hypotheses and draw conclusions from the dataset.
STEPS AND TASK
Read the CSV file.
Data Exploration:
Check the datatypes of each attribute.
Check for null values in the attributes.
Data preprocessing & visualisation
Drop the null values.
Convert the ‘Funding’ feature to a numerical value.
(Execute below code)
df1.loc[:,'Funds_in_million'] = df1['Funding'].apply(lambda x: float(x[1:-1])/1000 if x[-1] == 'K' else (float(x[1:-1])*1000 if x[-1] == 'B' else float(x[1:-1])))
Plot box plot for funds in million.
Check the number of outliers greater than the upper fence.
Check the frequency of the OperatingState feature classes.
Statistical Analysis:
Is there any significant difference between Funds raised by companies that are still operating vs companies that closed down?
Write the null hypothesis and alternative hypothesis.
Test for significance and conclusion
Make a copy of the original data frame.
Check the frequency distribution of the Result variable.
Calculate percentage of winners that are still operating and percentage of contestants that are still operating
Write your hypothesis comparing the proportion of companies that are operating between winners and contestants:
Test for significance and conclusion
Select only the Event that has ‘disrupt’ keyword from 2013 onwards.
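As a rough illustration of the last task, the sketch below filters events by keyword and year with pandas. It assumes that each Event value contains a four-digit year (for example "Disrupt SF 2013"); the event names in the toy frame are made up for the example and are not taken from CompanyX_EU.csv.

```python
import pandas as pd

# Toy event names, invented for illustration only.
events = pd.DataFrame({
    "Event": [
        "Disrupt SF 2013",
        "TC50 2008",
        "Disrupt NYC 2012",
        "Disrupt London 2015",
        "Hardware Battlefield 2014",
    ]
})

# Case-insensitive match on the keyword 'disrupt'.
is_disrupt = events["Event"].str.contains("disrupt", case=False)

# Assumption: the first four-digit number in the name is the event year.
year = pd.to_numeric(
    events["Event"].str.extract(r"(\d{4})", expand=False), errors="coerce"
)

# Keep Disrupt events held in 2013 or later.
selected = events[is_disrupt & (year >= 2013)]
print(selected)
```

If the real event names encode the year differently, only the regular expression would need to change.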
Solution Approach :
Dataset Used:
We utilized the dataset "CompanyX_EU.csv," which contains information about various companies, including their funding, operating state, result in the competition, and event details.
Dataset Information:
The dataset comprises multiple columns, including "Funding," "OperatingState," "Result," and "Event," among others.
Column Name | Description |
Startup | Name of the company |
Product | Actual product or service offered by the company |
Funding | Funds raised by the company in USD |
Event | The event the company participated in |
Result | Outcome of the company in the competition (e.g., Contestant, Finalist, Winner) |
OperatingState | Current status of the company (e.g., Operating, Closed, Acquired, IPO) |
We conducted exploratory data analysis (EDA) to understand the structure of the dataset, check for missing values, and gain insights into the distribution of variables.
Data Preprocessing:
Initially, we loaded the dataset and inspected its shape to understand the number of rows and columns.
Basic information about the dataset, such as data types of columns, was obtained using the .info() function.
Rows containing null values were identified and dropped from the dataset to ensure data integrity and consistency.
We converted the funding amounts into a standardized format for analysis, considering values in millions for uniformity.
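The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the exact notebook code: the inline CSV rows are invented so the snippet runs on its own, and `funding_to_millions` is a hypothetical helper name. It assumes Funding strings look like "$630K", "$19.3M" or "$1B", which is the format the conversion one-liner in the project requirement implies.

```python
import io

import pandas as pd

# Invented sample rows so the sketch is self-contained.
# In the project this would be: pd.read_csv("CompanyX_EU.csv")
SAMPLE_CSV = """Startup,Product,Funding,Event,Result,OperatingState
AlphaCo,alpha.example,$630K,Disrupt SF 2013,Contestant,Operating
BetaCo,beta.example,$19.3M,Disrupt NY 2014,Winner,Acquired
GammaCo,gamma.example,$1B,TC50 2008,Finalist,Operating
DeltaCo,delta.example,,Disrupt London 2015,Contestant,Closed
"""


def funding_to_millions(value: str) -> float:
    """Convert '$630K', '$19.3M' or '$1B' to a float in millions of USD."""
    text = value.strip()
    amount, suffix = float(text[1:-1]), text[-1].upper()
    if suffix == "K":
        return amount / 1000
    if suffix == "M":
        return amount
    if suffix == "B":
        return amount * 1000
    raise ValueError(f"Unrecognised funding value: {value!r}")


df = pd.read_csv(io.StringIO(SAMPLE_CSV))
print(df.shape)           # number of rows and columns
df.info()                 # data types and non-null counts per column
print(df.isnull().sum())  # null values per column

# Drop rows with any null value, then add the numeric funding column.
df1 = df.dropna().copy()
df1["Funds_in_million"] = df1["Funding"].apply(funding_to_millions)
print(df1[["Startup", "Funding", "Funds_in_million"]])
```

Unlike the one-liner in the requirement, which treats every value not ending in K or B as millions, this helper raises an error on an unexpected suffix, so malformed values surface early.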
Exploratory Data Analysis (EDA):
EDA involved visualizing the distribution of funds raised by companies, identifying outliers using box plots, and treating them appropriately.
The frequency of operating states was analyzed to understand the distribution of companies in different states (e.g., operating, closed).
We conducted group-wise descriptive statistics to compare fund-raising amounts across different operating states, visualizing the results using bar plots.
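A minimal sketch of these EDA steps is shown below. The numbers are toy values chosen for illustration, the upper fence uses the common Q3 + 1.5 × IQR rule (an assumption, since the post does not state which rule was applied), and `upper_fence` and `plot_funding_box` are hypothetical helper names.

```python
import pandas as pd

# Toy values, invented for illustration only.
df = pd.DataFrame({
    "Funds_in_million": [0.5, 1.0, 2.0, 3.0, 4.0, 100.0],
    "OperatingState": [
        "Operating", "Operating", "Closed", "Operating", "Acquired", "Closed",
    ],
})


def upper_fence(values: pd.Series) -> float:
    """Upper box-plot fence: Q3 + 1.5 * IQR."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    return q3 + 1.5 * (q3 - q1)


def plot_funding_box(values: pd.Series):
    """Draw a box plot of the funding values and return the Axes."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    fig, ax = plt.subplots()
    ax.boxplot(values.to_numpy())
    ax.set_ylabel("Funds raised (million USD)")
    return ax


fence = upper_fence(df["Funds_in_million"])
n_outliers = int((df["Funds_in_million"] > fence).sum())
print(f"Upper fence: {fence}, outliers above it: {n_outliers}")

# Frequency of each operating state.
state_counts = df["OperatingState"].value_counts()
print(state_counts)

# Group-wise descriptive statistics of funding per operating state.
stats_by_state = df.groupby("OperatingState")["Funds_in_million"].describe()
print(stats_by_state)
```

Calling `plot_funding_box(df["Funds_in_million"])` followed by `plt.show()` would display the box plot; a bar plot of the group means could be drawn from `stats_by_state["mean"]` in the same way.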
Hypothesis Testing:
To assess whether fund-raising amounts differed significantly between companies that are still operating and companies that have closed down, we performed a two-sample t-test.
The null hypothesis was that the mean funds raised are the same for both groups; the alternative hypothesis was that they differ.
Similarly, we conducted a chi-squared test to compare the proportion of companies operating between winners and contestants in the competition.
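Both tests can be sketched with scipy as below. The data frame is a toy example with invented values, the 0.05 significance level is an assumption, and the t-test uses Welch's variant (`equal_var=False`), which is one reasonable choice rather than necessarily the one used in the original analysis.

```python
import pandas as pd
from scipy import stats

ALPHA = 0.05  # assumed significance level

# Toy data, invented for illustration only.
df = pd.DataFrame({
    "Result": ["Winner"] * 4 + ["Contestant"] * 8,
    "OperatingState": (
        ["Operating", "Operating", "Operating", "Closed"]
        + ["Operating"] * 4
        + ["Closed"] * 4
    ),
    "Funds_in_million": [5.0, 8.0, 12.0, 1.0, 2.0, 3.5, 4.0, 6.0, 0.5, 1.5, 2.5, 3.0],
})

# Test 1: funds raised, operating vs closed.
# H0: mean funds are equal in both groups; H1: the means differ.
operating = df.loc[df["OperatingState"] == "Operating", "Funds_in_million"]
closed = df.loc[df["OperatingState"] == "Closed", "Funds_in_million"]
t_stat, t_p = stats.ttest_ind(operating, closed, equal_var=False)
print(f"t = {t_stat:.3f}, p = {t_p:.3f}, reject H0: {t_p < ALPHA}")

# Test 2: proportion still operating, winners vs contestants.
# H0: the proportion is the same for both groups; H1: it differs.
is_operating = df["OperatingState"] == "Operating"
pct_operating = is_operating.groupby(df["Result"]).mean() * 100
print(pct_operating)

table = pd.crosstab(df["Result"], is_operating)
chi2, chi_p, dof, expected = stats.chi2_contingency(table)
print(f"chi2 = {chi2:.3f}, p = {chi_p:.3f}, reject H0: {chi_p < ALPHA}")
```

On the real dataset, the frame would first be restricted to rows whose Result is Winner or Contestant before building the contingency table.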
Insights and Conclusions:
Based on our analysis, we derived insights into the performance of companies in the competition, including their funding levels and operational status.
We concluded by discussing the implications of our findings and potential strategies for improving company performance or competition outcomes.
Output :
Dataset
Boxplot :
Frequency of the OperatingState
Frequency of the result variable
This provides insight into the distribution of operating states within each result category.
In our project on analyzing company performance in EU competition, we've gleaned invaluable insights into the intricacies of startup dynamics and competition outcomes. Through rigorous data exploration, meticulous preprocessing, and insightful hypothesis testing, we've unraveled a wealth of information about company funding, operational states, and success rates.
We've identified notable trends such as the distribution of funding among companies, the prevalence of operating states, and the disparity in success rates between competition winners and contestants. Additionally, our analysis has illuminated the impact of funding on company longevity and the potential implications for future competition strategies.
If you require any assistance with the project discussed in this blog, or if you find yourself in need of similar support for other projects, please don't hesitate to reach out to us. Our team can be contacted at any time via email at contact@codersarts.com.