General description of the dataset
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.read_csv('Unicorn_Startups.csv')
df.head()
   Unnamed: 0 (int64)  Company (object)
0                   0  Bytedance
1                   1  SpaceX
2                   2  Stripe
3                   3  Klarna
4                   4  Canva
df.columns
df = df.drop(['Unnamed: 0'], axis=1)
df
   Company (object)  Valuation ($B) (float64)
0  Bytedance         140
1  SpaceX            100.3
2  Stripe            95
3  Klarna            45.6
4  Canva             40
5  Instacart         39
6  Databricks        38
7  Revolut           33
8  Nubank            30
9  Epic Games        28.7
df.shape
df.columns
df.isnull().sum()
# Replace every missing value with the placeholder string 'Nadie' (Spanish for "nobody")
df = df.fillna('Nadie')
df.isnull().sum()
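The call above fills every column with the same string. A narrower variant, sketched here on a toy frame, restricts the fill to the investor columns so that a numeric column could never receive a text placeholder. The investor names (`Fund A` and so on) and the positions of the missing values are hypothetical; the notebook does not show which columns actually contained nulls.

```python
import pandas as pd

# Toy frame: company names and valuations come from the notebook output,
# investor names and missing entries are made up for illustration.
toy = pd.DataFrame({
    "Company": ["Bytedance", "SpaceX", "Stripe"],
    "Valuation ($B)": [140.0, 100.3, 95.0],
    "Investor 1": ["Fund A", "Fund B", "Fund C"],
    "Investor 4": ["Fund D", None, None],
})

# Select only the columns whose name starts with "Investor".
investor_cols = [c for c in toy.columns if c.startswith("Investor")]

# Count missing values per investor column before filling.
missing_before = toy[investor_cols].isnull().sum()

# Fill the placeholder into the investor columns only.
toy[investor_cols] = toy[investor_cols].fillna("Nadie")
```

Scoping the fill this way keeps `Valuation ($B)` as `float64` even if a valuation were missing, which matters for `describe()` and for sorting.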
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 936 entries, 0 to 935
Data columns (total 10 columns):
 #   Column          Non-Null Count  Dtype
---  ------          --------------  -----
 0   Company         936 non-null    object
 1   Valuation ($B)  936 non-null    float64
 2   Date Joined     936 non-null    object
 3   Country         936 non-null    object
 4   City            936 non-null    object
 5   Industry        936 non-null    object
 6   Investor 1      936 non-null    object
 7   Investor 2      936 non-null    object
 8   Investor 3      936 non-null    object
 9   Investor 4      936 non-null    object
dtypes: float64(1), object(9)
memory usage: 73.2+ KB
df.nunique()
df['Industry'].value_counts()
df['Industry'] = df['Industry'].astype('category')
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 936 entries, 0 to 935
Data columns (total 10 columns):
 #   Column          Non-Null Count  Dtype
---  ------          --------------  -----
 0   Company         936 non-null    object
 1   Valuation ($B)  936 non-null    float64
 2   Date Joined     936 non-null    object
 3   Country         936 non-null    object
 4   City            936 non-null    object
 5   Industry        936 non-null    category
 6   Investor 1      936 non-null    object
 7   Investor 2      936 non-null    object
 8   Investor 3      936 non-null    object
 9   Investor 4      936 non-null    object
dtypes: category(1), float64(1), object(8)
memory usage: 67.5+ KB
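The drop in reported memory after converting `Industry` to `category` can be reproduced in isolation. The sketch below uses two hypothetical industry labels repeated many times; the real column's labels and exact savings will differ.

```python
import pandas as pd

# Hypothetical labels, repeated so that the column has few unique values.
labels = pd.Series(["Fintech", "Artificial intelligence"] * 500, dtype=object)

# A categorical stores each distinct label once plus a small integer code per row.
as_category = labels.astype("category")

# deep=True measures the Python string objects too, not just the pointers.
object_bytes = labels.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)
```

The saving grows with the ratio of rows to distinct values, so a column such as `Company`, where almost every value is unique, would gain little from the same conversion.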
df.describe()
       Valuation ($B) (float64)
count  936
mean   3.281153846
std    7.47317879
min    1
25%    1.05
50%    1.6
75%    3
max    140
Analyzing the dataset variables
df.columns
df_val = df.sort_values('Valuation ($B)', ascending=False).head(10)
plt.bar(df_val['Company'], df_val['Valuation ($B)'])
plt.xticks(rotation=90)
plt.show()
df_date = df.sort_values('Date Joined', ascending=True).head()
df_date
     Company (object)            Valuation ($B) (float64)
557  Veepee                      1.38
224  VANCL                       3
99   Vice Media                  5.7
3    Klarna                      45.6
349  Trendy Group International  2
df_date = df.sort_values('Date Joined', ascending=False).head()
df_date
     Company (object)     Valuation ($B) (float64)
935  Pet Circle           1
640  Pristyn Care         1.2
639  AgentSync            1.2
932  Anyscale             1
597  Incode Technologies  1.25
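`Date Joined` is still an `object` column at this point, so both sorts compare text. Text order matches chronological order only when the dates are written year-first; the CSV's date format is not shown here. The sketch below illustrates the risk with a hypothetical month/day/year format and shows how `pd.to_datetime` removes the dependency. The dates in the toy frame are made up for illustration.

```python
import pandas as pd

# Company names come from the notebook; the dates and their format are assumptions.
toy = pd.DataFrame({
    "Company": ["Bytedance", "SpaceX", "Stripe"],
    "Date Joined": ["4/7/2017", "12/1/2012", "1/23/2014"],
})

# Sorting the raw strings compares them character by character.
by_text = toy.sort_values("Date Joined")["Company"].tolist()

# Assumed format: month/day/year. Adjust `format` to whatever the CSV uses.
toy["Date Joined"] = pd.to_datetime(toy["Date Joined"], format="%m/%d/%Y")

# Sorting real datetimes is chronological regardless of how they were written.
by_date = toy.sort_values("Date Joined")["Company"].tolist()
```

Once the column is a datetime, accessors such as `toy["Date Joined"].dt.year` also become available for grouping by year.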
df_co = df.groupby('Country')['Company'].count().sort_values(ascending=False)
df_co
# Reuse the counts computed above instead of repeating the groupby
x = df_co.index
y = df_co.values
plt.bar(x[:10], y[:10])
plt.xticks(rotation=90)
plt.show()
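df_city = df.groupby('City')['Company'].count().sort_values(ascending=False)
df_city.head(10)
# Plot the city counts; x and y above still hold the per-country counts
plt.bar(df_city.index[:20], df_city.values[:20])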
plt.xticks(rotation=90)
plt.show()
Industry
df['Industry'].value_counts()
x = df['Industry'].value_counts().index
y = df['Industry'].value_counts().values
plt.bar(x, y)
plt.xticks(rotation=90)
plt.show()
df['Investor 1'].value_counts().head()
x = df['Investor 1'].value_counts().index
y = df['Investor 1'].value_counts().values
plt.bar(x[:5], y[:5])
plt.xticks(rotation=90)
plt.show()
df['Investor 2'].value_counts().head()
x = df['Investor 2'].value_counts().head().index
y = df['Investor 2'].value_counts().head().values
plt.bar(x, y)
plt.xticks(rotation=90)
plt.show()
df['Investor 3'].value_counts().head()
x = df['Investor 3'].value_counts().head(10).index
y = df['Investor 3'].value_counts().head(10).values
plt.bar(x, y)
plt.xticks(rotation=90)
plt.show()
df['Investor 4'].value_counts()
x = df['Investor 4'].value_counts().head(10).index
y = df['Investor 4'].value_counts().head(10).values
plt.bar(x, y)
plt.xticks(rotation=90)
plt.show()
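The four investor cells above repeat the same count-and-plot pattern, and each count will include the `'Nadie'` placeholder wherever a company had no investor in that slot. One possible consolidation, sketched on a toy frame, pools the investor columns, drops the placeholder, and counts once. The helper `top_investors` and the investor names are hypothetical, not part of the notebook.

```python
import pandas as pd

PLACEHOLDER = "Nadie"  # fill value used earlier in the notebook

# Toy frame: company names from the notebook, investor names made up.
toy = pd.DataFrame({
    "Company": ["Bytedance", "SpaceX", "Stripe"],
    "Investor 1": ["Fund A", "Fund B", "Fund A"],
    "Investor 2": ["Fund C", "Fund A", "Nadie"],
})


def top_investors(frame, columns, n=10, placeholder=PLACEHOLDER):
    """Count investor names across several columns, ignoring the placeholder."""
    # Stack the selected columns into one long Series.
    pooled = pd.concat([frame[c] for c in columns], ignore_index=True)
    # Drop rows that only hold the fill value.
    pooled = pooled[pooled != placeholder]
    # value_counts sorts by frequency, highest first.
    return pooled.value_counts().head(n)


counts = top_investors(toy, ["Investor 1", "Investor 2"], n=2)
```

The result is a Series, so it can be passed to the same plotting calls as before, for example `plt.bar(counts.index, counts.values)`.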