Data Science with Python
This notebook surveys data science applications in Python. It starts with a basic introduction to the language from a data science perspective and works up to data manipulation. It is divided into four parts:
- Python Basics (data types, storing values, taking user input, string manipulation, mathematical operations)
- Python Data Structures (strings, arrays, lists, tuples, dictionaries)
- Python Condition Programming (OOP, conditional statements, exception handling)
- Python Data Manipulation (reading and writing files, data manipulation, statistical analysis, sampling, etc.)
Python Basics
Let's start by taking input from the user
Username given by user is Aditya
Now taking multiple inputs from the user
Two inputs provided by the user are 12 45
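The two prompts above could be handled along the following lines. This is only a sketch: the helper name `parse_two_numbers` is hypothetical, and the sample text "12 45" simply mirrors the output shown. In the notebook the text would come from `input()`.

```python
def parse_two_numbers(text):
    """Split a whitespace-separated line into two integers."""
    first, second = map(int, text.split())
    return first, second


# In an interactive session the text would come from input(), for example:
#   text = input("Enter two numbers separated by a space: ")
text = "12 45"  # sample value mirroring the output above
x, y = parse_two_numbers(text)
print("Two inputs provided by the user are", x, y)
```

Note that `input()` always returns a string, so numeric input has to be converted explicitly with `int()` or `float()`.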
Variables in Python, along with some basic operations on them
Value of integer variable a is 10
Addition result for variables a and b is 40
Subtraction result for variables a and b is -20
Multiplication result for variables a and b is 300
Division result for variables a and b is 0.3333333333333333
Modulus result for variables a and b is 10
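The results above are consistent with a = 10 and b = 30. A minimal sketch that would produce them, assuming those two values:

```python
a = 10
b = 30  # assumed value, inferred from the results above

total = a + b        # addition
difference = a - b   # subtraction
product = a * b      # multiplication
quotient = a / b     # true division always returns a float
remainder = a % b    # modulus: remainder after dividing a by b

print("Value of integer variable a is", a)
print("Addition result for variables a and b is", total)
print("Subtraction result for variables a and b is", difference)
print("Multiplication result for variables a and b is", product)
print("Division result for variables a and b is", quotient)
print("Modulus result for variables a and b is", remainder)
```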
Operators
40
-20
300
0.3333333333333333
0
10
1000000000000000000000000000000
False
True
False
True
False
True
False
True
False
10
30
-11
20
2
40
10
20
10
100
102400
b is divisible by a
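The operator results listed above can be illustrated with a short sketch covering floor division, exponentiation, comparison, and bitwise operators. It again assumes a = 10 and b = 30; the variable names are illustrative only.

```python
a, b = 10, 30  # assumed values, inferred from the results above

# Floor division and exponentiation
floor_div = a // b    # 0, the quotient rounded down
power = a ** b        # 10**30; Python integers have arbitrary precision

# Comparison operators return booleans
is_equal = a == b     # False
is_less = a < b       # True

# Bitwise operators act on the binary form (10 = 0b01010, 30 = 0b11110)
bit_and = a & b       # 10
bit_or = a | b        # 30
bit_not = ~a          # -11
bit_xor = a ^ b       # 20
shift_right = a >> 2  # 2
shift_left = a << 2   # 40

# Divisibility check using the modulus operator
if b % a == 0:
    message = "b is divisible by a"
else:
    message = "b is not divisible by a"
print(message)
```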
Python Data Structures
This part covers the data structures available in Python and how to use and manipulate them.
Printing first element of string1 is M
Printing last element of string1 is e
ecin si muideM
um is
um is ni
ium is nice
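The string outputs above can be reproduced with indexing and slicing. The sketch assumes that `string1` holds "Medium is nice", which is inferred from the reversed output rather than stated in the notebook.

```python
string1 = "Medium is nice"  # assumed value, inferred from the outputs above

first = string1[0]             # 'M'
last = string1[-1]             # 'e', negative indices count from the end
reversed_text = string1[::-1]  # 'ecin si muideM', a step of -1 reverses
middle = string1[4:9]          # 'um is', the end index is exclusive
trimmed = string1[4:-2]        # 'um is ni'
to_end = string1[3:]           # 'ium is nice'

print("Printing first element of string1 is", first)
print("Printing last element of string1 is", last)
print(reversed_text)
print(middle)
print(trimmed)
print(to_end)
```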
Lists in Python are like dynamically typed arrays: they can grow or shrink, and a single list can hold values of different types.
List 1 is ['Medium', 'is', 'fun'] List 2 is [1, 'Medium', 2, 'fun']
Medium
2
3
['Medium', 'is', 'fun', '.']
['Medium', 'is', 'fun', '.', 'best', 'articles', 'are', 'here']
['here', 'are', 'articles', 'best', '.', 'fun', 'is', 'Medium']
['here', 'are', 'articles', '.', 'fun', 'is', 'Medium']
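A sketch of list operations that would yield the outputs above. The exact calls used in the notebook are not shown, so the choice of `append`, `extend`, `reverse`, and `remove` is an assumption.

```python
list1 = ["Medium", "is", "fun"]
list2 = [1, "Medium", 2, "fun"]  # a list may mix data types

first_item = list1[0]  # 'Medium'
third_item = list2[2]  # 2
length = len(list1)    # 3

list1.append(".")                                  # add one element at the end
list1.extend(["best", "articles", "are", "here"])  # add several elements
list1.reverse()                                    # reverse in place
list1.remove("best")                               # delete the first matching value

print(list1)
```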
Tuples in Python are ordered collections, like the lists above: they can hold values of any data type and are indexed by integers. Unlike lists, tuples are immutable.
('Medium', 'the', 'best')
('best', 'the', 'Medium')
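A short sketch of tuple indexing and reversal, using the values from the output above. The assignment attempt at the end is an added illustration of immutability, not something taken from the notebook.

```python
tuple1 = ("Medium", "the", "best")

first_item = tuple1[0]        # integer indexing works as with lists
reversed_tuple = tuple1[::-1]  # slicing returns a new tuple

# Tuples cannot be modified in place; item assignment raises TypeError
try:
    tuple1[0] = "Blog"
    is_immutable = False
except TypeError:
    is_immutable = True

print(tuple1)
print(reversed_tuple)
```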
A set in Python is an unordered, iterable collection of unique elements.
Set 1 elements are {'e', 'i', 'm', 'u', 'd'} and Set 2 elements are {'m', 'a', 'i'}
{'e', 'i', 'best', 'm', 'u', 'd'}
{'r', 't', 'o', 'e', 'l', 's', 'i', 'g', 'f', 'best', 'm', 'u', 'b', 'd'}
r
t
o
e
l
s
i
g
f
best
m
u
b
d
{'r', 't', 'o', 'e', 'l', 's', 'i', 'g', 'f', 'm', 'u', 'b', 'd'}
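A minimal sketch of common set operations. The starting sets match the output above (assuming Set 1 was built from the string "medium"); the remaining calls are illustrative and not necessarily the ones used in the notebook.

```python
set1 = set("medium")    # duplicate letters collapse: the two m's become one
set2 = {"m", "a", "i"}

set1.add("best")        # add a single element
has_best = "best" in set1

combined = set1 | set2  # union of both sets
common = set1 & set2    # intersection: elements present in both

set1.discard("best")    # remove an element; no error if it is absent

# Iteration order over a set is not guaranteed
for element in set1:
    print(element)
```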
A dictionary in Python is a collection of key-value pairs, used like a map: each key is unique and maps to a single value, which may itself be another dictionary.
Dictionary 1 values are {1: 'medium', 2: 'is', 3: 'best'} and Dictionary 2 values are {1: 'single value', 2: {1: 'another single value'}}
{0: 'medium', 1: 'best'}
medium
{}
single value
dict_items([(1, 'single value'), (2, {1: 'another single value'})])
dict_keys([])
{0: 'medium', 1: 'bestest'}
dict_values(['medium', 'bestest'])
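A sketch of basic dictionary operations using the two dictionaries from the output above. The specific lookups and updates are illustrative assumptions.

```python
dict1 = {1: "medium", 2: "is", 3: "best"}
dict2 = {1: "single value", 2: {1: "another single value"}}

value = dict1[1]                      # lookup by key
missing = dict1.get(4, "not found")   # get() returns a default instead of raising KeyError
nested = dict2[2][1]                  # values can themselves be dictionaries

dict1[3] = "bestest"  # overwrite an existing key
dict1[4] = "!"        # insert a new key
del dict1[2]          # delete a key

keys = list(dict1.keys())
values = list(dict1.values())
print(dict1.items())
```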
An array in Python (from the array module) is a collection of values of the same data type stored at contiguous memory locations.
array('i', [1, 2, 3])
array('i', [1, 2, 3, 4, 5])
array('i', [1, 2, 3, 4])
Sliced array b array('i', [5, 6, 7, 8, 9])
8
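A sketch using the standard-library `array` module, which matches the `array('i', ...)` representation in the output above. The specific calls, and the ten-element array `b`, are assumptions chosen to produce similar results.

```python
from array import array

a = array("i", [1, 2, 3])  # type code 'i' means signed int
a.append(4)                # add one value
a.extend([5])              # add values from an iterable
full = a.tolist()          # [1, 2, 3, 4, 5]
a.pop()                    # remove the last value

b = array("i", range(1, 11))
sliced = b[4:9]            # slicing returns a new array
print("Sliced array b", sliced)

# Values of another type are rejected
try:
    a.append(2.5)
    rejected = False
except TypeError:
    rejected = True
```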
Python Condition Programming
This part covers programming concepts in Python such as OOP, conditional statements, loops, and exception handling.
printing while loop
0
1
2
3
4
printing for loop
0
1
2
3
4
continue control statement
Letter currently, t
Letter currently, h
Letter currently, i
Letter currently, s
Letter currently, _
Letter currently, i
Letter currently, s
Letter currently, _
Letter currently, m
Letter currently, y
Letter currently, a
Letter currently, r
Letter currently, t
Letter currently, i
Letter currently, c
Letter currently, l
break control statement
Letter currently, t
Letter currently, h
Letter currently, i
Letter currently, s
Letter currently, _
Letter currently, i
Letter currently, s
Letter currently, _
Letter currently, m
Letter currently, y
Letter currently, a
Letter currently, r
Letter currently, t
Letter currently, i
Letter currently, c
Letter currently, l
pass control statement
Letter currently, e
Loop with else statement
0
1
2
if there is no break statement in the loop, else executes
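The loop outputs above can be sketched as follows. The string "this_is_myarticle" is inferred from the printed letters, and the list names are illustrative.

```python
text = "this_is_myarticle"  # assumed value, inferred from the output above

# while loop: repeat until the condition becomes false
count = 0
while_values = []
while count < 5:
    while_values.append(count)
    count += 1

# continue: skip the rest of the current iteration
kept = []
for letter in text:
    if letter == "e":
        continue
    kept.append(letter)

# break: leave the loop entirely
seen = []
for letter in text:
    if letter == "e":
        break
    seen.append(letter)

# pass: a placeholder that does nothing
for letter in text:
    if letter == "e":
        pass

# for-else: the else block runs only if the loop never hit break
for i in range(3):
    print(i)
else:
    else_ran = True
```

Because "e" is the last character of the string, `continue` and `break` happen to produce the same letters here.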
mammal
Aditya
Dog Breed is German Shepherd
this is dog class method
this is animal class method
None None
Beatle
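A minimal sketch of classes and inheritance of the kind the outputs above suggest. The class names `Animal` and `Dog`, the method names, and the instance values are all hypothetical; the notebook's own class definitions are not shown.

```python
class Animal:
    kind = "mammal"  # class attribute shared by all instances

    def __init__(self, name):
        self.name = name  # instance attribute

    def describe(self):
        return "this is animal class method"


class Dog(Animal):  # Dog inherits from Animal
    def __init__(self, name, breed):
        super().__init__(name)  # reuse the parent initializer
        self.breed = breed

    def speak(self):
        return "this is dog class method"


dog = Dog("Rex", "German Shepherd")  # hypothetical values
print(dog.kind)
print("Dog Breed is", dog.breed)
print(dog.speak())     # method defined on Dog
print(dog.describe())  # method inherited from Animal
```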
Number supplied to function fun is 10
5
Name provided to function pname is None
hello
world
f1 hello
f2 world
[1, 2, 45, 4]
[1, 2, 45, 4]
4
Exception handling in Python is managed using three keywords: try, except, and finally. Exceptions are objects raised when an error or other exceptional event occurs, interrupting the normal flow of the program.
Execution Error
error occurred
ZeroDivisionError occurred
finally executes regardless of whether an exception occurred or not
Exception occurred
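The interplay of try, except, and finally can be sketched with a small helper. The function name `safe_divide` and its log messages are hypothetical and chosen only for illustration.

```python
def safe_divide(numerator, denominator):
    """Divide two numbers, recording which blocks ran."""
    log = []
    try:
        result = numerator / denominator
    except ZeroDivisionError:
        # Runs only when the division raises ZeroDivisionError
        log.append("ZeroDivisionError occurred")
        result = None
    finally:
        # Runs whether or not an exception was raised
        log.append("finally executes")
    return result, log


print(safe_divide(6, 3))
print(safe_divide(1, 0))
```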
Python Data Manipulation
This part covers data manipulation techniques available in Python for data science work, namely data preprocessing, data transformation, and data visualization.
Data preprocessing is the task of converting data from a given form into a more useful and consumable form.
[[0. 1.]
[2. 3.]]
[10 20]
[[10. 21.]
[12. 23.]]
[[-10. -19.]
[ -8. -17.]]
[[ 0. 20.]
[20. 60.]]
[[0. 0.05]
[0.2 0.15]]
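The array outputs above are consistent with element-wise NumPy arithmetic between a 2x2 array and a 1-D array, where the 1-D array is broadcast across each row. A sketch assuming those two inputs:

```python
import numpy as np

a = np.array([[0.0, 1.0], [2.0, 3.0]])
b = np.array([10, 20])  # broadcast across each row of a

added = a + b       # [[10, 21], [12, 23]]
subtracted = a - b  # [[-10, -19], [-8, -17]]
multiplied = a * b  # [[0, 20], [20, 60]]
divided = a / b     # [[0, 0.05], [0.2, 0.15]]

print(added)
print(subtracted)
print(multiplied)
print(divided)
```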
Data transformation is the process of converting data from one format or structure into another.
Name Age Salary
0 Aditya 21 4500000
1 Annie 22 3400000
2 Aman 25 1200000
Name Age
0 Aditya 21
1 Annie 22
2 Aman 25
Name Age Salary Tax
0 Aditya 21 4500000 2025000.0
1 Annie 22 3400000 1530000.0
2 Aman 25 1200000 540000.0
Name Age Salary Tax
0 Aditya 21 4500000 2025000.0
1 Annie 22 3400000 1530000.0
2 Aman 25 1200000 540000.0
Name 0
Age 0
Salary 0
Tax 0
dtype: int64
Name Age Salary Tax
0 False False False False
1 False False False False
2 False False False False
0
Aditya
21
1
Annie
22
2
Aman
25
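The tables above can be reproduced with a small pandas sketch. The 45% tax rate is an assumption inferred from the Tax column (for example, 2025000 / 4500000 = 0.45); the notebook's actual code is not shown.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Aditya", "Annie", "Aman"],
        "Age": [21, 22, 25],
        "Salary": [4500000, 3400000, 1200000],
    }
)

subset = df[["Name", "Age"]]      # select a subset of columns
df["Tax"] = df["Salary"] * 0.45   # derived column; rate assumed from the output

missing_per_column = df.isnull().sum()  # count of missing values per column
missing_mask = df.isnull()              # True where a value is missing

# Iterate over rows as (index, Series) pairs
for index, row in df.iterrows():
    print(index)
    print(row["Name"])
    print(row["Age"])
```

Row-wise iteration with `iterrows()` is convenient for inspection, but column-wise operations such as the Tax calculation are usually faster on larger data.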
Data wrangling is the process of gathering, collecting, and transforming raw data into another format so that it can be understood, accessed, and analysed more quickly and used for decision-making.
It deals with tasks such as data exploration, handling missing values, reshaping data, and filtering data, all of which are discussed above.
Data visualization is the process of presenting data in the form of graphs or charts. It makes large and complex amounts of data easier to understand, and it helps decision-makers act efficiently and identify new trends and patterns.
Seaborn is also used for data visualization here, as it offers more plot types and more attractive visuals.
/tmp/ipykernel_685/1187558920.py:6: UserWarning:
`distplot` is a deprecated function and will be removed in seaborn v0.14.0.
Please adapt your code to use either `displot` (a figure-level function with
similar flexibility) or `histplot` (an axes-level function for histograms).
For a guide to updating your code to use the new functions, please see
https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751
sns.distplot(data, kde = True, color = "g")
Statistics with Python
Summary statistics for the Age column:
Mean: 26.938730853391686
Median: 26.0
Mode: 24.0
Standard deviation: 4.404016424405833
Variance: 19.395360666436332
statistic  (column name not shown)  Age
count      457.0                    457.0
mean       17.678336980306344       26.938730853391686
std        15.966090405679644       4.404016424405833
min        0.0                      19.0
25%        5.0                      24.0
50%        13.0                     26.0
75%        25.0                     30.0
max        99.0                     40.0
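The same kinds of summary statistics can be computed with the standard-library `statistics` module. The sketch below uses a small hypothetical sample of ages, not the notebook's dataset, so its results will differ from the figures above.

```python
import statistics

ages = [25, 25, 27, 22, 29, 26, 24, 26, 26]  # hypothetical sample

mean_age = statistics.mean(ages)
median_age = statistics.median(ages)    # middle value of the sorted data
mode_age = statistics.mode(ages)        # most frequent value
std_age = statistics.stdev(ages)        # sample standard deviation (n - 1)
var_age = statistics.variance(ages)     # sample variance (n - 1)

print(mean_age, median_age, mode_age, std_age, var_age)
```

`statistics.stdev` and `statistics.variance` divide by n - 1, which matches the default behaviour of `std()` and `var()` in pandas.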