Python Data Types
In Python, we have many data types. The most common ones are float (floating point), int (integer), str (string), bool (Boolean), list, and dict (dictionary).
- float - used for real numbers.
- int - used for integers.
- str - used for text. We can define strings using single quotes 'value', double quotes "value", or triple quotes """value""". Triple-quoted strings can span multiple lines, and the newlines are included in the value of the variable. They’re also used for writing function documentation.
- bool - used for truth values (True and False). Useful for filtering data.
- list - used to store a collection of values.
- dict - used to store key-value pairs.
We can use the type() function to check the type of a specific variable. Operators in Python behave differently depending on the variable’s type, and each type has its own built-in methods.
Here we can look at some examples of creating floats, integers, strings and booleans in Python.
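A minimal sketch of creating these basic types and checking them with type(). The variable names and values are made up for illustration:

```python
# Hypothetical example values, one per basic type
height = 1.79        # float
age = 28             # int
name = "Maria"       # str
is_student = True    # bool

for value in (height, age, name, is_student):
    print(type(value).__name__, value)

# The same operator behaves differently depending on the type
print(2 + 3)         # integer addition
print("2" + "3")     # string concatenation
```

The plus operator adds numbers but concatenates strings, which is what "operators behave differently depending on the type" means in practice.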
A Python list is a basic sequence type. We can use it to store a collection of values. A list can contain values of any type, including other nested lists. It’s not common, but a single list can also mix values of different Python types. You can create a new list using square brackets like this:
fruits = ["pineapple", "apple", "lemon", "strawberry", "orange", "kiwi"]
You can use indexes to get one or more elements from the list. In Python, indexes start from 0, so the first element in the list has index 0. We can also use negative indexes to access elements: the last element in the list has index -1, the one before it has index -2, and so on. Python also has something called slicing, which can be used to get multiple elements from a list. We can use it like this:
my_list[start_index:end_index:step]
- start_index is the beginning index of the slice. The element at this index is included in the result. The default value is 0.
- end_index is the end index of the slice. The element at this index is not included in the result. The default value is the length of the list, or -(length of the list) - 1 if the step is negative. If you skip it, you get all the elements from the start index to the end.
- step is the amount by which the index increases. The default value is 1. If we set a negative value for the step, we move backward.
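The indexing and slicing rules above can be tried out on the fruits list. This is only a small sketch; the slice boundaries are arbitrary choices for illustration:

```python
fruits = ["pineapple", "apple", "lemon", "strawberry", "orange", "kiwi"]

first = fruits[0]             # "pineapple"
last = fruits[-1]             # "kiwi"
second_to_last = fruits[-2]   # "orange"

middle = fruits[1:4]          # index 1 is included, index 4 is not
first_three = fruits[:3]      # start_index defaults to 0
from_third = fruits[3:]       # end_index defaults to the list length
every_second = fruits[::2]    # step of 2
backwards = fruits[::-1]      # negative step walks the list backward

print(middle)
print(every_second)
print(backwards)
```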
We can add elements to a list using the append method (one element at a time) or the plus operator. If you use the plus operator on two lists, Python returns a new list with the contents of both.
We can change one or more elements of a list using the same square brackets that we already used for indexing and slicing.
We can delete an element from a list with the remove(value) method. This method deletes the first element of the list that has the passed value.
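Here is a short sketch of adding, removing and changing list elements. The list contents are hypothetical:

```python
fruits = ["apple", "lemon"]

# append adds a single element to the end of the existing list
fruits.append("kiwi")

# the plus operator builds a new list and leaves fruits untouched
more_fruits = fruits + ["orange", "apple"]

# remove deletes only the first element equal to the given value
more_fruits.remove("apple")

# square brackets change a single element or a whole slice
more_fruits[0] = "lime"
more_fruits[1:3] = ["mango", "pear"]

print(fruits)
print(more_fruits)
```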
It’s important to understand how lists work behind the scenes in Python. When you create a new list my_list, the list is stored in your computer’s memory, and the address of that list is stored in the my_list variable. The variable my_list doesn’t contain the elements of the list; it contains a reference to the list. If we copy a list using only the equal sign, like this: my_list_copy = my_list, the reference is copied into the my_list_copy variable instead of the list values. So, if you want to copy the actual values, you can use the list(my_list) function or slicing with [:].
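The difference between copying a reference and copying the values can be seen in a small sketch like this one. The names real_copy and slice_copy are made up for illustration:

```python
my_list = [1, 2, 3]

# Copies only the reference: both names point to the same list
my_list_copy = my_list
my_list_copy[0] = 99
print(my_list)                 # the original changed as well

# Both of these create a new list with the same values
real_copy = list(my_list)
slice_copy = my_list[:]
real_copy[0] = 0
print(my_list)                 # the original is not affected this time

print(my_list_copy is my_list) # same object
print(real_copy is my_list)    # different object
```

Note that both list() and [:] make a shallow copy, so nested lists inside the list are still shared.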
Dictionaries are used to store key-value pairs. They are helpful when you want your values to be indexed by unique keys. In Python, you can create a dictionary using curly braces, with each key separated from its value by a colon. If we want to get the value for a given key, we can do it like this: our_dict[key]
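A minimal sketch of creating a dictionary and reading a value by key. The keys and values are hypothetical:

```python
# Curly braces create the dictionary; a colon separates each key from its value
our_dict = {"apple": 3, "kiwi": 5, "lemon": 2}

kiwi_count = our_dict["kiwi"]
print(kiwi_count)
print(type(our_dict).__name__)
```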
Dictionaries vs Lists
Let’s look at an example that compares lists with dictionaries. Imagine that we have some movies and we want to store their ratings. We also want to look up the rating for a movie very quickly by its name. We can do this with two lists or with one dictionary. In the list-based version, a call to the list’s index method returns the index of the “Ex Machina” movie.
In this case, the usage of a dictionary is a more intuitive and convenient way to represent the ratings.
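The comparison could look roughly like the sketch below. The movie titles other than “Ex Machina” and all rating values are placeholders invented for illustration, not real ratings:

```python
# Option 1: two parallel lists (placeholder data)
movies = ["Ex Machina", "Movie B", "Movie C"]
ratings = [7.7, 8.1, 6.8]

# Find the position of the movie, then use it to look up the rating
movie_index = movies.index("Ex Machina")
rating_from_lists = ratings[movie_index]

# Option 2: one dictionary that maps the movie name to its rating
ratings_by_movie = {"Ex Machina": 7.7, "Movie B": 8.1, "Movie C": 6.8}
rating_from_dict = ratings_by_movie["Ex Machina"]

print(rating_from_lists, rating_from_dict)
```

With two lists we have to keep the positions in sync ourselves, and index has to scan the list. The dictionary lookup is a single step.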
We can add, update, and delete data from our dictionaries. When we want to add or update data, we can simply write our_dict[key] = value. When we want to delete a key-value pair, we can write del our_dict[key]. We can also check whether a given key is in our dictionary with key in our_dict.
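These dictionary operations can be sketched as follows, again with made-up keys and values:

```python
our_dict = {"apple": 3, "kiwi": 5}

our_dict["lemon"] = 2      # add a new key-value pair
our_dict["apple"] = 10     # update the value of an existing key
del our_dict["kiwi"]       # delete a key-value pair

print(our_dict)
print("kiwi" in our_dict)  # membership checks look at the keys
print("lemon" in our_dict)
```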
A function is a piece of reusable code that solves a specific task. We can write our own functions using the def keyword.
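As a small sketch, a function defined with def might look like this. The function name and its parameters are hypothetical:

```python
def rectangle_area(width, height=1.0):
    """Return the area of a rectangle; height is optional and defaults to 1.0."""
    return width * height


print(rectangle_area(3, 4))
print(rectangle_area(3))       # uses the default height
```

The triple-quoted string right after the def line is the function documentation mentioned earlier.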
However, there are many built-in functions in Python, such as max(iterable [, key]), min(iterable [, key]), round(number [, ndigits]), etc. So, in many cases, when we need a function for a given task, we can look for a built-in function or a Python package that already solves it. We don’t have to “reinvent the wheel”.
Most functions take some input and return some output. These functions have arguments, and Python matches the inputs passed in a function call to those arguments. In the signatures above, an argument surrounded by square brackets is optional.
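A few calls to these built-in functions, with and without their optional arguments. The input values are arbitrary:

```python
numbers = [3, 7, 5]
words = ["kiwi", "pineapple", "lemon"]

largest = max(numbers)
smallest = min(numbers)

# The optional key argument changes what is compared (here: word length)
longest_word = max(words, key=len)
shortest_word = min(words, key=len)

# The optional ndigits argument sets the number of decimal places
rounded_two = round(3.14159, 2)
rounded_int = round(3.7)

print(largest, smallest, longest_word, shortest_word, rounded_two, rounded_int)
```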
We can use the help() function or, in Jupyter Notebook, a question mark before the function name to see the documentation of any function. If we’re using Jupyter Notebook, the help() function will show us the documentation in the current cell, while the second option will show us the documentation in the pager.
We’ve seen that we have strings, floats, integers, booleans, etc. in Python. Each one of these data structures is an object. A method is a function that is available for a given object depending on the object’s type. So, each object has a specific type and a set of methods depending on this type.
Objects of different types can have methods with the same name. Depending on the object’s type, such methods behave differently.
Watch out! Some methods change the object they are called on. For example, the append method called on a list modifies that list in place.
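To illustrate, here is a sketch with the index method, which exists on both strings and lists, followed by a method that changes its object. The values are made up:

```python
language = "python"
letters = ["a", "b", "c"]

# Same method name, different types
position_in_string = language.index("t")   # position of a character
position_in_list = letters.index("c")      # position of an element

# upper returns a new string; the original string stays the same
shouted = language.upper()

# append changes the list in place and returns None
result = letters.append("d")

print(position_in_string, position_in_list, shouted, letters, result)
```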
A module is a file containing Python definitions and statements. Modules define functions, methods and new Python types that solve particular problems.
A package is a collection of modules in directories. There are many available packages for Python covering different problems. For example, “NumPy”, “matplotlib”, “seaborn”, and “scikit-learn” are very famous data science packages.
- “NumPy” is used for efficiently working with arrays
- “matplotlib” and “seaborn” are popular libraries used for data visualization
- “scikit-learn” is a powerful library for machine learning
There are some packages available in Python by default, but there are also many packages that we need and don’t have by default. To use such a package, we need to have it installed already or install it using pip (the package management system for Python).
However, there is also something called “Anaconda”.
Anaconda Distribution is a free, easy-to-install package manager, environment manager and Python distribution with a collection of 1,000+ open source packages with free community support.
So, if you don’t want to install many packages one by one, I recommend using Anaconda. This distribution includes many useful packages.
Once you have installed the needed packages, you can import them into your Python files. We can import an entire package, submodules or specific functions from it. Also, we can add an alias for a package. We can see the different forms of the import statement in the examples below.
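The different import forms might look like the sketch below. It assumes NumPy is already installed, and the particular submodule and function picked here are only examples:

```python
import numpy                  # import the entire package
import numpy as np            # import the package under an alias
from numpy import linalg      # import a submodule
from numpy import array       # import one specific function

values = array([3, 4])

# All of these names refer to the same package
print(numpy.sum(values))
print(np.sum(values))
print(linalg.norm(values))
```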
We can also do something like this: from numpy import *. The asterisk here means to import everything from that module. This import statement creates references in the current namespace to all public objects defined by the numpy module. In other words, we can use all available functions from numpy by name alone, without a prefix. For example, we can now call NumPy’s absolute function as absolute().
However, I don’t recommend doing that because:
- If we import all functions from some modules like that, the current namespace fills up with many names, and someone reading our code can get confused about which package a specific function comes from.
- If two modules have a function with the same name, the second import will override the function of the first.
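The second problem can be shown with a small sketch. It uses explicit imports of sqrt from the standard library modules math and cmath rather than asterisk imports, but the name clash works the same way:

```python
from math import sqrt
first_result = sqrt(4)     # math.sqrt returns a float

from cmath import sqrt     # silently replaces the previous sqrt
second_result = sqrt(4)    # cmath.sqrt returns a complex number

print(first_result)
print(second_result)
```

With asterisk imports the same override happens for every shared name at once, and nothing in the code shows which names were affected.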
NumPy is a fundamental package for scientific computing with Python. It’s very fast and easy to use. This package helps us to make calculations element-wise (element by element).
The regular Python list doesn’t know how to do operations element-wise. Of course, we can use Python lists, but they’re slow, and we need more code to achieve the desired result. A better decision in most cases is to use NumPy arrays.
Unlike the regular Python list, the NumPy array always has one single type. If we pass values of different types to the np.array() function, we can choose the desired type using the dtype parameter. If this parameter is not given, the type is determined as the minimum type required to hold the objects.
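A short sketch of how the array type is chosen, assuming NumPy is installed and imported as np. The input values are arbitrary:

```python
import numpy as np

# Integers and floats together: NumPy picks a float type for the whole array
mixed = np.array([1, 2.5, 3])
print(mixed, mixed.dtype)

# The dtype parameter lets us choose the type ourselves
as_float = np.array([1, 2, 3], dtype=float)
print(as_float, as_float.dtype)

# Without dtype, integers stay integers
as_int = np.array([1, 2, 3])
print(as_int.dtype.kind)   # "i" stands for a signed integer type
```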
The NumPy array comes with its own attributes and methods. Remember that operators in Python behave differently on different data types? Well, in NumPy the operators behave element-wise.
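The contrast between lists and arrays can be sketched like this, assuming NumPy is imported as np and using made-up values:

```python
import numpy as np

list_a = [1, 2, 3]
list_b = [10, 20, 30]
a = np.array(list_a)
b = np.array(list_b)

joined = list_a + list_b   # lists: the plus operator concatenates
summed = a + b             # arrays: the plus operator adds element by element
doubled = a * 2            # every element is multiplied by 2
mask = a > 1               # comparisons are element-wise too
filtered = a[mask]         # a boolean array can be used to filter values

print(joined)
print(summed, doubled, mask, filtered)
```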
If we check the type of a NumPy array, the result will be numpy.ndarray. Ndarray means n-dimensional array. In the examples above we used 1-dimensional arrays, but nothing stops us from making arrays with 2, 3, 4 or more dimensions. We can do subsetting on an array regardless of how many dimensions it has. I’ll show you some examples with a 2-dimensional array.
If we want to see how many dimensions our array has and how many elements each dimension has, we can use the shape attribute. For 2-dimensional arrays, the first element of the tuple is the number of rows and the second is the number of columns.
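A minimal sketch of a 2-dimensional array, its shape, and a few ways to subset it. The array name and contents are hypothetical:

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

print(type(grid))          # numpy.ndarray
print(grid.shape)          # (rows, columns)

first_row = grid[0]        # the whole first row
single_value = grid[1, 2]  # row 1, column 2
second_column = grid[:, 1] # all rows, column 1
block = grid[:, 1:]        # all rows, columns from index 1 onward

print(first_row, single_value, second_column)
print(block)
```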
The first step of analyzing data is to get familiar with the data. NumPy has a lot of methods which help us to do that. We’ll see some basic methods to make statistics on our data.
- np.mean() - returns the arithmetic mean (the sum of the elements divided by the number of elements).
- np.median() - returns the median (the middle value of a sorted copy of the passed array; if the length of the array is even, the average of the two middle values is computed).
- np.corrcoef() - returns a correlation matrix. This function is useful when we want to see if there is a correlation between two variables in our dataset or, in other words, between two arrays of the same length.
- np.std() - returns the standard deviation.
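Here is a sketch of these four functions in use. The data values are an assumption: the learning hours were chosen so that they reproduce the mean, median and standard deviation quoted in the text, and the grades are hypothetical values picked to correlate with them.

```python
import numpy as np

# Assumed example data, not necessarily the original dataset
learning_hours = np.array([1, 2, 6, 4, 10])
grades = np.array([3, 4, 6, 5, 6])

mean_hours = np.mean(learning_hours)
median_hours = np.median(learning_hours)
std_hours = np.std(learning_hours)

# corrcoef returns a 2x2 matrix; the off-diagonal entry is the
# correlation between the two arrays
correlation = np.corrcoef(learning_hours, grades)[0, 1]

print(mean_hours, median_hours, round(float(std_hours), 1))
print(round(float(correlation), 2))
```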
From the example above, we can see that there is a high correlation between the hours of learning and the grade.
Also, we can see that:
- the mean for the learning hours is 4.6
- the median for the learning hours is 4.0
- the standard deviation for the learning hours is 3.2
NumPy also has some basic functions that have counterparts for regular Python lists, too. An important note here is that NumPy enforces a single type in an array, and this speeds up the calculations.
I have prepared some exercises including subsetting, element-wise operations, and basic statistics. If you want, you can try to solve them.
- Subsetting Python list
- Subsetting 2-dimensional NumPy array
- NumPy element-wise operations
- NumPy basic statistics