12 Python Built-In Functions for Data Science and Analytics
Python is the programming language of choice for many data scientists and analysts. With its relative simplicity, versatility, and wide range of third-party libraries, it is an invaluable addition to any data science and analytics toolkit. From data manipulation to visualization and machine learning, libraries like pandas, NumPy, Seaborn, and scikit-learn offer advanced capabilities for all of these tasks and more. While these third-party libraries greatly enhance Python’s suitability for data analytics and machine learning, base Python comes with its own range of built-in functions that will prove useful to any data professional. In this article, I present 12 of these functions, along with their use cases and example code.
The enumerate() function allows you to write a for loop that comes with an index. This means we can automatically assign a count variable to each item in an iterable. When we use enumerate(), we get back the count of the current iteration, and then the value of the item at that iteration.
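As a minimal sketch, assuming a small hypothetical list of fruit names:

```python
# Hypothetical sample data for illustration
fruits = ["apple", "banana", "cherry"]

# enumerate() yields (index, item) pairs, counting from 0 by default
for index, fruit in enumerate(fruits):
    print(index, fruit)
# 0 apple
# 1 banana
# 2 cherry
```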
As seen from the example above, we need not worry about adding an index manually as Python handles everything automatically. By default, the index starts at 0, but we can change that by using the 'start' argument.
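To begin counting at 1 instead, a sketch with the same hypothetical fruit list might look like this:

```python
# Hypothetical sample data for illustration
fruits = ["apple", "banana", "cherry"]

# The 'start' argument sets the first value of the counter
for index, fruit in enumerate(fruits, start=1):
    print(index, fruit)
# 1 apple
# 2 banana
# 3 cherry
```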
We can also create a list of tuples, each containing an index and the corresponding item, by using enumerate() in combination with the list() function. Here is an example:
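A minimal sketch, again using a hypothetical fruit list:

```python
# Hypothetical sample data for illustration
fruits = ["apple", "banana", "cherry"]

# list() consumes the enumerate object and collects its (index, item) tuples
indexed_fruits = list(enumerate(fruits))
print(indexed_fruits)
# [(0, 'apple'), (1, 'banana'), (2, 'cherry')]
```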
Note: Just like in a normal for loop, the loop variables can be assigned any valid Python name. We used index and fruit in our example but you could also use a and b or any other acceptable variable name and it would work just fine.
As you may have guessed from the name, the sorted() function takes in an iterable object and returns a new sorted list of all the items in said iterable, leaving the original unchanged. Strings are sorted alphabetically (strictly speaking, by Unicode code point, so uppercase letters come before lowercase ones), while numbers are sorted numerically. By default, the items are sorted in ascending order.
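A short sketch of the default behaviour, using hypothetical lists of numbers and names:

```python
# Hypothetical sample data for illustration
numbers = [5, 2, 9, 1]
names = ["cherry", "apple", "banana"]

sorted_numbers = sorted(numbers)  # [1, 2, 5, 9]
sorted_names = sorted(names)      # ['apple', 'banana', 'cherry']

# sorted() returns a new list; the original list is left as it was
print(sorted_numbers, numbers)
print(sorted_names)
```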
We can also set the sort order explicitly by passing True (descending) or False (ascending) to the 'reverse' argument.
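For instance, with a hypothetical list of numbers:

```python
# Hypothetical sample data for illustration
numbers = [5, 2, 9, 1]

# reverse=True sorts from largest to smallest
descending = sorted(numbers, reverse=True)  # [9, 5, 2, 1]
print(descending)
```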
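Note that in Python 3, a list containing both string and numeric values cannot be sorted directly: sorted() raises a TypeError because strings and numbers cannot be compared with each other.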
The zip() function is used to pair up items from different iterables. It returns a zip object, an iterator that yields tuples of the corresponding items from each iterable.
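A minimal sketch, assuming two hypothetical lists of names and cities:

```python
# Hypothetical sample data for illustration
names = ["Alice", "Bob", "Carol"]
cities = ["London", "Paris", "Tokyo"]

zipped = zip(names, cities)  # a lazy zip object
pairs = list(zipped)         # materialise it to see the tuples
print(pairs)
# [('Alice', 'London'), ('Bob', 'Paris'), ('Carol', 'Tokyo')]
```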
One thing to note is that if the iterables passed in have different lengths, zip() stops as soon as the shortest one is exhausted and silently ignores the remaining items in the longer ones.
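This truncation can be sketched with two hypothetical lists of unequal length:

```python
# Hypothetical sample data: three numbers but only two letters
numbers = [1, 2, 3]
letters = ["a", "b"]

# zip() stops after the second pair; the 3 is dropped
truncated = list(zip(numbers, letters))
print(truncated)
# [(1, 'a'), (2, 'b')]
```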
The zip() function also improves the readability of for loops. In the example below, we loop over a single zipped object instead of indexing into each list separately. Note that I included a third list, 'ages', which means we’re zipping three lists.
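A minimal sketch; the names and cities are hypothetical sample data, alongside the 'ages' list:

```python
# Hypothetical sample data for illustration
names = ["Alice", "Bob", "Carol"]
cities = ["London", "Paris", "Tokyo"]
ages = [30, 25, 35]

# Each iteration unpacks one item from each of the three lists
summaries = []
for name, city, age in zip(names, cities, ages):
    summaries.append(f"{name} is {age} and lives in {city}")

print(summaries[0])
# Alice is 30 and lives in London
```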
This function can also be used together with the built-in dict() function to create a dictionary object, like so.
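A short sketch, using hypothetical names as keys and ages as values:

```python
# Hypothetical sample data for illustration
names = ["Alice", "Bob", "Carol"]
ages = [30, 25, 35]

# dict() treats each (name, age) tuple as a key-value pair
name_to_age = dict(zip(names, ages))
print(name_to_age)
# {'Alice': 30, 'Bob': 25, 'Carol': 35}
```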
Finally, we can use zip() together with the * operator to reverse the process, splitting a list of tuples back into separate tuples, one per position.
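As a sketch, assuming a hypothetical list of (name, age) pairs:

```python
# Hypothetical sample data for illustration
pairs = [("Alice", 30), ("Bob", 25), ("Carol", 35)]

# *pairs passes each tuple as a separate argument to zip()
names, ages = zip(*pairs)
print(names)  # ('Alice', 'Bob', 'Carol')
print(ages)   # (30, 25, 35)
```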
The map() function takes in another function and an iterable object, such as a list or tuple, and applies the given function to each item in the iterable. It returns a map object, a lazy iterator that produces the results on demand.
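A minimal sketch; the function name double and the sample numbers are hypothetical:

```python
def double(number):
    """Return the given number multiplied by two."""
    return number * 2

# Hypothetical sample data for illustration
numbers = [1, 2, 3, 4]

doubled = map(double, numbers)  # a lazy map object
doubled_list = list(doubled)    # convert to a list to see the results
print(doubled_list)
# [2, 4, 6, 8]
```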
In the example above, we create a simple function that multiplies a given number by two. We then map the function over a list of numbers, which returns a map object holding the result for each number in the list. Wrapping the map object in list() lets us view the results.
Like the map() function, the filter() function also takes in another function and an iterable such as a list or tuple. It returns a filter object, an iterator over the elements of the iterable for which the function returns True.
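A minimal sketch; the function name is_even and the sample numbers are hypothetical:

```python
def is_even(number):
    """Return True if the number is even, False otherwise."""
    return number % 2 == 0

# Hypothetical sample data for illustration
numbers = [1, 2, 3, 4, 5, 6]

# filter() keeps only the items for which is_even() returns True
evens = list(filter(is_even, numbers))
print(evens)
# [2, 4, 6]
```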
In the example above, we write a simple custom function that checks a given number and returns True if it is even and False if it’s not. After passing our function into filter(), along with a list of numbers, we get back a filter object that yields only the numbers for which our custom function returns True, i.e. the even numbers. As with map(), we can wrap the result in list() to see them.
The isinstance() function is used to check if an object is an instance of a specified class. It takes in two arguments, an object and a class (or a tuple of classes), and returns True if the object is an instance of that class, or of any class in the tuple, and False if it’s not.
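A brief sketch with a hypothetical value:

```python
# Hypothetical sample value for illustration
value = 3.14

is_float = isinstance(value, float)          # True
is_int = isinstance(value, int)              # False
# A tuple of classes checks against several types at once
is_number = isinstance(value, (int, float))  # True
print(is_float, is_int, is_number)
```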
The range() function returns an immutable sequence of integers. It takes up to three arguments: 'start', 'stop', and 'step'. The 'start' argument is optional and specifies the beginning of the sequence, set to 0 by default. The 'stop' argument is a required integer specifying where the sequence ends; the stop value itself is not included. Finally, the 'step' argument, which is also optional, specifies the increment from one number to the next and defaults to 1.
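The three call patterns can be sketched as follows; list() is used only to display the values:

```python
# Only 'stop': counts from 0 up to, but not including, 5
first_five = list(range(5))        # [0, 1, 2, 3, 4]

# 'start' and 'stop'
two_to_seven = list(range(2, 8))   # [2, 3, 4, 5, 6, 7]

# 'start', 'stop', and 'step'
every_third = list(range(0, 10, 3))  # [0, 3, 6, 9]

print(first_five, two_to_seven, every_third)
```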
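The round() function takes in a number and rounds it to the nearest integer by default. Values exactly halfway between two integers are rounded to the nearest even integer, so round(2.5) returns 2 while round(3.5) returns 4. If the optional 'ndigits' argument is given, the input is rounded to that number of decimal places instead.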
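A few illustrative calls, with arbitrary sample values:

```python
nearest = round(3.7)           # 4
half_down = round(2.5)         # 2 (ties go to the nearest even integer)
half_up = round(3.5)           # 4
two_places = round(3.14159, 2)  # 3.14

print(nearest, half_down, half_up, two_places)
```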
Sets are one of the four built-in collection data types in Python (alongside lists, tuples, and dictionaries) and are used to store multiple items in a single variable. Sets are unordered and do not allow duplicate values. Individual items cannot be modified in place and must be hashable, although items can be added to or removed from the set itself. The set() function takes in any iterable, such as a list or tuple, and outputs a set.
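A minimal sketch, assuming a hypothetical list with repeated values:

```python
# Hypothetical sample data containing duplicates
numbers = [1, 2, 2, 3, 3, 3, 4]

unique_numbers = set(numbers)
print(unique_numbers)
# {1, 2, 3, 4}
```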
Note that in the output, all duplicate values have been removed. This is especially useful if you want to return only the unique values from a list.
These two functions are closely related and often complement each other. The all() function takes in an iterable such as a list or tuple and returns True if every element in the iterable is truthy, and False if one or more of the items is falsy. For an empty iterable, all() returns True. In the example below, we check if all the numbers in the list are even numbers using the all() function.
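A short sketch with two hypothetical lists; a generator expression produces one True/False value per number:

```python
# Hypothetical sample data for illustration
even_numbers = [2, 4, 6, 8]
mixed_numbers = [2, 4, 7, 8]

all_even = all(n % 2 == 0 for n in even_numbers)         # True
all_even_mixed = all(n % 2 == 0 for n in mixed_numbers)  # False (7 is odd)
print(all_even, all_even_mixed)
```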
The any() function accepts an iterable as its input and returns True if at least one element is truthy. It returns False if no element in the iterable is truthy, or if the iterable is empty. In the example below, we check if at least one number in the list is an even number.
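A short sketch, again with hypothetical lists:

```python
# Hypothetical sample data for illustration
mostly_odd = [1, 3, 4, 7]
all_odd = [1, 3, 5, 7]

has_even = any(n % 2 == 0 for n in mostly_odd)  # True (4 is even)
has_even_odd = any(n % 2 == 0 for n in all_odd)  # False
print(has_even, has_even_odd)
```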
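The eval() function evaluates a Python expression supplied as a string or as a compiled code object and returns the result. This function is especially handy when you’re trying to evaluate Python expressions that come in a string format, such as a mathematical expression. Because eval() will execute whatever expression it is given, it should only be used on input you trust; for plain literals such as numbers, lists, or dictionaries, ast.literal_eval() is a safer choice.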
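Two small sketches with hypothetical expression strings:

```python
# A mathematical expression stored as a string
result_sum = eval("2 + 3 * 4")  # 14

# Variable names can be supplied through a dictionary passed as the
# second argument (the globals the expression is evaluated in)
result_power = eval("x ** 2", {"x": 5})  # 25

print(result_sum, result_power)
```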
In the examples above, the function returns the value that results from evaluating the input, even though it’s in string format.
The format() function returns a formatted string representation of a value. It takes in two arguments: the value to be formatted and an optional format specification describing how the value is to be formatted, for example the number of decimal places or a thousands separator.
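A few sketches with arbitrary sample values; the format strings follow Python’s format specification mini-language:

```python
# Thousands separator with two decimal places
with_commas = format(1234567.891, ",.2f")  # '1,234,567.89'

# Percentage with one decimal place
as_percent = format(0.256, ".1%")          # '25.6%'

# Integer rendered in binary
as_binary = format(255, "b")               # '11111111'

print(with_commas, as_percent, as_binary)
```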
And with this, we conclude our list of 12 Python built-in functions for data science and analytics. Of course, this list is by no means exhaustive, as Python comes with several more built-in functions that may have use cases and applications in data science. You can find the full list in the official Python documentation here: https://docs.python.org/3/library/functions.html
I encourage you to explore the list as you’ll find more information on all of the functions I have listed in this article. I hope you enjoyed reading this as much as I enjoyed writing it. All the best in your data journey!