Pandas Tutorial – GreatLearning

theb2bnews

December 3, 2020

Pandas is an open-source Python library for data analysis and manipulation. It is built on top of NumPy. This tutorial covers what Pandas offers and how to use it, starting with its main features.

Features

Provides an efficient way to explore data.
Supports multiple file formats.
Ability to handle missing data.
Ability to extract data, and run transformations on it.
Reshape, slice, and index.
Merge and join datasets.
Perform mathematical operations on data.
Time series functionality.
Visualize data.

Installation

Installing Pandas is straightforward. There are two ways to do it:

Using Anaconda. When you install Anaconda on your machine, Pandas and several other libraries are installed along with it.
Using pip. If you already have Python installed, run pip install pandas from the command line.
Alternatively, follow the installation instructions on the Pandas website.

If you are a Linux user, the installation command may vary depending on your distribution; refer to the Pandas installation guide for the right command.
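A quick way to confirm that the installation worked is to import the library and print its version. This is a minimal check, assuming Pandas was installed into the same Python environment you are running; the version strings you see will depend on your setup.

```python
import pandas as pd
import numpy as np  # NumPy is installed alongside Pandas as a dependency

# The exact version strings depend on your installation
pandas_version = pd.__version__
numpy_version = np.__version__

print("pandas:", pandas_version)
print("numpy:", numpy_version)
```

If the import fails with ModuleNotFoundError, Pandas is most likely installed in a different environment from the one running the script.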

Data Types

A data type tells the programming language how to store and manipulate a value. The table below summarizes the main data types in Pandas.

Data type	Use
int64	Integer numbers, e.g. 10, 12
float64	Floating-point numbers, e.g. 100.2, 3.1415
bool	True/False values
object	Text, or a mix of numeric and non-numeric values, e.g. Apple
datetime64	Date and time values
category	A finite list of values
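To see these data types in practice, the sketch below builds a small DataFrame with one column per type and inspects it. The column names and values are hypothetical sample data, not part of the original tutorial.

```python
import pandas as pd

# Hypothetical sample data: one column for each data type in the table
df = pd.DataFrame({
    "quantity": [10, 12],
    "price": [100.2, 3.1415],
    "in_stock": [True, False],
    "name": ["Apple", "Mango"],
    "added_on": pd.to_datetime(["2020-12-01", "2020-12-03"]),
    "grade": pd.Categorical(["A", "B"]),
})

# dtypes lists the data type Pandas chose for each column
print(df.dtypes)

# astype converts a column to another data type
quantity_as_float = df["quantity"].astype("float64")
print(quantity_as_float.dtype)
```

The label printed for the text column and the resolution shown for the datetime column can differ between Pandas versions.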

Pandas Data Structures

Pandas has two main data structures: Series and DataFrame.

Series

A Pandas Series is a one-dimensional, labeled array, similar to a list, that can hold any data type. In simple terms, you can think of a Series as a single column in an Excel sheet.
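The one-dimensional nature of a Series can be checked directly. The following is a small sketch using hypothetical fruit prices; the variable and column names are made up for illustration.

```python
import pandas as pd

# Hypothetical column of fruit prices
prices = pd.Series([1.5, 0.75, 2.0], name="price")

print(prices.ndim)     # number of dimensions
print(prices.shape)    # a one-element tuple: (number of values,)
print(prices.index)    # default labels start at 0
print(prices.iloc[1])  # value at position 1
```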

DataFrame

A Pandas DataFrame is a 2-dimensional structure that stores data in a tabular format of rows and columns. You can think of a DataFrame as a collection of Pandas Series that share an index. A DataFrame can also have a single column; although it looks like a Series, it is defined as a DataFrame and behaves as one. Note that even though a DataFrame looks like a SQL table or an Excel sheet, it is a different kind of object: it lives in memory and is manipulated through the Pandas API.
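The relationship between the two structures can be sketched as follows. The example combines two hypothetical Series into a DataFrame and then shows the difference between a Series and a single-column DataFrame.

```python
import pandas as pd

# Two hypothetical Series that share the default index 0, 1, 2
fruits = pd.Series(["apple", "mango", "guava"], name="fruit")
prices = pd.Series([1.5, 0.75, 2.0], name="price")

# Combine the two Series column-wise into one DataFrame
df = pd.concat([fruits, prices], axis=1)
print(df.shape)  # (rows, columns)

one_column_series = df["price"]   # single brackets return a Series
one_column_frame = df[["price"]]  # double brackets return a DataFrame

print(type(one_column_series).__name__)
print(type(one_column_frame).__name__)
```

Selecting with double brackets is a common way to keep DataFrame behavior even when only one column is needed.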

How to create Pandas Series and DataFrame?

Pandas Series

Using NumPy Array:

To create a Pandas Series from a NumPy array, first define the array, and then pass it to the Series constructor.

# import the libraries
import pandas as pd
import numpy as np

# simple array
data = np.array(['apple', 'mango', 'guava', 'grapes', 'banana', 'strawberry'])

# create a series from the array
ser = pd.Series(data)

# show the first five rows
ser.head()

Output

0     apple
1     mango
2     guava
3    grapes
4    banana
dtype: object
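The output stops at banana because head() returns only the first five rows by default; strawberry is still stored in the Series. A short sketch that makes this visible, reusing the same array:

```python
import numpy as np
import pandas as pd

data = np.array(['apple', 'mango', 'guava', 'grapes', 'banana', 'strawberry'])
ser = pd.Series(data)

print(len(ser))          # all six values are stored

first_five = ser.head()  # head() returns the first 5 rows by default
all_rows = ser.head(6)   # pass n to ask for a different number of rows
last_row = ser.tail(1)   # tail() works the same way from the end

print(last_row)
```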

Using Python List:

As with a NumPy array, first define a list, and then pass it to the Series constructor.

# create a list
list1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# create series from a list
ser = pd.Series(list1)

# show the first five rows
ser.head()

Output

0    1
1    2
2    3
3    4
4    5
dtype: int64
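The dtype in the output is inferred from the values in the list. As a small illustration with made-up values, a list of integers gives an integer Series, while a list that also contains a decimal number is stored as floating point:

```python
import pandas as pd

# Hypothetical lists used only to show dtype inference
ints = pd.Series([1, 2, 3])
mixed = pd.Series([1, 2.5, 3])

print(ints.dtype)   # integers only
print(mixed.dtype)  # one float makes the whole Series floating point
```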

Using Python Dictionary:

As with a NumPy array or a list, first define a dictionary, and then pass it to the Series constructor.

# create a dictionary
dictionary1 = {1: 100, 2: 200, 3: 300}

# create a series
ser = pd.Series(dictionary1)

# show the first five rows
ser.head()

Output

1    100
2    200
3    300
dtype: int64
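Notice that the left-hand column of the output shows 1, 2, 3 rather than 0, 1, 2: when a Series is built from a dictionary, the keys become the index labels. A brief sketch of looking values up by label and by position, using the same dictionary:

```python
import pandas as pd

dictionary1 = {1: 100, 2: 200, 3: 300}
ser = pd.Series(dictionary1)

print(list(ser.index))        # the dictionary keys

value_for_key_2 = ser.loc[2]  # label-based lookup: the key 2
first_value = ser.iloc[0]     # position-based lookup: the first row

print(value_for_key_2)
print(first_value)
```

Using loc and iloc explicitly avoids ambiguity when the labels are themselves integers.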