Pandas in Python

Published Feb 13, 2022. Last updated Feb 20, 2022.

Introduction :

  • Pandas is a Python library used for working with datasets.
  • It is used for exploring, cleaning, manipulating, and analyzing data.
  • The name "Pandas" refers to both "Panel Data" and "Python Data Analysis".

Importing Pandas :

import pandas

Now, it is ready to use.

import pandas
a={
"Fruits":["apple","mango","kiwi"],
"Qty":[1,2,3]
}
df=pandas.DataFrame(a)
print(df)

Importing Pandas with alias :

Usually, Pandas is imported with the pd alias.
An alias is an alternate name for referring to the same thing.

import pandas as pd
a={
"Fruits":["apple", "mango", "kiwi"],
"Qty":[1,2,3]
}
df=pd.DataFrame(a)
print(df)

Pandas Series :

  • A Series is a one-dimensional labelled array; all of its values share a single data type (dtype).
  • A Series is like a column in a table.

import pandas as pd
a=[1,2,3]
s=pd.Series(a)
print(s)

Labels :

If no index is specified, the values are labelled by position: the first element has index 0, the second has index 1, and so on.

  • We can access an element of a Series by its index label:

import pandas as pd
a=[1,2,3]
s=pd.Series(a)
print(s[0])

Creating Labels:

We can also create our own index labels with the index argument.

import pandas as pd
a=[1,2,3]
s=pd.Series(a,index=['a','b','c'])
print(s)
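
Once a Series has custom labels, its values can be looked up by those labels. The following is a minimal sketch of that idea; the values and labels are the same illustrative ones used above.

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=["a", "b", "c"])

# A single label returns the value stored under it
print(s["b"])

# A list of labels returns a smaller Series with just those entries
subset = s[["a", "c"]]
print(subset)
```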

Pandas DataFrames :

  • A DataFrame is a two-dimensional table, like a spreadsheet, made up of rows and columns.
  • A Series is like a single column of a table, whereas a DataFrame is the whole table.

import pandas as pd
a={
"Fruits":["apple","mango","kiwi"],
"Qty.":[1,2,3]
}
df=pd.DataFrame(a)
print(df)
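
To see the relationship between the two structures, the short sketch below pulls one column out of a DataFrame and checks that it is a Series. The column names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Fruits": ["apple", "mango", "kiwi"],
    "Qty": [1, 2, 3],
})

# Selecting one column by name gives back a Series
qty = df["Qty"]
print(type(qty).__name__)

# shape is (number of rows, number of columns)
print(df.shape)
```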

Index in DataFrame :

As with a Series, we can also name the index labels of a DataFrame.

import pandas as pd
a={
"Fruits":["apple","mango","kiwi"],
"Qty":[1,2,3]
}
df=pd.DataFrame(a,index=["x","y","z"])
print(df)

Loc :

The loc[] attribute returns one or more rows selected by their index label.

import pandas as pd
a={
"Fruits":["apple","mango","banana"],
"Qty.":[1,2,3]
}
df=pd.DataFrame(a)
print(df.loc[0])

You can also access the rows of a DataFrame by their named index using the loc[] attribute:

import pandas as pd
a={
"Fruits":["apple","mango","kiwi"],
"Qty.":[1,2,3]
}
df=pd.DataFrame(a,index=['x','y','z'])
print(df.loc['x'])
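
The shape of the result from loc[] depends on what is passed to it. As a rough illustration, using the same made-up fruit data, a single label gives a Series, a list of labels gives a DataFrame, and a row label plus a column label gives one value.

```python
import pandas as pd

df = pd.DataFrame(
    {"Fruits": ["apple", "mango", "kiwi"], "Qty": [1, 2, 3]},
    index=["x", "y", "z"],
)

one_row = df.loc["x"]           # single label: the row comes back as a Series
two_rows = df.loc[["x", "z"]]   # list of labels: the result is a DataFrame
cell = df.loc["y", "Qty"]       # row label and column label: a single value

print(type(one_row).__name__)
print(two_rows)
print(cell)
```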

Reading CSV File:

  • CSV stands for Comma-Separated Values.
  • Pandas provides the read_csv() function to load a CSV file into a DataFrame.
  • I will be using the 'data.csv' file as an example.

import pandas as pd
a=pd.read_csv('data.csv')
print(a)

By default, printing a large DataFrame shows only the first 5 and last 5 rows, along with the headers.
If you want to print the entire DataFrame, use the to_string() method.

import pandas as pd
a=pd.read_csv('data.csv')
print(a.to_string())
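
If you do not have a 'data.csv' file at hand, the same steps can be tried on CSV text held in memory. This is only a sketch: the CSV content below is made up for illustration and is not the file used in this post.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = """Fruits,Qty
apple,1
mango,2
kiwi,3
"""

# read_csv() accepts a file path or a file-like object such as StringIO
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)        # (rows, columns)
print(df.to_string())  # the whole table as text
```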

Analyzing the Data :

head() Method :

The head() method returns the headers and the specified number of rows from the top of the dataset.

# Get a quick overview by printing the first 3 rows of the dataset:
import pandas as pd
a=pd.read_csv('data.csv')
print(a.head(3))

NOTE : If the number of rows is not specified, the head() method returns 5 rows.

tail() Method :

The tail() method returns the headers and the specified number of rows from the bottom of the dataset.

# Get the last 10 rows of the dataset
import pandas as pd
a=pd.read_csv('data.csv')
print(a.tail(10))
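
The behaviour of head() and tail() can be checked on a small generated table. The sketch below uses a hypothetical one-column DataFrame of 20 numbers in place of 'data.csv'.

```python
import pandas as pd

# 20 rows numbered 0 to 19, standing in for a real dataset
df = pd.DataFrame({"n": range(20)})

first_default = df.head()   # no argument: the first 5 rows
first_three = df.head(3)    # the first 3 rows
last_three = df.tail(3)     # the last 3 rows

print(len(first_default))
print(first_three["n"].tolist())
print(last_three["n"].tolist())
```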

Information about Data :

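The info() method prints a summary of the dataset: the number of rows, the column names, the count of non-null values in each column, the data types, and the memory usage.

import pandas as pd
a=pd.read_csv('data.csv')
a.info()  # info() prints the summary itself and returns None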
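
The non-null counts in the info() summary are a quick way to spot columns with empty values. The sketch below, on made-up data with one missing fruit name, captures the summary as text through the buf argument so it can be inspected.

```python
import io
import pandas as pd

# Hypothetical data: the second fruit name is missing
df = pd.DataFrame({
    "Fruits": ["apple", None, "kiwi"],
    "Qty": [1, 2, 3],
})

# info() writes to the given buffer instead of the screen
buffer = io.StringIO()
df.info(buf=buffer)
summary = buffer.getvalue()

print(summary)
```

In the printed summary, the Fruits column reports fewer non-null values than the Qty column, which points to the missing entry.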

Data Cleaning:

  • Data cleaning means fixing wrong data.
  • Wrong data can be empty values, duplicates, or data in the wrong format.

Remove Empty values :

  • One way to deal with empty values is to remove the rows that contain them.
  • The dropna() method removes rows that contain empty values.

import pandas as pd
a=pd.read_csv('data.csv')
df=a.dropna()
print(df)

By default, the dropna() method will return a new DataFrame without affecting the original DataFrame.
If you want to change the original DataFrame, use inplace=True.

import pandas as pd
a=pd.read_csv('data.csv')
a.dropna(inplace=True)
print(a)

  • Another way to handle empty values is to fill in a new value instead.
  • The fillna() method fills null values with a value you specify.

# Fill the null values with 130:
import pandas as pd
a=pd.read_csv('data.csv')
a.fillna(130,inplace=True)
print(a)
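
The two approaches can be compared side by side on a small table. This is a sketch on made-up data; the fill value 0 and the column names are assumptions chosen for illustration. Passing a dictionary to fillna() fills only the named columns.

```python
import pandas as pd

# Hypothetical data with two missing quantities
df = pd.DataFrame({
    "Fruits": ["apple", "mango", "kiwi", "banana"],
    "Qty": [1.0, None, 3.0, None],
})

# Count the empty values in each column
print(df.isna().sum())

dropped = df.dropna()            # new DataFrame without the incomplete rows
filled = df.fillna({"Qty": 0})   # new DataFrame with Qty gaps filled with 0

print(dropped)
print(filled)
```

Neither call uses inplace=True, so df itself still contains the missing values afterwards.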

Removing Duplicates :

To discover duplicates in a dataset, use the duplicated() method.

import pandas as pd
a=pd.read_csv('data.csv')
print(a.duplicated())

The duplicated() method returns True for each row that repeats an earlier row and False otherwise.

To remove duplicates from a dataset, use the drop_duplicates() method.

import pandas as pd
a=pd.read_csv('data.csv')
a.drop_duplicates(inplace=True)  # modifies a in place and returns None
print(a)
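
As a small illustration on made-up data, the sketch below builds a table in which the third row repeats the first, then flags and removes the repeat.

```python
import pandas as pd

# Hypothetical data: row 2 is an exact copy of row 0
df = pd.DataFrame({
    "Fruits": ["apple", "mango", "apple", "kiwi"],
    "Qty": [1, 2, 1, 3],
})

flags = df.duplicated()        # True marks a repeat of an earlier row
unique = df.drop_duplicates()  # keeps the first occurrence of each row

print(flags.tolist())
print(unique)
```

The original row labels are kept, so the result has index 0, 1 and 3.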

Cleaning Wrong Data :

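Wrong data can be data in the wrong format.
To fix a wrong value, assign a new value to that cell with the loc[] attribute (by label) or the iloc[] attribute (by position).

import pandas as pd
a=pd.read_csv('data.csv')
# Set the cell in the first row, eighth column (by position) to 45.
# With loc[], give the row label and the column name instead.
a.iloc[0,7]=45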
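
A wrong value can be corrected one cell at a time, or many at once with a condition. The sketch below assumes a made-up Qty column in which anything above 100 is treated as an error; both the data and the threshold are hypothetical.

```python
import pandas as pd

# Hypothetical data: 200 and 300 are assumed to be entry errors
df = pd.DataFrame({
    "Fruits": ["apple", "mango", "kiwi"],
    "Qty": [1, 200, 300],
})

# Fix a single cell: row label 1, column "Qty"
df.loc[1, "Qty"] = 2

# Fix every remaining row that matches a condition: cap Qty at 100
df.loc[df["Qty"] > 100, "Qty"] = 100

print(df)
```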

Correlation in Pandas :

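The corr() method returns the pairwise correlation between the columns of a dataset. The columns must be numeric; in pandas 2.0 and later, pass numeric_only=True to skip non-numeric columns.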

import pandas as pd
a=pd.read_csv('data.csv')
print(a.corr())
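
To get a feel for the numbers corr() produces, here is a sketch on made-up columns. A value near 1 means two columns rise together, and a value near -1 means one falls as the other rises.

```python
import pandas as pd

# Hypothetical numeric data
df = pd.DataFrame({
    "x": [1, 2, 3, 4],
    "y": [2, 4, 6, 8],   # rises in step with x
    "z": [8, 6, 4, 2],   # falls as x rises
})

corr = df.corr()
print(corr)

# Look up the correlation between two specific columns
print(corr.loc["x", "z"])
```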