
Data Analysis using Python

About this short workshop

This short workshop is aimed at people who are starting out in data analysis with Python. It uses IPython notebooks that work in both Visual Studio Code (VSCode) and Jupyter environments. To access the workshop, click the provided link, which opens the notebooks in Google Colab.

Contents

Part I

Part II

Data set

California housing dataset

We use a tailored version of the California housing dataset for practice. The dataset is stored as a .csv file in which each row contains information about a particular district. The columns in the dataset are:

| Columns |
|--------------------|
| longitude |
| latitude |
| housing_median_age |
| total_rooms |
| total_bedrooms |
| population |
| households |
| median_income |
| median_house_value |

The header and the first two data rows of data/HTC.csv look like this:

"longitude","latitude","housing_median_age","total_rooms","total_bedrooms","population","households","median_income","median_house_value"
-114.310000,34.190000,15.000000,5612.000000,1283.000000,1015.000000,472.000000,1.493600,66900.000000
-114.470000,34.400000,19.000000,7650.000000,1901.000000,1129.000000,463.000000,1.820000,80100.000000
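
As a minimal sketch of how rows like these can be read in Python, the snippet below parses the header and the two sample rows shown above with the standard-library `csv` module. The text is inlined so the example runs without the file; the helper name `read_districts` is an assumption made for illustration, and the workshop notebooks may well load the data differently (for example with a data analysis library).

```python
import csv
import io

# Header and the two sample rows copied from above, inlined so the
# sketch runs without needing data/HTC.csv on disk.
SAMPLE = '''"longitude","latitude","housing_median_age","total_rooms","total_bedrooms","population","households","median_income","median_house_value"
-114.310000,34.190000,15.000000,5612.000000,1283.000000,1015.000000,472.000000,1.493600,66900.000000
-114.470000,34.400000,19.000000,7650.000000,1901.000000,1129.000000,463.000000,1.820000,80100.000000
'''


def read_districts(source):
    """Parse CSV text from a file-like object into a list of dicts.

    Hypothetical helper: every column in this dataset is numeric,
    so each value is converted to float.
    """
    reader = csv.DictReader(source)
    return [{name: float(value) for name, value in row.items()} for row in reader]


districts = read_districts(io.StringIO(SAMPLE))

# Each district is a dict keyed by column name.
for district in districts:
    print(district["median_income"], district["median_house_value"])
```

To read the real file instead, the same helper can be given an open file handle, for example `with open("data/HTC.csv", newline="") as f: districts = read_districts(f)`, assuming the file sits at that path relative to the working directory and has no missing values.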

Titanic dataset

For this exercise, we use the Titanic dataset, which is stored as a .csv file. Each row contains information about a specific passenger, and the columns are:

| Columns | Description |
|-------------|-------------|
| PassengerId | Passenger ID |
| Survived | 0 = No; 1 = Yes |
| Pclass | Passenger class (1 = 1st; 2 = 2nd; 3 = 3rd) |
| Name | Passenger name |
| Sex | Passenger gender |
| Age | Passenger age |
| SibSp | Number of siblings/spouses aboard |
| Parch | Number of parents/children aboard |
| Ticket | Ticket number |
| Fare | Passenger fare |
| Cabin | Cabin number |
| Embarked | Port of embarkation (C = Cherbourg; Q = Queenstown; S = Southampton) |
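
Several of these columns store codes rather than readable values. The sketch below shows one possible way to turn the codes for Survived, Pclass and Embarked into the labels listed above. The function name `decode_passenger` and the example row are assumptions made for illustration; the row is invented and is not taken from the dataset.

```python
# Code-to-label mappings, taken from the column descriptions above.
SURVIVED = {0: "No", 1: "Yes"}
PCLASS = {1: "1st", 2: "2nd", 3: "3rd"}
EMBARKED = {"C": "Cherbourg", "Q": "Queenstown", "S": "Southampton"}


def decode_passenger(row):
    """Return a copy of a passenger row with coded columns made readable.

    Hypothetical helper: `row` is a dict of strings, as produced when
    reading a .csv file with csv.DictReader.
    """
    decoded = dict(row)
    decoded["Survived"] = SURVIVED[int(row["Survived"])]
    decoded["Pclass"] = PCLASS[int(row["Pclass"])]
    # Leave unknown or empty port codes unchanged instead of failing.
    decoded["Embarked"] = EMBARKED.get(row["Embarked"], row["Embarked"])
    return decoded


# Invented row for illustration only; not a real record from the dataset.
example = {"PassengerId": "0", "Survived": "1", "Pclass": "3", "Embarked": "S"}
decoded = decode_passenger(example)
print(decoded)
```

The original row is left untouched, so the coded values remain available for numerical work while the decoded copy can be used for display.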