Data Analysis using Python

02-Exercise

Consider the Titanic dataset

The available metadata of the Titanic dataset provides the following information:

VARIABLE	DESCRIPTION
`PassengerId`	Passenger id
`Survived`	0 = No; 1 = Yes
`Pclass`	Passenger Class (1 = 1st; 2 = 2nd; 3 = 3rd)
`Name`	Passenger name
`Sex`	Passenger gender
`Age`	Passenger age
`SibSp`	Number of Siblings/Spouses Aboard
`Parch`	Number of Parents/Children Aboard
`Ticket`	Ticket Number
`Fare`	Passenger fare
`Cabin`	Cabin number
`Embarked`	Port of Embarkation (C = Cherbourg; Q = Queenstown; S = Southampton)

2-1: Import the Titanic data

In [1]:

import pandas as pd 
titanic = pd.read_csv('../data/titanic.csv', sep=",")
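If `../data/titanic.csv` is not at hand, the same call can be tried on a small in-memory sample. The sketch below is only an illustration: the CSV text is an assumption built from two of the records shown in this exercise, and `csv_text` and `sample` are hypothetical names.

```python
import io

import pandas as pd

# Two records copied from the dataset's last rows; this in-memory text
# stands in for '../data/titanic.csv' (an assumption for illustration).
csv_text = (
    "PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked\n"
    '890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.00,C148,C\n'
    '891,0,3,"Dooley, Mr. Patrick",male,32.0,0,0,370376,7.75,,Q\n'
)

# read_csv accepts a file-like object as well as a path.
# Quoted fields keep their embedded commas; empty fields become NaN.
sample = pd.read_csv(io.StringIO(csv_text), sep=",")
print(sample.shape)  # (2, 12)
```

Since `sep=","` is the default for `pd.read_csv`, passing it explicitly mainly documents the expected delimiter.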

2-3: Determine the number of records (rows) and columns in the dataset.

In [3]:

# Get the number of records (rows) and columns
num_records, num_columns = titanic.shape
num_records, num_columns  # last expression in the cell, so the notebook displays the tuple

Out[3]:

(891, 12)
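The `shape` attribute is a `(rows, columns)` tuple, so it can be unpacked directly. A minimal sketch on a hypothetical three-row frame (values taken from the last rows of the dataset; `sample` is an illustrative name, not part of the exercise):

```python
import pandas as pd

# Hypothetical three-row frame built from values that appear in the dataset.
sample = pd.DataFrame(
    {
        "PassengerId": [889, 890, 891],
        "Survived": [0, 1, 0],
        "Pclass": [3, 1, 3],
    }
)

# shape is a (rows, columns) tuple and unpacks into two integers
num_records, num_columns = sample.shape
print(f"{num_records} records, {num_columns} columns")

# len() on a DataFrame counts rows only
row_count = len(sample)
```

For the full dataset the same unpacking would give 891 records and 12 columns, matching the output above.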

2-4: Display the top and bottom rows of the dataset.

In [4]:

# A notebook cell displays only its last expression, so Out[4] shows the bottom rows.
titanic.head()  # Top rows
titanic.tail()  # Bottom rows

Out[4]:

	PassengerId	Survived	Pclass	Name	Sex	Age	SibSp	Parch	Ticket	Fare	Cabin	Embarked
886	887	0	2	Montvila, Rev. Juozas	male	27.0	0	0	211536	13.00	NaN	S
887	888	1	1	Graham, Miss. Margaret Edith	female	19.0	0	0	112053	30.00	B42	S
888	889	0	3	Johnston, Miss. Catherine Helen "Carrie"	female	NaN	1	2	W./C. 6607	23.45	NaN	S
889	890	1	1	Behr, Mr. Karl Howell	male	26.0	0	0	111369	30.00	C148	C
890	891	0	3	Dooley, Mr. Patrick	male	32.0	0	0	370376	7.75	NaN	Q
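Because a notebook cell shows only the value of its last expression, the output above contains just the bottom rows. One way to see both ends is to print each result explicitly. The sketch below assumes a hypothetical five-row frame (`sample`) and uses the optional row-count argument of `head` and `tail`, which defaults to 5.

```python
import pandas as pd

# Hypothetical five-row frame; the ids and survival flags match the rows shown above.
sample = pd.DataFrame(
    {
        "PassengerId": [887, 888, 889, 890, 891],
        "Survived": [0, 1, 0, 1, 0],
    }
)

top = sample.head(2)     # first two rows
bottom = sample.tail(2)  # last two rows

# print() shows both results, unlike relying on the cell's last expression
print(top)
print(bottom)
```

Both methods return new DataFrames that keep the original index labels, which is why the real output above starts at index 886.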

2-5: What are the data types of the columns in the dataset?

In [5]:

titanic.dtypes

Out[5]:

PassengerId      int64
Survived         int64
Pclass           int64
Name            object
Sex             object
Age            float64
SibSp            int64
Parch            int64
Ticket          object
Fare           float64
Cabin           object
Embarked        object
dtype: object
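In the output above, `Age` is `float64` rather than `int64`, which is consistent with the column containing missing values (`NaN` is a floating-point value). The sketch below illustrates this on a hypothetical three-row frame and splits the columns into numeric and non-numeric groups with `select_dtypes`. Depending on the pandas version, text columns may be reported as `object` or as a dedicated string dtype, so the sketch avoids relying on the exact name.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame; the values match rows shown earlier in this exercise.
sample = pd.DataFrame(
    {
        "Pclass": [2, 1, 3],
        "Sex": ["male", "female", "female"],
        "Age": [27.0, 19.0, np.nan],  # one missing age
    }
)

print(sample.dtypes)

# Split column names by kind instead of matching a specific dtype name
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
other_cols = sample.select_dtypes(exclude="number").columns.tolist()
print(numeric_cols, other_cols)
```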

2-6: What are the column labels or names in the dataset?

In [6]:

titanic.columns

Out[6]:

Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
       'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
      dtype='object')
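`columns` returns an `Index` object rather than a plain list. When a list is more convenient, or when checking whether a column exists, something like the following sketch can be used (the empty `sample` frame with three of the dataset's column names is a hypothetical stand-in):

```python
import pandas as pd

# Hypothetical empty frame that carries three of the dataset's column names.
sample = pd.DataFrame(columns=["PassengerId", "Survived", "Pclass"])

column_names = sample.columns.tolist()   # Index -> plain Python list
has_pclass = "Pclass" in sample.columns  # membership test on the Index
has_age = "Age" in sample.columns        # False for this reduced sample

print(column_names)
```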
