Data Analysis using Python

03-Exercise

3-1: Reconsider the Titanic dataset and select the column containing the names of the passengers.

In [37]:

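import pandas as pd
import numpy as np

titanic = pd.read_csv('../data/titanic.csv', sep=",")

# Three equivalent ways to select the 'Name' column;
# only the last expression in the cell is displayed.
titanic.Name        # attribute access
titanic['Name']     # label-based access
titanic.iloc[:, 3]  # position-based access ('Name' is the fourth column)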

Out[37]:

0                                Braund, Mr. Owen Harris
1      Cumings, Mrs. John Bradley (Florence Briggs Th...
2                                 Heikkinen, Miss. Laina
3           Futrelle, Mrs. Jacques Heath (Lily May Peel)
4                               Allen, Mr. William Henry
                             ...                        
886                                Montvila, Rev. Juozas
887                         Graham, Miss. Margaret Edith
888             Johnston, Miss. Catherine Helen "Carrie"
889                                Behr, Mr. Karl Howell
890                                  Dooley, Mr. Patrick
Name: Name, Length: 891, dtype: object
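The three selection styles can be compared on a small frame. The following is a minimal sketch on made-up toy data (the frame `df` and its three rows are assumptions for illustration, not the contents of titanic.csv). It also shows that double brackets return a one-column DataFrame instead of a Series.

```python
import pandas as pd

# Toy data (hypothetical), standing in for titanic.csv
df = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Name": ["Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina", "Dooley, Mr. Patrick"],
    "Age": [22.0, 26.0, 32.0],
})

by_attribute = df.Name       # attribute access; only for names that are valid identifiers
by_label = df["Name"]        # label access; works for any column name
by_position = df.iloc[:, 1]  # positional access; "Name" is the second column in this toy frame

# All three return the same Series
same = by_attribute.equals(by_label) and by_label.equals(by_position)

# Double brackets keep a DataFrame with a single column
as_frame = df[["Name"]]

print(same, type(by_label).__name__, type(as_frame).__name__)
```

Label access is usually the safest choice in scripts, since it does not depend on column order and also works for names containing spaces.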

3-2: Categorize the ages of the passengers using the age thresholds 20 and 50. You can use the pandas cut function.

In [4]:

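# Boolean masks for the age groups; comparisons with NaN evaluate to False
C1 = titanic.Age <= 20
C2 = titanic.Age > 50  # strict, so that age 50 falls in the middle group, as in pd.cut below
C3 = (titanic.Age > 20) & (titanic.Age <= 50)  # unlike ~C1 & ~C2, this excludes missing ages

titanic.loc[C1]
titanic.loc[C3]
titanic.loc[C2]

# The same grouping with pd.cut; the bins are right-closed: (0, 20], (20, 50], (50, inf)
categories = pd.cut(titanic['Age'], bins=[0, 20, 50, float('inf')], labels=['Under 20', '20-50', 'Over 50'])
categories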

Out[4]:

0         20-50
1         20-50
2         20-50
3         20-50
4         20-50
         ...   
886       20-50
887    Under 20
888         NaN
889       20-50
890       20-50
Name: Age, Length: 891, dtype: category
Categories (3, object): ['Under 20' < '20-50' < 'Over 50']
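By default pd.cut builds right-closed intervals, so a passenger aged exactly 20 lands in 'Under 20' and one aged exactly 50 lands in '20-50'. The sketch below checks this on a few made-up ages (the `ages` Series is an assumption for illustration) and shows how `right=False` moves the boundary values into the next group.

```python
import numpy as np
import pandas as pd

# Made-up ages that sit on and around the thresholds, plus one missing value
ages = pd.Series([0.5, 20, 20.5, 50, 51, np.nan])
bins = [0, 20, 50, float("inf")]
labels = ["Under 20", "20-50", "Over 50"]

# Default right=True: intervals are (0, 20], (20, 50], (50, inf)
groups = pd.cut(ages, bins=bins, labels=labels)

# right=False: intervals are [0, 20), [20, 50), [50, inf)
left_closed = pd.cut(ages, bins=bins, labels=labels, right=False)

# Missing ages stay missing and are not counted in any group
counts = groups.value_counts()

print(pd.DataFrame({"age": ages, "right_closed": groups, "left_closed": left_closed}))
```

Which convention is appropriate depends on how the thresholds are meant to be read; the labels should be worded to match the chosen one.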

3-3: Create a column that stores the age classification (L, M, H).

In [15]:

titanic['Agelev'] = pd.cut(titanic['Age'], bins=[0, 20, 50, float('inf')], labels=['L', 'M', 'H'])
titanic

Out[15]:

	PassengerId	Survived	Pclass	Name	Sex	Age	SibSp	Parch	Ticket	Fare	Cabin	Embarked	Agelav	Agelev
0	1	0	3	Braund, Mr. Owen Harris	male	22.0	1	0	A/5 21171	7.2500	NaN	S	M	M
1	2	1	1	Cumings, Mrs. John Bradley (Florence Briggs Th...	female	38.0	1	0	PC 17599	71.2833	C85	C	M	M
2	3	1	3	Heikkinen, Miss. Laina	female	26.0	0	0	STON/O2. 3101282	7.9250	NaN	S	M	M
3	4	1	1	Futrelle, Mrs. Jacques Heath (Lily May Peel)	female	35.0	1	0	113803	53.1000	C123	S	M	M
4	5	0	3	Allen, Mr. William Henry	male	35.0	0	0	373450	8.0500	NaN	S	M	M
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
886	887	0	2	Montvila, Rev. Juozas	male	27.0	0	0	211536	13.0000	NaN	S	M	M
887	888	1	1	Graham, Miss. Margaret Edith	female	19.0	0	0	112053	30.0000	B42	S	L	L
888	889	0	3	Johnston, Miss. Catherine Helen "Carrie"	female	NaN	1	2	W./C. 6607	23.4500	NaN	S	NaN	NaN
889	890	1	1	Behr, Mr. Karl Howell	male	26.0	0	0	111369	30.0000	C148	C	M	M
890	891	0	3	Dooley, Mr. Patrick	male	32.0	0	0	370376	7.7500	NaN	Q	M	M

891 rows × 14 columns

3-4: Select the objects (rows) where the age is between 20 and 50.

In [17]:

# np.where returns the positional indices of the matching rows, not the rows themselves
np.where(titanic.loc[:,'Agelev'].isin(['M']))

Out[17]:

(array([  0,   1,   2,   3,   4,   8,  13,  18,  20,  21,  23,  25,  30,
         34,  35,  37,  40,  41,  51,  52,  53,  56,  57,  60,  61,  62,
         66,  69,  70,  72,  73,  74,  75,  79,  80,  81,  83,  85,  88,
         89,  90,  92,  93,  97,  98,  99, 100, 102, 103, 104, 105, 106,
        108, 110, 112, 115, 117, 118, 120, 122, 123, 127, 129, 130, 132,
        133, 134, 135, 137, 139, 141, 142, 146, 148, 149, 151, 153, 157,
        160, 161, 162, 167, 169, 173, 177, 178, 179, 187, 188, 189, 190,
        194, 197, 199, 200, 202, 203, 206, 207, 209, 210, 211, 212, 213,
        215, 216, 217, 218, 219, 221, 224, 225, 227, 230, 231, 234, 236,
        239, 242, 243, 244, 245, 246, 247, 248, 251, 253, 254, 255, 257,
        258, 259, 263, 265, 267, 269, 271, 272, 273, 276, 279, 281, 285,
        286, 287, 288, 289, 290, 292, 293, 294, 296, 299, 308, 309, 310,
        312, 313, 314, 315, 316, 318, 319, 320, 321, 322, 323, 325, 327,
        328, 331, 332, 336, 337, 338, 339, 341, 342, 343, 344, 345, 346,
        349, 350, 353, 355, 356, 357, 360, 361, 362, 363, 365, 369, 370,
        373, 376, 377, 380, 382, 383, 387, 390, 391, 392, 393, 394, 395,
        396, 397, 398, 399, 400, 401, 402, 403, 405, 408, 412, 414, 416,
        418, 421, 422, 423, 426, 429, 430, 432, 434, 436, 437, 439, 440,
        442, 443, 447, 450, 452, 453, 455, 458, 460, 461, 462, 463, 465,
        471, 472, 473, 474, 476, 477, 478, 482, 484, 486, 488, 491, 494,
        498, 499, 501, 503, 506, 508, 509, 510, 512, 514, 515, 516, 518,
        519, 520, 521, 523, 525, 526, 528, 529, 534, 536, 537, 539, 540,
        543, 544, 548, 551, 553, 554, 556, 558, 559, 561, 562, 565, 567,
        569, 572, 576, 577, 579, 580, 581, 583, 586, 588, 590, 592, 594,
        595, 597, 599, 600, 603, 604, 605, 606, 607, 608, 609, 610, 614,
        615, 616, 617, 619, 620, 621, 623, 624, 627, 628, 632, 635, 636,
        637, 638, 641, 645, 649, 652, 655, 657, 658, 660, 661, 662, 663,
        665, 666, 668, 670, 671, 673, 676, 678, 679, 681, 685, 690, 693,
        696, 698, 699, 701, 703, 704, 705, 706, 707, 708, 710, 712, 713,
        716, 717, 719, 722, 723, 724, 726, 728, 729, 730, 733, 734, 735,
        736, 737, 741, 742, 743, 744, 747, 749, 752, 753, 754, 756, 758,
        759, 761, 763, 767, 769, 770, 771, 779, 782, 784, 785, 789, 794,
        795, 796, 797, 798, 799, 800, 801, 804, 805, 806, 808, 809, 810,
        811, 812, 814, 816, 817, 818, 821, 822, 823, 833, 835, 836, 838,
        842, 843, 845, 847, 848, 854, 856, 858, 860, 861, 862, 864, 865,
        866, 867, 870, 871, 872, 873, 874, 880, 881, 882, 883, 884, 885,
        886, 889, 890]),)
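The output above lists row positions. To obtain the rows themselves, a boolean mask can be passed to `.loc`, or the positions can be passed to `.iloc`. The following is a minimal sketch on made-up toy data (the frame `df` and its values are assumptions for illustration), using the same bins as in 3-3.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical), including one missing age
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Age": [19.0, 35.0, np.nan, 50.0],
})
df["Agelev"] = pd.cut(df["Age"], bins=[0, 20, 50, float("inf")], labels=["L", "M", "H"])

# Option 1: boolean mask on the category label
middle_by_label = df.loc[df["Agelev"] == "M"]

# Option 2: boolean mask on the raw ages, matching the (20, 50] bin
middle_by_age = df.loc[(df["Age"] > 20) & (df["Age"] <= 50)]

# Option 3: positions from np.where, turned into rows with iloc
positions = np.where(df["Agelev"] == "M")[0]
middle_by_position = df.iloc[positions]

print(middle_by_label)
```

All three options skip the row with the missing age, because comparisons against NaN evaluate to False.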

3-5: Sort the data by the 'Age' column in ascending order.

In [ ]:

titanic.sort_values(by='Age')
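Two details of sort_values are easy to miss: it returns a new DataFrame and leaves the original unchanged, and rows with a missing sort key are placed last by default. The sketch below illustrates both on made-up toy data (the frame `df` is an assumption for illustration).

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical), including one missing age
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Age": [35.0, np.nan, 19.0, 50.0],
})

# Ascending order; the row with the missing age goes to the end
ascending = df.sort_values(by="Age")

# Descending order, with missing ages moved to the front instead
descending = df.sort_values(by="Age", ascending=False, na_position="first")

print(ascending)
print(descending)
```

To keep the sorted result, assign it to a variable as done here; the original frame keeps its row order.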
