3-1: Reconsider the Titanic dataset and select the column containing the passengers' names.
In [37]:
import pandas as pd
import numpy as np
titanic = pd.read_csv('../data/titanic.csv', sep=",")
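# Three equivalent ways to select the 'Name' column; only the last expression is displayed
titanic.Name         # attribute access
titanic['Name']      # label-based indexing
titanic.iloc[:,3]    # position-based indexing ('Name' is the fourth column)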
Out[37]:
0 Braund, Mr. Owen Harris
1 Cumings, Mrs. John Bradley (Florence Briggs Th...
2 Heikkinen, Miss. Laina
3 Futrelle, Mrs. Jacques Heath (Lily May Peel)
4 Allen, Mr. William Henry
...
886 Montvila, Rev. Juozas
887 Graham, Miss. Margaret Edith
888 Johnston, Miss. Catherine Helen "Carrie"
889 Behr, Mr. Karl Howell
890 Dooley, Mr. Patrick
Name: Name, Length: 891, dtype: object
3-2: Categorize the passengers' ages using the thresholds 20 and 50. The Pandas cut function can be used for this.
In [4]:
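# Boolean masks aligned with the right-inclusive bins used by pd.cut below: (0, 20], (20, 50], (50, inf]
C1 = titanic.Age <= 20
C2 = titanic.Age > 50
titanic.loc[C1]
# Require a known age explicitly: ~C1 & ~C2 alone would also keep rows whose Age is NaN
titanic.loc[~C1 & ~C2 & titanic.Age.notna()]
titanic.loc[C2]
# pd.cut assigns each age to a labeled bin; a missing age stays NaN
categories = pd.cut(titanic['Age'], bins=[0, 20, 50, float('inf')], labels=['Under 20', '20-50', 'Over 50'])
categories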
Out[4]:
0 20-50
1 20-50
2 20-50
3 20-50
4 20-50
...
886 20-50
887 Under 20
888 NaN
889 20-50
890 20-50
Name: Age, Length: 891, dtype: category
Categories (3, object): ['Under 20' < '20-50' < 'Over 50']
3-3: Create a column that stores the age classification (L, M, H).
In [15]:
titanic['Agelev'] = pd.cut(titanic['Age'], bins=[0, 20, 50, float('inf')], labels=['L', 'M', 'H'])
titanic
Out[15]:
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | Agelav | Agelev |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S | M | M |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C | M | M |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S | M | M |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S | M | M |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S | M | M |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 886 | 887 | 0 | 2 | Montvila, Rev. Juozas | male | 27.0 | 0 | 0 | 211536 | 13.0000 | NaN | S | M | M |
| 887 | 888 | 1 | 1 | Graham, Miss. Margaret Edith | female | 19.0 | 0 | 0 | 112053 | 30.0000 | B42 | S | L | L |
| 888 | 889 | 0 | 3 | Johnston, Miss. Catherine Helen "Carrie" | female | NaN | 1 | 2 | W./C. 6607 | 23.4500 | NaN | S | NaN | NaN |
| 889 | 890 | 1 | 1 | Behr, Mr. Karl Howell | male | 26.0 | 0 | 0 | 111369 | 30.0000 | C148 | C | M | M |
| 890 | 891 | 0 | 3 | Dooley, Mr. Patrick | male | 32.0 | 0 | 0 | 370376 | 7.7500 | NaN | Q | M | M |
891 rows × 14 columns
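As a small illustration of how the bin edges above behave, the following sketch applies the same `pd.cut` call to a hand-made Series (the ages are invented for illustration and are not taken from the Titanic file). With the default `right=True`, each bin includes its right edge, so an age of exactly 20 falls in `L` and an age of exactly 50 falls in `M`; a missing age stays `NaN`.

```python
import numpy as np
import pandas as pd

# Toy ages chosen to sit on and around the bin edges (illustrative values only)
ages = pd.Series([5, 20, 21, 50, 51, np.nan], name='Age')

# Same bins and labels as in the cell above; the intervals are (0, 20], (20, 50], (50, inf]
levels = pd.cut(ages, bins=[0, 20, 50, float('inf')], labels=['L', 'M', 'H'])

print(levels.tolist())
```

If the lower group should exclude age 20 itself, passing `right=False` to `pd.cut` makes the bins left-inclusive instead.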
3-4: Select the objects (rows) where the age is between 20 and 50.
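A hedged sketch on a toy frame (the names and ages are invented for illustration): passing a boolean mask to `.loc` returns the matching rows themselves, whereas `np.where` returns only their integer positions. The range is written here as `20 < Age <= 50`, which is an assumption about how the endpoints should be treated.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the Titanic frame (illustrative values only)
toy = pd.DataFrame({'Name': ['a', 'b', 'c', 'd'],
                    'Age': [19.0, 26.0, np.nan, 50.0]})

# True where 20 < Age <= 50; comparisons with NaN are False, so missing ages drop out
mask = (toy['Age'] > 20) & (toy['Age'] <= 50)

rows = toy.loc[mask]            # the selected rows as a DataFrame
positions = np.where(mask)[0]   # only the integer positions of those rows

print(rows)
print(positions)
```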
In [17]:
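# np.where returns the integer positions of the rows labeled 'M' (20 < Age <= 50); titanic.iloc[...] on those positions gives the rows
np.where(titanic.loc[:,'Agelev'].isin(['M']))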
Out[17]:
(array([ 0, 1, 2, 3, 4, 8, 13, 18, 20, 21, 23, 25, 30,
34, 35, 37, 40, 41, 51, 52, 53, 56, 57, 60, 61, 62,
66, 69, 70, 72, 73, 74, 75, 79, 80, 81, 83, 85, 88,
89, 90, 92, 93, 97, 98, 99, 100, 102, 103, 104, 105, 106,
108, 110, 112, 115, 117, 118, 120, 122, 123, 127, 129, 130, 132,
133, 134, 135, 137, 139, 141, 142, 146, 148, 149, 151, 153, 157,
160, 161, 162, 167, 169, 173, 177, 178, 179, 187, 188, 189, 190,
194, 197, 199, 200, 202, 203, 206, 207, 209, 210, 211, 212, 213,
215, 216, 217, 218, 219, 221, 224, 225, 227, 230, 231, 234, 236,
239, 242, 243, 244, 245, 246, 247, 248, 251, 253, 254, 255, 257,
258, 259, 263, 265, 267, 269, 271, 272, 273, 276, 279, 281, 285,
286, 287, 288, 289, 290, 292, 293, 294, 296, 299, 308, 309, 310,
312, 313, 314, 315, 316, 318, 319, 320, 321, 322, 323, 325, 327,
328, 331, 332, 336, 337, 338, 339, 341, 342, 343, 344, 345, 346,
349, 350, 353, 355, 356, 357, 360, 361, 362, 363, 365, 369, 370,
373, 376, 377, 380, 382, 383, 387, 390, 391, 392, 393, 394, 395,
396, 397, 398, 399, 400, 401, 402, 403, 405, 408, 412, 414, 416,
418, 421, 422, 423, 426, 429, 430, 432, 434, 436, 437, 439, 440,
442, 443, 447, 450, 452, 453, 455, 458, 460, 461, 462, 463, 465,
471, 472, 473, 474, 476, 477, 478, 482, 484, 486, 488, 491, 494,
498, 499, 501, 503, 506, 508, 509, 510, 512, 514, 515, 516, 518,
519, 520, 521, 523, 525, 526, 528, 529, 534, 536, 537, 539, 540,
543, 544, 548, 551, 553, 554, 556, 558, 559, 561, 562, 565, 567,
569, 572, 576, 577, 579, 580, 581, 583, 586, 588, 590, 592, 594,
595, 597, 599, 600, 603, 604, 605, 606, 607, 608, 609, 610, 614,
615, 616, 617, 619, 620, 621, 623, 624, 627, 628, 632, 635, 636,
637, 638, 641, 645, 649, 652, 655, 657, 658, 660, 661, 662, 663,
665, 666, 668, 670, 671, 673, 676, 678, 679, 681, 685, 690, 693,
696, 698, 699, 701, 703, 704, 705, 706, 707, 708, 710, 712, 713,
716, 717, 719, 722, 723, 724, 726, 728, 729, 730, 733, 734, 735,
736, 737, 741, 742, 743, 744, 747, 749, 752, 753, 754, 756, 758,
759, 761, 763, 767, 769, 770, 771, 779, 782, 784, 785, 789, 794,
795, 796, 797, 798, 799, 800, 801, 804, 805, 806, 808, 809, 810,
811, 812, 814, 816, 817, 818, 821, 822, 823, 833, 835, 836, 838,
842, 843, 845, 847, 848, 854, 856, 858, 860, 861, 862, 864, 865,
866, 867, 870, 871, 872, 873, 874, 880, 881, 882, 883, 884, 885,
886, 889, 890]),)
3-5: Sort the data by the 'Age' column in ascending order.
In [ ]:
titanic.sort_values(by='Age')
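As a hedged sketch on invented data: `sort_values` returns a new, sorted DataFrame and leaves the original untouched, and rows with a missing `Age` are placed last by default (`na_position='last'`). The toy frame below is an assumption for illustration only.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the Titanic frame (illustrative values only)
toy = pd.DataFrame({'Name': ['a', 'b', 'c', 'd'],
                    'Age': [35.0, np.nan, 19.0, 26.0]})

# Ascending sort by Age; this returns a new DataFrame and toy keeps its order
by_age = toy.sort_values(by='Age')

# Optionally renumber the index after sorting
by_age_reset = by_age.reset_index(drop=True)

print(by_age)
print(by_age_reset)
```

To keep the sorted order in the original variable, the result can be assigned back, for example `toy = toy.sort_values(by='Age')`.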