Primer¶

This primer gives a brief overview of useful functions from core Python, NumPy, and pandas.

Libraries¶


`pip install numpy`	Install library
`pip install git+https://github.com/mwaskom/seaborn.git`	Install from GitHub
`import numpy`	Load library

Starting¶

Python ignores everything that follows a # on a line, so # marks a comment.


`x = 1 # Python ignores this`	Comment
`import os`
`os.getcwd()`	See the working directory
`os.chdir(path)`	Change the working directory to path

Mathematical operations¶


`1 + 2`	Addition
`1 - 2`	Subtraction
`1 / 2`	Division
`1 * 2`	Multiplication
`1 ** 2`	Power
`x += 1`	Assign the value of x + 1 to x
`x -= 1`	Assign the value of x - 1 to x

Built-in Constants¶


`None`	Absence of a value
`False`	The bool type false
`True`	The bool type true

Type¶


`str()`	Convert to a string
`int()`	Convert to an integer
`float()`	Convert to a floating-point number
`bool()`	Convert to a Boolean

Lists¶


`l = list()`	Assign an empty list to l
`l = [3,2]`	Create a list and assign 3 and 2 to it
`l[0]`	Return the first value
`l[-1]`	Return the last value
`l[-4:]`	Return the last 4 items
`l[1:4]`	Return a subset containing the second through fourth values
`l[1::3]`	Return every third item starting from l[1]
`l.append(1)`	Append the value 1 to the end of l
`l+[1]`	Return a new list with the value 1 added at the end; l is unchanged
`l.sort()`	Sort l in place, replacing the original order
`l.reverse()`	Reverse the order of the items in l in place
`l.remove(a)`	Remove the first item equal to a
`l.pop(2)`	Return the item at index 2 (the third item) and drop it from l
`[i for i in range(1,100) if i%2==0]`	Generate a list of the even numbers between 1 and 100 (2 to 98)
`l= [1, "", 3]`	Create a list with an empty string standing in for a missing value
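
The list operations above can be tried together in a short sketch. The sample values are hypothetical and chosen only so that each result is easy to check by hand.

```python
# Hypothetical sample list, used only for illustration.
l = [5, 3, 8, 1, 9, 2]

first, last = l[0], l[-1]   # 5 and 2
middle = l[1:4]             # items at indices 1, 2, 3 -> [3, 8, 1]
every_third = l[1::3]       # items at indices 1 and 4 -> [3, 9]

longer = l + [7]            # builds a new list; l itself is unchanged
l.append(7)                 # modifies l in place

popped = l.pop(2)           # removes and returns the item at index 2 (8)
l.sort()                    # sorts in place -> [1, 2, 3, 5, 7, 9]
l.reverse()                 # reverses in place -> [9, 7, 5, 3, 2, 1]
```

Note that `l + [7]` and `l.append(7)` differ only in whether the original list is modified.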

Dictionary¶


`d={"weight":2.4, "height":15}`	Create a dictionary with keys "weight" and "height", and their corresponding values of 2.4 and 15
`d["weight"]`	Return the value corresponding to "weight"
`d.keys()`	Return the keys from d
`d.values()`	Return the values from d
`d.items()`	Return (key, value) pairs from d
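
As a small illustration of the dictionary calls above, the sketch below reuses the same keys and values and loops over the (key, value) pairs. The `summary` list is a hypothetical name added for the example.

```python
d = {"weight": 2.4, "height": 15}

weight = d["weight"]          # look up a single value by key
keys = list(d.keys())         # ["weight", "height"]
values = list(d.values())     # [2.4, 15]

# items() yields (key, value) pairs, which unpack neatly in a loop.
summary = []
for key, value in d.items():
    summary.append(f"{key}={value}")
```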

Built-in function¶


`len(x)`	Return the number of elements in x
`min(x)`	Return the min of the values of x
`sum(x)`	Return the sum of the values of x
`type(x)`	Return the type of x
`range(3,10)`	Generate the integers from 3 up to, but not including, 10 in steps of 1; a third argument sets a different step

User Function¶


`def name(argument):` `script` `return output`	Create a function
`name = lambda arguments: script`	Create a one-line function
`if state1:` `script1` `elif state2:` `script2` `else:` `script3`	Test state1 and then state2, and run the script of the first one that is true; otherwise run script3
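
The templates above can be made concrete with a minimal sketch. The names `sign` and `square` are hypothetical and stand in for `name`; the conditions stand in for `state1` and `state2`.

```python
def sign(x):
    """Classify a number using if / elif / else."""
    if x > 0:
        return "positive"
    elif x < 0:
        return "negative"
    else:
        return "zero"

# A lambda is a one-line function; here it returns the square of its argument.
square = lambda x: x ** 2

results = [sign(-3), sign(0), sign(7)]
squared = square(4)
```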

Iterates¶


`for value in obj:` `script`	Iterate the code for each value in obj
`while cond:` `script`	Repeat the code as long as cond is true
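
A short sketch of both loop forms follows. The variable names and values are hypothetical.

```python
# for: run the body once per value in the iterable.
total = 0
for value in [1, 2, 3]:
    total += value

# while: repeat the body as long as the condition stays true.
count = 0
while count < 3:
    count += 1
```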

Boolean Comparisons¶


`x == 2`	Test whether x is equal to 2
`x != 2`	Test whether x is not equal to 2
`x < 2`	Test whether x is less than 2
`x <= 2`	Test whether x is less than or equal to 2
`x > 2`	Test whether x is greater than 2
`x >= 2`	Test whether x is greater than or equal to 2
`(x == 2) or (y == 1)`	Test whether x is equal to 2 or y is equal to 1
`(x == 2) | (y == 1)`	Test whether x is equal to 2 or y is equal to 1 (works element-wise on arrays; the parentheses are required)
`(x == 2) and (y == 1)`	Test whether x is equal to 2 and y is equal to 1
`(x == 2) & (y == 1)`	Test whether x is equal to 2 and y is equal to 1 (works element-wise on arrays; the parentheses are required)
`3 in l`	Check whether the value 3 exists in l
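
The `|` and `&` forms bind more tightly than `==`, which is why the comparisons are wrapped in parentheses. The sketch below, with hypothetical values for `x` and `y`, shows what happens when the parentheses are left out.

```python
x, y = 2, 1

with_or = (x == 2) or (y == 1)      # True
with_pipe = (x == 2) | (y == 1)     # True: both sides are evaluated first

# Without parentheses, | is applied before ==, so the line below is parsed as
# x == (2 | y) == 1, i.e. 2 == 3 == 1, which is False.
no_parens = x == 2 | y == 1

found = 3 in [1, 2, 3]              # membership test on a list
```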

Numpy¶

NumPy (NUMerical PYthon) provides a fast array structure for working with data.


`import numpy as np`
`pip3 install numpy`
`python3 -m pip install --upgrade numpy`

Arrays¶


`np.array([1,2,3])`	One dimensional array
`np.array([(1,2,3),(4,5,6)])`	Two dimensional array
`arr[i]`	The element at index i (row i of a two dimensional array)
`arr[i,:]`	The ith row
`arr[i][j]`	The element at row i and column j, the same as `arr[i,j]`
`np.full((2,1),2.2)`	2x1 array with all values 2.2
`np.linspace(0, 1, 10)`	10 evenly spaced values from 0 to 1, both included
`np.eye(3)`	A diagonal array of size 3x3 (Identity matrix)
`np.zeros(3)`	An array of length 3 with all values 0
`np.ones((4,2))`	An array of size 4x2 with all values 1
`np.arange(1,14,4)`	A 1D array from 1 up to, but not including, 14 with step 4
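
To see what the constructors above return, here is a minimal sketch with small, hypothetical sizes. Indexing in NumPy starts at 0.

```python
import numpy as np

arr = np.array([(1, 2, 3), (4, 5, 6)])   # 2x3 array

second_row = arr[1]        # row at index 1 -> [4, 5, 6]
element = arr[1, 2]        # row 1, column 2 -> 6 (same as arr[1][2])

steps = np.arange(1, 14, 4)      # [1, 5, 9, 13]; the stop value 14 is excluded
grid = np.linspace(0, 1, 5)      # 5 evenly spaced values, both ends included
identity = np.eye(3)             # 3x3 identity matrix
filled = np.full((2, 1), 2.2)    # 2x1 array with every value 2.2
```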

Random¶


`np.random.rand(4,2)`	Generates a 4x2 array of random numbers from the uniform distribution on [0, 1)
`np.random.randn(4,2)`	Generates a 4x2 array of random numbers from the standard normal distribution
`np.random.randint(low=1,high=20, size=(2,3))`	Generates a 2x3 array of random ints from 1 to 19 (high is excluded)
`np.random.choice(arr,size=s,replace=True,p=pr)`	Draws a sample of size s from arr according to the probabilities pr
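
A minimal sketch of the random generators follows. The seed, the candidate values, and the probabilities are assumptions made for the example; fixing the seed only makes repeated runs give the same numbers.

```python
import numpy as np

np.random.seed(0)  # assumption: fixed seed so the run is repeatable

uniform = np.random.rand(4, 2)     # uniform on [0, 1)
normal = np.random.randn(4, 2)     # standard normal
ints = np.random.randint(low=1, high=20, size=(2, 3))   # integers 1..19

# Hypothetical values and probabilities; the probabilities must sum to 1.
values = np.array([10, 20, 30])
picks = np.random.choice(values, size=5, replace=True, p=[0.2, 0.3, 0.5])
```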

I/O¶


`np.savetxt('file.txt',arr,delimiter=' ')`	Writes to a text file
`np.savetxt('file.csv',arr,delimiter=',')`	Writes to a CSV file
`np.loadtxt('file.txt')`	Loads from a text file
`np.genfromtxt('file.csv',delimiter=',')`	Loads from a CSV file
`np.save('file_of_arr.npy', arr)`	Saves an array to a binary .npy file
`np.savez('file_of_arr.npz', arr1, arr2)`	Saves several arrays to a single .npz file
`np.load('my_array.npy')`	Loads arrays
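
The save and load calls pair up as in the sketch below. Writing into a temporary directory is an assumption made so the example leaves no files behind; the file names follow the table above.

```python
import os
import tempfile

import numpy as np

arr = np.array([[1.0, 2.0], [3.0, 4.0]])

with tempfile.TemporaryDirectory() as tmp:
    # Text round trip: CSV out, CSV in.
    csv_path = os.path.join(tmp, "file.csv")
    np.savetxt(csv_path, arr, delimiter=",")
    from_csv = np.genfromtxt(csv_path, delimiter=",")

    # Binary round trip for a single array.
    npy_path = os.path.join(tmp, "file_of_arr.npy")
    np.save(npy_path, arr)
    from_npy = np.load(npy_path)

    # Several arrays in one archive; unnamed arrays are stored as arr_0, arr_1, ...
    npz_path = os.path.join(tmp, "file_of_arr.npz")
    np.savez(npz_path, arr, arr * 2)
    with np.load(npz_path) as archive:
        names = list(archive.files)
        doubled = archive["arr_1"]
```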

Inspecting arrays¶


`arr.dtype`	Returns type of elements in array
`arr.size`	Returns number of elements in array
`len(arr)`	Length of the first axis of the array
`arr.shape`	Returns dimensions of arr
`arr.astype(dtype)`	Convert arr elements to type dtype
`arr.tolist()`	Convert arr to a Python list

Sorting¶


`arr.sort()`	Sort elements of arr
`arr.sort(axis=0)`	Sorts elements of axis=0 of arr

Reshaping¶


`arr.reshape(4,3)`	Reshapes arr to 4x3 without changing data
`arr.resize((4,3))`	Changes arr shape to 4x3 and fills new values with 0
`arr.T`	Transposes arr
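
The difference between `reshape` and `resize` is easiest to see side by side. In this sketch the array contents are hypothetical, and `refcheck=False` is an added assumption that keeps the in-place resize from failing in interactive sessions where extra references to the array exist.

```python
import numpy as np

arr = np.arange(6)                 # [0, 1, 2, 3, 4, 5]

reshaped = arr.reshape(2, 3)       # returns a 2x3 array; arr keeps its shape
transposed = reshaped.T            # 3x2

grown = np.arange(6)
# resize works in place and pads the new positions with zeros.
grown.resize((4, 3), refcheck=False)
```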

Concatenate¶


`np.concatenate((arr1,arr2),axis=0)`	Concatenate arr2 to arr1 along the axis
`np.hstack((arr1,arr2))`	Stack arrays horizontally (column wise)
`np.vstack((arr1,arr2))`	Stack arrays vertically (row wise)

Copying¶


`arr2 = arr1.view()`	Create a view of the array with the same data
`np.copy(arr)`	Create a copy of arr
`arr2 = arr1.copy()`	Create a copy of the array that owns its data
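
A view shares memory with the original array while a copy does not. The sketch below, using hypothetical values, changes the original and checks which of the two follows.

```python
import numpy as np

arr1 = np.array([1, 2, 3])

view = arr1.view()    # new array object, same underlying data
copy = arr1.copy()    # independent data

arr1[0] = 99          # change the original

view_first = view[0]  # 99: the view sees the change
copy_first = copy[0]  # 1: the copy does not
```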

Adding/Removing Elements¶


`np.append(arr1,arr2)`	Append arr2 to arr1
`np.insert(arr, 1, 10)`	Insert 10 before index 1
`np.delete(arr,2)`	Delete the element at index 2 from the array

Indexing¶


`arr[2]`	Returns the element at index 2
`arr[2]=3`	Assigns 3 to the element on index 2
`arr[2,3]`	Returns the array element on index [2,3]
`arr[2,3]=10`	Assigns 10 to the element on index [2,3]

Slicing¶


`arr[:2]`	Returns the elements at indices 0,1
`arr[2:4]`	Returns the elements at indices 2,3
`arr[0:2,3]`	Returns the elements on rows 0,1 at column 3
`arr[:,1]`	Returns the elements of the column at index 1 (the second column)
`arr[[1,2],[2,3]]`	Returns the elements at indices [1,2] and [2,3]

Subsetting¶


`arr<2`	Returns a Boolean array, True for arr<2 and False for the rest
`(arr1<2) & (arr2>3)`	Returns a Boolean array, True for (arr1<2) & (arr2>3) and False for the rest
`arr[arr<2]`	Select array with elements smaller than 2
`arr[~(arr<2)]`	Select array with elements not smaller than 2

Vector Math¶


`np.add(arr1,arr2)`	Elementwise add arr2 to arr1
`np.subtract(arr1,arr2)`	Elementwise subtract arr2 from arr1
`np.multiply(arr1,arr2)`	Elementwise multiply arr1 by arr2
`np.divide(arr1,arr2)`	Elementwise divide arr1 by arr2
`np.power(arr1,arr2)`	Elementwise raise arr1 to the power of arr2
`np.array_equal(arr1,arr2)`	Returns True if the arrays have the same elements and shape
`np.sqrt(arr)`	Square root of each element in the array
`np.sin(arr)`	Sine of each element in the array
`np.log(arr)`	Natural log of each element in the array
`np.abs(arr)`	Absolute value of each element in the array
`np.ceil(arr)`	Rounds up to the nearest int
`np.floor(arr)`	Rounds down to the nearest int
`np.round(arr)`	Rounds to the nearest int

Aggregate Functions¶


`arr.min()`	Returns minimum value of arr
`arr.max(axis=0)`	Returns maximum value of specific axis
`np.mean(arr,axis=0)`	Returns mean of specific axis
`np.median(arr,axis=0)`	Returns median of specific axis
`arr.sum()`	Returns sum of arr
`np.var(arr)`	Returns the variance of array
`np.std(arr,axis=1)`	Returns the standard deviation of specific axis
`np.quantile(arr,q=(q1,q2,..), axis=1)`	Returns the (q1,q2, ....) quantiles of specific axis
`np.corrcoef(arr)`	Returns the correlation coefficients between the rows of the array
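
The `axis` argument decides which direction is collapsed: `axis=0` aggregates down the rows and gives one value per column, `axis=1` gives one value per row. The sketch below uses a small hypothetical array, and computes correlations with the function `np.corrcoef`.

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

col_max = arr.max(axis=0)            # one value per column -> [4, 5, 6]
col_mean = np.mean(arr, axis=0)      # [2.5, 3.5, 4.5]
row_std = np.std(arr, axis=1)        # one value per row
total = arr.sum()                    # 21

# Quantiles per row; the result has one row per requested quantile.
row_q = np.quantile(arr, q=(0.25, 0.75), axis=1)

# Correlation between the rows of arr (each row is treated as a variable).
corr = np.corrcoef(arr)
```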

Pandas¶

Pandas is built for working with tabular data sets.


`import pandas as pd`
`pip3 install pandas`
`python3 -m pip install --upgrade pandas`

Create Data frame and series¶


`pd.DataFrame(matrix, columns=names)`	Create data frame from matrix, using the list names as column names
`pd.DataFrame(dict)`	Create data frame from a dict, keys would be used as the name of columns
`pd.Series(list)`	Create a series from a list
`df.index = pd.date_range('2000/1/1', periods=df.shape[0])`	Add a date index to the data frame
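
The constructors above can be sketched as follows. The column names and values are hypothetical.

```python
import pandas as pd

# From a matrix (list of rows) with explicit column names.
df = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=["a", "b"])

# From a dict: the keys become the column names.
df_from_dict = pd.DataFrame({"weight": [2.4, 3.1], "height": [15, 18]})

# A Series is a single labelled column.
s = pd.Series([1, 2, 3])

# Replace the default 0, 1, 2 index with one date per row.
df.index = pd.date_range("2000/1/1", periods=df.shape[0])
```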

I/O¶


`pd.read_csv(filename)`	Load CSV file
`pd.read_table(filename)`	Load from a delimited text file
`pd.read_excel(filename)`	Load from an Excel file
`pd.read_sql(query, connection_object)`	Load from a SQL table/database
`pd.read_json(json_string)`	Load from a JSON formatted string or file
`pd.read_html(url)`	Create data frames from the tables of an HTML page at a URL
`pd.read_clipboard()`	Create a data frame from the contents of your clipboard
`df.to_csv(filename)`	Save df as a CSV file
`df.to_excel(filename)`	Save df as an Excel file
`df.to_sql(table_name, connection_object)`	Save df to a SQL table
`df.to_json(filename)`	Save df as a file in JSON format

Inspecting data¶


`s.dtype`	Returns type of elements in the Series s
`s.size`	Returns number of elements in s
`len(s)`	Length of s
`s.shape`	Returns dimensions of s
`s.astype(dtype)`	Convert the elements of s to type dtype
`s.tolist()`	Convert s to a Python list
`s.value_counts(dropna=False)`	View unique values and counts
`df.head(l)`	Return the first l rows of the DataFrame
`df.tail(l)`	Return the last l rows of the DataFrame
`df.shape`	Return the number of rows and columns
`df.info()`	Return index, datatype, and memory information
`df.describe()`	Return the summary statistics of numerical columns

Sorting¶


`df.sort_values(col)`	Sort data frame values by col in ascending order
`df.sort_values(col,ascending=False)`	Sort data frame values by col in descending order
`df.sort_values([col1,col2],ascending=[True,False])`	Sort data frame values by col1 in ascending order then col2 in descending order

Adding/Removing Elements¶


`df.columns = ['name1','name2']`	Assign new names to the columns
`df.rename(columns={'old_name': 'new_name'})`	Rename columns
`df.set_index('colu')`	Change the index to the given column
`s.replace([1,2],['two','one'])`	Replace 1 and 2 with 'two' and 'one', respectively

Missing¶


`pd.isnull(df)`	Find the null values, True for the null
`pd.notnull(df)`	Opposite to pd.isnull()
`df.dropna()`	Drop all rows that contain null values
`df.dropna(axis=1)`	Drop all columns that contain null values
`df.dropna(axis=1,thresh=n)`	Keep only the columns with at least n non-null values
`df.fillna(x)`	Replace all null values with x
`df.fillna({'A':0,'B':1})`	Replace all null values in columns 'A' and 'B' with 0 and 1, respectively
`s.fillna(s.mean())`	Replace all null values with the mean, or any other statistic you choose
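
A minimal sketch of the missing-value tools, assuming a small hypothetical data frame with one gap in each column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, np.nan, 3.0],
                   "B": [np.nan, 5.0, 6.0]})

null_mask = pd.isnull(df)              # True where a value is missing
complete_rows = df.dropna()            # keeps only the row with no gaps

# A dict gives each column its own replacement value.
filled = df.fillna({"A": 0, "B": 1})

# Fill a single column with its own mean (NaN is skipped when averaging).
a_filled = df["A"].fillna(df["A"].mean())
```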

Indexing¶


`df[col]`	Returns column with label col as Series
`df[[col1, col2]]`	Returns columns corresponding col1, col2
`df.iloc[i,:]`	The row at position i (positions start at 0)
`df.iloc[i,j]`	The element at row position i and column position j
`s.iloc[i]`	Return the element at position i
`s.loc['index']`	Return the element corresponding to the index label
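
The difference between position-based `iloc` and label-based `loc` shows up once the index is not 0, 1, 2. The sketch below uses hypothetical labels and values.

```python
import pandas as pd

df = pd.DataFrame({"a": [10, 20, 30], "b": [1, 2, 3]},
                  index=["x", "y", "z"])

row = df.iloc[1, :]        # row at position 1, i.e. the row labelled "y"
cell = df.iloc[1, 0]       # position (1, 0) -> 20

s = df["a"]                # a single column is a Series
by_position = s.iloc[0]    # first element -> 10
by_label = s.loc["z"]      # element labelled "z" -> 30

two_cols = df[["a", "b"]]  # a list of labels returns a data frame
```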

Subsetting¶


`df[df[col] < 1]`	Return the rows whose value in column col is less than 1
`df[(df[col] > 0.5) & (df[col] < 1)]`	Return the rows whose value in column col is between 0.5 and 1

Group by¶


`df.groupby(col)`	Group data frame based on col.
`df.groupby(col).mean()`	Calculate mean after grouping based on col.
`df.groupby(col1).agg(fun)`	Group the data frame based on col1 and run the function fun on each group.
`df.pivot_table(index=col1,values=[col2,col3],aggfunc='mean')`	Create a pivot table using index of col1 and calculates the mean (or any other function) of col2 and col3
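
Grouping splits the rows by the values of one column and then aggregates each group. A minimal sketch, assuming a hypothetical data frame with a `group` column and two numeric columns:

```python
import pandas as pd

df = pd.DataFrame({"group": ["a", "a", "b"],
                   "x": [1.0, 3.0, 5.0],
                   "y": [2.0, 4.0, 6.0]})

means = df.groupby("group").mean()        # mean of x and y per group
sums = df.groupby("group").agg("sum")     # agg accepts a function or its name

# The same per-group means, written as a pivot table.
pivot = df.pivot_table(index="group", values=["x", "y"], aggfunc="mean")
```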

Apply¶


`df.apply(fun)`	Apply the function fun across each column
`df.apply(np.max,axis=1)`	Apply the function np.max across each row

Join/Combine¶


`pd.concat([df1, df2])`	Add the rows of df2 to the end of df1 (columns should be identical); `df1.append(df2)` did the same before it was removed in pandas 2.0
`pd.concat([df1, df2],axis=1)`	Add the columns of df2 to the right of df1 (row indexes should be identical)
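
Combining rows and combining columns both go through `pd.concat`; only the axis changes. In the sketch below the frames are hypothetical, and `ignore_index=True` is an added choice that renumbers the stacked rows.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2]})
df2 = pd.DataFrame({"a": [3, 4]})
df3 = pd.DataFrame({"b": [5, 6]})

# Stack rows: df2 goes underneath df1.
rows = pd.concat([df1, df2], ignore_index=True)

# Place columns side by side: rows are matched on the index.
cols = pd.concat([df1, df3], axis=1)
```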

Aggregate Functions¶


`df.describe()`	Summary statistics for numerical columns
`df.count()`	Returns the number of non-null values in each DataFrame column
`df.min()`	Returns the minimum in each column
`df.max()`	Returns the maximum in each column
`df.mean()`	Returns the mean in each column
`df.median()`	Returns the median in each column
`df.std()`	Returns the standard deviation in each column
`df.corr()`	Returns the correlation between columns

Datetime¶

Datetime is built for working with dates and times. It is part of the Python standard library, so it needs no installation.


`import datetime as dt`


`now = dt.datetime.now()`	Assigns datetime object representing the current time to now
`wks4 = dt.timedelta(weeks=4)`	Assigns a timedelta object representing a timespan of 4 weeks to wks4
`now - wks4`	Returns a datetime object representing the time 4 weeks prior to now
`newyear_2020 = dt.datetime(year=2020, month=12, day=31)`	Assigns a datetime object representing December 31, 2020 to newyear_2020
`newyear_2020.strftime("%A, %b %d, %Y")`	Returns "Thursday, Dec 31, 2020"
`dt.datetime.strptime('Dec 31, 2020',"%b %d, %Y")`	Returns a datetime object representing December 31, 2020
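
The datetime calls fit together as in this sketch. A fixed date is used instead of `now` so the results are the same on every run; month and weekday names produced by `strftime` depend on the locale, so the formatted text is not shown.

```python
import datetime as dt

newyear_2020 = dt.datetime(year=2020, month=12, day=31)

wks4 = dt.timedelta(weeks=4)          # timedelta lives on the module, not on dt.datetime
earlier = newyear_2020 - wks4         # 28 days before: December 3, 2020

# Format a datetime as text, then parse text back into a datetime.
text = newyear_2020.strftime("%b %d, %Y")
parsed = dt.datetime.strptime(text, "%b %d, %Y")
```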
