Text files
Python has built-in functions for working with files and folders.
Manipulating files
To access a text file in Python, you need to: 1) open the file, 2) read or write it, and 3) close the file.
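The three steps map onto open(), a read or write method, and close(). Below is a minimal sketch that assumes a hypothetical file demo.txt in the working directory. The second half uses a with statement, the common idiom that closes the file automatically when the block ends, even if an error occurs.

```python
# 1) open the file (hypothetical name demo.txt) in write mode
f = open("demo.txt", "w")
# 2) write to it
f.write("hello\n")
# 3) close it so the data is flushed to disk
f.close()

# The same pattern with a context manager:
# the file is closed automatically when the block ends.
with open("demo.txt", "r") as f:
    content = f.read()

print(content.strip())  # hello
print(f.closed)         # True
```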
Write
Opening a file in 'w' (write-only) mode creates a new file and writes text to it; if the file already exists, its previous contents are overwritten. Text is added with .write(), which does not append a newline, so each line should end with \n.
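Here is a minimal sketch of writing in 'w' mode, assuming a hypothetical example.txt in the working directory. It also shows that a second open in 'w' mode replaces everything written before.

```python
# Hypothetical file name; 'w' creates the file or truncates it if it exists
with open("example.txt", "w") as f:
    f.write("First line\n")   # .write() adds no newline, so include \n
    f.write("Second line\n")

# Opening again in 'w' mode discards the previous content entirely
with open("example.txt", "w") as f:
    f.write("Only line\n")

with open("example.txt") as f:
    text = f.read()
print(text, end="")  # Only line
```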
A table can be written to the file as follows:
names = ['Emily', 'Bob', 'Charlie']
ages = [23, 45, 67]
f = open('my_file.txt', 'w')
for name, age in zip(names, ages):
    # one row per person: name and age separated by ';'
    line = f'{name};{age}\n'
    f.write(line)
f.close()
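You can also write the same table with the standard-library csv module, shown here as an alternative to building each line by hand. This sketch assumes the same my_file.txt. It sets lineterminator='\n' so the output matches the loop above, because csv.writer ends rows with '\r\n' by default.

```python
import csv

names = ['Emily', 'Bob', 'Charlie']
ages = [23, 45, 67]

# newline='' is what the csv docs recommend when opening files for csv.writer
with open('my_file.txt', 'w', newline='') as f:
    writer = csv.writer(f, delimiter=';', lineterminator='\n')
    writer.writerows(zip(names, ages))  # each (name, age) tuple becomes one row

with open('my_file.txt') as f:
    lines = f.read().splitlines()
print(lines)  # ['Emily;23', 'Bob;45', 'Charlie;67']
```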
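Append
To add new lines to the end of an existing file, open it in 'a' (append) mode; you can then use .write() or .writelines():
test = open("example.txt", "a")
# writelines() does not add newlines, so each string ends with \n
L = ["This is just example Delhi \n", "in Python \n"]
test.writelines(L)
test.close()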
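This small sketch shows the difference between 'a' and 'w', using a hypothetical log.txt. The file is first created in 'w' mode and then extended in 'a' mode. The original line is still there afterwards.

```python
# Start from a known state (hypothetical file name)
with open("log.txt", "w") as f:
    f.write("first\n")

# 'a' keeps the existing content and writes at the end of the file
with open("log.txt", "a") as f:
    f.write("second\n")                     # a single string
    f.writelines(["third\n", "fourth\n"])   # a list of strings; no newlines added

with open("log.txt") as f:
    lines = f.read().splitlines()
print(lines)  # ['first', 'second', 'third', 'fourth']
```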
Read
To read a file, open it in 'r' (read) mode. You can access the file line by line as follows:
with open("example.txt", "r") as handle:
    for line in handle:
        # strip() removes the trailing newline and surrounding whitespace
        print(line.strip())
The entire file can be read into a single string with .read():
with open("example.txt", "r") as handle:
    print(handle.read())
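This short sketch compares .read() with .readlines(). It first creates a hypothetical two-line example.txt so the snippet runs on its own.

```python
# Create a small file to read (hypothetical content)
with open("example.txt", "w") as f:
    f.write("line one\nline two\n")

with open("example.txt", "r") as handle:
    whole = handle.read()          # one string, newlines included

with open("example.txt", "r") as handle:
    as_list = handle.readlines()   # list of lines, each ending in '\n'

print(repr(whole))   # 'line one\nline two\n'
print(as_list)       # ['line one\n', 'line two\n']
# strip() removes the trailing newline, as in the loop above
print([line.strip() for line in as_list])  # ['line one', 'line two']
```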
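If the columns are separated by ';' (as in my_file.txt above, which has a name column and an age column), each line can be split into its columns:
with open('my_file.txt') as f:
    for line in f:
        columns = line.strip().split(';')
        name = columns[0]      # first column
        age = int(columns[1])  # second column, converted to an integer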
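The following sketch puts the pieces together. It rebuilds my_file.txt as in the Write section and collects every row as a (name, age) tuple. The blank-line check is a defensive assumption that the original file does not strictly need.

```python
# Recreate the two-column file written in the Write section
names = ['Emily', 'Bob', 'Charlie']
ages = [23, 45, 67]
with open('my_file.txt', 'w') as f:
    for name, age in zip(names, ages):
        f.write(f'{name};{age}\n')

# Parse every row into a (name, age) tuple
records = []
with open('my_file.txt') as f:
    for line in f:
        if not line.strip():
            continue  # skip blank lines, e.g. an accidental trailing empty line
        name, age = line.strip().split(';')
        records.append((name, int(age)))

print(records)  # [('Emily', 23), ('Bob', 45), ('Charlie', 67)]
```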
If the file is not in the current working directory, you can still open it by giving its path (relative or absolute) instead of just the file name.
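This sketch opens a file outside the working directory, assuming a hypothetical sub-folder data that contains notes.txt. Both os.path.join and pathlib.Path build paths with the correct separator for the operating system, and Path.resolve() turns a relative path into an absolute one.

```python
import os
from pathlib import Path

# Hypothetical layout: a sub-folder "data" inside the working directory
os.makedirs("data", exist_ok=True)
with open(os.path.join("data", "notes.txt"), "w") as f:
    f.write("stored in a sub-folder\n")

print(os.getcwd())  # the folder that relative paths are resolved against

# Relative path built with os.path.join
with open(os.path.join("data", "notes.txt")) as f:
    print(f.read().strip())

# The same file through pathlib, converted to an absolute path
full_path = Path("data", "notes.txt").resolve()
text = full_path.read_text()
print(full_path.is_absolute())  # True
```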
Online file
A simple way to download a file from the web is the urllib.request module:
import urllib.request
url = "https://raw.githubusercontent.com/ucdavis-bioinformatics-training/2020-Bioinformatics_Prerequisites_Workshop/master/Intro_to_Python/example_data.fastq.gz"
urllib.request.urlretrieve(url, 'example_data.fastq.gz')
import os
# check whether the file is in the working directory
print(os.path.isfile("example_data.fastq.gz"))
# check the size of the file in bytes
print(os.path.getsize("example_data.fastq.gz"))
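Downloading needs a network connection, so here is an offline sketch of the same two checks on a hypothetical local file sample.txt. os.path.isfile returns False for a missing path instead of raising an error, and os.path.getsize reports the size in bytes.

```python
import os

# Create a small local file so the example runs offline (hypothetical name)
with open("sample.txt", "w") as f:
    f.write("ACGT")  # 4 characters, no newline, so 4 bytes on any platform

exists = os.path.isfile("sample.txt")          # True: it is a regular file
size = os.path.getsize("sample.txt")           # size in bytes
missing = os.path.isfile("no_such_file.txt")   # False: no exception raised

print(exists, size, missing)  # True 4 False
```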
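Gzip files
To read a gzip-compressed (.gz) file, use the gzip module. Mode 'rt' opens it for reading in text mode, so lines come back as strings rather than bytes:
import gzip
handle = gzip.open("example_data.fastq.gz", 'rt')
first_line = next(handle)  # read only the first line of the file
print(first_line)
handle.close()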
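FASTQ files store each read as four lines: header, sequence, '+' separator, and quality string. A gzipped FASTQ can therefore be read record by record by taking four lines at a time. This sketch assumes the simple unwrapped four-line layout. It first writes a tiny hypothetical tiny.fastq.gz so it runs offline. read_fastq is a hypothetical helper, not part of the gzip module.

```python
import gzip
from itertools import islice

# Write a tiny two-record FASTQ file (hypothetical data) so the example runs offline
fastq_text = (
    "@read1\nACGT\n+\nIIII\n"
    "@read2\nGGCA\n+\nHHHH\n"
)
with gzip.open("tiny.fastq.gz", "wt") as out:
    out.write(fastq_text)

def read_fastq(path):
    """Yield (name, sequence, quality) tuples from a gzipped FASTQ file.

    Assumes exactly four lines per record (no wrapped sequence lines).
    """
    with gzip.open(path, "rt") as handle:
        while True:
            record = [line.rstrip("\n") for line in islice(handle, 4)]
            if len(record) < 4:
                break  # end of file (or a truncated final record)
            header, seq, _plus, qual = record
            yield header[1:], seq, qual  # drop the leading '@'

records = list(read_fastq("tiny.fastq.gz"))
print(records)  # [('read1', 'ACGT', 'IIII'), ('read2', 'GGCA', 'HHHH')]
```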
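Tables
If the data is saved as a delimited table, pd.read_table from the pandas library loads it into a DataFrame. It expects tab-separated values by default, so pass sep for other delimiters.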
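This sketch assumes pandas is installed. It reuses the semicolon-separated layout of my_file.txt from the Write section, supplied here through io.StringIO so it runs on its own. The file has no header row, so header=None and names are passed.

```python
import io
import pandas as pd

# Stand-in for the semicolon-separated my_file.txt from the Write section
data = io.StringIO("Emily;23\nBob;45\nCharlie;67\n")

# read_table uses a tab separator by default, so sep=';' is passed explicitly;
# header=None: no header row in the data; names: column labels to use
df = pd.read_table(data, sep=';', header=None, names=['name', 'age'])
print(df)
```

The same call works with a file path in place of the StringIO object, for example pd.read_table('my_file.txt', sep=';', header=None, names=['name', 'age']).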