CSV¶
Working with CSV files in Python is straightforward.
Read¶
Read a pandas DataFrame from a CSV file with read_csv, specifying whether the first row contains the header (header=0) or the file has no header row (header=None).
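A minimal sketch of both cases, assuming pandas is installed. The sample data and variable names are made up for illustration, and io.StringIO stands in for a real file path.

```python
import io

import pandas as pd

# Hypothetical sample data; io.StringIO stands in for a file on disk.
csv_text = "SampleID,Age,Weight\nmouse1,4,3.2\nmouse2,5,3.6\n"

# header=0: the first row holds the column names (this is also the default).
with_header = pd.read_csv(io.StringIO(csv_text), header=0)

# header=None: every row is treated as data and columns are numbered 0, 1, 2.
no_header = pd.read_csv(io.StringIO(csv_text), header=None)

print(list(with_header.columns))
print(list(no_header.columns))
```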
Specify which column contains dates with parse_dates.
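As a rough illustration, assuming pandas is installed, the parse_dates parameter of read_csv takes a list of column names to convert. The Date column and its values are invented for this example.

```python
import io

import pandas as pd

# Hypothetical sample data with an ISO-formatted date column.
csv_text = "SampleID,Date,Weight\nmouse1,2021-03-01,3.2\nmouse2,2021-03-08,3.6\n"

# parse_dates converts the listed columns to datetime values while reading.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

print(df.dtypes)
```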
Select only specific columns with usecols.
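One way this might look, assuming pandas is installed: the usecols parameter of read_csv limits which columns are loaded. The data below is a made-up sample.

```python
import io

import pandas as pd

# Hypothetical sample data with four columns.
csv_text = (
    "SampleID,Age,Treatment,Weight\n"
    "mouse1,4,Control,3.2\n"
    "mouse2,5,Control,3.6\n"
)

# Only the listed columns are parsed; the others are skipped entirely.
df = pd.read_csv(io.StringIO(csv_text), usecols=["SampleID", "Weight"])

print(df)
```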
If the first column is unexpectedly used as the index instead of being read as data, try setting index_col=False.
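This typically happens with a malformed file that has a trailing delimiter on every data row. The sketch below assumes pandas is installed and uses invented data to show the difference; pandas may emit a warning about the mismatched row length.

```python
import io

import pandas as pd

# Hypothetical malformed data: each data row ends with an extra comma,
# so it has one more field than the header row.
csv_text = "SampleID,Age,Weight\nmouse1,4,3.2,\nmouse2,5,3.6,\n"

# By default pandas treats the first field of each row as the index,
# which shifts every value one column to the left.
shifted = pd.read_csv(io.StringIO(csv_text))

# index_col=False keeps the first field as an ordinary data column.
fixed = pd.read_csv(io.StringIO(csv_text), index_col=False)

print(shifted)
print(fixed)
```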
Add column names with names, and therefore a column index, when the file has no header row.
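A short sketch, assuming pandas is installed, of supplying column names through the names parameter of read_csv. The data and the chosen names are illustrative only.

```python
import io

import pandas as pd

# Hypothetical data without a header row.
csv_text = "mouse1,4,3.2\nmouse2,5,3.6\n"

# Passing names labels the columns; since no header is given,
# the first line is read as data rather than as column names.
df = pd.read_csv(io.StringIO(csv_text), names=["SampleID", "Age", "Weight"])

print(df)
```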
To skip comment lines, set the comment parameter, for example comment='#'.
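For instance, assuming pandas is installed and that comments start with #, a file with comment lines could be read as follows. The sample text is invented.

```python
import io

import pandas as pd

# Hypothetical data with comment lines before and between the rows.
csv_text = (
    "# exported from the lab notebook\n"
    "SampleID,Age\n"
    "mouse1,4\n"
    "# a remark between rows\n"
    "mouse2,5\n"
)

# Everything from the comment character to the end of the line is ignored,
# and lines that are entirely comments are skipped.
df = pd.read_csv(io.StringIO(csv_text), comment="#")

print(df)
```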
Export¶
tab-separated file¶
To work with a tab-separated values file, we use the csv module. The following code shows how to create a file.
import csv

# newline='' lets the csv module control line endings itself
with open('my_data.csv', 'w', newline='') as csvfile:
    fieldnames = ['SampleID', 'Age', 'Treatment', 'Weight']  # List of column names
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames, delimiter='\t')
    writer.writeheader()  # Write the column names as the first row
    writer.writerow({"SampleID": "mouse1", "Age": 4, "Treatment": "Control", "Weight": 3.2})
    writer.writerow({"SampleID": "mouse2", "Age": 5, "Treatment": "Control", "Weight": 3.6})
    writer.writerow({"SampleID": "mouse3", "Age": 4, "Treatment": "Control", "Weight": 3.8})
    writer.writerow({"SampleID": "mouse4", "Age": 4, "Treatment": "ad libitum", "Weight": 3.6})
    writer.writerow({"SampleID": "mouse5", "Age": 4, "Treatment": "ad libitum", "Weight": 3.7})
    writer.writerow({"SampleID": "mouse6", "Age": 4, "Treatment": "ad libitum", "Weight": 3.5})
You can also read the file with read_csv() from the pandas module, passing sep='\t' for tab-separated data.
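A minimal sketch of the pandas route, assuming pandas is installed. To stay self-contained it reads a couple of tab-separated rows from an in-memory string instead of my_data.csv; with a real file you would pass the path instead.

```python
import io

import pandas as pd

# Two sample rows in the same tab-separated layout as the file written above.
tsv_text = (
    "SampleID\tAge\tTreatment\tWeight\n"
    "mouse1\t4\tControl\t3.2\n"
    "mouse4\t4\tad libitum\t3.6\n"
)

# sep='\t' tells pandas that fields are separated by tabs, not commas.
df = pd.read_csv(io.StringIO(tsv_text), sep="\t")

print(df)
```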
The following code shows how to read the file with the csv module, opening it in r (read) mode:
with open('my_data.csv', 'r', newline='') as csvfile:
    reader = csv.DictReader(csvfile, delimiter='\t')
    for row in reader:  # Each row is a dict keyed by column name
        print(f"SampleID:{row['SampleID']}, Age:{row['Age']}, Treatment:{row['Treatment']}, Weight:{row['Weight']}")
blocksize¶
Dask is a library designed for parallel computing. When reading CSV files, its blocksize parameter lets you manage large datasets more efficiently: the data is broken up into smaller pieces, called partitions, and each partition can be processed in parallel.