This function is called internally by runClustering.

processClusters(map, clusters, out_dir, write_fastas)

Arguments

map

A data frame with sequences as row names and sequence identifiers in the first column. Can be generated by createMap.

clusters

The path to the CD-HIT result file (CD-HIT.fa).

out_dir

Directory that contains the CD-HIT result file and where generated files will be saved.

write_fastas

Boolean that indicates whether a FASTA file will be generated for each cluster.
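
As an illustration of the argument shapes described above, a minimal sketch follows. The toy sequences, the identifiers, the column name `qseqid` for the map, and both paths are hypothetical. The call itself is left commented out because it needs the package and an existing CD-HIT result.

```r
# Hypothetical toy map: sequences as row names, identifiers in the first column.
# The column name "qseqid" is an assumption made for this sketch.
map <- data.frame(
  qseqid = c("seq1", "seq2", "seq3"),
  row.names = c("ACGTACGT", "ACGTACGA", "TTGGCCAA"),
  stringsAsFactors = FALSE
)

# Hypothetical locations of the CD-HIT result.
out_dir  <- file.path(tempdir(), "cdhit_out")
clusters <- file.path(out_dir, "CD-HIT.fa")

# Not run: requires the package and a finished CD-HIT run.
# res <- processClusters(map, clusters, out_dir, write_fastas = FALSE)
```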

Value

A data frame with the columns 'qseqid', 'cl_id' and 'sequences' containing the sequence identifier, the assigned cluster identifier and the sequence, respectively.
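
To make the return shape concrete, the sketch below builds a data frame by hand with the same three columns and shows how it might be summarised. The values are hypothetical and are not produced by processClusters; in particular, the format of the cluster identifiers is an assumption.

```r
# Hand-built stand-in for the returned data frame (hypothetical values).
res <- data.frame(
  qseqid    = c("seq1", "seq2", "seq3"),
  cl_id     = c("cluster_0", "cluster_0", "cluster_1"),
  sequences = c("ACGTACGT", "ACGTACGA", "TTGGCCAA"),
  stringsAsFactors = FALSE
)

# Group sequence identifiers by their assigned cluster.
members <- split(res$qseqid, res$cl_id)

# Count the number of sequences per cluster.
sizes <- table(res$cl_id)
```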