This function is called internally by runClustering.
processClusters(map, clusters, out_dir, write_fastas)
Argument | Description |
---|---|
map | A data frame with sequences as row names and sequence identifiers in the first column. Can be generated by createMap. |
clusters | Path to the CD-HIT result file, CD-HIT.fa. |
out_dir | Directory that contains the CD-HIT result file and where generated files will be saved. |
write_fastas | Boolean indicating whether a FASTA file is generated for each cluster. |
A data frame with the columns 'qseqid', 'cl_id' and 'sequences', containing the sequence identifier, the assigned cluster identifier and the sequence, respectively.
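
As a rough illustration of the argument shapes described above, the sketch below builds a toy `map` by hand and calls processClusters only if the function and the CD-HIT file are available. The column name `qseqid` for the input, the example sequences and the file location are assumptions made for illustration; in practice `map` would normally come from createMap.

```r
# Toy input in the shape described for `map`: sequences as row names,
# sequence identifiers in the first column.
# The column name "qseqid" and the sequences are illustrative assumptions.
map <- data.frame(
  qseqid = c("seq1", "seq2", "seq3"),
  row.names = c("ACDEFGHIK", "ACDEFGHIR", "MNPQRSTVW"),
  stringsAsFactors = FALSE
)

# Hypothetical locations; replace with your own CD-HIT output.
out_dir <- tempdir()
clusters <- file.path(out_dir, "CD-HIT.fa")

# Only attempt the call if the package providing processClusters is loaded
# and the CD-HIT file actually exists.
if (exists("processClusters") && file.exists(clusters)) {
  result <- processClusters(map, clusters, out_dir, write_fastas = FALSE)
  # Expected columns according to the description: qseqid, cl_id, sequences
  str(result)
}
```

Setting `write_fastas = FALSE` keeps the sketch from writing per-cluster FASTA files into `out_dir`; switch it to `TRUE` if those files are wanted.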