Again, take a FASTA file named read_1.fa that contains a number of sequences, for example:
>@r1
TGAATGCGAACTCCGGGACGCTCAGTAATGTGACGATAGCTGAAAACTGTACGATAAACNGTACGCTGAGGGCAGAAAAAATCGTCGGGGACATTNTAAAGGCGGCGAGCGCGGCTTTTCCG
>@r2
NTTNTGATGCGGGCTTGTGGAGTTCAGCCGATCTGACTTATGTCATTACCTATGAAATGTGAGGACGCTATGCCTGTACCAAATCCTACAATGCCGGTGAAAGGTGCCGGGATCACCCTGTGGGTTTAT
>@r3
ATCGCCCGCAGACACCTTCACGCTGGACTGTTTCGGCTTTTACAGCGTCGCTTCATAATCCTTTTTCGCCGCCGCCATCAGCGTGTTGTAATCCGCCTGCAGGATTTTCCCGTCTTTCNGTGCCTTGNT
... (and so on)
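Each record is a header line beginning with `>` followed by one or more sequence lines. The sketch below shows one way to split such text into (header, sequence) pairs. It uses a hypothetical `read_fasta` helper that is not part of the original script. The input is made-up short sequences, not the real reads above. Sequence lines are concatenated, so wrapped (multi-line) sequences are also handled.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines (hypothetical helper)."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith('>'):
            if header is not None:
                yield header, ''.join(chunks)  # emit the previous record
            header, chunks = line, []
        elif header is None:
            raise ValueError('sequence data before the first header line')
        else:
            chunks.append(line)  # wrapped sequence lines are joined
    if header is not None:
        yield header, ''.join(chunks)  # emit the last record


# Made-up example input: the second record is wrapped over two lines.
example = """>@r1
ACGTN
>@r2
AC
GT
"""
for h, s in read_fasta(example.splitlines()):
    print(h, len(s))  # prints ">@r1 5" then ">@r2 4"
```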
Here is the code:
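# -*- coding: utf-8 -*-
"""
Summary: sort the records of a FASTA file by ID or by sequence length
Author: 刘自军
Date: 2017-05-17 21:38
"""

import sys

args = sys.argv
if len(args) < 3:
    sys.exit('usage: %s <fasta_file> <id|len>' % args[0])

fasta = {}  # header line -> concatenated sequence
with open(args[1]) as f:
    for line in f:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('>'):
            ID = line
            fasta[ID] = ''
        else:
            fasta[ID] += line  # multi-line sequences are joined

if args[2] == 'id':
    # sort by ID (plain string order, so '>@r10' sorts before '>@r2');
    # Python 3 removed iteritems(), but items() does the same job
    fasta = sorted(fasta.items(), key=lambda i: i[0])
elif args[2] == 'len':
    # sort by the length of each sequence, shortest first
    fasta = sorted(fasta.items(), key=lambda i: len(i[1]))
else:
    # keep the original file order (dicts preserve insertion order in Python 3.7+)
    fasta = fasta.items()

for k, v in fasta:
    print('%s\n%s' % (k, v))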
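The `id` branch above compares headers as plain strings. As a result, a header such as `>@r10` sorts before `>@r2`. The sketch below is one possible variation and is not part of the original script. It adds a "natural" ID order that compares the numeric parts of a header as integers, plus an optional `reverse` flag for longest-first output. The names `natural_key` and `sort_records` are hypothetical, and the records are made-up examples.

```python
import re
from typing import Dict, List, Tuple, Union


def natural_key(header: str) -> List[Union[str, int]]:
    """Split a header into text/number parts so '>@r2' sorts before '>@r10'."""
    # re.split with a capturing group alternates text (even index) and digits (odd index),
    # so the same position always holds the same type and the lists compare safely.
    parts = re.split(r'(\d+)', header)
    return [int(tok) if i % 2 else tok for i, tok in enumerate(parts)]


def sort_records(fasta: Dict[str, str], by: str, reverse: bool = False) -> List[Tuple[str, str]]:
    """Return (header, sequence) pairs sorted by 'id' (natural order) or 'len'."""
    if by == 'id':
        return sorted(fasta.items(), key=lambda kv: natural_key(kv[0]), reverse=reverse)
    if by == 'len':
        return sorted(fasta.items(), key=lambda kv: len(kv[1]), reverse=reverse)
    return list(fasta.items())  # any other value: keep insertion (file) order


records = {'>@r10': 'ACGTACGT', '>@r2': 'ACG', '>@r1': 'ACGTA'}
print([h for h, _ in sort_records(records, 'id')])   # ['>@r1', '>@r2', '>@r10']
print([h for h, _ in sort_records(records, 'len')])  # ['>@r2', '>@r1', '>@r10']
```

Plain `sorted(records)` would instead give `['>@r1', '>@r10', '>@r2']`, which is the behavior of the original `id` branch.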