바이오파이썬 예제코드
아래 내용은 책 한주현, 바이오파이썬으로 만나는 생물정보학, 비제이퍼블릭, 2019을 읽고 실습한 코드를 정리한 것입니다.
In [1]:
import Bio
print(Bio.__doc__)
DNA 서열 다루기¶
In [2]:
from Bio import Entrez
from Bio import SeqIO
Entrez.email = "A.N.Other@example.com"
with Entrez.efetch(
db="nucleotide", rettype="fasta", retmode="text", id="1490011893"
) as handle:
seq_record = SeqIO.read(handle, "fasta")
seq_record
Out[2]:
서열 개수 세기¶
In [3]:
taq = seq_record.seq
len(taq)
Out[3]:
GC_content 계산하기¶
In [4]:
from Bio.SeqUtils import GC
GC(taq)
Out[4]:
대소문자 변환하기¶
In [5]:
taq.upper()
Out[5]:
In [6]:
taq.lower()
Out[6]:
DNA 서열 Translation, Transcription¶
In [7]:
taq.transcribe()
Out[7]:
In [8]:
taq.translate()
Out[8]:
Molecular weight 구하기¶
In [9]:
from Bio.SeqUtils import molecular_weight
molecular_weight(taq)
Out[9]:
In [10]:
molecular_weight(taq.translate())
Out[10]:
코돈 테이블¶
In [11]:
from Bio.Data import CodonTable
print(CodonTable.unambiguous_dna_by_name["Standard"])
Tm 값 계산하기¶
In [13]:
from Bio.SeqUtils import MeltingTemp
from Bio.Seq import Seq
primer_DNA = Seq("ATGGAGGAGATGCTGCCCCTCT")
MeltingTemp.Tm_Wallace(primer_DNA)
Out[13]:
아미노산 서열 약자 변환하기¶
In [14]:
from Bio.SeqUtils import seq1, seq3
seq3("MEEMLPLFEPKGRVLLVDGHHLAYRTFHALKGLTTSRGEPVQAVYGFAKSLLKA")
Out[14]:
In [15]:
seq1(
"MetGluGluMetLeuProLeuPheGluProLysGlyArgValLeuLeuValAspGlyHisHisLeuAlaTyrArgThrPheHisAlaLeuLysGlyLeuThrThrSerArgGlyGluProValGlnAlaValTyrGlyPheAlaLysSerLeuLeuLysAla"
)
Out[15]:
Weblogo 그리기¶
In [16]:
from Bio import AlignIO
from Bio.motifs import Motif
from Bio import motifs
from Bio.Seq import Seq
from Bio.Alphabet import IUPAC
from IPython.display import Image
alignment = AlignIO.read("./data/HBA.aln", "clustal")
instance = []
for record in alignment:
s = Seq(str(record.seq), IUPAC.protein)
instance.append(s)
m = motifs.create(instance)
Motif.weblogo(m, "HBA_logo.png")
Image("HBA_logo.png")
Out[16]:
계통수(phylogenetic tree) 그리기¶
In [17]:
%matplotlib inline
from Bio import Phylo
tree = Phylo.read("./data/HBA.newick", "newick")
Phylo.draw(tree)
Entrez 데이터 베이스 검색¶
In [3]:
from Bio import Entrez
Entrez.email = "your@email.com"
handle = Entrez.esearch(db="pubmed", term="machine learning")
record = Entrez.read(handle)
print("Pubmed에 machine learning를 검색하면 총 {}개의 결과".format(record["Count"]))
KEGG API 사용¶
In [4]:
from Bio.KEGG import REST
human_pathways = REST.kegg_list("pathway", "hsa").read()
pathways = []
for line in human_pathways.rstrip().split("\n"):
entry, description = line.split("\t")
if "repair" in description.lower():
pathways.append(entry)
print(entry, description)
print(pathways)
In [8]:
genes = []
for pathway in pathways:
pathway_file = REST.kegg_get(pathway).read()
current_section = None
for line in pathway_file.rstrip().split("\n"):
section = line[:12].strip()
if not section == "":
current_section = section
if current_section == "GENE":
gene_identifiers, gene_description = line[12:].split("; ")
gene_id, gene_symbol = gene_identifiers.split()
if gene_symbol not in genes:
genes.append(gene_symbol)
print("pathway에 연관된 유전자는 {} 이다".format(",".join(genes)))