Ancestral reconstruction
Methods
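- Search UniProt for the target protein.
- Run BLAST to gather related sequences, and download the sequences from the results.
- Run a multiple sequence alignment (MSA) with Clustal Omega.
- Upload the alignment to http://fastml.tau.ac.il and start the analysis.
- From the result files, download the reconstructed ancestral sequences and the tree.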
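Step 4 uploads the Clustal Omega output to FastML, so it can be worth confirming first that the file really is an alignment: every row should have the same length once gap characters are counted. The following is a minimal standard-library sketch, assuming the alignment was saved in FASTA format. `read_fasta` and `check_alignment` are hypothetical helpers written for this example, not functions from Clustal Omega, FastML, or Biopython.

```python
def read_fasta(text):
    """Parse FASTA text into a dict mapping each record name to its sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Use the first word of the header as the record name
            name = (line[1:].split() or [""])[0]
            records[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            records[name].append(line)
    # Join wrapped sequence lines into one string per record
    return {key: "".join(parts) for key, parts in records.items()}


def check_alignment(records):
    """Return the shared row length, or raise ValueError if rows differ in length."""
    lengths = {name: len(seq) for name, seq in records.items()}
    if len(set(lengths.values())) != 1:
        raise ValueError(f"rows are not all the same length: {lengths}")
    return next(iter(lengths.values()))


# Toy two-row alignment (made-up data), gaps written as "-"
aln = """>seqA
MKWV-TFIS
>seqB
MKWVSTF-S
"""
print(check_alignment(read_fasta(aln)))  # 9
```

If the check raises, the file is most likely the unaligned BLAST download rather than the Clustal Omega result.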
Results
import matplotlib
import matplotlib.pyplot as plt
from Bio import Phylo


def plot_tree(tree_file, width, height):
    """Draw the tree stored in a Newick file."""
    # Phylo.read accepts a file path or an open handle and expects exactly one tree
    tree = Phylo.read(tree_file, "newick")
    # Reorder clades by size so the drawing has a tidy, ladder-like layout
    tree.ladderize()
    matplotlib.rc("font", size=7)
    # Figure size in inches (width x height)
    fig = plt.figure(figsize=(width, height), dpi=100)
    axes = fig.add_subplot(1, 1, 1)
    axes.axis("off")
    Phylo.draw(tree, axes=axes)
    # To save the figure as well: plt.savefig(output_file)
# Albumin phylogenetic tree
plot_tree("tree.newick.txt", 8, 3)
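The ancestral sequences are named N1 to N11, and those names refer to internal nodes of the tree. In Newick, an internal node's label is written directly after the closing parenthesis of its clade, so the labels can be pulled out of the tree file with a short regular expression. This is a minimal sketch, assuming `tree.newick.txt` labels its internal nodes with the same N1, N2, ... names as the FASTA headers and uses unquoted labels. Numeric support values written in the same position would also be returned. `internal_node_labels` is a hypothetical helper.

```python
import re

# An internal-node label follows ")" and runs until ":" (branch length),
# "," or ")" (next sibling or parent), ";" (end of tree), or whitespace.
INTERNAL_LABEL = re.compile(r"\)([^:,;()\s]+)")


def internal_node_labels(newick):
    """Return internal-node labels in the order they appear in a Newick string."""
    return INTERNAL_LABEL.findall(newick)


example = "((A:0.1,B:0.2)N2:0.3,C:0.4)N1;"
print(internal_node_labels(example))  # ['N2', 'N1']
```

Because a clade's closing parenthesis comes after all of its descendants, the labels come out children-first. With Biopython already loaded, `[clade.name for clade in Phylo.read("tree.newick.txt", "newick").get_nonterminals()]` lists the same names in preorder instead.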
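Amino acid sequences of the common ancestors
The reconstructed sequence of each ancestral (internal) node, N1 to N11, is listed below in FASTA format.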
>N1
MKWVTLISLLFLFSSATSRNLQRFRRDAEAHKSEIAHRYNDLGEEHFKGLVLITFAQYLQKCPYEELAKLVKEVTDLAQACVADESAADCSKPLHTIFLDKICAVPKLRDTYGAMADCCAKADPERNECFLSHKDSQPDLVPPYQRPEPDVLCQAYQDNKESFLGHYIYEVARRHPFLYAPAILSFAQKFKAVLTECCEEADKGACLTTKLTALREKALIVSVKQRLSCGILQKFGDRVFQAWQLVRLSQKYPKAPFAEVSKLVTDLTKVHKECCHGDMLECMDDRADLTKHMCEHQDTISSKLKECCEKPIVERSHCIVELENDEMPADLPSLVEKFVEDKEVCKSFEEAKDVFLAEFLYEYSRRHPEFSVQLLLRIAKGYESTLEKCCETDNPHECYANAQDELNQLIKEPQDLVKQNCELLQKLGEYNFQNALLIRYTKKMPQVSTPTLVEISKSMTKVGSKCCKLPEAQRMPCAEGYLSVVINELCVLQETTPINENVTKCCSQSYANRRPCFTALGVDETYVPPEFNADTFTFHEDLCTLPEEERKIKKQTLLVNLVKHKPHVTEEQLKTIAGEFTAMVDKCCAAEDKEACFAEEGPKLIEQSKATLGLGA
>N2
MKWVTFISLLFLFSSAYSRGVQRFRRDAEAHKSEIAHRFNDLGEEHFKGLVLITFSQYLQKCPYEEHAKLVKEVTDLAKACVADESAANCDKSLHTIFGDKICAVPSLRDTYGDMADCCEKQEPERNECFLQHKDDKPDLVPPFARPEPDVLCKAFHDNEEAFLGHYLYEVARRHPYFYAPELLYYAQKYKAVLTECCEAADKGACLTPKLDALREKALISSAKQRLRCASLQKFGDRAFKAWALVRLSQKFPKADFAEISKLVTDLTKVHKECCHGDLLECADDRADLAKYMCEHQDTISSKLKECCDKPILEKSHCIAELENDEMPADLPALAEEFVEDKDVCKNYEEAKDVFLGKFLYEYSRRHPDYSVSLLLRLAKAYEATLEKCCATDDPHACYAKVLDEFKPLVEEPQNLVKQNCELFEKLGEYNFQNALLVRYTKKVPQVSTPTLVEISRSLGKVGSKCCKHPEAERMPCAEDYLSVVLNRLCVLHEKTPVSEKVTKCCSESLVNRRPCFSALGVDETYVPKEFNAETFTFHADICTLPETERKIKKQTALVELVKHKPHATEEQLKTVVGEFTALVDKCCAAEDKEACFAEEGPKLVESSKATLGLGA
>N3
MKWVTFISLLFLFSSAYSRGVQRFRRDAEAHKSEIAHRFNDLGEEHFKGLVLIAFSQYLQQCPFEEHVKLVNEVTEFAKTCVADESAANCDKSLHTLFGDKLCTVASLRETYGEMADCCEKQEPERNECFLQHKDDNPDLVPPLVRPEPDAMCTAFHDNEETFLGKYLYEVARRHPYFYAPELLYYAEKYKAVFTECCQAADKAACLTPKLDALREKVLASSAKQRLKCASLQKFGERAFKAWAVARLSQKFPKADFAEISKLVTDLTKVHKECCHGDLLECADDRADLAKYMCENQDSISSKLKECCDKPLLEKSHCIAEVENDEMPADLPALAADFVEDKDVCKNYQEAKDVFLGTFLYEYSRRHPDYSVSLLLRLAKAYEATLEKCCATDDPHACYAKVFDEFKPLVEEPQNLVKQNCELFEKLGEYGFQNALLVRYTKKVPQVSTPTLVEVSRSLGKVGSKCCKHPEAERMPCAEDYLSVVLNRLCVLHEKTPVSEKVTKCCTESLVNRRPCFSALEVDETYVPKEFNAETFTFHADICTLPETEKQIKKQTALVELVKHKPKATEEQLKTVMGDFAAFVDKCCAAEDKEACFAEEGPKLVASSQAALALGA
>N4
MKWVTFISLLFLFSSAYSRGVQRFRRDAEAHKSEIAHRFNDLGEEHFKGLVLIAFSQYLQQCPFEEHVKLVNEVTEFAKTCVADESAENCDKSLHTLFGDKLCTVATLRETYGEMADCCEKQEPERNECFLQHKDDNPNLVPPLVRPEPDAMCTAFHDNEETFLGKYLYEVARRHPYFYAPELLYYAEKYKAVFTECCQAADKAACLTPKLDALREKVLASSAKQRLKCASLQKFGERAFKAWAVARLSQKFPKADFAEVSKLVTDLTKVHKECCHGDLLECADDRADLAKYMCENQDSISSKLKECCDKPLLEKSHCIAEVENDEMPADLPALAADFVEDKDVCKNYAEAKDVFLGTFLYEYSRRHPDYSVSLLLRLAKAYEATLEKCCATADPHACYAKVFDEFKPLVEEPQNLVKQNCELFEKLGEYGFQNALLVRYTKKVPQVSTPTLVEVSRSLGKVGSKCCKHPEAERMPCAEDYLSVVLNRLCVLHEKTPVSEKVTKCCTESLVNRRPCFSALEVDETYVPKEFNAETFTFHADICTLPEKEKQIKKQTALVELVKHKPKATEEQLKTVMGDFAAFVDKCCKAEDKEACFAEEGPKLVASSQAALALGA
>N5
MKWVTFISLLFLFSSAYSRGVQRFRRDAEAHKSEIAHRFNDLGEKHFKGLVLIAFSQYLQQCPFEEHVKLVNEVTEFAKTCVADESAENCDKSLHTLFGDKLCTVATLRETYGEMADCCEKQEPERNECFLQHKDDNPNLVPPLVRPEPDAMCTAFQENPETFLGKYLYEVARRHPYFYAPELLYYAEKYKAVFTECCQAADKAACLTPKLDALKEKVLVSSAKQRLKCSSLQKFGERAFKAWAVARLSQKFPKADFAEVSKLVTDLTKVHKECCHGDLLECADDRADLAKYMCENQDSISSKLKACCDKPLLQKSHCIAEVENDDMPADLPALAADFVEDKDVCKNYAEAKDVFLGTFLYEYSRRHPDYSVSLLLRLAKTYEATLEKCCAEADPHACYATVFDEFKPLVEEPQNLVKQNCELFEKLGEYGFQNALLVRYTKKAPQVSTPTLVEVSRSLGKVGSKCCKLPEAERLPCAEDYLSVVLNRLCVLHEKTPVSEKVTKCCTESLVERRPCFSALEVDETYVPKEFKAETFTFHADICTLPEKEKQIKKQTALAELVKHKPKATEEQLKTVMGDFAAFVDKCCKAEDKEACFAEEGPKLVASSQAALALGA
>N6
MKWVTFLLLLFVSGSAFSRGVQRFRRDAEAHKSEIAHRYKDLGEKHFKGLVLIAFSQYLQKCPYEEHVKLVQEVTDFAKTCVADESAENCDKSLHTLFGDKLCAIPNLRENYGEMADCCAKQEPERNECFLQHKDDNPNLVPPFQRPEPDAMCTAFQENPETFMGHYLHEVARRHPYFYAPELLYYAEKYNAVLTECCAAADKAACLTPKLDALKEKALVSAVRQRLKCSSMQKFGERAFKAWAVARMSQTFPNADFAEITKLATDLTKVNKECCHGDLLECADDRAELAKYMCENQASISSKLQACCDKPLLQKSHCLAEVEHDDMPADLPALAADFVEDKDVCKNYAEAKDVFLGTFLYEYSRRHPDYSVSLLLRLAKKYEATLEKCCAEADPHACYGTVFDEFKPLVEEPQNLVKTNCELYEKLGEYGFQNAVLVRYTKKAPQVSTPTLVEAARSLGRVGTKCCTLPEAQRLPCVEDYLSAILNRVCVLHEKTPVSEKVTKCCSGSLVERRPCFSALTVDETYVPKEFKAETFTFHADICTLPEKEKQIKKQTALAELVKHKPKATEEQLKTVMGDFAEFVDKCCKAEDKEACFSTEGPKLVARSQEALALGA
>N7
MKWVTFLLLLFVSGSAFSRGVQRFRREAEAHKSEIAHRYKDLGEQHFKGLVLIAFSQYLQKCPYEEHVKLVQEVTDFAKTCVADESAENCDKSLHTLFGDKLCAIPNLRENYGELADCCAKQEPERNECFLQHKDDNPNLVPPFQRPEAEAMCTSFQENPTTFMGHYLHEVARRHPYFYAPELLYYAEKYNEVLTQCCAEADKAACLTPKLDAVKEKALVSAVRQRMKCSSMQKFGERAFKAWAVARMSQTFPNADFAEITKLATDLTKVNKECCHGDLLECADDRAELAKYMCENQATISSKLQACCDKPLLQKSHCLAEVEHDNMPADLPAIAADFVEDKEVCKNYAEAKDVFLGTFLYEYSRRHPDYSVSLLLRLAKKYEATLEKCCAEADPPACYGTVLAEFQPLVEEPKNLVKTNCELYEKLGEYGFQNAVLVRYTQKAPQVSTPTLVEAARNLGRVGTKCCTLPEAQRLPCVEDYLSAILNRVCVLHEKTPVSEKVTKCCSGSLVERRPCFSALTVDETYVPKEFKAETFTFHSDICTLPEKEKQIKKQTALAELVKHKPKATEEQLKTVMGDFAQFVDKCCKAADKDTCFSTEGPNLVARSKEALALGA
>N8
MKWVTFISLLFLFSSAYSRGVQRFRRDAEAHKSEVAHRFKDLGEEHFKGLVLIAFSQYLQQCPFEEHVKLVNEVTEFAKTCVADESAENCDKSLHTLFGDKLCTVATLRETYGEMADCCAKQEPERNECFLQHKDDNPNLVPPLVRPEVDVMCTAFHDNEETFLKKYLYEVARRHPYFYAPELLFFAARYKAAFTECCQAADKAACLLPKLDELRDEGKASSAKQRLKCASLQKFGERAFKAWAVARLSQKFPKAEFAEVSKLVTDLTKVHTECCHGDLLECADDRADLAKYMCENQDSISSKLKECCDKPLLEKSHCIAEVENDEMPADLPSLAADFVESKDVCKNYAEAKDVFLGMFLYEYARRHPDYSVVLLLRLAKAYEATLEKCCAAADPHECYAKVFDEFKPLVEEPQNLVKQNCELFEQLGEYKFQNALLVRYTKKVPQVSTPTLVEVSRNLGKVGSKCCKHPEAKRMPCAEDYLSVVLNRLCVLHEKTPVSEKVTKCCTESLVNRRPCFSALEVDETYVPKEFNAETFTFHADICTLSEKEKQIKKQTALVELVKHKPKATKEQLKTVMDDFAAFVEKCCKADDKEACFAEEGPKLVAASQAALALGA
>N9
MKWVTFISLLFLFSSAYSRGVQRFRRDAEAHKSEVAHRFKDLGEENFKALVLIAFAQYLQQCPFEDHVKLVNEVTEFAKTCVADESAENCDKSLHTLFGDKLCTVATLRETYGEMADCCAKQEPERNECFLQHKDDNPNLVPRLVRPEVDVMCTAFHDNEETFLKKYLYEIARRHPYFYAPELLFFAKRYKAAFTECCQAADKAACLLPKLDELRDEGKASSAKQRLKCASLQKFGERAFKAWAVARLSQRFPKAEFAEVSKLVTDLTKVHTECCHGDLLECADDRADLAKYICENQDSISSKLKECCEKPLLEKSHCIAEVENDEMPADLPSLAADFVESKDVCKNYAEAKDVFLGMFLYEYARRHPDYSVVLLLRLAKTYETTLEKCCAAADPHECYAKVFDEFKPLVEEPQNLIKQNCELFEQLGEYKFQNALLVRYTKKVPQVSTPTLVEVSRNLGKVGSKCCKHPEAKRMPCAEDYLSVVLNQLCVLHEKTPVSERVTKCCTESLVNRRPCFSALEVDETYVPKEFNAETFTFHADICTLSEKERQIKKQTALVELVKHKPKATKEQLKTVMDDFAAFVEKCCKADDKETCFAEEGKKLVAASQAALGLGA
>N10
MKWVTFISLLFLFSSAYSRGVQRFRRDAEAHKSEIAHRFNDLGEEHFKGLVLIAFSQYLQQCPFEEHVKLVNEVTEFAKTCVADESAANCDKSLHTLFGDKLCTVASLRETYGDMADCCEKQEPERNECFLQHKDDNPDLVPPLVRPEPDAMCTAFHDNEQRFLGKYLYEIARRHPYFYAPELLYYAEKYKGVFTECCQAADKAACLTPKIDALREKVLASSAKQRLKCASLQKFGERAFKAWSVARLSQKFPKAEFAEISKLVTDLTKVHKECCHGDLLECADDRADLAKYMCENQDSISSKLKECCDKPLLEKSHCIAEVEKDEMPADLPPLAADFVEDKDVCKNYQEAKDVFLGTFLYEYSRRHPEYSVSLLLRLAKEYEATLEKCCATDDPHACYAKVFDEFKPLVEEPQNLVKQNCELFEKLGEYGFQNALLVRYTKKVPQVSTPTLVEVSRSLGKVGSKCCKHPEAERMPCAEDYLSVVLNRLCVLHEKTPVSEKVTKCCTESLVNRRPCFSALEVDETYVPKEFNAETFTFHADICTLPETEKQIKKQTALVELLKHKPKATEEQLKTVMGDFAAFVDKCCAAEDKEACFAEEGPKLVASSQAALALGA
>N11
MKWVTFISLLFLFSSAYSRGVQRVRREAEAHKSEIAHRFNDLGEEHFRGLVLVAFSQYLQQCPFEDHVKLVNEVTEFAKACVADESAANCDKSLHTLFGDKLCTVASLRDKYGDMADCCEKQEPERNECFLQHKDDNPGFVPPLVTPEPDAMCTAFHDNEQRFLGKYLYEIARRHPYFYAPELLYYAEKYKGVFTECCQAADKAACLTPKIDALREKVLASSAKERLKCASLQKFGERAFKAWSVARLSQKFPKAEFAEISKLVTDLTKVHKECCHGDLLECADDRADLAKYMCENQDSISTKLKECCDKPVLEKSHCIAEVERDELPADLPPLAADFVEDKEVCKNYQEAKDVFLGTFLYEYSRRHPEYSVSLLLRLAKEYEATLEKCCATDDPPACYAKVFDEFKPLVEEPQNLVKTNCELFEKLGEYGFQNALLVRYTKKVPQVSTPTLVEVSRSLGKVGSKCCKHPEAERMSCAEDYLSVVLNRLCVLHEKTPVSERVTKCCTESLVNRRPCFSALEVDETYVPKEFNAETFTFHADLCTLPEAEKQIKKQTALVELLKHKPKATEEQLKTVMGDFGAFVDKCCAAEDKEACFAEEGPKLVAAAQAALALGA
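Since all of these ancestors come from the same reconstruction, two nodes can be compared position by position to see which residues changed between them. The sketch below counts substitutions and percent identity for two sequences of equal length. It deliberately refuses sequences of different length rather than aligning them: if the sequences were exported with gaps removed, their lengths can differ and they would need to be re-aligned first. `compare_ancestors` is a hypothetical helper, and the demo uses only the first eight residues of N1 and N2.

```python
def compare_ancestors(seq_a, seq_b):
    """Return (substitutions, percent identity) for two equal-length sequences.

    Each substitution is a tuple (1-based position, residue in seq_a, residue in seq_b).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length; re-align them first")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    substitutions = [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
        if a != b
    ]
    identity = 100.0 * (len(seq_a) - len(substitutions)) / len(seq_a)
    return substitutions, identity


# First eight residues of N1 and N2
subs, ident = compare_ancestors("MKWVTLIS", "MKWVTFIS")
print(subs)   # [(6, 'L', 'F')]
print(ident)  # 87.5
```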