Extracting FASTA Moonlight protein sequences with Python

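I want to use Python to extract the FASTA files containing the amino acid sequences from the Moonlighting Protein Database (www.moonlightingproteins.org/results.php?search_text=). Since this is an iterative process, I would rather learn how to program it than do it by hand; we are in 2016, after all. The problem is that I am a rookie programmer and do not know how to write the code. The basic pseudocode would be something like this: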

for protein_name in site: www.moonlightingproteins.org/results.php?search_text=:
    go to the UniProt option
    download the FASTA file
    store it in a .txt file inside a given folder

Thanks in advance.

Source

2016-09-20 Manolo Flores

I recommend googling "web scraping" or similar terms and trying to come up with a little something yourself. Right now your question is too abstract. – Swier

I would strongly recommend asking the authors of the database. Quoting them:

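I would like to use the MoonProt database in a project to analyze the amino acid sequences or structures using bioinformatics.

If you would like to use the MoonProt database for sequence analysis, or if you are interested in the structures of moonlighting proteins, please contact us at [email protected].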

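If you find something interesting, how are you going to cite it in your paper or thesis? "We scraped the sequences from a public web page without the authors' consent"? It is much better to give credit to the original researchers.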

That said, back to your original question. Here is an introduction to scraping:

import requests
from lxml import html

# Download one protein at a time; change 3 to any other id
page = requests.get('http://www.moonlightingproteins.org/detail.php?id=3')
# Convert the HTML document into a tree we can query from Python
tree = html.fromstring(page.content)
# Get all table cells
cells = tree.xpath('//td')

for i, cell in enumerate(cells):
    # If the cell text looks like a FASTA record, print it
    if cell.text and cell.text.startswith('>'):
        print(cell.text)
    # If a cell mentions UniProt, print the link from the next cell
    # (the bounds check avoids an IndexError on the last cell)
    if 'UniProt' in cell.text_content() and i + 1 < len(cells):
        link = cells[i + 1].find('a')
        if link is not None and 'href' in link.attrib:
            print(link.attrib['href'])
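
To get closer to the "store it in a .txt file inside a given folder" step of the question, here is a rough sketch that saves what it finds instead of printing it. It is only an illustration under a few assumptions: the detail pages follow the same detail.php?id=N pattern as above, the FASTA record sits as plain text inside a single table cell, and the names CellCollector, fasta_records and save_protein, as well as the protein_N.txt file naming, are made up for this example. It uses the standard library's HTMLParser and urlopen instead of lxml and requests so that it runs without extra packages.

```python
import os
from html.parser import HTMLParser
from urllib.request import urlopen

# Assumption: detail pages follow the URL pattern used in the snippet above.
DETAIL_URL = 'http://www.moonlightingproteins.org/detail.php?id={}'


class CellCollector(HTMLParser):
    """Collect the text of every <td> cell (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == 'td':
            # Start a new, empty cell and begin recording its text
            self._in_cell = True
            self.cells.append('')

    def handle_endtag(self, tag):
        if tag == 'td':
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.cells[-1] += data


def fasta_records(page_html):
    """Return the cell texts that look like FASTA records (start with '>')."""
    parser = CellCollector()
    parser.feed(page_html)
    parser.close()
    records = []
    for text in parser.cells:
        text = text.strip()
        if text.startswith('>'):
            records.append(text)
    return records


def save_protein(protein_id, folder):
    """Download one detail page and write its FASTA records to a .txt file.

    Returns the path of the file, or None if no record was found.
    """
    with urlopen(DETAIL_URL.format(protein_id)) as response:
        page_html = response.read().decode('utf-8', errors='replace')
    records = fasta_records(page_html)
    if not records:
        return None
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, 'protein_{}.txt'.format(protein_id))
    with open(path, 'w') as handle:
        handle.write('\n'.join(records) + '\n')
    return path
```

Calling save_protein(3, 'fasta') would then try to write fasta/protein_3.txt, and a loop over a range of ids covers the iterative part. If you do loop, put a pause such as time.sleep(1) between requests so you do not hammer the server, and please do contact the authors first.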

Source

2016-09-20 21:17:04
