2017-09-28 17 views

Answers

1

You can use genbankread from the Bioinformatics Toolbox in MATLAB. Here is an example of how to achieve what you want:

results = [];

% unzip data
gunzip('*.gbff.gz');

% process each file
files = dir('*.gbff');
for file = {files.name}
    data = genbankread(char(file));

    % process each file entry
    for i = 1:numel(data)
        % defaults for values that may be missing from an entry
        LocusName = '';
        Definition = '';
        Organism = '';
        GenesTotal = NaN;
        GenesCoding = NaN;
        RRNAs = '';
        TRNAs = NaN;
        IsolationSource = '';
        Country = '';

        % copy fields
        if isfield(data(i), 'LocusName')
            LocusName = data(i).LocusName;
        end
        if isfield(data(i), 'Definition')
            Definition = data(i).Definition;
        end
        if isfield(data(i), 'Source')
            Organism = data(i).Source;
        end

        % parse comments: each row of interest looks like "key :: value"
        if isfield(data(i), 'Comment')
            for j = 1:size(data(i).Comment, 1)
                % lazy groups so that single-character values also match
                tokens = regexp(data(i).Comment(j, :), ...
                    '^\s*(\S.*?)\s*::\s*(\S.*?)\s*$', 'tokens');
                if ~isempty(tokens)
                    switch tokens{1}{1}
                        case 'Genes (total)'
                            GenesTotal = str2double(tokens{1}{2});
                        case 'Genes (coding)'
                            GenesCoding = str2double(tokens{1}{2});
                        case 'rRNAs'
                            RRNAs = tokens{1}{2};
                        case 'tRNAs'
                            TRNAs = str2double(tokens{1}{2});
                    end
                end
            end
        end

        % parse features: a row starting with a word opens a new feature,
        % an indented /name="value" row is a qualifier of the current feature
        if isfield(data(i), 'Features')
            Feature = '';
            for j = 1:size(data(i).Features, 1)
                tokens = regexp(data(i).Features(j, :), '^(\w+)', 'tokens');
                if isempty(tokens)
                    tokens = regexp(data(i).Features(j, :), ...
                        '^\s+/(\w+)="([^"]+)"', 'tokens');
                    if ~isempty(tokens)
                        switch Feature
                            case 'source'
                                switch tokens{1}{1}
                                    case 'isolation_source'
                                        IsolationSource = tokens{1}{2};
                                    case 'country'
                                        Country = tokens{1}{2};
                                end
                        end
                    end
                else
                    Feature = tokens{1}{1};
                end
            end
        end

        % append entry to results
        results = [results; struct(...
            'File', char(file), 'LocusName', LocusName, 'Definition', Definition, ...
            'Organism', Organism, 'GenesTotal', GenesTotal, ...
            'GenesCoding', GenesCoding, 'RRNAs', RRNAs, 'TRNAs', TRNAs, ...
            'IsolationSource', IsolationSource, 'Country', Country)];
    end
end

% data is in variable results
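
If a tabular view of the collected entries is more convenient, one possible follow-up is to convert the struct array to a table and write it to a CSV file. This is only a sketch: the small results array below is a made-up stand-in with a subset of the fields, and the output file name genbank_summary.csv is an arbitrary choice.

```matlab
% Hypothetical stand-in for the results array built above (values are made up)
results = struct( ...
    'File',       {'a.gbff'; 'b.gbff'}, ...
    'LocusName',  {'LOCUS_A'; 'LOCUS_B'}, ...
    'GenesTotal', {4500; NaN}, ...
    'Country',    {'Switzerland'; ''});

% 'AsArray' makes each struct element one table row, even for a single entry
T = struct2table(results, 'AsArray', true);

% Write the table to a CSV file (the file name is an assumption)
writetable(T, 'genbank_summary.csv');
```

Passing 'AsArray', true also covers the case where only one entry was found, because a scalar struct is then still treated as a single row.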

Awesome, thank you! – user2861089

I tried adding variables such as '/isolation_source="Human"' and '/country="Switzerland"' to the results, but I get an error. Maybe because of the '/' at the front? Anyway, everything else works fine. Thanks. – user2861089

@user2861089 I updated the code to include simple parsing of the features block and extraction of values from it. –
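
To illustrate the features parsing mentioned in the comment above, here is a minimal sketch that runs without the Bioinformatics Toolbox. The lines cell array is a made-up excerpt (including the /country qualifier under gene, added only to show that it is skipped), and it assumes feature keys start at the first column while qualifiers are indented.

```matlab
% Hypothetical excerpt of a features block (made-up values)
lines = { ...
    'source          1..5000', ...
    '                /organism="Escherichia coli"', ...
    '                /country="Switzerland"', ...
    'gene            1..300', ...
    '                /country="ignored"'};

feature = '';
country = '';
for k = 1:numel(lines)
    % a row that starts with a word opens a new feature
    key = regexp(lines{k}, '^(\w+)', 'tokens');
    if ~isempty(key)
        feature = key{1}{1};
        continue
    end
    % otherwise try to read an indented /name="value" qualifier
    q = regexp(lines{k}, '^\s+/(\w+)="([^"]+)"', 'tokens');
    if ~isempty(q) && strcmp(feature, 'source') && strcmp(q{1}{1}, 'country')
        country = q{1}{2};
    end
end
disp(country)
```

The leading '/' is consumed by the pattern and never becomes part of the field name, so only the qualifier name and its quoted value are captured.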
