Here is a commented example that may help you solve your problem:
# Create some way of associating output files with links
# The output file names will be built from the keys: "chain_{key}.gz"
# One could probably directly use output file names as keys
links = {
    "1" : "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToAptMan1.over.chain.gz",
    "2" : "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToAquChr2.over.chain.gz",
    "3" : "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToBisBis1.over.chain.gz"}
rule download:
    output:
        # We inform snakemake that this rule will generate
        # the following list of files:
        # ["outdir/chain_1.gz", "outdir/chain_2.gz", "outdir/chain_3.gz"]
        # Note that we don't need to use {output} in the "run" or "shell" part.
        # This list will be used if we later add rules
        # that use the files generated by the present rule.
        expand("outdir/chain_{n}.gz", n=links.keys())
    run:
        # The sort is there to ensure the files are in the 1, 2, 3 order.
        # We could use an OrderedDict if we wanted an arbitrary order.
        for link_num in sorted(links.keys()):
            shell("wget {link} -O outdir/chain_{n}.gz".format(link=links[link_num], n=link_num))
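To make it easier to check which file names the rule declares, here is a small plain-Python sketch that builds the same list as the expand call above, without needing Snakemake. The helper name expected_outputs is hypothetical, and it only mimics expand for a single wildcard called n.

```python
# Same association between keys and links as in the rule above.
links = {
    "1": "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToAptMan1.over.chain.gz",
    "2": "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToAquChr2.over.chain.gz",
    "3": "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToBisBis1.over.chain.gz",
}


def expected_outputs(keys, pattern="outdir/chain_{n}.gz"):
    """Hypothetical helper: fill the single wildcard n once per key."""
    return [pattern.format(n=key) for key in keys]


# Sorting the keys gives the 1, 2, 3 order used in the "run" part.
outputs = expected_outputs(sorted(links))
for path in outputs:
    print(path)
```

Comparing this list with the files that wget actually writes is a quick way to spot a mismatch between the declared outputs and the names built in the loop.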
And here is another way of doing it, which uses arbitrary names for the downloaded files and (somewhat artificially) makes use of output:
links = [
    ("foo_chain.gz", "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToAptMan1.over.chain.gz"),
    ("bar_chain.gz", "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToAquChr2.over.chain.gz"),
    ("baz_chain.gz", "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToBisBis1.over.chain.gz")]
rule download:
    output:
        # We inform snakemake that this rule will generate
        # the following list of files:
        # ["outdir/foo_chain.gz", "outdir/bar_chain.gz", "outdir/baz_chain.gz"]
        ["outdir/{f}".format(f=filename) for (filename, _) in links]
    run:
        for i in range(len(links)):
            # output is a list, so we can access its items by index
            shell("wget {link} -O {chain_file}".format(
                link=links[i][1], chain_file=output[i]))
        # using a direct loop over the pairs (filename, link)
        # could be considered "cleaner"
        # for (filename, link) in links:
        #     shell("wget {link} -O outdir/{filename}".format(
        #         link=link, filename=filename))
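The three downloads can also be run in parallel using snakemake -j 3, provided each file is produced by its own rule instance, as in this example: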
# To use os.path.join,
# which is more robust than manually writing the separator.
import os
# Association between output files and source links
links = {
    "foo_chain.gz" : "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToAptMan1.over.chain.gz",
    "bar_chain.gz" : "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToAquChr2.over.chain.gz",
    "baz_chain.gz" : "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToBisBis1.over.chain.gz"}
# Make this association accessible via a function of wildcards
def chainfile2link(wildcards):
    return links[wildcards.chainfile]
# First rule will drive the rest of the workflow
rule all:
    input:
        # expand generates the list of the final files we want
        expand(os.path.join("outdir", "{chainfile}"), chainfile=links.keys())
rule download:
    output:
        # We inform snakemake what this rule will generate
        os.path.join("outdir", "{chainfile}")
    params:
        # using a function of wildcards in params
        link = chainfile2link,
    shell:
        """
        wget {params.link} -O {output}
        """
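If you run snakemake without the '-j' option, only one rule instance is executed at any given time. Do you need the files to be downloaded in an exact order? – bli
Also, it is common to have a first 'all' rule that has only inputs, and you can use expand for that. This rule drives the rest of the workflow. – bli
Is there a pattern in the link names that could be used to determine the names of the downloaded files? Snakemake is designed to take advantage of regularities in file names. – bli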
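On the question of a pattern in the link names: the three example URLs all end in hg38To<Name>.over.chain.gz, so the association could be derived instead of typed by hand. This is only a sketch under that assumption; the names BASE_URL, LINK_PATTERN and target_name are hypothetical, and other liftOver links may not follow the same scheme.

```python
import re

# Assumption: every link shares this prefix, as the three examples do.
BASE_URL = "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/"
urls = [
    BASE_URL + "hg38ToAptMan1.over.chain.gz",
    BASE_URL + "hg38ToAquChr2.over.chain.gz",
    BASE_URL + "hg38ToBisBis1.over.chain.gz",
]

# Capture the part between "hg38To" and ".over.chain.gz".
LINK_PATTERN = re.compile(r"hg38To(?P<target>\w+)\.over\.chain\.gz$")


def target_name(url):
    """Return the target assembly name embedded in a liftOver link."""
    match = LINK_PATTERN.search(url)
    if match is None:
        raise ValueError("link does not follow the assumed pattern: " + url)
    return match.group("target")


# Build the name-to-link association from the links themselves.
links = {target_name(url): url for url in urls}
for name in sorted(links):
    print(name, links[name])
```

With such a dictionary, the captured name could serve directly as the wildcard value in the output of the download rule, so the file names stay regular.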