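Running Your First WDL Script

Alignment is one of the most basic operations in next-generation sequencing (NGS) data processing. This section shows how to use a WDL script to run a bwa alignment job that produces a BAM alignment file.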

1. Obtaining the pipeline image and test data

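  • Obtaining the image
    XTAO provides test data and a Docker image for user testing. Before pulling the image, configure the Docker registry by running the following commands in a Linux shell. The first command overwrites any existing /etc/docker/daemon.json; it pipes through sudo tee because a plain sudo echo with > would perform the redirect as the unprivileged user.
    wd@there:~$ echo '{ "insecure-registries":["107.150.123.203:8280"] }' | sudo tee /etc/docker/daemon.json
    wd@there:~$ sudo systemctl restart docker

    Once Docker has been restarted, pull the image with:
    wd@there:~$ sudo docker pull 107.150.123.203:8280/library/bwa:0.7.16a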
  • Obtaining the test data
    Download and unpack the test data with the following commands:

    wd@there:~$ cd /mnt/vol1/
    wd@there:/mnt/vol1$ wget -c http://achelous.org/download/demo_dataset.zip
    wd@there:/mnt/vol1$ unzip demo_dataset.zip
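
Before moving on, it can help to confirm that the archive unpacked the three files used later as job inputs (read1.fastq, read2.fastq and ref.fa). The following is a minimal sketch, assuming the three files sit directly in one directory. The function name check_demo_inputs is hypothetical, and the directory is passed as an argument because the exact directory name created by unzip is not confirmed here.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of biocli): verify that the demo inputs
# exist and are non-empty.
# Usage: check_demo_inputs <directory>
check_demo_inputs() {
    local dir="$1" missing=0 f
    for f in read1.fastq read2.fastq ref.fa; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, check_demo_inputs /mnt/vol1/demo_data_set && echo "inputs OK" prints the confirmation only when all three files are present.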

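2. Adding the pipeline

Download and unpack the files for the demo pipeline, then add the pipeline with biocli:

wd@there:~$ cd ~
wd@there:~$ wget http://achelous.org/download/WGS-src.zip
wd@there:~$ unzip WGS-src.zip
wd@there:~$ cd WGS-src/
wd@there:~/WGS-src$ biocli pipeline add demo.pipeline.json -d src/

The demo_pipeline directory contains three items:

  • pipeline.json: the pipeline description file, which records information about the pipeline. Its content is: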
{
    "Name":"FastqtoBam",
    "Type" :"WDL",
    "Description" : "Simple pipeline for fastq to bam by bwa mem",
    "wdl" : {
            "WorkflowFile" : "demo.wdl"
        }
}
  • src/toy-pipeline.wdl: the WDL script for the pipeline.
### Achelous Demo pipeline ###
### contact us e-mail: di.wu@xtaotech.com ####
workflow fastqtobam{
    File fastq1
    File fastq2
    File Ref
    call bwa_mem { input: fastq1 = fastq1, fastq2 = fastq2, Ref = Ref }
    call samtobam { input: sam = bwa_mem.sam }
    output { 
        File bam_file = samtobam.bam
    }
}

task bwa_mem{
    File fastq1
    File fastq2
    File Ref
    command{
        /bio/bwa/bwa mem -t 5 ${Ref} ${fastq1} ${fastq2} > toy.sam
    }
    output {
        File sam = "toy.sam"
    }
    runtime {
        docker: "bwa:0.7.16a"
        cpu: "5"
        memory: "2G"
    }
}

task samtobam{
    File sam
    command {
        /bio/samtools/samtools view -bS -o toy.bam ${sam}
    }
    output {
        File bam = "toy.bam"
    }
    runtime {
        docker: "bwa:0.7.16a"
        cpu: "1"
        memory: "2G"
    }
}

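This pipeline takes next-generation sequencing FASTQ files through two steps: alignment with bwa, then file format conversion with samtools.
The workflow takes three inputs: the two paired-end FASTQ files (fastq1 and fastq2) and the reference sequence (Ref). For details on writing WDL scripts, see link_here.

  • job.json: the job file

This file records the parameters needed to submit a job. Its content is: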

{
    "Name" : "My_first_WDL_job",
    "Pipeline" : "demo_pipeline",
    "WorkDir" : "",
    "InputDataSet" : {
        "WorkflowInput" : {
                "fastqtobam.fastq1" : "",
                "fastqtobam.fastq2" : "",
                "fastqtobam.Ref" : ""
        }
    },
    "Priority" : 7
}

The value of WorkDir and the entries under WorkflowInput must be filled in by the user.
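
In this demo, each key under WorkflowInput appears to follow the form <workflow name>.<input name> and refers to an input declared in the workflow block of the WDL script. Identifiers in standard WDL are case-sensitive, so it is worth checking that the keys match the declarations exactly. The following is a rough sketch that only handles the simple layout used in this demo (one declaration per line, File inputs only). The function names are hypothetical, and this is not a real WDL parser.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not part of biocli or Achelous.

# Print the workflow-level File inputs of a WDL file as "<workflow>.<input>".
# Declarations inside the workflow's output block and inside tasks are skipped.
wdl_workflow_inputs() {
    awk '
        /^workflow[ \t]/                  { wf = $2; sub(/[{].*/, "", wf); in_wf = 1; next }
        in_wf && /^[}]/                   { in_wf = 0; next }
        in_wf && /^[ \t]*output[ \t]*[{]/ { in_out = 1; next }
        in_out && /^[ \t]*[}]/            { in_out = 0; next }
        in_wf && !in_out && /^[ \t]*File[ \t]+[A-Za-z_]/ { print wf "." $2 }
    ' "$1" | sort
}

# Print the dotted keys found in a job JSON file (e.g. fastqtobam.fastq1).
job_input_keys() {
    grep -o '"[A-Za-z_][A-Za-z0-9_]*\.[A-Za-z_][A-Za-z0-9_]*"[[:space:]]*:' "$1" \
        | sed 's/^"//; s/".*$//' | sort
}

# Compare both lists; return non-zero and print them if they differ.
# Usage: check_job_inputs <workflow.wdl> <job.json>
check_job_inputs() {
    local wdl="$1" job="$2" want have
    want=$(wdl_workflow_inputs "$wdl")
    have=$(job_input_keys "$job")
    if [ "$want" != "$have" ]; then
        echo "WorkflowInput keys do not match the WDL inputs" >&2
        printf 'declared in WDL:\n%s\nfound in job file:\n%s\n' "$want" "$have" >&2
        return 1
    fi
}
```

The comparison is exact, so a key that differs from its declaration only in letter case is reported as a mismatch.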

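3. Editing the job parameters

Edit the demo.job.json file. WorkDir is the output path for the pipeline run, and the three entries under WorkflowInput are the paths to read1.fastq, read2.fastq and ref.fa in the test dataset.
Achelous accepts absolute file paths for these parameters.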

{
    "Name" : "My_first_WDL_job",
    "Pipeline" : "demo_pipeline",
    "WorkDir" : "/mnt/vol1/my_first_wdl_job",
    "InputDataSet" : {
        "WorkflowInput" : {
                "fastqtobam.fastq1" : "/mnt/vol1/demo_data_set/read1.fastq",
                "fastqtobam.fastq2" : "/mnt/vol1/demo_data_set/read2.fastq",
                "fastqtobam.Ref" : "/mnt/vol1/demo_data_set/ref.fa"
        }
    },
    "Priority" : 7
}

Save the file and exit once the changes are complete.
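
The same edit can be scripted instead of done by hand. The following is a minimal sketch that prints a job file from four absolute paths. The function name write_job_json is hypothetical, the field names are copied from the job file in this section, the input names follow the declarations in the WDL workflow (fastq1, fastq2, Ref), and paths containing double quotes or backslashes are not escaped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a job file for the fastqtobam workflow.
# Usage: write_job_json <workdir> <fastq1> <fastq2> <ref> > demo.job.json
write_job_json() {
    local workdir="$1" fq1="$2" fq2="$3" ref="$4" p
    # The tutorial fills these fields with absolute paths, so reject anything else.
    for p in "$workdir" "$fq1" "$fq2" "$ref"; do
        case "$p" in
            /*) ;;
            *) echo "not an absolute path: $p" >&2; return 1 ;;
        esac
    done
    printf '{\n'
    printf '    "Name" : "My_first_WDL_job",\n'
    printf '    "Pipeline" : "demo_pipeline",\n'
    printf '    "WorkDir" : "%s",\n' "$workdir"
    printf '    "InputDataSet" : {\n'
    printf '        "WorkflowInput" : {\n'
    printf '                "fastqtobam.fastq1" : "%s",\n' "$fq1"
    printf '                "fastqtobam.fastq2" : "%s",\n' "$fq2"
    printf '                "fastqtobam.Ref" : "%s"\n' "$ref"
    printf '        }\n'
    printf '    },\n'
    printf '    "Priority" : 7\n'
    printf '}\n'
}
```

Redirecting the output, as in the usage comment, produces a file with the same layout as the hand-edited one.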

4. Submitting the job

Once the job JSON file has been edited, submit the job from a Linux shell with the following command:

wd@there:~$ biocli job submit demo.job.json
The job added success, job ID is: d4ae1a5b-65e9-4106-4c44-2e44364f799d

In this example, a successful submission returns the job ID d4ae1a5b-65e9-4106-4c44-2e44364f799d.
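
When submission is scripted, the job ID can be captured from the message printed by the submit command. The following is a small sketch, assuming the output keeps the "job ID is: <id>" wording of this example. extract_job_id is a hypothetical helper, not a biocli feature.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read submit output on stdin and print the job ID.
# Prints nothing if no "job ID is:" message is found.
extract_job_id() {
    sed -n 's/.*job ID is:[[:space:]]*\([0-9a-fA-F-]\{1,\}\).*/\1/p' | head -n 1
}

# Usage sketch (requires biocli):
#   job_id=$(biocli job submit demo.job.json | extract_job_id)
```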

5. Checking the job status

To check the status of a job, run the following command with its job ID:

wd@there:~$ biocli job status d4ae1a5b-65e9-4106-4c44-2e44364f799d

Status of Job d4ae1a5b-65e9-4106-4c44-2e44364f799d:
 Name: My_first_WDL_job
 Pipeline: FASTQTOBAM
 State: CREATED
 Owner: C
 WorkDir: /mnt/vol1/my_first_wdl_job
 PausedState: N/A
 Created: 2021-04-21T18:26:39+08:00
 Finished: N/A
 RetryLimit: 3
 RunCount: 0
 UserStageCount: 0
 StageQuota: -1
 Priority: 7
 FailReason:
 GraphBuildStatus: Completed
 DoneStages: No stage done
 RunningStages: No stage running
 WaitingStages: No stage waiting
 ForbiddenStages: No stage forbidden

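Besides the full job ID, any substring of the job ID can also be used for the query. The returned report includes the job state, the working directory and other details.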
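
For scripts that need to wait for a job, the State field can be pulled out of this report. The following is a sketch under stated assumptions: the report keeps the "State: <value>" layout of the example, and FINISHED is the only state treated as success. The names of failure states are not documented here, so the loop simply gives up after a fixed number of checks. job_state and wait_for_job are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a status report on stdin and print the State value.
# Matching the whole first field avoids picking up "PausedState:".
job_state() {
    awk '$1 == "State:" { print $2; exit }'
}

# Hypothetical helper: poll the job until it reports FINISHED.
# Usage: wait_for_job <job-id> [max-checks] [seconds-between-checks]
wait_for_job() {
    local id="$1" max="${2:-60}" pause="${3:-10}" i=0 state
    while [ "$i" -lt "$max" ]; do
        state=$(biocli job status "$id" | job_state)
        echo "check $((i + 1)): State=${state:-unknown}"
        [ "$state" = "FINISHED" ] && return 0
        i=$((i + 1))
        sleep "$pause"
    done
    echo "job $id did not reach FINISHED after $max checks" >&2
    return 1
}
```

A call such as wait_for_job d4ae1a5b 120 5 would check every five seconds for up to ten minutes, using a substring of the job ID as described above.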

6. Viewing the job results

After the job has finished, the status command returns a report like the following:

Status of Job d4ae1a5b-65e9-4106-4c44-2e44364f799d:
 Name: wd
 Pipeline: wes
 State: FINISHED
 Owner: uec
 WorkDir: /mnt/vol1/my_first_wdl_job
 PausedState:
 Created: 2021-4-21T09:33:26Z
 Finished: 2021-4-21T09:35:58Z
 RetryLimit: 3

The output files can then be inspected in the result directory:

wd@there:~$ cd /mnt/vol1/my_first_wdl_job
wd@there:/mnt/vol1/my_first_wdl_job$ ls -alR ./

fastqtobam-fastqtobam.bwa_mem  fastqtobam-fastqtobam.samtobam  logs  wdllibiofiles

/autofs/vol6/Bioinformatics-pipeline/demo_results/fastqtobam-fastqtobam.bwa_mem:
toy.sam

/autofs/vol6/Bioinformatics-pipeline/demo_results/fastqtobam-fastqtobam.samtobam:
toy.bam

...
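
A quick scripted check of the results can follow the directory layout in the listing. The following is a sketch, assuming each task writes into <WorkDir>/fastqtobam-fastqtobam.<task>/ as the listing suggests. check_outputs is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm that both task outputs exist and are non-empty.
# Usage: check_outputs <workdir>
check_outputs() {
    local workdir="$1" rc=0 f
    for f in fastqtobam-fastqtobam.bwa_mem/toy.sam \
             fastqtobam-fastqtobam.samtobam/toy.bam; do
        if [ -s "$workdir/$f" ]; then
            echo "ok: $f"
        else
            echo "missing or empty: $f" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

For this tutorial the call would be check_outputs /mnt/vol1/my_first_wdl_job.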
Powered by XTAO Technology. Last modified on: 2021-08-11 06:40:16