SCRIdb.worker.worker_main

SCRIdb.worker.worker_main(f_in, source_path=None, target_path=None, runseqc=True, hashtag=True, vdj=True, atac=True, cr=True, no_rsync=False, save=False, **args)

Process raw sequencing data returned from IGO. Newly sequenced samples are copied from the IGO shared drive to a defined S3 URI, and the appropriate pipeline is then called to process the copied raw data.

Parameters
  • f_in (Union[str, list]) – Input file name, a single sample name, or a list of sample names, sequenced and ready to be processed.

  • source_path (Optional[str]) – Source path to parent directory of sequenced samples, usually an IGO shared drive.

  • target_path (Optional[str]) – Target path to parent directory of sequenced samples, usually an S3 URI.

  • runseqc (bool) – Call seqc pipeline. Default: True.

  • hashtag (bool) – Call hashtag pipeline. Default: True.

  • vdj (bool) – Call VDJ pipeline. Default: True.

  • atac (bool) – Call atac-seq pipeline. Default: True.

  • cr (bool) – Call Cell Ranger pipeline. Default: True.

  • no_rsync (bool) – Skip copying files to S3. Default: False.

  • save (bool) – Write sample_data to the .csv output configured in --results_output. Default: False.

  • args – Additional keyword arguments passed through to other methods.
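
The relationship between f_in, source_path, and target_path can be illustrated with a minimal sketch. The helper plan_copies below is hypothetical and is not part of SCRIdb. It assumes that each sample lives in a directory named after the sample under source_path, and that it keeps the same name under target_path. It treats a string f_in as a single sample name and does not handle the input-file form, whose format is not described here.

```python
import posixpath
from typing import List, Tuple, Union


def plan_copies(
    f_in: Union[str, List[str]],
    source_path: str,
    target_path: str,
) -> List[Tuple[str, str]]:
    """Pair each sample's source directory with its target location.

    Hypothetical helper for illustration only; not SCRIdb's implementation.
    """
    # A bare string is treated as one sample name (the file-name form of
    # f_in is deliberately not handled in this sketch).
    samples = [f_in] if isinstance(f_in, str) else list(f_in)
    return [
        (
            posixpath.join(source_path, sample),
            # Assumption: the sample keeps its name under the target parent.
            posixpath.join(target_path.rstrip("/"), sample),
        )
        for sample in samples
    ]
```

posixpath is used instead of os.path so that forward slashes in the S3 URI are preserved regardless of the platform the sketch runs on.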

Return type

None

Returns

None.

Example

>>> import json
>>> import os
>>> from SCRIdb.worker import *
>>> args = json.load(open(os.path.expanduser("~/.config.json")))
>>> args["jobs"] = "jobs.yml"
>>> args["seqcargs"] = {"min-poly-t": 0}
>>> db_connect.conn(args)
>>> worker_main(
...     f_in=[
...         "Sample_CCR7_DC_1_IGO_10587_12",
...         "Sample_CCR7_DC_2_IGO_10587_13",
...         "Sample_CCR7_DC_3_IGO_10587_14",
...         "Sample_CCR7_DC_4_IGO_10587_15",
...     ],
...     source_path="/Volumes/peerd/FASTQ/Project_10587/MICHELLE_0194",
...     target_path="s3://dp-lab-data/sc-seq/Project_10587",
...     runseqc=False,
...     no_rsync=True,
...     **args
... )
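
The configuration step in the example above can also be wrapped in a small helper, sketched here under stated assumptions. The name load_worker_args and its overrides parameter are hypothetical and not part of SCRIdb. The sketch assumes the configuration file holds a single JSON object, as the dictionary-style assignments in the example suggest.

```python
import json
import os
from typing import Any, Dict, Optional


def load_worker_args(
    config_path: str = "~/.config.json",
    overrides: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Load a JSON config and apply per-run overrides.

    Hypothetical helper for illustration only; not SCRIdb's API.
    """
    # The context manager closes the file handle once parsing is done.
    with open(os.path.expanduser(config_path)) as fh:
        args = json.load(fh)
    # Per-run settings (e.g. "jobs", "seqcargs") replace values from the file.
    args.update(overrides or {})
    return args
```

With such a helper, the two assignments in the example could be passed as overrides={"jobs": "jobs.yml", "seqcargs": {"min-poly-t": 0}}, and the resulting dictionary handed to db_connect.conn and worker_main as shown.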