SCRIdb.tools.jobs_yml_config

SCRIdb.tools.jobs_yml_config(sample_data, config_jobs_yml=None, ami='', instance_type='r5.2xlarge', star_args='runRNGseed=0', email=None, seqcargs=None, save=True)

Constructs a YAML-formatted configuration file for batch processing of samples.

Parameters
  • sample_data (DataFrame) – Data frame of samples to be processed, generated by the function sample_data_frame()

  • config_jobs_yml (Optional[str]) – Path to output .yml file in config directory for batch processing

  • ami (str) – SEQC AMI (Amazon Machine Image) to use

  • instance_type (str) – EC2 instance type to be used

  • star_args (str) – Arguments passed to the STAR aligner.

  • email (Optional[str]) – Email address to receive run summary or errors when running remotely. Optional only if running locally.

  • seqcargs (Optional[Dict]) – Additional arguments passed to seqc.

  • save (bool) – Save a copy of the jobs file to the path defined in config_jobs_yml; otherwise return the configuration as a dict.

Return type

Optional[Dict]

Returns

None if save is True; otherwise the jobs configuration as a dict.

Example

>>> from SCRIdb.worker import *
>>> args = json.load(open(os.path.expanduser("~/.config.json")))
>>> args["jobs"] = "jobs.yml"
>>> args["seqcargs"] = {"min-poly-t": 0}
>>> db_connect.conn(args)
>>> f_in = [
...     "Sample_CCR7_DC_1_IGO_10587_12",
...     "Sample_CCR7_DC_2_IGO_10587_13",
...     "Sample_CCR7_DC_3_IGO_10587_14",
...     "Sample_CCR7_DC_4_IGO_10587_15",
... ]
>>> f_in = " ".join(f_in)
>>> source_path = "/Volumes/peerd/FASTQ/Project_10587/MICHELLE_0194"
>>> target_path = "s3://dp-lab-data/sc-seq/Project_10587"
>>> sd = pd.DataFrame(
...     {
...         "proj_folder": [source_path],
...         "s3_loc": [target_path],
...         "fastq": [f_in],
...     }
... )
>>> sample_data = sample_data_frame(sd)
>>> jobs_yml_config(
...     sample_data,
...     email=args["email"],
...     config_jobs_yml=os.path.join(
...         args["dockerizedSEQC"], "config", args["jobs"]
...     ),
...     seqcargs=args["seqcargs"],
... )