SCRIdb.tools.jobs_yml_config

SCRIdb.tools.jobs_yml_config(sample_data, config_jobs_yml=None, ami='', instance_type='r5.2xlarge', star_args='runRNGseed=0', email=None, seqcargs=None, save=True)

Constructor for a YAML-formatted file for batch processing of samples.

Parameters
- sample_data (DataFrame) – Data frame of samples to be processed, generated by the function sample_data_frame()
- config_jobs_yml (Optional[str]) – Path to the output .yml file in the config directory for batch processing
- ami (str) – SEQC AMI (Amazon Machine Image) to use
- instance_type (str) – EC2 instance type to be used
- star_args (str) – Arguments passed to the STAR aligner
- email (Optional[str]) – Email address to receive the run summary or errors when running remotely. Optional only if running locally.
- seqcargs (Optional[Dict]) – Additional arguments passed to seqc
- save (bool) – If True, save a copy of the jobs file to the path defined in config_jobs_yml; otherwise return a copy.
Return type
    None
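The save parameter's write-or-return behavior can be illustrated with a minimal, self-contained sketch. Note that build_jobs_config and the YAML layout it emits are hypothetical stand-ins for illustration only; they are not SCRIdb's actual implementation or schema.

```python
# Hypothetical sketch of the "save to path or return a copy" pattern
# described for jobs_yml_config. The function name and the YAML layout
# are illustrative assumptions, not the real SCRIdb output.
def build_jobs_config(samples, path=None, save=True):
    # Assemble a minimal YAML-like jobs document, one entry per sample.
    lines = ["jobs:"]
    for sample in samples:
        lines.append(f"  - sample: {sample}")
    text = "\n".join(lines) + "\n"
    if save and path is not None:
        # save=True: write the jobs file to the given path, return nothing.
        with open(path, "w") as fh:
            fh.write(text)
        return None
    # save=False (or no path): hand the document back to the caller.
    return text
```

With save=False the caller receives the document as a string instead of a file on disk, mirroring the documented "else return a copy" behavior.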
Example
>>> from SCRIdb.worker import *
>>> args = json.load(open(os.path.expanduser("~/.config.json")))
>>> args["jobs"] = "jobs.yml"
>>> args["seqcargs"] = {"min-poly-t": 0}
>>> db_connect.conn(args)
>>> f_in = [
...     "Sample_CCR7_DC_1_IGO_10587_12",
...     "Sample_CCR7_DC_2_IGO_10587_13",
...     "Sample_CCR7_DC_3_IGO_10587_14",
...     "Sample_CCR7_DC_4_IGO_10587_15",
... ]
>>> f_in = " ".join(f_in)
>>> source_path = "/Volumes/peerd/FASTQ/Project_10587/MICHELLE_0194"
>>> target_path = "s3://dp-lab-data/sc-seq/Project_10587"
>>> sd = pd.DataFrame(
...     {
...         "proj_folder": [source_path],
...         "s3_loc": [target_path],
...         "fastq": [f_in],
...     }
... )
>>> sample_data = sample_data_frame(sd)
>>> jobs_yml_config(
...     sample_data,
...     email=args["email"],
...     config_jobs_yml=os.path.join(args["dockerizedSEQC"], "config", args["jobs"]),
...     seqcargs=args["seqcargs"],
... )