Uploading sequence data with the CLI
The CLI can be used to upload all types of sequence data supported by Trakka.
FASTQ data may alternatively be uploaded via the web interface. Refer to the Sequence uploads page for instructions.
Sequence data types
Sequence data types are listed on the Sequence data page.
Each data type is specified on the command line by a short name, such as fastq-ill-pe for paired-end Illumina FASTQ data.
Adding paired-end Illumina FASTQ sequences to a sample
FASTQ sequences are uploaded using a comma-separated (CSV) file that maps Seq_IDs
to sequence files. If the samples do not yet exist, they can optionally be created by the seq add command.
You can upload *.fa(sta) and/or *.fastq.gz.
Please note the size of uploaded files cannot exceed 4GB.
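A quick local check can catch oversized files before an upload is attempted. The sketch below is only an illustration: the helper name `find_oversized` is hypothetical, and it assumes that "4GB" means 4 × 1000³ bytes, since the exact byte count is not stated here.

```python
from pathlib import Path

# Assumption: "4GB" is interpreted as 4 * 1000**3 bytes (the stricter reading).
# The documentation does not say whether the limit is decimal or binary.
MAX_UPLOAD_BYTES = 4 * 1000**3


def find_oversized(paths, limit=MAX_UPLOAD_BYTES):
    """Return the paths (as strings) whose size on disk exceeds `limit` bytes."""
    return [str(p) for p in map(Path, paths) if p.stat().st_size > limit]
```

An empty list means every file is within the assumed limit; any path returned would need to be dealt with before uploading.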
The input CSV file for paired-end FASTQ should have three columns:
| Header | Description |
|---|---|
| Seq_ID | The sample name |
| filepath1 | The local path of the read 1 file to be uploaded |
| filepath2 | The local path of the read 2 file to be uploaded |
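For a directory containing many read pairs, the CSV can be generated rather than typed by hand. The following is a minimal sketch, not part of Trakka: the function name `write_paired_csv` is hypothetical, and it assumes that read files are named `<Seq_ID>_R1.fastq.gz` and `<Seq_ID>_R2.fastq.gz`, which may not match your naming scheme.

```python
import csv
from pathlib import Path


def write_paired_csv(reads_dir, out_csv,
                     r1_suffix="_R1.fastq.gz", r2_suffix="_R2.fastq.gz"):
    """Write a Seq_ID,filepath1,filepath2 CSV for each R1 file that has a matching R2.

    The suffixes are assumptions about local file naming, not a Trakka requirement.
    """
    rows = []
    for r1 in sorted(Path(reads_dir).glob(f"*{r1_suffix}")):
        seq_id = r1.name[: -len(r1_suffix)]   # file name minus the R1 suffix
        r2 = r1.with_name(seq_id + r2_suffix)
        if r2.exists():                       # R1 files without a mate are skipped
            rows.append([seq_id, str(r1), str(r2)])
    with open(out_csv, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Seq_ID", "filepath1", "filepath2"])
        writer.writerows(rows)
    return rows
```

Calling `write_paired_csv("reads", "files.csv")` would produce a `files.csv` with the three headers listed in the table above. It is worth reviewing the result, because unpaired files are silently left out.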
Having created a files.csv, you can upload the sequence files listed in your CSV file by running:
trakka seq add fastq-ill-pe --owner <org-abbreviation> files.csv
This assumes that the Seq_ID values map to existing samples, which must be owned by the specified owning organisation.
If these are new samples, you can create them by
including --create and --owner, and optionally --project, on the seq add command:
trakka seq add fastq-ill-pe files.csv --create --owner <org-abbreviation> --project <project-abbreviation>
where
- org-abbreviation is the abbreviation of the organisation that will own all samples created by running the command.
- project-abbreviation is the abbreviation of a project that the samples will be shared with. Zero or more can be specified. All samples listed in the upload will be shared, even if they are not all new samples.
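When uploads are scripted, it can be convenient to assemble the command as an argument list. The sketch below is an illustration only: `build_seq_add_command` is a hypothetical helper, it uses only the options named in this section, and it accepts at most one project because the syntax for passing several projects is not shown here.

```python
def build_seq_add_command(data_type, input_file, owner, create=False, project=None):
    """Assemble an argument list for `trakka seq add`.

    Only options described in this section are used. `project` takes a single
    abbreviation; how to pass several projects is not covered by this sketch.
    """
    command = ["trakka", "seq", "add", data_type, input_file]
    if create:
        command.append("--create")
    command += ["--owner", owner]
    if project is not None:
        command += ["--project", project]
    return command
```

The returned list could be passed to `subprocess.run(command, check=True)`, which avoids shell quoting problems with unusual file names.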
Adding consensus FASTA sequences to a sample
Single-contig consensus FASTA sequences (e.g. viral genomes) are represented by the fasta-cns data type.
To upload FASTA sequences, prepare a FASTA file in which the FASTA IDs exactly match the Seq_IDs of the sample records that the sequences should be stored against. Multiple FASTA sequences (for multiple Seq_IDs) can be uploaded in a single file.
Once the file is ready, you can upload the sequences it contains by running:
trakka seq add fasta-cns --owner <org-abbreviation> <sequence-file.fa>
The --create and --project options function as described above.
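Because the FASTA IDs must exactly match the Seq_IDs, it may help to compare them locally before uploading. The sketch below uses hypothetical helper names and assumes that the FASTA ID is the text between `>` and the first whitespace on a header line. Whether Trakka parses IDs in exactly this way is not confirmed here.

```python
def read_fasta_ids(fasta_path):
    """Return the ID of each record: the text after '>' up to the first whitespace."""
    ids = []
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                ids.append(fields[0] if fields else "")
    return ids


def compare_ids(fasta_path, expected_seq_ids):
    """Return (IDs in the FASTA that are not expected, expected IDs missing from the FASTA)."""
    found = set(read_fasta_ids(fasta_path))
    expected = set(expected_seq_ids)
    return sorted(found - expected), sorted(expected - found)
```

Two empty lists from `compare_ids` indicate that the file and the expected Seq_IDs agree under this assumption.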
Adding other sequence data types to a sample
For other sequence data types, the process is similar to that for paired-end Illumina data, but only one sequence file is expected per sample.
The input file to the command should be a CSV file with two columns:
| Header | Description |
|---|---|
| Seq_ID | The sample name |
| filepath | The local path of the sequence file to be uploaded |
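A small pre-flight check of the CSV can surface typos in headers or paths before the upload starts. This is a sketch under stated assumptions: `check_upload_csv` is a hypothetical helper, the header names are taken from the table above, and relative paths are resolved against the current working directory.

```python
import csv
from pathlib import Path


def check_upload_csv(csv_path, expected_header=("Seq_ID", "filepath")):
    """Return a list of human-readable problems found in an upload CSV."""
    problems = []
    with open(csv_path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, None)
        if header is None or tuple(header) != tuple(expected_header):
            problems.append(f"unexpected header: {header}")
            return problems
        for row in reader:
            line_number = reader.line_num
            if len(row) != len(expected_header):
                problems.append(
                    f"line {line_number}: expected {len(expected_header)} columns"
                )
                continue
            if not row[0].strip():
                problems.append(f"line {line_number}: empty Seq_ID")
            # Every column after Seq_ID is treated as a local file path.
            for path in row[1:]:
                if not Path(path).is_file():
                    problems.append(f"line {line_number}: file not found: {path}")
    return problems
```

For a paired-end CSV, the same function can be called with `expected_header=("Seq_ID", "filepath1", "filepath2")`.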
You can then upload the sequence files listed in the CSV file by running:
trakka seq add <data-type> --owner <org-abbreviation> files.csv
where <data-type> is one of fastq-ill-se, fastq-ont, or fasta-asm.
The --create and --project options function as described above.