Uploading sequence data with the CLI

The CLI can be used to upload all types of sequence data supported by Trakka.

FASTQ data may alternatively be uploaded via the web interface. Refer to the Sequence uploads page for instructions.

Sequence data types

Sequence data types are listed on the Sequence data page. Each data type is specified on the command line by a short name, such as fastq-ill-pe for paired-end Illumina FASTQ data.

Adding paired-end Illumina FASTQ sequences to a sample

FASTQ sequences are uploaded using a comma-separated (CSV) file that maps Seq_IDs to sequence files. If the samples are new, they can optionally be created by the seq add command.

You can upload FASTA (*.fa, *.fasta) and/or gzipped FASTQ (*.fastq.gz) files. Please note that uploaded files cannot exceed 4GB in size.
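Checking file sizes before starting an upload can save a failed run. The following is a minimal sketch, not part of the Trakka CLI: oversized_files is a hypothetical helper, and it assumes that the limit applies to each file individually and that "4GB" means 4 x 1024^3 bytes. Adjust LIMIT_BYTES if either assumption is wrong for your installation.

```shell
# Hypothetical helper, not a Trakka command.
# Assumption: the 4GB limit is per file and equals 4 * 1024^3 bytes.
LIMIT_BYTES="${LIMIT_BYTES:-4294967296}"

oversized_files() {
    # Print every regular file among the arguments that is larger than LIMIT_BYTES.
    for f in "$@"; do
        [ -f "$f" ] || continue
        # tr strips the padding that some wc implementations add
        size=$(wc -c < "$f" | tr -d ' ')
        if [ "$size" -gt "$LIMIT_BYTES" ]; then
            printf '%s\n' "$f"
        fi
    done
}
```

Calling oversized_files with the files you intend to upload prints only those that are over the limit; no output means all files passed the check.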

The input CSV file for paired-end FASTQ should have three columns:

Header      Description
Seq_ID      The sample name
filepath1   The local path of the read 1 file to be uploaded
filepath2   The local path of the read 2 file to be uploaded
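For many samples, the CSV file can be generated rather than typed by hand. The sketch below assumes a naming convention that is not part of Trakka: read files are named <Seq_ID>_R1.fastq.gz and <Seq_ID>_R2.fastq.gz and sit in one directory. It also assumes that the CSV starts with a header row and that names and paths contain no commas. make_pe_csv is a hypothetical helper name.

```shell
# Hypothetical helper, not a Trakka command.
# Assumptions: files are named <Seq_ID>_R1.fastq.gz / <Seq_ID>_R2.fastq.gz,
# the CSV needs a header row, and no name or path contains a comma.
make_pe_csv() {
    reads_dir=$1
    echo "Seq_ID,filepath1,filepath2"
    for r1 in "$reads_dir"/*_R1.fastq.gz; do
        [ -e "$r1" ] || continue    # glob matched nothing
        r2="${r1%_R1.fastq.gz}_R2.fastq.gz"
        if [ ! -e "$r2" ]; then
            # Warn on stderr and leave unpaired samples out of the CSV
            echo "missing read 2 for $r1" >&2
            continue
        fi
        seq_id=$(basename "$r1" _R1.fastq.gz)
        printf '%s,%s,%s\n' "$seq_id" "$r1" "$r2"
    done
}
```

Redirecting the output of make_pe_csv for a reads directory into files.csv produces one row per complete read pair. Samples with a missing read 2 file are reported on standard error and skipped.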

Having created a files.csv, you can upload the sequence files listed in your CSV file by running:

trakka seq add fastq-ill-pe --owner <org-abbreviation> files.csv

This assumes that the Seq_ID values map to existing samples, which must be owned by the specified owning organisation.
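Before running the upload, it can also help to confirm that every path listed in the CSV file exists locally. This is a minimal sketch under stated assumptions: missing_paths is a hypothetical helper, the CSV has a header row, fields are unquoted, and paths contain no commas. It does not check whether the Seq_IDs exist in Trakka.

```shell
# Hypothetical pre-flight check, not a Trakka command.
# Assumptions: header row present, unquoted fields, no commas inside paths.
missing_paths() {
    # Skip the header, then emit "Seq_ID,path" for every column after the first
    awk -F, 'NR > 1 { for (i = 2; i <= NF; i++) print $1 "," $i }' "$1" |
    while IFS=, read -r seq_id path; do
        # Report only the paths that are not present on disk
        [ -e "$path" ] || printf '%s,%s\n' "$seq_id" "$path"
    done
}
```

Each output line names a Seq_ID and a path that could not be found; no output means every listed file is present.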

If these are new samples, you can create them by adding --create and --owner, and optionally --project, to the seq add command:

trakka seq add fastq-ill-pe files.csv --create --owner <org-abbreviation> --project <project-abbreviation>

where

  • org-abbreviation is the abbreviation of the organisation that will own all samples created by running the command.
  • project-abbreviation is the abbreviation of a project that the samples will be shared with. Zero or more projects can be specified. All samples listed in the upload will be shared, even if they are not all new samples.

Adding consensus FASTA sequences to a sample

Single-contig consensus FASTA sequences (e.g. viral genomes) are represented by the fasta-cns data type.

To upload FASTA sequences, provide a FASTA file in which each FASTA ID exactly matches the Seq_ID of the sample record that the sequence should be stored against. Multiple FASTA sequences (for multiple Seq_IDs) can be uploaded in a single file.
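Because the IDs must match exactly, listing the IDs in a FASTA file before uploading makes mismatches easy to spot. The sketch below uses a hypothetical helper, fasta_ids, and assumes that the FASTA ID is the text between ">" and the first whitespace on each header line. Whether Trakka treats the rest of the header line as part of the ID is not stated here, so check this against your own data.

```shell
# Hypothetical helper, not a Trakka command.
# Assumption: the FASTA ID is the text after ">" up to the first whitespace.
fasta_ids() {
    # For each header line, strip the leading ">" from the first field and print it
    awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$1"
}
```

The printed list can be compared by eye, or with standard tools such as sort and comm, against the Seq_IDs of the sample records you expect to update.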

You can then upload the sequences in the FASTA file by running:

trakka seq add fasta-cns --owner <org-abbreviation> <sequence-file.fa>

The --create and --project options function as described above.

Adding other sequence data types to a sample

For other sequence data types, the process is similar to that for paired-end Illumina data, but only one sequence file is expected per sample.

The input file to the command should be a CSV file with two columns:

Header      Description
Seq_ID      The sample name
filepath    The local path of the sequence file to be uploaded
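A two-column CSV of this shape can be generated from a directory listing. This is a sketch only: make_single_csv is a hypothetical helper, and it assumes that each file is named <Seq_ID> followed by a common suffix (for example .fastq.gz), that the CSV starts with a header row, and that names and paths contain no commas.

```shell
# Hypothetical helper, not a Trakka command.
# Assumptions: files are named <Seq_ID><suffix>, the CSV needs a header row,
# and no name or path contains a comma.
make_single_csv() {
    data_dir=$1
    suffix=$2
    echo "Seq_ID,filepath"
    for f in "$data_dir"/*"$suffix"; do
        [ -e "$f" ] || continue    # glob matched nothing
        # The Seq_ID is the file name with the suffix removed
        printf '%s,%s\n' "$(basename "$f" "$suffix")" "$f"
    done
}
```

Redirecting the output into files.csv gives one row per matching file, with the Seq_ID taken from the file name.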

To upload sequences of one of these data types, run:

trakka seq add <data-type> --owner <org-abbreviation> files.csv

where <data-type> is one of fastq-ill-se, fastq-ont, or fasta-asm.

The --create and --project options function as described above.