|   | union | 
union reads in several sequences, concatenates them and writes them out as a single sequence. The input is typically a list file containing references to multiple sequences or subsequences (regions of a sequence). Optionally, feature information will be used.
With the -source option, source features are generated in the output; they document the composite sequence in the EMBL/GenBank feature table. The -findoverlap option checks for overlaps between adjacent joined regions and reports them in the overlap file.
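The overlap check between adjacent regions can be pictured with a short Python sketch. The exact rule union applies is not stated here, so the rule below (two adjacent regions from the same source sequence overlap when they share at least one position) is an assumption, and `adjacent_overlaps` is a hypothetical helper, not part of EMBOSS. The second region deliberately starts at 850 rather than the 951 used in the real example, so that an overlap exists.

```python
from typing import List, Tuple

# (sequence id, start, end), with 1-based inclusive coordinates.
Region = Tuple[str, int, int]


def adjacent_overlaps(regions: List[Region]) -> List[Tuple[Region, Region, int]]:
    """Return (previous, current, shared length) for each overlapping adjacent pair.

    Assumed rule (not taken from the union source): only neighbouring
    regions from the same sequence are compared.
    """
    found = []
    for prev, cur in zip(regions, regions[1:]):
        same_source = prev[0] == cur[0]
        # Number of positions the two inclusive ranges have in common.
        shared = min(prev[2], cur[2]) - max(prev[1], cur[1]) + 1
        if same_source and shared > 0:
            found.append((prev, cur, shared))
    return found


# Hypothetical input: the second region is altered to overlap the first.
regions = [("X65921", 782, 856), ("X65921", 850, 1095), ("X65921", 1557, 1612)]
overlaps = adjacent_overlaps(regions)
for prev, cur, n in overlaps:
    print(f"{prev[0]}[{prev[1]}:{prev[2]}] overlaps "
          f"{cur[0]}[{cur[1]}:{cur[2]}] by {n} positions")
```

Only neighbouring pairs are compared, which mirrors the "adjacent joined regions" wording; non-adjacent regions are not checked in this sketch.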
The file 'cds.list' contains a list of the regions making up the coding sequence of 'embl:x65921':
    % union
    Concatenate multiple sequences into a single sequence
    Input (gapped) sequence(s): @cds.list
    output sequence [x65921.fasta]:
| 
Concatenate multiple sequences into a single sequence
Version: EMBOSS:6.6.0.0
   Standard (Mandatory) qualifiers:
  [-sequence]          seqall     (Gapped) sequence(s) filename and optional
                                  format, or reference (input USA)
  [-outseq]            seqout     [ | 
| Qualifier | Type | Description | Allowed values | Default | 
|---|---|---|---|---|
| Standard (Mandatory) qualifiers | ||||
| [-sequence] (Parameter 1) | seqall | (Gapped) sequence(s) filename and optional format, or reference (input USA) | Readable sequence(s) | Required | 
| [-outseq] (Parameter 2) | seqout | Sequence filename and optional format (output USA) | Writeable sequence | <*>.format | 
| Additional (Optional) qualifiers | ||||
| -overlapfile | outfile | Sequence overlaps output file (optional) | Output file | <*>.union | 
| Advanced (Unprompted) qualifiers | ||||
| -feature | boolean | Use feature information | Boolean value Yes/No | No | 
| -source | boolean | Create source features | Boolean value Yes/No | No | 
| -findoverlap | boolean | Look for overlaps when joining | Boolean value Yes/No | No | 
| Associated qualifiers | ||||
| "-sequence" associated seqall qualifiers | ||||
| -sbegin1 -sbegin_sequence | integer | Start of each sequence to be used | Any integer value | 0 | 
| -send1 -send_sequence | integer | End of each sequence to be used | Any integer value | 0 | 
| -sreverse1 -sreverse_sequence | boolean | Reverse (if DNA) | Boolean value Yes/No | N | 
| -sask1 -sask_sequence | boolean | Ask for begin/end/reverse | Boolean value Yes/No | N | 
| -snucleotide1 -snucleotide_sequence | boolean | Sequence is nucleotide | Boolean value Yes/No | N | 
| -sprotein1 -sprotein_sequence | boolean | Sequence is protein | Boolean value Yes/No | N | 
| -slower1 -slower_sequence | boolean | Make lower case | Boolean value Yes/No | N | 
| -supper1 -supper_sequence | boolean | Make upper case | Boolean value Yes/No | N | 
| -scircular1 -scircular_sequence | boolean | Sequence is circular | Boolean value Yes/No | N | 
| -squick1 -squick_sequence | boolean | Read id and sequence only | Boolean value Yes/No | N | 
| -sformat1 -sformat_sequence | string | Input sequence format | Any string | |
| -iquery1 -iquery_sequence | string | Input query fields or ID list | Any string | |
| -ioffset1 -ioffset_sequence | integer | Input start position offset | Any integer value | 0 | 
| -sdbname1 -sdbname_sequence | string | Database name | Any string | |
| -sid1 -sid_sequence | string | Entryname | Any string | |
| -ufo1 -ufo_sequence | string | UFO features | Any string | |
| -fformat1 -fformat_sequence | string | Features format | Any string | |
| -fopenfile1 -fopenfile_sequence | string | Features file name | Any string | |
| "-outseq" associated seqout qualifiers | ||||
| -osformat2 -osformat_outseq | string | Output seq format | Any string | |
| -osextension2 -osextension_outseq | string | File name extension | Any string | |
| -osname2 -osname_outseq | string | Base file name | Any string | |
| -osdirectory2 -osdirectory_outseq | string | Output directory | Any string | |
| -osdbname2 -osdbname_outseq | string | Database name to add | Any string | |
| -ossingle2 -ossingle_outseq | boolean | Separate file for each entry | Boolean value Yes/No | N | 
| -oufo2 -oufo_outseq | string | UFO features | Any string | |
| -offormat2 -offormat_outseq | string | Features format | Any string | |
| -ofname2 -ofname_outseq | string | Features file name | Any string | |
| -ofdirectory2 -ofdirectory_outseq | string | Output directory | Any string | |
| "-overlapfile" associated outfile qualifiers | ||||
| -odirectory | string | Output directory | Any string | |
| General qualifiers | ||||
| -auto | boolean | Turn off prompts | Boolean value Yes/No | N | 
| -stdout | boolean | Write first file to standard output | Boolean value Yes/No | N | 
| -filter | boolean | Read first file from standard input, write first file to standard output | Boolean value Yes/No | N | 
| -options | boolean | Prompt for standard and additional values | Boolean value Yes/No | N | 
| -debug | boolean | Write debug output to program.dbg | Boolean value Yes/No | N | 
| -verbose | boolean | Report some/full command line options | Boolean value Yes/No | Y | 
| -help | boolean | Report command line options and exit. More information on associated and general qualifiers can be found with -help -verbose | Boolean value Yes/No | N | 
| -warning | boolean | Report warnings | Boolean value Yes/No | Y | 
| -error | boolean | Report errors | Boolean value Yes/No | Y | 
| -fatal | boolean | Report fatal errors | Boolean value Yes/No | Y | 
| -die | boolean | Report dying program messages | Boolean value Yes/No | Y | 
| -version | boolean | Report version number and exit | Boolean value Yes/No | N | 
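For scripted use, the qualifiers in the table can be assembled into a non-interactive command line. The following Python sketch only builds the argument list; `build_union_command` is a hypothetical helper, and the overlap file name `x65921.union` is an assumed example following the `<*>.union` default pattern. Boolean qualifiers are switched on by naming them, and -auto suppresses the prompts.

```python
import shlex
from typing import List, Optional


def build_union_command(list_file: str, outseq: str,
                        overlap_file: Optional[str] = None,
                        feature: bool = False,
                        source: bool = False) -> List[str]:
    """Build an argument list for union from qualifiers in the table above."""
    # "@file" tells EMBOSS to read the sequence references from a list file.
    cmd = ["union", "-sequence", f"@{list_file}", "-outseq", outseq]
    if feature:
        cmd.append("-feature")   # use feature information
    if source:
        cmd.append("-source")    # create source features
    if overlap_file is not None:
        # Report overlaps between adjacent regions to the given file.
        cmd += ["-findoverlap", "-overlapfile", overlap_file]
    cmd.append("-auto")          # turn off prompts
    return cmd


cmd = build_union_command("cds.list", "x65921.fasta",
                          overlap_file="x65921.union")
print(shlex.join(cmd))
```

On a machine with EMBOSS installed, the list could be passed to `subprocess.run(cmd, check=True)`; the sketch does not run the program itself.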
    tembl-id:X65921[782:856]
    tembl-id:X65921[951:1095]
    tembl-id:X65921[1557:1612]
    tembl-id:X65921[1787:1912]
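As a sanity check, the length of the joined sequence can be predicted from the list file alone. The Python sketch below parses the four entries and sums the region lengths, treating the ranges as 1-based and inclusive. The regular expression is an assumption that covers only the simple `reference[start:end]` form shown here, not the full USA syntax, and `parse_list_entries` is a hypothetical helper.

```python
import re
from typing import List, Tuple

# Matches only the simple "reference[start:end]" form used in cds.list.
USA_RANGE = re.compile(r"^(?P<ref>[^\[\]\s]+)\[(?P<start>\d+):(?P<end>\d+)\]$")


def parse_list_entries(text: str) -> List[Tuple[str, int, int]]:
    """Split whitespace-separated references into (reference, start, end)."""
    entries = []
    for token in text.split():
        match = USA_RANGE.match(token)
        if match is None:
            raise ValueError(f"not a ranged reference: {token!r}")
        entries.append((match["ref"], int(match["start"]), int(match["end"])))
    return entries


cds_list = """
tembl-id:X65921[782:856]
tembl-id:X65921[951:1095]
tembl-id:X65921[1557:1612]
tembl-id:X65921[1787:1912]
"""

entries = parse_list_entries(cds_list)
# Inclusive ranges: a region start..end contains end - start + 1 positions.
lengths = [end - start + 1 for _, start, end in entries]
total = sum(lengths)
print(lengths, total)  # [75, 145, 56, 126] 402
```

The total of 402 bases (134 codons) agrees with the length of the sequence in the example output file.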
You may find the program yank useful for creating list files.
    >X65921 X65921.1 H.sapiens fau 1 gene
    atgcagctctttgtccgcgcccaggagctacacaccttcgaggtgaccggccaggaaacg
    gtcgcccagatcaaggctcatgtagcctcactggagggcattgccccggaagatcaagtc
    gtgctcctggcaggcgcgcccctggaggatgaggccactctgggccagtgcggggtggag
    gccctgactaccctggaagtagcaggccgcatgcttggaggtaaagtccatggttccctg
    gcccgtgctggaaaagtgagaggtcagactcctaaggtggccaaacaggagaagaagaag
    aagaagacaggtcgggctaagcggcggatgcagtacaaccggcgctttgtcaacgttgtg
    cccacctttggcaagaagaagggccccaatgccaactcttaa
The result is a normal sequence file containing a single sequence resulting from the concatenation of the input sequences.
union is most useful when the input sequences are specified in a "list file". A list file contains references to any number of sequences, which are retrieved from some other file or database. Each reference is a Uniform Sequence Address (USA), which can specify a sub-region of the sequence (e.g. em:x65923[20:55]). Listing several such sub-regions, from one sequence or from several, lets you join disjoint regions into a single sequence.
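The effect of joining disjoint sub-regions can be illustrated on a toy sequence. This Python sketch is not how union is implemented; `join_regions` and the `demo` sequence are hypothetical, and the regions are given as already-parsed tuples with 1-based inclusive coordinates, as in the `[start:end]` part of a USA.

```python
from typing import Dict, List, Tuple


def join_regions(sequences: Dict[str, str],
                 regions: List[Tuple[str, int, int]]) -> str:
    """Concatenate sub-regions, in the order given, into one sequence."""
    parts = []
    for seq_id, start, end in regions:
        seq = sequences[seq_id]
        if not 1 <= start <= end <= len(seq):
            raise ValueError(f"region {start}:{end} is outside {seq_id}")
        # Convert 1-based inclusive coordinates to a Python slice.
        parts.append(seq[start - 1:end])
    return "".join(parts)


# Toy data: upper case marks the two regions that will be kept.
toy = {"demo": "aaaCCCgggTTTaaa"}
joined = join_regions(toy, [("demo", 4, 6), ("demo", 10, 12)])
print(joined)  # CCCTTT
```

The order of the regions in the list determines the order in the result, so listing `("demo", 10, 12)` first would give `TTTCCC` instead.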
| Program name | Description | 
|---|---|
| aligncopy | Read and write alignments | 
| aligncopypair | Read and write pairs from alignments | 
| biosed | Replace or delete sequence sections | 
| codcopy | Copy and reformat a codon usage table | 
| cutseq | Remove a section from a sequence | 
| degapseq | Remove non-alphabetic (e.g. gap) characters from sequences | 
| descseq | Alter the name or description of a sequence | 
| entret | Retrieve sequence entries from flatfile databases and files | 
| extractalign | Extract regions from a sequence alignment | 
| extractfeat | Extract features from sequence(s) | 
| extractseq | Extract regions from a sequence | 
| featcopy | Read and write a feature table | 
| featmerge | Merge two overlapping feature tables | 
| featreport | Read and write a feature table | 
| feattext | Return a feature table original text | 
| listor | Write a list file of the logical OR of two sets of sequences | 
| makenucseq | Create random nucleotide sequences | 
| makeprotseq | Create random protein sequences | 
| maskambignuc | Mask all ambiguity characters in nucleotide sequences with N | 
| maskambigprot | Mask all ambiguity characters in protein sequences with X | 
| maskfeat | Write a sequence with masked features | 
| maskseq | Write a sequence with masked regions | 
| newseq | Create a sequence file from a typed-in sequence | 
| nohtml | Remove mark-up (e.g. HTML tags) from an ASCII text file | 
| noreturn | Remove carriage return from ASCII files | 
| nospace | Remove whitespace from an ASCII text file | 
| notab | Replace tabs with spaces in an ASCII text file | 
| notseq | Write to file a subset of an input stream of sequences | 
| nthseq | Write to file a single sequence from an input stream of sequences | 
| nthseqset | Read and write (return) one set of sequences from many | 
| pasteseq | Insert one sequence into another | 
| revseq | Reverse and complement a nucleotide sequence | 
| seqcount | Read and count sequences | 
| seqret | Read and write (return) sequences | 
| seqretsetall | Read and write (return) many sets of sequences | 
| seqretsplit | Read sequences and write them to individual files | 
| sizeseq | Sort sequences by size | 
| skipredundant | Remove redundant sequences from an input set | 
| skipseq | Read and write (return) sequences, skipping first few | 
| splitsource | Split sequence(s) into original source sequences | 
| splitter | Split sequence(s) into smaller sequences | 
| trimest | Remove poly-A tails from nucleotide sequences | 
| trimseq | Remove unwanted characters from start and end of sequence(s) | 
| trimspace | Remove extra whitespace from an ASCII text file | 
| vectorstrip | Remove vectors from the ends of nucleotide sequence(s) | 
| yank | Add a sequence reference (a full USA) to a list file | 
Please report all bugs to the EMBOSS bug team (emboss-bug © emboss.open-bio.org) not to the original author.