Free Phylogenetic Network Software | DOS Version | Instructions and Troubleshooting

Refresh button:

Your internet browser stores recently viewed pages in a disk buffer and may load the old version from your disk instead of loading the updated page from our website. Please click on Refresh/Reload (in the View menu of Internet Explorer or Firefox) to force the browser to load the most recent version of this page.

Download problems: On some computer systems, EXE files cannot be downloaded correctly. In that case, check the browser settings (usually under the security options). As a last resort, contact the system administrator or use a private internet connection with its own dial-in node and browser (AOL, CompuServe, or others).

1. DOS version limitations

Network 2.x runs on Macs with a DOS emulator as well as under Windows 95/98 and Windows 2000. Since June 2001, a Windows version (NETWORK 3.x or 4.x) has been available for download. The DOS version will therefore continue to be provided for free, but will not be extensively refined.

1.1 DOS version startup problems

A few users have reported problems with running Network 2.x on their computers. In one case the solution was to download and run the program from drive A. When too much RAM is occupied by other Windows programs running in parallel, the program may time out or a subroutine may fail.

1.2 DOS pathname conventions

Due to a compiler bug, NETWORK 2.x often cannot be run if it is installed in directories which do not conform to old DOS pathname conventions. Directory names should be no longer than 8 characters and should preferably not contain lower case characters, spaces, or special characters. For example, the pathname
C:\PROGRAMS\MYNETW
will be ok, but
C:\PROGRAMME\MYNETW
can lead to an error, because "PROGRAMME" has 9 characters, and
C:\PROGRAMS\MYNETWORK
can lead to an error, because "MYNETWORK" has 9 characters, and
C:\Programs\MYNETW
can lead to an error, because "Programs" contains lower case characters.

Oddly, this compiler error does not always cause these problems. (We have encountered the same error with other compilers which we have used.)
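The pathname rules in this section can be checked before installation. The following is a minimal Python sketch, not part of Network. The function name unsafe_components is hypothetical, and treating everything other than A-Z and 0-9 as a "special character" is a deliberately conservative assumption.

```python
import re
from pathlib import PureWindowsPath

# Assumption: a "safe" directory name has 1 to 8 characters, all of them
# upper-case letters or digits. This is stricter than DOS itself requires.
_SAFE_NAME = re.compile(r"[A-Z0-9]{1,8}")


def unsafe_components(path):
    """Return the directory names in `path` that break the conservative rule."""
    p = PureWindowsPath(path)
    # Skip the drive/root entry (for example "C:\\") if there is one.
    names = p.parts[1:] if p.anchor else p.parts
    return [name for name in names if not _SAFE_NAME.fullmatch(name)]


if __name__ == "__main__":
    for candidate in (r"C:\PROGRAMS\MYNETW", r"C:\Programs\MYNETWORK"):
        print(candidate, unsafe_components(candidate))
```

An empty result means that every directory name passes the check. Because the compiler error does not always occur, a flagged name is a warning rather than a guaranteed failure.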

1.3 Japanese fonts

When NETWORK 2.x is run on computers using Japanese fonts (and presumably also some other non-Western fonts), the text in the program fails to appear. Please run the DOS program on PCs with US or other Western codepage and keyboard settings, or use the Windows version of Network.

1.4 Hardware problems

We believe that the problems with NETWORK 2.x occurred on some (but not necessarily all) computers on some local area networks (LANs) and were due to timing problems in the LAN drivers, which are part of the operating system. This update of Network 2.x attempts to circumvent these LAN problems using techniques which have been successful in the engineering software B2.

If problems nevertheless occur, please try running Network 2.x on a different computer or a non-networked computer.

2. Compatibility between WIN and DOS versions

Network3.x for Windows accepts all input and output files generated by Network2.x for DOS. Reverse compatibility (from 3.x back to 2.x) also works, with the following exceptions:

(a) One file name ending has been renamed: *.mat (version 2.x) is now *.rmf (version 3.x)
(b) Deletion coding (dash) is available in 3.x, but not in 2.x.
(c) Branch lengths are limited to 70 mutations in 3.x, and 25 mutations in 2.x.
(d) File names may be up to 255 characters long in version 3.x, but only 8 characters in version 2.x.
(e) Network3.x displays text on Japanese computers, whereas Network2.x does not.

3. Choice of data format

The network methods are designed for non-recombining DNA haplotypes, RNA or amino acid sequences. The mutating units (characters) should be known and coded at the highest possible resolution: for example, artefacts were often produced in the human mtDNA RFLP literature by measuring independent mutations at 3 adjacent nucleotides with only two endonucleases (Fig. 1 in Bandelt et al. 1999). Similarly, in human Y STRs, a compound STR such as DYS389II should be resolved into its mutational subcomponents m, n, and q (Forster et al. 2000) to avoid artefacts.

Multistate data typically are amino acid sequences, and also DNA sequences containing nucleotide positions with more than two different nucleotides. STR data are generally binary (if a single-repeat mutation mechanism has generated the STR alleles). Multistate data can be analysed only by the Median-Joining (MJ) network method, which is unreliable for longer branches, and not by the more robust Reduced Median (RM) network method. Therefore, code your data in a multistate format (multistate *.rdf for DNA and multistate *.ami for amino acids) only if you are sure that they are not binary. Furthermore, it is good practice to explore every way of representing multistate data as binary data (for example by omitting multistate nucleotide positions, or by grouping variants into transitions and transversions) so that an exploratory RM analysis can be run.
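One of the suggestions above, omitting multistate nucleotide positions, can be sketched in Python as follows. This is an illustration only and not a Network feature. It assumes that the sequences are already aligned and held in memory as a dictionary of equal-length strings, and the function names are hypothetical.

```python
def binary_positions(sequences):
    """Return the indices of alignment columns with at most two distinct states."""
    if not sequences:
        return []
    rows = list(sequences.values())
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("sequences must be aligned to equal length")
    keep = []
    for i in range(length):
        states = {row[i] for row in rows}
        if len(states) <= 2:  # constant or binary column
            keep.append(i)
    return keep


def drop_multistate(sequences):
    """Return a copy of `sequences` with all multistate columns removed."""
    keep = binary_positions(sequences)
    return {name: "".join(seq[i] for i in keep) for name, seq in sequences.items()}


if __name__ == "__main__":
    data = {"a": "ACGT", "b": "ACGA", "c": "ATGC"}
    print(binary_positions(data))
    print(drop_multistate(data))
```

The indices returned by binary_positions record which positions were kept, so the omitted positions can be reported alongside the exploratory RM analysis.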

Binary data typically are STRs and closely related DNA sequences (within a species). Furthermore, RFLPs are always binary, for which you can choose the Torroni RFLP format (*.tor). If you have a mixture of STR data, RFLP data and/or binary point mutation DNA data from one chromosomal segment, then choose the Y-STR data format (*.ych) and pretend that each RFLP and/or point mutation is an STR with two length variants. DNA sequences which happen to be binary at each nucleotide position can be entered as binary rdf format (*.rdf), or alternatively as Torroni RFLP format (*.tor) if it is more convenient to pretend that each point mutation is a recognition site gain.

4. Data entry

Small data sets are best re-entered manually using the explicit data entry options in the File menu. Remember that you cannot code deletions in Network2.0; instead, code deletions with any nucleotide and replace them with a dash (-) afterwards using Notepad/Editor if you wish to use the file in Network4.x.

For entering FASTA format into Network, the new software DNA Alignment is recommended. Alternatively, users may wish to reformat their existing files to Network specifications using for example Editor/Notepad. Please consult the example files in the Network download: five acceptable entry formats are multistate *.rdf, binary *.rdf, *.ych, *.ami and *.tor. Two of these formats, multistate *.rdf and amino acid format (*.ami) are not included as examples in the DOS version download. Taxon names in any file format should not be longer than 6 characters, and each taxon name MUST be unique in the file. Nucleotide, RFLP and STR names must not be longer than 5 characters in any of the three file formats. Sequence length (number of characters) must not be longer than 500 positions in any file format.
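The limits stated above (taxon names of at most 6 characters and unique, character names of at most 5 characters, at most 500 positions per sequence) can be checked before a file is written. The Python sketch below works on data already held in memory and does not parse any Network file format. The function name check_limits and the input layout are assumptions made for illustration.

```python
from collections import Counter

MAX_TAXON_NAME = 6      # characters per taxon name
MAX_CHARACTER_NAME = 5  # characters per nucleotide, RFLP or STR name
MAX_POSITIONS = 500     # characters per sequence


def check_limits(taxa, character_names):
    """Check (name, sequence) pairs and character names against the limits.

    Returns a list of human-readable problems; an empty list means no
    violation of the limits was found.
    """
    problems = []
    for name, count in Counter(name for name, _ in taxa).items():
        if count > 1:
            problems.append(f"taxon name {name!r} occurs {count} times")
    for name, sequence in taxa:
        if len(name) > MAX_TAXON_NAME:
            problems.append(f"taxon name {name!r} is longer than {MAX_TAXON_NAME}")
        if len(sequence) > MAX_POSITIONS:
            problems.append(f"taxon {name!r} has more than {MAX_POSITIONS} positions")
    for cname in character_names:
        if len(cname) > MAX_CHARACTER_NAME:
            problems.append(f"character name {cname!r} is longer than {MAX_CHARACTER_NAME}")
    return problems


if __name__ == "__main__":
    taxa = [("NCN01", "ACGT"), ("NCN01", "ACGA"), ("TOOLONG", "ACGT")]
    for problem in check_limits(taxa, ["16189", "162230"]):
        print(problem)
```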

For Roehl data format (*.rdf), consult the example in the download.

For Y STR format (*.ych) note that there is a limit of 100 STR loci. Furthermore, beware that each STR entered in ych-format will be broken up into several characters when Network converts it into rdf-format, possibly exceeding the limit of 500 positions per sequence; a trial run is advisable.
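Before the trial run, a rough estimate of the expanded size can be useful. The Python sketch below rests on an assumption that this page does not confirm: that each STR locus is expanded into one binary character per single-repeat step between its smallest and largest observed allele. The function name and the input layout (one list of repeat counts per haplotype) are hypothetical, and the trial run in Network remains the authoritative check.

```python
MAX_LOCI = 100       # limit on STR loci stated above
MAX_POSITIONS = 500  # limit on positions per sequence stated above


def estimate_str_characters(haplotypes):
    """Estimate the number of binary characters after STR expansion.

    ASSUMPTION: a locus whose alleles range from `lo` to `hi` repeats
    contributes (hi - lo) characters. Network may code STRs differently.
    """
    if not haplotypes:
        return 0
    n_loci = len(haplotypes[0])
    if any(len(h) != n_loci for h in haplotypes):
        raise ValueError("every haplotype needs one allele per locus")
    if n_loci > MAX_LOCI:
        raise ValueError(f"more than {MAX_LOCI} STR loci")
    total = 0
    for locus in range(n_loci):
        alleles = [h[locus] for h in haplotypes]
        total += max(alleles) - min(alleles)
    return total


if __name__ == "__main__":
    haplotypes = [[10, 13], [12, 13], [11, 14]]
    estimate = estimate_str_characters(haplotypes)
    print(estimate, "characters; within limit:", estimate <= MAX_POSITIONS)
```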

Torroni RFLP format (*.tor) is the simplest to generate; consult the self-explanatory *.tor example file in the download.

Common formatting errors. A single format error may cause Network to produce artefacts without an error message. Beware that MS Word and Windows Wordpad are unsuitable for editing your data, because they can insert or delete spaces unpredictably, which causes Network to produce artefacts. It is safe to use the Windows text editor NOTEPAD (called EDITOR on some non-US-language Windows versions). In Network for Windows, the file format is not recognised if the appropriate file ending (*.rdf or *.ych or *.tor) is missing. Network for DOS is flexible in this respect. Files generate error messages if there are empty lines at the end of the file or if values are placed in quotation marks; both are often the case when converting from Excel.
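The two Excel-related errors named above (quotation marks around values and empty lines at the end of the file) are easy to detect automatically. The following Python sketch only reports them and deliberately leaves spaces untouched, since spacing may be significant in the Network formats. The function name find_format_problems is hypothetical.

```python
def find_format_problems(text):
    """Report quotation marks and trailing empty lines in file contents."""
    problems = []
    lines = text.splitlines()
    for number, line in enumerate(lines, start=1):
        if '"' in line:
            problems.append(f"line {number}: contains a quotation mark")
    # Count whitespace-only lines at the end of the file.
    trailing = 0
    for line in reversed(lines):
        if line.strip():
            break
        trailing += 1
    if trailing:
        problems.append(f"{trailing} empty line(s) at end of file")
    return problems


if __name__ == "__main__":
    exported = 'A1 "ACGT"\nA2 ACGA\n\n\n'
    for problem in find_format_problems(exported):
        print(problem)
```

Any reported problem can then be corrected by hand in NOTEPAD.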

5. Example files

Example files for Torroni RFLP format (Tibetan mtDNA RFLPs and east Asian mtDNA RFLPs), Y STR format (Amerind Y STRs) and Roehl data format (Nuu Chah Nulth mtDNA control region) are included in the download. All four analyses are discussed in the literature: consult Figs 1 and 6 in Bandelt et al. (1999) for the Tibetans; Fig 3 in Forster et al. (2001) for the east Asians; Fig 4 in Forster et al. (2000) for the Amerinds; and Fig 7 in Bandelt et al. (1995) for the Nuu Chah Nulth.

6. Network calculation

If your data are binary and you expect branches which are more than a few mutations long (you can get an idea by displaying the mismatch distribution available in the File menu), then prefer the RM algorithm; otherwise use the MJ algorithm.
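For readers unfamiliar with the term, a mismatch distribution is the histogram of pairwise differences between sequences. The Python sketch below computes it for a list of aligned sequences. It illustrates the general concept only; whether Network's menu option counts or weights pairs in exactly this way is not stated on this page.

```python
from collections import Counter
from itertools import combinations


def mismatch_distribution(sequences):
    """Map each number of pairwise differences to the number of sequence pairs."""
    counts = Counter()
    for a, b in combinations(sequences, 2):
        if len(a) != len(b):
            raise ValueError("sequences must be aligned to equal length")
        differences = sum(x != y for x, y in zip(a, b))
        counts[differences] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    print(mismatch_distribution(["0000", "0001", "0011", "1111"]))
```

A distribution with many pairs at large difference counts suggests long branches, which is the situation in which the text above prefers RM for binary data.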

The first thing you should do with any data file is to call up the Change Weights option within RM or MJ to check whether all your characters or nucleotide positions were entered and weighted correctly. We suggest running an initial analysis with the default settings, that is, r set to 2 if you choose RM, or epsilon set to zero if you choose MJ.

If the networks turn out to be clean (i.e. treelike, and without large cycles), you should experiment with slightly higher settings to visualise the extent of homoplasy (potentially due to recurrent mutations, sequence errors, recombination etc.).

If on the other hand the initial network is messy (high-dimensional cubes) or contains an empty cycle larger than a rectangle (only in MJ networks), then something is amiss (recurrent mutations, sequence errors, recombination). To explore or overcome the problem, activate the frequency>1 option before running the algorithms; this option will select only those sequences confirmed at least twice in the data set. If the network is still messy, you can investigate whether this may be due to a few rapidly mutating characters by consulting the statistics option. These characters are candidates for downweighting before running another analysis. Weighting may be particularly relevant for STRs: the program internally codes the entered STRs assuming a single-repeat mutation mechanism. If this is known to be unrealistic for a given STR, the offending STR should be dealt with by downweighting it as a whole, or by differentially weighting its length transitions (labeled with a, b, c...). In general, weights are often most effective when chosen conservatively: for example, a nucleotide position known to mutate ten times faster should have its weight reduced by much less than a factor of ten in the network calculation.
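The effect of the frequency>1 option can be previewed outside the program. The Python sketch below keeps only those sequence types that occur at least twice in a list of sampled sequences. It mirrors the description above and is not Network's own code; the function name confirmed_types is hypothetical.

```python
from collections import Counter


def confirmed_types(samples):
    """Return the sequence types seen at least twice, with their frequencies."""
    counts = Counter(samples)
    return {sequence: n for sequence, n in counts.items() if n >= 2}


if __name__ == "__main__":
    samples = ["AAC", "AAC", "AAT", "GAT", "GAT", "GAT"]
    print(confirmed_types(samples))
```

Comparing the number of retained types with the total number of types shows how much of the data set consists of singletons, which are the sequences most likely to carry unconfirmed sequence errors.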

If despite these efforts the network still contains many high-dimensional features, then (for binary data files), RM and MJ can be applied sequentially. First, use RM to generate a *.rmf file (the *.out file will also be generated but is of no consequence here); then, apply MJ to the *.rmf file.

7. Large data sets

If your data set contains hundreds of sequences and the corresponding network is consequently difficult to visualise, use the star contraction option prior to the phylogenetic analysis. The star contraction option reduces large data sets to smaller data sets by identifying and contracting any starlike phylogenetic cluster into one ancestral type. The reduced data set can then be run in a phylogenetic algorithm (either our network methods or other tree-building methods) to produce a simplified skeleton phylogeny. In the graphical display, the algorithm remembers which sequences were contracted. An example star contraction analysis is included in the download and is discussed in Forster et al. (2001). Note that the publication contains some typographical errors (which do not influence the presented values or conclusions) on pages 1870/1871: for the Asian analysis the value for delta is set to 5, and the first round of star contraction reduces the data from 245 sequence types to 113 sequence types.

8. Time estimates

Time estimates for nodes in the phylogeny are not available in the DOS version, only in the Windows version.

9. Known bugs

In the case of multistate DNA files, the "Calculate Network" menu of Network2.0c offers an option to change the transition/transversion ratio (default setting 1:1). This option should be used with caution, however, as a different ratio can cause the MJ method to enter a continuous loop during the network calculation. Remedy: If differential weighting for transitions and transversions is desired, identify and weight the relevant nucleotide positions manually.

In multistate DNA file format, there is no coding for deleted nucleotides: "D" stands for "A or G or T" as recommended by the nomenclature committee of the International Union of Biochemistry. (Note however that in the mismatch distribution option, "D" DOES stand for "deletion".) Remedy: Choose a nucleotide (A, G, C, or T) to code a deletion.

In the MJ algorithm for epsilon > 0, network links may be erroneously omitted when calculating complex data sets. Remedy: use Network 4.x.

10. Citation

Please cite this website (fluxus-engineering.com), as well as Bandelt et al. (1995) when using RM, Bandelt et al. (1999) when using MJ, or Forster et al. (2001) when using star contraction.
