
taxidTools.Taxonomy.Taxonomy

Bases: UserDict

Stores Taxonomy nodes and their relationships

A Taxonomy is instantiated as a dictionary and each Node can be accessed by its taxid. A Taxonomy object can be instantiated directly from a dictionary, iteratively with the Taxonomy.addNode method, or from a list of taxdump files.

Attributes:

  • root (Node) –

    the root Node, shared by all Nodes

  • data (dict) –

    data store

Raises:

  • InvalidNodeError

    If trying to access a Node that doesn't exist with a bracket expression

Notes

Taxonomy objects are mutable and some methods will modify the underlying Node objects. Do a deep copy or use the Taxonomy.copy() method if you wish to keep the original object.

A Taxonomy always assumes a unique root node.

See Also

  • Taxonomy.from_list: load a Taxonomy object from a list of Nodes
  • read_taxdump: load a Taxonomy object from taxdump files
  • read_json: load a Taxonomy from a previously exported JSON file
  • Taxonomy.addNode: add a Node to a Taxonomy

Examples:

>>> root = Node(1, "root", "root")
>>> branch1 = Node(11, "node11", "middle", root)
>>> branch2 = Node(12, "node12", "middle", root)
>>> leaf1 = Node(111, "node111", "leaf", branch1)
>>> leaf2 = Node(112, "node112", "leaf", branch1)
>>> leaf3 = Node(121, "node121", "leaf", branch2)
>>> leaf4 = Node(13, "node13", "leaf", root)

From a dictionary of Nodes:

>>> tax = Taxonomy({"1": root,
...     "11": branch1,
...     "12": branch2,
...     "111": leaf1,
...     "112": leaf2,
...     "121": leaf3,
...     "13": leaf4})

Instantiate from a list:

>>> tax = Taxonomy.from_list(
...     [root, branch1, branch2, leaf1, leaf2, leaf3, leaf4])

Or iteratively:

>>> tax = Taxonomy()
>>> for node in [root, branch1, branch2, leaf1, leaf2, leaf3, leaf4]:
...     tax.addNode(node)
...

Or from the taxdump files:

>>> tax = read_taxdump('nodes.dmp', 'rankedlineage.dmp', 'merged.dmp')

root: Node property

Returns the root Node. Assumes a single root shared by all Nodes.

__getitem__(key)

Element getter with brackets

Overrides the default behavior to:

  • raise a specific error on a non-existing key
  • handle MergedNodes by returning the new node
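Both overrides can be sketched on top of UserDict, the documented base class. This is a minimal illustration, not the taxidTools implementation: InvalidNodeError and ToyMerged below are hypothetical stand-ins, and merged taxids are assumed to be stored as records pointing to their replacement taxid.

```python
from collections import UserDict


class InvalidNodeError(Exception):
    """Hypothetical stand-in for the taxidTools exception."""


class ToyMerged:
    """Hypothetical record redirecting a merged taxid to its replacement."""

    def __init__(self, new_taxid):
        self.new_taxid = str(new_taxid)


class ToyTaxonomy(UserDict):
    def __getitem__(self, key):
        key = str(key)  # accept int or str taxids; keys are assumed to be str
        try:
            value = super().__getitem__(key)
        except KeyError:
            # Replace the generic KeyError with a more specific error
            raise InvalidNodeError(f"Node {key} not found") from None
        if isinstance(value, ToyMerged):
            # Follow the redirection; recursion also handles chained merges
            return self[value.new_taxid]
        return value


tax = ToyTaxonomy({"1": "root", "2": ToyMerged(1)})
print(tax[2])  # root
```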

addNode(node)

Add a Node to an existing Taxonomy object.

The Node taxid will be used as the key to access the element.

Parameters:

  • node (Node) –

    A Node to add to the Taxonomy

Examples:

>>> tax = Taxonomy()
>>> tax.addNode(Node(1))

consensus(taxid_list, min_consensus, ignore_missing=False)

Find a taxonomic consensus for the given taxids with a minimal agreement level.

Parameters:

  • taxid_list (list[Union[str, int]]) –

    list of taxonomic identification numbers

  • min_consensus (float) –

    minimal consensus level, between 0.5 and 1. Note that a minimal consensus of 1 will return the same result as lca()

  • ignore_missing (bool, default: False ) –

    if True, missing taxids are ignored in the analysis. If False (default), an error is raised on missing taxids

Returns:

  • _BaseNode

Raises:

  • ValueError

    If taxid_list contains no valid taxid and ignore_missing is True

  • InvalidNodeError

    If taxid_list contains invalid taxids and ignore_missing is False

Notes

If no consensus can be found (for example because the Taxonomy contains multiple trees), an IndexError will be raised.

See Also

Taxonomy.lca

Examples:

>>> node0 = Node(taxid = 0, name = "root",
...              rank = "root", parent = None)
>>> node1 = Node(taxid = 1, name = "node1",
...              rank = "rank1", parent = node0)
>>> node2 = Node(taxid = 2, name = "node2",
...              rank = "rank1", parent = node0)
>>> node11 = Node(taxid = 11, name = "node11",
...               rank = "rank2", parent = node1)
>>> node12 = Node(taxid = 12, name = "node12",
...               rank = "rank2", parent = node1)
>>> tax = Taxonomy.from_list([node0, node1, node2, node11, node12])
>>> tax.consensus([11, 12, 2], 0.8)
Node(0)
>>> tax.consensus([11, 12, 2], 0.6)
Node(1)
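One way to read the consensus rule: count, for every ancestor, how many input taxids fall under it, then return the most specific ancestor whose share reaches min_consensus. The sketch below assumes a hypothetical parents mapping (str taxid to parent taxid, None for the root) instead of Node objects; it reproduces the outputs of the example above but is not the library's implementation.

```python
from collections import Counter


def lineage(parents, taxid):
    """Return [taxid, parent, ..., root] for a str-keyed parents mapping."""
    path, current = [], str(taxid)
    while current is not None:
        path.append(current)
        current = parents[current]
    return path


def consensus(parents, taxid_list, min_consensus):
    lineages = [lineage(parents, t) for t in taxid_list]
    # How many input taxids sit at or below each node
    support = Counter(node for path in lineages for node in path)
    candidates = [node for node, count in support.items()
                  if count / len(lineages) >= min_consensus]
    # Above 0.5, qualifying nodes form a single chain: return the deepest one
    return max(candidates, key=lambda node: len(lineage(parents, node)))


parents = {"0": None, "1": "0", "2": "0", "11": "1", "12": "1"}
print(consensus(parents, [11, 12, 2], 0.8))  # 0
print(consensus(parents, [11, 12, 2], 0.6))  # 1
```

With min_consensus set to 1, only nodes shared by every input qualify, which is why the result coincides with the lowest common ancestor.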

copy()

Create a deepcopy of the current Taxonomy instance.

Equivalent to running copy.deepcopy()
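Because Nodes hold references to their parents, a deep copy is what keeps the copy independent: copy.deepcopy copies each object once and reuses that copy wherever the original was referenced, so links between copied nodes stay consistent. A minimal illustration with a hypothetical ToyNode class standing in for Node:

```python
import copy


class ToyNode:
    """Hypothetical minimal node holding a reference to its parent."""

    def __init__(self, taxid, parent=None):
        self.taxid = taxid
        self.parent = parent


root = ToyNode("1")
child = ToyNode("2", root)
tax = {"1": root, "2": child}

clone = copy.deepcopy(tax)
# deepcopy's memo copies root only once, so the copied child
# points at the copied root, not at the original one
links_preserved = clone["2"].parent is clone["1"]
print(links_preserved)  # True

# Mutating the copy leaves the original untouched
clone["2"].parent = None
print(tax["2"].parent is root)  # True
```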

Returns:

  • Taxonomy

distance(taxid1, taxid2)

Measures the distance between two nodes as the number of edges on the path connecting them.

Parameters:

  • taxid1 (Union[str, int]) –

    Taxonomic identification number

  • taxid2 (Union[str, int]) –

    Taxonomic identification number

Returns:

  • int

Examples:

>>> node0 = Node(taxid = 0, name = "root",
...              rank = "root", parent = None)
>>> node1 = Node(taxid = 1, name = "node1",
...              rank = "rank1", parent = node0)
>>> node2 = Node(taxid = 2, name = "node2",
...              rank = "rank1", parent = node0)
>>> node11 = Node(taxid = 11, name = "node11",
...               rank = "rank2", parent = node1)
>>> node12 = Node(taxid = 12, name = "node12",
...               rank = "rank2", parent = node1)
>>> tax = Taxonomy.from_list([node0, node1, node2, node11, node12])
>>> tax.distance(11, 2)
3
>>> tax.distance(11, 12)
2
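The distances in the example are edge counts: from 11 up to the common ancestor 0 is two steps, and from 2 up to 0 is one step, giving 3. A sketch of that computation, assuming a hypothetical parents mapping (str taxid to parent taxid, None for the root) rather than the taxidTools internals:

```python
def lineage(parents, taxid):
    """Return [taxid, parent, ..., root] for a str-keyed parents mapping."""
    path, current = [], str(taxid)
    while current is not None:
        path.append(current)
        current = parents[current]
    return path


def distance(parents, taxid1, taxid2):
    path1, path2 = lineage(parents, taxid1), lineage(parents, taxid2)
    on_path2 = set(path2)
    # The first node of path1 that is also on path2 is the lowest common ancestor
    lca = next(node for node in path1 if node in on_path2)
    # Steps up from each node to that ancestor, summed
    return path1.index(lca) + path2.index(lca)


parents = {"0": None, "1": "0", "2": "0", "11": "1", "12": "1"}
print(distance(parents, 11, 2))   # 3
print(distance(parents, 11, 12))  # 2
```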

filterRanks(ranks=linne(), inplace=True)

Filter a Taxonomy to keep only the ranks provided as arguments.

By default, modifies the Taxonomy in place to keep only the Nodes at the requested ranks. Nodes are modified to preserve linkage in the Taxonomy.

Parameters:

  • ranks (Optional[list[str]], default: linne() ) –

    List of ranks to keep. Must be sorted in ascending order, from the lowest rank to the highest.

  • inplace (Optional[bool], default: True ) –

    If True, perform the operation in place and mutate the underlying objects. If False, return a modified copy of the instance and keep the original unchanged.

Returns:

  • Optional[Taxonomy]

    None if inplace is True, otherwise a filtered copy of the Taxonomy

Notes

To keep the Taxonomy anchored, the root node is always kept.

Examples:

>>> node1 = Node(1, rank = "root")
>>> node11 = Node(11, rank = "rank1", parent = node1)
>>> node111 = Node(111, rank = "rank2", parent = node11)
>>> node001 = Node('001', rank = "rank2", parent = node1)
>>> tax = Taxonomy.from_list([node1, node11, node111, node001])
>>> tax.filterRanks(['rank2', 'rank1', 'root'])
>>> tax
{Node(1), Node(11), DummyNode(tO841ymu), Node(111), Node(001)}

DummyNodes are created as placeholders for missing ranks in the taxonomy:

>>> node001.parent
DummyNode(tO841ymu)

Note that the root will be kept regardless of the input:

>>> node1 = Node(1, rank = "root")
>>> node11 = Node(11, rank = "rank1", parent = node1)
>>> node111 = Node(111, rank = "rank2", parent = node11)
>>> node001 = Node('001', rank = "rank2", parent = node1)
>>> tax = Taxonomy.from_list([node1, node11, node111, node001])
>>> tax.filterRanks(['rank2', 'rank1'])
>>> tax
{DummyNode(wmnar5QT), Node(001), Node(1), Node(11), Node(111)}

It is also possible to keep the original instance intact and return a filtered copy:

>>> new = tax.filterRanks(['rank1'], inplace=False)
>>> new
{DummyNode(wmnar5QT), Node(1), Node(11)}
>>> tax
{DummyNode(wmnar5QT), Node(001), Node(1), Node(11), Node(111)}
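The re-linking step can be sketched as: keep the root and every node whose rank is requested, then attach each kept node to its nearest kept ancestor. This simplified sketch assumes hypothetical parents and ranks_of mappings and, unlike filterRanks, does not create DummyNodes for missing ranks, so node 001 ends up attached directly to the root.

```python
def filter_ranks(parents, ranks_of, keep):
    """Keep the root plus nodes whose rank is in keep, re-linking each
    kept node to its nearest kept ancestor (no DummyNodes are created)."""
    keep = set(keep)

    def is_kept(taxid):
        return parents[taxid] is None or ranks_of[taxid] in keep

    filtered = {}
    for taxid, parent in parents.items():
        if not is_kept(taxid):
            continue
        # Skip over removed ancestors until a kept one is found
        while parent is not None and not is_kept(parent):
            parent = parents[parent]
        filtered[taxid] = parent
    return filtered


parents = {"1": None, "11": "1", "111": "11", "001": "1"}
ranks_of = {"1": "root", "11": "rank1", "111": "rank2", "001": "rank2"}
print(filter_ranks(parents, ranks_of, ["rank2"]))
# {'1': None, '111': '1', '001': '1'}
```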

from_list(node_list) classmethod

Create a Taxonomy object from a list of Nodes

Convert a list of Nodes into a valid Taxonomy object where each Node can be accessed using its taxid as key.

Parameters:

  • node_list (list[_BaseNode]) –

    List of Node objects

Returns:

  • Taxonomy

Examples:

>>> txd = Taxonomy.from_list([Node(1), Node(2)])

getAncestry(taxid)

Retrieve the ancestry of the given taxid

Parameters:

  • taxid (Union[str, int]) –

    Taxonomic identification number

Returns:

  • Lineage

Examples:

>>> root = Node(1, "root", "root")
>>> node = Node(2, "node", "rank", root)
>>> tax = Taxonomy({'1': root, '2': node})
>>> tax.getAncestry(2)
Lineage([Node(2), Node(1)])
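Retrieving an ancestry amounts to following parent links until the root is reached. A minimal sketch with a hypothetical ToyNode dataclass standing in for Node; it returns a plain list of taxids rather than a Lineage object:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyNode:
    taxid: str
    parent: Optional["ToyNode"] = None


def get_ancestry(node):
    """Collect taxids from node up to the root (whose parent is None)."""
    taxids = []
    while node is not None:
        taxids.append(node.taxid)
        node = node.parent
    return taxids


root = ToyNode("1")
node = ToyNode("2", root)
print(get_ancestry(node))  # ['2', '1']
```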

getChildren(taxid, value=None)

Retrieve the children Nodes

Parameters:

  • taxid (Union[str, int]) –

    Taxonomic identification number

  • value (Optional[Any], default: None ) –

    A value to return if taxid does not exist

Returns:

  • list

Examples:

>>> root = Node(1, "root", "root")
>>> node = Node(2, "node", "rank", root)
>>> tax = Taxonomy({'1': root, '2': node})
>>> tax.getChildren(1)
[Node(2)]

getName(taxid, value=None)

Get taxid name

Parameters:

  • taxid (Union[str, int]) –

    Taxonomic identification number

  • value (Optional[Any], default: None ) –

    A value to return if taxid does not exist

Returns:

  • str

Examples:

>>> node = Node(1, "node", "rank")
>>> tax = Taxonomy({'1':node})
>>> tax.getName(1)
'node'

getParent(taxid, value=None)

Retrieve parent Node

Parameters:

  • taxid (Union[str, int]) –

    Taxonomic identification number

  • value (Optional[Any], default: None ) –

    A value to return if taxid does not exist

Returns:

  • _BaseNode

Examples:

>>> root = Node(1, "root", "root")
>>> node = Node(2, "node", "rank", root)
>>> tax = Taxonomy({'1': root, '2': node})
>>> tax.getParent(2)
Node(1)

getRank(taxid, value=None)

Get taxid rank

Parameters:

  • taxid (Union[str, int]) –

    Taxonomic identification number

  • value (Optional[Any], default: None ) –

    A value to return if taxid does not exist

Returns:

  • str

Examples:

>>> node = Node(1, "node", "rank")
>>> tax = Taxonomy({'1':node})
>>> tax.getRank(1)
'rank'

getTaxid(name, value=None)

Get taxid from name

Parameters:

  • name (Union[int, str]) –

    Node name

  • value (Optional[Any], default: None ) –

    A value to return if name does not exist

Returns:

  • str

Examples:

>>> node = Node(1, "node", "rank")
>>> tax = Taxonomy({'1':node})
>>> tax.getTaxid('node')
'1'

isAncestorOf(taxid, child)

Test if taxid is an ancestor of child

Parameters:

  • taxid (Union[str, int]) –

    Taxonomic identification number

  • child (Union[str, int]) –

    Taxonomic identification number

Returns:

  • bool

See Also

Taxonomy.isDescendantOf

Examples:

>>> root = Node(1, "root", "root")
>>> node = Node(2, "node", "rank", root)
>>> tax = Taxonomy({'1': root, '2': node})
>>> tax.isAncestorOf(1, 2)
True
>>> tax.isAncestorOf(2, 1)
False

isDescendantOf(taxid, parent)

Test if taxid is a descendant of parent

Parameters:

  • taxid (Union[str, int]) –

    Taxonomic identification number

  • parent (Union[str, int]) –

    Taxonomic identification number

Returns:

  • bool

See Also

Taxonomy.isAncestorOf

Examples:

>>> root = Node(1, "root", "root")
>>> node = Node(2, "node", "rank", root)
>>> tax = Taxonomy({'1': root, '2': node})
>>> tax.isDescendantOf(1, 2)
False
>>> tax.isDescendantOf(2, 1)
True
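Both tests reduce to walking up from the lower node, and each is the mirror of the other. A sketch assuming a hypothetical parents mapping (str taxid to parent taxid, None for the root); it treats ancestry as strict, so a node is not its own ancestor, which may differ from taxidTools:

```python
def is_ancestor_of(parents, taxid, child):
    """True if taxid lies strictly above child in child's lineage."""
    taxid, current = str(taxid), parents[str(child)]
    while current is not None:
        if current == taxid:
            return True
        current = parents[current]
    return False


def is_descendant_of(parents, taxid, parent):
    # Mirror relation: taxid descends from parent iff parent is its ancestor
    return is_ancestor_of(parents, parent, taxid)


parents = {"1": None, "2": "1"}
print(is_ancestor_of(parents, 1, 2), is_descendant_of(parents, 1, 2))  # True False
```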

lca(taxid_list, ignore_missing=False)

Get the lowest common ancestor of a list of taxids

Parameters:

  • taxid_list (list[Union[str, int]]) –

    list of taxonomic identification numbers

  • ignore_missing (bool, default: False ) –

    if True, missing taxids are ignored in the analysis. If False (default), an error is raised on missing taxids

Returns:

  • _BaseNode

Raises:

  • ValueError

    If taxid_list contains no valid taxid and ignore_missing is True

  • InvalidNodeError

    If taxid_list contains invalid taxids and ignore_missing is False

See Also

Taxonomy.consensus

Examples:

>>> node0 = Node(taxid = 0, name = "root",
...              rank = "root", parent = None)
>>> node1 = Node(taxid = 1, name = "node1",
...              rank = "rank1", parent = node0)
>>> node2 = Node(taxid = 2, name = "node2",
...              rank = "rank1", parent = node0)
>>> node11 = Node(taxid = 11, name = "node11",
...               rank = "rank2", parent = node1)
>>> node12 = Node(taxid = 12, name = "node12",
...               rank = "rank2", parent = node1)
>>> tax = Taxonomy.from_list([node0, node1, node2, node11, node12])
>>> tax.lca([11, 12, 2])
Node(0)
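The lowest common ancestor is the first node, walking up from any input, that appears in every input's lineage. A sketch including the ignore_missing handling described above, assuming a hypothetical parents mapping (str taxid to parent taxid, None for the root); it raises KeyError where taxidTools raises InvalidNodeError, to stay standalone:

```python
def lineage(parents, taxid):
    """Return [taxid, parent, ..., root] for a str-keyed parents mapping."""
    path, current = [], str(taxid)
    while current is not None:
        path.append(current)
        current = parents[current]
    return path


def lca(parents, taxid_list, ignore_missing=False):
    taxids = [str(t) for t in taxid_list]
    missing = [t for t in taxids if t not in parents]
    if missing and not ignore_missing:
        # taxidTools raises InvalidNodeError here; KeyError is a stand-in
        raise KeyError(f"unknown taxids: {missing}")
    valid = [t for t in taxids if t in parents]
    if not valid:
        raise ValueError("taxid_list contains no valid taxid")
    paths = [lineage(parents, t) for t in valid]
    shared = set(paths[0]).intersection(*paths[1:])
    # Walking up from any input, the first shared node is the lowest one
    return next(node for node in paths[0] if node in shared)


parents = {"0": None, "1": "0", "2": "0", "11": "1", "12": "1"}
print(lca(parents, [11, 12, 2]))  # 0
print(lca(parents, [11, 12]))     # 1
```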

listDescendant(taxid, ranks=None)

List all descendants of a node

Parameters:

  • taxid (Union[str, int]) –

    Taxonomic identification number

  • ranks (Optional[list], default: None ) –

    list of ranks for which to return nodes

Returns:

  • list

Examples:

>>> node0 = Node(taxid = 0, name = "root",
...              rank = "root", parent = None)
>>> node1 = Node(taxid = 1, name = "node1",
...              rank = "rank1", parent = node0)
>>> node2 = Node(taxid = 2, name = "node2",
...              rank = "rank1", parent = node0)
>>> node11 = Node(taxid = 11, name = "node11",
...               rank = "rank2", parent = node1)
>>> node12 = Node(taxid = 12, name = "node12",
...               rank = "rank2", parent = node1)
>>> tax = Taxonomy.from_list([node0, node1, node2, node11, node12])
>>> tax.listDescendant(1)
[Node(11), Node(12)]
>>> tax.listDescendant(2)
[]
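Listing descendants requires going downwards, so parent links are first inverted into a children map, then traversed breadth-first with an optional rank filter. A sketch assuming hypothetical parents and ranks_of mappings; the traversal order is a choice of this sketch, not a documented guarantee of listDescendant:

```python
from collections import defaultdict, deque


def list_descendants(parents, ranks_of, taxid, ranks=None):
    # Invert the parent links once to get each node's children
    children = defaultdict(list)
    for child, parent in parents.items():
        if parent is not None:
            children[parent].append(child)
    found, queue = [], deque(children[str(taxid)])
    while queue:
        node = queue.popleft()  # breadth-first: closer descendants first
        queue.extend(children[node])
        if ranks is None or ranks_of[node] in ranks:
            found.append(node)
    return found


parents = {"0": None, "1": "0", "2": "0", "11": "1", "12": "1"}
ranks_of = {"0": "root", "1": "rank1", "2": "rank1", "11": "rank2", "12": "rank2"}
print(list_descendants(parents, ranks_of, 1))                   # ['11', '12']
print(list_descendants(parents, ranks_of, 0, ranks=["rank2"]))  # ['11', '12']
```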

prune(taxid, inplace=True)

Prune the Taxonomy at the given taxid

Nodes outside the lineage of the given taxid (its ancestors and descendants) will be discarded. The ancestors of the given taxid are kept.

Parameters:

  • taxid (Union[str, int]) –

    taxid whose Lineage to keep

  • inplace (Optional[bool], default: True ) –

    If True, perform the operation in place and mutate the underlying objects. If False, return a modified copy of the instance and keep the original unchanged.

Returns:

  • Optional[Taxonomy]

    None if inplace is True, otherwise a pruned copy of the Taxonomy

Examples:

>>> node0 = Node(taxid = 0, name = "root",
...              rank = "root", parent = None)
>>> node1 = Node(taxid = 1, name = "node1",
...              rank = "rank1", parent = node0)
>>> node2 = Node(taxid = 2, name = "node2",
...              rank = "rank1", parent = node0)
>>> node11 = Node(taxid = 11, name = "node11",
...               rank = "rank2", parent = node1)
>>> node12 = Node(taxid = 12, name = "node12",
...               rank = "rank2", parent = node1)
>>> tax = Taxonomy.from_list([node0, node1, node2, node11, node12])
>>> tax.prune(1)

Ancestry is kept:

>>> tax.getAncestry(11)
Lineage([Node(11), Node(1), Node(0)])

But other branches are gone:

>>> print(tax.get('2'))
None

We can also keep the original Taxonomy intact and prune a copy instead:

>>> new = tax.prune(11, inplace=False)
>>> print(new.get('12'))
None
>>> tax.getAncestry('12')
Lineage([Node(12), Node(1), Node(0)])
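Pruning keeps two sets of nodes: the given taxid with its ancestors, and every node whose lineage passes through the given taxid. A sketch on a hypothetical parents mapping (str taxid to parent taxid, None for the root) that returns a new mapping and leaves the input untouched, mirroring inplace=False; recomputing each lineage is simple but quadratic, which is fine for illustration only:

```python
def lineage(parents, taxid):
    """Return [taxid, parent, ..., root] for a str-keyed parents mapping."""
    path, current = [], str(taxid)
    while current is not None:
        path.append(current)
        current = parents[current]
    return path


def prune(parents, taxid):
    taxid = str(taxid)
    keep = set(lineage(parents, taxid))  # taxid and its ancestors
    # Descendants: nodes whose lineage passes through taxid
    keep.update(node for node in parents if taxid in lineage(parents, node))
    return {node: parent for node, parent in parents.items() if node in keep}


parents = {"0": None, "1": "0", "2": "0", "11": "1", "12": "1"}
print(prune(parents, 1))  # {'0': None, '1': '0', '11': '1', '12': '1'}
```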

toNewick(names='name')

Generate a Newick string from the current taxonomy

Export as a Newick tree string for compatibility with other packages. The string can be imported in ETE with format 8 (all names). This is an experimental feature.

Parameters:

  • names (str, default: 'name' ) –

    Node attribute to use as node name, choice of 'name' or 'taxid'

Returns:

  • str
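A Newick string can be produced by a recursive walk that writes each node's children in parentheses followed by the node's own label, so that internal nodes are named as well as leaves, in line with the "all names" format mentioned above. A minimal sketch over hypothetical parents and names mappings; it does not quote labels, so names containing spaces, commas, or parentheses would need escaping:

```python
from collections import defaultdict


def to_newick(parents, names):
    """Serialize a parents mapping as a Newick string labelling every node."""
    children, root = defaultdict(list), None
    for child, parent in parents.items():
        if parent is None:
            root = child
        else:
            children[parent].append(child)

    def render(node):
        if not children[node]:
            return names[node]  # leaf: just its label
        inner = ",".join(render(c) for c in children[node])
        return f"({inner}){names[node]}"  # internal node: (children)label

    return render(root) + ";"


parents = {"0": None, "1": "0", "2": "0", "11": "1", "12": "1"}
names = {"0": "root", "1": "node1", "2": "node2", "11": "node11", "12": "node12"}
print(to_newick(parents, names))  # ((node11,node12)node1,node2)root;
```

Passing {t: t for t in parents} as names labels nodes by taxid instead, analogous to names='taxid'.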

write(path)

Write taxonomy to a JSON file.

Parameters:

  • path (str) –

    File path for the output

See Also

taxidTools.read_json