taxidTools.Taxonomy module

Taxonomy object definition

class taxidTools.Taxonomy.Taxonomy(*args, **kwargs)[source]

Bases: UserDict

Store Taxonomy nodes

A Taxonomy is instantiated as a dictionary, and each Node can be accessed by its taxid. A Taxonomy object can be instantiated directly from a dictionary, iteratively with the Taxonomy.addNode method, or from a list of taxdump files.

Notes

Taxonomy objects are mutable and some methods will modify the underlying Node objects. Do a deep copy if you wish to keep the original object.

A Taxonomy always assumes a unique root node.
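
The note on mutability can be illustrated with a minimal sketch. It uses only the standard library: StubNode and the plain dict are hypothetical stand-ins for Node and Taxonomy, not the taxidTools classes themselves, so treat it as an illustration of why a deep copy is needed rather than of the library's behaviour.

```python
import copy


class StubNode:
    """Hypothetical stand-in for a taxonomy node (not the taxidTools Node)."""

    def __init__(self, taxid, parent=None):
        self.taxid = taxid
        self.parent = parent


root = StubNode(1)
child = StubNode(2, parent=root)
original = {"1": root, "2": child}

# A shallow copy shares the node objects; a deep copy duplicates them.
shallow = dict(original)
deep = copy.deepcopy(original)

# Mutate a node through the original mapping.
original["2"].parent = None

# The shallow copy sees the change because it holds the same node object.
shallow_sees_change = shallow["2"].parent is None
# The deep copy keeps its own nodes, and the parent link points at the copied root.
deep_keeps_parent = deep["2"].parent is deep["1"]
```

Because copy.deepcopy tracks objects it has already copied, the parent links inside the copy point at the copied nodes, not at the originals.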

See also

Taxonomy.from_taxdump

load a Taxonomy object from taxdump files

Taxonomy.from_list

load a Taxonomy object from a list of Node

Taxonomy.from_json

load a Taxonomy from a previously exported json file

Taxonomy.addNode

add a Node to a Taxonomy

Examples

>>> root = Node(1, "root", "root")
>>> branch1 = Node(11, "node11", "middle", root)
>>> branch2 = Node(12, "node12", "middle", root)
>>> leaf1 = Node(111, "node111", "leaf", branch1)
>>> leaf2 = Node(112, "node112", "leaf", branch1)
>>> leaf3 = Node(121, "node121", "leaf", branch2)
>>> leaf4 = Node(13, "node13", "leaf", root)
>>> tax = Taxonomy({"1" : root,
...     11: branch1,
...     12: branch2,
...     111: leaf1,
...     112: leaf2,
...     121: leaf3,
...     13: leaf4})

Instantiate from a list:

>>> tax = Taxonomy.from_list(
...     [root, branch1, branch2, leaf1, leaf2, leaf3, leaf4])

Or iteratively:

>>> tax = Taxonomy()
>>> for node in [root, branch1, branch2, leaf1, leaf2, leaf3, leaf4]:
...     tax.addNode(node)
...

Or from the taxdump files:

>>> tax = Taxonomy.from_taxdump("nodes.dmp", "rankedlineage.dmp")
addNode(node)[source]

Add a Node to an existing Taxonomy object.

The Node taxid will be used as the key to access the element.

Parameters:

node (Node) – A Node to add to the Taxonomy

Return type:

None

Examples

>>> tax = Taxonomy()
>>> tax.addNode(Node(1))
consensus(taxid_list, min_consensus)[source]

Find a taxonomic consensus for the given taxids with a minimum agreement level.

Parameters:
  • taxid_list (list[Union[str, int]]) – list of taxonomic identification numbers

  • min_consensus (float) – minimum consensus level, between 0.5 and 1. Note that a value of 1 returns the same result as lca()

Return type:

Node

Notes

If no consensus can be found (for example because the Taxonomy contains multiple trees), an IndexError will be raised.

See also

Taxonomy.lca

Examples

>>> node0 = Node(taxid = 0, name = "root",
...              rank = "root", parent = None)
>>> node1 = Node(taxid = 1, name = "node1",
...              rank = "rank1", parent = node0)
>>> node2 = Node(taxid = 2, name = "node2",
...              rank = "rank1", parent = node0)
>>> node11 = Node(taxid = 11, name = "node11",
...               rank = "rank2", parent = node1)
>>> node12 = Node(taxid = 12, name = "node12",
...               rank = "rank2", parent = node1)
>>> tax = Taxonomy.from_list([node0, node1, node2, node11, node12])
>>> tax.consensus([11, 12, 2], 0.8)
Node(0)
>>> tax.consensus([11, 12, 2], 0.6)
Node(1)
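
To make the effect of min_consensus concrete, here is a minimal standard-library sketch of one way such a consensus could be computed. It is an assumption about the general technique, not the taxidTools implementation: PARENT, lineage and consensus_sketch are hypothetical names, and the tree is a plain child-to-parent map mirroring the example above.

```python
from collections import Counter

# Hypothetical child -> parent map mirroring the example tree; None marks the root.
PARENT = {0: None, 1: 0, 2: 0, 11: 1, 12: 1}


def lineage(taxid):
    """Return [taxid, parent, ..., root]."""
    path = []
    while taxid is not None:
        path.append(taxid)
        taxid = PARENT[taxid]
    return path


def consensus_sketch(taxids, min_consensus):
    """Return the deepest node whose lineage covers at least min_consensus of the taxids."""
    # Count, for every node, how many of the input taxids have it in their lineage.
    counts = Counter(node for t in taxids for node in lineage(t))
    needed = min_consensus * len(taxids)
    supported = [node for node, count in counts.items() if count >= needed]
    # The deepest supported node is the one with the longest lineage.
    return max(supported, key=lambda node: len(lineage(node)))


high = consensus_sketch([11, 12, 2], 0.8)  # only the root covers all three taxids
low = consensus_sketch([11, 12, 2], 0.6)   # node 1 covers two out of three
```

In this sketch a threshold of 1 requires every taxid to agree, which reduces to the lowest common ancestor of the inputs.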
distance(taxid1, taxid2)[source]

Measure the distance between two nodes, counted as the number of edges on the path between them.

Parameters:
  • taxid1 (Union[str, int]) – Taxonomic identification number

  • taxid2 (Union[str, int]) – Taxonomic identification number

Return type:

int

Examples

>>> node0 = Node(taxid = 0, name = "root",
...              rank = "root", parent = None)
>>> node1 = Node(taxid = 1, name = "node1",
...              rank = "rank1", parent = node0)
>>> node2 = Node(taxid = 2, name = "node2",
...              rank = "rank1", parent = node0)
>>> node11 = Node(taxid = 11, name = "node11",
...               rank = "rank2", parent = node1)
>>> node12 = Node(taxid = 12, name = "node12",
...               rank = "rank2", parent = node1)
>>> tax = Taxonomy.from_list([node0, node1, node2, node11, node12])
>>> tax.distance(11, 2)
3
>>> tax.distance(11, 12)
2
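
The distances in this example can be reproduced by counting edges up to the first shared ancestor. The following is a sketch under that assumption, using a hypothetical child-to-parent map (PARENT) and helper names that are not part of taxidTools.

```python
# Hypothetical child -> parent map mirroring the example tree; None marks the root.
PARENT = {0: None, 1: 0, 2: 0, 11: 1, 12: 1}


def lineage(taxid):
    """Return [taxid, parent, ..., root]."""
    path = []
    while taxid is not None:
        path.append(taxid)
        taxid = PARENT[taxid]
    return path


def distance_sketch(a, b):
    """Count the edges on the path a -> first shared ancestor -> b."""
    path_a, path_b = lineage(a), lineage(b)
    # Number of steps needed to reach each node when climbing from b.
    steps_from_b = {node: i for i, node in enumerate(path_b)}
    for steps_from_a, node in enumerate(path_a):
        if node in steps_from_b:
            return steps_from_a + steps_from_b[node]
    raise ValueError("the two nodes share no ancestor")


d_far = distance_sketch(11, 2)    # 11 -> 1 -> 0 -> 2
d_near = distance_sketch(11, 12)  # 11 -> 1 -> 12
```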
filterRanks(ranks=['species', 'genus', 'family', 'order', 'class', 'phylum', 'kingdom'])[source]

Filter a Taxonomy to keep only the ranks provided as arguments.

Modifies Taxonomy in-place to keep only the Nodes at the requested ranks. Nodes will be modified to conserve linkage in the Taxonomy.

Parameters:

ranks (list[str]) – List of ranks to keep. Must be sorted in ascending order, from the lowest rank (e.g. species) to the highest.

Return type:

None

Notes

In order to enforce anchoring of the Taxonomy, the root node will always be kept.

Examples

>>> node1 = Node(1, rank = "root")
>>> node11 = Node(11, rank = "rank1", parent = node1)
>>> node111 = Node(111, rank = "rank2", parent = node11)
>>> node001 = Node('001', rank = "rank2", parent = node1)
>>> tax = Taxonomy.from_list([node1, node11, node111, node001])
>>> tax.filterRanks(['rank2', 'rank1', 'root'])
>>> tax
{Node(1), Node(11), DummyNode(tO841ymu), Node(111), Node(001)}

DummyNodes are created as placeholders for missing ranks in the taxonomy:

>>> node001.parent
DummyNode(tO841ymu)

Note that the root will be kept regardless of the input:

>>> node1 = Node(1, rank = "root")
>>> node11 = Node(11, rank = "rank1", parent = node1)
>>> node111 = Node(111, rank = "rank2", parent = node11)
>>> node001 = Node('001', rank = "rank2", parent = node1)
>>> tax = Taxonomy.from_list([node1, node11, node111, node001])
>>> tax.filterRanks(['rank2', 'rank1'])
>>> tax
{DummyNode(wmnar5QT), Node(001), Node(1), Node(11), Node(111)}
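
The relinking idea can be sketched with plain tuples. This is a simplified illustration, not the taxidTools algorithm: NODES, filter_ranks_sketch and the "dummyN" identifiers are hypothetical, the sketch returns a new mapping instead of modifying in place, it assumes the rank of the root is part of ranks, and it creates a separate placeholder chain for every node even where the library might share placeholders.

```python
import itertools

# Hypothetical flat records: taxid -> (rank, parent taxid); None marks the root.
NODES = {
    "1": ("root", None),
    "11": ("rank1", "1"),
    "111": ("rank2", "11"),
    "001": ("rank2", "1"),
}


def filter_ranks_sketch(nodes, ranks):
    """Keep nodes whose rank is listed and pad skipped ranks with placeholders."""
    level = {rank: i for i, rank in enumerate(ranks)}  # ranks are in ascending order
    dummy_ids = (f"dummy{i}" for i in itertools.count())
    kept = {}
    for taxid, (rank, parent) in nodes.items():
        if rank not in level:
            continue
        # Climb to the nearest ancestor that has one of the requested ranks.
        while parent is not None and nodes[parent][0] not in level:
            parent = nodes[parent][1]
        kept[taxid] = (rank, parent)
        if parent is None:
            continue
        # Insert one placeholder per requested rank skipped between node and ancestor.
        child = taxid
        for missing in ranks[level[rank] + 1:level[nodes[parent][0]]]:
            dummy = next(dummy_ids)
            kept[dummy] = (missing, parent)
            kept[child] = (kept[child][0], dummy)
            child = dummy
    return kept


filtered = filter_ranks_sketch(NODES, ["rank2", "rank1", "root"])
```

Here node "001" sits directly under the root, so the sketch gives it a "rank1" placeholder parent, which mirrors the DummyNode shown in the example output.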
classmethod from_json(path)[source]

Load a Taxonomy from a previously exported json file.

Parameters:

path (str) – Path of file to load

Return type:

Taxonomy

See also

Taxonomy.write

classmethod from_list(node_list)[source]

Create a Taxonomy object from a list of Nodes

Convert a list of Nodes into a valid Taxonomy object where each Node can be accessed using its taxid as key.

Parameters:

node_list (list[_BaseNode]) – List of Node objects

Return type:

Taxonomy

Examples

>>> txd = Taxonomy.from_list([Node(1), Node(2)])
classmethod from_taxdump(nodes, rankedlineage)[source]

Create a Taxonomy object from the NCBI taxdump files

Load the taxonomic information from the nodes.dmp and rankedlineage.dmp files available from the NCBI servers.

Parameters:
  • nodes (str) – Path to the nodes.dmp file

  • rankedlineage (str) – Path to the rankedlineage.dmp file

Return type:

Taxonomy

Examples

>>> tax = Taxonomy.from_taxdump("nodes.dmp", "rankedlineage.dmp")
getAncestry(taxid)[source]

Retrieve the ancestry of the given taxid

Parameters:

taxid (Union[str, int]) – Taxonomic identification number

Return type:

Lineage

Examples

>>> root = Node(1, "root", "root")
>>> node = Node(2, "node", "rank", root)
>>> tax = Taxonomy({'1': root, '2': node})
>>> tax.getAncestry(2)
Lineage([Node(2), Node(1)])
getChildren(taxid)[source]

Retrieve the children Nodes

Parameters:

taxid (Union[str, int]) – Taxonomic identification number

Return type:

list[Node]

Examples

>>> root = Node(1, "root", "root")
>>> node = Node(2, "node", "rank", root)
>>> tax = Taxonomy({'1': root, '2': node})
>>> tax.getChildren(1)
[Node(2)]
getName(taxid)[source]

Get taxid name

Parameters:

taxid (Union[str, int]) – Taxonomic identification number

Return type:

str

Examples

>>> node = Node(1, "node", "rank")
>>> tax = Taxonomy({'1':node})
>>> tax.getName(1)
'node'
getParent(taxid)[source]

Retrieve parent Node

Parameters:

taxid (Union[str, int]) – Taxonomic identification number

Return type:

Node

Examples

>>> root = Node(1, "root", "root")
>>> node = Node(2, "node", "rank", root)
>>> tax = Taxonomy({'1': root, '2': node})
>>> tax.getParent(2)
Node(1)
getRank(taxid)[source]

Get taxid rank

Parameters:

taxid (Union[str, int]) – Taxonomic identification number

Return type:

str

Examples

>>> node = Node(1, "node", "rank")
>>> tax = Taxonomy({'1':node})
>>> tax.getRank(1)
'rank'
getTaxid(name)[source]

Get taxid from name

Parameters:

name (str) – Node name

Return type:

str

Examples

>>> node = Node(1, "node", "rank")
>>> tax = Taxonomy({'1':node})
>>> tax.getTaxid('node')
'1'
isAncestorOf(taxid, child)[source]

Test if taxid is an ancestor of child

Parameters:
  • taxid (Union[str, int]) – Taxonomic identification number

  • child (Union[str, int]) – Taxonomic identification number

Return type:

bool

Examples

>>> root = Node(1, "root", "root")
>>> node = Node(2, "node", "rank", root)
>>> tax = Taxonomy({'1': root, '2': node})
>>> tax.isAncestorOf(1, 2)
True
>>> tax.isAncestorOf(2, 1)
False
isDescendantOf(taxid, parent)[source]

Test if taxid is a descendant of parent

Parameters:
  • taxid (Union[str, int]) – Taxonomic identification number

  • parent (Union[str, int]) – Taxonomic identification number

Return type:

bool

Examples

>>> root = Node(1, "root", "root")
>>> node = Node(2, "node", "rank", root)
>>> tax = Taxonomy({'1': root, '2': node})
>>> tax.isDescendantOf(1, 2)
False
>>> tax.isDescendantOf(2, 1)
True
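
Both tests come down to walking up the parent links. A minimal sketch, assuming a hypothetical child-to-parent map (PARENT) with string keys and helper names that are not part of taxidTools:

```python
# Hypothetical child -> parent map mirroring the example; None marks the root.
PARENT = {"1": None, "2": "1"}


def ancestors(taxid):
    """Yield every ancestor of taxid, nearest first."""
    taxid = PARENT[str(taxid)]
    while taxid is not None:
        yield taxid
        taxid = PARENT[taxid]


def is_ancestor_of(taxid, child):
    """True if taxid appears somewhere above child."""
    return str(taxid) in ancestors(child)


def is_descendant_of(taxid, parent):
    """The mirror image: taxid is below parent when parent is above taxid."""
    return is_ancestor_of(parent, taxid)


root_above_node = is_ancestor_of(1, 2)
node_above_root = is_ancestor_of(2, 1)
```

In this sketch a node is not counted as its own ancestor; whether the library makes the same choice is not stated here.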
lca(taxid_list)[source]

Get the lowest common ancestor (LCA) node of a list of taxids

Parameters:

taxid_list (list[Union[str, int]]) – list of taxonomic identification numbers

Return type:

Node

Examples

>>> node0 = Node(taxid = 0, name = "root",
...              rank = "root", parent = None)
>>> node1 = Node(taxid = 1, name = "node1",
...              rank = "rank1", parent = node0)
>>> node2 = Node(taxid = 2, name = "node2",
...              rank = "rank1", parent = node0)
>>> node11 = Node(taxid = 11, name = "node11",
...               rank = "rank2", parent = node1)
>>> node12 = Node(taxid = 12, name = "node12",
...               rank = "rank2", parent = node1)
>>> tax = Taxonomy.from_list([node0, node1, node2, node11, node12])
>>> tax.lca([11, 12, 2])
Node(0)
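
One common way to find such a node is to intersect the lineages and take the first shared entry. The sketch below assumes that approach; PARENT, lineage and lca_sketch are hypothetical names working on a plain child-to-parent map, not the taxidTools code.

```python
# Hypothetical child -> parent map mirroring the example tree; None marks the root.
PARENT = {0: None, 1: 0, 2: 0, 11: 1, 12: 1}


def lineage(taxid):
    """Return [taxid, parent, ..., root]."""
    path = []
    while taxid is not None:
        path.append(taxid)
        taxid = PARENT[taxid]
    return path


def lca_sketch(taxids):
    """Return the lowest node present in the lineage of every taxid."""
    lineages = [lineage(t) for t in taxids]
    shared = set(lineages[0]).intersection(*lineages[1:])
    # Lineages run from the node up to the root, so the first shared entry is the lowest.
    return next(node for node in lineages[0] if node in shared)


across_branches = lca_sketch([11, 12, 2])
within_branch = lca_sketch([11, 12])
```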
listDescendant(taxid, ranks=None)[source]

List all descendants of a node

Parameters:
  • taxid (Union[str, int]) – Taxonomic identification number

  • ranks (Optional[list]) – list of ranks for which to return nodes

Return type:

set[Node]

Examples

>>> node0 = Node(taxid = 0, name = "root",
...              rank = "root", parent = None)
>>> node1 = Node(taxid = 1, name = "node1",
...              rank = "rank1", parent = node0)
>>> node2 = Node(taxid = 2, name = "node2",
...              rank = "rank1", parent = node0)
>>> node11 = Node(taxid = 11, name = "node11",
...               rank = "rank2", parent = node1)
>>> node12 = Node(taxid = 12, name = "node12",
...               rank = "rank2", parent = node1)
>>> tax = Taxonomy.from_list([node0, node1, node2, node11, node12])
>>> tax.listDescendant(1)
[Node(11), Node(12)]
>>> tax.listDescendant(2)
[]
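
Collecting descendants amounts to a traversal of the subtree below the node. A minimal breadth-first sketch, assuming a hypothetical child-to-parent map (PARENT) and leaving out the optional ranks filter:

```python
from collections import deque

# Hypothetical child -> parent map mirroring the example tree; None marks the root.
PARENT = {0: None, 1: 0, 2: 0, 11: 1, 12: 1}


def descendants_sketch(taxid):
    """Collect every node below taxid with a breadth-first walk."""
    # Invert the parent map into parent -> list of children.
    children = {}
    for child, parent in PARENT.items():
        children.setdefault(parent, []).append(child)

    found = set()
    queue = deque(children.get(taxid, []))
    while queue:
        node = queue.popleft()
        found.add(node)
        queue.extend(children.get(node, []))
    return found


below_node1 = descendants_sketch(1)
below_leaf = descendants_sketch(2)
```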
prune(taxid)[source]

Prune the Taxonomy at the given taxid

Nodes that are neither ancestors nor descendants of the given taxid are discarded. The ancestors of the given taxid are kept.

Parameters:

taxid (Union[str, int]) – taxid whose Lineage to keep

Return type:

None

Examples

>>> node0 = Node(taxid = 0, name = "root",
...              rank = "root", parent = None)
>>> node1 = Node(taxid = 1, name = "node1",
...              rank = "rank1", parent = node0)
>>> node2 = Node(taxid = 2, name = "node2",
...              rank = "rank1", parent = node0)
>>> node11 = Node(taxid = 11, name = "node11",
...               rank = "rank2", parent = node1)
>>> node12 = Node(taxid = 12, name = "node12",
...               rank = "rank2", parent = node1)
>>> tax = Taxonomy.from_list([node0, node1, node2, node11, node12])
>>> tax.prune(1)

Ancestry is kept:

>>> tax.getAncestry(11)
Lineage([Node(11), Node(1), Node(0)])

But other branches are gone:

>>> tax.get('2')
KeyError: '2'
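
The set of nodes that survives pruning can be sketched as the union of the node, its ancestors and its descendants. The names below (PARENT, lineage, prune_sketch) are hypothetical, and unlike prune(), which returns None and is shown above changing the Taxonomy itself, this sketch returns a new mapping.

```python
# Hypothetical child -> parent map mirroring the example tree; None marks the root.
PARENT = {0: None, 1: 0, 2: 0, 11: 1, 12: 1}


def lineage(parent_map, taxid):
    """Return [taxid, parent, ..., root]."""
    path = []
    while taxid is not None:
        path.append(taxid)
        taxid = parent_map[taxid]
    return path


def prune_sketch(parent_map, taxid):
    """Return a new map holding only taxid, its ancestors and its descendants."""
    ancestors = set(lineage(parent_map, taxid))  # includes taxid itself
    descendants = {n for n in parent_map if taxid in lineage(parent_map, n)}
    keep = ancestors | descendants
    return {n: p for n, p in parent_map.items() if n in keep}


pruned = prune_sketch(PARENT, 1)
```

Node 2 sits on a sibling branch of node 1, so it is the only entry dropped from this tree.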
property root: Node

Returns the root Node. Assumes a single root shared by all Nodes.

write(path)[source]

Write taxonomy to a JSON file.

Parameters:

path (str) – File path for the output

Return type:

None

taxidTools.Taxonomy.load(path)[source]

Load a Taxonomy from a previously exported json file.

Parameters:

path (str) – Path of file to load

Return type:

Taxonomy

taxidTools.Taxonomy.load_ncbi(nodes, rankedlineage)[source]

Load a Taxonomy from the NCBI taxdump files

Parameters:
  • nodes (str) – Path to the nodes.dmp file

  • rankedlineage (str) – Path to the rankedlineage.dmp file

Return type:

Taxonomy

Examples

>>> tax = load_ncbi("nodes.dmp", "rankedlineage.dmp")