taxidTools.Taxonomy.Taxonomy
Bases: UserDict
Stores Taxonomy nodes and their relationships
A Taxonomy is instantiated as a dictionary, and each Node can be accessed by its taxid.
A Taxonomy object can be instantiated directly from a dictionary, iteratively with the Taxonomy.addNode method, or from a list of taxdump files.
Attributes:
Raises:
- InvalidNodeError – If trying to access a Node that doesn't exist with a bracket expression
Notes
Taxonomy objects are mutable and some methods will modify the underlying Node objects. Do a deep copy or use the Taxonomy.copy() method if you wish to keep the original object.
A Taxonomy always assumes a unique root node.
See Also
Taxonomy.from_list: load a Taxonomy object from a list of Node
read_taxdump: load a Taxonomy object from taxdump files
read_json: load a Taxonomy from a previously exported json file
Taxonomy.addNode: add a Node to a Taxonomy
Examples:
>>> root = Node(1, "root", "root")
>>> branch1 = Node(11, "node11", "middle", root)
>>> branch2 = Node(12, "node12", "middle", root)
>>> leaf1 = Node(111, "node111", "leaf", branch1)
>>> leaf2 = Node(112, "node112", "leaf", branch1)
>>> leaf3 = Node(121, "node121", "leaf", branch2)
>>> leaf4 = Node(13, "node13", "leaf", root)
From a dictionary of Nodes:
>>> tax = Taxonomy({"1" : root,
... 11: branch1,
... 12: branch2,
... 111: leaf1,
... 112: leaf2,
... 121: leaf3,
... 13: leaf4})
Instantiate from a list:
>>> tax = Taxonomy.from_list([root, branch1, branch2, leaf1, leaf2, leaf3, leaf4])
Or iteratively:
>>> tax = Taxonomy()
>>> for node in [root, branch1, branch2, leaf1, leaf2, leaf3, leaf4]:
... tax.addNode(node)
...
Or from the taxdump files:
root: Node
property
Returns the root Node, assumes a single root shared by all Nodes
__getitem__(key)
Element getter with brackets
Overloads the default behavior to:
- return a specific error on a non-existing key
- handle MergedNodes to return the new node
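The two behaviors described for the bracket getter can be illustrated with a minimal sketch built on collections.UserDict. This is not the library's implementation: MissingNodeError, MergedStub and TinyTaxonomy are hypothetical stand-ins, and the normalisation of keys to strings is an assumption made for the example.

```python
from collections import UserDict


class MissingNodeError(KeyError):
    """Hypothetical stand-in for the library's InvalidNodeError."""


class MergedStub:
    """Hypothetical placeholder recording the taxid a node was merged into."""

    def __init__(self, new_taxid):
        self.new_taxid = str(new_taxid)


class TinyTaxonomy(UserDict):
    def __getitem__(self, key):
        key = str(key)  # assumption: keys are normalised to strings
        try:
            value = self.data[key]
        except KeyError:
            # Replace the generic KeyError with a dedicated error type
            raise MissingNodeError(f"no node with taxid {key!r}") from None
        # Follow a merged entry to the node that replaced it
        if isinstance(value, MergedStub):
            return self[value.new_taxid]
        return value


tax = TinyTaxonomy({"1": "root node", "2": "kept node", "3": MergedStub(2)})
merged_lookup = tax[3]  # resolved through the merged entry
```

Because the hypothetical error subclasses KeyError, callers that already catch KeyError keep working in this sketch.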
addNode(node)
Add a Node to an existing Taxonomy object.
The Node taxid will be used as the key to access the element.
Parameters:
- node (Node) – A Node to add to the Taxonomy
Examples:
consensus(taxid_list, min_consensus, ignore_missing=False)
Find a taxonomic consensus for the given taxids with a minimal agreement level.
Parameters:
- taxid_list (list[Union[str, int]]) – list of taxonomic identification numbers
- min_consensus (float) – minimal consensus level, between 0.5 and 1. Note that a minimal consensus of 1 will return the same result as lastCommonNode()
- ignore_missing (bool, default: False) – if True, will ignore missing taxids from the analysis. If False (default), will raise an Error on missing taxids
Returns:
- _BaseNode
Raises:
- ValueError – If taxid_list contains no valid taxid and ignore_missing is True
- InvalidNodeError – If taxid_list contains invalid taxids and ignore_missing is False
Notes
If no consensus can be found (for example because the Taxonomy contains multiple trees), an IndexError will be raised.
See Also
Taxonomy.lca
Examples:
>>> node0 = Node(taxid = 0, name = "root",
...              rank = "root", parent = None)
>>> node1 = Node(taxid = 1, name = "node1",
...              rank = "rank1", parent = node0)
>>> node2 = Node(taxid = 2, name = "node2",
...              rank = "rank1", parent = node0)
>>> node11 = Node(taxid = 11, name = "node11",
...               rank = "rank2", parent = node1)
>>> node12 = Node(taxid = 12, name = "node12",
...               rank = "rank2", parent = node1)
>>> tax = Taxonomy.from_list([node0, node1, node2, node11, node12])
>>> tax.consensus([11, 12, 2], 0.8)
Node(0)
>>> tax.consensus([11, 12, 2], 0.6)
Node(1)
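One way to read the results above is as a vote over ancestries: each input taxid votes for itself and all of its ancestors, and the deepest node reaching the required share wins. The sketch below reproduces that reasoning on a plain child-to-parent dictionary. It is an illustration only, not the library's algorithm; PARENT, ancestry and consensus_sketch are hypothetical names, and the sketch assumes a threshold strictly above 0.5 so that the winning nodes lie on a single path.

```python
from collections import Counter

# Hypothetical child -> parent map mirroring the example tree (the root has no parent)
PARENT = {0: None, 1: 0, 2: 0, 11: 1, 12: 1}


def ancestry(taxid, parent=PARENT):
    """Return the path from taxid up to the root, taxid included."""
    path = []
    while taxid is not None:
        path.append(taxid)
        taxid = parent[taxid]
    return path


def consensus_sketch(taxids, min_consensus, parent=PARENT):
    """Deepest node whose subtree holds at least min_consensus of the taxids."""
    if not 0.5 < min_consensus <= 1:
        raise ValueError("this sketch expects 0.5 < min_consensus <= 1")
    votes = Counter()
    for taxid in taxids:
        votes.update(ancestry(taxid, parent))
    needed = min_consensus * len(taxids)
    agreed = [node for node, count in votes.items() if count >= needed]
    # Above 0.5, two nodes on different branches cannot both reach the
    # threshold, so the agreed nodes form one path and the deepest one wins
    return max(agreed, key=lambda node: len(ancestry(node, parent)))


strict = consensus_sketch([11, 12, 2], 0.8)   # only the root is shared by all three
relaxed = consensus_sketch([11, 12, 2], 0.6)  # two of three sit under node 1
```

With a threshold of 1, every input must sit under the returned node, which is why the result then coincides with the lowest common ancestor.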
copy()
Create a deepcopy of the current Taxonomy instance.
Equivalent to running copy.deepcopy()
Returns:
- Taxonomy
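Why a deep copy matters for objects that reference each other can be shown with a small standalone sketch. TinyNode is a hypothetical stand-in for the library's Node class, and plain dictionaries stand in for Taxonomy objects; only copy.deepcopy is taken from the description above.

```python
import copy


class TinyNode:
    """Hypothetical minimal node: a name plus a reference to its parent."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


root = TinyNode("root")
leaf = TinyNode("leaf", parent=root)
original = {"1": root, "11": leaf}

shallow = dict(original)        # new dict, same node objects
deep = copy.deepcopy(original)  # new dict, new node objects

# Renaming through the shallow copy also renames the original node
shallow["11"].name = "renamed"

# The deep copy is unaffected, and its internal links point at copied nodes
untouched = deep["11"].name
links_preserved = deep["11"].parent is deep["1"]
```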
distance(taxid1, taxid2)
Measures the distance between two nodes.
Parameters:
- taxid1 (Union[str, int]) – Taxonomic identification number
- taxid2 (Union[str, int]) – Taxonomic identification number
Returns:
- int
Examples:
>>> node0 = Node(taxid = 0, name = "root",
...              rank = "root", parent = None)
>>> node1 = Node(taxid = 1, name = "node1",
...              rank = "rank1", parent = node0)
>>> node2 = Node(taxid = 2, name = "node2",
...              rank = "rank1", parent = node0)
>>> node11 = Node(taxid = 11, name = "node11",
...               rank = "rank2", parent = node1)
>>> node12 = Node(taxid = 12, name = "node12",
...               rank = "rank2", parent = node1)
>>> tax = Taxonomy.from_list([node0, node1, node2, node11, node12])
>>> tax.distance(11, 2)
3
>>> tax.distance(11, 12)
2
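The values above match the number of edges on the path joining the two nodes through their lowest common ancestor. A minimal sketch of that computation on a child-to-parent dictionary follows; PARENT, path_to_root and distance_sketch are hypothetical names and this is not necessarily how the library computes the distance.

```python
# Hypothetical child -> parent map mirroring the example tree
PARENT = {0: None, 1: 0, 2: 0, 11: 1, 12: 1}


def path_to_root(taxid, parent=PARENT):
    """Return the nodes from taxid up to the root, taxid included."""
    path = []
    while taxid is not None:
        path.append(taxid)
        taxid = parent[taxid]
    return path


def distance_sketch(taxid1, taxid2, parent=PARENT):
    """Count the edges between two nodes via their lowest common ancestor."""
    # Number of upward steps needed to reach each ancestor of taxid1
    steps_from_1 = {node: i for i, node in enumerate(path_to_root(taxid1, parent))}
    # Walk up from taxid2 until an ancestor of taxid1 is met
    for steps_from_2, node in enumerate(path_to_root(taxid2, parent)):
        if node in steps_from_1:
            return steps_from_1[node] + steps_from_2
    raise ValueError("the two nodes do not share a root")


cousins = distance_sketch(11, 2)
siblings = distance_sketch(11, 12)
```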
filterRanks(ranks=linne(), inplace=True)
Filter a Taxonomy to keep only the ranks provided as arguments.
Modifies Taxonomy in-place to keep only the Nodes at the requested ranks. Nodes will be modified to conserve linkage in the Taxonomy.
Parameters:
- ranks (Optional[list[str]], default: linne()) – List of ranks to keep. Must be sorted by ascending ranks.
- inplace (Optional[bool], default: True) – perform the operation in place and mutate the underlying objects, or return a mutated copy of the instance and keep the original unchanged
Returns:
- None
Notes
In order to enforce anchoring of the Taxonomy, the root node will always be kept.
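The relinking idea (drop nodes at unwanted ranks, reattach the survivors to their nearest kept ancestor, always keep the root) can be sketched on a plain dictionary. This simplified illustration deliberately does not create DummyNode placeholders for missing ranks, so it is not equivalent to the method documented here; filter_ranks_sketch and the (rank, parent) tuple layout are assumptions made for the example.

```python
def filter_ranks_sketch(nodes, ranks):
    """Filter a mapping of taxid -> (rank, parent taxid or None) by rank."""
    keep = set(ranks)
    filtered = {}
    for taxid, (rank, parent) in nodes.items():
        is_root = parent is None
        if not (is_root or rank in keep):
            continue  # node at an unwanted rank: drop it
        # Climb until an ancestor that survives the filter (the root always does)
        ancestor = parent
        while ancestor is not None:
            ancestor_rank, ancestor_parent = nodes[ancestor]
            if ancestor_parent is None or ancestor_rank in keep:
                break
            ancestor = ancestor_parent
        filtered[taxid] = (rank, ancestor)
    return filtered


nodes = {
    "1": ("root", None),
    "11": ("rank1", "1"),
    "111": ("rank2", "11"),
    "001": ("rank2", "1"),
}
result = filter_ranks_sketch(nodes, ["rank2"])
```

Here node 11 is removed, node 111 is reattached directly to the root, and the root survives although its rank was not requested.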
Examples:
>>> node1 = Node(1, rank = "root")
>>> node11 = Node(11, rank = "rank1", parent = node1)
>>> node111 = Node(111, rank = "rank2", parent = node11)
>>> node001 = Node('001', rank = "rank2", parent = node1)
>>> tax = Taxonomy.from_list([node1, node11, node111, node001])
>>> tax.filterRanks(['rank2', 'rank1', 'root'])
>>> tax
{Node(1), Node(11), DummyNode(tO841ymu), Node(111), Node(001)}
DummyNodes are created as placeholders for missing ranks in the taxonomy:
Note that the root will be kept regardless of the input:
>>> node1 = Node(1, rank = "root")
>>> node11 = Node(11, rank = "rank1", parent = node1)
>>> node111 = Node(111, rank = "rank2", parent = node11)
>>> node001 = Node('001', rank = "rank2", parent = node1)
>>> tax = Taxonomy.from_list([node1, node11, node111, node001])
>>> tax.filterRanks(['rank2', 'rank1'])
>>> tax
{DummyNode(wmnar5QT), Node(001), Node(1), Node(11), Node(111)}
It is also possible to keep the original instance intact and return a filtered copy:
from_list(node_list)
classmethod
Create a Taxonomy object from a list of Nodes
Convert a list of Nodes into a valid Taxonomy object where each Node can be accessed using its taxid as key.
Parameters:
- node_list (list[_BaseNode]) – List of Node objects
Returns:
- Taxonomy
Examples:
getAncestry(taxid)
Retrieve the ancestry of the given taxid
Parameters:
- taxid (Union[str, int]) – Taxonomic identification number
Returns:
- Lineage
Examples:
getChildren(taxid, value=None)
Retrieve the children Nodes
Parameters:
- taxid (Union[str, int]) – Taxonomic identification number
- value (Optional[Any], default: None) – A value to return if name does not exist
Returns:
- list
Examples:
getName(taxid, value=None)
getParent(taxid, value=None)
Retrieve parent Node
Parameters:
- taxid (Union[str, int]) – Taxonomic identification number
- value (Optional[Any], default: None) – A value to return if name does not exist
Returns:
- _BaseNode
Examples:
getRank(taxid, value=None)
getTaxid(name, value=None)
isAncestorOf(taxid, child)
Test if taxid is an ancestor of child
Parameters:
- taxid (Union[str, int]) – Taxonomic identification number
- child (Union[str, int]) – Taxonomic identification number
Returns:
- bool
See Also
Taxonomy.isDescendantOf
Examples:
isDescendantOf(taxid, parent)
Test if taxid is a descendant of parent
Parameters:
- taxid (Union[str, int]) – Taxonomic identification number
- parent (Union[str, int]) – Taxonomic identification number
Returns:
- bool
See Also
Taxonomy.isAncestorOf
Examples:
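The ancestor and descendant tests mirror each other, which a small standalone sketch makes explicit. It works on a hypothetical child-to-parent dictionary rather than on Taxonomy objects, and it assumes a strict relationship (a node is not its own ancestor); whether the library makes the same choice is not stated here.

```python
# Hypothetical child -> parent map: root 0, two branches, two leaves under node 1
PARENT = {0: None, 1: 0, 2: 0, 11: 1, 12: 1}


def is_ancestor_of(taxid, child, parent=PARENT):
    """True if taxid lies strictly above child on the path to the root."""
    current = parent[child]
    while current is not None:
        if current == taxid:
            return True
        current = parent[current]
    return False


def is_descendant_of(taxid, ancestor, parent=PARENT):
    """Same test with the roles of the two arguments swapped."""
    return is_ancestor_of(ancestor, taxid, parent)


root_above_leaf = is_ancestor_of(0, 11)
leaf_below_branch = is_descendant_of(11, 1)
unrelated = is_ancestor_of(2, 11)
```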
lca(taxid_list, ignore_missing=False)
Get the lowest common node of a list of taxids
Parameters:
- taxid_list (list[Union[str, int]]) – list of taxonomic identification numbers
- ignore_missing (bool, default: False) – if True, will ignore missing taxids from the analysis. If False (default), will raise an Error on missing taxids
Returns:
- _BaseNode
Raises:
- ValueError – If taxid_list contains no valid taxid and ignore_missing is True
- InvalidNodeError – If taxid_list contains invalid taxids and ignore_missing is False
See Also
Taxonomy.consensus
Examples:
>>> node0 = Node(taxid = 0, name = "root",
...              rank = "root", parent = None)
>>> node1 = Node(taxid = 1, name = "node1",
...              rank = "rank1", parent = node0)
>>> node2 = Node(taxid = 2, name = "node2",
...              rank = "rank1", parent = node0)
>>> node11 = Node(taxid = 11, name = "node11",
...               rank = "rank2", parent = node1)
>>> node12 = Node(taxid = 12, name = "node12",
...               rank = "rank2", parent = node1)
>>> tax = Taxonomy.from_list([node0, node1, node2, node11, node12])
>>> tax.lca([11, 12, 2])
Node(0)
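A common way to find such a node is to compare root-to-node paths and keep their shared prefix. The sketch below does this on a hypothetical child-to-parent dictionary, with a simplified take on the ignore_missing switch; the names and the use of KeyError for unknown taxids are assumptions for illustration, not the library's behavior.

```python
# Hypothetical child -> parent map mirroring the example tree
PARENT = {0: None, 1: 0, 2: 0, 11: 1, 12: 1}


def root_path(taxid, parent=PARENT):
    """Return the nodes from the root down to taxid."""
    path = []
    while taxid is not None:
        path.append(taxid)
        taxid = parent[taxid]
    return path[::-1]


def lca_sketch(taxids, parent=PARENT, ignore_missing=False):
    known = [taxid for taxid in taxids if taxid in parent]
    if len(known) < len(taxids) and not ignore_missing:
        raise KeyError("unknown taxid in input")
    if not known:
        raise ValueError("no valid taxid to compare")
    # Start from the first path and trim it to the prefix shared with the others
    common = root_path(known[0], parent)
    for taxid in known[1:]:
        shared = 0
        for a, b in zip(common, root_path(taxid, parent)):
            if a != b:
                break
            shared += 1
        common = common[:shared]
    return common[-1]  # last node of the shared prefix


all_three = lca_sketch([11, 12, 2])
two_leaves = lca_sketch([11, 12])
```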
listDescendant(taxid, ranks=None)
List all descendants of a node
Parameters:
- taxid (Union[str, int]) – Taxonomic identification number
- ranks (Optional[list], default: None) – list of ranks for which to return nodes
Returns:
- list
Examples:
>>> node0 = Node(taxid = 0, name = "root",
...              rank = "root", parent = None)
>>> node1 = Node(taxid = 1, name = "node1",
...              rank = "rank1", parent = node0)
>>> node2 = Node(taxid = 2, name = "node2",
...              rank = "rank1", parent = node0)
>>> node11 = Node(taxid = 11, name = "node11",
...               rank = "rank2", parent = node1)
>>> node12 = Node(taxid = 12, name = "node12",
...               rank = "rank2", parent = node1)
>>> tax = Taxonomy.from_list([node0, node1, node2, node11, node12])
>>> tax.listDescendant(1)
[Node(11), Node(12)]
>>> tax.listDescendant(2)
[]
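Collecting descendants amounts to a traversal of the children below a node. The following sketch shows one such traversal on a hypothetical child-to-parent dictionary; it ignores the ranks filter and makes no claim about the order in which the library returns nodes.

```python
from collections import defaultdict

# Hypothetical child -> parent map mirroring the example tree
PARENT = {0: None, 1: 0, 2: 0, 11: 1, 12: 1}


def list_descendants_sketch(taxid, parent=PARENT):
    """Return every node found below taxid, excluding taxid itself."""
    # Invert the child -> parent map into parent -> children
    children = defaultdict(list)
    for node, up in parent.items():
        if up is not None:
            children[up].append(node)
    found, stack = [], [taxid]
    while stack:
        for child in children[stack.pop()]:
            found.append(child)
            stack.append(child)
    return found


below_branch = list_descendants_sketch(1)
below_leaf = list_descendants_sketch(2)
```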
prune(taxid, inplace=True)
Prune the Taxonomy at the given taxid
Nodes not in the lineage (upwards and downwards) of the given taxid will be discarded. The Ancestors of the given taxid will be kept!
Parameters:
- taxid (Union[str, int]) – taxid whose Lineage to keep
- inplace (Optional[bool], default: True) – perform the operation in place and mutate the underlying objects, or return a mutated copy of the instance and keep the original unchanged
Returns:
- None
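The rule described above (keep the node, its ancestors and its descendants, discard everything else) can be sketched on a plain child-to-parent dictionary. prune_sketch is a hypothetical helper that returns a new mapping instead of mutating its input, so it only illustrates the selection logic, not the in-place behavior.

```python
# Hypothetical child -> parent map: root 0, branches 1 and 2, leaves 11 and 12 under 1
PARENT = {0: None, 1: 0, 2: 0, 11: 1, 12: 1}


def prune_sketch(parent, taxid):
    """Keep taxid, its ancestors and its descendants; drop every other node."""
    keep = set()
    # Upwards: the node itself and all of its ancestors
    current = taxid
    while current is not None:
        keep.add(current)
        current = parent[current]
    # Downwards: every node whose path to the root passes through taxid
    for node in parent:
        current = node
        while current is not None:
            if current == taxid:
                keep.add(node)
                break
            current = parent[current]
    return {node: up for node, up in parent.items() if node in keep}


pruned = prune_sketch(PARENT, 1)
```

In this sketch the sibling branch (node 2) disappears while the root and both leaves under node 1 remain.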
Examples:
>>> node0 = Node(taxid = 0, name = "root",
...              rank = "root", parent = None)
>>> node1 = Node(taxid = 1, name = "node1",
...              rank = "rank1", parent = node0)
>>> node2 = Node(taxid = 2, name = "node2",
...              rank = "rank1", parent = node0)
>>> node11 = Node(taxid = 11, name = "node11",
...               rank = "rank2", parent = node1)
>>> node12 = Node(taxid = 12, name = "node12",
...               rank = "rank2", parent = node1)
>>> tax = Taxonomy.from_list([node0, node1, node2, node11, node12])
>>> tax.prune(1)
Ancestry is kept
But other branches are gone
We can keep a copy of the:
toNewick(names='name')
Generate a Newick string for the current taxonomy
Export as a Newick tree string for compatibility with other packages. Import in ETE with format 8 (all names). Experimental feature.
Parameters:
- names (str, default: 'name') – Node attribute to use as node name, choice of 'name' or 'taxid'
Returns:
- str
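For readers unfamiliar with the format, a Newick string with all names and no branch lengths nests each node's children in parentheses before the node's own label. The sketch below builds such a string from a hypothetical mapping of taxid to (name, parent); it is a generic illustration of the format, and the exact string produced by this method may differ.

```python
from collections import defaultdict

# Hypothetical taxid -> (name, parent taxid or None) mapping
NODES = {
    0: ("root", None),
    1: ("node1", 0),
    2: ("node2", 0),
    11: ("node11", 1),
    12: ("node12", 1),
}


def to_newick_sketch(nodes):
    """Render a single-rooted tree as a Newick string with all node names."""
    children = defaultdict(list)
    root = None
    for taxid, (_, parent) in nodes.items():
        if parent is None:
            root = taxid
        else:
            children[parent].append(taxid)

    def render(taxid):
        label = nodes[taxid][0]
        kids = children.get(taxid)
        if not kids:
            return label  # leaf: just the name
        # Internal node: children in parentheses, then the node's own name
        return "(" + ",".join(render(kid) for kid in kids) + ")" + label

    return render(root) + ";"


newick = to_newick_sketch(NODES)
```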
write(path)
Write taxonomy to a JSON file.
Parameters:
- path (str) – File path for the output
See Also
taxidTools.read_json