taxidTools.Taxonomy module
Taxonomy object definition
- class taxidTools.Taxonomy.Taxonomy(*args, **kwargs)
Bases: UserDict
Store Taxonomy nodes
A Taxonomy is instantiated as a dictionary, and each Node can be accessed by its taxid. A Taxonomy object can be instantiated directly from a dictionary, iteratively with the Taxonomy.addNode method, or from the taxdump files.
Notes
Taxonomy objects are mutable and some methods will modify the underlying Node objects. Do a deep copy if you wish to keep the original object.
A Taxonomy always assumes a unique root node.
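The deep-copy advice in the notes above can be illustrated with a minimal sketch. MiniNode and MiniTaxonomy are hypothetical stand-ins written for this example, not the taxidTools classes; the point is only that copy.deepcopy duplicates the nodes as well as the container, whereas a shallow copy keeps sharing them.

```python
import copy
from collections import UserDict


class MiniNode:
    """Hypothetical stand-in for a taxonomy node (not the taxidTools Node)."""

    def __init__(self, taxid, parent=None):
        self.taxid = str(taxid)
        self.parent = parent


class MiniTaxonomy(UserDict):
    """Hypothetical dict-like container keyed by taxid."""


root = MiniNode(1)
leaf = MiniNode(11, parent=root)
tax = MiniTaxonomy({"1": root, "11": leaf})

# A deep copy duplicates the nodes as well as the container,
# so later in-place changes do not reach the backup.
backup = copy.deepcopy(tax)

# A shallow copy duplicates only the container; the nodes stay shared.
shallow = copy.copy(tax)

del tax["11"]                # mutate the container
tax["1"].taxid = "changed"   # mutate a node in place
```

After these mutations, backup still holds both nodes with their original taxids, while shallow sees the renamed root because it shares the node objects.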
See also
Taxonomy.from_taxdump
load a Taxonomy object from taxdump files
Taxonomy.from_list
load a Taxonomy object from a list of Node
Taxonomy.from_json
load a Taxonomy from a previously exported json file
Taxonomy.addNode
add a Node to a Taxonomy
Examples
>>> root = Node(1, "root", "root")
>>> branch1 = Node(11, "node11", "middle", root)
>>> branch2 = Node(12, "node12", "middle", root)
>>> leaf1 = Node(111, "node111", "leaf", branch1)
>>> leaf2 = Node(112, "node112", "leaf", branch1)
>>> leaf3 = Node(121, "node121", "leaf", branch2)
>>> leaf4 = Node(13, "node13", "leaf", root)
>>> tax = Taxonomy({"1": root,
...                 "11": branch1,
...                 "12": branch2,
...                 "111": leaf1,
...                 "112": leaf2,
...                 "121": leaf3,
...                 "13": leaf4})
Instantiate from a list:
>>> tax = Taxonomy.from_list([root, branch1, branch2, leaf1, leaf2, leaf3, leaf4])
Or iteratively:
>>> tax = Taxonomy()
>>> for node in [root, branch1, branch2, leaf1, leaf2, leaf3, leaf4]:
...     tax.addNode(node)
...
Or from the taxdump files:
>>> tax = Taxonomy.from_taxdump("nodes.dmp", "rankedlineage.dmp")
- addNode(node)
Add a Node to an existing Taxonomy object.
The Node's taxid will be used as the key to access the element.
- Parameters:
node (Node) – A Node to add to the Taxonomy
- Return type:
None
Examples
>>> tax = Taxonomy()
>>> tax.addNode(Node(1))
- consensus(taxid_list, min_consensus)
Find a taxonomic consensus for the given taxids with a minimal agreement level.
- Parameters:
taxid_list (list[Union[str, int]]) – list of taxonomic identification numbers
min_consensus (float) – minimal consensus level, between 0.5 and 1. Note that a minimal consensus of 1 will return the same result as lca()
- Return type:
Node
Notes
If no consensus can be found (for example because the Taxonomy contains multiple trees), an IndexError will be raised.
See also
Taxonomy.lca
get the lowest common node of a list of taxids
Examples
>>> node0 = Node(taxid=0, name="root", rank="root", parent=None)
>>> node1 = Node(taxid=1, name="node1", rank="rank1", parent=node0)
>>> node2 = Node(taxid=2, name="node2", rank="rank1", parent=node0)
>>> node11 = Node(taxid=11, name="node11", rank="rank2", parent=node1)
>>> node12 = Node(taxid=12, name="node12", rank="rank2", parent=node1)
>>> tax = Taxonomy.from_list([node0, node1, node2, node11, node12])
>>> tax.consensus([11, 12, 2], 0.8)
Node(0)
>>> tax.consensus([11, 12, 2], 0.6)
Node(1)
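One way to picture the consensus rule is a majority vote taken level by level, walking down from the root. The sketch below is an illustration under that assumption, not the taxidTools implementation; PARENT, lineage_from_root and consensus_sketch are hypothetical names, and the toy tree mirrors the example above.

```python
from collections import Counter

# Hypothetical toy tree as a child -> parent mapping (None marks the root).
PARENT = {0: None, 1: 0, 2: 0, 11: 1, 12: 1}


def lineage_from_root(taxid):
    """Return the path from the root down to taxid."""
    path = []
    while taxid is not None:
        path.append(taxid)
        taxid = PARENT[taxid]
    return path[::-1]


def consensus_sketch(taxids, min_consensus):
    """Return the deepest node supported by at least min_consensus of the taxids."""
    lineages = [lineage_from_root(t) for t in taxids]
    best = None
    depth = 0
    while True:
        # Nodes found at this depth; shorter lineages do not vote here.
        level = [lin[depth] for lin in lineages if len(lin) > depth]
        if not level:
            break
        node, count = Counter(level).most_common(1)[0]
        # Support is measured against all input taxids.
        if count / len(lineages) < min_consensus:
            break
        best = node
        depth += 1
    return best
```

With the taxids 11, 12 and 2, node 1 is supported by two of three inputs, so a threshold of 0.6 accepts it while 0.8 falls back to the root.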
- distance(taxid1, taxid2)
Measure the distance (number of edges) between two nodes.
- Parameters:
taxid1 (Union[str, int]) – Taxonomic identification number
taxid2 (Union[str, int]) – Taxonomic identification number
- Return type:
int
Examples
>>> node0 = Node(taxid=0, name="root", rank="root", parent=None)
>>> node1 = Node(taxid=1, name="node1", rank="rank1", parent=node0)
>>> node2 = Node(taxid=2, name="node2", rank="rank1", parent=node0)
>>> node11 = Node(taxid=11, name="node11", rank="rank2", parent=node1)
>>> node12 = Node(taxid=12, name="node12", rank="rank2", parent=node1)
>>> tax = Taxonomy.from_list([node0, node1, node2, node11, node12])
>>> tax.distance(11, 2)
3
>>> tax.distance(11, 12)
2
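The values in the example are consistent with counting the edges on the path that joins the two nodes through their lowest common ancestor. A minimal sketch of that computation follows; it uses a plain parent mapping with hypothetical names (PARENT, ancestry, distance_sketch) and is not the library's code.

```python
# Hypothetical toy tree as a child -> parent mapping (None marks the root).
PARENT = {0: None, 1: 0, 2: 0, 11: 1, 12: 1}


def ancestry(taxid):
    """Path from taxid up to the root, taxid included."""
    path = []
    while taxid is not None:
        path.append(taxid)
        taxid = PARENT[taxid]
    return path


def distance_sketch(a, b):
    """Number of edges between a and b via their lowest common ancestor."""
    steps_from_b = {node: i for i, node in enumerate(ancestry(b))}
    for steps_from_a, node in enumerate(ancestry(a)):
        # The first ancestor of a that is also an ancestor of b is the LCA.
        if node in steps_from_b:
            return steps_from_a + steps_from_b[node]
    raise ValueError("nodes do not share a root")
```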
- filterRanks(ranks=['species', 'genus', 'family', 'order', 'class', 'phylum', 'kingdom'])
Filter a Taxonomy to keep only the ranks provided as arguments.
Modifies the Taxonomy in place to keep only the Nodes at the requested ranks. Nodes will be modified to conserve linkage in the Taxonomy.
- Parameters:
ranks (list[str]) – List of ranks to keep. Must be sorted in ascending order, from the lowest rank to the highest.
- Return type:
None
Notes
To keep the Taxonomy anchored, the root node will always be kept.
Examples
>>> node1 = Node(1, rank="root")
>>> node11 = Node(11, rank="rank1", parent=node1)
>>> node111 = Node(111, rank="rank2", parent=node11)
>>> node001 = Node('001', rank="rank2", parent=node1)
>>> tax = Taxonomy.from_list([node1, node11, node111, node001])
>>> tax.filterRanks(['rank2', 'rank1', 'root'])
>>> tax
{Node(1), Node(11), DummyNode(tO841ymu), Node(111), Node(001)}
DummyNodes are created as placeholders for missing ranks in the taxonomy:
>>> node001.parent
DummyNode(tO841ymu)
Note that the root will be kept regardless of the input:
>>> node1 = Node(1, rank="root")
>>> node11 = Node(11, rank="rank1", parent=node1)
>>> node111 = Node(111, rank="rank2", parent=node11)
>>> node001 = Node('001', rank="rank2", parent=node1)
>>> tax = Taxonomy.from_list([node1, node11, node111, node001])
>>> tax.filterRanks(['rank2', 'rank1'])
>>> tax
{DummyNode(wmnar5QT), Node(001), Node(1), Node(11), Node(111)}
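The placeholder behaviour shown above can be pictured with a simplified, read-only sketch. It computes the filtered lineage of a single node instead of modifying a Taxonomy in place, and it assumes the starting node's own rank is among the requested ranks. NODES, filtered_lineage and the "dummy(...)" labels are hypothetical names chosen for this illustration; the library's DummyNode objects work differently in detail.

```python
# Hypothetical toy data: taxid -> (rank, parent taxid); None marks the root.
NODES = {
    "1": ("root", None),
    "11": ("rank1", "1"),
    "111": ("rank2", "11"),
    "001": ("rank2", "1"),
}


def filtered_lineage(taxid, ranks):
    """Lineage of taxid restricted to ranks (ascending), padded with placeholders."""
    by_rank = {}
    current = taxid
    root = taxid
    # Walk up to the root, recording which taxid holds each rank.
    while current is not None:
        rank, parent = NODES[current]
        by_rank[rank] = current
        root = current
        current = parent
    start = ranks.index(NODES[taxid][0])
    # Ranks missing from the ancestry receive a placeholder entry.
    lineage = [by_rank.get(rank, f"dummy({rank})") for rank in ranks[start:]]
    # The root is kept even when its rank was not requested.
    if lineage[-1] != root:
        lineage.append(root)
    return lineage
```

For node '001', whose parent is the root, the missing rank1 level is filled by a placeholder whether or not 'root' is listed in ranks.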
- classmethod from_json(path)
Load a Taxonomy from a previously exported JSON file.
- Parameters:
path (str) – Path of file to load
- Return type:
Taxonomy
- classmethod from_list(node_list)
Create a Taxonomy object from a list of Nodes
Convert a list of Nodes into a valid Taxonomy object where each Node can be accessed using its taxid as key.
- Parameters:
node_list (list[_BaseNode]) – List of Node objects
- Return type:
Taxonomy
Examples
>>> txd = Taxonomy.from_list([Node(1), Node(2)])
- classmethod from_taxdump(nodes, rankedlineage)
Create a Taxonomy object from the NCBI taxdump files
Load the taxonomic information from the nodes.dmp and rankedlineage.dmp files available from the NCBI servers.
- Parameters:
nodes (str) – Path to the nodes.dmp file
rankedlineage (str) – Path to the rankedlineage.dmp file
- Return type:
Taxonomy
Examples
>>> tax = Taxonomy.from_taxdump("nodes.dmp", "rankedlineage.dmp")
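For orientation, the NCBI .dmp files separate fields with a tab, a pipe and a tab, and end each row with a tab and a pipe. The sketch below parses two made-up, truncated rows in that layout into a plain dictionary. It is not how from_taxdump is implemented; parse_dmp_line and load_sketch are hypothetical helpers, and only the first columns (tax_id, parent tax_id, rank in nodes.dmp; tax_id, tax_name in rankedlineage.dmp) are read.

```python
import io

# Made-up, truncated rows in the .dmp layout (real files carry more columns).
NODES_DMP = "1\t|\t1\t|\tno rank\t|\n2\t|\t1\t|\tsuperkingdom\t|\n"
LINEAGE_DMP = "1\t|\troot\t|\t\t|\n2\t|\tBacteria\t|\t\t|\n"


def parse_dmp_line(line):
    """Split one .dmp row into its fields."""
    line = line.rstrip("\n")
    if line.endswith("\t|"):
        line = line[:-2]  # drop the row terminator
    return line.split("\t|\t")


def load_sketch(nodes_handle, lineage_handle):
    """Build {taxid: {'name', 'rank', 'parent'}} from two open .dmp handles."""
    names = {}
    for line in lineage_handle:
        fields = parse_dmp_line(line)
        names[fields[0]] = fields[1]
    taxonomy = {}
    for line in nodes_handle:
        taxid, parent, rank = parse_dmp_line(line)[:3]
        taxonomy[taxid] = {
            "name": names.get(taxid),
            "rank": rank,
            # In nodes.dmp the root lists itself as its own parent.
            "parent": None if parent == taxid else parent,
        }
    return taxonomy


tax = load_sketch(io.StringIO(NODES_DMP), io.StringIO(LINEAGE_DMP))
```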
- getAncestry(taxid)
Retrieve the ancestry of the given taxid
- Parameters:
taxid (Union[str, int]) – Taxonomic identification number
- Return type:
Lineage
Examples
>>> root = Node(1, "root", "root")
>>> node = Node(2, "node", "rank", root)
>>> tax = Taxonomy({'1': root, '2': node})
>>> tax.getAncestry(2)
Lineage([Node(2), Node(1)])
- getChildren(taxid)
Retrieve the children Nodes
- Parameters:
taxid (Union[str, int]) – Taxonomic identification number
- Return type:
list[Node]
Examples
>>> root = Node(1, "root", "root")
>>> node = Node(2, "node", "rank", root)
>>> tax = Taxonomy({'1': root, '2': node})
>>> tax.getChildren(1)
[Node(2)]
- getName(taxid)
Get taxid name
- Parameters:
taxid (Union[str, int]) – Taxonomic identification number
- Return type:
str
Examples
>>> node = Node(1, "node", "rank")
>>> tax = Taxonomy({'1': node})
>>> tax.getName(1)
'node'
- getParent(taxid)
Retrieve parent Node
- Parameters:
taxid (Union[str, int]) – Taxonomic identification number
- Return type:
Node
Examples
>>> root = Node(1, "root", "root")
>>> node = Node(2, "node", "rank", root)
>>> tax = Taxonomy({'1': root, '2': node})
>>> tax.getParent(2)
Node(1)
- getRank(taxid)
Get taxid rank
- Parameters:
taxid (Union[str, int]) – Taxonomic identification number
- Return type:
str
Examples
>>> node = Node(1, "node", "rank")
>>> tax = Taxonomy({'1': node})
>>> tax.getRank(1)
'rank'
- getTaxid(name)
Get taxid from name
- Parameters:
name (str) – Node name
- Return type:
str
Examples
>>> node = Node(1, "node", "rank")
>>> tax = Taxonomy({'1': node})
>>> tax.getTaxid('node')
'1'
- isAncestorOf(taxid, child)
Test if taxid is an ancestor of child
- Parameters:
taxid (Union[str, int]) – Taxonomic identification number
child (Union[str, int]) – Taxonomic identification number
- Return type:
bool
See also
Taxonomy.isDescendantOf
test if taxid is a descendant of parent
Examples
>>> root = Node(1, "root", "root")
>>> node = Node(2, "node", "rank", root)
>>> tax = Taxonomy({'1': root, '2': node})
>>> tax.isAncestorOf(1, 2)
True
>>> tax.isAncestorOf(2, 1)
False
- isDescendantOf(taxid, parent)
Test if taxid is a descendant of parent
- Parameters:
taxid (Union[str, int]) – Taxonomic identification number
parent (Union[str, int]) – Taxonomic identification number
- Return type:
bool
See also
Taxonomy.isAncestorOf
test if taxid is an ancestor of child
Examples
>>> root = Node(1, "root", "root")
>>> node = Node(2, "node", "rank", root)
>>> tax = Taxonomy({'1': root, '2': node})
>>> tax.isDescendantOf(1, 2)
False
>>> tax.isDescendantOf(2, 1)
True
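The two tests mirror each other: A is an ancestor of B exactly when B is a descendant of A. A minimal sketch over a parent mapping makes the symmetry explicit. The names PARENT, is_descendant_of and is_ancestor_of are hypothetical, and treating a node as not being its own ancestor is an assumption of this sketch, not a documented property of the library.

```python
# Hypothetical toy chain: 3 -> 2 -> 1 (root), stored as child -> parent.
PARENT = {"1": None, "2": "1", "3": "2"}


def is_descendant_of(taxid, ancestor):
    """True if ancestor is found while walking up from taxid."""
    target = str(ancestor)
    current = PARENT[str(taxid)]  # start above taxid itself
    while current is not None:
        if current == target:
            return True
        current = PARENT[current]
    return False


def is_ancestor_of(taxid, child):
    """Ancestor test expressed through the descendant test."""
    return is_descendant_of(child, taxid)
```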
- lca(taxid_list)
Get the lowest common node of a list of taxids
- Parameters:
taxid_list (list[Union[str, int]]) – list of taxonomic identification numbers
- Return type:
Node
See also
Taxonomy.consensus
find a taxonomic consensus with a minimal agreement level
Examples
>>> node0 = Node(taxid=0, name="root", rank="root", parent=None)
>>> node1 = Node(taxid=1, name="node1", rank="rank1", parent=node0)
>>> node2 = Node(taxid=2, name="node2", rank="rank1", parent=node0)
>>> node11 = Node(taxid=11, name="node11", rank="rank2", parent=node1)
>>> node12 = Node(taxid=12, name="node12", rank="rank2", parent=node1)
>>> tax = Taxonomy.from_list([node0, node1, node2, node11, node12])
>>> tax.lca([11, 12, 2])
Node(0)
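A common way to compute a lowest common node is to intersect the ancestries of all inputs and take the first shared node encountered when walking up from any one of them. The sketch below shows that approach on the toy tree from the example; PARENT, ancestry and lca_sketch are hypothetical names, and the library may use a different algorithm.

```python
# Hypothetical toy tree as a child -> parent mapping (None marks the root).
PARENT = {0: None, 1: 0, 2: 0, 11: 1, 12: 1}


def ancestry(taxid):
    """Path from taxid up to the root, taxid included."""
    path = []
    while taxid is not None:
        path.append(taxid)
        taxid = PARENT[taxid]
    return path


def lca_sketch(taxids):
    """Lowest node shared by the ancestries of all taxids."""
    first, *rest = taxids
    shared = set(ancestry(first))
    for taxid in rest:
        shared &= set(ancestry(taxid))
    # Ancestry runs from leaf to root, so the first shared node is the lowest.
    for node in ancestry(first):
        if node in shared:
            return node
    raise ValueError("no common ancestor")
```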
- listDescendant(taxid, ranks=None)
List all descendants of a node
- Parameters:
taxid (Union[str, int]) – Taxonomic identification number
ranks (Optional[list]) – list of ranks for which to return nodes
- Return type:
set[Node]
Examples
>>> node0 = Node(taxid=0, name="root", rank="root", parent=None)
>>> node1 = Node(taxid=1, name="node1", rank="rank1", parent=node0)
>>> node2 = Node(taxid=2, name="node2", rank="rank1", parent=node0)
>>> node11 = Node(taxid=11, name="node11", rank="rank2", parent=node1)
>>> node12 = Node(taxid=12, name="node12", rank="rank2", parent=node1)
>>> tax = Taxonomy.from_list([node0, node1, node2, node11, node12])
>>> tax.listDescendant(1)
[Node(11), Node(12)]
>>> tax.listDescendant(2)
[]
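Collecting descendants amounts to a tree traversal from the given node, optionally followed by a rank filter. The following sketch uses an explicit stack over a children mapping; CHILDREN, RANK and list_descendants are hypothetical names for this illustration, and it returns a set to match the documented return type.

```python
# Hypothetical toy tree: parent -> children, plus each node's rank.
CHILDREN = {0: [1, 2], 1: [11, 12], 2: [], 11: [], 12: []}
RANK = {0: "root", 1: "rank1", 2: "rank1", 11: "rank2", 12: "rank2"}


def list_descendants(taxid, ranks=None):
    """Return every node below taxid, optionally restricted to some ranks."""
    found = set()
    stack = list(CHILDREN[taxid])
    while stack:
        node = stack.pop()
        found.add(node)
        stack.extend(CHILDREN[node])  # continue with the grandchildren
    if ranks is not None:
        found = {node for node in found if RANK[node] in ranks}
    return found
```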
- prune(taxid)
Prune the Taxonomy at the given taxid
Nodes outside the lineage of the given taxid (its ancestors and its descendants) will be discarded. The ancestors of the given taxid are kept.
- Parameters:
taxid (Union[str, int]) – taxid whose Lineage to keep
- Return type:
None
Examples
>>> node0 = Node(taxid=0, name="root", rank="root", parent=None)
>>> node1 = Node(taxid=1, name="node1", rank="rank1", parent=node0)
>>> node2 = Node(taxid=2, name="node2", rank="rank1", parent=node0)
>>> node11 = Node(taxid=11, name="node11", rank="rank2", parent=node1)
>>> node12 = Node(taxid=12, name="node12", rank="rank2", parent=node1)
>>> tax = Taxonomy.from_list([node0, node1, node2, node11, node12])
>>> tax.prune(1)
Ancestry is kept:
>>> tax.getAncestry(11)
Lineage([Node(11), Node(1), Node(0)])
But other branches are gone:
>>> tax.get('2')
KeyError: '2'
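The pruning rule (keep the node, its ancestors and its descendants; drop everything else) can be sketched on a plain parent mapping. Unlike prune, which modifies the Taxonomy in place, this illustration returns a new dictionary; PARENT and prune_sketch are hypothetical names.

```python
# Hypothetical toy tree as a child -> parent mapping (None marks the root).
PARENT = {0: None, 1: 0, 2: 0, 11: 1, 12: 1}


def prune_sketch(parent_map, taxid):
    """Return a copy of parent_map restricted to the lineage of taxid."""
    keep = set()
    # Upwards: the node itself and all of its ancestors.
    current = taxid
    while current is not None:
        keep.add(current)
        current = parent_map[current]
    # Downwards: grow the set of descendants until nothing new is added.
    below = {taxid}
    changed = True
    while changed:
        changed = False
        for node, parent in parent_map.items():
            if parent in below and node not in below:
                below.add(node)
                changed = True
    keep |= below
    return {node: parent for node, parent in parent_map.items() if node in keep}
```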
- taxidTools.Taxonomy.load(path)
Load a Taxonomy from a previously exported JSON file.
- Parameters:
path (str) – Path of file to load
- Return type:
Taxonomy
See also
Taxonomy.from_json
load a Taxonomy from a previously exported JSON file
- taxidTools.Taxonomy.load_ncbi(nodes, rankedlineage)
Load a Taxonomy from the NCBI's taxdump files
- Parameters:
nodes (str) – Path to the nodes.dmp file
rankedlineage (str) – Path to the rankedlineage.dmp file
- Return type:
Taxonomy
Examples
>>> tax = load_ncbi("nodes.dmp", "rankedlineage.dmp")
See also
Taxonomy.from_taxdump
load a Taxonomy object from taxdump files