From mk1853387 at gmail.com Tue Jan 9 12:39:16 2024
From: mk1853387 at gmail.com (marc nicole)
Date: Tue, 9 Jan 2024 18:39:16 +0100
Subject: [scikit-learn] How to extract subtree from a RegressionTree using
the tree attribute?
Message-ID:
I want to extract a subtree from the regression tree obtained by
training the associated model, given two inputs: a root node and a depth.
Here is my buggy code, which I would like checked for errors:
def extract_tree_depth_first_traversal(tree, root_start, t_depth):
    depth = 1
    sub_tree = []
    stack = Queue()
    stack.put(root_start)
    while stack:
        current_node = stack.get(0)
        sub_tree.append(current_node)
        left_child = tree.children_left[current_node]
        if left_child >= 0:
            stack.put(left_child)
        right_child = tree.children_right[current_node]
        if right_child >= 0:
            stack.put(right_child)
        children_current_node = [left_child, right_child]
        for child in children_current_node:
            sub_tree.append(child)
        if depth >= t_depth:
            break
        depth = depth + 1
    return sub_tree
Could somebody spot the error for me?
From christian.braune79 at gmail.com Tue Jan 9 13:34:49 2024
From: christian.braune79 at gmail.com (Christian Braune)
Date: Tue, 9 Jan 2024 19:34:49 +0100
Subject: [scikit-learn] How to extract subtree from a RegressionTree
using the tree attribute?
In-Reply-To:
References:
Message-ID:
Hi Marc,
a first observation: stack.get(0) returns but does NOT remove the first
element from a list (even if you name it stack). If you want a stack, you
need to use the pop method.
See also here:
https://docs.python.org/3/tutorial/datastructures.html#using-lists-as-stacks
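A minimal sketch of the list-as-stack idea, storing each node's depth next to it so that the depth limit applies per branch. It assumes only that `tree` exposes `children_left` and `children_right` with -1 marking a missing child, as scikit-learn's `tree_` object does; the function name and the toy tree are made up for illustration. Separately, if `Queue` in the posted code is `queue.Queue` (the import is not shown), `while stack:` would never become false, because that class defines neither `__bool__` nor `__len__`.

```python
from types import SimpleNamespace


def depth_first_subtree(tree, root_start, t_depth):
    """Collect node ids under root_start, depth-first, down to t_depth levels."""
    sub_tree = []
    stack = [(root_start, 1)]          # (node id, level of that node)
    while stack:                       # an empty list is falsy, so the loop ends
        node, depth = stack.pop()      # pop() removes and returns the last item
        sub_tree.append(node)
        if depth >= t_depth:
            continue                   # do not descend below the requested depth
        left = tree.children_left[node]
        right = tree.children_right[node]
        # Push right first so that the left child is visited first.
        if right >= 0:
            stack.append((right, depth + 1))
        if left >= 0:
            stack.append((left, depth + 1))
    return sub_tree


# Hypothetical toy tree: node 0 has children 1 and 4, node 1 has children 2 and 3.
toy = SimpleNamespace(children_left=[1, 2, -1, -1, -1],
                      children_right=[4, 3, -1, -1, -1])
print(depth_first_subtree(toy, 0, 2))  # [0, 1, 4]
```

With a fitted estimator, the same call would take `model.tree_` as the first argument.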
Best regards
Christian
From mk1853387 at gmail.com Sun Jan 14 16:15:38 2024
From: mk1853387 at gmail.com (marc nicole)
Date: Sun, 14 Jan 2024 22:15:38 +0100
Subject: [scikit-learn] level search traversal on binary decision regression
tree with recursive calls returning wrong node order
Message-ID:
Hi all,
Suppose I have this binary tree, which I want to traverse level by level
using a recursive algorithm:
.
└── 1/
    ├── 2/
    │   ├── 3/
    │   │   ├── 4
    │   │   └── 9
    │   └── 30
    └── 71/
        ├── 72
        └── 99
I wrote this function, inspired by the level-order tree traversal
algorithm, which stops at a given input depth:
def get_subtree_from_rt(subtree, root_start, max_depth):
    if max_depth == 0:
        return []
    nodes = [root_start]
    if root_start == -1:
        return []
    else:
        nodes.extend([subtree.children_left[root_start],
                      subtree.children_right[root_start]])
        print(nodes)
        nodes.extend(child for child in get_subtree_from_rt(subtree,
                     subtree.children_left[root_start], max_depth - 1) if
                     child not in list(filter(lambda a: a != -1, nodes)))
        nodes.extend(child for child in get_subtree_from_rt(subtree,
                     subtree.children_right[root_start], max_depth - 1) if
                     child not in list(filter(lambda a: a != -1, nodes)))
    return nodes
The function does traverse the tree, but in an unwanted order. For the
tree above, the returned result was:
[1, 2, 71, 3, 30, 4, 9]
whereas the correct result should have been:
[1, 2, 71, 3, 30, 72, 99]
It seems that root_start is not the same for both recursive calls, since
the first recursive call alters its value.
My question is: how can I obtain the expected result while avoiding the
second recursive call being made on a different root_start value?
Usage: pass tree_stucture as the subtree argument.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

dataset = pd.read_csv("anydatasetPath")  # placeholder path to any dataset
x = dataset.drop(dataset.columns[9], axis=1)  # features: all but the 10th column
y = dataset.iloc[:, 9]                        # target: the 10th column
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=28)
model = DecisionTreeRegressor(random_state=0)
model.fit(x_train, y_train)
y_pred = model.predict(x_test)
# Low-level tree object exposing children_left / children_right
tree_stucture = model.tree_
print(get_subtree_from_rt(tree_stucture, 1, 3))
With many thanks
From christian.braune79 at gmail.com Mon Jan 15 02:48:31 2024
From: christian.braune79 at gmail.com (Christian Braune)
Date: Mon, 15 Jan 2024 08:48:31 +0100
Subject: [scikit-learn] level search traversal on binary decision
regression tree with recursive calls returning wrong node order
In-Reply-To:
References:
Message-ID:
Hello Marc,
you might want to look at the intro to algorithms and data structures
course from Sedgewick. Your specific problem is discussed here:
https://www.cs.princeton.edu/courses/archive/spring15/cos226/lectures/31ElementarySymbolTables+32BinarySearchTrees.pdf
(p. 50/51, slide 22 specifically).
In short: Level-order traversal is better solved using an iterative
approach.
I also believe that your problem is not specific to sklearn, right?
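A minimal sketch of such an iterative approach, assuming a tree object with `children_left` and `children_right` where -1 marks a missing child (as in scikit-learn's `tree_`). The function name and the toy stand-in for the tree from the question are illustrative only. A recursive version descends fully into the left subtree before touching the right one, which is why it yields depth-first order; a FIFO queue visits all nodes of one level before the next.

```python
from collections import deque
from types import SimpleNamespace


def level_order_subtree(tree, root_start, max_depth):
    """Return node ids under root_start level by level, at most max_depth levels."""
    if root_start < 0 or max_depth <= 0:
        return []
    order = []
    queue = deque([(root_start, 1)])   # (node id, level of that node)
    while queue:
        node, depth = queue.popleft()  # FIFO: oldest entry first
        order.append(node)
        if depth == max_depth:
            continue                   # children would exceed the depth limit
        for child in (tree.children_left[node], tree.children_right[node]):
            if child >= 0:
                queue.append((child, depth + 1))
    return order


# Toy stand-in for the tree in the question (dicts indexed by node id).
left = {1: 2, 2: 3, 3: 4, 71: 72, 4: -1, 9: -1, 30: -1, 72: -1, 99: -1}
right = {1: 71, 2: 30, 3: 9, 71: 99, 4: -1, 9: -1, 30: -1, 72: -1, 99: -1}
toy = SimpleNamespace(children_left=left, children_right=right)
print(level_order_subtree(toy, 1, 3))  # [1, 2, 71, 3, 30, 72, 99]
```

Keeping the level next to each queued node avoids a shared depth counter, so every branch is cut at the same level.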
Best regards
Christian
From mk1853387 at gmail.com Mon Jan 15 13:07:22 2024
From: mk1853387 at gmail.com (marc nicole)
Date: Mon, 15 Jan 2024 19:07:22 +0100
Subject: [scikit-learn] level search traversal on binary decision
regression tree with recursive calls returning wrong node order
In-Reply-To:
References:
Message-ID:
Thanks for the reply. No, it is not specific to scikit-learn, but one
application of it involves scikit-learn.
From jeremie.du-boisberranger at inria.fr Fri Jan 19 06:15:25 2024
From: jeremie.du-boisberranger at inria.fr (Jeremie du Boisberranger)
Date: Fri, 19 Jan 2024 12:15:25 +0100
Subject: [scikit-learn] [ANN] scikit-learn 1.4.0 release
In-Reply-To:
References:
Message-ID: <478b3e81-3b41-45dc-acd2-2e09ec4db5b1@inria.fr>
Hi everyone,
We're happy to announce the 1.4.0 release which you can install via pip
or conda:
    pip install -U scikit-learn
or
    conda install -c conda-forge scikit-learn
You can read the release highlights under
https://scikit-learn.org/stable/auto_examples/release_highlights/plot_release_highlights_1_4_0.html
and the long list of the changes under
https://scikit-learn.org/stable/whats_new/v1.4.html
This version supports Python versions 3.9 to 3.12.
Thanks to all contributors who helped on this release!
Jérémie,
On behalf of the scikit-learn maintainers team.
From lorentzen.ch at gmail.com Fri Jan 19 11:32:36 2024
From: lorentzen.ch at gmail.com (Christian Lorentzen)
Date: Fri, 19 Jan 2024 17:32:36 +0100
Subject: [scikit-learn] [ANN] scikit-learn 1.4.0 release
In-Reply-To: <478b3e81-3b41-45dc-acd2-2e09ec4db5b1@inria.fr>
References: <478b3e81-3b41-45dc-acd2-2e09ec4db5b1@inria.fr>
Message-ID: <822B1F6A-BB14-4B39-8454-17D93B5AF139@gmail.com>
Thank you very much, Jérémie, for taking care of this release. I'm excited to use the new features and improvements.
Christian
From apoorva.kulkarni at rwth-aachen.de Fri Jan 26 13:47:05 2024
From: apoorva.kulkarni at rwth-aachen.de (Kulkarni, Apoorva)
Date: Fri, 26 Jan 2024 18:47:05 +0000
Subject: [scikit-learn] Decsion tree Visualization
Message-ID:
Hello,
For an academic project I have used a decision tree with a depth of 70.
To document the data I need a visual representation of the tree only up to a depth of 5. Is there any way to do that? Please advise.
Apoorva
From christian.braune79 at gmail.com Fri Jan 26 13:53:34 2024
From: christian.braune79 at gmail.com (Christian Braune)
Date: Fri, 26 Jan 2024 19:53:34 +0100
Subject: [scikit-learn] Decsion tree Visualization
In-Reply-To:
References:
Message-ID:
Hello Apoorva,
have you tried this function?
https://scikit-learn.org/stable/modules/generated/sklearn.tree.plot_tree.html
It has a max_depth parameter, which might do just what you need.
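A minimal sketch of that call, assuming matplotlib is installed. The synthetic data, the regressor and the output file name are placeholders chosen for illustration, not taken from the original project.

```python
import matplotlib
matplotlib.use("Agg")  # render to a file, no display needed
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor, plot_tree

# Placeholder data and model; substitute the already fitted tree here.
X, y = make_regression(n_samples=200, n_features=4, random_state=0)
model = DecisionTreeRegressor(random_state=0).fit(X, y)

fig, ax = plt.subplots(figsize=(16, 8))
# Only levels up to max_depth are drawn; deeper nodes are truncated.
artists = plot_tree(model, max_depth=5, filled=True, ax=ax)
fig.savefig("tree_depth5.png", dpi=150)
```

The fitted model itself is unchanged; `max_depth` here only limits what is drawn.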
Have a nice weekend!
From apoorva.kulkarni at rwth-aachen.de Fri Jan 26 14:06:26 2024
From: apoorva.kulkarni at rwth-aachen.de (Kulkarni, Apoorva)
Date: Fri, 26 Jan 2024 19:06:26 +0000
Subject: [scikit-learn] Decsion tree Visualization
In-Reply-To:
References: ,
Message-ID:
Hello,
The suggested solution worked. We are beginners in this domain, hence grateful for your valuable input.
Thank you so much for your prompt help.
Apoorva
From mk1853387 at gmail.com Sun Jan 28 13:16:10 2024
From: mk1853387 at gmail.com (marc nicole)
Date: Sun, 28 Jan 2024 19:16:10 +0100
Subject: [scikit-learn] How to create a binary tree hierarchy given a list
of elements as its leaves
Message-ID:
I am trying to build a binary tree hierarchy from numerical elements
that serve as its leaves (the last level of the tree). Starting from the
leaves, I want to randomly create a name for each node of the next level up
and assign it to its child elements. For example, if the input elements
are `0,1,2,3`, I would first create 4 elements (giving each a random
label composed of a letter and a number); then, for the second
level (iteration), I assign `0,1` to a random label (e.g. `b1`)
and `2,3` to another label (`b2`); then, for the third level, I assign a
parent label `c1` to `b1` and `b2`.
An illustration of the example is the following tree:
[image: tree_exp.PNG]
For this I use numpy's `array_split()` to get the chunks of the array
needed at each iteration.
For example, to get the arrays for the first iteration I use
`np.array_split(input, (input.size // k))`, where `k` is an even number.
To assign a parent node to its children, the parent's range must enclose
the children's ranges.
For example, to assign the parent node with label `a1` to children `b1` and
`b2`, with ranges [0,1] and [2,3] respectively, the parent should have the
range [0,3].
All is fine until a certain iteration (k=4) returns a parent with range [0,8],
which overlaps the children's ranges and therefore cannot be their
parent.
My question is: how can I evenly partition such arrays in a binary way and
build the binary tree so that, for k=4, the first range is
[0,7] instead of [0,8]?
My code is the following:
#!/usr/bin/python
# -*- coding: utf-8 -*-
import string
import random
import numpy as np


def generate_numbers_list_until_number(stop_number):
    if str(stop_number).isnumeric():
        return np.arange(stop_number)
    else:
        raise TypeError('Input should be a number!')


def generate_node_label():
    # Random label: one lowercase letter followed by a number in 0..10.
    return random.choice(string.ascii_lowercase) \
        + str(random.randint(0, 10))


def main():
    data = generate_numbers_list_until_number(100)
    k = 1
    hierarchies = []
    cells_arrays = np.array_split(data, data.size // k)
    print(cells_arrays)
    used_node_hierarchy_name = []
    node_hierarchy_name = [generate_node_label() for _ in range(0,
                           len(cells_arrays))]
    used_node_hierarchy_name.extend(node_hierarchy_name)
    while len(node_hierarchy_name) > 1:
        k = k * 2
        # bug here in the following line: when data.size is not a multiple
        # of k, array_split makes the first chunks one element longer
        cells_arrays = list(map(lambda x: [x[0], x[-1]],
                            np.array_split(data, data.size // k)))
        print(cells_arrays)
        node_hierarchy_name = []
        # node hierarchy names should not be redundant in another level
        for _ in range(0, len(cells_arrays)):
            node_name = generate_node_label()
            while node_name in used_node_hierarchy_name:
                node_name = generate_node_label()
            node_hierarchy_name.append(node_name)
        used_node_hierarchy_name.extend(node_hierarchy_name)
        print(used_node_hierarchy_name)
        hierarchies.append(list(zip(node_hierarchy_name, cells_arrays)))


if __name__ == '__main__':
    main()
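One possible way around the overlapping ranges, sketched here as an assumption rather than a confirmed fix, is to build each level from the level below by merging adjacent pairs of ranges instead of re-splitting the raw data. A parent then always encloses exactly its own children, and an unpaired last node is carried up unchanged. The function name is hypothetical, and the labels are deterministic (one letter per level plus a position) in place of the random labels used above.

```python
import string


def build_hierarchy(n_leaves):
    """Return a list of levels, leaves first; each node is (label, (lo, hi))."""
    level = [(i, i) for i in range(n_leaves)]   # leaf ranges
    levels = [level]
    while len(level) > 1:
        # Merge neighbours pairwise; a trailing odd node becomes its own parent.
        level = [(level[i][0], level[min(i + 1, len(level) - 1)][1])
                 for i in range(0, len(level), 2)]
        levels.append(level)
    # Label nodes as a1, a2, ... on the leaf level, b1, b2, ... above, and so on.
    return [[(string.ascii_lowercase[d] + str(j + 1), rng)
             for j, rng in enumerate(lvl)]
            for d, lvl in enumerate(levels)]


hierarchy = build_hierarchy(4)
print(hierarchy[1])  # [('b1', (0, 1)), ('b2', (2, 3))]
print(hierarchy[2])  # [('c1', (0, 3))]
```

For 100 leaves the fourth level starts with the range (0, 7), and the root covers (0, 99).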
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tree_exp.PNG
Type: image/png
Size: 21487 bytes
Desc: not available
URL:
From mdiramali at yahoo.com Mon Jan 29 08:53:24 2024
From: mdiramali at yahoo.com (Murat DIRAMALI)
Date: Mon, 29 Jan 2024 13:53:24 +0000 (UTC)
Subject: [scikit-learn] Data Analysis Advice
References: <587408342.1192495.1706536404241.ref@mail.yahoo.com>
Message-ID: <587408342.1192495.1706536404241@mail.yahoo.com>
Hello,
I need advice on the use of K-fold cross-validation for a master's thesis I'm supervising. As far as I know, we run it with the best parameters, but do we use the training or the test dataset? I'm sharing the Python code that I'm working on. I would appreciate it if you could correct my mistakes.
Yours sincerely,
Murat Dıramalı
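One common arrangement, sketched below under assumptions, is to run the K-fold parameter search on the training split only and keep the test split for a single final evaluation. The attached script is not reproduced in the archive, so the data, the estimator and the parameter grid here are placeholders, not the thesis code.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, KFold, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Placeholder data; the held-out test split is not touched during the search.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# 5-fold cross-validation over a small, illustrative parameter grid.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(DecisionTreeClassifier(random_state=0),
                      param_grid={"max_depth": [2, 4, 8]}, cv=cv)
search.fit(X_train, y_train)          # folds are drawn from the training data

print(search.best_params_, search.best_score_)   # cross-validated estimate
print(search.score(X_test, y_test))              # final check on unseen data
```

By default `GridSearchCV` refits the best configuration on the whole training split, so the last line scores that refitted model.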
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Code Snippet.py
Type: text/x-python
Size: 2449 bytes
Desc: not available
URL: