How to replace a cell value with each of its contour cells and yield the corresponding datasets separately in a list, in a Pandas way?
marc nicole
mk1853387 at gmail.com
Sun Jan 21 11:54:09 EST 2024
Thanks for the reply,
I think using a Pandas (or NumPy) approach would optimize the execution
of the program.
Target cells could be up to 10% of the dataset's size; a good example to
start with would have from 10 to 100 values.
Let me know your thoughts. Here's a reproducible example, which I have
formatted:
# only these three imports are actually used below
from numpy import random
import pandas as pd
from itertools import product
def select_target_values(dataframe, number_of_target_values):
    target_cells = []
    for _ in range(number_of_target_values):
        # each cell is stored as (x, y) = (column index, row index);
        # note numpy's randint already excludes the upper bound, so the
        # extra -1 also skips the last column/row
        col_x = random.randint(0, len(dataframe.columns) - 1)
        row_y = random.randint(0, len(dataframe) - 1)
        target_cells.append((col_x, row_y))
    return target_cells
def select_contours(target_cells):
    # offsets of the four orthogonal neighbours of a cell
    contour_coordinates = [(0, 1), (1, 0), (0, -1), (-1, 0)]
    contour_cells = []
    for target_cell in target_cells:
        # random contour count for each cell; numpy's randint(1, 4)
        # yields 1, 2 or 3 -- use randint(1, 5) if all four neighbours
        # should be possible
        contour_cells_count = random.randint(1, 4)
        contour_cells.append(
            [
                (target_cell[0] + dx, target_cell[1] + dy)
                for dx, dy in contour_coordinates[:contour_cells_count]
            ]
        )
    return contour_cells
def create_zipf_distribution():
    # draw a 50x5 sample from a Zipf distribution (a=2); the sample
    # already has the requested shape, so no reshape is needed
    zipf_dist = random.zipf(2, size=(50, 5))
    zipf_distribution_dataset = pd.DataFrame(zipf_dist).round(3)
    return zipf_distribution_dataset
def apply_contours(target_cells, contour_cells):
    # merge each target cell with its contour cells into one list
    target_cells_with_contour = []
    for idx, target_cell in enumerate(target_cells):
        target_cell_with_contour = [target_cell]
        target_cell_with_contour.extend(contour_cells[idx])
        target_cells_with_contour.append(target_cell_with_contour)
    return target_cells_with_contour
def create_possible_datasets(dataframe, target_cells_with_contour):
    all_datasets_final = []
    dataframe_original = dataframe.copy()
    # every combination that picks one cell (target or contour) per
    # target; product never yields an empty tuple here, so the original
    # filter(...) call was redundant
    list_tuples_idx_cells_all_datasets = [
        list(tuples) for tuples in product(*target_cells_with_contour)
    ]
    # the first entry of each group is the original target cell
    target_original_cells_coordinates = [
        cells[0] for cells in target_cells_with_contour
    ]
    for dataset_index_values in list_tuples_idx_cells_all_datasets:
        all_datasets = []
        for idx_cell, (x, y) in enumerate(dataset_index_values):
            dataframe_cpy = dataframe.copy()
            tx, ty = target_original_cells_coordinates[idx_cell]
            # replace the target cell with the chosen cell's value;
            # iat/iloc take (row, column). Beware: a contour cell on the
            # dataframe's edge may fall outside it, in which case iloc
            # wraps negative indices or raises IndexError
            dataframe_cpy.iat[ty, tx] = dataframe_original.iloc[y, x]
            all_datasets.append(dataframe_cpy)
        all_datasets_final.append(all_datasets)
    return all_datasets_final
def main():
    zipf_dataset = create_zipf_distribution()
    target_cells = select_target_values(zipf_dataset, 5)
    print(target_cells)
    contour_cells = select_contours(target_cells)
    print(contour_cells)
    target_cells_with_contour = apply_contours(target_cells, contour_cells)
    datasets = create_possible_datasets(zipf_dataset,
                                        target_cells_with_contour)
    print(datasets)

main()
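For comparison, here is a minimal NumPy-flavoured sketch (my own illustration, not part of the program above) of how one target cell's neighbour variants could be produced without building the coordinate lists by hand; `neighbor_variants` is a hypothetical helper name:

```python
import numpy as np
import pandas as pd

def neighbor_variants(df, x, y):
    # Yield one copy of df per in-bounds orthogonal neighbour of the
    # cell at column x, row y, with the target cell replaced by that
    # neighbour's value.
    arr = df.to_numpy()
    n_rows, n_cols = arr.shape
    for dy, dx in ((0, 1), (1, 0), (0, -1), (-1, 0)):
        ny, nx = y + dy, x + dx
        if 0 <= ny < n_rows and 0 <= nx < n_cols:
            out = arr.copy()
            out[y, x] = arr[ny, nx]
            yield pd.DataFrame(out, index=df.index, columns=df.columns)

df = pd.DataFrame(np.arange(9).reshape(3, 3))
variants = list(neighbor_variants(df, 1, 1))  # centre cell: 4 variants
```

The explicit bounds check also handles edge cells cleanly, avoiding the negative-index wraparound that iloc would otherwise perform silently.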
On Sun, Jan 21, 2024 at 4:33 PM Thomas Passin via Python-list <
python-list at python.org> wrote:
> On 1/21/2024 7:37 AM, marc nicole via Python-list wrote:
> > Hello,
> >
> > I have an initial dataframe with a random list of target cells (each
> > cell being identified by a pair (x, y)).
> > I want to yield four different dataframes, each containing the value
> > of one of the contour (surrounding) cells of each specified target
> > cell.
> >
> > The surrounding cells to consider for a specific target cell are
> > (x-1,y), (x,y-1), (x+1,y), (x,y+1); specifically, I randomly choose
> > 1 to 4 cells from these and consider them for replacement of the
> > target cell.
> >
> > I want to do that through a Pandas-specific approach, without having
> > to define the contour cells separately and then apply the changes to
> > the dataframe
>
> 1. Why do you want a Pandas-specific approach? Many people would rather
> keep code independent of special libraries if possible;
>
> 2. How big can these collections of target cells be, roughly speaking?
> The size could make a big difference in picking a design;
>
> 3. You really should work on formatting code for this list. Your code
> below is very complex and would take a lot of work to reformat to the
> point where it is readable, especially with the nearly impenetrable
> arguments in some places. Probably all that is needed is to replace all
> tabs by (say) three spaces, and to make sure you intentionally break
> lines well before they might get word-wrapped. Here is one example I
> have reformatted (I hope I got this right):
>
> list_tuples_idx_cells_all_datasets = list(filter(
> lambda x: utils_tuple_list_not_contain_nan(x),
> [list(tuples) for tuples in list(
> itertools.product(*target_cells_with_contour))
> ]))
>
> 4. As an aside, it doesn't look like you need to convert all those
> sequences and iterators to lists all over the place;
>
>
> > (but rather using an all-in-one approach):
> > For now I have written this example, which I think is not
> > Pandas-specific:
> [snip]
>
> --
> https://mail.python.org/mailman/listinfo/python-list
>
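To illustrate Thomas's point 4 with a toy example of my own (not from the thread): itertools.product already returns an iterator, so it can be looped over directly, or materialized once at the end, without the intermediate list() calls:

```python
from itertools import product

combos = product([1, 2], [3, 4])   # an iterator; no list() needed to loop
pairs = [list(t) for t in combos]  # materialize only once, at the end
```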