How to replace a cell value with each of its contour cells and yield the corresponding datasets separately in a list, in a Pandas way?
marc nicole
mk1853387 at gmail.com
Sun Jan 21 13:25:17 EST 2024
It is part of a larger project that aims to process data according to a
given algorithm.
Do you have any comments or recommendations for improving the code?
Thanks.
On Sun, Jan 21, 2024 at 18:28, Thomas Passin via Python-list <
python-list at python.org> wrote:
> On 1/21/2024 11:54 AM, marc nicole wrote:
> > Thanks for the reply,
> >
> > I think using a Pandas (or a Numpy) approach would optimize the
> > execution of the program.
> >
> > Target cells could be up to 10% of the size of the dataset; a good
> > example to start with would have from 10 to 100 values.
>
> Thanks for the reformatted code. It's much easier to read and think about.
>
> For say 100 points, it doesn't seem that "optimization" would be much of
> an issue. On my laptop machine and Python 3.12, your example takes
> around 5 seconds to run and print(). OTOH if you think you will go to
> much larger datasets, certainly execution time could become a factor.
>
> I would think that NumPy arrays and/or matrices would have good potential.
>
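To make the NumPy suggestion concrete, here is a minimal sketch of looking up all four contour values for a batch of target cells at once. It assumes the data fits in a 2-D array, that cells are addressed as (row, col), and that neighbours falling outside the array are simply masked. The helper name `neighbour_values` is hypothetical and not part of the posted code.

```python
import numpy as np

# Offsets of the four contour cells as (row, col) deltas.
OFFSETS = np.array([(0, 1), (1, 0), (0, -1), (-1, 0)])


def neighbour_values(grid, targets):
    """Return, per target (row, col), the values of its four neighbours.

    Neighbours that fall outside the grid are masked in the result.
    """
    targets = np.asarray(targets)
    # Shape (n_targets, 4, 2): every target paired with every offset.
    coords = targets[:, None, :] + OFFSETS[None, :, :]
    rows, cols = coords[..., 0], coords[..., 1]
    inside = (
        (rows >= 0) & (rows < grid.shape[0])
        & (cols >= 0) & (cols < grid.shape[1])
    )
    # Clip only so the fancy index is valid; clipped entries are masked below.
    values = grid[
        rows.clip(0, grid.shape[0] - 1),
        cols.clip(0, grid.shape[1] - 1),
    ]
    return np.ma.masked_array(values, mask=~inside)


grid = np.arange(12).reshape(3, 4)
result = neighbour_values(grid, [(0, 0), (1, 2)])
```

Each row of `result` holds the neighbours of one target in the order of `OFFSETS`; a corner cell such as (0, 0) keeps only its two in-bounds neighbours unmasked.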
> Is this some kind of a cellular automaton, or an image filtering process?
>
> > Let me know your thoughts. Here's a reproducible example, which I
> > formatted:
> >
> >
> >
> > from numpy import random
> > import pandas as pd
> > import numpy as np
> > import operator
> > import math
> > from collections import deque
> > from queue import *
> > from queue import Queue
> > from itertools import product
> >
> >
> > def select_target_values(dataframe, number_of_target_values):
> >     target_cells = []
> >     for _ in range(number_of_target_values):
> >         row_x = random.randint(0, len(dataframe.columns) - 1)
> >         col_y = random.randint(0, len(dataframe) - 1)
> >         target_cells.append((row_x, col_y))
> >     return target_cells
> >
> >
> > def select_contours(target_cells):
> >     contour_coordinates = [(0, 1), (1, 0), (0, -1), (-1, 0)]
> >     contour_cells = []
> >     for target_cell in target_cells:
> >         # random contour count for each cell
> >         contour_cells_count = random.randint(1, 4)
> >         try:
> >             contour_cells.append(
> >                 [
> >                     tuple(
> >                         map(
> >                             lambda i, j: i + j,
> >                             (target_cell[0], target_cell[1]),
> >                             contour_coordinates[iteration_],
> >                         )
> >                     )
> >                     for iteration_ in range(contour_cells_count)
> >                 ]
> >             )
> >         except IndexError:
> >             continue
> >     return contour_cells
> >
> >
> > def create_zipf_distribution():
> >     zipf_dist = random.zipf(2, size=(50, 5)).reshape((50, 5))
> >
> >     zipf_distribution_dataset = pd.DataFrame(zipf_dist).round(3)
> >
> >     return zipf_distribution_dataset
> >
> >
> > def apply_contours(target_cells, contour_cells):
> >     target_cells_with_contour = []
> >     # create one single list of cells
> >     for idx, target_cell in enumerate(target_cells):
> >         target_cell_with_contour = [target_cell]
> >         target_cell_with_contour.extend(contour_cells[idx])
> >         target_cells_with_contour.append(target_cell_with_contour)
> >     return target_cells_with_contour
> >
> >
> > def create_possible_datasets(dataframe, target_cells_with_contour):
> >     all_datasets_final = []
> >     dataframe_original = dataframe.copy()
> >
> >     list_tuples_idx_cells_all_datasets = list(
> >         filter(
> >             lambda x: x,
> >             [list(tuples) for tuples in
> >              list(product(*target_cells_with_contour))],
> >         )
> >     )
> >     target_original_cells_coordinates = list(
> >         map(
> >             lambda x: x[0],
> >             [
> >                 target_and_contour_cell
> >                 for target_and_contour_cell in target_cells_with_contour
> >             ],
> >         )
> >     )
> >     for dataset_index_values in list_tuples_idx_cells_all_datasets:
> >         all_datasets = []
> >         for idx_cell in range(len(dataset_index_values)):
> >             dataframe_cpy = dataframe.copy()
> >             dataframe_cpy.iat[
> >                 target_original_cells_coordinates[idx_cell][1],
> >                 target_original_cells_coordinates[idx_cell][0],
> >             ] = dataframe_original.iloc[
> >                 dataset_index_values[idx_cell][1],
> >                 dataset_index_values[idx_cell][0]
> >             ]
> >             all_datasets.append(dataframe_cpy)
> >         all_datasets_final.append(all_datasets)
> >     return all_datasets_final
> >
> >
> > def main():
> >     zipf_dataset = create_zipf_distribution()
> >
> >     target_cells = select_target_values(zipf_dataset, 5)
> >     print(target_cells)
> >     contour_cells = select_contours(target_cells)
> >     print(contour_cells)
> >     target_cells_with_contour = apply_contours(target_cells,
> >                                                contour_cells)
> >     datasets = create_possible_datasets(zipf_dataset,
> >                                         target_cells_with_contour)
> >     print(datasets)
> >
> >
> > main()
> >
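One detail in the listing may be worth checking: `numpy.random.randint(low, high)` excludes `high`, so `random.randint(1, 4)` only ever yields 1, 2 or 3, and the `len(...) - 1` upper bounds never select the last row or column. The sketch below assumes that "1 to 4 contour cells" was meant inclusively and that every row and column should be reachable; the helper `pick_cell` and the seed value are illustrative choices, not part of the posted code.

```python
import numpy as np

# The seed is an arbitrary choice, used only to make the sketch reproducible.
rng = np.random.default_rng(seed=0)

# Legacy API: the upper bound is exclusive, so 4 is never drawn.
legacy_counts = np.random.randint(1, 4, size=1000)

# Generator API: endpoint=True makes the upper bound inclusive.
inclusive_counts = rng.integers(1, 4, size=1000, endpoint=True)


def pick_cell(n_rows, n_cols, rng):
    """Pick a random (row, col); the exclusive upper bound covers every index."""
    return int(rng.integers(0, n_rows)), int(rng.integers(0, n_cols))


row, col = pick_cell(50, 5, rng)
```

If excluding the last row and column was deliberate (for example, to keep `x + 1` and `y + 1` in range), the original bounds can stay as they are; negative offsets at row or column 0 would still wrap around with `iloc`.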
> > On Sun, Jan 21, 2024 at 16:33, Thomas Passin via Python-list
> > <python-list at python.org> wrote:
> >
> > On 1/21/2024 7:37 AM, marc nicole via Python-list wrote:
> > > Hello,
> > >
> > > I have an initial dataframe with a random list of target cells
> > > (each cell being identified by a pair (x, y)).
> > > I want to yield four different dataframes, each containing the
> > > value of one of the contour (surrounding) cells of each specified
> > > target cell.
> > >
> > > The surrounding cells to consider for a specific target cell are
> > > (x-1, y), (x, y-1), (x+1, y) and (x, y+1). Specifically, I randomly
> > > choose 1 to 4 of these cells and consider them as replacements for
> > > the target cell.
> > >
> > > I want to do that through a pandas-specific approach, without
> > > having to define the contour cells separately and then apply the
> > > changes to the dataframe
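One possible reading of this request can be sketched with `DataFrame.shift` and `DataFrame.mask`: shifting the whole frame by one position aligns every cell with one of its neighbours, and masking then swaps in the neighbour value only at the target cells. This is a sketch under assumptions, not the poster's method: targets are taken as positional (row, col) pairs, all four directions are produced (no random subset), and a target with no neighbour on a given side is left unchanged. The names `SHIFTS` and `contour_variants` are hypothetical.

```python
import numpy as np
import pandas as pd

# (row shift, column shift) handed to DataFrame.shift for each direction.
# shift(1, axis=0) moves the value from the row above into each cell, etc.
SHIFTS = {"above": (1, 0), "below": (-1, 0), "left": (0, 1), "right": (0, -1)}


def contour_variants(df, targets):
    """Yield (direction, frame) pairs, one modified frame per direction."""
    is_target = pd.DataFrame(False, index=df.index, columns=df.columns)
    for row, col in targets:  # positional (row, col) pairs
        is_target.iat[row, col] = True
    for name, (row_shift, col_shift) in SHIFTS.items():
        if row_shift:
            neighbour = df.shift(row_shift, axis=0)
        else:
            neighbour = df.shift(col_shift, axis=1)
        # Keep the original value where no neighbour exists on that side.
        yield name, df.mask(is_target & neighbour.notna(), neighbour)


df = pd.DataFrame(np.arange(12, dtype=float).reshape(3, 4))
variants = dict(contour_variants(df, [(1, 1), (0, 3)]))
```

Here `variants["below"]` is a copy of `df` in which cell (1, 1) holds the value from (2, 1) and cell (0, 3) holds the value from (1, 3); all other cells are unchanged.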
> >
> > 1. Why do you want a Pandas-specific approach? Many people would
> > rather keep code independent of special libraries if possible;
> >
> > 2. How big can these collections of target cells be, roughly
> > speaking? The size could make a big difference in picking a design;
> >
> > 3. You really should work on formatting code for this list. Your
> > code below is very complex and would take a lot of work to reformat
> > to the point where it is readable, especially with the nearly
> > impenetrable arguments in some places. Probably all that is needed
> > is to replace all tabs by (say) three spaces, and to make sure you
> > intentionally break lines well before they might get word-wrapped.
> > Here is one example I have reformatted (I hope I got this right):
> >
> > list_tuples_idx_cells_all_datasets = list(filter(
> >     lambda x: utils_tuple_list_not_contain_nan(x),
> >     [list(tuples) for tuples in list(
> >         itertools.product(*target_cells_with_contour))
> >     ]))
> >
> > 4. As an aside, it doesn't look like you need to convert all those
> > sequences and iterators to lists all over the place;
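Point 4 can be illustrated with a small sketch: `itertools.product` already returns an iterator, so it can feed a loop directly without intermediate `list(...)` calls. The input below is made-up sample data in the same shape as `target_cells_with_contour` (each inner list is a target cell followed by its contour cells).

```python
from itertools import product

# Hypothetical sample input: target cell first, then its contour cells.
target_cells_with_contour = [
    [(1, 1), (1, 2), (2, 1)],
    [(3, 0), (3, 1)],
]

# product() is consumed lazily; no list() wrapper is needed to loop over it.
combinations_seen = 0
first_combination = None
for combination in product(*target_cells_with_contour):
    if first_combination is None:
        first_combination = combination
    combinations_seen += 1

# The original target cells are simply the first element of each inner list.
original_targets = [cells[0] for cells in target_cells_with_contour]
```

Iterating lazily also avoids holding every combination in memory at once, which matters because the number of combinations grows multiplicatively with the number of target cells.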
> >
> >
> > > (but rather using an all-in-one approach).
> > > For now I have written this example, which I think is not
> > > Pandas-specific:
> > [snip]
> >
> > --
> > https://mail.python.org/mailman/listinfo/python-list
> >
>