Most efficient way to replace "," with "." in an array and/or dataframe
MRAB
python at mrabarnett.plus.com
Sat Sep 21 21:50:54 EDT 2019
On 2019-09-22 00:42, Markos wrote:
> Hi,
>
> I have a table.csv file with the following structure:
>
> , Polyarene conc ,, mg L-1 ,,,,,,,
> Spectrum, Py, Ace, Anth,
> 1, "0,456", "0,120", "0,168"
> 2, "0,456", "0,040", "0,280"
> 3, "0,152", "0,200", "0,280"
>
> I open as dataframe with the command:
>
> data = pd.read_csv ('table.csv', sep = ',', skiprows = 1)
>
> and the variable "data" has the structure:
>
> Spectrum, Py, Ace, Anth,
> 0 1 0,456 0,120 0,168
> 1 2 0,456 0,040 0,280
> 2 3 0,152 0,200 0,280
>
> I copy the numeric fields to an array with the command:
>
> data_array = data.values [:, 1:]
>
> And the data_array variable gets the fields in string format:
>
> [['0,456' '0,120' '0,168']
> ['0,456' '0,040' '0,280']
> ['0,152' '0,200' '0,280']]
>
> The only way I found to change comma "," to dot "." was using the method
> replace():
>
> for i, line in enumerate (data_array):
>     data_array [i] = ([float (element.replace (',', '.')) for element in
>                        data_array [i]])
>
> But I'm wondering if there is another, more "efficient" way to make this
> change without having to "iterate" all elements of the array with a loop
> "for".
>
> I'm also wondering if there would be any benefit of making this
> modification in dataframe before extracting the numeric fields to the array.
>
> Please, any comments or tips?
>
I'd suggest doing all of the replacements in the CSV file first,
something like this:
import re

with open('table.csv') as file:
    csv_data = file.read()

# Convert the decimal points and also make them look numeric.
csv_data = re.sub(r'"(-?\d+),(\d+)"', r'\1.\2', csv_data)

with open('fixed_table.csv', 'w') as file:
    file.write(csv_data)
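
To see what the substitution does without touching any files, here is a minimal sketch that applies the same regex to two rows copied from your sample. The names sample, fixed and rows are only for illustration, and the csv step is just one assumed way to check that the result parses as numbers:

```python
import csv
import io
import re

# Two data rows copied from the question's table.csv.
sample = (
    '1, "0,456", "0,120", "0,168"\n'
    '2, "0,456", "0,040", "0,280"\n'
)

# Same pattern as above: a quoted number with a decimal comma becomes
# an unquoted number with a decimal point.
fixed = re.sub(r'"(-?\d+),(\d+)"', r'\1.\2', sample)
print(fixed)

# The fixed text now parses as plain numeric CSV.
reader = csv.reader(io.StringIO(fixed), skipinitialspace=True)
rows = [[float(field) for field in row] for row in reader]
print(rows)
```

The pattern only matches quoted fields with digits on both sides of the comma, so the header lines and the unquoted Spectrum column are left alone.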