How to quickly process one million strings to remove quotes
Peter Otten
__peter__ at web.de
Fri Aug 4 02:52:00 EDT 2017
Tim Daneliuk wrote:
> On 08/02/2017 10:05 AM, Daiyue Weng wrote:
>> Hi, I am trying to remove extra quotes from a large set of strings (a
>> list of strings); each original string looks like,
>>
>> """str_value1"",""str_value2"",""str_value3"",1,""str_value4"""
>>
>>
>> I'd like to remove the start and end quotes and the extra pairs of
>> quotes around each string value, so the result will look like,
>>
>> "str_value1","str_value2","str_value3",1,"str_value4"
>
> <SNIP>
>
> This part can also be done fairly efficiently with sed:
>
> time cat hugequote.txt | sed 's/"""/"/g;s/""/"/g' >/dev/null
>
> real 0m2.660s
> user 0m2.635s
> sys 0m0.055s
>
> hugequote.txt is a file with 1M copies of your test string above in it.
>
> Run on a quad core i5 on FreeBSD 10.3-STABLE.
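For comparison, the same two chained substitutions can be sketched in plain Python with str.replace (a minimal sketch, using the test string from the thread):

```python
# Mirror the sed program s/"""/"/g;s/""/"/g:
# first collapse the triple quotes at the ends, then the doubled quotes.
line = '"""str_value1"",""str_value2"",""str_value3"",1,""str_value4"""'
cleaned = line.replace('"""', '"').replace('""', '"')
print(cleaned)  # "str_value1","str_value2","str_value3",1,"str_value4"
```

The replacement order matters: collapsing `""` first would leave nothing for the `"""` rule to match.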
It looks like Python is fairly competitive:
$ wc -l hugequote.txt
1000000 hugequote.txt
$ cat unquote.py
import csv
with open("hugequote.txt") as instream:
    for field, in csv.reader(instream):
        print(field)
$ time python3 unquote.py > /dev/null
real 0m3.773s
user 0m3.665s
sys 0m0.082s
$ time cat hugequote.txt | sed 's/"""/"/g;s/""/"/g' > /dev/null
real 0m4.862s
user 0m4.721s
sys 0m0.330s
Run on ancient AMD hardware ;)
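The csv approach works because doubled quotes are standard CSV escaping: each whole line parses as a single quoted field, and csv.reader undoubles the inner quotes, which is exactly the transformation wanted. A small sketch of that parse on one line (io.StringIO stands in for the file):

```python
import csv
import io

line = '"""str_value1"",""str_value2"",""str_value3"",1,""str_value4"""'
# The leading quote opens one quoted field that runs to the end of the
# line; every "" inside it is decoded as a single literal quote.
row = next(csv.reader(io.StringIO(line)))
print(row)  # ['"str_value1","str_value2","str_value3",1,"str_value4"']
```

This is also why the loop above unpacks with `for field, in ...`: every row has exactly one field.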