how to fast processing one million strings to remove quotes
Nick Mellor
thebalancepro at gmail.com
Fri Aug 4 00:21:11 EDT 2017
Sorry Daiyue,
Try this correction: I'm writing code without being able to execute it.
> split_on_dbl_dbl_quote = original_list.join('|').split('""')
> remove_dbl_dbl_quotes_and_outer_quotes = split_on_dbl_dbl_quote[::2].join('').split('|')
split_on_dbl_dbl_quote = '|'.join(original_list).split('""')
remove_dbl_dbl_quotes_and_outer_quotes = '"'.join(split_on_dbl_dbl_quote[::2]).split('|')
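
For anyone following along, here is a minimal trace of what those two statements do on a made-up two-element list. The sample values are hypothetical, not Daiyue's data, so treat this as a sketch to compare against your own strings before running it on the full million. Note that it calls '|'.join(original_list), because join is a string method rather than a list method.

```python
# Hypothetical sample data, not the real list from the thread.
original_list = ['a""b""c', 'd""e""f']

# Glue everything into one string, then cut it at every double-double-quote.
split_on_dbl_dbl_quote = '|'.join(original_list).split('""')
# split_on_dbl_dbl_quote == ['a', 'b', 'c|d', 'e', 'f']

# Keep the even-numbered pieces, rejoin them with a single quote, and cut
# the result back into one string per original element.
remove_dbl_dbl_quotes_and_outer_quotes = '"'.join(split_on_dbl_dbl_quote[::2]).split('|')
# remove_dbl_dbl_quotes_and_outer_quotes == ['a"c', 'd"f']
```

On this input the odd-numbered pieces ('b' and 'e') are dropped by the slice, and the whole approach assumes that '|' never occurs in the data.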
Cheers,
Nick
>
> You need to be sure of your data: [::2] (return just even-numbered elements) relies on all double-double-quotes both opening and closing within the same string.
>
> This runs in under a second for a million strings but does affect *all* elements, not just strings. The non-strings would become strings after the second statement.
>
> As to multi-processing: I would be looking at well-optimised single-thread solutions like split/join before I consider MP. If you can fit the problem to a split-join it'll be much simpler and more "pythonic".
>
> Cheers,
>
> Nick
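
The caveat quoted above about [::2] can be checked before running the split/join. The sketch below is one way to do that, assuming every element is already a str; the helper name has_balanced_dbl_dbl_quotes and the sample lists are made up for illustration.

```python
def has_balanced_dbl_dbl_quotes(strings):
    # True if every string contains an even number of '""' markers.
    # str.count scans left to right without overlap, the same way
    # str.split('""') finds its separators.
    return all(s.count('""') % 2 == 0 for s in strings)


# Hypothetical sample data.
balanced = ['a""b""c', 'd""e""f']
unbalanced = ['a""b', 'c""d""e']

print(has_balanced_dbl_dbl_quotes(balanced))
print(has_balanced_dbl_dbl_quotes(unbalanced))

# With the unbalanced list, the split/join no longer returns one result
# per input element, because the even/odd positions shift after the
# first string.
pieces = '|'.join(unbalanced).split('""')
result = '"'.join(pieces[::2]).split('|')
print(len(unbalanced), len(result))
```

If the check fails, the output of the split/join should not be trusted to line up with the input list.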