From deronnax at gmail.com  Wed May 31 15:27:28 2017
From: deronnax at gmail.com (Mathieu Dupuy)
Date: Wed, 31 May 2017 21:27:28 +0200
Subject: [Datetime-SIG] strategy for the C part of ISO 8601 datetime parsing
Message-ID:

Hi datetime mates,

I would like to resume soon the C implementation of ISO format datetime
parsing in CPython that I started a few days ago
(http://bugs.python.org/issue15873). Currently I have 2 solutions and
would like to know which one you prefer:

* iterating over the string, stopping when something is wrong
  (might process almost all of the string and finally give up because
  the last part is wrong, e.g. incorrect microseconds or time zone.
  Penalizes invalid strings; best case when most of the strings to
  process are valid)
* first checking that the string is correct, then iterating over it and
  handling each part. Early detection of incorrect strings, useless
  overhead for valid strings. Penalizes valid strings; best case when
  most of the strings to process are invalid.

I have a preference for solution #1. I first thought of using sscanf,
but it's impossible for many reasons, the first of them being that scanf
is unsuitable for a variable number of matches (you can't express an
optional match in a scanf format).

Waiting for your input.

From alexander.belopolsky at gmail.com  Wed May 31 15:45:29 2017
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Wed, 31 May 2017 15:45:29 -0400
Subject: [Datetime-SIG] strategy for the C part of ISO 8601 datetime parsing
In-Reply-To:
References:
Message-ID:

As I mentioned at the bug tracker, I would prefer to start with the C
implementation falling back to Python. This is what we do for strptime
and I don't see why fromisoformat should be different. Let's focus on
finalizing the desired behavior and getting the Python implementation
checked in. We don't want to maintain two implementations while the
features are still subject to revision. Once the Python code is mature
enough, we can implement the C acceleration.
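
The two strategies from the opening message can be sketched in Python
to make the trade-off concrete. This is only an illustration, not the
patch on the tracker: the function names are hypothetical, and the
accepted format is deliberately narrowed to
YYYY-MM-DDTHH:MM:SS with an optional six-digit fraction (no time zone).

```python
import re
from datetime import datetime


def _digits(s, pos, n):
    """Read exactly n ASCII digits starting at pos; return (value, new_pos)."""
    chunk = s[pos:pos + n]
    if len(chunk) != n or not (chunk.isascii() and chunk.isdigit()):
        raise ValueError(f"expected {n} digits at offset {pos}: {s!r}")
    return int(chunk), pos + n


def _expect(s, pos, ch):
    """Require the single character ch at pos; return the next position."""
    if s[pos:pos + 1] != ch:
        raise ValueError(f"expected {ch!r} at offset {pos}: {s!r}")
    return pos + 1


def parse_single_pass(s):
    """Strategy 1: convert while scanning, give up at the first bad field."""
    year, pos = _digits(s, 0, 4)
    pos = _expect(s, pos, "-")
    month, pos = _digits(s, pos, 2)
    pos = _expect(s, pos, "-")
    day, pos = _digits(s, pos, 2)
    pos = _expect(s, pos, "T")
    hour, pos = _digits(s, pos, 2)
    pos = _expect(s, pos, ":")
    minute, pos = _digits(s, pos, 2)
    pos = _expect(s, pos, ":")
    second, pos = _digits(s, pos, 2)
    micro = 0
    if pos < len(s):  # optional part: something scanf cannot express
        pos = _expect(s, pos, ".")
        micro, pos = _digits(s, pos, 6)
    if pos != len(s):
        raise ValueError(f"trailing data at offset {pos}: {s!r}")
    return datetime(year, month, day, hour, minute, second, micro)


# Shape check used by strategy 2 (ASCII digits only).
_SHAPE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d{6})?", re.ASCII)


def parse_validate_first(s):
    """Strategy 2: check the whole string first, then convert each field."""
    if _SHAPE.fullmatch(s) is None:
        raise ValueError(f"not in the expected format: {s!r}")
    micro = int(s[20:26]) if len(s) > 19 else 0
    return datetime(int(s[0:4]), int(s[5:7]), int(s[8:10]),
                    int(s[11:13]), int(s[14:16]), int(s[17:19]), micro)
```

Both functions accept and reject the same strings; the difference is
only where the cost lands. Strategy 1 touches each character once for
a valid string, while strategy 2 scans a valid string twice but rejects
a malformed one before any conversion is done.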

From chris.barker at noaa.gov  Wed May 31 18:37:09 2017
From: chris.barker at noaa.gov (Chris Barker)
Date: Wed, 31 May 2017 15:37:09 -0700
Subject: [Datetime-SIG] strategy for the C part of ISO 8601 datetime parsing
In-Reply-To:
References:
Message-ID:

On Wed, May 31, 2017 at 12:27 PM, Mathieu Dupuy wrote:

> * iterating over the string, stopping when something is wrong
> (might process almost all of the string and finally give up because
> last part is wrong, EG incorrect microseconds or time zone. Penalize
> invalid strings, best case when most of the strings to process are
> valid)
> * first checking the string is correct, then iterating over it and
> handling each part. Early detection of incorrect strings, useless
> overhead for valid string. Penalize valid strings, best case when most
> of the strings to process are invalid).
>
> I have a preference for solution #1.

I agree -- I suspect that it won't take much longer to convert the
string than it would to validate it anyway, so (2) would add a lot of
overhead.

Also -- I think it's fair to optimize for most strings being valid: if
you are parsing a lot of datetimes (the only time you care about
performance), most of them had better be valid, or performance is your
least concern.

Alexander Belopolsky wrote:

> As I mentioned at the bug tracker, I would prefer to start with the C
> implementation falling back to Python. This is what we do for strptime and
> I don't see why fromisoformat should be different. Let's focus on
> finalizing the desired behavior and getting the Python implementation
> checked in. We don't want to maintain two implementations while the
> features are still subject to revision. Once Python code is mature enough,
> we can implement the C acceleration.

It seems the ISO string parsing is a single function, yes?

Couldn't the work be done in parallel?

If Mathieu wants to write a C version, it could be dropped in to
datetime at any point. Ideally, there would be a comprehensive test
suite, and then there's little impact.

IIUC, an ISO 8601 string has three parts:

date
time
tz-offset

so a function that returned:

date, time, offset = parse_iso(a_string)

could be plugged right into the rest of the implementation.

(I'm suggesting that deciding exactly what to do with the various
options for offset, etc. be kept out of this particular function.)

-CHB

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From alexander.belopolsky at gmail.com  Wed May 31 19:00:22 2017
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Wed, 31 May 2017 19:00:22 -0400
Subject: [Datetime-SIG] strategy for the C part of ISO 8601 datetime parsing
In-Reply-To:
References:
Message-ID:

On Wed, May 31, 2017 at 6:37 PM, Chris Barker wrote:

> Couldn't the work be done in parallel?

At this point this seems wasteful: more work for reviewers, and
potentially more work for Mathieu if we decide to change the behavior
before applying the patch.

> if Mathieu wants to write a C version, it could be dropped in to datetime
> at any point.

Right, and this is the reason to focus on the pure Python version first.