[scikit-learn] Mapping fulltext OCR to issue type
Thomas Güttler
guettliml at thomas-guettler.de
Wed Jun 13 05:43:55 EDT 2018
I am still willing to learn.
Could anyone recommend a book or website that would help me get started?
Regards,
Thomas
On 08.06.2018 at 10:48, Thomas Güttler wrote:
> We run an issue tracking application. A lot of issues get generated
> from scanned letters.
>
> I have 70k full-text OCR result files. They were created with Tesseract.
>
> Each of these 70k files corresponds to an issue. Each issue has an issue type.
>
> I want to use machine learning so that, in the future, the machine
> can guess the issue type by looking at the full-text OCR.
>
> The issue types are not a flat list; they form a tree.
>
> Example:
>
> electricity / power grid
> electricity / outages
> customer support / invoices / complaint
> customer support / invoices / tax
> ....
>
>
> If the machine can't guess
>
> "customer support / invoices / complaint"
>
> it would be nice if it could at least guess roughly the parent issue type:
>
> "customer support / invoices"
>
> I have never used scikit-learn before, but I have used Python for several years.
>
> Could you please point me in the right direction?
>
> Regards,
> Thomas Güttler
>
>
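
To make the question above concrete, here is a minimal sketch of one
possible approach, not a recommended or tested solution. It assumes each
issue type is stored as a path string joined by " / ", trains a flat
TF-IDF plus logistic regression classifier on the full paths, and then
sums the leaf probabilities per tree node so that the guess can fall back
to a parent type. The helper names (ancestors, build_model,
guess_issue_type), the 0.5 threshold, and the toy texts are all
hypothetical and only serve as illustration.

```python
from collections import defaultdict

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

SEP = " / "  # assumed separator between levels of the issue type tree


def ancestors(label):
    """Yield every prefix of a path label, from the root down to the label itself."""
    parts = label.split(SEP)
    for depth in range(1, len(parts) + 1):
        yield SEP.join(parts[:depth])


def build_model():
    """Return an unfitted text classifier: TF-IDF features + logistic regression."""
    return make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))


def guess_issue_type(model, text, threshold=0.5):
    """Return the deepest tree node whose summed probability reaches the threshold.

    Returns None if not even a top-level node is confident enough.
    """
    probs = model.predict_proba([text])[0]
    node_prob = defaultdict(float)
    # A node's probability is the sum over all leaf classes below it.
    for label, p in zip(model.classes_, probs):
        for node in ancestors(label):
            node_prob[node] += p
    candidates = [n for n, p in node_prob.items() if p >= threshold]
    if not candidates:
        return None
    # Prefer the deepest node; break ties by higher probability.
    return max(candidates, key=lambda n: (n.count(SEP), node_prob[n]))


if __name__ == "__main__":
    # Hypothetical toy data; the real input would be the 70k OCR files.
    texts = [
        "the power line near our house is damaged",
        "we had no electricity for three hours yesterday",
        "the amount on my invoice is wrong and I want to complain",
        "please explain the tax shown on my invoice",
    ]
    labels = [
        "electricity / power grid",
        "electricity / outages",
        "customer support / invoices / complaint",
        "customer support / invoices / tax",
    ]
    model = build_model().fit(texts, labels)
    print(guess_issue_type(model, "my invoice shows the wrong tax"))
```

With so few examples the demo output is not meaningful. The point is only
the fallback: when no leaf type reaches the threshold, the combined
probability of its siblings may still be enough to return the parent.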
--
Thomas Guettler http://www.thomas-guettler.de/
I am looking for feedback: https://github.com/guettli/programming-guidelines