Today I wanted to share what I have done in a recent experiment to address an issue that has been quite rapidly getting out of control. We have a responsibility of pushing out a number of catalogues on a week by week basis, and since the source information has got product labels truncated down due to the size of the original input field, we have got the manual task of renaming to their full displayed length.
Of course, we have got our ways of reducing the amount of manual updates on this, but we certainly could not pick up on most things as fast as a machine could. It is not as simple as just a "find and replace" though, because while such abbrieviations as "sce" mostly means "sauce, it's never a one-size-fits all, and depending on the brand, they can also each have their own spelling.
I then came up with an idea; using some form of diff algorithm, and having a dictionary be learned associated with the products respective brand name. The library I found is called "finediff", a simple implementation which breaks down the copy/find-replace segments in to op-codes. By modifying the class's private function doFragmentDiff to public, this is the crux of how I have applied it to my application:
And there you have it! What you don't see here is first just making a reference to the brand of the current product, but from the two columns, we are just storing what text we are looking for, and what to replace it with. This is technically not machine learning, but on the surface, it kind of replicates the next best thing.
Hope this inspires.
Of course, we have got our ways of reducing the amount of manual updates on this, but we certainly could not pick up on most things as fast as a machine could. It is not as simple as just a "find and replace" though, because while such abbrieviations as "sce" mostly means "sauce, it's never a one-size-fits all, and depending on the brand, they can also each have their own spelling.
I then came up with an idea; using some form of diff algorithm, and having a dictionary be learned associated with the products respective brand name. The library I found is called "finediff", a simple implementation which breaks down the copy/find-replace segments in to op-codes. By modifying the class's private function doFragmentDiff to public, this is the crux of how I have applied it to my application:
And there you have it! What you don't see here is first just making a reference to the brand of the current product, but from the two columns, we are just storing what text we are looking for, and what to replace it with. This is technically not machine learning, but on the surface, it kind of replicates the next best thing.
Hope this inspires.
Comments
Post a Comment