Thursday, August 27, 2020

PSF GSoC students blogs: GSOC 2020 - Final Report

Hi! First of all, here is the link to the newly created number-parser library: https://github.com/scrapinghub/number-parser. The entire library was created from scratch
as part of GSoC 2020. Going over the GitHub stats:

  • 58 commits
  • 46000+ lines of code added
  • 22000+ lines of code deleted

Phew, that's a lot! Before going into the details of the challenges and the work done, I would like to thank all my mentors: this journey was possible only because of their support and input. Special shout-out to Marc (@noviluni): without your constant code reviews and input, this library would not have been half as good.
 

Work Done
The README gives a more detailed explanation of what the library can do and how to use it. Basically, the goal was to have a way to convert numbers written in natural language into their numeric form, and I am proud to say that we largely succeeded.
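To make the core idea concrete, here is a minimal, self-contained sketch of how English cardinal words can be folded into an integer. This is a toy illustration, not number-parser's actual API or algorithm (see the README for that); the function name `words_to_number` and the word tables are hypothetical, and the sketch does not validate word order (e.g. "two three" is accepted as 5).

```python
# Toy sketch only: NOT number-parser's API. Converts English cardinal words to an int.
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}
SCALES = {"thousand": 1_000, "million": 1_000_000, "billion": 1_000_000_000}


def words_to_number(text: str) -> int:
    """Hypothetical helper: fold cardinal number words into an integer."""
    total = 0    # sum of completed scale groups (e.g. the "two thousand" part)
    current = 0  # value of the group being built (below one thousand)
    tokens = text.lower().replace("-", " ").split()
    if not tokens:
        raise ValueError("empty input")
    for token in tokens:
        if token == "and":
            continue  # "two thousand and twenty": "and" carries no value
        if token in UNITS:
            current += UNITS[token]
        elif token in TENS:
            current += TENS[token]
        elif token == "hundred":
            current = (current or 1) * 100  # bare "hundred" means 100
        elif token in SCALES:
            total += (current or 1) * SCALES[token]  # close the current group
            current = 0
        else:
            raise ValueError(f"unknown number word: {token!r}")
    return total + current
```

For example, `words_to_number("two thousand and twenty")` returns `2020` and `words_to_number("three hundred forty-two")` returns `342`. Splitting the running value into `total` and `current` is what lets a scale word such as "thousand" multiply only the group that precedes it.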
Additionally, it supports multiple languages:

For cardinal numbers -> English, Russian, Hindi, Spanish
For ordinal numbers -> English
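One plausible way to handle English ordinals (an assumption for illustration, not necessarily how number-parser does it) is to rewrite the final ordinal word into its cardinal form, so that an ordinary cardinal parser can take over. The function `ordinal_to_cardinal_words` and the `IRREGULAR` table below are hypothetical.

```python
import re

# Hypothetical table: ordinals whose cardinal form is not a simple suffix strip.
IRREGULAR = {"first": "one", "second": "two", "third": "three",
             "fifth": "five", "eighth": "eight", "ninth": "nine",
             "twelfth": "twelve"}


def ordinal_to_cardinal_words(text: str) -> str:
    """Rewrite the last ordinal word of `text` into cardinal words."""
    text = text.strip().lower()
    match = re.search(r"([a-z]+)$", text)  # last alphabetic word
    if match is None:
        raise ValueError(f"no ordinal word found in {text!r}")
    head, last = text[:match.start()], match.group(1)
    if last in IRREGULAR:
        cardinal = IRREGULAR[last]
    elif last.endswith("ieth"):          # "twentieth" -> "twenty"
        cardinal = last[:-4] + "y"
    elif last.endswith("th"):            # "fourth" -> "four", "hundredth" -> "hundred"
        cardinal = last[:-2]
    else:
        raise ValueError(f"not an ordinal word: {last!r}")
    return head + cardinal
```

With this, `ordinal_to_cardinal_words("twenty-first")` gives `"twenty-one"` and `ordinal_to_cardinal_words("one hundredth")` gives `"one hundred"`. Only the last word needs rewriting, because in English the ordinal marker appears only on the final component.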

Challenges / Work Remaining
The toughest challenge was starting from scratch with little to refer to. However, once this phase was done, the work was fun and relatively smooth.
One major aspect was setting up good tests to ensure the library works well; once again, thanks to all my mentors for helping add tests in different languages.
Apart from that, the library is also planned to be a dependency of dateparser (https://github.com/scrapinghub/dateparser/pull/711), which in turn meant keeping a high level of code quality. It was also important to structure the code and add features (like automatic language detection) so that the integration would be smooth.
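Automatic language detection can be pictured with a very simple heuristic: pick the supported language whose number vocabulary covers the most tokens of the input. The sketch below is an illustrative assumption, not number-parser's real detection logic; `VOCABULARY` is a tiny hypothetical sample and `detect_language` is a hypothetical name.

```python
# Hypothetical, heavily truncated number vocabularies for the four supported languages.
VOCABULARY = {
    "en": {"one", "two", "three", "twenty", "hundred", "thousand", "and"},
    "es": {"uno", "dos", "tres", "veinte", "cien", "mil", "y"},
    "ru": {"один", "два", "три", "двадцать", "сто", "тысяча"},
    "hi": {"एक", "दो", "तीन", "बीस", "सौ"},
}


def detect_language(text: str, default: str = "en") -> str:
    """Return the language whose vocabulary matches the most tokens."""
    tokens = text.lower().split()
    best_lang, best_hits = default, 0
    for lang, words in VOCABULARY.items():
        hits = sum(token in words for token in tokens)
        if hits > best_hits:  # strict '>' keeps the default when nothing matches
            best_lang, best_hits = lang, hits
    return best_lang
```

For instance, `detect_language("veinte y dos")` returns `"es"`, while input with no known words falls back to the default. A real implementation would need full vocabularies and some tie-breaking strategy, since number words can overlap between languages.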

The library is still in its early stages and there is endless scope for improvement, so I invite everyone to contribute and make it better. From the list of pending issues
at https://github.com/scrapinghub/number-parser/issues, the major ones for revamping the library are:

All in all, it has been an amazing summer, and I would like to thank everyone who was part of it.

Signing off
Arnav



from Planet Python