Monday, July 6, 2020

PSF GSoC students blogs: Weekly Blog #3 (29th Jun - 6th Jul)

Hey everyone we are done with the first third of the program and I will use this blog to both give the weekly update as well as summarize the current state of progress. In the past 4 weeks , we have created a new number-parser library from scratch and build an MVP that is being continuously improved.

Last week was spent fine-tuning the parser to retrieve the relevant data from the CLDR RBNF repo. This 'rule based number parser' (RBNF) repo is basically a Java library that converts a number (23) to the corresponding word. (twenty-three) It has a lot of hard-coded values and data that are very useful to our library and thus we plan to extract all this information accurately and efficiently.

In addition to this there are multiple nuances in each of the language that was being taken care , accents in languages. For eg) the french '0' is written as zéro with (accent aigu over the e ) However we don't expect the users to enter these accents each time hence we need to normalise (i.e remove) these accents.

The most challenging aspect was definitely understanding (which I am still not completely clear) the CLDR RBNF structure , there is only a little documentation explaining some of the basic rules however it's tough to identify which are the relevant rules and which aren't.

Originally I was hoping to add more tests as well in this week however all this took longer than expected so the testing aspect is going to be pushed to the current week.



from Planet Python
via read more

No comments:

Post a Comment

TestDriven.io: Working with Static and Media Files in Django

This article looks at how to work with static and media files in a Django project, locally and in production. from Planet Python via read...