Tuesday, July 9, 2019

PSF GSoC students blogs: Weekly Check-in #7: (5 July - 11 July)

Hey! Here is an update on what I have achieved so far.

What did you do this week?

  • Protego now passes all tests borrowed from reppy, rep-cpp and robotexclusionrulesparser.
  • Made a few changes to Protego to make it compatible with Google's parser.
  • Worked on the changes suggested on the interface pull request.
  • Wrote code to fetch the robots.txt files of the top 1000 websites and generate the statistics we need. (link)
  • Looked at the code of Google's robots.txt parser with the aim of creating a Python interface on top of it. I might need to modify its code, since it currently re-parses the robots.txt file to answer every query. (Working on C++ code that makes heavy use of pointers or the STL makes me uncomfortable.)
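The concern about re-parsing on every query is about where the parsing cost lands: ideally the file is parsed once and the parsed result answers many queries. As a rough illustration only, the sketch below uses Python's standard-library `urllib.robotparser` as a stand-in (not Google's parser or Protego); the robots.txt content, the URLs and the `mybot` user agent are made-up examples. Note that `urllib.robotparser` applies rules in file order (first match wins), whereas Google's parser picks the most specific (longest) match.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content (made up for illustration).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

URLS = [
    "https://example.com/index.html",
    "https://example.com/private/data.html",
]


def build_parser(content: str) -> RobotFileParser:
    """Parse robots.txt content once and return a reusable parser object."""
    parser = RobotFileParser()
    parser.parse(content.splitlines())
    return parser


# Parse once...
parser = build_parser(ROBOTS_TXT)

# ...then answer many queries against the same parsed object.
results = {url: parser.can_fetch("mybot", url) for url in URLS}
for url, allowed in results.items():
    print(url, "allowed" if allowed else "disallowed")
```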

What is coming up next?

  • Modify Protego to behave more like Google's parser (this will require a few more features, such as record group merging), and add more tests.
  • Document Protego.
  • Benchmark Protego's performance.
  • Read up on how to call C/C++ code from Python, in order to create an interface on top of Google's parser. I am currently thinking of using Cython.
  • Work on blog posts (I am planning to write 3 blog posts this week).
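Record group merging means that when several groups in a robots.txt file name the same user agent, their rules are combined into one group before matching (this is how Google's parser and RFC 9309 treat it). The sketch below is a minimal, hypothetical illustration of that idea, not Protego's or Google's implementation: `parse_groups`, `merged_rules` and `is_allowed` are made-up names, user agents are compared by exact (case-insensitive) name, and `*`/`$` wildcards in paths are not supported. Matching uses the longest matching path prefix, with Allow winning ties.

```python
from typing import List, Tuple

Rule = Tuple[str, str]           # ("allow" | "disallow", path prefix)
Group = Tuple[List[str], List[Rule]]

ROBOTS_TXT = """
User-agent: mybot
Disallow: /private/

User-agent: otherbot
Disallow: /

User-agent: mybot
Allow: /private/public/
"""


def parse_groups(content: str) -> List[Group]:
    """Split robots.txt into (user_agents, rules) groups.

    Consecutive user-agent lines share one group; a user-agent line that
    follows a rule starts a new group. Comments and other keys are skipped.
    """
    groups: List[Group] = []
    agents: List[str] = []
    rules: List[Rule] = []
    for raw in content.splitlines():
        line = raw.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if rules:  # a rule was already seen, so start a new group
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value.lower())
        elif key in ("allow", "disallow") and agents:
            rules.append((key, value))
    if agents:
        groups.append((agents, rules))
    return groups


def merged_rules(groups: List[Group], user_agent: str) -> List[Rule]:
    """Merge the rules of every group naming user_agent; fall back to '*'."""
    agent = user_agent.lower()
    matching = [rules for agents, rules in groups if agent in agents]
    if not matching:
        matching = [rules for agents, rules in groups if "*" in agents]
    return [rule for rules in matching for rule in rules]


def is_allowed(rules: List[Rule], path: str) -> bool:
    """Longest matching prefix wins; on a tie, Allow wins. Empty paths are ignored."""
    best_len, allowed = -1, True
    for kind, value in rules:
        if value and path.startswith(value):
            if len(value) > best_len or (len(value) == best_len and kind == "allow"):
                best_len, allowed = len(value), kind == "allow"
    return allowed


groups = parse_groups(ROBOTS_TXT)
mybot_rules = merged_rules(groups, "MyBot")
print(mybot_rules)
print(is_allowed(mybot_rules, "/private/public/page.html"))
```

Without merging, a parser that stops at the first `mybot` group would never see the `Allow: /private/public/` rule in the second one, which is the behavioural difference the merge step addresses.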

Did you get stuck anywhere?

No. I also got to work with some data science tools, like Jupyter Notebook and pandas.



from Planet Python
