Daily Python: Python Bytes: #220 What, why, and where of friendly errors in Python

Sponsored by Datadog: <a href="http://pythonbytes.fm/datadog">pythonbytes.fm/datadog</a> Special guest: <a href="https://twitter.com/HannahStepanek">Hannah Stepanek</a> <a href='https://www.youtube.com/watch?v=N62W-x1_Sbo' style='font-weight: bold;'>Watch on YouTube</a> Michael #1: <a href="https://blog.jetbrains.com/datalore/2020/12/17/we-downloaded-10-000-000-jupyter-notebooks-from-github-this-is-what-we-learned/">We Downloaded 10,000,000 Jupyter Notebooks From Github – This Is What We Learned</a> <ul> <li>by Alena Guzharina from JetBrains</li> <li>Used the hundreds of thousands of publicly accessible repos on GitHub to learn more about the current state of data science. I think it’s inspired by <a href="https://talkpython.fm/episodes/show/268/analyzing-dozens-of-notebook-environments">work showcased here on Talk Python</a>.</li> <li>2 years ago there were 1,230,000 Jupyter Notebooks published on GitHub. By October 2020 this number had grown 8 times, and we were able to download 9,720,000 notebooks. 
8x growth.</li>
<li>Despite the rapid growth in popularity of R and Julia in recent years, Python remains the most commonly used language for writing code in Jupyter Notebooks by an enormous margin.</li>
<li>Python 2 went from 53% → 11% in the last two years.</li>
<li>Interesting graphs about package usage</li>
<li>Not all notebooks are storytelling with code: 50% of notebooks contain fewer than 4 Markdown cells and more than 66 code cells.</li>
<li>Although there are some outliers, like notebooks with more than 25,000 code lines, 95% of the notebooks contain fewer than 465 lines of code.</li>
</ul>
Brian #2: <a href="https://pypi.org/project/pytest-pythonpath/">pytest-pythonpath</a>
<ul>
<li>Plugin for adding to the PYTHONPATH from the pytest.ini file before tests run</li>
<li>Mentioned briefly in <a href="https://pythonbytes.fm/episodes/show/62/wooey-and-gooey-are-simple-python-guis">episode 62</a> as a temporary stopgap until you set up a proper package install for your code (cringing at my arrogance).</li>
<li>Lots of projects are NOT packages. For example, applications.</li>
<li>I’ve been helping more and more people get started on testing, and the first thing that often comes up is “My tests can’t see my code. Please fix.”</li>
<li>Example
<ul>
<li>proj/src/stuff_you_want_to_test.py</li>
<li>proj/tests/test_code.py</li>
<li>You can’t import stuff_you_want_to_test.py from the proj/tests directory by default.</li>
</ul></li>
<li>The more I look at the problem, the more I appreciate the simplicity of pytest-pythonpath.</li>
<li>pytest-pythonpath does one thing I really care about:
<ul>
<li>Add this to a pytest.ini file at the <code>proj</code> level:</li>
</ul></li>
</ul>
<pre><code>[pytest]
python_paths = src
</code></pre>
<ul>
<li>That’s it. That’s all you have to do to fix the above problem.</li>
<li>Paths are relative to the directory that pytest.ini is in. 
That directory should be a parent or grandparent of the tests directory.</li>
<li>I really can’t think of a simpler way for people to get around this problem.</li>
</ul>
Hannah #3: <a href="https://www.apress.com/gp/book/9781484258385">Thinking in Pandas</a>
<ul>
<li>Pandas dependency hierarchy (simplified):
<ul>
<li><code>Pandas -> NumPy -> BLAS</code> (Basic Linear Algebra Subprograms)</li>
</ul></li>
<li>Languages:
<ul>
<li><code>Python -> C -> Assembly</code></li>
</ul></li>
</ul>
<pre><code>df["C"] = df["A"] + df["B"]

A = [ 1       4       2       0     ]
B = [ 3       2       5       1     ]
C = [ 1 + 3   4 + 2   2 + 5   0 + 1 ]
</code></pre>
<ul>
<li>Pandas tries to get the best performance by running operations in parallel.</li>
<li>You might think we could speed this problem up by doing something like this:</li>
</ul>
<pre><code>Thread 1: 1 + 3
Thread 2: 4 + 2
Thread 3: 2 + 5
Thread 4: 0 + 1
</code></pre>
<ul>
<li>However, the GIL (Global Interpreter Lock) prevents us from achieving the performance improvement we are hoping for.</li>
<li>Below is an example of a common threading problem and how a lock solves that problem.</li>
</ul>
<pre><code>Thread 1          total           Thread 2
1 + 3 + 4 + 2     0               0 + 5
10                0               + 6 + 2
total += 10       0               13
total = 10        0               total += 13
                  10              total = 13
                  13

Thread 1          total           Thread 2
1 + 3 + 4 + 2     0  unlocked     0 + 5
10                0  unlocked     + 6 + 2
total += 10       0  locked       13
total = 10        0  locked
                  10 unlocked
                  10 locked       total += 13
                  10 locked       total = 13
                  23 unlocked
</code></pre>
<ul>
<li>As it turns out, because Python manages memory for you, every object in Python would be subject to these kinds of threading issues:</li>
</ul>
<pre><code>a = 1   # reference count = 1
b = a   # reference count = 2
del b   # reference count = 1
del a   # reference count = 0
</code></pre>
<ul>
<li>So the GIL was invented to avoid this headache: it lets only one thread run at a time.</li>
<li>Certain parts of the Pandas dependency hierarchy are not subject to the GIL (simplified):
<ul>
<li><code>Pandas -> NumPy -> BLAS (Basic Linear Algebra Subprograms)</code></li>
<li><code>GIL -> 
no GIL -> hardware optimizations</code></li>
</ul></li>
<li>So we can get around the GIL in C land, but what kind of optimizations does BLAS provide us with?
<ul>
<li>Parallel operations inside the CPU via vector registers</li>
</ul></li>
<li>A vector register is like a regular register, but instead of holding one value it can hold multiple values.</li>
</ul>
<pre><code>| 1 | 4 | 2 | 0 |
  +   +   +   +
| 3 | 2 | 5 | 1 |
  =   =   =   =
| 4 | 6 | 7 | 1 |
</code></pre>
<ul>
<li>Vector registers are only so large, though, so the DataFrame is broken up into chunks and the vector operations are performed on each chunk.</li>
</ul>
Michael #4: <a href="https://jcristharif.com/quickle/">Quickle</a>
<ul>
<li>Fast. <a href="https://jcristharif.com/quickle/benchmarks.html">Benchmarks</a> show it’s among the fastest serialization methods for Python.</li>
<li>Safe. <a href="https://jcristharif.com/quickle/faq.html#why-not-pickle">Unlike pickle</a>, deserializing a user-provided message doesn’t allow for arbitrary code execution.</li>
<li>Flexible. Unlike <code>msgpack</code> or <code>json</code>, Quickle natively supports a wide range of Python builtin types.</li>
<li>Versioning. Quickle supports <a href="https://jcristharif.com/quickle/#schema-evolution">“schema evolution”</a>. 
Messages can be sent between clients with different schemas without error.</li>
<li>Example</li>
</ul>
<pre><code>>>> import quickle
>>> data = quickle.dumps({"hello": "world"})
>>> quickle.loads(data)
{'hello': 'world'}
</code></pre>
Brian #5: <a href="https://aroberge.github.io/friendly-traceback-docs/docs/html/repl.html">what(), why(), where(), explain(), more() from friendly-traceback console</a>
<ul>
<li>Do this:</li>
</ul>
<pre><code>$ pip install friendly-traceback
$ python -i
>>> import friendly_traceback
>>> friendly_traceback.start_console()
>>>
</code></pre>
<ul>
<li>Now, after an exception happens, you can ask questions about it.</li>
</ul>
<pre><code>>>> pass = 1

Traceback (most recent call last):
  File "[HTML_REMOVED]", line 1
    pass = 1
         ^
SyntaxError: invalid syntax
>>> what()

SyntaxError: invalid syntax

A `SyntaxError` occurs when Python cannot understand your code.

>>> why()

You were trying to assign a value to the Python keyword `pass`.
This is not allowed.

>>> where()

Python could not understand the code in the file
'[HTML_REMOVED]'
beyond the location indicated by --> and ^.

-->1: pass = 1
           ^
</code></pre>
<ul>
<li>Cool for teaching or learning.</li>
</ul>
Hannah #6: <a href="https://bandit.readthedocs.io/en/latest/">Bandit</a>
<ul>
<li>Bandit is a static analysis security tool.</li>
<li>It’s like a linter but for security issues.</li>
</ul>
<pre><code>pip install bandit
bandit -r .
</code></pre>
<ul>
<li>I prefer to run it in a git pre-commit hook:</li>
</ul>
<pre><code># .pre-commit-config.yaml
repos:
  - repo: https://ift.tt/2mIyAAr
    rev: '1.7.6'
    hooks:
      - id: bandit
</code></pre>
<ul>
<li>It finds <a href="https://bandit.readthedocs.io/en/latest/plugins/index.html#complete-test-plugin-listing">issues</a> like:
<ul>
<li><a href="https://bandit.readthedocs.io/en/latest/plugins/b201_flask_debug_true.html">flask_debug_true</a></li>
<li><a href="https://bandit.readthedocs.io/en/latest/plugins/b501_request_with_no_cert_validation.html">request_with_no_cert_validation</a></li>
</ul></li>
<li>You can ignore certain issues, just as with any other linter:</li>
</ul>
<pre><code>assert len(foo) == 1  # nosec
</code></pre>
Extras: Brian:
<ul>
<li>Meetups this week 2/3 done.
<ul>
<li>NOAA Tuesday, Aberdeen this morning - “pytest Fixtures”</li>
<li><a href="https://www.meetup.com/Python-PDX-West/events/275321571/">PDX West tomorrow</a> - Michael presenting “Python Memory Deep Dive”</li>
</ul></li>
<li>Updated my training page, <a href="https://testandcode.com/training">testandcode.com/training</a>
<ul>
<li>Feedback welcome.</li>
<li>I really like working directly with teams, and now that trainings can be virtual, a couple of half days is super easy to do.</li>
</ul></li>
</ul>
Michael:
<ul>
<li><a href="https://www.python.org/dev/peps/pep-0634/">PEP 634 -- Structural Pattern Matching: Specification accepted in 3.10</a></li>
<li><a href="https://us.pycon.org/2021/registration/information/">PyCon registration open</a></li>
<li><a href="https://2021.pythonwebconf.com/">Python Web Conf registration open</a></li>
<li><a href="https://twitter.com/m4rc0v0nh4g3n/status/1353172246093819906">Hour of Code - Minecraft</a></li>
</ul>
Joke: Sent in by Michel Rogers-Vallée, Dan Bader, and Allan Mcelroy. 
:) PEP 8 Song <a href='https://www.youtube.com/watch?v=hgI0p1zf31k' style='font-weight: bold;'>Watch on YouTube</a> <ul> <li>By <a href="https://lemonsaur.us/">Leon Sandoy</a> and team at <a href="https://pythondiscord.com/">Python Discord</a></li> </ul>
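Sketch for Hannah #3: the lock walkthrough in the Thinking in Pandas item can be tried out with the standard library's threading module. This is only a minimal illustration, not code from the book or the show. The function name add_partial_sum and the variable names are made up for this example, and the input values simply mirror the two columns of the table.

```python
import threading

total = 0                      # shared state, as in the "total" column
lock = threading.Lock()        # guards every update of total


def add_partial_sum(values):
    """Sum a chunk of values, then add the result to the shared total."""
    global total
    partial = sum(values)      # independent work, no lock needed
    with lock:                 # "locked": only one thread updates total at a time
        total += partial
    # leaving the with-block releases the lock ("unlocked")


# Hypothetical inputs chosen to match the table: 10 for thread 1, 13 for thread 2.
thread_1 = threading.Thread(target=add_partial_sum, args=([1, 3, 4, 2],))
thread_2 = threading.Thread(target=add_partial_sum, args=([0, 5, 6, 2],))

thread_1.start()
thread_2.start()
thread_1.join()
thread_2.join()

print(total)  # 23
```

Without the lock, both threads could read the same old value of total before either writes back, which is how the first table ends at 13 instead of 23.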

from Planet Python
Wednesday, February 10, 2021
