Tuesday, June 15, 2021

Python Bytes: #238 A cloud-based file system for Python and a new GUI!

<p><strong>Watch the live stream:</strong></p> <a href='https://www.youtube.com/watch?v=0dk7ZhZwcrM' style='font-weight: bold;'>Watch on YouTube</a><br> <br> <p><strong>About the show</strong></p> <p>Sponsored by Sentry:</p> <ul> <li>Sign up at <a href="https://pythonbytes.fm/sentry"><strong>pythonbytes.fm/sentry</strong></a></li> <li>And please, when signing up, click <strong><em>Got a promo code? Redeem</em></strong> and enter <strong>PYTHONBYTES</strong></li> </ul> <p>Special guest: <a href="https://twitter.com/JSignell"><strong>Julia Signell</strong></a></p> <p><strong>Brian #1:</strong> <a href="https://hakibenita.com/sql-for-data-analysis"><strong>Practical SQL for Data Analysis</strong></a></p> <ul> <li>Haki Benita</li> <li>Pandas is awesome, but … “In this article I demonstrate how to use SQL to perform fast and efficient data analysis.”</li> <li>First part of the article: <ul> <li>SQL is faster than Pandas</li> <li>But they are great together</li> </ul></li> <li>Then tons of examples showing exactly how to best use SQL queries and Pandas in data analysis: <ul> <li>Basics, including random data and sampling</li> <li>Descriptive statistics</li> <li>Subtotals, including rollup and grouping sets</li> <li>Pivot tables, using both conditional expressions and aggregate expressions</li> <li>Running and cumulative aggregation</li> <li>Linear regression</li> <li>Interpolation</li> </ul></li> <li>Super cheat sheet of useful SQL queries</li> </ul> <p><strong>Michael #2:</strong> <a href="https://twitter.com/ruslanoid/status/1396890700634066945"><strong>Git Blame in your Python Tracebacks</strong></a></p> <ul> <li>via Ruslan Portnoy, by Ofer Koren</li> <li>Helpful modules: traceback &amp; linecache</li> <li>traceback reads source lines through linecache, and we can change the text linecache stores for a line</li> <li>They use this to append git blame information to each line’s source text</li> <li>It turns out this also flows through to tools like pdb.</li> <li>Ripe for a proper package we can add to requirements-dev.txt</li> 
</ul> <p><strong>Julia #3:</strong> <a href="https://filesystem-spec.readthedocs.io/en/latest/"><strong>fsspec: a unified file system library</strong></a></p> <ul> <li>Martin Durant</li> <li>Other libraries (for example s3fs and gcsfs) implement the same interface, so each part of the analysis pipeline becomes an interchangeable building block.</li> <li>With the cloud providers competing to host data, fsspec makes it easy to swap out the read layer so that you can hop between clouds.</li> </ul> <p><strong>Brian #4:</strong> <a href="https://iximiuz.com/en/posts/thick-container-vulnerabilities/"><strong>The need for slimmer containers</strong></a>, or I’m even more confused now about the usefulness of official base images on Docker Hub</p> <ul> <li><strong>Ivan Velichko</strong> <a href="https://twitter.com/iximiuz"><strong>@iximiuz</strong></a></li> <li>I read this article recently and it had me concerned. Then just yesterday I read it again, and there are some updates. I’m still concerned, but now also confused. So let’s run it down.</li> <li><code>docker scan</code> can be run on official Python images. <ul> <li>It uses <a href="https://snyk.io/product/container-vulnerability-management/">Snyk Container</a>. We talked about <a href="https://pythonbytes.fm/episodes/show/227/no-more-awaiting-async-comes-to-sqlalchemy">one form of Snyk on Episode 227</a>.</li> </ul></li> <li>Spoiler: all of the official Python containers have vulnerabilities except Alpine. <ul>
<li>But in an update, the author says that Alpine has a bunch of problems of its own.</li> </ul></li> <li>The update includes some discussion from Hacker News: <ul> <li>vulnerability scanners tend to have lots of false positives</li> <li>official base images are rarely updated</li> <li>some people suggest adding an upgrade command at the beginning of every Dockerfile</li> <li>but others object, saying that the practice leads to unrepeatable builds</li> </ul></li> <li>So I’m left wondering whether the official Python images are even worth using.</li> <li>Michael: <a href="https://hub.docker.com/_/python">Python’s official image on Docker Hub</a></li> <li>Michael: <a href="https://www.python.org/dev/peps/pep-0656/">PEP 656 -- Platform Tag for Linux Distributions Using Musl</a></li> <li>Michael: We dig into this a lot in our latest Talk Python recording (not out yet, but the <a href="https://www.youtube.com/watch?v=yDend6I9nwE"><strong>live stream is available</strong></a>)</li> <li>Some stats:</li> <li>Ubuntu: Found 32 vulnerabilities, 31 with upgrade.</li> <li><code>python:latest</code>: Found 364 vulnerabilities, 353 with upgrade.</li> <li>Ubuntu with Python built from source: 35 total, 28 low, 7 medium, several from intermediate tools such as wget, gcc, etc.</li> <li>Removing many dev tools SHOULD lower the count, but doesn’t (e.g.
wget, gcc)</li> <li>Switching from <code>python:3.9</code> to <code>python:3.9-slim-buster</code> dropped the issues to 69.</li> </ul> <p><strong>Michael #5:</strong> <a href="https://github.com/adamerose/pandasgui"><strong>PandasGUI: A GUI for analyzing Pandas DataFrames</strong></a></p> <ul> <li>Features: <ul> <li>View DataFrames and Series (with MultiIndex support)</li> <li>Interactive plotting</li> <li>Filtering</li> <li>Statistics summary</li> <li>Data editing and copy / paste</li> <li>Import CSV files with drag &amp; drop</li> <li>Search toolbar</li> </ul></li> <li>The best way to see what it’s about is to <a href="https://www.youtube.com/watch?v=NKXdolMxW2Y"><strong>watch the video</strong></a>.</li> </ul> <p><strong>Julia #6:</strong> <a href="https://pypi.org/project/xarray/"><strong>xarray: pandas-like API for labeled N-dimensional data</strong></a></p> <ul> <li>We’ve been talking a lot about the pandas API and how it’s a common target for dataframe libraries.</li> <li>Xarray is not a dataframe library; it’s for labeled N-dimensional data.</li> <li>People use it in geosciences and in image processing, where the data isn’t tabular but the axes mean something (lat, lon, time, band…).</li> <li>You can select, aggregate, and resample using the real dimension labels. 
</li> <li>It can be backed by dask arrays or numpy arrays (or other types of arrays).</li> <li>It supports plotting with <code>.plot</code>.</li> </ul> <p><strong>Extras</strong></p> <p><strong>Michael</strong></p> <ul> <li><a href="https://www.python.org/downloads/release/python-3100b1/">Python 3.10.0b2 is available</a> (even <a href="https://ift.tt/3gpEIdI">store</a>)</li> <li>Django security releases issued: 3.2.4, 3.1.12, and 2.2.24</li> <li><a href="https://twitter.com/Fronkan/status/1403350583260684292"><strong>Another method overloading library</strong></a>?</li> <li>Recently moved to the <strong>pip-compile</strong> requirements.in style after last week</li> <li>I’m <a href="https://blog.jetbrains.com/pycharm/2021/06/pycharm-2021-2-eap/"><strong>running PyCharm EAP</strong></a></li> </ul> <p><strong>Brian</strong></p> <ul> <li>Someone responded to me the other day on Twitter with an emoji whose meaning I wasn’t sure of, so I looked it up on <a href="https://emojipedia.org/">emojipedia.org</a>. Super useful for occasionally out-of-touch people like myself.</li> <li><a href="https://pytestbook.com">pytestbook.com</a> (redirects to <a href="https://pythontest.com/pytest-book/">pythontest.com/pytest-book/</a>) has a facelift and a new home, to get ready for an announcement later this week. It’s built on Markdown, Hugo, GitHub, and Netlify, so changes can be made super quickly with just a commit and push. I just needed a nice readable theme, and <a href="https://pradyunsg.me/">Pradyun’s blog</a> looked great, so I copied his choices.</li> <li>The blog will eventually also have new writing, the legacy posts worth keeping from pythontesting.net, and probably transcripts from <a href="https://testandcode.com/">Test &amp; Code</a>.</li> </ul> <p><strong>Julia</strong></p> <ul> <li>GitHub CLI (<code>gh</code>)</li> <li>Entry points: they are so cool!
Example: with pandas you can plot with different backends, not just matplotlib, and the logic for those backends lives in the plotting libraries, not in pandas.</li> </ul> <p><strong>Joke</strong> </p> <p>From <strong><a href="https://upjoke.com/programmer-jokes">https://upjoke.com/programmer-jokes</a></strong></p> <ul> <li>I asked a programmer what her New Year's resolution would be.</li> <li><p>She answered: 1920x1080.</p></li> <li><p>How does a programmer confuse a mathematician?</p></li> <li><p>x = x + 1</p></li> <li><p>Why do Python programmers have low self-esteem?</p></li> <li>They're constantly comparing their <code>self</code> to <code>other</code>.</li> </ul>
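For Brian’s first item, here is a minimal sketch of one technique the article covers: a running (cumulative) total computed by the database with a SQL window function, so the data arrives in pandas already aggregated. It uses Python’s built-in sqlite3 module with an in-memory database so it runs without a server. The `sales` table, its columns, and the sample rows are made up for illustration, and the article’s own examples may use a different database and syntax. Window functions need SQLite 3.25 or newer.

```python
import sqlite3


def running_totals(rows):
    """Return (day, amount, running_total) tuples computed by SQLite.

    `rows` is an iterable of (day, amount) pairs. The `sales` table is a
    hypothetical example schema, not one taken from the article.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (day TEXT, amount INTEGER)")
        conn.executemany("INSERT INTO sales (day, amount) VALUES (?, ?)", rows)
        # The window frame sums every row from the first one up to the
        # current one, i.e. a cumulative sum ordered by day.
        return conn.execute(
            """
            SELECT day,
                   amount,
                   SUM(amount) OVER (
                       ORDER BY day
                       ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
                   ) AS running_total
            FROM sales
            ORDER BY day
            """
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [("2021-06-01", 10), ("2021-06-02", 5), ("2021-06-03", 20)]
    for row in running_totals(sample):
        print(row)
```

In pandas the same result would come from sorting by day and calling `cumsum()` on the amount column; the article’s point is that the database can do this work before the rows ever reach pandas.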
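For Michael’s second item, the core trick is that traceback fetches source text through linecache, so rewriting the cached lines changes what a traceback prints. The sketch below is not Ofer Koren’s code: `annotate_cached_source` and `fake_blame` are hypothetical names, and `fake_blame` returns a canned string where a real tool would run `git blame` on the file. It also relies on linecache’s internal `cache` dict and its (size, mtime, lines, fullname) entry layout, which is an implementation detail that could change between Python versions.

```python
import linecache
import traceback


def annotate_cached_source(filename, annotate):
    """Append annotate(lineno) to every cached source line of `filename`.

    Tracebacks read source text through linecache, so the annotation shows
    up next to each traceback line that points into this file.
    """
    lines = linecache.getlines(filename)
    if not lines:
        return False
    new_lines = [
        f"{line.rstrip()}  # {annotate(lineno)}\n"
        for lineno, line in enumerate(lines, start=1)
    ]
    # An mtime of None tells linecache.checkcache() not to evict the entry.
    size = sum(len(line) for line in new_lines)
    linecache.cache[filename] = (size, None, new_lines, filename)
    return True


def fake_blame(lineno):
    # Hypothetical stand-in for parsing real `git blame` output.
    return f"blame: alice 2021-06-15 (line {lineno})"


if __name__ == "__main__":
    # Register some in-memory "source" under a pseudo filename.
    filename = "<blame-demo>"
    source = "def divide(a, b):\n    return a / b\n\ndivide(1, 0)\n"
    linecache.cache[filename] = (
        len(source), None, source.splitlines(keepends=True), filename
    )

    annotate_cached_source(filename, fake_blame)
    try:
        exec(compile(source, filename, "exec"), {})
    except ZeroDivisionError:
        # Each frame from <blame-demo> now ends with a "# blame: ..." comment.
        print(traceback.format_exc())
```

Because pdb also reads source through linecache, the same annotations would show up in its `list` output, which matches what the show notes describe.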

from Planet Python