Tuesday, December 31, 2019

Tryton News: Newsletter January 2020

@ced wrote:

The Tryton team wishes you a happy new year.
Here are the new features the team has already prepared for the next version.

Contents:

Changes For The User

We prevent posting draft moves that were created when a statement was validated. Such moves are posted when the statement is actually posted. This ensures a coherent state between the statement and the moves.

The production cost is now allocated automatically to the outgoing moves in every case. The allocation is based on the list price of each of the outgoing products. Any products with no list price are considered waste and do not have a cost allocated.

We added the list of selection criteria to the carrier. So if you duplicate a carrier, the criteria are also automatically duplicated.

The company module now has its own menu entry and its own administrator group. This provides finer access control.

The “project invoice” group no longer gives access to the timesheets. This provides a better separation of roles.

On small screens (like mobile), the web client no longer displays empty cells. This optimizes the space available for use with other information.

You no longer need to enter a work center for production works that are in a request or draft state.

We now use the multiselection widget to select the week days that a supplier can deliver on.

When starting a CSV export from a client, all of the columns in the current view are now selected by default. Previously the Many2One fields were skipped for technical reasons, but these are now also selected. This gives the user better, less surprising behavior.

You can now define a supervisor for each employee. This can be used to define access rules based on the company’s organization.

To improve the display performance on the web client, the number of records displayed is reset to its default value when the list is reloaded.

We implemented a new strategy to position new records in the list. The client now tries to position the new record depending on the list’s current ordering.

Changes For The Developer

Tryton can now use WeasyPrint to convert HTML and XHTML reports into PDFs. This avoids using LibreOffice, which isn’t always as good at rendering HTML reports.

The back-end classes can now be imported directly instead of using an indirect function.

The current employee is now in the evaluation context of the record rules. This allows you to create rules that depend on the user’s business role instead of just their user account.

You can now update the action from ModelView.button_action by returning a dictionary with the values to change. This avoids needing to use a wizard to create a dynamic action.
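As a rough sketch of the idea (the model, XML id and returned key below are illustrative assumptions, not taken from the newsletter):

from trytond.model import ModelView

class Sale(ModelView):
    __name__ = 'sale.sale'

    @classmethod
    @ModelView.button_action('sale.act_invoice_form')
    def open_invoices(cls, sales):
        # Returning a dictionary updates the referenced action with
        # these values, instead of building a dynamic action in a wizard.
        return {
            'name': 'Invoices (%d)' % len(sales),
        }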

All the tests are run on the new docker images before publishing them. This reinforces the stability of the published images.

It is now possible to add help text to the keys of Dict fields.

We have extended the generic tests on the wizards to ensure that the buttons point to an existing state.

The views for One2Many and Many2Many fields are no longer pre-fetched if the relation field is displayed with a different widget, such as the multiselection widget, which doesn’t need a view.

We now run the desktop client tests in our drone CI. This normalizes how tests are executed for all our packages and ensures no regressions are introduced by mistake.

It is now possible to load WSGI middleware using the configuration file. For example, to load the InternetExplorerFix from werkzeug:

[wsgi middleware]
ie = werkzeug.contrib.fixers.InternetExplorerFix

[wsgi ie]
kwargs={'fix_attach': False}

We now use the non-standard SKIP LOCKED instead of the also non-standard advisory lock when pulling tasks from the queue. It has better performance and is more SQL-ish, and it also provides a nicer fallback if the feature is not implemented by the back-end.

It is now possible to use a MultiSelection field as the key for a Dict field.


PyCoder’s Weekly: Issue #401 (Dec. 31, 2019)

#401 – DECEMBER 31, 2019



Python 2.7 Retires Today

Python 2.7 will not be maintained past Jan 1st, 2020. So long Python 2, and thank you for your years of faithful service. Python 3, your time is now!
PYTHONCLOCK.ORG

Meditations on the Zen of Python

“The Zen of Python is not ‘the rules of Python’ or ‘guidelines of Python’. It is full of contradiction and allusion. It is not intended to be followed: it is intended to be meditated upon. In this spirit, I offer this series of meditations on the Zen of Python.”
MOSHE ZADKA

Scout APM for Python


Check out Scout’s developer-friendly application performance monitoring solution for Python. Scout continually tracks down N+1 database queries, sources of memory bloat, performance abnormalities, and more. Get back to coding with Scout →
SCOUT APM sponsor

Python Timer Functions: Three Ways to Monitor Your Code

Learn how to use Python timer functions to monitor how fast your programs are running. You’ll use classes, context managers, and decorators to measure your program’s running time. You’ll learn the benefits of each method and which to use given the situation.
REAL PYTHON

Open Source Migrates With Emotional Distress

The creator of Flask reflects on the Python 2 to 3 migration and how the Python community handled the transition. Interesting read!
ARMIN RONACHER

Python REPL and Shell Integration Tips

Some good tips and ways to minimize the context interruption when moving between the shell and a Python session.
JOHN D. COOK

My Business Card Runs Linux & MicroPython

Embedded systems engineer builds a card-sized computer that boots Linux and runs MicroPython. Cool!
GEORGE HILLIARD

Python Jobs

Python Web Developer (Remote)

Premiere Digital

Software Engineer (Bristol, UK)

Envelop Risk

Python Contractor RaspPi/EPICS (Nelson, BC, Canada)

D-Pace Inc

More Python Jobs >>>

Articles & Tutorials

The Python Packaging Ecosystem

“[It] seems worthwhile for me to write-up my perspective as one of the lead architects for that ecosystem on how I characterize the overall problem space of software publication and distribution, where I think we are at the moment, and where I’d like to see us go in the future.”
NICK COGHLAN

Top 10 Real Python Articles of 2019

I was a guest on Mike Kennedy’s Talk Python podcast and we discussed a shortlist of 10 interesting tutorials published on Real Python in 2019. So if you’re looking to expand your year-end reading list you’ll find some inspiration there. It’s always a treat to be on Mike’s show—definitely check out his podcast!
TALK PYTHON podcast

Python Developers Are in Demand on Vettery


Vettery is an online hiring marketplace that’s changing the way people hire and get hired. Ready for a bold career move? Make a free profile, name your salary, and connect with hiring managers from top employers today →
VETTERY sponsor

Sorting Data With Python

In this step-by-step course, you’ll learn how to sort in Python. You’ll know how to sort various types of data in different data structures, customize the order, and work with two different ways of sorting in Python.
REAL PYTHON video

Training on Batch: How Do You Split the Data?

Creating data batches for model training, evaluated in the context of loading data with Python generators, HDF5 files, and NumPy, using a sound-processing machine-learning model as an example.
OLEG ŻERO

How to use Pandas get_dummies to Create Dummy Variables in Python

Dummy variables (or binary/indicator variables) are often used in statistical analyses as well as in more simple descriptive statistics.
ERIK MARSJA

Python Type Hints & MyPy Tutorial

This post covers mypy in general terms as well as many examples demonstrating the syntax and capabilities of this type checker.
GUILHERME KUNIGAMI

Pipx: Installing, Uninstalling & Upgrading Python Packages in Virtual Envs

Here you will learn how to install, uninstall, & upgrade Python packages using the pipx tool.
ERIK MARSJA

Projects & Code

drf_dynamics: Dynamic Queryset and Serializer Setup for Django REST Framework

Handles the hassle of managing which fields get serialized and the queryset changes for each request for you.
GITHUB.COM/IMBOKOV • Shared by Ilya Bokov

Events

PyStaDa

January 8, 2020
PYSTADA.GITHUB.IO

PyMNTos

January 9, 2020
PYTHON.MN

Python Atlanta

January 9, 2020
MEETUP.COM


Happy Pythoning!
This was PyCoder’s Weekly Issue #401.

[ Subscribe to 🐍 PyCoder’s Weekly 💌 – Get the best Python news, articles, and tutorials delivered to your inbox once a week >> Click here to learn more ]




Test and Code: 97: 2019 Retrospective, 2020 Plans, and an amazing decade

This episode is not just a look back on 2019, and a look forward to 2020.
Also, 2019 is the end of an amazingly transformative decade for me, so I'm going to discuss that as well.

top 10 episodes of 2019

  • 10: episode 46, Testing Hard To Test Applications - Anthony Shaw
  • 9: episode 64, Practicing Programming to increase your value
  • 8: episode 70, Learning Software without a CS degree - Dane Hillard
  • 7: episode 75, Modern Testing Principles - Alan Page
  • 6: episode 72, Technical Interview Fixes - April Wensel
  • 5: episode 69, Andy Hunt - The Pragmatic Programmer
  • 4: episode 73, PyCon 2019 Live Recording
  • 3: episode 71, Memorable Tech Talks, The Ultimate Guide - Nina Zakharenko
  • 2: episode 76, TDD: Don’t be afraid of Test-Driven Development - Chris May
  • 1: episode 89, Improving Programming Education - Nicholas Tollervey

Looking back on the last decade
Some amazing events, like 2 podcasts, a book, a blog, speaking events, and teaching, have led me to where I am now.

Looking forward to 2020 and beyond
I discussed what's in store in the next year and moving forward.

A closing quote
Software is a blast. At least, it should be.
I want everyone to have fun writing software.
Leaning on automated tests is the best way I know to give me the confidence and freedom to:

  • rewrite big chunks of code
  • play with the code
  • try new things
  • have fun without fear
  • go home feeling good about what I did
  • be proud of my code

I want everyone to have that.

That's why I promote and teach automated testing.

I hope you had an amazing decade.
And I wish you a productive and fun 2020 and the upcoming decade.
If we work together and help each other reach new heights, we can achieve some pretty amazing things.

Sponsored By:

  • Raygun: Detect, diagnose, and destroy Python errors that are affecting your customers. With smart Python error monitoring software from Raygun.com, you can be alerted to issues affecting your users the second they happen.

Support Test & Code: Python Software Testing & Engineering

Links:

  • Thanks, 201X! — Mahmoud Hashemi's blog (https://sedimental.org/)


Catalin George Festila: News: Python 2.7 no longer supported by the Python team.

The 1st of January 2020 marks the sunset of Python 2.7. It’s clear that Python 3 is more popular these days; you can compare the popularity of both on Google Trends. Python 3.0 was released in December 2008, with the main goal of fixing problems existing in Python 2. From 1 January 2020, Python 2 no longer receives any support whatsoever from the core Python team. Migrating to…


How does sourmash's lca classification routine compare with GTDB classifications?

GTDB databases again!




John Cook: Area of sinc and jinc function lobes

Someone left a comment this morning on my blog post on sinc and jinc integrals regarding the area of the lobes.

It would be nice to have the values of integrals of each lobe, i.e. integrals between 0 and multiples of pi. Anyone knows of such a table?

This post will include Python code to address that question.

First, let me back up and explain the context. The sinc function is defined as [1]

sinc(x) = sin(x) / x

and the jinc function is defined analogously as

jinc(x) = J1(x) / x,

substituting the Bessel function J1 for the sine function. You could think of Bessel functions as analogs of sines and cosines. Bessel functions often come up when vibrations are described in polar coordinates, just as sines and cosines come up when using rectangular coordinates.

Here’s a plot of the sinc and jinc functions:

The lobes are the regions between crossings of the x-axis. For the sinc function, the lobe in the middle runs from -π to π, and for n > 0 the nth lobe runs from nπ to (n+1)π. The zeros of Bessel functions are not uniformly spaced like the zeros of the sine function, but they come up in application frequently and so it’s easy to find software to compute their locations.

First of all we’ll need some imports.

    from numpy import sin, pi
    import numpy as np
    from scipy.special import jn, jn_zeros
    from scipy.integrate import quad

The sinc and jinc functions are continuous at zero, but the computer doesn’t know that [2]. To prevent division by zero, we return the limiting value of each function for very small arguments.

    def sinc(x):
        return 1 if abs(x) < 1e-8 else sin(x)/x

    def jinc(x):
        return 0.5 if abs(x) < 1e-8 else jn(1,x)/x

You can show via Taylor series that these functions are exact to the limits of floating point precision for |x| < 10⁻⁸.

Here’s code to compute the area of the sinc lobes.

    def sinc_lobe_area(n):
        n = abs(n)
        integral, info = quad(sinc, n*pi, (n+1)*pi)
        return 2*integral if n == 0 else integral

The corresponding code for the jinc function is a little more complicated because we need to compute the zeros for the Bessel function J1. Our solution is a little clunky because we have an upper bound N on the lobe number. Ideally we’d work out an asymptotic value for the lobe area and compute zeros up to the point where the asymptotic approximation became sufficiently accurate, and switch over to the asymptotic formula for sufficiently large n.

    N = 100                           # upper bound on the lobe number
    jzeros = jn_zeros(1, N)           # positive zeros of J1
    jzeros = np.insert(jzeros, 0, 0)  # add the zero at the origin

    def jinc_lobe_area(n):
        n = abs(n)
        assert(n < N)
        integral, info = quad(jinc, jzeros[n], jzeros[n+1])
        return 2*integral if n == 0 else integral

Note that the 0th element of the array returned by jn_zeros is the first positive zero of J1; it doesn’t include the zero at the origin.
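With these definitions in place, a quick sanity check might print the first few lobe areas:

    for n in range(5):
        print(n, sinc_lobe_area(n), jinc_lobe_area(n))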

For both sinc and jinc, the even numbered lobes have positive area and the odd numbered lobes have negative area. Here’s a plot of the absolute values of the lobe areas.


[1] Some authors define sinc(x) as sin(πx)/πx. Both definitions are common.

[2] SciPy has a sinc function in scipy.special, defined as sin(πx)/πx, but it doesn’t have a jinc function.




Real Python: Sorting Data With Python

All programmers will have to write code to sort items or data at some point. Sorting can be critical to the user experience in your application, whether it’s ordering a user’s most recent activity by timestamp, or putting a list of email recipients in alphabetical order by last name. Python sorting functionality offers robust features to do basic sorting or customize ordering at a granular level.

In this course, you’ll learn how to sort various types of data in different data structures, customize the order, and work with two different methods of sorting in Python.

By the end of this tutorial, you’ll know how to:

  • Implement basic Python sorting and ordering on data structures
  • Differentiate between sorted() and .sort()
  • Customize a complex sort order in your code based on unique requirements

For this course, you’ll need a basic understanding of lists and tuples as well as sets. Those data structures will be used in this course, and some basic operations will be performed on them. Also, this course uses Python 3, so example output might vary slightly from what you’d see if you were using Python 2.
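As a quick taste of the sorted() versus .sort() distinction covered in the course:

    >>> nums = [3, 1, 2]
    >>> sorted(nums)   # returns a new sorted list
    [1, 2, 3]
    >>> nums.sort()    # sorts in place and returns None
    >>> nums
    [1, 2, 3]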


[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]





S. Lott: Christmas Ornament

See https://github.com/slott56/cpx-xmas-ornament

You'll need a Circuit Playground Express https://www.adafruit.com/product/3333

Install the code. Enjoy the noise and blinky lights.

The MML translation isn't as complete as you might like. The upper/lower case for the various commands isn't handled quite as cleanly as it could be. AFAIK, case shouldn't matter, but I omitted any lower() functions, making the MML parser case sensitive. It only mattered for one of the four songs, and it was easier to edit the song.

The processing leaves a great deal of "clickiness" in the start_tone() processing. I think I know how to address it.

There are barely 96 or so different tones available in MML compositions. It might be possible to generate the wave shapes in advance to have a smoother music experience.

One could imagine having an off-line translator to transform the MML text into a sequence of bytes with note number and duration. This would slightly compress the song, but would speed up processing by eliminating the overhead of parsing.

Additionally, having 96 wave tables could speed up tone production. The tiny bit of time to recompute the sine wave at a given frequency would be eliminated. But. Memory is limited.
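As a rough illustration of the wave-table idea (the sample rate and note numbering here are assumptions, not values from the project):

import math

SAMPLE_RATE = 8000   # assumed playback rate
N_NOTES = 96         # the ~96 tones reachable from MML

def one_cycle(freq, volume=30000):
    # Precompute a single cycle of a sine wave at the given frequency.
    length = max(2, round(SAMPLE_RATE / freq))
    return [int(volume * math.sin(2 * math.pi * i / length))
            for i in range(length)]

# Note n taken as n semitones above C1 (about 32.7 Hz) -- also an assumption.
TABLES = [one_cycle(32.7 * 2 ** (n / 12)) for n in range(N_NOTES)]

Whether all 96 tables fit is exactly the memory trade-off mentioned above.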


Monday, December 30, 2019

Moshe Zadka: Meditations on the Zen of Python

(This is based on the series published in opensource.com as 9 articles: 1, 2, 3, 4, 5, 6, 7, 8, 9)

Python contributor Tim Peters introduced us to the Zen of Python in 1999. Twenty years later, its 19 guiding principles continue to be relevant within the community.

The Zen of Python is not "the rules of Python" or "guidelines of Python". It is full of contradiction and allusion. It is not intended to be followed: it is intended to be meditated upon.

In this spirit, I offer this series of meditations on the Zen of Python.

Beautiful is better than ugly.

It was in Structure and Interpretation of Computer Programs (SICP) that the point was made: "Programs must be written for people to read and only incidentally for machines to execute." Machines do not care about beauty, but people do.

A beautiful program is one that is enjoyable to read. This means first that it is consistent. Tools like Black, flake8, and Pylint are great for making sure things are reasonable on a surface layer.

But even more important, only humans can judge what humans find beautiful. Code reviews and a collaborative approach to writing code are the only realistic way to build beautiful code. Listening to other people is an important skill in software development.

Finally, all the tools and processes are moot if the will is not there. Without an appreciation for the importance of beauty, there will never be an emphasis on writing beautiful code.

This is why this is the first principle: it is a way of making "beauty" a value in the Python community. It immediately answers: "Do we really care about beauty?" We do.

Explicit is better than implicit.

We humans celebrate light and fear the dark. Light helps us make sense of vague images. In the same way, programming with more explicitness helps us make sense of abstract ideas. It is often tempting to make things implicit.

"Why is self explicitly there as the first parameter of methods?"

There are many technical explanations, but all of them are wrong. It is almost a Python programmer's rite of passage to write a metaclass that makes explicitly listing self unnecessary. (If you have never done this before, do so; it makes a great metaclass learning exercise!)

The reason self is explicit is not because the Python core developers did not want to make a metaclass like that the "default" metaclass. The reason it is explicit is because there is one less special case to teach: the first argument is explicit.
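A small illustration of that teaching point: with self explicit, a method call is just a function call with the instance passed as the first argument.

class Greeter:
    def greet(self):            # the instance is an explicit parameter
        return "hello from " + repr(self)

g = Greeter()
assert g.greet() == Greeter.greet(g)   # the sugar and the explicit call agree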

Even when Python does allow non-explicit things, such as context variables, we must always ask: Are we sure we need them? Could we not just pass arguments explicitly? Sometimes, for many reasons, this is not feasible. But prioritizing explicitness means, at least, asking the question and estimating the effort.

Simple is better than complex.

When it is possible to choose at all, choose the simple solution. Python is rarely in the business of disallowing things. This means it is possible, and even straightforward, to design baroque programs to solve straightforward problems.

It is worthwhile to remember at each point that simplicity is one of the easiest things to lose and the hardest to regain when writing code.

This can mean choosing to write something as a function, rather than introducing an extraneous class. This can mean avoiding a robust third-party library in favor of writing a two-line function that is perfect for the immediate use-case. Most often, it means avoiding predicting the future in favor of solving the problem at hand.

It is much easier to change the program later, especially if simplicity and beauty were among its guiding principles, than to load the code down with all possible future variations.

Complex is better than complicated.

This is possibly the most misunderstood principle because understanding the precise meanings of the words is crucial. Something is complex when it is composed of multiple parts. Something is complicated when it has a lot of different, often hard to predict, behaviors.

When solving a hard problem, it is often the case that no simple solution will do. In that case, the most Pythonic strategy is to go "bottom-up." Build simple tools and combine them to solve the problem.

This is where techniques like object composition shine. Instead of having a complicated inheritance hierarchy, have objects that forward some method calls to a separate object. Each of those can be tested and developed separately and then finally put together.

Another example of "building up" is using singledispatch, so that instead of one complicated object, we have a simple, mostly behavior-less object and separate behaviors.
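For instance, a minimal singledispatch sketch: one simple, mostly behavior-less default, with per-type behaviors registered separately.

from functools import singledispatch

@singledispatch
def describe(value):
    return "something: " + repr(value)          # default behavior

@describe.register
def _(value: int):
    return "an integer of magnitude %d" % abs(value)

@describe.register
def _(value: list):
    return "a list of %d items" % len(value)

print(describe(-3))      # an integer of magnitude 3
print(describe([1, 2]))  # a list of 2 items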

Flat is better than nested.

Nowhere is the pressure to be "flat" more obvious than in Python's strong insistence on indentation. Other languages will often introduce an implementation that "cheats" on the nested structure by reducing indentation requirements. To appreciate this point, let's take a look at JavaScript.

JavaScript is natively async, which means that programmers write code in JavaScript using a lot of callbacks.

a(function(resultsFromA) {
  b(resultsFromA, function(resultsFromB) {
    c(resultsFromB, function(resultsFromC) {
      console.log(resultsFromC);
    });
  });
});

Ignoring the code, observe the pattern and the way indentation leads to a right-most point. This distinctive "arrow" shape makes it tough for the eye to quickly walk through the code, so it's seen as undesirable and even nicknamed "callback hell." However, in JavaScript, it is possible to "cheat" and not have indentation reflect nesting.

a(function(resultsFromA) {
b(resultsFromA,
  function(resultsFromB) {
c(resultsFromB,
  function(resultsFromC) {
    console.log(resultsFromC);
})})});

Python affords no such options to cheat: every nesting level in the program must be reflected in the indentation level. So deep nesting in Python looks deeply nested. That means "callback hell" was a worse problem in Python than in JavaScript: nesting callbacks means indenting, with no option to "cheat" with braces.

This challenge, in combination with the Zen principle, has led to an elegant solution by a library I worked on. In the Twisted framework, we came up with the deferred abstraction, which would later inspire the popular JavaScript promise abstraction. In this way, Python's unwavering commitment to clear code forces Python developers to discover new, powerful abstractions.

future_value = future_result()
future_value.addCallback(a)
future_value.addCallback(b)
future_value.addCallback(c)

(This might look familiar to modern JavaScript programmers: Promises were heavily influenced by Twisted's deferreds.)

Sparse is better than dense.

The easiest way to make something less dense is to introduce nesting. This habit is why the principle of sparseness follows the previous one: after we have reduced nesting as much as possible, we are often left with dense code or data structures. Density, in this sense, is jamming too much information into a small amount of code, making it difficult to decipher when something goes wrong.

Reducing that denseness requires creative thinking, and there are no simple solutions. The Zen of Python does not offer simple solutions. All it offers are ways to find what can be improved in the code, without always giving guidance for "how."

Take a walk. Take a shower. Smell the flowers. Sit in a lotus position and think hard, until finally, inspiration strikes. When you are finally enlightened, it is time to write the code.

Readability counts.

In some sense, this middle principle is indeed the center of the entire Zen of Python. The Zen is not about writing efficient programs. It is not even about writing robust programs, for the most part. It is about writing programs that other people can read.

Reading code, by its nature, happens after the code has been added to the system. Often, it happens long after. Neglecting readability is the easiest choice since it does not hurt right now. Whatever the reason for adding new code -- a painful bug or a highly requested feature -- it does hurt. Right now.

In the face of immense pressure to throw readability to the side and just "solve the problem," the Zen of Python reminds us: readability counts. Writing the code so it can be read is a form of compassion for yourself and others.

Special cases aren't special enough to break the rules.

There is always an excuse. This bug is particularly painful; let's not worry about simplicity. This feature is particularly urgent; let's not worry about beauty. The domain rules covering this case are particularly hairy; let's not worry about nesting levels.

Once we allow special pleading, the dam wall breaks, and there are no more principles; things devolve into a Mad Max dystopia with every programmer for themselves, trying to find the best excuses.

Discipline requires commitment. It is only when things are hard, when there is a strong temptation, that a software developer is tested. There is always a valid excuse to break the rules, and that's why the rules must be kept the rules. Discipline is the art of saying no to exceptions. No amount of explanation can change that.

Although, practicality beats purity.

"If you think only of hitting, springing, striking, or touching the enemy, you will not be able actually to cut him.", Miyamoto Musashi, The Book of Water

Ultimately, software development is a practical discipline. Its goal is to solve real problems, faced by real people. Practicality beats purity: above all else, we must solve the problem. If we think only about readability, simplicity, or beauty, we will not be able to actually solve the problem.

As Musashi suggested, the primary goal of every code change should be to solve a problem. The problem must be foremost in our minds. If we waver from it and think only of the Zen of Python, we have failed the Zen of Python. This is another one of those contradictions inherent in the Zen of Python.

Errors should never pass silently...

Before the Zen of Python was a twinkle in Tim Peters' eye, before Wikipedia became informally known as "wiki," the first WikiWiki site, C2, existed as a trove of programming guidelines. These are principles that mostly came out of a Smalltalk programming community. Smalltalk's ideas influenced many object-oriented languages, Python included.

The C2 wiki defines the Samurai Principle: "return victorious, or not at all." In Pythonic terms, it encourages eschewing sentinel values, such as returning None or -1 to indicate an inability to complete the task, in favor of raising exceptions. A None is silent: it looks like a value and can be put in a variable and passed around. Sometimes, it is even a valid return value.

The principle here is that if a function cannot accomplish its contract, it should "fail loudly": raise an exception. The raised exception will never look like a possible value. It will skip past the returned_value = call_to_function(parameter) line and go up the stack, potentially crashing the program.

A crash is straightforward to debug: there is a stack trace indicating the problem as well as the call stack. The failure might mean that a necessary condition for the program was not met, and human intervention is needed. It might mean that the program's logic is faulty. In either case, the loud failure is better than a hidden, "missing" value, infecting the program's valid data with None, until it is used somewhere and an error message says "None does not have method split," which you probably already knew.
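A sketch of the contrast, with a hypothetical lookup helper:

def find_user(users, name):
    for user in users:
        if user.name == name:
            return user
    # Fail loudly: a raised exception cannot be mistaken for a value,
    # unlike a silent "return None".
    raise LookupError("no user named " + name)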

Unless explicitly silenced.

Exceptions sometimes need to be explicitly caught. We might anticipate some of the lines in a file are misformatted and want to handle those in a special way, maybe by putting them in a "lines to be looked at by a human" file, instead of crashing the entire program.

Python allows us to catch exceptions with except. This means errors can be explicitly silenced. This explicitness means that the except line is visible in code reviews. It makes sense to question why this is the right place to silence, and potentially recover from, the exception. It makes sense to ask if we are catching too many exceptions or too few.

Because this is all explicit, it is possible for someone to read the code and understand which exceptional conditions are recoverable.
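Returning to the misformatted-lines example, a sketch of what that explicit silencing could look like (the file names and parse_line are made up for illustration):

def parse_line(line):
    name, value = line.split(",")    # raises ValueError on bad lines
    return name.strip(), int(value)

records = []
with open("input.txt") as src, open("for-human-review.txt", "w") as rejects:
    for line in src:
        try:
            records.append(parse_line(line))
        except ValueError:
            # Explicitly silenced: set the line aside instead of crashing.
            rejects.write(line)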

In the face of ambiguity, refuse the temptation to guess.

What should the result of 1 + "1" be? Both "11" and 2 would be valid guesses. This expression is ambiguous: there is no single thing it can do that would not be a surprise to at least some people.

Some languages choose to guess. In JavaScript, the result is "11". In Perl, the result is 2. In C, naturally, the result is the empty string. In the face of ambiguity, JavaScript, Perl, and C all guess.

In Python, this raises a TypeError: an error that is not silent. It is atypical to catch TypeError: it will usually terminate the program or at least the current task (for example, in most web frameworks, it will terminate the handling of the current request).

Python refuses to guess what 1 + "1" means. The programmer is forced to write code with clear intention: either 1 + int("1"), which would be 2; or str(1) + "1", which would be "11"; or "1"[1:], which would be an empty string. By refusing to guess, Python makes programs more predictable.

There should be one -- and preferably only one -- obvious way to do it.

Prediction also goes the other way. Given a task, can you predict the code that will be written to achieve it? It is impossible, of course, to predict perfectly. Programming, after all, is a creative task.

However, there is no reason to intentionally provide multiple, redundant ways to achieve the same thing. There is a sense in which some solutions are "better" or "more Pythonic."

Part of the appreciation for the Pythonic aesthetic is that it is OK to have healthy debates about which solution is better. It is even OK to disagree and keep programming. It is even OK to agree to disagree for the sake of harmony. But beneath it all, there has to be a feeling that, eventually, the right solution will come to light. There must be the hope that eventually we can live in true harmony by agreeing on the best way to achieve a goal.

Although that way may not be obvious at first (unless you're Dutch).

This is an important caveat: It is often not obvious, at first, what is the best way to achieve a task. Ideas are evolving. Python is evolving. The best way to read a file block-by-block is, probably, to wait until Python 3.8 and use the walrus operator.

This common task, reading a file block-by-block, did not have a "single best way to do it" for almost 30 years of Python's existence.
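That idiom, for reference (Python 3.8+):

with open("data.bin", "rb") as f:
    while block := f.read(4096):   # assign and test in one expression
        print(len(block))          # stand-in for real processing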

When I started using Python in 1998 with Python 1.5.2, there was no single best way to read a file line-by-line. For many years, the best way to know if a dictionary had a key was to use .has_key(), until the in operator became the best way.

It is only by appreciating that finding the one (and only one) way of achieving a goal can sometimes take 30 years of trying out alternatives that Python can keep aiming to find those ways. This view of history, where 30 years is an acceptable time for something to take, often feels foreign to people in the United States, a country that has existed for just over 200 years.

The Dutch, whether it's Python creator Guido van Rossum or famous computer scientist Edsger W. Dijkstra, have a different worldview according to this part of the Zen of Python. A certain European appreciation for time is essential.

Now is better than never.

There is always the temptation to delay things until they are perfect. They will never be perfect, though. When they look "ready" enough, that is when it is time to take the plunge and put them out there. Ultimately, a change always happens at some now: the only thing that delaying does is move it to a future person's "now."

Although never is often better than right now.

This, however, does not mean things should be rushed. Decide the criteria for release in terms of testing, documentation, user feedback, and so on. "Right now," as in before the change is ready, is not a good time.

This is a good lesson not just for popular languages like Python, but also for your personal little open source project.

If the implementation is hard to explain, it's a bad idea.

The most important thing about programming languages is predictability. Sometimes we explain the semantics of a certain construct in terms of abstract programming models, which do not correspond exactly to the implementation. However, the best of all explanations just explains the implementation.

If the implementation is hard to explain, it means the avenue is impossible.

If the implementation is easy to explain, it may be a good idea.

Just because something is easy does not mean it is worthwhile. However, once it is explained, it is much easier to judge whether it is a good idea.

This is why the second half of this principle intentionally equivocates: nothing is certain to be a good idea, but it always allows people to have that discussion.

Namespaces in Python

Python uses namespaces for everything. Though simple, they are sparse data structures -- which is often the best way to achieve a goal.

Modules are namespaces. This means that correctly predicting module semantics often just requires familiarity with how Python namespaces work. Classes are namespaces. Objects are namespaces. Functions have access to their local namespace, their parent namespace, and the global namespace.

The simple model, where the . operator accesses an object, which in turn will usually, but not always, do some sort of dictionary lookup, makes Python hard to optimize, but easy to explain.
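The "usually a dictionary lookup" part is easy to see directly:

class Box:
    pass

b = Box()
b.value = 3
print(b.__dict__)            # {'value': 3}
print(b.value)               # attribute access: (usually) a lookup in that dict
print(getattr(b, "value"))   # the same lookup, spelled explicitly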

Indeed, some third-party modules take this guideline and run with it. For example, the variants package turns functions into namespaces of "related functionality." It is a good example of how the Zen of Python can inspire new abstractions.




Codementor: Making Your First GUI: Python3, Tkinter

Your First GUI: Python3, Tkinter! Making a conversion app with a GUI


Learn PyQt: LearnPyQt — One year in, and more to come.

It's been a very good year.

Back in May I was looking through my collection of PyQt tutorials and videos and trying to decide what to do with them. They were pretty popular, but being hosted on multiple sites meant they lacked structure between them and were less useful than they could be. I needed somewhere to put them.

Having looked at the options available for hosting tutorials and courses, I couldn't find anything that fit my requirements. So I committed the #1 programmer mistake of building my own.

LearnPyQt.com was born, and it turned out pretty great.

The site uses a freemium model — long detailed text tutorials, with an upgrade to buy video courses and books for those that want them. Built on the Django-based Wagtail CMS, it has been extended with some custom apps into a fully-fledged learning management system. But it's far from complete. Plans include adding progress tracking, certificates and some lightweight gamification. The goal here is to provide little hooks and challenges, to keep you inspired and experimenting with PyQt (and Python).

The availability of the free tutorials is key — not everyone wants videos or books, and not wanting those things is no reason not to learn something. Even so, the upgrade is a one-off payment to keep it affordable for as many people as possible: no subscriptions here!

New Tutorials

Once the existing tutorials and videos were up and running I set about creating more. These new tutorials were modelled on the popular multithreading tutorial, taking frequently asked PyQt5 questions and pain points and tackling them in detail together with working examples. This led first to the (often dreaded) ModelView architecture, which really isn't that bad, and later to bitmap graphics, which unlocks the power of QPainter and the ability to create your own custom widgets.

Custom widgets

As the list of obvious targets dries up I'll be adding a topic-voting system on site to allow students to request and vote for their particular topics of interest, to keep me on topic with what people actually want.

New Videos

The video tutorials were where it all started; however, in the past year these have fallen a little behind. This will be rectified in the coming months, with new video tutorials recorded for the advanced tutorials and updates to the existing videos following shortly after. The issue has been balancing between writing new content and recording new content, but that problem is solved now that we have...

New Writers

With the long list of things to tackle I was very happy to be joined this year by a new writer — John Lim. John is a Python developer from Malaysia, who's been developing with PyQt5 for over 2 years and still remembers all the pain points getting started. His first tutorials covered embedding custom widgets from Qt Designer and basic plotting with PyQtGraph both of which were a huge success.

If you're interested in becoming a writer, you can! You get paid, and — assuming you enjoy writing about PyQt — it's a lot of fun.

New Types of Content

In addition to all the new tutorials and videos, we've been experimenting with new types of content on the site. First of all we have been working on a set of example apps and widgets which you can use for inspiration — or just plain use the code from — for your own projects. Everything on the site is open source and free to use.

Goodforbitcoin desktop image

We've also been experimenting with alternatives short-form tutorials/documentation for core Qt widgets and features. The first of these by John covers adding scrollable regions with QScrollArea to your app. We'll have more of these, together with more complete documentation re-written for Python coming soon.

New Year

That's all for this year.

To help the year go out with a bang, we're currently running a 50% discount on all courses and books with the code NEWYEAR20. Every purchase gets unlimited access to all future updates and upgrades, so this is a great way to get in ahead of all the good stuff coming down the pipeline.

The same code will give 10% off after New Year. Feel free to share it with the people you love, or wait a few days and share it with people you love slightly less.

Here's to another year building GUI apps with Python!




Zero-with-Dot (Oleg Żero): Training on batch: how to split data effectively?

Introduction

With increasing volumes of data, a common approach to training machine-learning models is so-called training on batch. This approach involves splitting a dataset into a series of smaller data chunks that are handed to the model one at a time.

In this post, we will present three ideas to split the dataset for batches:

  • creating a “big” tensor,
  • loading partial data with HDF5,
  • python generators.

For illustration purposes, we will pretend that the model is a sound-based detector, but the analysis presented in this post is generic. Although the example is framed as a particular case, the steps discussed here, splitting, preprocessing and iterating over the data, form a common procedure. Whether the data comes in the form of image files, a table derived from a SQL query, or an HTTP response, it is the procedure that is our main concern.

Specifically, we will compare our methods by looking into the following aspects:

  • code quality,
  • memory footprint,
  • time efficiency.

What is a batch?

Formally, a batch is understood as an input-output pair (X[i], y[i]) that is a subset of the data. Since our model is a sound-based detector, it expects a processed audio sequence as input and returns the probability of occurrence of a certain event. Naturally, in our case, a batch consists of:

  • X[t] - a matrix representing processed audio track sampled within a time-window, and
  • y[t] - a binary label denoting the presence of the event,

where t denotes the time-window (figure 1).

Figure 1. An example of data input. Top: simple binary label (random); middle: raw audio channel (mono); bottom: spectrogram represented as the natural logarithm of the spectrum. The vertical lines represent slicing of the sequence into batches of 1 second length.

Spectrogram

As for the spectrogram, you can think of it as a way of describing how much of each “tune” is present within the audio track. For instance, when a bass guitar is being played, the spectrogram would reveal high intensity more concentrated on the lower side of the spectrum. Conversely, with a soprano singer we would observe the opposite. With this kind of “encoding”, a spectrogram naturally represents useful features for the model.

Comparing ideas

As a common prerequisite for our comparison, let’s briefly define the following imports and constants.

from scipy.signal import spectrogram
from os.path import join
from math import ceil
import numpy as np


FILENAME = 'test'
FILEPATH = 'data'
CHANNEL  = 0        # mono track only
SAMPLING = 8000     # sampling rate (audio at 8k samples per s)
NFREQS   = 512      # 512 frequencies for the spectrogram
NTIMES   = 400      # 400 time-points for the spectrogram
SLEN     = 1        # 1 second of audio for a batch

N = lambda x: (x - x.mean())/x.std() # normalization

filename = join(FILEPATH, FILENAME)

Here, the numbers are somewhat arbitrary. We decide to go for the lowest common sampling rate (other common values are 16k and 22.4k samples per second), and let every X-chunk be a spectrogram with 512 frequency channels, calculated from a non-overlapping audio sequence of 1 s, using 400 data points along the time axis. In other words, each batch will be a pair: a spectrogram matrix computed with 512-point FFTs over 400 time points, and a binary label.

Idea #1 - A “big” tensor

The input to the model is a 2-dimensional tensor. As the last step involves iterating over the batches, it makes sense to increase the rank of the tensor and reserve the third dimension for the batch count. Consequently, the whole process can be outlined as follows:

  1. Load the x-data.
  2. Load the y-label.
  3. Slice X and y into batches.
  4. Extract features on each batch (here: the spectrogram).
  5. Collate X[t] and y[t] together.

Why wouldn’t that be a good idea? Let’s see an example of the implementation.

def create_X_tensor(audio, fs, slen=SLEN, bsize=(NFREQS, NTIMES)):
    n_batches = ceil(audio.shape[0]/(slen*fs))
    X = np.zeros((n_batches, bsize[0] // 2 + 1, bsize[1]))

    for bn in range(n_batches):
        aslice = slice(bn*slen*fs, (bn + 1)*slen*fs)
        *_, spec = spectrogram(
                N(audio[aslice]),
                fs       = fs,
                nperseg  = int(fs/bsize[1]),
                noverlap = 0,
                nfft     = bsize[0])
        X[bn, :, :spec.shape[1]] = spec
    return np.log(X + 1e-6) # to avoid -Inf

def get_batch(X, y, bn):
    return X[bn, :, :], y[bn]


if __name__ == '__main__':
    audio = np.load(filename + '.npy')[:, CHANNEL]
    y = np.load(filename + '-lbl.npy')

    X = create_X_tensor(audio, SAMPLING)
    for t in range(X.shape[0]):
        batch = get_batch(X, y, t)
        print('Batch #{}, shape={}, label={}'.format(
            t, batch[0].shape, batch[1]))

The essence of this method can best be described as load it all now, worry about it later.

While creating X as a self-contained data piece can be viewed as an advantage, this approach has disadvantages:

  1. We load all the data into RAM, regardless of whether the RAM can hold it.
  2. We use the first dimension of X for the batch count. However, this is solely a convention. What if next time somebody decides it should be the last dimension instead?
  3. Although X.shape[0] tells us exactly how many batches we have, we still have to create an auxiliary variable t to keep track of the batches. This design forces the model training code to adhere to this decision.
  4. Finally, it requires the get_batch function to be defined, whose only purpose is to select a subset of X and y and collate them together, which looks redundant at best.

Idea #2 - Loading batches with HDF5

Let’s start with eliminating the most dreaded problem: having to load all the data into RAM. If the data comes from a file, it would make sense to load only portions of it and operate on those portions.

Using the skiprows and nrows arguments of Pandas’ read_csv, it is possible to load fragments of a .csv file. However, with the CSV format being rather impractical for storing sound data, the Hierarchical Data Format (HDF5) is a better choice. The format allows us to store multiple numpy-like arrays and access them in a numpy-like way.

Here, we assume that the file contains datasets called 'audio' and 'label'. Check out the Python h5py library for more information.

import h5py as h5  # needed from here on; not among the common imports above

def get_batch(filepath, t, slen=SLEN, bsize=(NFREQS, NTIMES)):
    with h5.File(filepath + '.h5', 'r') as f:
        fs    = f['audio'].attrs['sampling_rate']
        audio = f['audio'][t*slen*fs:(t + 1)*slen*fs, CHANNEL]
        label = f['label'][t]

    *_, spec = spectrogram(
            N(audio),
            fs          = fs,
            nperseg     = int(fs/bsize[1]),
            noverlap    = 0,
            nfft        = bsize[0])
    X = np.zeros((bsize[0] // 2 + 1, bsize[1]))
    X[:, :spec.shape[1]] = spec
    return np.log(X + 1e-6), label

def get_number_of_batches(filepath):
    with h5.File(filepath + '.h5', 'r') as f:
        fs = f['audio'].attrs['sampling_rate']
        sp = f['audio'].shape[0]
    return ceil(sp/fs)


if __name__ == '__main__':
    n_batches = get_number_of_batches(filename)
    for t in range(n_batches):
        batch = get_batch(filename, t)
        print('Batch #{}, shape={}, label={}'.format(
            t, batch[0].shape, batch[1]))

Hopefully, our data is now manageable (if it was not before)! Moreover, we have also made some progress on the overall quality:

  1. We got rid of the previous get_batch function and replaced it with one that is more meaningful. It computes what is necessary and delivers the data. Simple.
  2. Our X tensor no longer needs to be artificially modified.
  3. In fact, by changing get_batch(X, y, t) to get_batch(filename, t), we have abstracted access to our dataset and removed X and y from the namespace.
  4. The dataset has also become a single file. We do not need to source the data and the labels from two different files.
  5. We do not need to supply the fs (sampling rate) argument. Thanks to the so-called attributes in HDF5, it can be part of the dataset file, as the sketch after this list shows.
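For completeness, a minimal sketch of how such a file could be created with h5py, using random stand-in data and the file name assumed by the snippets above:

import h5py as h5
import numpy as np

with h5.File('data/test.h5', 'w') as f:
    audio = f.create_dataset('audio', data=np.random.randn(8000 * 60, 2))
    f.create_dataset('label', data=np.random.randint(0, 2, size=60))
    audio.attrs['sampling_rate'] = 8000  # travels with the data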

Despite the advantages, we are still left with two… inconveniences.

Because the new get_batch does not remember its state, we have to rely on controlling t using a loop, as before. However, there is no mechanism within get_batch to tell us how large the loop needs to be, so we need to check the size of our data beforehand. Apart from adding a third output argument to get_batch, which would make the function rather weird, the only option is to create a second function: get_number_of_batches.

Unfortunately, this does not make the solution as elegant as it could be. If we transform get_batch into a form that preserves its state, we can do better.

Idea #3 - Generators

Let’s recognize the pattern: we are only interested in accessing, processing and delivering data pieces one after the other. We do not need it all at once.

For these situations, Python has a special construct, namely generators. Generators are functions that return generator iterators. Instead of eagerly performing the computation, an iterator delivers a bit of the result at a time and waits to be asked to continue. Perfect, right?

Generator iterators can be constructed in three ways:

  • through an expression similar to a list comprehension, e.g. (i for i in iterable), but using () instead of [],
  • from a generator function - by replacing return with yield, or
  • from a class object that defines custom __iter__ (or __getitem__) and __next__ methods (see docs); all three are sketched below.
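A minimal side-by-side sketch of the three constructions (yielding squares, purely for illustration):

squares_expr = (i*i for i in range(5))      # 1. generator expression

def squares_func(n):                        # 2. generator function
    for i in range(n):
        yield i*i

class SquaresIter:                          # 3. custom iterator class
    def __init__(self, n):
        self.i, self.n = 0, n
    def __iter__(self):
        return self
    def __next__(self):
        if self.i >= self.n:
            raise StopIteration
        self.i += 1
        return (self.i - 1)**2

assert list(squares_expr) == list(squares_func(5)) == list(SquaresIter(5))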

Here, using yield fits naturally in what we need to do.

def get_batches(filepath, slen=SLEN, bsize=(NFREQS, NTIMES)):
    with h5.File(filepath + '.h5', 'r') as f:
        fs = f['audio'].attrs['sampling_rate']
        n_batches = ceil(f['audio'].shape[0]/fs)

        for t in range(n_batches):
            audio = f['audio'][t*slen*fs:(t + 1)*slen*fs, CHANNEL]
            label = f['label'][t]
            *_, spec = spectrogram(
                    N(audio),
                    fs          = fs,
                    nperseg     = int(fs/bsize[1]),
                    noverlap    = 0,
                    nfft        = bsize[0])
            X = np.zeros((bsize[0] // 2 + 1, bsize[1]))
            X[:, :spec.shape[1]] = spec
            yield np.log(X + 1e-6), label


if __name__ == '__main__':
    for b in get_batches(filename):
        print ('shape={}, label={}'.format(b[0].shape, b[1]))

The loop is now inside the function. Thanks to the yield statement, the (X[t], y[t]) pair is computed and handed over only when the iterator returned by get_batches is advanced to step t. The model training code does not need to manage the state of the loop: the function remembers its state between calls, allowing the user to iterate over batches instead of juggling an artificial batch index.

It is useful to compare generator iterators to containers that are gradually drained: as batches are consumed with every iteration, at some point the container becomes empty. Consequently, neither indexing nor a stop condition is necessary; data gets consumed until there is no more, and the process stops.

Performance: time and memory

We intentionally started with a discussion of code quality, as it is tightly related to the way our solution has evolved. However, it is just as important to consider resource constraints, especially when data grows in volume.

Figure 2 presents the time it takes to deliver the batches using the three methods described earlier. As we can see, the time to process and hand over the data is nearly the same: whether we load all the data and then slice it, or load and process it bit-by-bit from the beginning, the total time is almost equal. This could, of course, be a consequence of an SSD allowing fast access to the data. Still, the chosen strategy seems to have little impact on overall time performance.

Figure 2. Time performance comparison. The red solid line refers to timing both loading the data into memory and performing the computation. The red dotted line times only the loop where slices are delivered, assuming that data was precomputed. The green dotted line refers to loading batches from the HDF5 file, and the blue dash-dotted line implements a generator. Comparing the red lines, we can see that just accessing the data once it is in RAM is almost free. When data is local, the differences between the other cases are minimal anyway.

Much more difference can be observed in figure 3. The first approach is the most memory-hungry of all: with a 1-hour-long audio sample it throws a MemoryError. Conversely, when loading data in chunks, the allocated RAM is determined by the batch size, leaving us safely below the limit.

Figure 3. Memory consumption comparison, expressed as the percentage of available RAM consumed by the Python script, evaluated using: (env)$ python idea.py & top -b -n 10 > capture.log; cat capture.log | egrep python > analysis.log, and post-processed.

Surprisingly (or not), there is no significant difference between the second and the third approach. What the figure tells us is that choosing or not choosing to implement a generator iterator makes no impact on the memory footprint of our solution.

This is an important take-away. Generators are often encouraged as more efficient solutions that save both time and memory. Instead, the figure shows that generators alone do not contribute to better solutions in terms of resources. What matters is only how quickly we can access the data and how much of it we can handle at once.

Using an HDF5 file proves efficient, since we can access the data very quickly, and flexible, since we do not need to load it all at once. At the same time, implementing a generator improves code readability and quality. Although we could also frame the first approach as a generator, it would not make much sense: without the ability to load data in smaller quantities, a generator would only improve the syntax. Consequently, the best approach is the combination of loading partial data and a generator, which is exactly the third approach.

Final remarks

In this post, we presented three different ways to split and process data in batches, comparing both the performance of each approach and the overall code quality. We also showed that generators on their own do not make code more efficient: the final performance is dictated by the time and memory constraints. Generators can, however, make the solution more elegant.

What solution do you find the most appealing?




TestDriven.io: Working with Static and Media Files in Django

This article looks at how to work with static and media files in a Django project, locally and in production.