Friday, July 31, 2020
Codementor: I Wrote an Online Escape Game
from Planet Python
via read more
NumFOCUS: Dask Life Sciences Fellow [Open Job]
Dask is an open-source library for parallel computing in Python that interoperates with existing Python data science libraries like NumPy, pandas, scikit-learn, and Jupyter. Dask is used today across many different scientific domains. Recently, we’ve observed an increase in use in a few life sciences applications: large-scale imaging in microscopy, single-cell analysis, genomics […]
The post Dask Life Sciences Fellow [Open Job] appeared first on NumFOCUS.
from Planet Python
via read more
Mike Driscoll: Real Python Podcast Interview
I am on the latest Real Python podcast where I talk about my ReportLab book, wxPython, and lots more.
The podcast episode that I take part in is called Episode 20: Building PDFs in Python with ReportLab. Check it out and feel free to ask questions in the comments.
The post Real Python Podcast Interview appeared first on The Mouse Vs. The Python.
from Planet Python
via read more
Catalin George Festila: Python 3.8.5 : PyEphem astronomy library for Python - part 001.
from Planet Python
via read more
Python⇒Speed: A tableau of crimes and misfortunes: the ever-useful `docker history`
If you want to understand a Docker image, there is no more useful tool than the `docker history` command. Whether it’s telling you why your image is so large, or helping you understand how a base image was constructed, the `history` command will let you peer into the innards of any image, allowing you to see the good, the bad, and the ugly.
Let’s see what this command does, what it can teach us about the construction of Docker images, and some examples of why it’s so useful.
from Planet Python
via read more
This Week in Machine Learning: Should We Be Afraid of AI, SER, Disney, and More
Machine learning is fascinating. New things happen every second while we’re busy performing our daily tasks. If you want to know what […]
The post This Week in Machine Learning: Should We Be Afraid of AI, SER, Disney, and More appeared first on neptune.ai.
from Planet SciPy
read more
PSF GSoC students blogs: Week 5 Blog Post
I am not feeling well this week and have asked my mentors for leave. I will catch up on my plan this weekend or next week.
from Planet Python
via read more
Real Python: The Real Python Podcast – Episode #20: Building PDFs in Python with ReportLab
Have you wanted to generate advanced reports as PDFs using Python? Maybe you want to build documents with tables, images, or fillable forms. This week on the show we have Mike Driscoll to talk about his book "ReportLab - PDF Processing with Python."
from Planet Python
via read more
Learn PyQt: Creating multiple windows in PyQt5/PySide2
In an earlier tutorial we've already covered how to open dialog windows. These are special windows which (by default) grab the focus of the user, and run their own event loop, effectively blocking the execution of the rest of your app.
However, quite often you will want to open a second window in an application, without interrupting the main window -- for example, to show the output of some long-running process, or display graphs or other visualizations. Alternatively, you may want to create an application that allows you to work on multiple documents at once, in their own windows.
It's relatively straightforward to open new windows but there are a few things to keep in mind to make sure they work well. In this tutorial we'll step through how to create a new window, and how to show and hide external windows on demand.
Creating a new window
In Qt any widget without a parent is a window. This means that to show a new window you just need to create a new instance of a widget. This can be any widget type (technically any subclass of QWidget), including another QMainWindow if you prefer.
There is no restriction on the number of QMainWindow instances you can have. If you need toolbars or menus on your second window you will have to use a QMainWindow to achieve this. This can get confusing for users however, so make sure it's necessary.
As with your main window, creating a window is not sufficient; you must also show it.
- PyQt5
- PySide2
from PyQt5.QtWidgets import QApplication, QMainWindow, QPushButton, QLabel, QVBoxLayout, QWidget

import sys


class AnotherWindow(QWidget):
    """
    This "window" is a QWidget. If it has no parent, it
    will appear as a free-floating window as we want.
    """

    def __init__(self):
        super().__init__()
        layout = QVBoxLayout()
        self.label = QLabel("Another Window")
        layout.addWidget(self.label)
        self.setLayout(layout)


class MainWindow(QMainWindow):

    def __init__(self):
        super().__init__()
        self.button = QPushButton("Push for Window")
        self.button.clicked.connect(self.show_new_window)
        self.setCentralWidget(self.button)

    def show_new_window(self, checked):
        w = AnotherWindow()
        w.show()


app = QApplication(sys.argv)
w = MainWindow()
w.show()
app.exec_()
from PySide2.QtWidgets import QApplication, QMainWindow, QPushButton, QLabel, QVBoxLayout, QWidget

import sys


class AnotherWindow(QWidget):
    """
    This "window" is a QWidget. If it has no parent, it
    will appear as a free-floating window as we want.
    """

    def __init__(self):
        super().__init__()
        layout = QVBoxLayout()
        self.label = QLabel("Another Window")
        layout.addWidget(self.label)
        self.setLayout(layout)


class MainWindow(QMainWindow):

    def __init__(self):
        super().__init__()
        self.button = QPushButton("Push for Window")
        self.button.clicked.connect(self.show_new_window)
        self.setCentralWidget(self.button)

    def show_new_window(self, checked):
        w = AnotherWindow()
        w.show()


app = QApplication(sys.argv)
w = MainWindow()
w.show()
app.exec_()
If you run this, you'll see the main window. Clicking the button may show the second window, but if it appears at all it will only be visible for a fraction of a second. What's happening?
    def show_new_window(self, checked):
        w = AnotherWindow()
        w.show()
Inside this method, we are creating our window (widget) object, storing it in the variable w and showing it. However, once we leave the method we no longer have a reference to the w variable (it is a local variable) and so it will be cleaned up – and the window destroyed. To fix this we need to keep a reference to the window somewhere, for example on the self object.
    def show_new_window(self, checked):
        self.w = AnotherWindow()
        self.w.show()
Now, when you click the button to show the new window, it will persist.
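The lifetime rule behind this fix can be checked in plain Python, without Qt. The sketch below is only an illustration: FakeWindow and Main are hypothetical stand-ins (not Qt classes), and it assumes CPython, where an object is freed as soon as its last reference goes away.

```python
import weakref


class FakeWindow:
    """Hypothetical stand-in for a QWidget; not a real Qt class."""


class Main:
    def show_local(self):
        w = FakeWindow()  # only a local variable refers to the window
        return weakref.ref(w)  # a weak reference does not keep it alive

    def show_kept(self):
        self.w = FakeWindow()  # reference stored on self, as in the fix
        return weakref.ref(self.w)


main = Main()
local_ref = main.show_local()
kept_ref = main.show_kept()

# A dead weak reference returns None when called.
print("local window discarded:", local_ref() is None)
print("kept window still alive:", kept_ref() is main.w)
```

The window created in show_local disappears when the method returns, while the one stored on self survives for as long as the Main object holds it.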
However, what happens if you click the button again? The window will be re-created! This new window will replace the old in the self.w variable, and – because there is now no reference to it – the previous window will be destroyed.
You can see this in action if you change the window definition to show a random number in the label each time it is created.
from random import randint


class AnotherWindow(QWidget):
    """
    This "window" is a QWidget. If it has no parent, it
    will appear as a free-floating window as we want.
    """

    def __init__(self):
        super().__init__()
        layout = QVBoxLayout()
        self.label = QLabel("Another Window % d" % randint(0,100))
        layout.addWidget(self.label)
        self.setLayout(layout)
The __init__ block is only run when creating the window. If you keep clicking the button the number will change, showing that the window is being re-created.
One solution is to simply check whether the window has already been created before creating it. The example below shows this in action.
- PyQt5
- PySide2
from PyQt5.QtWidgets import QApplication, QMainWindow, QPushButton, QLabel, QVBoxLayout, QWidget

import sys

from random import randint


class AnotherWindow(QWidget):
    """
    This "window" is a QWidget. If it has no parent, it
    will appear as a free-floating window as we want.
    """

    def __init__(self):
        super().__init__()
        layout = QVBoxLayout()
        self.label = QLabel("Another Window % d" % randint(0,100))
        layout.addWidget(self.label)
        self.setLayout(layout)


class MainWindow(QMainWindow):

    def __init__(self):
        super().__init__()
        self.w = None  # No external window yet.
        self.button = QPushButton("Push for Window")
        self.button.clicked.connect(self.show_new_window)
        self.setCentralWidget(self.button)

    def show_new_window(self, checked):
        if self.w is None:
            self.w = AnotherWindow()
        self.w.show()


app = QApplication(sys.argv)
w = MainWindow()
w.show()
app.exec_()
from PySide2.QtWidgets import QApplication, QMainWindow, QPushButton, QLabel, QVBoxLayout, QWidget

import sys

from random import randint


class AnotherWindow(QWidget):
    """
    This "window" is a QWidget. If it has no parent, it
    will appear as a free-floating window as we want.
    """

    def __init__(self):
        super().__init__()
        layout = QVBoxLayout()
        self.label = QLabel("Another Window % d" % randint(0,100))
        layout.addWidget(self.label)
        self.setLayout(layout)


class MainWindow(QMainWindow):

    def __init__(self):
        super().__init__()
        self.w = None  # No external window yet.
        self.button = QPushButton("Push for Window")
        self.button.clicked.connect(self.show_new_window)
        self.setCentralWidget(self.button)

    def show_new_window(self, checked):
        if self.w is None:
            self.w = AnotherWindow()
        self.w.show()


app = QApplication(sys.argv)
w = MainWindow()
w.show()
app.exec_()
Using the button you can pop up the window, and use the window controls to close it. If you click the button again, the same window will re-appear.
This approach is fine for windows that you create temporarily – for example if you want to pop up a window to show a particular plot, or log output. However, for many applications you have a number of standard windows that you want to be able to show/hide on demand.
In the next part we'll look at how to work with these types of windows.
Toggling a window
Often you'll want to toggle the display of a window using an action on a toolbar or in a menu. As we previously saw, if no reference to a window is kept, it will be discarded (and closed). We can use this behaviour to close a window, replacing the show_new_window method from the previous example with:
    def show_new_window(self, checked):
        if self.w is None:
            self.w = AnotherWindow()
            self.w.show()
        else:
            self.w = None  # Discard reference, close window.
By setting self.w to None the reference to the window will be lost, and the window will close.
If we set it to any other value than None the window will still close, but the if self.w is None test will not pass the next time we click the button and so we will not be able to recreate a window.
This will only work if you have not kept a reference to this window somewhere else. To make sure the window closes regardless, you may want to explicitly call .close() on it. The full example is shown below.
- PyQt5
- PySide2
from PyQt5.QtWidgets import QApplication, QMainWindow, QPushButton, QLabel, QVBoxLayout, QWidget

import sys

from random import randint


class AnotherWindow(QWidget):
    """
    This "window" is a QWidget. If it has no parent, it
    will appear as a free-floating window as we want.
    """

    def __init__(self):
        super().__init__()
        layout = QVBoxLayout()
        self.label = QLabel("Another Window % d" % randint(0,100))
        layout.addWidget(self.label)
        self.setLayout(layout)


class MainWindow(QMainWindow):

    def __init__(self):
        super().__init__()
        self.w = None  # No external window yet.
        self.button = QPushButton("Push for Window")
        self.button.clicked.connect(self.show_new_window)
        self.setCentralWidget(self.button)

    def show_new_window(self, checked):
        if self.w is None:
            self.w = AnotherWindow()
            self.w.show()
        else:
            self.w.close()  # Close window.
            self.w = None  # Discard reference.


app = QApplication(sys.argv)
w = MainWindow()
w.show()
app.exec_()
from PySide2.QtWidgets import QApplication, QMainWindow, QPushButton, QLabel, QVBoxLayout, QWidget

import sys

from random import randint


class AnotherWindow(QWidget):
    """
    This "window" is a QWidget. If it has no parent, it
    will appear as a free-floating window as we want.
    """

    def __init__(self):
        super().__init__()
        layout = QVBoxLayout()
        self.label = QLabel("Another Window % d" % randint(0,100))
        layout.addWidget(self.label)
        self.setLayout(layout)


class MainWindow(QMainWindow):

    def __init__(self):
        super().__init__()
        self.w = None  # No external window yet.
        self.button = QPushButton("Push for Window")
        self.button.clicked.connect(self.show_new_window)
        self.setCentralWidget(self.button)

    def show_new_window(self, checked):
        if self.w is None:
            self.w = AnotherWindow()
            self.w.show()
        else:
            self.w.close()  # Close window.
            self.w = None  # Discard reference.


app = QApplication(sys.argv)
w = MainWindow()
w.show()
app.exec_()
Persistent windows
So far we've looked at how to create new windows on demand. However, sometimes you have a number of standard application windows. In this case rather than create the windows when you want to show them, it can often make more sense to create them at start-up, then use .show() to display them when needed.
In the following example we create our external window in the __init__ block for the main window, and then our show_new_window method simply calls self.w.show() to display it.
- PyQt5
- PySide2
from PyQt5.QtWidgets import QApplication, QMainWindow, QPushButton, QLabel, QVBoxLayout, QWidget

import sys

from random import randint


class AnotherWindow(QWidget):
    """
    This "window" is a QWidget. If it has no parent, it
    will appear as a free-floating window as we want.
    """

    def __init__(self):
        super().__init__()
        layout = QVBoxLayout()
        self.label = QLabel("Another Window % d" % randint(0,100))
        layout.addWidget(self.label)
        self.setLayout(layout)


class MainWindow(QMainWindow):

    def __init__(self):
        super().__init__()
        self.w = AnotherWindow()
        self.button = QPushButton("Push for Window")
        self.button.clicked.connect(self.show_new_window)
        self.setCentralWidget(self.button)

    def show_new_window(self, checked):
        self.w.show()


app = QApplication(sys.argv)
w = MainWindow()
w.show()
app.exec_()
from PySide2.QtWidgets import QApplication, QMainWindow, QPushButton, QLabel, QVBoxLayout, QWidget

import sys

from random import randint


class AnotherWindow(QWidget):
    """
    This "window" is a QWidget. If it has no parent, it
    will appear as a free-floating window as we want.
    """

    def __init__(self):
        super().__init__()
        layout = QVBoxLayout()
        self.label = QLabel("Another Window % d" % randint(0,100))
        layout.addWidget(self.label)
        self.setLayout(layout)


class MainWindow(QMainWindow):

    def __init__(self):
        super().__init__()
        self.w = AnotherWindow()
        self.button = QPushButton("Push for Window")
        self.button.clicked.connect(self.show_new_window)
        self.setCentralWidget(self.button)

    def show_new_window(self, checked):
        self.w.show()


app = QApplication(sys.argv)
w = MainWindow()
w.show()
app.exec_()
If you run this, clicking on the button will show the window as before. However, note that the window is only created once and calling .show() on an already visible window has no effect.
Showing & hiding persistent windows
Once you have created a persistent window you can show and hide it without recreating it. Once hidden the window still exists, but will not be visible or accept mouse/other input. However you can continue to call methods on the window and update its state -- including changing its appearance. Once re-shown any changes will be visible.
Below we update our main window to create a toggle_window method which checks, using .isVisible(), whether the window is currently visible. If it is not, it is shown using .show(); if it is already visible we hide it with .hide().
class MainWindow(QMainWindow):

    def __init__(self):
        super().__init__()
        self.w = AnotherWindow()
        self.button = QPushButton("Push for Window")
        self.button.clicked.connect(self.toggle_window)
        self.setCentralWidget(self.button)

    def toggle_window(self, checked):
        if self.w.isVisible():
            self.w.hide()
        else:
            self.w.show()
The complete working example of this persistent window and toggling the show/hide state is shown below.
- PyQt5
- PySide2
from PyQt5.QtWidgets import QApplication, QMainWindow, QPushButton, QLabel, QVBoxLayout, QWidget

import sys

from random import randint


class AnotherWindow(QWidget):
    """
    This "window" is a QWidget. If it has no parent, it
    will appear as a free-floating window as we want.
    """

    def __init__(self):
        super().__init__()
        layout = QVBoxLayout()
        self.label = QLabel("Another Window % d" % randint(0,100))
        layout.addWidget(self.label)
        self.setLayout(layout)


class MainWindow(QMainWindow):

    def __init__(self):
        super().__init__()
        self.w = AnotherWindow()
        self.button = QPushButton("Push for Window")
        self.button.clicked.connect(self.toggle_window)
        self.setCentralWidget(self.button)

    def toggle_window(self, checked):
        if self.w.isVisible():
            self.w.hide()
        else:
            self.w.show()


app = QApplication(sys.argv)
w = MainWindow()
w.show()
app.exec_()
from PySide2.QtWidgets import QApplication, QMainWindow, QPushButton, QLabel, QVBoxLayout, QWidget

import sys

from random import randint


class AnotherWindow(QWidget):
    """
    This "window" is a QWidget. If it has no parent, it
    will appear as a free-floating window as we want.
    """

    def __init__(self):
        super().__init__()
        layout = QVBoxLayout()
        self.label = QLabel("Another Window % d" % randint(0,100))
        layout.addWidget(self.label)
        self.setLayout(layout)


class MainWindow(QMainWindow):

    def __init__(self):
        super().__init__()
        self.w = AnotherWindow()
        self.button = QPushButton("Push for Window")
        self.button.clicked.connect(self.toggle_window)
        self.setCentralWidget(self.button)

    def toggle_window(self, checked):
        if self.w.isVisible():
            self.w.hide()
        else:
            self.w.show()


app = QApplication(sys.argv)
w = MainWindow()
w.show()
app.exec_()
Note that, again, the window is only created once -- the window's __init__ block is not re-run (so the number in the label does not change) each time the window is re-shown.
Multiple windows
You can use the same principle for creating multiple windows -- as long as you keep a reference to the window, things will work as expected. The simplest approach is to create a separate method to toggle the display of each of the windows.
- PyQt5
- PySide2
import sys
from random import randint

from PyQt5.QtWidgets import (
    QApplication,
    QLabel,
    QMainWindow,
    QPushButton,
    QVBoxLayout,
    QWidget,
)


class AnotherWindow(QWidget):
    """
    This "window" is a QWidget. If it has no parent,
    it will appear as a free-floating window.
    """

    def __init__(self):
        super().__init__()
        layout = QVBoxLayout()
        self.label = QLabel("Another Window % d" % randint(0, 100))
        layout.addWidget(self.label)
        self.setLayout(layout)


class MainWindow(QMainWindow):
    def __init__(self):
        super().__init__()
        self.window1 = AnotherWindow()
        self.window2 = AnotherWindow()

        l = QVBoxLayout()
        button1 = QPushButton("Push for Window 1")
        button1.clicked.connect(self.toggle_window1)
        l.addWidget(button1)

        button2 = QPushButton("Push for Window 2")
        button2.clicked.connect(self.toggle_window2)
        l.addWidget(button2)

        w = QWidget()
        w.setLayout(l)
        self.setCentralWidget(w)

    def toggle_window1(self, checked):
        if self.window1.isVisible():
            self.window1.hide()
        else:
            self.window1.show()

    def toggle_window2(self, checked):
        if self.window2.isVisible():
            self.window2.hide()
        else:
            self.window2.show()


app = QApplication(sys.argv)
w = MainWindow()
w.show()
app.exec_()
import sys
from random import randint

from PySide2.QtWidgets import (
    QApplication,
    QLabel,
    QMainWindow,
    QPushButton,
    QVBoxLayout,
    QWidget,
)


class AnotherWindow(QWidget):
    """
    This "window" is a QWidget. If it has no parent,
    it will appear as a free-floating window.
    """

    def __init__(self):
        super().__init__()
        layout = QVBoxLayout()
        self.label = QLabel("Another Window % d" % randint(0, 100))
        layout.addWidget(self.label)
        self.setLayout(layout)


class MainWindow(QMainWindow):
    def __init__(self):
        super().__init__()
        self.window1 = AnotherWindow()
        self.window2 = AnotherWindow()

        l = QVBoxLayout()
        button1 = QPushButton("Push for Window 1")
        button1.clicked.connect(self.toggle_window1)
        l.addWidget(button1)

        button2 = QPushButton("Push for Window 2")
        button2.clicked.connect(self.toggle_window2)
        l.addWidget(button2)

        w = QWidget()
        w.setLayout(l)
        self.setCentralWidget(w)

    def toggle_window1(self, checked):
        if self.window1.isVisible():
            self.window1.hide()
        else:
            self.window1.show()

    def toggle_window2(self, checked):
        if self.window2.isVisible():
            self.window2.hide()
        else:
            self.window2.show()


app = QApplication(sys.argv)
w = MainWindow()
w.show()
app.exec_()
However, you can also create a generic method which handles toggling for all windows -- see transmitting extra data with Qt signals for a detailed explanation of how this works. The example below shows that in action, using a lambda function to intercept the signal from each button and pass through the appropriate window. We can also discard the checked value since we aren't using it.
- PyQt5
- PySide2
import sys
from random import randint

from PyQt5.QtWidgets import (
    QApplication,
    QLabel,
    QMainWindow,
    QPushButton,
    QVBoxLayout,
    QWidget,
)


class AnotherWindow(QWidget):
    """
    This "window" is a QWidget. If it has no parent,
    it will appear as a free-floating window.
    """

    def __init__(self):
        super().__init__()
        layout = QVBoxLayout()
        self.label = QLabel("Another Window % d" % randint(0, 100))
        layout.addWidget(self.label)
        self.setLayout(layout)


class MainWindow(QMainWindow):
    def __init__(self):
        super().__init__()
        self.window1 = AnotherWindow()
        self.window2 = AnotherWindow()

        l = QVBoxLayout()
        button1 = QPushButton("Push for Window 1")
        button1.clicked.connect(
            lambda checked: self.toggle_window(self.window1)
        )
        l.addWidget(button1)

        button2 = QPushButton("Push for Window 2")
        button2.clicked.connect(
            lambda checked: self.toggle_window(self.window2)
        )
        l.addWidget(button2)

        w = QWidget()
        w.setLayout(l)
        self.setCentralWidget(w)

    def toggle_window(self, window):
        if window.isVisible():
            window.hide()
        else:
            window.show()


app = QApplication(sys.argv)
w = MainWindow()
w.show()
app.exec_()
import sys
from random import randint

from PySide2.QtWidgets import (
    QApplication,
    QLabel,
    QMainWindow,
    QPushButton,
    QVBoxLayout,
    QWidget,
)


class AnotherWindow(QWidget):
    """
    This "window" is a QWidget. If it has no parent,
    it will appear as a free-floating window.
    """

    def __init__(self):
        super().__init__()
        layout = QVBoxLayout()
        self.label = QLabel("Another Window % d" % randint(0, 100))
        layout.addWidget(self.label)
        self.setLayout(layout)


class MainWindow(QMainWindow):
    def __init__(self):
        super().__init__()
        self.window1 = AnotherWindow()
        self.window2 = AnotherWindow()

        l = QVBoxLayout()
        button1 = QPushButton("Push for Window 1")
        button1.clicked.connect(
            lambda checked: self.toggle_window(self.window1)
        )
        l.addWidget(button1)

        button2 = QPushButton("Push for Window 2")
        button2.clicked.connect(
            lambda checked: self.toggle_window(self.window2)
        )
        l.addWidget(button2)

        w = QWidget()
        w.setLayout(l)
        self.setCentralWidget(w)

    def toggle_window(self, window):
        if window.isVisible():
            window.hide()
        else:
            window.show()


app = QApplication(sys.argv)
w = MainWindow()
w.show()
app.exec_()
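If the number of windows grows, the same lambda approach can be applied in a loop, with one caveat: a lambda looks up loop variables when it is called, not when it is created, so the current window needs to be bound explicitly, for example as a default argument. The following is a minimal sketch of that idea in plain Python; FakeSignal is a hypothetical stand-in for a button's clicked signal, and the strings stand in for AnotherWindow instances.

```python
class FakeSignal:
    """Hypothetical stand-in for a Qt signal such as QPushButton.clicked."""

    def __init__(self):
        self._slots = []

    def connect(self, slot):
        self._slots.append(slot)

    def emit(self, *args):
        for slot in self._slots:
            slot(*args)


toggled = []  # records which window each simulated click asked to toggle


def toggle_window(window):
    toggled.append(window)


windows = ["window1", "window2"]  # placeholders for AnotherWindow instances
signals = []

for window in windows:
    clicked = FakeSignal()
    # Bind the current window as a default argument. A bare
    # `lambda checked: toggle_window(window)` would look up `window`
    # at call time and always see the last item of the loop.
    clicked.connect(lambda checked, window=window: toggle_window(window))
    signals.append(clicked)

for clicked in signals:
    clicked.emit(False)  # simulate a click, passing the `checked` value

print(toggled)
```

Keeping checked as the first lambda parameter matters here: the signal passes it positionally, so it must not land in the slot reserved for the bound window.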
from Planet Python
via read more
PSF GSoC students blogs: Week 8
Just a brief check-in. Setting up a MySQL environment for manual testing on a mac mini and then moving on to tickets #362 and #363.
from Planet Python
via read more
Thursday, July 30, 2020
Python Insider: Upgrade to pip 20.2, plus, changes coming in 20.3
The highlights for this release are:
- The beta of the next-generation dependency resolver is available -- please test
- Faster installations from wheel files
- Improved handling of wheels containing non-ASCII file contents
- Faster pip list using parallelized network operations
- Installed packages now contain metadata about whether they were directly requested by the user (PEP 376’s REQUESTED file)
The new dependency resolver is off by default because it is in beta and not yet ready for everyday use. The new dependency resolver is significantly stricter and more consistent when it receives incompatible instructions, and reduces support for certain kinds of constraints files, so some workarounds and workflows may break. Please test it with the --use-feature=2020-resolver flag. Please see our guide on how to test and migrate, how to report issues, and context for the change.
Thanks to all who tested the alpha of the new resolver in pip 20.1 for feedback that helped us get it to the beta stage.
We are preparing to change the default dependency resolution behavior and make the new resolver the default in pip 20.3 (in October 2020).
This release also partially optimizes pip’s network usage during installation (as part of a Google Summer of Code project by McSinyx). Please test it with pip install --use-feature=2020-resolver --use-feature=fast-deps and report bugs to the issue tracker. This functionality is still experimental and not ready for everyday use.
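For scripted testing, the flags above can be assembled from Python. This is a rough sketch rather than anything from the pip project: build_pip_install_command is a hypothetical helper, "requests" is just a placeholder package name, and the flags are the pip 20.2 ones quoted above.

```python
import sys


def build_pip_install_command(packages, fast_deps=False):
    """Return an argv list for `pip install` with the beta resolver enabled.

    Hypothetical helper for illustration; it only builds the command and
    does not run it.
    """
    command = [
        sys.executable, "-m", "pip", "install",
        "--use-feature=2020-resolver",
    ]
    if fast_deps:
        # Experimental network optimisation mentioned in the release notes.
        command.append("--use-feature=fast-deps")
    command.extend(packages)
    return command


command = build_pip_install_command(["requests"], fast_deps=True)
print(" ".join(command[1:]))
```

Passing the resulting list to subprocess.run(command, check=True) would perform the install with the interpreter that is running the script.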
You can find more details (including deprecations and removals) in the changelog.
As with all pip releases, a significant amount of the work was contributed by pip’s user community. Huge thanks to all who have contributed, whether through code, documentation, issue reports and/or discussion. Your help keeps pip improving, and is hugely appreciated. Specific thanks go to Mozilla (through its Mozilla Open Source Support Awards) and to the Chan Zuckerberg Initiative DAF, an advised fund of Silicon Valley Community Foundation, for their funding that enabled substantial work on the new resolver.
from Planet Python
via read more
Matt Layman: Docs, Bugs, and Reports - Building SaaS #66
from Planet Python
via read more
Paolo Amoroso: Reading Impractical Python Projects
I got a recent book that brought back that fascination and excitement with programming, Impractical Python Projects: Playful Programming Activities to Make You Smarter by Lee Vaughan.
The cover of the book Impractical Python Projects in the Google Play Books app on my Pixel 2 XL phone.
The book is not a Python tutorial or guide. Instead, it presents stimulating coding projects for non-programmers who want to use Python to do experiments, test theories, or simulate natural phenomena. This includes professionals who are not software developers but use programming to solve problems in science and engineering.
Exploring and understanding the problem domain is an integral part of the book’s projects along with the coding. This is unlike typical programming books where the examples are often trivial, have little or no domain depth, and are stripped of everything but the essentials.
The science and engineering Impractical Python Projects covers include some great ones that match my interest in astronomy and space such as estimating alien civilizations with the Fermi Paradox, simulating a volcano on Jupiter’s moon Io, simulating orbital maneuvers, and stacking planetary images.
The sample code is straightforward and clear. Since the book is not a language tutorial, it focuses on prototyping and exploration rather than building large and maintainable systems.
This book alone is worth the price of the Humble Bundle of No Starch Press Python programming books I purchased it with.
This post by Paolo Amoroso was published on Moonshots Beyond the Cloud.
from Planet Python
via read more
Janusworx: A Hundred Days of Code, Day 022 - Getting into the Groove
Did the same time as yesterday.
Only about an hour.
Was much more productive though.
Getting the hang of how to sit and program and work through things I do not know.
Gaining a bit of experience with the workflow now.
I have the basics in hand. I know what I want to look up.
So check problem, work a bit, look up, try, fail, repeat, gain incremental success, work some more.
Love the immediate feedback loop.
With other stuff I try, I have to wait days, weeks, months.
Here, it’s immediate.
Beginning to love the work, as I get more familiar with it.
Tomorrow is another day :)
from Planet Python
via read more
Wednesday, July 29, 2020
PSF GSoC students blogs: Weekly Check In - 8
What did I do till now?
Last week I added tests for H2Agent and H2DownloaderHandler
What's coming up next?
Next week I plan to continue working on ScrapyTunnelingH2Agent.
Did I get stuck anywhere?
Yes. I got stuck for a long time while setting up the testing environment of H2DownloaderHandler. The problem was a bit of a weird one: until now Scrapy was using Twisted's WrappingFactory class to wrap the Site instance, which allows only up to HTTP/1.1 (for unknown reasons), and it took me a long time to realize this. After removing the WrappingFactory, the test environment was set up as required. Apart from this, another hurdle I'm still facing is the CONNECT protocol in HTTP/2.0; I couldn't really find many blogs/articles on this to get a better idea. I plan to look at some open-source libraries' implementations of HTTP/2.0 CONNECT now.
from Planet Python
via read more
PSF GSoC students blogs: Weekly Check-in #9
What did I do this week?
I added support for Immediate response in the HTTP server. I also
added a new command-line option, so that we can run dataflow without
the need for Sources.
What's next?
I'll be adding tests for the same and updating the documentation to use the new features.
Did I get stuck somewhere?
No.
from Planet Python
via read more
Israel Fruchter: How much fun was EuroPython 2020
#pyconil got canceled
This year I finally mustered enough courage and will, and made 2 submissions to #pyconil. COVID-19 had other plans, and #pyconil was canceled.
I told @ultrabug (Alexys Jacob, Numberly CTO) about this, and a few weeks later he surprised me by telling me he was going to present scylla-driver, the shard-aware driver we had been working on for the last 6 months, at europython2020.
At the time it was neither ready nor published. (I also found out that Numberly has been sponsoring europython for years now.) It took me a few seconds to figure out that he had just set me a deadline without my consent…
Fast-forward a bit, and the initial scylla-driver release came out: https://ift.tt/2P5275s
And my tickets for europython2020 were booked…
Discord is fun
A few days before the conference, I got an email with instructions for connecting to the europython2020 Discord. Since it was my first COVID-19 online conference, I registered and logged in straight away.
It was really nice to start talking with people and meet them a few days in advance.
Each track got its own chat room, backed by a Zoom webinar and a live YouTube stream. Each talk got its own chat room too, so speakers could promote their talks and answer questions before and after them. Sponsors had rooms as well, and so did all the sprints (we’ll get to that later on).
Talk about timing: the day before europython2020, this press release came out: https://ift.tt/2ZNsW4u
Talks — Day 1
The opening keynote was cancelled because of technical difficulties: the organizers lost touch with the person who was supposed to give it. (It was rescheduled to the next day.)
- Elias Mistler talked about “How to write multi-paradigm code”, showcasing and live-coding a sudoku solver with functional and object-oriented approaches, and showing how to mix the two. The code is available as a notebook.
- Bojan Miletic talked about “Django Testing on Steroid: pytest + Hypothesis”, showing the basic usage of Hypothesis. I keep hearing about Hypothesis all over the place; maybe this will bring me one step closer to using it in one of my projects.
- Martin Christen gave a very cool demo of “pyRT - Computer Graphics in Jupyter Notebooks for Fun and Teaching”. I love Jupyter notebook visual tricks, and I love 3D, so I enjoyed this one a lot. Martin was quite a performer, and funny. I strongly recommend trying out the demo.
- Bernat Gabor gave an excellent talk, “Lessons from the Trenches: rewriting and re-releasing virtualenv”, about a big refactoring that virtualenv went through. Packaging is a subject close to my heart. This is an excellent lesson on how to approach a major refactor, and a reminder that it will always take more time than you estimate. I must say that, from the POV of a heavy virtualenv user, this change wasn’t even noticeable. When he was demoing a few things I realized that I’d seen them in logs, but didn’t know the big story happening behind the scenes.
- Next, Philip Jones built an ASGI server from scratch, showing the details of the ASGI protocol, in “An ASGI Server from scratch”.
- “A deep dive and comparison of Python drivers for Cassandra and Scylla”: Alexys’s great slides (which I had a peek at a few days before), with lots of emojis, went down into the details of how the Cassandra/Scylla hash ring works, and how Scylla’s shared-nothing shard architecture is used by the scylla-driver fork we have worked on for the last few months. He finished with data from Numberly production before and after moving to the shard-aware client: a 2x-2.5x speed-up in response time. I was live tweeting the whole thing into our company Slack…
- “Elegant Exception Handling”
- “Best practices for production-ready Docker packaging”
- “Pluggable Architecture”
- “Writing Zenlike Python”
Social - the missing e in my whisky
Throughout the day I was playing cat and mouse with people, trying to bring them into the open track for a face-to-face meeting. Only later, after all the sessions had ended, did some people come in, most of them part of the organizing team. Everyone was showing off the beer or whisky they were drinking. Keith Gaughan laughed at the ice in my whisky, and also taught me that a real whiskey is spelled with an e in it.
Marc-Andre Lemburg talked about the challenges they face as organizers. One thing led to another, and I asked how I could help. He said they needed a hand with handling the Twitter account, and that tomorrow he’d hook me up with whoever handles it.
After a few more rounds of whisky, I called it a day.
Talks — Day 2
TODO:
Sprints
- packaging -
- hypothesis -
- terminusdb-client -
Codementor: Face Mask Detection using Yolo V3
Real Python: Namespaces and Scope in Python
This tutorial covers Python namespaces, the structures used to organize the symbolic names assigned to objects in a Python program.
The previous tutorials in this series have emphasized the importance of objects in Python. Objects are everywhere! Virtually everything that your Python program creates or acts on is an object.
An assignment statement creates a symbolic name that you can use to reference an object. The statement x = 'foo' creates a symbolic name x that refers to the string object 'foo'.
In a program of any complexity, you’ll create hundreds or thousands of such names, each pointing to a specific object. How does Python keep track of all these names so that they don’t interfere with one another?
In this tutorial, you’ll learn:
- How Python organizes symbolic names and objects in namespaces
- When Python creates a new namespace
- How namespaces are implemented
- How variable scope determines symbolic name visibility
Namespaces in Python
A namespace is a collection of currently defined symbolic names along with information about the object that each name references. You can think of a namespace as a dictionary in which the keys are the object names and the values are the objects themselves. Each key-value pair maps a name to its corresponding object.
Namespaces are one honking great idea—let’s do more of those!
— The Zen of Python, by Tim Peters
As Tim Peters suggests, namespaces aren’t just great. They’re honking great, and Python uses them extensively. In a Python program, there are four types of namespaces:
- Built-In
- Global
- Enclosing
- Local
These have differing lifetimes. As Python executes a program, it creates namespaces as necessary and deletes them when they’re no longer needed. Typically, many namespaces will exist at any given time.
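To make the dictionary analogy concrete, here is a minimal sketch rather than anything from the tutorial itself. It runs an assignment inside a fresh namespace supplied as a plain dict. The names namespace and user_names are illustrative choices.

```python
# A namespace behaves like a dictionary that maps names to objects.
namespace = {}

# Run an assignment with `namespace` acting as the global namespace.
exec("x = 'foo'", namespace)

# The name 'x' is now a key, and its value is the string object it refers to.
# (exec() also adds a '__builtins__' entry, which is filtered out here.)
user_names = [name for name in namespace if not name.startswith('__')]

print(user_names)        # ['x']
print(namespace['x'])    # foo
```

The dict passed to exec() plays the same role that the global namespace plays for a main program: each assignment adds a key-value pair to it.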
The Built-In Namespace
The built-in namespace contains the names of all of Python’s built-in objects. These are available at all times when Python is running. You can list the objects in the built-in namespace with the following command:
>>> dir(__builtins__)
['ArithmeticError', 'AssertionError', 'AttributeError',
'BaseException','BlockingIOError', 'BrokenPipeError', 'BufferError',
'BytesWarning', 'ChildProcessError', 'ConnectionAbortedError',
'ConnectionError', 'ConnectionRefusedError', 'ConnectionResetError',
'DeprecationWarning', 'EOFError', 'Ellipsis', 'EnvironmentError',
'Exception', 'False', 'FileExistsError', 'FileNotFoundError',
'FloatingPointError', 'FutureWarning', 'GeneratorExit', 'IOError',
'ImportError', 'ImportWarning', 'IndentationError', 'IndexError',
'InterruptedError', 'IsADirectoryError', 'KeyError', 'KeyboardInterrupt',
'LookupError', 'MemoryError', 'ModuleNotFoundError', 'NameError', 'None',
'NotADirectoryError', 'NotImplemented', 'NotImplementedError', 'OSError',
'OverflowError', 'PendingDeprecationWarning', 'PermissionError',
'ProcessLookupError', 'RecursionError', 'ReferenceError', 'ResourceWarning',
'RuntimeError', 'RuntimeWarning', 'StopAsyncIteration', 'StopIteration',
'SyntaxError', 'SyntaxWarning', 'SystemError', 'SystemExit', 'TabError',
'TimeoutError', 'True', 'TypeError', 'UnboundLocalError',
'UnicodeDecodeError', 'UnicodeEncodeError', 'UnicodeError',
'UnicodeTranslateError', 'UnicodeWarning', 'UserWarning', 'ValueError',
'Warning', 'ZeroDivisionError', '_', '__build_class__', '__debug__',
'__doc__', '__import__', '__loader__', '__name__', '__package__',
'__spec__', 'abs', 'all', 'any', 'ascii', 'bin', 'bool', 'bytearray',
'bytes', 'callable', 'chr', 'classmethod', 'compile', 'complex',
'copyright', 'credits', 'delattr', 'dict', 'dir', 'divmod', 'enumerate',
'eval', 'exec', 'exit', 'filter', 'float', 'format', 'frozenset',
'getattr', 'globals', 'hasattr', 'hash', 'help', 'hex', 'id', 'input',
'int', 'isinstance', 'issubclass', 'iter', 'len', 'license', 'list',
'locals', 'map', 'max', 'memoryview', 'min', 'next', 'object', 'oct',
'open', 'ord', 'pow', 'print', 'property', 'quit', 'range', 'repr',
'reversed', 'round', 'set', 'setattr', 'slice', 'sorted', 'staticmethod',
'str', 'sum', 'super', 'tuple', 'type', 'vars', 'zip']
You’ll see some objects here that you may recognize from previous tutorials, such as the StopIteration exception, built-in functions like max() and len(), and object types like int and str.
The Python interpreter creates the built-in namespace when it starts up. This namespace remains in existence until the interpreter terminates.
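As a small check, and only as a sketch, you can reach the same namespace through the standard-library builtins module. This is often a safer handle than __builtins__, which is a CPython implementation detail and may be either a module or a dict depending on where the code runs. The variable names below are illustrative.

```python
import builtins

# dir(builtins) lists the names in the built-in namespace.
builtin_names = dir(builtins)

has_len = 'len' in builtin_names
has_stop_iteration = 'StopIteration' in builtin_names

# An unqualified name such as max resolves to the object stored there.
max_is_builtin = max is builtins.max

print(has_len, has_stop_iteration, max_is_builtin)  # True True True
```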
The Global Namespace
The global namespace contains any names defined at the level of the main program. Python creates the global namespace when the main program body starts, and it remains in existence until the interpreter terminates.
Strictly speaking, this may not be the only global namespace that exists. The interpreter also creates a global namespace for any module that your program loads with the import statement. For further reading on main functions and modules in Python, see these resources:
- Defining Main Functions in Python
- Python Modules and Packages—An Introduction
- Course: Python Modules and Packages
You’ll explore modules in more detail in a future tutorial in this series. For the moment, when you see the term global namespace, think of the one belonging to the main program.
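The point that every imported module gets its own global namespace can be checked with a short sketch. It uses the standard math module purely as an example, and the variable names are illustrative.

```python
import math

# vars(module) returns the dictionary backing that module's global namespace.
math_namespace = vars(math)

pi_defined_in_math = 'pi' in math_namespace
same_object = math_namespace['pi'] is math.pi

# A separate global namespace that imports math gains the name 'math',
# but math's own names (such as 'pi') do not leak into it.
other_namespace = {}
exec("import math", other_namespace)
pi_leaked = 'pi' in other_namespace

print(pi_defined_in_math, same_object, pi_leaked)  # True True False
```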
The Local and Enclosing Namespaces
As you learned in the previous tutorial on functions, the interpreter creates a new namespace whenever a function executes. That namespace is local to the function and remains in existence until the function terminates.
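A minimal sketch of that lifetime, using a hypothetical function greet() and a hypothetical local name local_greeting: the name exists while the function runs and is no longer reachable once it returns.

```python
def greet():
    local_greeting = 'hello'      # this name lives in greet()'s local namespace
    return dict(locals())         # copy the local namespace before it goes away

snapshot = greet()
print(snapshot)                   # {'local_greeting': 'hello'}

# Once greet() has returned, its local name is not visible to the caller.
try:
    local_greeting
    name_survived = True
except NameError:
    name_survived = False

print(name_survived)              # False
```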
Read the full article at https://realpython.com/python-namespaces-scope/ »
Janusworx: A Hundred Days of Code, Day 021 - Swing and a miss
Only did about an hour of distracted work and exercises today.
I’ll still count it though.
Tomorrow is another day :)
Django Weblog: Django Developers Community Survey 2020
We're conducting a seventeen-question survey to assess how the community feels about the current Django development process. This was last done in 2015.
Please take a few minutes to complete the 2020 survey. Your feedback will help guide future efforts.
PyCharm: PyCharm 2020.2 Out Now!
Complete the full Pull Request workflow, quickly catch exceptions, and apply project-wide refactorings. All without leaving your IDE. Download the new version now, or upgrade from within PyCharm.
New in PyCharm
- New pull request dedicated view: You no longer need to switch between the browser and your IDE to manage your GitHub Pull Request workflow. Do it all in PyCharm!
- Smart in-editor exceptions preview: Don’t spend time browsing your code after exceptions. PyCharm now automatically finds it for you and displays a preview of the problem directly in your editor.
- In-place signature-change refactoring: Simply add, remove, or edit your method signature in-place and use context actions (Alt+Enter) or the new gutter-icon to preview the changes and apply the refactoring.
- Support for Django configuration constants completion in settings.py: Stop typing the same Django configuration variables in settings.py over and over again. Speed up your flow and let PyCharm autocomplete documented Django settings for you.
These and a lot more!
Read about them all on our What’s New page or check the release notes.
Stack Abuse: Deep Learning in Keras - Data Preprocessing
Introduction
Deep learning is one of the most interesting and promising areas of artificial intelligence (AI) and machine learning currently. With great advances in technology and algorithms in recent years, deep learning has opened the door to a new era of AI applications.
In many of these applications, deep learning algorithms have performed on par with human experts and sometimes surpassed them.
Python has become the go-to language for Machine Learning and many of the most popular and powerful deep learning libraries and frameworks like TensorFlow, Keras, and PyTorch are built in Python.
In this series, we'll be using Keras to perform Exploratory Data Analysis (EDA), Data Preprocessing and finally, build a Deep Learning Model and evaluate it.
If you haven't already, check out our first article - Deep Learning Models in Keras - Exploratory Data Analysis (EDA).
Data Preprocessing
In the preprocessing stage, we'll prepare the data to be fed to the Keras model. The first step is clearing the dataset of null values. Then, we'll use one-hot encoding to convert categorical variables to numerical variables. Neural Nets work with numerical data, not categorical.
We'll also split the data into a training and testing set. Finally, we'll standardize the data so that each variable has a mean of 0 and a standard deviation of 1. This standardization helps the model train better and converge more easily.
Dealing with Missing Values
Let's find out the number and percentage of missing values in each variable in the dataset:
missing_values = pd.DataFrame({
    'Column': df.columns.values,
    '# of missing values': df.isna().sum().values,
    '% of missing values': 100 * df.isna().sum().values / len(df),
})
missing_values = missing_values[missing_values['# of missing values'] > 0]
print(missing_values.sort_values(by='# of missing values',
                                 ascending=False
                                 ).reset_index(drop=True))
This code will produce the following table which shows us variables that contain missing values and how many missing values they contain:
 | Column | # of missing values | % of missing values |
0 | Pool QC | 2917 | 99.5563 |
1 | Misc Feature | 2824 | 96.3823 |
2 | Alley | 2732 | 93.2423 |
3 | Fence | 2358 | 80.4778 |
4 | Fireplace Qu | 1422 | 48.5324 |
5 | Lot Frontage | 490 | 16.7235 |
6 | Garage Cond | 159 | 5.42662 |
7 | Garage Qual | 159 | 5.42662 |
8 | Garage Finish | 159 | 5.42662 |
9 | Garage Yr Blt | 159 | 5.42662 |
10 | Garage Type | 157 | 5.35836 |
11 | Bsmt Exposure | 83 | 2.83276 |
12 | BsmtFin Type 2 | 81 | 2.76451 |
13 | BsmtFin Type 1 | 80 | 2.73038 |
14 | Bsmt Qual | 80 | 2.73038 |
15 | Bsmt Cond | 80 | 2.73038 |
16 | Mas Vnr Area | 23 | 0.784983 |
17 | Mas Vnr Type | 23 | 0.784983 |
18 | Bsmt Half Bath | 2 | 0.0682594 |
19 | Bsmt Full Bath | 2 | 0.0682594 |
20 | Total Bsmt SF | 1 | 0.0341297 |
Since the Pool QC, Misc Feature, Alley, Fence, and Fireplace Qu variables contain a high percentage of missing values as shown in the table, we will simply remove them as they probably won't affect the results much at all:
df.drop(['Pool QC', 'Misc Feature', 'Alley', 'Fence', 'Fireplace Qu'],
axis=1, inplace=True)
For other variables that contain missing values, we will replace these missing values depending on the data type of the variable: whether it is numerical or categorical.
If it is numerical, we will replace missing values with the variable mean. If it is categorical, we will replace the missing values with the variable mode. This removes the false bias that can be created with missing values in a neutral way.
To know which variables are numerical and which are categorical, we will print out 5 unique items for each of the variables that contain missing values using this code:
cols_with_missing_values = df.columns[df.isna().sum() > 0]
for col in cols_with_missing_values:
    print(col)
    print(df[col].unique()[:5])
    print('*'*30)
And we get the following results:
Lot Frontage
[141. 80. 81. 93. 74.]
******************************
Mas Vnr Type
['Stone' 'None' 'BrkFace' nan 'BrkCmn']
******************************
...
Let's replace the missing values in the numerical variables with the mean:
num_with_missing = ['Lot Frontage', 'Mas Vnr Area', 'BsmtFin SF 1', 'BsmtFin SF 2',
                    'Bsmt Unf SF', 'Total Bsmt SF', 'Bsmt Full Bath', 'Bsmt Half Bath',
                    'Garage Yr Blt', 'Garage Cars', 'Garage Area']
for n_col in num_with_missing:
    df[n_col] = df[n_col].fillna(df[n_col].mean())
Here, we just put them all in a list and assigned new values to them. Next, let's replace missing values for categorical variables:
cat_with_missing = [x for x in cols_with_missing_values if x not in num_with_missing]
for c_col in cat_with_missing:
    df[c_col] = df[c_col].fillna(df[c_col].mode().to_numpy()[0])
After this step, our dataset will have no missing values in it.
One-Hot Encoding of Categorical Variables
Keras models, like all machine learning models, fundamentally work with numerical data. Categorical data has no meaning to a computer, but it does to us. We need to convert these categorical variables into numerical representations in order for the dataset to be usable.
The technique that we will use to do that conversion is One-Hot Encoding. Pandas provides us with a simple way to automatically perform One-Hot encoding on all categorical variables in the data.
Before that though, we must ensure that no categorical variable in our data is represented as a numerical variable by accident.
Checking Variables Data Types
When we read a CSV dataset using Pandas as we did, Pandas automatically tries to determine the type of each variable in the dataset.
Sometimes, Pandas can determine this incorrectly - if a categorical variable is represented with numbers, it can wrongfully infer that it's a numerical variable.
Let's check if there are any data type discrepancies in the DataFrame:
data_types = pd.DataFrame({
    'Column': df.select_dtypes(exclude='object').columns.values,
    'Data type': df.select_dtypes(exclude='object').dtypes.values
})
print(data_types)
 | Column | Data type |
0 | MS SubClass | int64 |
1 | Lot Frontage | float64 |
2 | Lot Area | int64 |
3 | Overall Qual | int64 |
4 | Overall Cond | int64 |
5 | Year Built | int64 |
6 | Year Remod/Add | int64 |
Based on this table and the variables descriptions from Kaggle, we can notice which variables were falsely considered numerical by Pandas.
For example, MS SubClass was detected as a numerical variable with a data type of int64. However, based on the description of this variable, it specifies the type of the unit being sold.
If we take a look at the unique values of this variable:
df['MS SubClass'].unique().tolist()
We get this output:
[20, 60, 120, 50, 85, 160, 80, 30, 90, 190, 45, 70, 75, 40, 180, 150]
This variable represents different unit types as numbers like 20 (one story dwellings built in 1946 and newer), 60 (2 story dwellings built in 1946 and newer), etc.
This actually isn't a numerical variable but a categorical one. Let's convert it back into a categorical variable by reassigning it as a string:
df['MS SubClass'] = df['MS SubClass'].astype(str)
Performing One-Hot Encoding
Before performing One-Hot Encoding, we want to select a subset of the features from our data to use from now on. We'll want to do so because our dataset contains 2,930 records and 75 features.
Many of these features are categorical. So if we keep all the features and perform One-Hot Encoding, the resulting number of features will be large and the model might suffer from the curse of dimensionality as a result.
Let's make a list of the variables we want to keep in a subset and trim the DataFrame so we only use these:
selected_vars = ['MS SubClass', 'MS Zoning', 'Lot Frontage', 'Lot Area',
'Neighborhood', 'Overall Qual', 'Overall Cond',
'Year Built', 'Total Bsmt SF', '1st Flr SF', '2nd Flr SF',
'Gr Liv Area', 'Full Bath', 'Half Bath', 'Bedroom AbvGr',
'Kitchen AbvGr', 'TotRms AbvGrd', 'Garage Area',
'Pool Area', 'SalePrice']
df = df[selected_vars]
Now we can perform One-Hot Encoding easily by using Pandas' get_dummies() function:
df = pd.get_dummies(df)
After one-hot encoding, the dataset will have 67 variables. Here are the capped first few rows - there are many more variables than this:
 | Lot Frontage | Lot Area | Overall Qual | Overall Cond | Year Built | Total Bsmt SF | 1st Flr SF | 2nd Flr SF | Gr Liv Area |
0 | 141 | 31770 | 6 | 5 | 1960 | 1080 | 1656 | 0 | 1656 |
1 | 80 | 11622 | 5 | 6 | 1961 | 882 | 896 | 0 | 896 |
2 | 81 | 14267 | 6 | 6 | 1958 | 1329 | 1329 | 0 | 1329 |
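To see what get_dummies() does on a small scale, here's a minimal sketch on made-up data (the values below are illustrative and not taken from the housing dataset). Numerical columns pass through unchanged, while each category of a categorical column becomes its own indicator column:

```python
import pandas as pd

# Toy data: one numerical column and one categorical column (made-up values).
toy = pd.DataFrame({
    'Lot Area': [8450, 9600, 11250],
    'MS Zoning': ['RL', 'RM', 'RL'],
})

encoded = pd.get_dummies(toy)

# The categorical column is replaced by one indicator column per category.
print(encoded.columns.tolist())
# ['Lot Area', 'MS Zoning_RL', 'MS Zoning_RM']
```

This is why the number of columns grows after encoding: a categorical variable with k distinct values contributes k columns.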
Splitting Data into Training and Testing Sets
One of the last steps in data preprocessing is to split the data into training and testing subsets. We'll be training the model on the training subset, and evaluating it with an unseen test set.
We will split the data randomly so that the training set will have 80% of the data and the testing set will have 20% of the data. Typically, the training set holds 70-80% of the data, while the remaining 20-30% is held out for testing.
This is made really simple with Pandas' sample() and drop() functions:
train_df = df.sample(frac=0.8, random_state=9)
test_df = df.drop(train_df.index)
Now train_df holds our training data and test_df holds our testing data.
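As a quick sanity check of this split technique, here's a sketch on a toy DataFrame (the column name value and the variables train_part and test_part are illustrative). It confirms that the two parts have the expected sizes and share no rows:

```python
import pandas as pd

toy = pd.DataFrame({'value': range(10)})

train_part = toy.sample(frac=0.8, random_state=9)   # 80% of the rows, chosen at random
test_part = toy.drop(train_part.index)              # the rows that were not sampled

print(len(train_part), len(test_part))              # 8 2

# No index label appears in both parts.
no_overlap = train_part.index.intersection(test_part.index).empty
```

Note that dropping by index relies on the index labels being unique. With duplicate labels, drop() would remove every row sharing a sampled label.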
Next, we will store the target variable SalePrice separately for each of the training and testing sets:
train_labels = train_df.pop('SalePrice')
test_labels = test_df.pop('SalePrice')
We're removing the SalePrice value because, well, we want to predict it. There's no point predicting something we already know and have fed to the model. We'll be using the actual values to verify if our predictions are correct.
After this step, train_df will contain the predictor variables of our training data (i.e. all variables excluding the target variable), and train_labels will contain the target variable values for train_df. The same applies to test_df and test_labels.
We perform this operation to prepare for the next step of data scaling.
Note that Pandas' pop() function returns the specified column (in our case, SalePrice) from the dataframe (train_df, for example) and removes that column from the dataframe.
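Here's a small sketch of that behavior on made-up values (the numbers are illustrative, not rows from the dataset):

```python
import pandas as pd

toy = pd.DataFrame({'Lot Area': [8450, 9600], 'SalePrice': [208500, 181500]})

# pop() returns the column as a Series and removes it from the DataFrame in place.
labels = toy.pop('SalePrice')

print(labels.tolist())        # [208500, 181500]
print(toy.columns.tolist())   # ['Lot Area']
```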
At the end of this step, here are the number of records (rows) and features (columns) for each of train_df and test_df:
Set | Number of records | Number of features |
`train_df` | 2344 | 67 |
`test_df` | 586 | 67 |
Moreover, train_labels has 2,344 labels for the 2,344 records of train_df and test_labels has 586 labels for the 586 records in test_df.
Without preprocessing this data, we would have a much messier dataset to work with.
Data Scaling: Standardization
Finally, we will standardize each variable - except the target variable, of course - in our data.
For training data, which is now stored in train_df, we will calculate the mean and standard deviation of each variable. After that, we will subtract the mean from the values of each variable and then divide the resulting values by the standard deviation.
For testing data, we will subtract the training data mean from the values of each variable and then divide the resulting values by the training data standard deviation.
If you'd like to read up on Calculating Mean, Median and Mode in Python or Calculating Variance and Standard Deviation in Python, we've got you covered!
We use values calculated using training data because of the general principle: anything you learn must be learned from the model's training data. Everything from the test dataset will be completely unknown to the model before testing.
Let's perform the standardization now:
predictor_vars = train_df.columns
for col in predictor_vars:
    # Calculating variable mean and std from training data
    col_mean = train_df[col].mean()
    col_std = train_df[col].std()
    if col_std == 0:
        col_std = 1e-20
    train_df[col] = (train_df[col] - col_mean) / col_std
    test_df[col] = (test_df[col] - col_mean) / col_std
In this code, we first get the names of the predictor variables in our data. These names are the same for training and testing sets because these two sets contain the same variables but different data values.
Then for each predictor variable, we calculate the mean and standard deviation using the training data (train_df), subtract the calculated mean and divide by the calculated standard deviation.
Note that sometimes, the standard deviation is equal to 0 for some variables. In that case, we make the standard deviation equal to an extremely small amount because if we keep it equal to 0, dividing by it later would produce NaN or infinite values instead of usable numbers.
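This nets us standardized data: in the training set, each variable now has a mean of 0 and a standard deviation of 1. Standardization does not confine values to the range of -1 to 1, so individual values can fall outside it.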
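As a quick check on made-up numbers (not from the housing dataset), the sketch below applies the same formula, (value - training mean) / training standard deviation, to a toy training column and a toy testing column. The variable names are illustrative. It shows that the standardized training column ends up with a mean of 0 and a standard deviation of 1, and that individual values can still be larger than 1 in magnitude:

```python
import pandas as pd

train_col = pd.Series([1.0, 2.0, 3.0, 4.0, 10.0])
test_col = pd.Series([0.0, 5.0])

# Statistics come from the training data only.
col_mean = train_col.mean()
col_std = train_col.std()   # pandas computes the sample standard deviation (ddof=1)

train_scaled = (train_col - col_mean) / col_std
test_scaled = (test_col - col_mean) / col_std   # reuses the training statistics

print(round(train_scaled.mean(), 6), round(train_scaled.std(), 6))
print(train_scaled.max() > 1)
```

The testing column is scaled with the training mean and standard deviation, so its own mean and standard deviation will generally not be exactly 0 and 1.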
With that done, our dataset is ready to be used to train and evaluate a model. We'll be building a deep neural network in the next article.
Conclusion
Data preprocessing is a crucial step in a Machine Learning pipeline. Without dropping certain variables, dealing with missing values, encoding categorical values and standardization - we'd be feeding a messy (or impossible) dataset into a model.
The model will only be as good as the data we feed it and in this article - we've prepped a dataset to fit a model.
Andre Roberge: HackInScience: friendly Python learning
I learned about it via an issue filed for Friendly-traceback: yes, HackInScience does use Friendly-traceback to provide feedback to users when their code raises Python exceptions. These real-life experiences have resulted in additional cases being covered by Friendly-traceback: there are now 128 different test cases, each providing a more helpful explanation of what went wrong than the one offered by Python. Python versions 3.6 to 3.9 inclusive are supported.
Previously, I thought I would get feedback about missing cases from teachers or beginners using either Mu or Thonny - both of which can make use of Friendly-traceback. However, this has not been the case yet, and this makes me extremely grateful for the feedback received from HackInScience.
While Friendly-traceback can provide feedback in either English or French [1], HackInScience only uses the English version - this, in spite of the fact that it was created by four French programmers. I suspect that it is only a matter of time until they make a French version of their site.
One excellent additional feature provided by HackInScience is the addition of formatting (including some colour) in the output provided by Friendly-traceback.
The additional cases provided by Julien Palard from HackInScience have motivated me to clear out the accumulated backlog of test cases I had identified on my own. Now, there is only one (new) issue: enabling coloured output from Friendly-traceback's console.
Please, feel free to interrupt my work on this new issue by submitting new cases that are not covered by Friendly-traceback! ;-)
[1] Anyone interested in providing translations in other languages is definitely welcome!
TestDriven.io: Working with Static and Media Files in Django
This article looks at how to work with static and media files in a Django project, locally and in production. from Planet Python via read...