Sunday, February 17, 2019

Reuven Lerner: Python’s str.isdigit vs. str.isnumeric

Let’s say that I want to write some Python code that invites the user to enter a number, and then prints that number, tripled. We could say:

>>> n = input("Enter a number: ")
>>> print(f"{n} * 3 = {n*3}")

The good news is that this code works just fine. The bad news is that it probably doesn’t do what you might expect. If I run this program, I’ll see:

Enter a number: 5
5 * 3 = 555

The reason for this output is that the “input” function always returns a string. So sure, we asked the user for a number, but we got the string ‘5’, rather than the integer 5. The ‘555’ output is thanks to the fact that you can multiply strings in Python by integers, getting a longer string back. So ‘a’ * 5 will give us ‘aaaaa’.

Of course, we can always create an integer from a string by applying the “int” class to the user’s input:

>>> n = input("Enter a number: ")
>>> n = int(n)
>>> print(f"{n} * 3 = {n*3}")

Sure enough, we get the following output:

Enter a number: 5
5 * 3 = 15

Great, right? But what happens if the user gives us something that’s no longer numeric? The program will blow up:

Enter a number: abcd

ValueError: invalid literal for int() with base 10: 'abcd'

Clearly, we want to avoid this problem. You could make a good argument that in this case, it’s probably best to run the conversion inside of a “try” block, and trap any exception that we might get.

But there’s another way to test this, one which I use with my into Python classes, before we’ve covered exceptions: Strings have a great method called “isdigit” that we can run, to tell us whether a string contains only digits (0-9), or if it contains something else. For example:

>>> '1234'.isdigit()
True

>>> '1234 '.isdigit() # space at the end
False

>>> '1234a'.isdigit() # letter at the end
False

>>> 'a1234'.isdigit() # letter at the start
False

>>> '12.34'.isdigit() # decimal point
False

>>> ''.isdigit() # empty string
False

If you know regular expressions, then you can see that str.isdigit returns True for ‘^\d+$’. Which can be very useful, as we can see here:

>>> n = input("Enter a number: ")
>>> if n.isdigit():
n = int(n)
print(f"{n} * 3 = {n*3}")

But wait: Python also includes another method, str.isnumeric. And it’s not at all obvious, at least at first, what the difference is between them, because they would seem to give the same results:

>>> n = input("Enter a number: ")
>>> if n.numeric():
n = int(n)
print(f"{n} * 3 = {n*3}")

So, what’s the difference? It’s actually pretty straightforward, but took some time for me to find out: Bascially, str.isdigit only returns True for what I said before, strings containing solely the digits 0-9.

By contrast, str.isnumeric returns True if it contains any numeric characters. When I first read this, I figured that it would mean decimal points and minus signs — but no! It’s just the digits 0-9, plus any character from another language that’s used in place of digits.

For example, we’re used to writing numbers with Arabic numerals. But there are other languages that traditionally use other characters. For example, in Chinese, we count 1, 2, 3, 4, 5 as 一,二,三,四, 五. It turns out that the Chinese characters for numbers will return False for str.isdigit, but True for str.isnumeric, behaving differently from their 0-9 counterparts:

>>> '12345'.isdigit()
True

>>> '12345'.isnumeric()
True

>>> '一二三四五'.isdigit()
False

>>> '一二三四五'.isnumeric()
True

So, which should you use? For most people, “isdigit” is probably a better choice, simply because it’s more clearly what you likely want. Of course, if you want to accept other types of numerals and numeric characters, then “isnumeric” is better. But if you’re interested in turning strings into integers, then you’re probably safer using “isdigit”, just in case someone tries to enter something else:

>>> int('二')
ValueError: invalid literal for int() with base 10: '二'

I tried to determine whether there was any difference in speed between the two methods, just in case, but after numerous tests with “%timeit” in Jupyter, I found that I was getting roughly the same speeds from both methods.

If you’re like me, and ever wondered how a language that claims to have “one obvious way to do it” can have two different, seemingly identical methods… well, now you know!

The post Python’s str.isdigit vs. str.isnumeric appeared first on Lerner Consulting Blog.



from Planet Python
via read more

No comments:

Post a Comment

TestDriven.io: Working with Static and Media Files in Django

This article looks at how to work with static and media files in a Django project, locally and in production. from Planet Python via read...