Wednesday, November 3, 2021

Stack Abuse: Python: Split String into List with split()

Data can take many shapes and forms, and it's often represented as strings.

Whether the data comes from a CSV file or from input text, we often split strings to obtain lists of features or elements.

In this guide, we'll take a look at how to split a string into a list in Python, with the split() method.

Split String into List in Python

The split() method of the string class is fairly straightforward. It splits the string, given a delimiter, and returns a list consisting of the elements split out from the string.

By default, the delimiter is whitespace - so if you omit the delimiter argument, your string will be split on runs of consecutive whitespace (spaces, tabs and newlines), and any leading or trailing whitespace is ignored.
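One detail worth seeing in code: calling split() with no argument is not quite the same as calling split(' '). The following is a minimal sketch using an illustrative input string (not from the original example) that mixes double spaces, a tab and a trailing newline:

```python
# Illustrative input: two spaces, a tab and a trailing newline
text = "Age  University\tName\n"

# No argument: runs of any whitespace act as a single delimiter
default_parts = text.split()
print(default_parts)  # ['Age', 'University', 'Name']

# Explicit ' ': every single space is a delimiter, tabs and newlines are not
space_parts = text.split(' ')
print(space_parts)    # ['Age', '', 'University\tName\n']
```

With an explicit delimiter, consecutive delimiters produce empty strings, which is usually what you want for CSV-like data where an empty field is meaningful.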

Let's take a look at the behavior of the split() method with an explicit delimiter:

string = "Age,University,Name,Grades"

lst = string.split(',')

print(lst)
print('Element types:', type(lst[0]))
print('Length:', len(lst))

Our string had elements delimited with a comma, as in a CSV (comma-separated values) file, so we've set the delimiter appropriately.

This results in a list of elements of type str, no matter what other type they can represent:

['Age', 'University', 'Name', 'Grades']
Element types: <class 'str'>
Length: 4

Split String into List, Trim Whitespaces and Change Capitalization

Not all input strings are clean - so you won't always have a perfectly formatted string to split. Sometimes, strings contain whitespace that shouldn't be in the "final product", or mix uppercase and lowercase letters inconsistently.

Thankfully, it's pretty easy to process this list and each element in it, after you've split it:

# Contains whitespaces after commas, which will stay after splitting
string = "age, uNiVeRsItY, naMe, gRaDeS"
lst = string.split(',')

print(lst)

This results in:

['age', ' uNiVeRsItY', ' naMe', ' gRaDeS']

No good! Every element after the first starts with a space, and the elements aren't properly capitalized at all. Applying a function to each element of a list can easily be done with a for loop or a list comprehension, so we'll want to apply strip() (to get rid of the surrounding whitespace) and a capitalization function. Note that Python strings have no trim() method; strip() is the equivalent.

Since we're not only looking to capitalize the first letter but also keep the rest lowercase (to enforce conformity), let's define a helper function for that:

def capitalize_word(string):
    return string[:1].capitalize() + string[1:].lower()

The function takes a string, slices off its first character and capitalizes it. The rest of the string is converted to lowercase, and the two pieces are then concatenated.

We can now apply this function, together with strip(), to each element using list comprehensions:

string = "age, uNiVeRsItY, naMe, gRaDeS"

lst = string.split(',')
lst = [s.strip() for s in lst]
lst = [capitalize_word(s) for s in lst]

print(lst)
print('Element types:', type(lst[0]))
print('Length:', len(lst))

This results in a clean list:

['Age', 'University', 'Name', 'Grades']
Element types: <class 'str'>
Length: 4
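As an alternative sketch, the built-in str.capitalize() method already uppercases the first character and lowercases the rest, so the capitalize_word() helper may not be needed. Assuming the same input as above, the whole cleanup can be written as a single list comprehension:

```python
string = "age, uNiVeRsItY, naMe, gRaDeS"

# strip() removes surrounding whitespace; capitalize() uppercases the first
# character and lowercases all the others
lst = [s.strip().capitalize() for s in string.split(',')]

print(lst)  # ['Age', 'University', 'Name', 'Grades']
```

Chaining the calls avoids building two intermediate lists, though the step-by-step version above can be easier to debug.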

Split String into List and Convert to Integer

What happens if you're working with a string-represented list of integers? After splitting, you won't be able to perform integer operations on the elements, since they're still strings.

Thankfully, we can use the same kind of list comprehension as before to convert the elements into integers:

string = "1,2,3,4"

lst = string.split(',')
lst = [int(s) for s in lst]

print(lst)
print('Element types:', type(lst[0]))
print('Length:', len(lst))

Which now results in:

[1, 2, 3, 4]
Element types: <class 'int'>
Length: 4
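Real input is rarely this tidy. int() tolerates surrounding whitespace, but raises a ValueError for empty or non-numeric strings. The sketch below assumes you want to skip empty pieces (for example, from a trailing comma) and let invalid values fail loudly; parse_ints() is a hypothetical helper name, not part of the original example:

```python
def parse_ints(string, delimiter=','):
    """Hypothetical helper: split on the delimiter and convert each piece to int."""
    result = []
    for piece in string.split(delimiter):
        piece = piece.strip()
        if not piece:
            # Skip empty pieces, e.g. the one produced by a trailing delimiter
            continue
        result.append(int(piece))  # raises ValueError for non-numeric text
    return result

print(parse_ints("1, 2,3 ,4,"))  # [1, 2, 3, 4]

try:
    parse_ints("1,two,3")
except ValueError as error:
    print('Could not convert:', error)
```

Whether to skip, replace or reject bad values depends on your data, so treat the skipping rule here as one possible choice.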

Split String into List with Limiter

Besides the delimiter, the split() method accepts a limiter, named maxsplit - the maximum number of splits that should occur.

It's an integer and is passed after the delimiter:

string = "Age, University, Name, Grades"

lst = string.split(',', 2)
print(lst)

Here, two splits occur, on the first and second comma, and no splits happen after that:

['Age', ' University', ' Name, Grades']
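Two related variations, sketched here with the same input: the limit can be passed by its keyword name, maxsplit, and rsplit() applies the same limit counting from the right-hand side of the string:

```python
string = "Age, University, Name, Grades"

# maxsplit=1 yields exactly two pieces here, which can be unpacked directly
first, rest = string.split(',', maxsplit=1)
print(first)  # Age
print(rest)   # the remainder, still starting with a space

# rsplit() performs the splits starting from the end of the string
right = string.rsplit(',', 1)
print(right)  # ['Age, University, Name', ' Grades']
```

Unpacking like this only works when the delimiter is guaranteed to appear; otherwise split() returns a single-element list and the assignment raises a ValueError.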

Conclusion

In this short guide, you've learned how to split a string into a list in Python.

You've also learned how to trim the whitespaces and fix capitalization as a simple processing step alongside splitting a string into a list.



from Planet Python
