Modern Python Cookbook

You're reading from   Modern Python Cookbook The latest in modern Python recipes for the busy modern programmer

Product type Paperback
Published in Nov 2016
Publisher Packt
ISBN-13 9781786469250
Length 692 pages
Edition 1st Edition
Table of Contents (12 chapters)

Preface 1. Numbers, Strings, and Tuples 2. Statements and Syntax 3. Function Definitions 4. Built-in Data Structures – list, set, dict 5. User Inputs and Outputs 6. Basics of Classes and Objects 7. More Advanced Class Design 8. Input/Output, Physical Format, and Logical Layout 9. Testing 10. Web Services 11. Application Integration

Rewriting an immutable string

How can we rewrite an immutable string? We can't change individual characters inside a string:

    >>> title = "Recipe 5: Rewriting, and the Immutable String"
    >>> title[8]= ''
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'str' object does not support item assignment

Since this doesn't work, how do we make a change to a string?

Getting ready

Let's assume we have a string like this:

    >>> title = "Recipe 5: Rewriting, and the Immutable String"

We'd like to do two transformations:

  • Remove the part before the :
  • Replace the punctuation with _, and make all the characters lowercase

Since we can't replace characters in a string object, we have to work out some alternatives. There are several common things we can do, shown as follows:

  • A combination of slicing and concatenating a string to create a new string.
  • When shortening, we often use the partition() method.
  • We can replace a character or a substring with the replace() method.
  • We can expand the string into a list of characters, then join the list back into a single string again. This is the subject of a separate recipe, Building complex strings from lists of characters.

How to do it...

Since we can't update a string in place, we have to replace the string variable's object with each modified result. We'll use a statement that looks like this:

    some_string = some_string.method()

Or we could even use:

    some_string = some_string[:chop_here]

We'll look at a number of specific variations on this general theme. We'll slice a piece of a string, we'll replace individual characters within a string, and we'll apply blanket transformations such as making the string lowercase. We'll also look at ways to remove extra _ that show up in our final string.

Slicing a piece of a string

Here's how we can shorten a string via slicing:

  1. Find the boundary:
      >>> colon_position = title.index(':')

The index() method locates a particular substring and returns the position where that substring starts. If the substring doesn't exist, it raises a ValueError exception. When index() succeeds, the result always satisfies title[colon_position] == ':'.

  2. Pick the substring:
      >>> discard_text, post_colon_text = title[:colon_position], title[colon_position+1:]
      >>> discard_text
      'Recipe 5'
      >>> post_colon_text
      ' Rewriting, and the Immutable String'

We've used the slicing notation to show the start:end of the characters to pick. We also used multiple assignment to assign two variables, discard_text and post_colon_text, from two expressions.
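As a small sketch of the failure case mentioned for index(): a missing substring raises ValueError, while the related find() method returns -1 instead. The no_colon string is a made-up example and is not part of the recipe.

```python
title = "Recipe 5: Rewriting, and the Immutable String"
no_colon = "A title without the separator"  # hypothetical example string

colon_position = title.index(':')
print(colon_position)          # 8
print(title[colon_position])   # :

# index() raises ValueError when the substring is absent.
try:
    no_colon.index(':')
    index_failed = False
except ValueError as error:
    index_failed = True
    print("index() failed:", error)

# find() reports a missing substring with -1 instead of an exception.
missing_position = no_colon.find(':')
print(missing_position)        # -1
```

Choosing between the two is a design decision: index() suits cases where a missing separator is an error, and find() suits cases where it is an expected outcome that the code checks for.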

As an alternative to manual slicing, we can use partition(), which finds the boundary and splits the string in one step:

    >>> pre_colon_text, _, post_colon_text = title.partition(':')
    >>> pre_colon_text
    'Recipe 5'
    >>> post_colon_text
    ' Rewriting, and the Immutable String'

The partition() method returns three things: the part before the target, the target, and the part after the target. We used multiple assignment to assign each object to a different variable. We assigned the target to a variable named _ because we're going to ignore that part of the result. This is a common idiom for places where we must provide a variable, but we don't care about using the object.
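One detail worth seeing in action is what partition() does when the target is missing: it returns the whole string followed by two empty strings, rather than raising an exception. The following is a minimal sketch; the no_colon string is a made-up example.

```python
title = "Recipe 5: Rewriting, and the Immutable String"
no_colon = "A title without the separator"  # hypothetical example string

# Normal case: three parts, with the separator in the middle.
found = title.partition(':')
print(found)
# ('Recipe 5', ':', ' Rewriting, and the Immutable String')

# Missing separator: the original string, then two empty strings.
not_found = no_colon.partition(':')
print(not_found)
# ('A title without the separator', '', '')

# Testing the middle item tells us whether the separator was present.
before, separator, after = not_found
print(bool(separator))   # False
```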

Updating a string with a replacement

We can use replace() to swap punctuation marks for another character. Since replace() returns a new string, we save the result back into the original variable, which in this case is post_colon_text:

    >>> post_colon_text = post_colon_text.replace(' ', '_')
    >>> post_colon_text = post_colon_text.replace(',', '_')
    >>> post_colon_text
    '_Rewriting__and_the_Immutable_String'

This has replaced the space and comma characters with the desired _ characters. We can generalize this to work with all whitespace and punctuation. This leverages the for statement, which we'll look at in Chapter 2, Statements and Syntax.

We can iterate through all whitespace and punctuation characters:

    >>> from string import whitespace, punctuation
    >>> for character in whitespace + punctuation:
    ...     post_colon_text = post_colon_text.replace(character, '_')
    >>> post_colon_text
    '_Rewriting__and_the_Immutable_String'

As each kind of punctuation character is replaced, we assign the latest and greatest version of the string to the post_colon_text variable.
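The loop above creates a new string for every character it checks. As an alternative that the recipe itself does not use, the same replacement can be sketched in a single pass with str.maketrans() and str.translate(). The names to_underscore and translated are chosen here for illustration.

```python
from string import punctuation, whitespace

post_colon_text = " Rewriting, and the Immutable String"

# Build a translation table that maps every whitespace and
# punctuation character to an underscore.
to_underscore = str.maketrans(
    {character: '_' for character in whitespace + punctuation}
)

# translate() applies the whole table in one pass and returns a new string.
translated = post_colon_text.translate(to_underscore)
print(translated)   # _Rewriting__and_the_Immutable_String
```

The result matches the loop-based version for this input. The pattern is unchanged: translate() returns a new string, which we would assign back to the variable.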

Making a string all lowercase

Another transformational step is changing a string to all lowercase. Use the lower() method and, as with the previous examples, assign the result back to the original variable:

    >>> post_colon_text = post_colon_text.lower()

Removing extra punctuation marks

In many cases, there are some additional steps we might follow. We often want to remove leading and trailing _ characters. We can use strip() for this:

    >>> post_colon_text = post_colon_text.strip('_')

In some cases, we'll have multiple _ characters because we had multiple punctuation marks. The final step would be something like this to clean up multiple _ characters:

    >>> while '__' in post_colon_text:
    ...     post_colon_text = post_colon_text.replace('__', '_')

This is yet another example of the same pattern we've been using: build a modified copy of the string and assign it back to the original variable. This depends on the while statement, which we'll look at in Chapter 2, Statements and Syntax.
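Putting the steps together, the whole recipe can be sketched as one function. The name clean_title and the fallback for titles without a colon are assumptions made for this illustration; they are not part of the original recipe.

```python
from string import punctuation, whitespace


def clean_title(title):
    """Hypothetical helper that chains the steps of this recipe."""
    # Keep only the text after the first colon. If there is no colon,
    # fall back to the whole title (an assumption for this sketch).
    _, separator, text = title.partition(':')
    if not separator:
        text = title

    # Replace every whitespace and punctuation character with '_'.
    for character in whitespace + punctuation:
        text = text.replace(character, '_')

    # Lowercase, then trim leading and trailing underscores.
    text = text.lower().strip('_')

    # Collapse runs of underscores into a single underscore.
    while '__' in text:
        text = text.replace('__', '_')
    return text


print(clean_title("Recipe 5: Rewriting, and the Immutable String"))
# rewriting_and_the_immutable_string
```

Every step assigns a new string to the local variable text. The string passed in by the caller is never changed.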

How it works...

Technically, we can't modify a string in place: the data structure for a string is immutable. However, we can assign a new string back to the original variable. As long as no other variable refers to the original string, this has the same visible effect as modifying the string in place.

When a variable's value is replaced, the previous value loses that reference. If no other references remain, the old object is garbage collected. We can see that a new object is created by using the id() function to track each individual string object:

    >>> id(post_colon_text)
    4346207968
    >>> post_colon_text = post_colon_text.replace('_','-')
    >>> id(post_colon_text)
    4346205488

Your actual id numbers may be different. What's important is that the original string object assigned to post_colon_text had one id. The new string object assigned to post_colon_text has a different id. It's a new string object.

When the old string has no more references, it is removed from memory automatically.
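A short sketch can make the role of references concrete. Here a second, hypothetical variable named original keeps a reference to the first string, so we can see that reassignment creates a new object and leaves the old one untouched.

```python
original = "_Rewriting__and_the_Immutable_String"
post_colon_text = original   # two names now refer to one string object
print(post_colon_text is original)   # True

# replace() builds a new string; only post_colon_text is rebound to it.
post_colon_text = post_colon_text.replace('_', '-')

print(post_colon_text)   # -Rewriting--and-the-Immutable-String
print(original)          # _Rewriting__and_the_Immutable_String
print(post_colon_text is original)   # False
```

Because original still refers to the old string, that object stays in memory. It becomes eligible for removal only after its last reference is gone.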

We made use of slice notation to decompose a string. A slice has two parts: [start:end]. String indices always start with zero for the first item. A slice always includes the item at the starting index and never includes the item at the ending index.

The items in a slice have an index from start to end-1. This is sometimes called a half-open interval.

Think of a slice like this: all characters where the index, i, is in the range start ≤ i < end.
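The half-open rule can be checked directly. The following sketch uses the values start = 0 and end = 8, chosen for illustration, and compares a slice with the characters picked one index at a time.

```python
title = "Recipe 5: Rewriting, and the Immutable String"
start, end = 0, 8   # illustrative values; index 8 is the colon

piece = title[start:end]
print(piece)   # Recipe 5

# The slice holds exactly the characters with start <= i < end.
by_index = ''.join(title[i] for i in range(start, end))
print(piece == by_index)           # True
print(len(piece) == end - start)   # True

# Two slices that share a boundary never overlap or leave a gap.
print(title[:end] + title[end:] == title)   # True
```

The last line shows a practical benefit of half-open intervals: splitting a string at any position and concatenating the two parts gives back the original.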

We noted briefly that we can omit the start or end indices. We can actually omit both. Here are the various options available:

  • title[colon_position]: A single item, the : we found using title.index(':').
  • title[:colon_position]: A slice with the start omitted. It begins at the first position, index of zero.
  • title[colon_position+1:]: A slice with the end omitted. It ends at the end of the string, as if we said len(title).
  • title[:]: Since both start and end are omitted, this is the entire string. For a mutable sequence such as a list, this form creates a copy. Because strings are immutable, a copy is never needed, and CPython generally returns the original string object itself.
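The options in this list can be tried side by side. This is a minimal sketch that reuses the title and colon_position names from the recipe; the other variable names are chosen here for illustration.

```python
title = "Recipe 5: Rewriting, and the Immutable String"
colon_position = title.index(':')

single_item = title[colon_position]        # one character
before_colon = title[:colon_position]      # start omitted
after_colon = title[colon_position + 1:]   # end omitted
whole_string = title[:]                    # both omitted

print(repr(single_item))    # ':'
print(repr(before_colon))   # 'Recipe 5'
print(repr(after_colon))    # ' Rewriting, and the Immutable String'
print(whole_string == title)   # True

# An omitted end behaves like len(title).
print(after_colon == title[colon_position + 1:len(title)])   # True
```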

There's more...

There are more features to indexing in Python collections such as strings. The normal indices start with 0 at the left end. We also have an alternate set of indices that use negative numbers and work from the right end of a string.

  • title[-1] is the last character in the title, g
  • title[-2] is the next-to-last character, n
  • title[-6:] is the last six characters, String

We have a lot of ways to pick pieces and parts out of a string.
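Negative indices can be related back to the normal ones: for a string of length n, index -k refers to the same character as index n - k. The following sketch checks this for the examples above.

```python
title = "Recipe 5: Rewriting, and the Immutable String"
n = len(title)

last_character = title[-1]
next_to_last = title[-2]
last_six = title[-6:]

print(last_character)   # g
print(next_to_last)     # n
print(last_six)         # String

# Each negative index has an equivalent non-negative index.
print(title[-1] == title[n - 1])     # True
print(title[-6:] == title[n - 6:])   # True
```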

Python offers dozens of methods for creating a modified version of a string. The Text Sequence Type (str) section of the Python Standard Library documentation (section 4.7 at the time of writing) describes the different kinds of transformations that are available to us. There are three broad categories of string methods. We can ask about a string, we can parse a string, and we can transform a string. Methods such as isnumeric() tell us whether every character in a string is numeric.

Here's an example:

    >>> 'some word'.isnumeric()
    False
    >>> '1298'.isnumeric()
    True

We've looked at parsing with the partition() method. And we've looked at transforming with the lower() method.
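The methods that ask about a string have some subtle differences. As a brief sketch, the following compares isnumeric() with the related isdigit() and isdecimal() methods. The sample strings are chosen here for illustration.

```python
# Plain ASCII digits satisfy all three tests.
print('1298'.isnumeric(), '1298'.isdigit(), '1298'.isdecimal())
# True True True

# A sign or a decimal point is not a numeric character.
print('-12'.isnumeric())    # False
print('12.5'.isnumeric())   # False

# The vulgar fraction one half is numeric, but it is not a digit.
print('\u00bd'.isnumeric(), '\u00bd'.isdigit())   # True False

# Superscript two is a digit, but it is not a decimal character.
print('\u00b2'.isdigit(), '\u00b2'.isdecimal())   # True False
```

When the goal is to check whether int() will accept a string, none of these methods is an exact match. Calling int() inside a try statement is often the more direct test.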

See also

  • We'll look at the string-as-list technique for modifying a string in the Building complex strings from lists of characters recipe.
  • Sometimes we have data that's only a stream of bytes. In order to make sense of it, we need to convert it into characters. That's the subject for the Decoding bytes – how to get proper characters from some bytes recipe.