Modern Python Cookbook

You’re reading from Modern Python Cookbook: 130+ updated recipes for modern Python 3.12 with new techniques and tools

Product type: Paperback
Published in: Jul 2024
Publisher: Packt
ISBN-13: 9781835466384
Length: 818 pages
Edition: 3rd Edition
Author: Steven F. Lott
1.6 Using the Unicode characters that aren’t on our keyboards

A big keyboard might have almost 100 individual keys. Often, fewer than 50 of these keys are letters, numbers, and punctuation. At least a dozen are function keys that do things other than simply insert letters into a document. Some of the keys are different kinds of modifiers that are meant to be used in conjunction with another key—for example, we might have Shift, Ctrl, Option, and Command.

Most operating systems will accept simple key combinations that create about 100 characters. More elaborate key combinations may create another 100 or so less popular characters. This isn’t even close to covering the vast domain of characters from the world’s alphabets. And there are icons, emojis, and dingbats galore in our computer fonts. How do we get to all of those glyphs?

1.6.1 Getting ready

Python works in Unicode: every str value is a sequence of Unicode characters. Unicode defines well over 100,000 individual characters.

We can see all the available characters at https://en.wikipedia.org/wiki/List_of_Unicode_characters, as well as at http://www.unicode.org/charts/.

We’ll need the Unicode character number. We may also want the Unicode character name.
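The standard library’s unicodedata module can translate between a character, its number, and its name, which is handy when a chart gives us only one of them. This is a minimal sketch using the DIE FACE-1 character that appears later in this recipe; the variable names are illustrative only.

```python
import unicodedata

# Look up a character by its official Unicode name.
die = unicodedata.lookup("DIE FACE-1")

# ord() gives the character number; format it the way Unicode charts do.
number = f"U+{ord(die):04X}"

# name() goes back from the character to its official name.
name = unicodedata.name(die)

print(number, name)  # U+2680 DIE FACE-1
```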

A given font on our computer may not be designed to provide glyphs for all of those characters. In particular, the Windows console may have trouble displaying some of these characters. Switching the console to code page 65001 (UTF-8) with the following Windows command is sometimes necessary:

chcp 65001

Linux and macOS rarely have problems with Unicode characters.

1.6.2 How to do it...

Python uses escape sequences to extend the ordinary characters we can type to cover the vast space of Unicode characters. Each escape sequence starts with a \ character. The next character tells Python which kind of escape it is, and the rest identifies exactly which Unicode character to create:

1. Locate the character that’s needed.

2. Get the name or the number. The numbers are always given in hexadecimal (base 16). Websites describing Unicode often write the character as U+2680. Its name is DIE FACE-1.

3. Use \unnnn with a four-digit hexadecimal number, nnnn, padded with leading zeros if needed. Or, use \N{name} with the spelled-out name.

4. If the number is more than four digits, use \Unnnnnnnn with the number padded out to exactly eight digits:

>>> 'You Rolled \u2680'
'You Rolled ⚀'
>>> 'You drew \U0001F000'
'You drew 🀀'
>>> 'Discard \N{MAHJONG TILE RED DRAGON}'
'Discard 🀄'
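To confirm that the escape forms are just different spellings of the same character, we can compare them directly. This is a small sketch; it also uses the built-in ascii() function, which works in the opposite direction and turns non-ASCII characters back into escapes (with lowercase hex digits). The name MAHJONG TILE EAST WIND for U+1F000 is not mentioned in the recipe itself.

```python
# Each escape form, and chr(), builds the same one-character string.
die = "\u2680"
assert die == "\N{DIE FACE-1}" == "\U00002680" == chr(0x2680)
assert len(die) == 1

# Characters beyond U+FFFF need the eight-digit \U form (or a name).
tile = "\U0001F000"
assert tile == "\N{MAHJONG TILE EAST WIND}"

# ascii() goes the other way: non-ASCII characters come back as escapes.
print(ascii("You drew " + tile))  # 'You drew \U0001f000'
```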

Yes, we can include a wide variety of characters in Python output. To place a \ in the string without the following characters being part of an escape sequence, we need to use \\. For example, we might need this for Windows file paths.
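Windows paths show why the doubled backslash matters. In this sketch the path itself (including the user name steven) is hypothetical. It compares the doubled-backslash form with an equivalent raw string, and shows how a single backslash can silently turn into a different character.

```python
# Doubling each backslash keeps it out of any escape sequence.
path = "C:\\Users\\steven\\notes.txt"

# A raw string (r prefix) produces exactly the same characters.
assert path == r"C:\Users\steven\notes.txt"

# Forgetting to double a backslash can silently create a different character:
# here "\n" becomes a newline, not a backslash followed by "n".
oops = "C:\new_folder"
assert "\n" in oops and "\\" not in oops
```

Writing "C:\Users" without doubling the backslash doesn’t even compile: \U starts an eight-digit escape, so Python reports a SyntaxError.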

1.6.3 How it works...

Python uses Unicode internally. The characters we can type directly on a US keyboard are all among the first 128 Unicode characters (the same codes ASCII uses), so they all have handy, small internal Unicode numbers.

When we write:

'HELLO'

Python treats it as shorthand for this:

'\u0048\u0045\u004c\u004c\u004f'
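The equivalence can be checked directly. This short sketch also uses ord() to reveal the Unicode number behind each keyboard character.

```python
# The keyboard characters and their \u escapes are the same string.
assert "HELLO" == "\u0048\u0045\u004c\u004c\u004f"

# ord() reveals each character's Unicode number, shown here in hex.
codes = [f"{ord(c):04x}" for c in "HELLO"]
print(codes)  # ['0048', '0045', '004c', '004c', '004f']
```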

Once we get beyond the characters on our keyboards, the remaining characters can only be written using their number or their name.

When Python compiles the string, \uxxxx, \Uxxxxxxxx, and \N{name} are all replaced by the proper Unicode character. If we have something syntactically wrong (for example, \N{name with no closing }), we’ll get an immediate SyntaxError from Python’s internal syntax checking.
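One way to see this checking without crashing an interactive session is to pass a deliberately broken literal to the built-in compile() function and catch the error. This sketch assumes three malformed samples of our own: a missing closing brace, a name Unicode doesn’t define, and a \u escape that is too short.

```python
# compile() lets us trigger Python's own string-literal checking on demand.
bad_sources = [
    r"'\N{MAHJONG TILE RED DRAGON'",  # \N{...} with no closing brace
    r"'\N{NO SUCH CHARACTER NAME}'",  # a name Unicode doesn't define
    r"'\u26'",                        # \u needs exactly four hex digits
]

errors = []
for source in bad_sources:
    try:
        compile(source, "<example>", "eval")
    except SyntaxError as exc:
        errors.append(type(exc).__name__)

print(errors)  # ['SyntaxError', 'SyntaxError', 'SyntaxError']
```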

Regular expressions use a lot of \ characters, and we specifically do not want Python’s normal compiler to touch them. That’s why we use the r prefix (as in r'...') on a regular expression string: it prevents \ from being treated as an escape and possibly converted into something else. To use the full domain of Unicode characters in an ordinary string, however, we cannot avoid using \ as an escape.

What if we need to use Unicode in a regular expression? We’ll need to use \\ all over the place in the regular expression. We might see something like this: '\\w+[\u2680\u2681\u2682\u2683\u2684\u2685]\\d+'.

We didn’t use the r prefix on this string because we wanted Python to process the Unicode escapes. This forced us to use \\ for elements of the regular expression. We used \uxxxx for the Unicode characters that are part of the pattern. Python’s internal compiler will replace \uxxxx with Unicode characters, and \\w will become the required \w internally. As an alternative, the re module itself understands \uxxxx escapes (and, since Python 3.8, \N{name}), so a raw string such as r'\w+[\u2680-\u2685]\d+' also works.
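A short sketch comparing a doubled-backslash pattern with a raw-string pattern that relies on the re module’s own \uXXXX handling (available since Python 3.3). The sample text "roll\u268212" (the word roll, DIE FACE-3, then 12) is made up for illustration.

```python
import re

# Ordinary string: Python turns \u2680..\u2685 into the die characters,
# and each \\ becomes the single backslash the regex engine needs.
cooked = re.compile("\\w+[\u2680\u2681\u2682\u2683\u2684\u2685]\\d+")

# Raw string: Python leaves \u2680 alone and the re module decodes it,
# which also lets us write the six faces as a character range.
raw = re.compile(r"\w+[\u2680-\u2685]\d+")

text = "roll\u268212"  # "roll", DIE FACE-3, "12"
assert cooked.fullmatch(text) and raw.fullmatch(text)
```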

When we look at a string at the >>> prompt, Python will display the string in its canonical form. Python prefers to display strings with ' as a delimiter, using " when the string contains a '. We can use either ' or " for a string delimiter when writing code. Python doesn’t generally display raw strings; instead, it puts all of the necessary escape sequences back into the string:

>>> r"\w+"
'\\w+'

We provided a string in raw form. Python displayed it in canonical form.
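The >>> prompt shows the repr() of a value, so we can explore the delimiter rules by calling repr() ourselves. In this sketch the sample strings are our own; the comments show what print() writes for each canonical form.

```python
# repr() produces the canonical form that the >>> prompt displays.
print(repr(r"\w+"))           # '\\w+'
print(repr("it's"))           # "it's"
print(repr('say "hi"'))       # 'say "hi"'
print(repr("it's \"both\""))  # 'it\'s "both"'
```

When a string contains both kinds of quote, Python falls back to ' and escapes the embedded '.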

1.6.4 See also

  • In the Encoding strings – creating ASCII and UTF-8 bytes and the Decoding bytes – how to get proper characters from some bytes recipes, we’ll look at how Unicode characters are converted into sequences of bytes so we can write them to a file. We’ll look at how bytes from a file (or downloaded from a website) are turned into Unicode characters so they can be processed.

  • If you’re interested in history, you can read up on ASCII and EBCDIC and other old-fashioned character codes here: http://www.unicode.org/charts/.
