What do you think is the output of the following Python code?
>>> flag = "🇺🇸"
>>> reversed_flag = flag[::-1]
>>> print(reversed_flag)
Questions like this make me want to immediately open a Python REPL and try the code out because I think I know what the answer is, but I’m not very confident in that answer.
Here’s my line of thinking when I first saw this question:
- The flag string contains a single character.
- The [::-1] slice reverses the flag string.
- The reversal of a string with a single character is the same as the original string.
- Therefore, reversed_flag must be "🇺🇸".
That’s a perfectly valid argument. But is the conclusion true? Take a look:
>>> flag = "🇺🇸"
>>> reversed_flag = flag[::-1]
>>> print(reversed_flag)
🇸🇺
What in the world is going on here?
Does "🇺🇸" Really Contain a Single Character?
When the conclusion of a valid argument is false, one of its premises must be false, too. Let’s start from the top:
The flag string contains a single character.
Is that so? How can you tell how many characters a string has?
In Python, you can use the built-in len() function to get the total number of characters in a string:
>>> len("🇺🇸")
2
Oh.
That’s weird. You can only see a single thing in the string "🇺🇸", namely the US flag, but a length of 2 jibes with the result of flag[::-1]. Since the reverse of "🇺🇸" is "🇸🇺", this seems to imply that somehow "🇺🇸" == "🇺" + "🇸".
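You can test that suspicion directly. The sketch below builds the flag out of two one-character strings and compares the result to the flag. The escape sequences \U0001F1FA and \U0001F1F8 are my own choice here, used so that the code doesn’t depend on how your editor renders the two pieces:

```python
# Two one-character strings, written as escape sequences so that
# an editor can't visually merge them into a flag while you type.
first = "\U0001F1FA"   # typically displays as 🇺 on its own
second = "\U0001F1F8"  # typically displays as 🇸 on its own

flag = first + second
print(flag)                          # displays as the US flag
print(len(first), len(second))       # 1 1
print(len(flag))                     # 2
print(flag == "🇺🇸")                  # True
print(flag[::-1] == second + first)  # True: reversing swaps the two pieces
```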
How Can You Tell What Characters Are in a String?
There are a few different ways that you can see all of the characters in a string using Python:
>>> # Convert a string to a list
>>> list("🇺🇸")
['🇺', '🇸']
>>> # Loop over each character and print
>>> for character in "🇺🇸":
...     print(character)
...
🇺
🇸
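Python’s standard library can also tell you a bit more about each character. As a sketch, the hypothetical describe() helper below pairs each character’s code point number, from ord(), with its official Unicode name, from unicodedata.name():

```python
import unicodedata

def describe(text):
    """Return a (code point, Unicode name) pair for each character in text.

    Hypothetical helper for illustration only.
    """
    return [
        (f"U+{ord(character):04X}", unicodedata.name(character, "<no name>"))
        for character in text
    ]

for code_point, name in describe("🇺🇸"):
    print(code_point, name)
# U+1F1FA REGIONAL INDICATOR SYMBOL LETTER U
# U+1F1F8 REGIONAL INDICATOR SYMBOL LETTER S
```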
The US flag emoji isn’t the only flag emoji with two characters:
>>> list("🇿🇼") # Zimbabwe
['🇿', '🇼']
>>> list("🇳🇴") # Norway
['🇳', '🇴']
>>> list("🇨🇺") # Cuba
['🇨', '🇺']
>>> # What do you notice?
And then there’s the Scottish flag:
>>> list("🏴")
['🏴', '\U000e0067', '\U000e0062', '\U000e0073', '\U000e0063',
'\U000e0074', '\U000e007f']
OK, what is that all about?
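One way to start digging is to rebuild the Scottish flag from the escape sequences in the output above and look up each piece’s name with unicodedata.name(). This sketch assumes the string on your screen contains exactly those seven code points:

```python
import unicodedata

# The Scottish flag, rebuilt from the escape sequences in the list above.
# Normally only the first code point is visible when printed on its own.
scotland = (
    "\U0001F3F4"  # the black flag
    "\U000E0067\U000E0062\U000E0073\U000E0063\U000E0074"
    "\U000E007F"
)

print(len(scotland))  # 7
for character in scotland:
    print(f"U+{ord(character):04X}", unicodedata.name(character))
# U+1F3F4 WAVING BLACK FLAG
# U+E0067 TAG LATIN SMALL LETTER G
# U+E0062 TAG LATIN SMALL LETTER B
# U+E0073 TAG LATIN SMALL LETTER S
# U+E0063 TAG LATIN SMALL LETTER C
# U+E0074 TAG LATIN SMALL LETTER T
# U+E007F CANCEL TAG
```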
Challenge: Can you find any non-emoji strings that look like a single character but actually contain two or more characters?
The unnerving thing about these examples is that they imply that you can’t tell what characters are in a string just by looking at your screen.
Or, perhaps more deeply, they make you question your understanding of the term character.
What Is a Character, Anyway?
The term character in computer science can be confusing. It tends to get conflated with the word symbol, which, to be fair, is a synonym for the word character as it’s used in English vernacular.
In fact, when I googled “character computer science”, the very first result I got was a link to a Techopedia article that defines a character as:
“[A] display unit of information equivalent to one alphabetic letter or symbol.”
— Techopedia, “Character (Char)”
That definition seems off, especially in light of the US flag example, which indicates that a single symbol can be made up of two or more characters.
The second Google result I get is Wikipedia. In that article, the definition of a character is a bit more liberal:
“[A] character is a unit of information that roughly corresponds to a grapheme, grapheme-like unit, or symbol, such as in an alphabet or syllabary in the written form of a natural language.”
— Wikipedia, “Character (computing)”
Hmm… using the word “roughly” in a definition makes the definition feel, shall I say, non-definitive.
But the Wikipedia article goes on to explain that the term character has been used historically to “denote a specific number of contiguous bits.”
Then comes a significant clue to the question of how a string with one symbol can contain two or more characters:
“A character is most commonly assumed to refer to 8 bits (one byte) today… All [symbols] can be represented with one or more 8-bit code units with UTF-8.”
— Wikipedia, “Character (computing)”
OK! Maybe things are starting to make a little bit more sense. A character represents a unit of text and is often stored as one byte of information. The symbols that we see in a string can be made up of multiple 8-bit (1 byte) UTF-8 code units.
Characters are not the same as symbols. It seems reasonable now that one symbol could be made up of multiple characters, just as flag emojis are.
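You can look at UTF-8 code units yourself by encoding a string with str.encode(), which returns a bytes object. In Python, the two views of the flag don’t line up one-to-one: len() counts two characters in the str, while the UTF-8 encoding of the same text is eight bytes long:

```python
flag = "🇺🇸"

# Encode the string as UTF-8 to see the underlying 8-bit code units (bytes).
encoded = flag.encode("utf-8")

print(len(flag))     # 2: len() of a str counts its characters
print(len(encoded))  # 8: each of the two characters takes 4 bytes in UTF-8
print(encoded)       # b'\xf0\x9f\x87\xba\xf0\x9f\x87\xb8'
```

Decoding the bytes with encoded.decode("utf-8") gives you the original two-character string back.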
But what is a UTF-8 code unit?
A little further down the Wikipedia article on characters, there’s a section called Encoding that explains:
“Computers and communication equipment represent characters using a character encoding that assigns each character to something – an integer quantity represented by a sequence of digits, typically – that can be stored or transmitted through a network. Two examples of usual encodings are ASCII and the UTF-8 encoding for Unicode.”
— Wikipedia, “Character (computing)”
There’s another mention of UTF-8! But now I need to know what a character encoding is.
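Python’s str.encode() accepts both encodings named in that quote, so you can compare them. In this small example, ASCII and UTF-8 produce identical bytes for plain English letters, but ASCII has no numbers assigned to the characters in the flag, so encoding fails:

```python
word = "PYTHON"
ascii_bytes = word.encode("ascii")
utf8_bytes = word.encode("utf-8")
print(ascii_bytes)                # b'PYTHON'
print(ascii_bytes == utf8_bytes)  # True: same bytes for these letters

flag = "🇺🇸"
try:
    flag.encode("ascii")
    failure_reason = None
except UnicodeEncodeError as error:
    # ASCII only covers 128 characters, and the flag's characters aren't among them.
    failure_reason = error.reason
print(failure_reason)  # ordinal not in range(128)
```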
What Exactly Is a Character Encoding?
According to Wikipedia, a character encoding assigns each character to a number. What does that mean?
Doesn’t that just mean you can pair each character with a number? For instance, you could pair each uppercase letter in the English alphabet with an integer from 0 through 25.

You can represent this pairing using tuples in Python:
>>> pairs = [(0, "A"), (1, "B"), (2, "C"), ..., (25, "Z")]
>>> # I'm omitting several pairs here -----^^^
Stop for a moment and ask yourself: “Can I create a list of tuples like the one above without explicitly writing out each pair?”
One way is to use Python’s enumerate() function. enumerate() takes an argument called iterable and returns an iterator of tuples, each containing a count, which starts at 0 by default, and the next value obtained from iterating over iterable.
Here’s a look at enumerate() in action:
>>> letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
>>> enumerated_letters = list(enumerate(letters))
>>> enumerated_letters
[(0, 'A'), (1, 'B'), (2, 'C'), (3, 'D'), (4, 'E'), (5, 'F'), (6, 'G'),
(7, 'H'), (8, 'I'), (9, 'J'), (10, 'K'), (11, 'L'), (12, 'M'), (13, 'N'),
(14, 'O'), (15, 'P'), (16, 'Q'), (17, 'R'), (18, 'S'), (19, 'T'), (20, 'U'),
(21, 'V'), (22, 'W'), (23, 'X'), (24, 'Y'), (25, 'Z')]
There’s an easier way to make all of the letters, too.
Python’s string module has a variable called ascii_uppercase that points to a string containing all of the uppercase letters in the English alphabet:
>>> import string
>>> string.ascii_uppercase
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
>>> enumerated_letters = list(enumerate(string.ascii_uppercase))
>>> enumerated_letters
[(0, 'A'), (1, 'B'), (2, 'C'), (3, 'D'), (4, 'E'), (5, 'F'), (6, 'G'),
(7, 'H'), (8, 'I'), (9, 'J'), (10, 'K'), (11, 'L'), (12, 'M'), (13, 'N'),
(14, 'O'), (15, 'P'), (16, 'Q'), (17, 'R'), (18, 'S'), (19, 'T'),
(20, 'U'), (21, 'V'), (22, 'W'), (23, 'X'), (24, 'Y'), (25, 'Z')]
OK, so we’ve associated characters with integers. That means we’ve got a character encoding!
But, how do you use it?
To encode the string "PYTHON" as a sequence of integers, you need a way to look up the integer associated with each character. But looking things up in a list of tuples is hard. It’s also really inefficient. (Why?)
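Here’s a sketch of what encoding "PYTHON" with the list of tuples might look like. The encode_with_pairs() function is a hypothetical helper written for illustration; notice that every lookup walks the list from the beginning until it finds a match:

```python
import string

enumerated_letters = list(enumerate(string.ascii_uppercase))

def encode_with_pairs(text, pairs):
    """Encode text by scanning a list of (integer, character) pairs.

    Hypothetical helper: each lookup starts at the front of the list,
    so a single character can cost up to len(pairs) comparisons.
    """
    result = []
    for char in text:
        for number, letter in pairs:
            if letter == char:
                result.append(number)
                break
        else:
            # The inner loop finished without a match.
            raise ValueError(f"{char!r} has no integer in this encoding")
    return result

print(encode_with_pairs("PYTHON", enumerated_letters))  # [15, 24, 19, 7, 14, 13]
```

For a 26-letter alphabet that’s cheap, but the work for each lookup grows with the length of the list.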
Dictionaries are good for looking things up. If we convert enumerated_letters to a dictionary, we can quickly look up the letter associated with an integer:
>>> int_to_char = dict(enumerated_letters)
>>> # Get the character paired with 1
>>> int_to_char[1]
'B'
>>> # Get the character paired with 15
>>> int_to_char[15]
'P'
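int_to_char is already enough to go the other way: decoding integers back into text. As an illustration, with the integer list for "PYTHON" typed in by hand, each integer becomes a letter through a single dictionary lookup:

```python
import string

int_to_char = dict(enumerate(string.ascii_uppercase))

# Decoding: one dictionary lookup per integer, then join the letters.
codes = [15, 24, 19, 7, 14, 13]
decoded = "".join(int_to_char[code] for code in codes)
print(decoded)  # PYTHON
```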
However, to encode the string "PYTHON", you need to be able to look up the integer associated with a character. You need the reverse of int_to_char.
How do you swap keys and values in a Python dictionary?
One way is to use the reversed() function to reverse the key-value pairs from the int_to_char dictionary:
>>> # int_to_char.items() is a "list" of key-value pairs
>>> int_to_char.items()
dict_items([(0, 'A'), (1, 'B'), (2, 'C'), (3, 'D'), (4, 'E'), (5, 'F'),
(6, 'G'), (7, 'H'), (8, 'I'), (9, 'J'), (10, 'K'), (11, 'L'), (12, 'M'),
(13, 'N'), (14, 'O'), (15, 'P'), (16, 'Q'), (17, 'R'), (18, 'S'),
(19, 'T'), (20, 'U'), (21, 'V'), (22, 'W'), (23, 'X'), (24, 'Y'),
(25, 'Z')])
>>> # The reversed() function can reverse a tuple
>>> pair = (0, "A")
>>> tuple(reversed(pair))
('A',