{ Variables and Primitives. }

Objectives:

By the end of this chapter, you should be able to:

  • Define variables in Python
  • Explain the difference between ASCII and Unicode
  • Use basic methods to manipulate strings in Python

Variables

Declaring a variable in Python is quite easy; just use = to assign a value to a variable. If you want to indicate an absence of value, you can assign the value None to a variable.

a = 1
b = "YOLO"
nothing = None
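
Here's a small sketch of how None tends to be used in practice. The variable name winner is made up for this example; the point is only that a name can start out as None and be reassigned later, and that is None is the usual way to check for it.

```python
# `winner` is a hypothetical variable name, used only for illustration.
winner = None                   # no value yet
was_empty = winner is None      # True: `is` is the usual way to test for None

winner = "elie"                 # the same name can be reassigned to a string
is_empty_now = winner is None   # False
```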

In Python you can also do multiple assignment with variables separated by a comma. This is quite common with Python so make sure to try this out. Here's a basic example:

a,b = 1,2
a # 1
b # 2
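
One place multiple assignment really shines is swapping two values without a temporary variable. This is a minimal sketch; the names first, second, and third are just placeholders.

```python
a, b = 1, 2
a, b = b, a   # swap: the right-hand side is evaluated before anything is assigned
# a is now 2, b is now 1

# It works with more than two names, as long as both sides have the same count.
first, second, third = 10, 20, 30
```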

By convention, variable names should be written in snake_case. This means that all words should be lowercase, and words should be separated by an underscore. This is a very strong convention in Python, so you should get into the habit as quickly as possible!

myVariable = 3 # Get outta here, this isn't JavaScript!
my_variable = 3 # much better

Types in Python

As we mentioned in the last chapter, Python has quite a few built-in data types, including booleans, numeric types (int, float), strings, tuples, lists, dictionaries, and many more. We can see the type of an object using the built-in type function.

type(False) # bool
type("nice") # str
type({}) # dict
type([]) # list
type(()) # tuple
type(None) # NoneType
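
The numeric types work the same way. The sketch below also uses the built-in isinstance function, which is a common way to ask "is this value of that type?" when you want a True or False answer. The variable names are only for illustration.

```python
int_type = type(42)       # int
float_type = type(4.2)    # float
str_type = type("42")     # str: the quotes make it a string, not a number

is_a_string = isinstance("nice", str)   # True
is_a_list = isinstance({}, list)        # False, {} is a dict
```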

Strings in Python

In Python 2, strings are stored internally as 8-bit ASCII by default. But in Python 3, all strings are represented in Unicode.

Uh, what?

Before we talk about methods on strings in Python, let's learn a little bit about the history of character encodings. If you would like a longer description, feel free to read this excellent article.

When we as humans see text on a computer screen, we are viewing something quite different from what a computer processes. Remember that computers deal with bits and bytes, so we need a way to encode (or map) characters to something a computer can work with. In 1968, the American Standard Code for Information Interchange (or ASCII) was standardized as a character encoding. ASCII assigns each of its characters a numeric code from 0 to 127.

Why this range? Remember that computers work in base 2 or binary, so each bit represents a power of two. This means that 7 bits can get us 2^7 = 128 different binary numbers; since each bit can equal 0 or 1, with 7 bits we can represent all numbers from 0000000 up to 1111111. With ASCII, we can then map each of these numbers to a distinct character. Since there are only 26 letters in the English alphabet (52 if you care about the distinction between upper and lower case), plus a handful of digits and punctuation characters, ASCII should more than cover our needs, right?
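
You can poke at this mapping yourself in Python. The built-in ord function gives the code for a character and chr goes the other way. This is just a sketch to make the arithmetic concrete; the variable names are made up.

```python
number_of_codes = 2 ** 7            # 128 possible values with 7 bits
highest_code = int("1111111", 2)    # 127, the largest 7-bit number

code_for_upper_a = ord("A")         # 65
code_for_lower_a = ord("a")         # 97
char_for_97 = chr(97)               # 'a'

# The same code written out as 7 binary digits
binary_for_upper_a = format(ord("A"), "07b")   # '1000001'
```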

ASCII was a great start, but issues arose when non-English characters like é or ö could not be represented and would just be converted to e and o. By the 1980s, most computers used 8-bit bytes, which meant a single byte could hold values larger than 127. The highest binary number an 8-bit byte can hold is 11111111, or 2^0 + 2^1 + 2^2 + 2^3 + 2^4 + 2^5 + 2^6 + 2^7, or 255. Different machines used the values 128 to 255 for accented characters in different ways, and no common standard existed until the International Organization for Standardization (or ISO) published one.
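
Here's a quick check of that arithmetic, plus a look at where é lands. The sketch writes é as the escape "\u00e9" so it doesn't depend on how this file is saved; the variable names are only for illustration.

```python
highest_byte_value = sum(2 ** power for power in range(8))   # 255
same_value = int("11111111", 2)                              # 255

e_acute = "\u00e9"                     # the character é
code_point = ord(e_acute)              # 233
fits_in_ascii = code_point <= 127      # False: outside the 7-bit range
fits_in_one_byte = code_point <= 255   # True: inside the 128 to 255 range
```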

Even with an additional 128 characters, we started running into lots of issues once the web grew. Languages with completely different character sets like Russian, Chinese, Arabic, and many more had to be encoded in completely different character sets, causing a nightmare when trying to deliver a single text file with multiple character sets.

In the late 1980s, work began on a new standard called Unicode. Unicode's mission was to encode all possible characters and still be backward compatible with ASCII. The dominant character encoding on the web now is UTF-8, which uses 8-bit code units, but with a variable length (one to four units per character) to ensure that all sorts of characters can be encoded.
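
To see the variable length in action, you can encode a few characters with the encode method and count the bytes. This is a rough sketch; the sample characters are written as escapes and the variable names are made up.

```python
# a, é, €, and a smiley emoji, written as escapes
samples = ["a", "\u00e9", "\u20ac", "\U0001F600"]
byte_lengths = [len(ch.encode("utf-8")) for ch in samples]   # [1, 2, 3, 4]

# Plain ASCII text produces the same bytes in UTF-8 as it does in ASCII
same_as_ascii = "yolo".encode("utf-8") == "yolo".encode("ascii")   # True

# Encoding and then decoding gets the original string back
round_trip = "\u00e9".encode("utf-8").decode("utf-8")
```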

TL;DR: in Python 3, strings are Unicode by default.

String Methods

Python contains quite a few helpful string methods; here are a few. Try running these in a REPL to see what they do!

Let's start with a simple variable:

string = "this Is nIce"

Upper

To convert every character to uppercase we can use the upper method.

string.upper() # 'THIS IS NICE'

Lower

To convert every character to lowercase we can use the lower method.

string.lower() # 'this is nice'

Capitalize

To convert the first character in a string to uppercase and everything else to lowercase we can use the capitalize method.

string.capitalize() # 'This is nice'

Title

To convert the first character of every word in a string to uppercase and everything else to lowercase we can use the title method.

string.title() # 'This Is Nice'
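
Two things are worth noticing here. First, none of these methods change the original string; each one returns a new string. Second, title treats anything after a non-letter as the start of a new word, which can surprise you with apostrophes. A small sketch (variable names are just for illustration):

```python
string = "this Is nIce"

capitalized = string.capitalize()   # 'This is nice'
titled = string.title()             # 'This Is Nice'
# string itself is still 'this Is nIce'

# The letter after an apostrophe gets capitalized too
apostrophe_title = "they're here".title()   # "They'Re Here"
```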

Find

To find a substring within a string we can use the find method. This will return the index at which the first match occurs. If the substring is not found, find will return -1.

instructor = 'elie'
instructor.find('e') # 0
instructor.find('E') # -1 it IS case sensitive!

string.find("i") # 2, since the character "i" is at index 2
string.find('Tim') # -1 
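
find also works with longer substrings and accepts an optional starting index. One gotcha: a match at the very beginning returns 0, so compare the result against -1 rather than treating it as true or false. If all you need is a yes or no answer, the in operator is a simpler option. A quick sketch with made-up variable names:

```python
string = "this Is nIce"

word_index = string.find("Is")      # 5, where the substring starts
first_s = string.find("s")          # 3
next_s = string.find("s", 4)        # 6, searching from index 4 onward
missing = string.find("Tim")        # -1

starts_at = string.find("this")     # 0, a match at the very beginning
was_found = starts_at != -1         # True

has_nice = "nIce" in string         # True
```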

isalpha

To see if all characters are alphabetic we can use the isalpha method. Spaces are not alphabetic, so a string that contains one will return False.

string.isalpha() # False
string[0].isalpha() # True

isspace

To see if a character or all characters in a string are whitespace, we can use the isspace method.

string.isspace() # False
string[0].isspace() # False
string[4].isspace() # True

islower

To see if a character or all of the letters in a string are lowercase, we can use the islower method (there is also a method called isupper, which checks for uppercase instead).

string.islower() # False
string[0].islower() # True
string[5].islower() # False
string.lower().islower() # True
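
Here's the same idea with isupper, along with a couple of edge cases. Both methods only look at letters, and both return False when a string has no letters at all. This is a sketch with made-up variable names.

```python
string = "this Is nIce"

all_upper = string.isupper()           # False
one_upper = string[5].isupper()        # True, string[5] is 'I'
shouted = string.upper().isupper()     # True

# Digits and spaces are ignored, but at least one letter is required
mixed = "abc 123".islower()            # True
digits_lower = "123".islower()         # False
digits_upper = "123".isupper()         # False
```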

istitle

To see if a string is a "title" (first character of each word is capitalized and the rest are lowercase), we can use the istitle method.

string.istitle() # False
string.title().istitle() # True

"not Awesome Sauce".istitle() # False
"Awesome Sauce".istitle() # True

endswith

To see if a string ends with a certain set of characters we can use the endswith method.

"string".endswith('g') # True
"awesome".endswith('foo') # False

partition

To split a string into three parts around the first occurrence of a separator, we can use the partition method.

string.partition('i') # what's the type of what you get back?
"awesome".partition('e') # ('aw', 'e', 'some')
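
Since partition always gives back three pieces, it pairs nicely with multiple assignment. The sketch below also shows what happens with repeated separators and with a separator that isn't there. Variable names are only for illustration.

```python
before, separator, after = "awesome".partition("e")
# before is 'aw', separator is 'e', after is 'some'

# Only the first occurrence of the separator is used
first_split = "a-b-c".partition("-")    # ('a', '-', 'b-c')

# If the separator is missing, you get the whole string and two empty strings
not_found = "awesome".partition("z")    # ('awesome', '', '')
```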

Formatting strings with format

One of the most common string methods you'll use is the format method. This is a powerful method that can do all kinds of string manipulation (see here for an entire book on it), but it's most commonly just used to pass variables into strings. In general this is preferred over string concatenation, which can quickly get cumbersome if you're mixing a lot of variables with strings. For example:

first_name = "Matt"
last_name = "Lane"
city = "San Francisco"
mood = "great"

greeting = "Hi, my name is " + first_name + " " + last_name + ", I live in " + city + " and I feel " + mood + "."
greeting # 'Hi, my name is Matt Lane, I live in San Francisco and I feel great.'

Here, the greeting variable looks fine, but all that string concatenation isn't easy on the eyes. It's very easy to forget about a + sign, or to forget to separate words with extra whitespace at the beginning and end of our strings.

This is one reason why format is nice. Here's a refactor:

greeting = "Hi, my name is {} {}, I live in {} and I feel {}.".format(first_name, last_name, city, mood)

When we call format on a string, we can pass variables into the string! The values are inserted in order, one for each set of curly braces that format finds.

If the empty curly braces seem a little weird, we can also give each placeholder a name and pass in the values as keyword arguments:

greeting = "Hi, my name is {first} {last}, I live in {my_city}, and I feel {my_mood}.".format(
    first=first_name, 
    last=last_name,
    my_city=city, 
    my_mood=mood
)

This approach can be helpful if you want to pass in the same variable multiple times in your string:

new_greeting = "Hi, my name is {name}. My friends call me {name}. You can call me {name}.".format(name=first_name)
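
A couple more things format can do, as a rough sketch: placeholders can refer to arguments by position, and a format spec after a colon controls things like decimal places and padding. The item and price values here are made up for the example.

```python
first_name = "Matt"
last_name = "Lane"

# Numbers inside the braces pick arguments by position, and can repeat
by_index = "{1}, {0} {1}".format(first_name, last_name)   # 'Lane, Matt Lane'

# A format spec after the colon controls how the value is displayed
rounded = "{:.2f}".format(3.14159)    # '3.14'
padded = "{:>6}".format("hi")         # '    hi', right-aligned in 6 characters

# Names and format specs can be combined
price_line = "{item}: ${price:.2f}".format(item="coffee", price=3.5)   # 'coffee: $3.50'
```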

When you're ready, move on to Boolean Logic
