Operations and Data
Variables
A program is a sequence of operations. An operation is one word for what is also called a command, a step, an action, or many other names. The reason for the ambiguity in name is that an operation can range from something very simple (like adding 1 + 1) to something very complex, like a complete program made of thousands of simpler operations.
Operations either check data in the program or manipulate it. Data in the program is stored in variables. A variable is a data-storing entity: a block of memory in which data can be stored. The variable name is an alias for the address of that storage space. Through the name we can access the storage space and see what is stored there. The variable value is the data that is stored. Many operations either check the data in a variable or set the data in a variable to a new value.
Variable Declarations
Before you can use a variable in your program you must declare it. Nearly every language has a way to declare variables, and only a few allow implicit definition (use without declaration). A variable declaration is a statement that states the variable’s name and type unambiguously and may also set an initial value. This means a type must be included (sometimes this is implicit) and a name selected that is unique relative to other names used in the program.
Variable Names
Naming your variables is an art. Many conventions have been suggested. The challenge with naming is picking a concise name that carries the purpose of the variable. Often we want to name the variable quickly, and common bad choices of names are “x”, “i”, “num”, “var”, and so on. They pass the concise test but fail badly at capturing the purpose of the variable. If a variable is intended to count something then “count” is a better name than “x”. An even better name might tell us what we are counting, like “nameCount” or “accessCount”. Notice that the names I’ve selected are a bit strange. Programming languages often place restrictions on what names you may use, and a very common one is that you cannot use spaces in your names. Squishing multiple words together is hard to read, so we either use an underscore “_” in place of a space or we capitalize the first letter of every word after the first. You may see other conventions used as well. I have not capitalized the first letter because capitalized variable names often carry a meaning in a programmer’s conventions (for instance, it might signify a constant variable). Sometimes a variable might be preceded by an underscore “_” to mean something else. As you program you will adopt the conventions of the programmers around you and also develop your own style.
Variable Operations
When a value is assigned to a variable we call the operation an assignment. Another common name for an assignment is a “set” operation. Assignments set the variable’s value to the new value stated. Often when a variable is created it is assigned an initial value and this is called initialization. It is good programming style to always initialize your variables when you create them. Failure to initialize a variable before use can lead to errors in your program.
Some variables will have their initial value for the duration of the program. We call these constant variables. Other variables will change their values through assignment operations. The assignments may be to store the intermediate results of our calculation, the input data from the user, or the output to be presented to the user. The ability for a variable to change its value is called mutability and assignment operations are called mutation operations.
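As a minimal sketch of these ideas, the Python snippet below declares a variable with an initial value, reassigns it, and marks another variable as a constant. The names access_count and MAX_USERS are made up for illustration. Python's type annotations are optional hints and the all-capitals constant is only a convention, so other languages will express (and enforce) these ideas differently.

```python
# Declaration with initialization: a name, a type hint, and an initial value.
access_count: int = 0

# Assignment (a "set" operation) replaces the stored value with a new one.
access_count = access_count + 1
access_count = access_count + 1

# A constant by convention: written in capitals and never reassigned.
MAX_USERS: int = 100

print(access_count)  # 2
print(MAX_USERS)     # 100
```

Here access_count is mutable (its value changes twice), while MAX_USERS keeps its initial value for the whole program.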
Other than assignments, variables can be used in specific operations that usually depend on the type of the variable. Variables with number values can be added, subtracted, and so on. Each data type has its own collection of operations defined on it. Modern programming languages come with a wide variety of operations on the types of data listed below. In the rare case where an operation does not already exist in the programming language, new operations can be defined by the programmer.
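The following Python sketch illustrates the point with two built-in types and one programmer-defined operation. The function name average is a hypothetical example, not something the language provides under that name here.

```python
# Number values support arithmetic operations.
total = 7 + 5        # 12
difference = 7 - 5   # 2

# Text values have their own operations, such as joining and repeating.
greeting = "good" + "bye"   # "goodbye"
echo = "ha" * 3             # "hahaha"


# A programmer-defined operation: the average of two numbers.
def average(a: float, b: float) -> float:
    return (a + b) / 2


print(average(4, 10))  # 7.0
```

Note that the same symbol (+) means addition for numbers and joining for text, which is one way the available operations depend on the data type.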
Data Types
Variables don’t just have names and values. In most programming languages variables have data types as well. A data type describes the kinds of data that can be stored in the variable’s storage. Common simple data types are numbers, characters, text, and true/false (known as Boolean). Each type permits one kind of data to be stored in variables of that type while prohibiting other kinds. In most languages if you confuse data types (for example, try to assign text to a number variable) you will get an error.
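A small Python sketch of a type confusion follows. Python attaches types to values and checks them when the program runs, so the error here appears when a number and text are mixed in one operation rather than at assignment; many other languages would reject the mistake earlier. The variable names are illustrative only.

```python
item_count = 3      # a number
label = "apples"    # text

try:
    # Mixing a number with text in one addition is a type error in Python.
    message = item_count + label
except TypeError as error:
    message = "type error: " + str(error)

# Converting explicitly states the intent and avoids the error.
fixed = str(item_count) + " " + label

print(message)
print(fixed)  # 3 apples
```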
Numbers
A variable of the type number will only store numbers. Depending on the language numbers might all be treated the same, or there may be many different kinds of numbers. Specifically a number might be a whole number (also called a counting number) – we use number variables a lot to count things in our program. However, numbers are also used often to capture data that measures something.
Measurement data often differs from counting data in that the range of measurement data can span the entire range of numbers. Counting data commonly ranges from one (or zero) to some positive number possibly in the millions or even billions, but rarely larger (these days). On the other hand, measurement data might be very small (less than one in a trillion) or very large (millions of billions of billions). Measurement data usually contains decimals, and may need to approximate irrational numbers. In rare cases special numbers like complex numbers are needed for scientific programs.
A common number type you’ll encounter is integer (sometimes known as int). Integer variables can only store integer values, usually within a fixed range. A common range is [-N … N-1] where N is a power of 2. The size of N depends on the language, the machine, and the variable type selected. These days a common value of N is 2^31 (2,147,483,648 or about 2 billion) for 32-bit integers or 2^63 (9,223,372,036,854,775,808 or about 9 billion billion or 9 quintillion) for 64-bit integers. Certain languages will allow you to select different values of N. In rare cases we will need a variable whose range of values is unbounded. When those cases arise we can use more sophisticated number types than the basic integer.
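The ranges above can be checked with a few lines of Python. This is only a sketch: the names N_32, N_64, and fits_in_32_bits are invented for illustration, and Python's own int type happens to be one of the unbounded number types, so the limits are computed here rather than enforced by the language.

```python
# N for 32-bit and 64-bit signed integers.
N_32 = 2 ** 31
N_64 = 2 ** 63

# The range [-N ... N-1] for each size.
int32_min, int32_max = -N_32, N_32 - 1
int64_min, int64_max = -N_64, N_64 - 1


def fits_in_32_bits(value: int) -> bool:
    """Return True if value lies in the 32-bit range [-N ... N-1]."""
    return int32_min <= value <= int32_max


print(int32_min, int32_max)            # -2147483648 2147483647
print(fits_in_32_bits(2_000_000_000))  # True
print(fits_in_32_bits(3_000_000_000))  # False

# Python's int is unbounded, so going past the 64-bit limit still works.
big = int64_max + 1
print(big)  # 9223372036854775808
```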
In contrast to the integer is the floating point number, which is used when we need to capture non-integer data. These variables can have values in a vast range, but the trade-off is that not every number can be represented.
The format for storing a floating point number is similar to scientific notation, where every number is expressed as a pair of numbers: first a base number in the range 0 to 1 (usually not inclusive of 1), and second an exponent representing the power of 10 used to convert the base into the number stored. The number stored is then base * 10^exponent. (Most hardware actually uses powers of 2 rather than 10, but the idea is the same.) To use a floating point number we do not need to know the details of the storage, except to understand that this method means we only ever store a fixed number of significant digits.
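To make the pair-of-numbers idea concrete, here is a short Python sketch using the standard math.frexp function. Python's floats, like most hardware floats, use a power of 2 where the description above uses a power of 10, so the pair returned is a base in the range 0.5 to 1 and an exponent of 2.

```python
import math

# Split 6.0 into a base and an exponent such that 6.0 == base * 2**exponent.
base, exponent = math.frexp(6.0)
print(base, exponent)  # 0.75 3

# Rebuilding the number from the pair gives back the original value.
rebuilt = base * 2 ** exponent
print(rebuilt)  # 6.0
```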
The numbers that are represented are commonly rounded off and can suffer from loss of precision after many operations. As a result, a number that should be 0 might instead be represented as 0.0000000102. It is important to always remember when using variables of this type that these kinds of errors can occur. So while floating point numbers give us access to a much broader range of numbers, because of precision errors we tend to use them only when necessary, opting for the clean operations of integers when possible.
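A well-known instance of this rounding can be reproduced in Python. The sketch below adds 0.1 and 0.2, shows that the result is not exactly 0.3, and then shows two common ways to cope: comparing with a tolerance (math.isclose) or switching to whole-number arithmetic. Counting in cents is just one illustrative choice of integer unit.

```python
import math

total = 0.1 + 0.2
print(total)  # 0.30000000000000004

# An exact comparison fails because of the tiny rounding error.
exactly_equal = (total == 0.3)            # False

# Comparing within a tolerance is the usual remedy for floats.
close_enough = math.isclose(total, 0.3)   # True

# Integers avoid the issue entirely, e.g. counting cents instead of dollars.
total_cents = 10 + 20                     # exactly 30
```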
Characters
A variable of type character (sometimes char) will store any one of a number of special characters. Usually these are characters from a specific preselected set of characters. The most commonly used character set is ASCII, a set of 128 standard characters (often extended to 256 by other encodings). Many English language documents can still be stored entirely as ASCII characters. Other documents, especially those that use special symbols or characters from different alphabets, use the more modern Unicode set of characters.
The way that character encoding works is by assigning each character in the table a number. The number is what is actually stored in the computer, but when the character variable is accessed the program knows to treat that number not as a number, but as the character with that number.
For instance, in the ASCII character encoding scheme the character ‘a’ is assigned the number 97. You will notice this is a different number than the character ‘A’ which is assigned the number 65. This is because though these are the same letters, they are different characters. The uppercase and lowercase letters are distinct characters and thus they get different numbers on the character table.
Another good example is that the character ‘1’ has the number 49. Each of the numerals is both a character and a number, but as characters they need a place on the character table. So the ten standard numerals appear on the table at numbers 48 to 57. However, only these ten numerals appear on the character table. If you want to make the number ten out of characters you will need two characters, a ‘1’ and a ‘0’.
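The character numbers mentioned above can be checked with Python's built-in ord and chr functions. Python has no separate character type, so in this sketch a character is simply a piece of text of length one; the variable names are illustrative.

```python
# ord gives the number assigned to a character.
code_lower_a = ord("a")   # 97
code_upper_a = ord("A")   # 65
code_one = ord("1")       # 49

# chr goes the other way; numbers 48 to 57 are the ten numerals.
numerals = [chr(n) for n in range(48, 58)]
print(numerals)  # ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

# The number ten needs two characters, a '1' followed by a '0'.
ten = "1" + "0"
print(ten, len(ten))  # 10 2
```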
The alphabet (upper and lower case) and the numerals are not the only characters on the table. Punctuation, like periods, commas, quotes, and the like, also make an appearance. However many of the symbols on the standard tables are special symbols that get infrequent usage, but are there for special tasks in case we need them.
One special character of interest is the null character. The null character is not really a character at all but instead a symbol that stands in for no character at all. In many ways the null character serves the same role as the zero numeral in math. When you use or encounter the null character it is really symbolizing that there is no character there.
In most languages if you want to stress that you mean the character over some other data type you will use single quotes. This is common in language independent presentation of algorithms too. For instance, if we mean to say the character (the numeral) 1 then we must use single quotes and say ‘1’, otherwise, it is common to interpret it as the number 1. Programmers and programming languages alike prefer this unambiguous convention.
As we will see this remains the convention for all characters even when speaking of characters that aren’t always ambiguous like the letter A. If we wish to stress we mean it as a character we should say ‘a’ or ‘A’ as opposed to just a or A. If we fail to do so in a program then we are likely to get an error, but also in discussion it helps us stay clear about what we are discussing.
Text
Characters are really only an interesting data type if we can string them together to form words, sentences, paragraphs and other collections of characters. A variable of the type text will be a collection of zero or more characters from some character set. We sometimes call such a collection a string of characters. There is a first character in the string and a last character (sometimes a special last character marking the end). Each position in the string is important and the order of the characters matters. That is what makes the string “dog” different from the string “god”.
Strings are actually our first example of a data structure, that is, data that has not just a value but also structure. The string “dog” and the string “god” use the same characters in a different structure (in this case order) to achieve a different value for the string variable. A string is an example of a list structure. Specifically a string is a list of characters.
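A brief Python sketch of the same observation: the two strings contain the same characters, yet they are different strings because the order differs. Sorting the characters is used here only as a convenient way to compare contents while ignoring order.

```python
word = "dog"

# A string can be viewed as a list of characters, each with a position.
letters = list(word)
print(letters)  # ['d', 'o', 'g']

first, last = word[0], word[-1]   # 'd' and 'g'

# Same characters, ignoring order...
same_characters = sorted("dog") == sorted("god")   # True

# ...but different strings, because order is part of the value.
same_string = "dog" == "god"                       # False
```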
Strings are used to store many kinds of data but are used most often for text as we know it, that is text representing language of some kind. A substring of a string value is any consecutive piece of the string. For example the string “dog” is a substring of the string “the dog jumped over the fox” but not a substring of “don’t go there” (‘d’, ‘o’, and ‘g’ all appear in the string but not consecutively). A substring that begins with the first character of a string is called a prefix. A substring that ends with the last character of a string is called a suffix.
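The substring, prefix, and suffix examples can be tried directly in Python, which provides the in operator and the startswith and endswith string methods for these checks. The variable names below are illustrative.

```python
sentence = "the dog jumped over the fox"

# "dog" appears as a consecutive piece of the first string only.
has_dog = "dog" in sentence                  # True
other_has_dog = "dog" in "don't go there"    # False

# A prefix starts at the first character; a suffix ends at the last.
is_prefix = sentence.startswith("the dog")   # True
is_suffix = sentence.endswith("the fox")     # True
not_prefix = sentence.startswith("dog")      # False
```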
As with characters we use special notation to be clear that we are speaking about text. The double quote signifies a text or string value. “the dog jumped over the fox” is unambiguously a string when we contain it in double quotes. “dog” is a string but if you used dog (without quotes) in a program it would be interpreted as a variable name not as a text value. Notice this means there is a difference between ‘I’ and “I”, the first being a character and the second being a string.
Boolean
A boolean variable is a variable that can only take two values. It can also be called a binary variable because of this. In this case we sometimes say the two values it can take are zero and one making it a single bit. The most common values associated with the boolean variable are true and false.
When we use a boolean variable in our programs it is usually used as either a flag or a check. A flag is a variable that, like a light switch, is either on (true) or off (false). Flags often represent different options that may be set before running the program by the programmer or by a user. The flags are then used to control the program’s behavior from run to run. A check variable is used to check whether a condition holds or not. The condition is usually a condition on other variables and their values. Checks commonly arise in regular programming and decision making as a part of branch operations.
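The sketch below shows both uses in Python. The names verbose, is_freezing, and describe are hypothetical: verbose plays the role of a flag chosen before the run, and is_freezing is a check computed from another variable and then used in a branch.

```python
def describe(temperature: float, verbose: bool) -> str:
    # A check: a boolean computed from a condition on another variable.
    is_freezing = temperature <= 0.0

    # The check drives a branch operation.
    if is_freezing:
        result = "freezing"
    else:
        result = "not freezing"

    # The flag switches optional behavior on or off.
    if verbose:
        result = result + " (" + str(temperature) + " degrees)"
    return result


print(describe(-5.0, False))  # freezing
print(describe(20.0, True))   # not freezing (20.0 degrees)
```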