DBest Reads: Assembly Language: Simple, Short, and Straightforward Way of Learning Assembly Programming


Chapter 1 - Introduction to Assembly Language

This chapter presents a brief introduction to assembly language and the basic concepts needed to start programming in assembly to solve computer science problems.

What is Assembly Language?

Assembly language is a low-level programming language and the most basic one available for any processor. In assembly, programmers work directly with the operations implemented by the computer's processor. Because of this, assembly lacks the conveniences of high-level programming languages and is far removed from human language. With assembly language, instructions such as JMP @X, MOV AH, 09D, JNZ, DEC and many others are the code that programmers deal with. At first glance, this code does not look like English words, so learning assembly language can be quite a challenge for beginners. These are mnemonic codes that represent instructions, and, unlike in other programming languages, a single mnemonic such as MOV can stand for several different machine instructions depending on its operands, which makes assembly more difficult.

Every computer has two things at its foundation: a CPU and memory. Together, these two make computer programs run. The CPU reads numbers (machine instructions) one at a time, decodes them, and does what they say, while the memory acts as the temporary storage location for the CPU. In assembly programming, programmers deal with instructions implemented directly on the CPU and memory.

CPU REGISTERS

In a 32-bit CPU register, the upper (leftmost) 16 bits can be reached only through the full 32-bit register, while the lower (rightmost) 16 bits form the 16-bit register. The 16-bit part is further divided into two parts, the higher 8 bits and the lower 8 bits. Figure 1.1 illustrates this leftmost and rightmost composition. All sample assembly code and programs in this book use only the 16-bit registers.

Figure 1.1 - Leftmost and rightmost composition of a register

1. General Purpose Register

Figure 1.2 presents the general purpose registers which are available to the programmer.

Figure 1.2 - General purpose registers

As presented in figure 1.2, AX/EAX and DX/EDX are both used for input/output operations, and AX/EAX and BX/EBX are also used for arithmetic computation. Note that AX, BX, CX, and DX are the 16-bit versions and EAX, EBX, ECX, and EDX are the 32-bit versions. Figure 1.3 presents the 32-bit and 16-bit areas of the general purpose registers.

Figure 1.3 - 32 bit and 16 bit composition

As presented in figure 1.3, each 16-bit register is composed of two parts, the higher 8 bits and the lower 8 bits. Register AX comprises AH (higher 8 bits) and AL (lower 8 bits); likewise, BX comprises BH and BL, CX comprises CH and CL, and DX comprises DH and DL.

2. Segment Registers

Segments are specific areas defined in an assembly program to hold data, code, and the stack. There are three main segments, each with its own segment register:

2.1 Code Segment or CS Register contains all the instructions to be executed. A 16-bit Code Segment register stores the starting address of the code segment;

2.2 Data Segment or DS Register contains data, constants and work areas. A 16-bit Data Segment register stores the starting address of the data segment; and

2.3 Stack Segment or SS Register contains data and return addresses of procedures or subroutines. It is implemented as a 'stack' data structure. The Stack Segment register stores the starting address of the stack.

3. Pointer Registers

The 32-bit pointer registers are EIP, ESP, and EBP, and their 16-bit counterparts are IP, SP, and BP. There are three pointer registers (the following focuses on the 16-bit versions only):

3.1 Instruction Pointer (IP) is the 16-bit register that stores the offset address of the next instruction to be executed. IP in association with the CS register (as CS:IP) gives the complete address of that instruction in the code segment.

3.2 Stack Pointer (SP) is the 16-bit register that provides the offset value within the program stack. SP in association with the SS register (SS:SP) refers to the current position of data or address within the program stack.

3.3 Base Pointer (BP) is the 16-bit register that mainly helps in referencing the parameter variables passed to a subroutine. The address in the SS register is combined with the offset in BP to get the location of the parameter. BP can also be combined with DI and SI as a base register for special addressing. Figure 1.4 presents the Pointer Registers.

4. Index Registers

The 32-bit index registers are ESI and EDI, and their 16-bit counterparts are SI and DI. Index registers are used for indexed addressing and are sometimes used in addition and subtraction. The two index registers are:

4.1 Source Index (SI) is used as the source index for string operations.

4.2 Destination Index (DI) is used as the destination index for string operations.

Figure 1.5 presents the Index Registers.

5. Control Registers

The 32-bit instruction pointer register and the 32-bit flags register together are considered the control registers. Many instructions that involve comparisons and arithmetic change the status flags, and conditional instructions test the values of these flags to transfer control flow to another location. The common flag bits are:

5.1 Overflow Flag (OF) indicates the overflow of a high-order bit (leftmost bit) of data after a signed arithmetic operation.

5.2 Direction Flag (DF) determines the left or right direction for moving or comparing string data. When DF is 0, string operations go left to right (SI and DI are incremented); when DF is 1, they go right to left (SI and DI are decremented).

5.3 Interrupt Flag (IF) determines whether external interrupts, such as keyboard entry, are ignored or processed. External interrupts are disabled when IF is 0 and enabled when it is 1.

5.4 Trap Flag (TF) puts the processor in single-step mode. Debuggers such as the DOS DEBUG program set the trap flag so that execution can be stepped through one instruction at a time.

5.5 Sign Flag (SF) shows the sign of the result of an arithmetic operation. It is set according to the sign of the data item after the operation, which is indicated by the high-order (leftmost) bit. A positive result clears SF to 0 and a negative result sets it to 1.

5.6 Zero Flag (ZF) indicates the result of an arithmetic or comparison operation. A nonzero result clears the zero flag to 0, and a zero result sets it to 1.

5.7 Auxiliary Carry Flag (AF) contains the carry from bit 3 to bit 4 following an arithmetic operation; used for specialized arithmetic. The AF is set when a 1-byte arithmetic operation causes a carry from bit 3 into bit 4.

5.8 Parity Flag (PF) indicates whether the number of 1-bits in the low-order byte of an arithmetic result is even or odd. An even number of 1-bits sets the parity flag to 1, and an odd number of 1-bits clears it to 0.

5.9 Carry Flag (CF) contains the carry of 0 or 1 out of the high-order (leftmost) bit after an arithmetic operation. It also stores the last bit shifted or rotated out by a shift or rotate operation.

The example programs in this book do not use the flags listed above.

Chapter 2 - Our First Assembly Program

This chapter gives a quick look at coding instructions in assembly. It covers installing the assembler, writing code, and running our very first assembly program.

Assembly Program Structure

Assembly language code is not case sensitive: the instructions MOV and mov mean the same thing. In this book, all reserved words and labels are written in uppercase and all other user-defined names in lowercase for readability.

Fundamentals of Assembly Instructions

An instruction is a statement that is executed by the processor at runtime after the program has been loaded into memory and started. An instruction format contains three basic parts:

1. Mnemonic Code (required)

2. Operand(s) (usually required)

3. Comment (optional)

Figure 2.1: Assembly instructions usual format

Mnemonic Code

The mnemonic code is a short word that identifies the operation carried out by an instruction.

Operands

An instruction can have between zero and three operands, each of which can be a register, a memory operand, a constant expression, or an I/O port.

Comments

Comments are short descriptions of the code's purpose. They are a way for the programmer to tell a person reading the source code how the code works.

Example 1: MOV AH,9D ;Displays string

Explanation: MOV is an example of a mnemonic code instruction in assembly. The MOV instruction requires two operands separated by a comma: AH is the first operand and 9D is the second operand. ;Displays string is the comment. Comments are ignored by the assembler and do not affect the output of the program; they serve as internal documentation of what the code does. Adding comments is good practice in assembly, because assembly code does not usually read like English words.

Example 2: INT 21H ;Calls DOS service

Explanation: INT is another example of a mnemonic code instruction in assembly. The INT instruction requires only one operand; in this example the operand is 21H. ;Calls DOS service is the comment.

The MOV instruction in Assembly

The mnemonic MOV is shorthand for the word "Move", since the instruction moves data from a source to a destination. However, MOV does not actually move the value from the source to the destination; it copies the value of the source into the destination, and the source keeps its value. The syntax is presented in figure 2.2.

Figure 2.2: Format of the MOV instruction

In figure 2.2, the destination can either be a register or memory address, and the source can be a register, memory address, or immediate value.

Example: MOV AH, 9D

Explanation: The instruction copies the source value 9D (D means decimal) into the destination register AH, so AH now holds the value 9D.

The INT instruction in Assembly

INT stands for interrupt. INT is an assembly language instruction for x86 processors that generates a software interrupt. Figure 2.3 presents the syntax of the INT instruction.

Figure 2.3: Format of the INT instruction

Example: INT 21H

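Explanation: The example calls software interrupt 21H (H means hexadecimal), which is the entry point for DOS services. The service DOS performs depends on the value in AH: for example, function 09H displays a string, and function 4CH terminates the program and returns control to DOS. Other software interrupts and mnemonic codes are discussed in the succeeding chapters of this book.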

The assembly program structure is best explained with an example. Figure 2.4 is a sample program that prints the text "Hello World!" on the screen.

Figure 2.4 - A program that prints "Hello World!"

The explanation that follows refers to the lines of this listing by number, counting .MODEL SMALL as Line 1; these line numbers are not part of the program code.

;<Code listings in Figure 2.4 - START>

.MODEL SMALL
.STACK
.DATA
Message DB "Hello World!$"
.CODE
MAIN:                           ; Below are comments
        MOV DX,OFFSET Message   ; Offset of Message is in DX
        MOV AX,SEG Message      ; Segment of Message is in AX
        MOV DS,AX               ; DS:DX points to string
        MOV AH,9D               ; Function 9D displays string
        INT 21H                 ; Calls DOS service
        MOV AH,4CH              ; Code to terminate the program
        INT 21H                 ; Calls DOS service
END MAIN

;<Code listings in Figure 2.4 - END>

Explanation of code listings in figure 2.4

The semicolon (;) marks a comment: anything that follows it on a line is ignored by the assembler, so it does not affect the output of the program.

Line 1: .MODEL

It is an assembler directive that defines the memory model used by the program. In essence, the memory model determines how large the program's code and data can be and whether they are addressed as NEAR or FAR; the bigger the program, the bigger the model that should be defined. The different memory models in assembly are:

a. TINY. There is only one segment for both code and data. A program using this model can be assembled into a .COM file;

b. SMALL. By default all code is placed in one segment, and all data declared in the data segment is placed in another single segment, so all procedures and variables are addressed as NEAR, by offset only. SMALL is the model used for all sample programs in this book;

c. COMPACT. By default all code is placed in one segment, but each data element can be placed in its own physical segment, so data elements are addressed by both their segment and offset addresses. Code elements (procedures) are NEAR and variables are FAR;

d. MEDIUM. This is the opposite of COMPACT. Data elements are NEAR and procedures are FAR;

e. LARGE. Both procedures and variables are FAR, so both the segment and offset addresses are needed to reach them; and

f. FLAT. This model is rarely used in DOS programs because it is for a 32-bit unsegmented memory space, which requires a DOS extender. It is the model to use when writing a program that interfaces with a C/C++ program built with a DOS extender such as DOS4GW or PharLap.

Line 2: .STACK

It is an assembler directive that reserves memory for the program's stack (when no size is given, MASM reserves 1,024 bytes). The stack is used by stack-based operations, which are discussed in the succeeding chapters of this book.

Line 3: .DATA

It marks the start of the data segment, which also ends the preceding segment (here, the stack segment). This is where storage is declared and/or given a value, similar to variable declaration and definition (assigning a value to a variable) in a high-level language.

Line 4: Message DB "Hello World!$"

Message is the user-defined variable name, declared with the memory directive (define directive) DB, which stands for Define Byte (used for all programs in this book); the value assigned to Message is "Hello World!". The $ (dollar sign) is the string terminator, so it is not printed on the screen. If the $ is left out, DOS keeps printing whatever bytes follow the string in memory until it happens to reach a $, so random characters appear on the screen. The list of memory directives (define directives) is presented in table 2.1.