Assembly Language: Simple, Short, and Straightforward Way of Learning Assembly Programming
Chapter 1 - Introduction to Assembly Language
This chapter presents a brief introduction to assembly and the basic concepts needed to start assembly programming to solve computer science related problems.
What is Assembly Language?
Assembly language is a low-level language and the most basic programming language available for any processor. In assembly, programmers work directly with the operations implemented on the computer's processor. Because of this, assembly lacks the conveniences of high-level programming languages and reads far less like human language than they do. In assembly language, instructions such as JMP @X, MOV AH, 09D, JNZ, DEC and many others are the codes that programmers deal with. At first glance, these codes do not look like English words, so learning assembly language can be quite a challenge for beginners. They are mnemonic codes that represent instructions and, unlike in other programming languages, a single mnemonic in assembly can stand for several different machine instructions depending on its operands, which makes it more difficult.
At its foundation, any computer has two things: a CPU and some memory. Together, these two make computer programs run. The CPU reads numbers (the machine instructions of a program) from memory one at a time, decodes them, and does what the numbers say, while the memory acts as the temporary storage location for the CPU. In assembly programming, programmers deal with instructions implemented directly on the CPU and memory.
In a 32-bit CPU register, the upper 16 bits, which exist only in the 32-bit version, form the leftmost part of the register, and the 16-bit register is the rightmost part. The 16-bit part is further divided into two parts, the higher 8 bits and the lower 8 bits. Figure 1.1 presents the leftmost and rightmost illustration. All sample assembly code and programs in this book use only the 16-bit registers.
Figure 1.1 - Leftmost and rightmost composition of a register
1. General Purpose Register
Figure 1.2 presents the general
purpose registers which are available to the programmer.
Figure 1.2 - General purpose registers
As presented in figure 1.2, AX/EAX and DX/EDX are both used for input/output operations, and AX/EAX and BX/EBX are also used for arithmetic computation. Note that AX, BX, CX, and DX are the 16-bit versions and EAX, EBX, ECX, and EDX are the 32-bit versions. Figure 1.3 presents the 32-bit and 16-bit areas of the general purpose registers.
Figure 1.3 - 32 bit and 16 bit composition
As presented in figure 1.3, the 16-bit registers are composed of two parts, the higher 8 bits and the lower 8 bits. The register AX consists of AH (higher 8 bits) and AL (lower 8 bits), and likewise for the remaining 16-bit registers BX, CX, and DX.
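The relationship between AX, AH, and AL can be seen in a small sketch. The program below is only an illustration and is not taken from the book; it assumes TASM-style syntax and the SMALL model, and the values are arbitrary. Stepping through it in a debugger shows that writing to AL or AH changes only that half of AX.

```asm
.MODEL SMALL
.STACK 100H
.CODE
MAIN:
    MOV AX, 1234H    ; AX = 1234H, so AH = 12H and AL = 34H
    MOV AL, 0FFH     ; only the lower 8 bits change: AX = 12FFH
    MOV AH, 00H      ; only the higher 8 bits change: AX = 00FFH
    MOV AH, 4CH      ; DOS function 4CH: terminate the program
    INT 21H
END MAIN
```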
2. Segment Registers
Segments are specific areas defined in an assembly program for containing data, code and stack. There are three main segments, each with its own segment register:
2.1 Code Segment (CS register). The code segment contains all the instructions to be executed. The 16-bit CS register stores the starting address of the code segment;
2.2 Data Segment (DS register). The data segment contains data, constants and work areas. The 16-bit DS register stores the starting address of the data segment; and
2.3 Stack Segment (SS register). The stack segment contains data and return addresses of procedures or subroutines. It is implemented as a 'stack' data structure. The 16-bit SS register stores the starting address of the stack.
3. Pointer Registers
The 32-bit pointer registers are EIP, ESP, and EBP, and their 16-bit counterparts are IP, SP, and BP. There are three pointer registers (the following focuses on the 16-bit versions only):
3.1 Instruction Pointer (IP) is the 16-bit register that stores the offset address of the next instruction to be executed. IP in association with the CS register (as CS:IP) gives the complete address of that instruction in the code segment.
3.2 Stack Pointer (SP) is the 16-bit register that provides the offset value within the program stack. SP in association with the SS register (SS:SP) refers to the current position of data or an address within the program stack.
3.3 Base Pointer (BP) is the 16-bit register that mainly helps in referencing the parameter variables passed to a subroutine. The address in the SS register is combined with the offset in BP to get the location of the parameter. BP can also be combined with DI and SI as a base register for special addressing.
Figure 1.4 presents the Pointer Registers.
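As a rough illustration of how SP and BP address the stack through SS, consider the sketch below. It is not from the book: it assumes TASM-style syntax and the SMALL model, uses an arbitrary value, and relies on PUSH and POP, which the book covers in a later chapter.

```asm
.MODEL SMALL
.STACK 100H
.CODE
MAIN:
    MOV AX, 1234H
    PUSH AX          ; SP decreases by 2 and 1234H is stored at SS:SP
    MOV BP, SP       ; BP now holds the same stack offset as SP
    MOV BX, [BP]     ; [BP] is addressed through SS, so BX = 1234H
    POP CX           ; CX = 1234H and SP returns to its earlier value
    MOV AH, 4CH      ; DOS function 4CH: terminate the program
    INT 21H
END MAIN
```

Reading the stack through BP instead of SP leaves SP free to keep changing as more values are pushed, which is why BP is the usual choice for reaching subroutine parameters.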
4. Index Registers
The 32-bit index registers are ESI and EDI, and their 16-bit counterparts are SI and DI. Index registers are used for indexed addressing and are sometimes used in addition and subtraction. The two index registers are:
4.1 Source Index (SI) which is used as source index for string
operations.
4.2 Destination Index (DI) which
is used as destination index for string operations.
Figure 1.5 presents the Index
Registers.
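The source and destination roles of SI and DI can be sketched with a string copy. This example is an assumption-laden illustration rather than the book's code: the names src_text and dst_text are hypothetical, and it uses the standard x86 string instruction MOVSB, which copies one byte from DS:SI to ES:DI and then advances both index registers.

```asm
.MODEL SMALL
.STACK 100H
.DATA
src_text DB "ABCDE"          ; hypothetical source string
dst_text DB 5 DUP(?)         ; hypothetical 5-byte destination buffer
.CODE
MAIN:
    MOV AX, SEG src_text
    MOV DS, AX               ; DS:SI will address the source
    MOV ES, AX               ; ES:DI will address the destination
    MOV SI, OFFSET src_text  ; SI = source index
    MOV DI, OFFSET dst_text  ; DI = destination index
    MOV CX, 5                ; number of bytes to copy
    CLD                      ; direction flag = 0: left to right
    REP MOVSB                ; copy CX bytes from DS:SI to ES:DI
    MOV AH, 4CH              ; DOS function 4CH: terminate the program
    INT 21H
END MAIN
```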
5. Control Registers
The 32-bit instruction pointer register and the 32-bit flags register together are considered the control registers. Many instructions that involve comparisons and mathematical calculations change the status of the flags, and some conditional instructions test the value of these status flags to transfer the control flow to another location. The common flag bits are:
5.1 Overflow Flag (OF) indicates the overflow of a high-order bit
(leftmost bit) of data after a signed arithmetic operation.
5.2 Direction Flag (DF) determines left or right direction for moving
or comparing string data. When the DF value is 0, the string operation takes
left-to-right direction and when the value is set to 1, the string operation
takes right-to-left direction.
5.3 Interrupt Flag (IF) determines whether the external interrupts like
keyboard entry and others are to be ignored or processed. It disables the
external interrupt when the value is 0 and enables interrupts when set to 1.
5.4 Trap Flag (TF) allows setting the operation of the processor in
single-step mode. The DEBUG program we used sets the trap flag, so we could
step through the execution one instruction at a time.
5.5 Sign Flag (SF) shows the sign of the result of an arithmetic operation. This flag is set according to the sign of a data item following the arithmetic operation. The sign is indicated by the high-order (leftmost) bit. A positive result clears the value of SF to 0 and a negative result sets it to 1.
5.6 Zero Flag (ZF) indicates the result of an arithmetic or comparison
operation. A nonzero result clears the zero flag to 0, and a zero result sets
it to 1.
5.7 Auxiliary Carry Flag (AF) contains the carry from bit 3 to bit 4
following an arithmetic operation; used for specialized arithmetic. The AF is
set when a 1-byte arithmetic operation causes a carry from bit 3 into bit 4.
5.8 Parity Flag (PF) indicates whether the number of 1-bits in the low-order byte of the result of an arithmetic operation is even or odd. An even number of 1-bits sets the parity flag to 1 and an odd number of 1-bits clears it to 0.
5.9 Carry Flag (CF) contains the carry of 0 or 1 out of the high-order (leftmost) bit after an arithmetic operation. It also stores the last bit shifted out by a shift or rotate operation.
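A few of the flags listed above can be observed with 8-bit additions. The sketch below is illustrative only and not part of the book's programs; it assumes TASM-style syntax, and the flag values in the comments are what a debugger should display after each instruction.

```asm
.MODEL SMALL
.STACK 100H
.CODE
MAIN:
    MOV AL, 0FFH
    ADD AL, 1        ; AL = 00H: CF = 1 (carry out of bit 7), ZF = 1, SF = 0, OF = 0
    MOV AL, 7FH
    ADD AL, 1        ; AL = 80H: OF = 1 (signed overflow), SF = 1, ZF = 0, CF = 0
    STD              ; sets the direction flag: DF = 1
    CLD              ; clears the direction flag: DF = 0
    MOV AH, 4CH      ; DOS function 4CH: terminate the program
    INT 21H
END MAIN
```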
Example programs used in this
book will not cover the above mentioned flags.
Chapter 2 - Our First Assembly Program
This chapter gives a quick glimpse of coding instructions in assembly. It deals with assembler installation, writing code, and running our very first assembly program.
Assembly Program Structure
Code in assembly language is not case sensitive. The instructions MOV and mov mean the same thing. In this book, all reserved words and labels are coded in uppercase and all other user-defined names are in lowercase for readability.
Fundamentals of Assembly Instructions
An instruction is a statement
that is executed by the processor at runtime after the program has been loaded
into memory and started. An instruction format contains three basic parts:
1. Mnemonic Code (required)
2. Operand(s) (usually required)
3. Comment (optional)
Figure 2.1: Assembly instructions usual format
Mnemonic Code
The mnemonic code is a short word that identifies the
operation carried out by an instruction.
Operands
Operands are the values that an instruction works on. An instruction can have between zero and three operands, each of which can be a register, memory operand, constant expression, or I/O port.
Comments
Comments are short descriptions of the code's purpose. They are a way for the programmer to communicate information about how the code works to a person reading the source code.
Example 1:
MOV AH,9D ;Displays string
Explanation: MOV is an example of a mnemonic code in assembly. The instruction MOV requires two operands separated by a comma. AH is the first operand and 9D is the second operand. ;Displays string is the comment. Comments are ignored by the assembler and do not affect the output of the program. They serve as internal documentation, for the programmer's reference, of what the code does. It is good practice in assembly to add comments, as assembly code does not usually resemble English words.
Example 2:
INT 21H ;Calls DOS service
Explanation: INT is another example of a mnemonic code in assembly. The instruction INT requires only one operand, and in this example 21H is the operand. ;Calls DOS service is the comment.
The MOV instruction is shorthand for the word "Move". This makes a lot of sense, as the instruction appears to move a value from a source to a destination. However, MOV does not really move the value; instead, it copies the value of the source to the destination, leaving the source unchanged. The syntax is presented in figure 2.2.
Figure 2.2: Format of the MOV instruction
In figure 2.2, the destination can either be a register or
memory address, and the source can be a register, memory address, or immediate
value.
Example: MOV AH, 9D
Explanation: The example copies the source value 9D (D means decimal, so the value is 9) to the destination register AH. This means that AH now holds the value 9.
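To make the copy behaviour and the allowed source and destination combinations concrete, here is a minimal sketch. It is not taken from the book: the variable name total is hypothetical, and the program assumes TASM-style syntax with the SMALL model.

```asm
.MODEL SMALL
.STACK 100H
.DATA
total DW 0            ; hypothetical 16-bit variable
.CODE
MAIN:
    MOV AX, SEG total
    MOV DS, AX        ; DS now addresses the data segment
    MOV AX, 25        ; immediate value to register: AX = 25
    MOV BX, AX        ; register to register: BX = 25 and AX is still 25
    MOV total, BX     ; register to memory: total = 25
    MOV CX, total     ; memory to register: CX = 25
    MOV AH, 4CH       ; DOS function 4CH: terminate the program
    INT 21H
END MAIN
```

Note that AX still holds 25 after it is copied into BX, which is the sense in which MOV copies rather than moves.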
INT stands for interrupt. INT is
an assembly language instruction for x86 processors that generates a software
interrupt. Figure 2.3 presents the syntax of the INT instruction.
Figure 2.3: Format of the INT instruction
Example: INT 21H
Explanation: The example calls software interrupt 21H (H means hexadecimal), which is the DOS service. The service performed depends on the function number placed in AH beforehand; for example, function 4CH returns the program to DOS. Other software interrupts and mnemonic codes are discussed in the succeeding chapters of this book.
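The way AH selects the DOS service can be sketched with a second function number. The example below is an illustration and not the book's code; it relies on DOS function 02H, which displays the character held in DL, and assumes TASM-style syntax with the SMALL model.

```asm
.MODEL SMALL
.STACK 100H
.CODE
MAIN:
    MOV DL, 'A'      ; character to display
    MOV AH, 02H      ; DOS function 02H: display the character in DL
    INT 21H          ; calls the DOS service selected by AH
    MOV AH, 4CH      ; DOS function 4CH: terminate the program
    INT 21H          ; the same interrupt now performs a different service
END MAIN
```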
The assembly program structure is best explained with an example. Figure 2.4 is a sample program that prints the words "Hello World!" on the screen.
Note that the explanation after the listing refers to its lines by number (Line 1, Line 2, and so on); these line numbers are not part of the program code.
;<Code listings in Figure 2.4 - START>
.MODEL SMALL
.STACK
.DATA
Message DB "Hello World!$"
.CODE
MAIN:                       ; Below are comments
    MOV DX,OFFSET Message   ; Offset of Message is in DX
    MOV AX,SEG Message      ; Segment of Message is in AX
    MOV DS,AX               ; DS:DX points to string
    MOV AH,9D               ; Function 9D displays string
    INT 21H                 ; Calls dos service
    MOV AH,4CH              ; Code to terminate the program
    INT 21H                 ; Calls dos service
END MAIN
;<Code listings in Figure 2.4 - END>
Explanation of code listings in figure 2.4
The ";" (semicolon) starts a comment. Anything that follows the ";" on a line is ignored by the assembler, hence it does not affect the output of the program.
Line 1: .MODEL
It is an assembler directive that defines the memory model to use in the program. Basically, the memory model defines how big the program can be. The bigger the program, the bigger the model that should be defined. The different memory models in assembly are:
a. TINY. This means that there is only one segment for both code and data. This type of program can be a .com file;
b. SMALL. This means that by default all code is placed in one segment and all data declared in the data segment is also placed in one segment, which means that all procedures and variables are addressed as NEAR by pointing at offsets only. SMALL is the model used for all sample programs in this book;
c. COMPACT. This means that by default all elements of code are placed in one segment but each element of data can be placed in its own physical segment, which means that data elements are addressed by pointing at both the segment and offset addresses. Code elements (procedures) are NEAR and variables are FAR;
d. MEDIUM. This is the opposite of COMPACT. Data elements are NEAR and procedures are FAR;
e. LARGE. This means that both procedures and variables are FAR. Both the segment and offset addresses must be pointed at; and
f. FLAT. This isn't used much, as it is for 32-bit unsegmented memory space, which requires a DOS extender. It is the model to use when writing a program that interfaces with a C/C++ program that uses a DOS extender such as DOS4GW or PharLap.
Line 2: .STACK
It is an assembler directive that reserves memory space for the program's stack. This directive is used for stack-based programs, which are discussed in the succeeding chapters of this book.
Line 3: .DATA
It indicates that the data segment starts here and that the stack segment ends there. This directive is where we declare storage and/or assign values to it, similar to variable declaration and definition (assigning a value to a variable) in a high-level language.
Line 4: Message DB "Hello World!$"
Message is the (user-defined) variable name, declared with the memory directive (define directive) DB, which stands for Define Byte (used for all programs in this book), and the value assigned to the variable Message is "Hello World!". The $ (dollar sign) is a string terminator, which means that it will not be printed on the screen. If the $ is left out, random characters are printed on the screen after the message, since the end of the string is not defined.
The list of memory directives (define directives) is presented in table 2.1.
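The role of the $ terminator can be seen by printing two DB strings one after the other. This is a sketch under stated assumptions, not the book's listing: the names msg_hello and msg_world are hypothetical, and the program reuses DOS function 9 and the SEG/OFFSET pattern from figure 2.4.

```asm
.MODEL SMALL
.STACK 100H
.DATA
msg_hello DB "Hello$"          ; printing stops at the $, which is not shown
msg_world DB " World!$"
.CODE
MAIN:
    MOV AX, SEG msg_hello
    MOV DS, AX                 ; both strings are in the same data segment
    MOV DX, OFFSET msg_hello
    MOV AH, 9                  ; DOS function 9: display string at DS:DX
    INT 21H                    ; prints Hello
    MOV DX, OFFSET msg_world
    MOV AH, 9
    INT 21H                    ; prints " World!" right after it
    MOV AH, 4CH                ; DOS function 4CH: terminate the program
    INT 21H
END MAIN
```

If the $ after Hello were removed, the first call would keep printing into the bytes of msg_world until it reached the next $.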
The following are the entire contents:
1.1 What is Assembly Language?
1.2 CPU REGISTERS
1.2.1 General Purpose Register
1.2.2 Segment Registers
1.2.3 Pointer Registers
1.2.4 Index Registers
1.2.5 Control Registers
2.1 Assembly Program Structure
2.1.1 Fundamentals of Assembly Instructions
2.1.2 The MOV instruction in Assembly
2.1.3 The INT instruction in Assembly
2.1.4 Reserved words in assembly
2.2 Running our first Assembly program
2.2.1 How to install TASM?
2.2.2 Writing the Assembly Program Codes
2.2.3 Compiling (Assembling), Linking and Running the Program
3.1 Simplified Segment Directives
3.2 Output Routines
3.3 Input Routines
4.1 Introduction to Arithmetic Instruction
4.2 The ADD Instruction (Addition)
4.3 The SUB Instruction (Subtraction)
4.4 The INC Instruction (Increment)
4.5 The DEC Instruction (Decrement)
4.6 The IMUL and MUL Instructions (Multiplication)
4.7 The IDIV and DIV Instructions (Division)
4.8 Handling numeric data
4.8.1 Algorithm in printing 2 digit number
4.8.2 Algorithm in printing 3 digit number
4.8.3 Algorithm in accepting 2 digit number
4.8.4 Algorithm in accepting 3 digit number
5.1 Conditional Control
5.1.1 Conditional Jumps
5.1.2 Unconditional Jump
5.2 Loop Control
5.2.1 Conditional Loop
5.2.2 Counter Controlled Loop
6.1 What is Stack?
6.1.1 PUSH operation
6.1.2 POP operation
6.2 Stack Simulation
6.3 Stack Oriented Program
7.1 Defining a Procedure
7.2 Calling a Procedure
7.3 Procedure Oriented Program
8.1 OddEven Program
8.2 Legal Age Program
8.3 Alphabet Program
List of sample programs: