Lecture 1: Computer Operation & Abstraction Layers

Computer Operation

  • A computer executes machine instructions → simple as that
    • Can be thought of as a language plus a machine
  • These instructions can be very simple, low-level like
    • ADD R1, R5 → add registers 1 and 5, put the result in register 1
  • This simplifies hardware design
    • Easier and cheaper to implement simple machines
    • Makes it general-purpose → capable of running any program
  • Hard to write complex programs this way
    • Too much detail, very long programs, hard to debug
    • Instead, use higher-level languages

Bridging the Mismatch Between Humans and Computers

  • How to execute a high-level language (L1) on standard hardware?
    • Translate it to machine language (L0)
      • Replace the entire program with its machine language equivalent
    • Interpret it
      • Take one instruction in the high-level language and execute it on the spot
  • This process is done by compilers and interpreters
    • The end result is the same
  • Can think of this as a “virtual” machine M1 for L1
    • It executes L1 as far as the user is concerned
    • Does not matter if compiled or interpreted
    • The “computer” is the interpreter + hardware together
  • M1 hides the details of M0 → provides a new abstraction

Aside: Interpreters vs Compilers

  • Interpreters take code and execute the equivalent low-level instructions on the fly
    • Must re-interpret every time the program runs
    • Checks what type of statement it is every time → overhead adds up
  • Compilers translate the entire program once into lower-level code (e.g., assembly)
    • More efficient at runtime
    • Translate once, then every execution afterward runs native assembly/machine code

The M/L Hierarchy (Levels of Abstraction)

┌─────────────────────────────────────────────────────┐
│  Level 5 (M5 / L5)  │  High-Level Language (C, Java)│
├─────────────────────────────────────────────────────┤
│  Level 4 (M4 / L4)  │  Assembly Language            │
├─────────────────────────────────────────────────────┤
│  Level 3 (M3 / L3)  │  Operating System Machine     │
├─────────────────────────────────────────────────────┤
│  Level 2 (M2 / L2)  │  Instruction Set Architecture │
│                     │  (ISA) — machine code (0s/1s) │
├─────────────────────────────────────────────────────┤
│  Level 1 (M1 / L1)  │  Microarchitecture level      │
│                     │  (microprograms, ALU control) │
├─────────────────────────────────────────────────────┤
│  Level 0 (M0 / L0)  │  Digital Logic (transistors,  │
│                     │  gates, flip-flops)           │
└─────────────────────────────────────────────────────┘
  • Each level hides the complexity below it → same abstraction principle discussed in INF 43 (Wasserman 1996)
  • Assembly (L4) looks much simpler than raw microcode (L1) because each assembly instruction gets translated down through the layers
  • When people say “low-level language,” they usually mean L2 (ISA/machine code) or L4 (assembly) → anything below that is typically invisible to programmers and baked into the hardware

Hierarchy of Virtual Machines

  • L1 and L0 must not be too different
    • Otherwise translation or interpretation becomes impractical
    • Example: try going from English directly to machine language
  • But you can have a hierarchy of virtual machines
    • Each machine/language pair defines a level
  • Each new virtual machine
    • Hides details of the level below
    • Adds new, higher-level capabilities
  • This is a structured way of looking at computers

Layered / Structured View

  • Top-down from the highest, most abstract layer
  • A layer
    • Groups together similar elements
    • Abstracts away details of what’s below → they are not visible through this layer’s abstraction
  • This way of looking at a computer system
    • Simplifies understanding of the system
    • Hides complexity
  • At the bottom (for us) is hardware → transistors, etc.
    • In reality there are layers below that
    • (Fortunately for us) → EEs, physicists, chemists, and material scientists deal with them
  • Hardware drives the definition of many basic issues

Example of Hardware Influence

  • Digital hardware has 2 states (on/off)
  • Memory speed depends on its type and size → computers use a hierarchy of memories
    • A few registers (typically 32) → short-term storage
    • Main memory (512KB - 128MB) → intermediate storage
    • Disk (typically 100+ GB) → permanent storage
  • Where is a variable declared in your C program located?
    • Different places at different times → depends on the compiler
    • These details are hidden from the programmer by the intervening layers

Layer Breakdown: Top to Bottom

Application Program Layer (L5)

  • Written in a high-level language to solve a problem
    • A very abstract view of the computer
    • Named variables, if/case statements, loops, and functions
    • Not directly understandable by standard hardware
  • Requires translation to run on a computer system
    • Large mismatch between hardware and these high-level abstractions
    • Compilation into lower-level language (typically assembly or machine language)
      • Which is still not directly executed by hardware
      • Exception: Java is translated to bytecodes and interpreted
    • Uses function libraries supplied by compilers and the OS (printf(), log(), etc.)

Assembly Language Layer (L4)

  • Lower level of abstraction → more of the hardware directly exposed
    • Registers and memory are used explicitly
    • Operators (instructions) are much simpler
      • Cannot say A[I] := A[I] * B[I-3] anymore → need tedious details
      • Ends up using machine instructions like ADD Rx, Ry
    • Still higher than machine language (binary)
  • Views a computer as an Instruction Set Architecture
  • Has access to OS-level abstractions

Operating System Layer (L3)

  • Implements multiprogramming
    • Ability to run multiple programs “simultaneously”
    • Time-sharing the computer’s resources
  • Allows a privileged mode of execution
    • Direct access to certain hardware resources
  • Protection
    • System from user
    • User from others and themselves
  • Provides several major abstractions to a programmer
    • File systems
    • Virtual memory
    • Process creation and scheduling (including loading a program and initiating its execution)
    • I/O devices
    • Networking
  • Access to OS functions is through the system call interface

Instruction Set Architecture Layer (L2)

  • ISA is the assembly programmer’s view of a computer
  • Provides the following abstractions
    • Data types → 8, 16, 32-bit integers, ASCII characters, etc.
    • Operations (instructions) → ADD Rx, Ry, Rz which implements Rx ← Ry + Rz
    • Addressing modes
    • Memory model and addressing → word size, byte order, number of words
    • Registers

Microarchitecture Layer (L1)

  • Also called computer organization → describes
    • Major units of a system → ALU, register file, control unit, memory, caches, I/O
    • How they are interconnected → e.g., ALU is only connected to registers
    • Protocols for their communication
      • Synchronous (clock-based)
      • Asynchronous (handshake-based)
  • Abstracts away the digital logic and hardware layers
    • They contain the same information but at a much more detailed level, plus additional info like voltage levels and timing

Digital Logic Layer (L0)

  • Describes a computer at a gate level
    • Gates are the basic building blocks → AND, OR, NOT, 1-bit storage element
  • Shows gate-level implementation → an adder is described as a number of logic gates and their connections
  • Some hardware details are still abstracted away
    • Timing information is much simpler
    • Analog nature of transistors can be ignored → treat them as digital switches (ON or OFF)

Hardware Layer (Below L0)

  • Describes a computer at the transistor level
    • Every transistor and its connection to others
    • There are >10 BILLION transistors in a modern state-of-the-art processor

Why Use Abstraction?

  • 10 billion transistors in a processor alone, plus much more in memory, I/O devices, and controllers
  • Impossible to design without abstractions → so the work is divided → SE faces the same complexity problem
    • One group designs the semiconductor process
    • Another group designs several different transistors → needs only some key parameters from the process
    • Another group designs dozens of gates and memory cells → doesn’t need process info, just transistor parameters
    • Computer engineers use gates to design a processor → only deal with gates, memory cells, and their parameters

Lecture 2: System Organization

System Overview

┌─────────────────┐
│       CPU        │
│ ┌─────────────┐  │
│ │ Control Unit│  │
│ └─────────────┘  │
│ ┌─────────────┐  │
│ │     ALU     │  │
│ └─────────────┘  │
│ ┌─────────────┐  │
│ │  Registers  │  │
│ │ ┌──┬──┬──┐  │  │
│ │ │  │  │  │  │  │
│ │ ├──┼──┼──┤  │  │
│ │ │  │  │  │  │  │
│ │ └──┴──┴──┘  │  │
│ └─────────────┘  │
└────────┬─────────┘
         │
═════════╪══════════╪══════════╪══════════╪═══  BUS
         │          │          │          │
    ┌────┴────┐ ┌───┴───┐ ┌───┴───┐ ┌───┴────┐
    │ Memory  │ │ Disk  │ │Printer│ │  ...   │
    └─────────┘ └───────┘ └───────┘ └────────┘
  • CPU → Central Processing Unit (same as processor)
    • Consists of data path and control unit
    • Data path = ALU + register file
    • Control Unit → directs operations
    • ALU → performs arithmetic and logic
    • Registers → small, fast storage (grid of cells)
  • Memory → DRAM + controller, always byte-addressable
  • Bus → a collection of wires for communication between all components
    • Master sends commands/data, slaves respond with data or write
    • Manager/Worker (Master/Slave) protocol → dictates who has access to the bus
    • Wires are the most important part → expensive, and the bottleneck of the computer

I/O Devices

  • Devices are viewed as memory (disk, printer, screen)
    • Why? → They are simple receivers and senders of binary data
    • Read/write data from/to them
    • Each usually has a controller that understands its operation → special-purpose processor
    • CPU is freed to do other things while the controller handles the device

A Simple Data Path

                       Reg_addr A ----┐
                       Reg_addr B ----┐
  Result addr ---┐                    │
                 ▼                    ▼
          ┌──────────────────────┐
          │    Register File     │
          └───────┬────────┬─────┘
                  │ A      │ B
                  ▼        ▼
   Cin ------→┌──────────────┐
  ALU_op ----→│     ALU      │──→ [Z]
              └──────┬───────┘──→ [N]
                     │         ──→ [C]
                     ▼
                   Result
                     │
                     └──────────→ (back to Register File)
  • Register File
    • Inputs (control, dashed): Result addr, Reg_addr A, Reg_addr B
    • Input (data, solid): Result writeback from ALU output
    • Outputs (data): A and B → fed into ALU
  • ALU
    • Data inputs: A (left operand), B (right operand)
    • Control inputs (dashed): Cin (carry in), ALU_op (operation select)
    • Data output: Result → feeds back to register file
    • Flag outputs: Z (zero), N (negative), C (carry)
  • This is a high-level description of the data path → there are circuits underneath that dictate the compute logic
  • ALU takes operands from the register file and writes the result back to a register
  • Thick lines → buses, dotted lines → control wires (controlling what we do, not directly part of the computation)

Data Path Operation

  • Read operands from the register file (A and B)
    • At current rising edge of the clock signal
      • Most machines run off a clock that gives precise timing for when each cycle starts
      • In principle, asynchronous (non-clocked) designs are not actually faster compared to clocked machines
    • Place on buses A and B
  • Specify ALU operation → add, multiply, etc.
  • “Wait for ALU”
    • Designed to fit into 1 clock cycle → no real wait, damn near instant (nanoseconds) → this is the “+1 operation” cost in T(N) analysis
    • Machine is designed around the longest operation
      • Sometimes an operation finishes early but we wait for the clock anyway
  • Write result from bus C to selected register → on the next rising edge
  • Set flags: N, C, Z
    • These are single-bit registers
    • Whatever operation came through the ALU (multiply, add, etc.) sets specific flags based on the output
    • These are the fundamental building blocks that allow conditional operations
  • This is what x86 does when executing ADD R4, R6 or similar RR (register-to-register) operations

Memory View

  • Linear array of words → each word 2 or 4 bytes
  • Memory is byte-addressable, even if typically organized as “words”
┌─────────┬─────────┐
│ Byte 0  │ Byte 1  │  Word 0
├─────────┼─────────┤
│ Byte 2  │ Byte 3  │  Word 1
├─────────┼─────────┤
│         :         │
│         :         │
├─────────┴─────────┤
│                   │  Word N-1
└───────────────────┘

Each row is one word (2 bytes wide)
Word addresses: 0, 1, ..., N-1
Total: N words, 2N bytes

Other Data Path Organizations (Operand Specifiers)

  • x86 (32-bit) and our example use 2-address instructions (2 operand specifiers per instruction)
  • Can be changed to:
    • 3-address → source and destination registers can be distinct
      • R[4] ← R[8] + R[6]
      • More bits needed in each instruction
    • 1-address → one source operand and the destination are predefined
      • Uses a special register called the accumulator
      • Operations are of the form ACC ← ACC + R[6]
      • Price: irregularity → requires an extra step to load/store the accumulator
    • 0-address → both source operands and the destination are predefined
      • Stack machine → special register points to the top of the stack
      • All operands must be pushed/popped on the stack
      • Operations are of the form stack[top] ← stack[top] + stack[top+1]
      • Example: old HP calculators (Reverse Polish Notation)
      • Useful for evaluating deeply nested expressions

Register Transfer Language (RTL)

  • A shorthand for describing data path operations
    • R[4] ← R[4] + R[6]
    • Z ← 1 if R[4] = 0
  • Can describe simultaneous (parallel) operations
    • MDR ← Mem[MAR] ; PC ← PC + 2
  • Useful shorthand for describing register-to-register operations
  • Same notation can be used for control logic

CPU Instruction Execution Cycle

  • CPU executes an instruction in several steps
  • Do forever:
    1. Fetch instruction from memory into Instruction Register (IR)
    2. Increment Program Counter (PC)
    3. Decode the instruction in IR
    4. Fetch operands specified by a field in IR
    5. Execute operation specified by a field in IR
    6. Store result (if necessary) as specified by a field in IR
  • This cycle repeats forever until a halt instruction stops the processor
  • This is the algorithm for the control unit

Execution Cycle in RTL

  1. Instruction FetchIR ← Mem[PC]
  2. Increment Program CounterPC ← PC + 2
  3. Decode → generate all necessary control signals
  4. Fetch OperandsA ← RegFile[IR(src1)], B ← RegFile[IR(src2)]
  5. Execute OperationResult ← ALU(IR(opcode), A, B)
  6. Store ResultRegFile[IR(dest)] ← Result

Instruction Set Architecture (ISA)

  • Consists of
    • Memory model
    • Data types
    • Addressing modes → how instructions specify RAM locations
    • Instructions
    • Instruction format

ISA Details (Assuming Intel x86)

  • Memory model
    • 4GB of memory
    • Byte-addressable
    • Linear array of storage locations
    • 8-byte physical word
  • Registers
    • Eight 32-bit integer registers, some “reserved” for specific tasks (e.g., Stack Pointer)
    • Can be used as 16-bit or 32-bit using different mnemonics
    • Plus special registers → program counter, flags, etc.
    • Some not directly accessible to the programmer (cannot add to program counter directly)
  • Data types
    • Integer, floating point, decimal, ASCII
    • Operations defined on each type

Addressing Modes

  • How instructions specify memory locations
    • Using registers, immediate operands, memory addresses
    • How many operands → 0-, 1-, 2-, and 3-address instructions
  • Instruction format
    • Specifies size and “fields” in the instruction word
    • Typical fields: opcode, source operand specifiers, result specifier

Instruction Design

  • Instructions can be grouped by type
    • Arithmetic → +, -, *, /
    • Logical → AND, OR, NOT, XOR, shifts
    • Control → conditional branches, jumps (change PC)
    • Move → memory to register, register to register
  • x86 instructions have many formats and addressing modes → too many to cover
  • Also has real, virtual, and protected modes → for backwards compatibility, security, etc.

Example x86 Instructions

MOV R1, 4

  • Type: move
  • Destination: R1 (register 1)
  • Addressing mode: immediate
  • The constant 4 requires at least a 4-bit data type

MOV AX, [02D]

  • Type: move
  • Destination: AX (16-bit register; EAX is 32 bits)
  • Addressing mode: direct addressing
  • Address is really DS:[02D]

JMP label1

  • Type: jump
  • Destination: label1
  • Transfers control to the instruction located at label1

Lecture 3: Data Representation & Digital Logic

Data Representation

  • Computers represent everything in binary
    • This includes programs, input/output information, and intermediate variables
  • ASCII table provides mapping of 8 bits (1 byte) to characters
    • 0100 0001 in binary → character “A” in ASCII
  • Binary, octal, or hexadecimal for integers
  • Floating point for real numbers

Binary Numbers

  • Base 2 number system compared to our regular base 10
    • Digits: 0 and 1
    • Positional system → 101 decimal = 100 + 1
    • A binary number 111 means:
      • 1 × 2² + 1 × 2¹ + 1 × 2⁰ = 7 (decimal)

Octal

  • Base 8 system
    • Digits: 0–7
    • 555o means:
      • 5 × 8² + 5 × 8¹ + 5 × 8⁰ = 365 (decimal)
  • Bytes (8 bits) are typically written in octal for convenience

Hexadecimal

  • Base 16 number system
    • Digits: 0–9, A (10), B (11), C (12), D (13), E (14), F (15)
    • ABCh = 2748d
      • 10 × 16² + 11 × 16¹ + 12 × 16⁰ = 2748
  • Bytes (8 bits) are sometimes also written in hex for convenience

Digital Hardware Level

  • Computers are built using gates and memory
    • Simplest building blocks at this level of abstraction → digital logic / computer organization level
  • Gates and memory are made with transistors
    • Can be viewed as electronic switches in digital logic
      • They connect or disconnect based on the input value → gives rise to two-value (binary) system
      • Another abstraction — reality is much more complex
    • Transistors are the building blocks at the hardware level
  • Information values are voltage values
    • +3.3V → “One”, 0V (ground) → “Zero”
      • aka “high” and “low” (T/F) values or states
  • Digital circuits can only be in one of these two states

CMOS Transistors

  • Don’t have to know much about this — all it does is open and close

  • Gates implement basic boolean functions such as AND, OR, NOT
  • Implemented using two voltage “rails” → High and Low
  • The output connects to one rail at a time, giving it a desired value
    • Input values A, B determine which rail
  • Consider an AND gate:

  • AND function → SERIES connection to High value
  • OR function → PARALLEL connection to High
  • Must always be connected to one or the other (high/low), otherwise electronics fails

CMOS AND Gate

571

  • Out connects to High only when BOTH A and B are High (N-MOS transistors)
  • Out connects to Low when either A or B (or both) are Low (P-MOS)

Truth Table Walkthrough

  • We just made a digital gate — what is its function?
    • A=Low, B=Low → Out=Low
    • A=High, B=Low → Out=Low
    • A=Low, B=High → Out=Low
    • A=High, B=High → Out=High

Logic Gates

  • A gate has one or more inputs and one output
  • Standard symbols represent common gates
  • Logically, a gate represents a Boolean function on input variables
  • All variables are Boolean (binary) values
  • A truth table lists all possible combinations of values of the input variables

Boolean Algebra

  • Behavior of gates can be described with Boolean Algebra
    • Abstracts away hardware characteristics
  • An algebra → a set of values and operations with certain properties
    • Similar to a data type in a programming language
  • Boolean Algebra:
    • {0, 1}, OR, NOT
      • Values are 0, 1; operations are OR and NOT
        • A computer can be built using just these two
    • Typically use additional operators: AND, etc.

Boolean Operators

  • NOT (negation, complement, inverse)
    • Y = NOT(X) = X’
  • OR (union, disjunction, logical sum)
    • Y = X + Z = X OR Z
  • AND (conjunction, logical multiplication)
    • Y = X · Z = X AND Z

Logic Gate (AND) with Truth Table

Logic Gate (XOR) — Exclusive OR with Truth Table

One-Bit Adder

Boolean Functions

  • Used to describe most of the data path elements
    • Out = F(IN₀, IN₁, …, INₙ₋₁)
      • n-input functions with 1 output
      • Both inputs and output are 1-bit binary variables
  • Composable from NOT, AND, OR, etc.
    • Each of those is itself a Boolean function
    • Even just OR + NOT (NOR) is sufficient for everything
      • Or AND + NOT (NAND)
  • Boolean Algebra provides:
    • Mathematical foundation
    • Theorems for transforming functions → obtaining a different but equivalent function
      • Equivalent → same inputs produce same output value

Symbols and Terminology

  • AND → often shown as · (multiply), A · B
    • Or even simply AB
    • These are called product terms (Π)
  • OR → often shown as + (sum), A + B
    • Or even Σ
      • ΣΠ → a sum of products
  • NOT → shown with a bar or accent over a symbol
    • NOT(A) = Ā = A’

Boolean Operator Precedence

  • NOT → highest

  • AND

  • OR → lowest

  • Thus:

    • NOT A AND B = (NOT A) AND B
    • A OR B AND C = A OR (B AND C)

Function Description

  • A function can be described by a truth table
    • Unlike functions in “normal” algebra
    • Because Boolean algebra has a small set of inputs
      • 4-input function → 2⁴ possible input combinations → can just enumerate them
  • From truth table → obtain a canonical form
    • A sum of product terms which make output = 1
      • Each term shown with either true or complement of inputs
        • A’BC’ → means Out = 1 when A = 0, B = 1, C = 0
      • Sum is a big OR gate

Example: XOR Gate

  • Product terms that make output = 1:
    • A’ · B → AND of (NOT A) and B
    • A · B’ → AND of A and (NOT B)
  • Sum of products:
    • Out = A’ · B + A · B’

Design Algorithm for Boolean Functions

  • Write a specification for a Boolean function → what needs to be implemented
  • Fill in the truth table
  • Write the sum of product form
    • Only for each term that has a 1 in the output column
  • Draw the logic gates needed:
    • Draw inverters for inputs that need a complement
    • Draw an AND gate for each product term used
    • Connect (wire) inputs to the AND gates
    • Draw an OR gate for the sum
    • Wire outputs of AND gates to the OR gate

Discussion 1: Introduction to MIPS Assembly

  • High-Level Language → (compiler) → Assembly → (assembler) → Machine Code → CPU

A Typical Computer Architecture

  • CPU
    • Registers → 1, …, $31
    • ALU → +, -, ·, /, <<, >>
  • Between the two, addresses come into memory; data and instructions come out of memory
  • Memory → object code, program data

Registers

  • CPU needs internal fast storage to store intermediate computation results and frequently used data
    • Registers are faster than memory
  • MIPS has 32 general-purpose registers, each 32 bits wide
    • Called a 32-bit architecture
  • MIPS registers can be identified by their index → 0, 31)

MIPS Register Set

NameRegister #Usage
$00The constant value 0
$at1Assembler temporary (reserved)
v12-3Function return values
a34-7Function arguments
t78-15Temporaries (not preserved across function calls)
s716-23Saved variables (preserved across calls)
t924-25More temporaries
k126-27OS temporaries (reserved for OS)
$gp28Global pointer
$sp29Stack pointer
$fp30Frame pointer
$ra31Function return address

Operands: Registers

  • Registers used for specific purposes (Register Convention):
    • a3 → used to pass arguments into subprogram
    • v1 → used for return values for subprogram OR system services (syscall)
    • t9 → used to hold intermediate values; values can change when a subprogram is called
    • s7 → used to hold variables; values are maintained across subprogram calls
  • Special registers other than 31:
    • pc, lo, hi

Memory

  • From the programmer’s point of view, memory is a contiguous array of bytes (8-bit data values)
  • The index of each element in the array is called an “address”
  • For a 32-bit processor, the address range is from 0 to 2³² − 1
    • 2³² bytes = 4 GB
  • In MIPS, it is only possible to read/write from/to memory 1/2/4 bytes in a single operation
    • 1 word = 4 bytes
    • 1 half word = 2 bytes

Instructions with Registers (ADD)

  • C code: a = b + c
  • MIPS Assembly:
# $t0 = a, $t1 = b, $t2 = c
add $t0, $t1, $t2

MIPS Instructions

  • MIPS instructions have the following format:
    • [label:] operation [operands] [#comment]
iamalabel:
    add $t1, $t2, $t3    # this is a comment
  • Labels are optional → instead of counting lines, you give a location a name
  • Operations → add, sub, addi, sll, beq, etc.
  • Operands specify the data required by the operation
    • Can be register names, constant numbers, or memory locations/addresses (where you put labels)
    • The instruction format depends on the operation

MIPS Program Template (.asm vs .c)

###### Data segment ######
.data
# define program data such as arrays here
array_a: .space 40
 
###### Code segment ######
.text
# define other functions if any
foo:
    # foo function entry … logic …
main:
    # main program entry
    # … program logic, calling other functions…
    li $v0, 10       # Exit program
    syscall          # using a system call
/****** Data segment ******/
int array_a[10];
 
/****** Code segment ******/
int foo(){…}
int main(){
    // main program entry
    // … program logic, calling other functions…
    return 0;        // Exit program
}
  • Data Directive
    • Defines the data segment of a program containing data
    • The program’s variables should be defined under this directive
  • Text Directive
    • Defines the code segment of a program containing functions and assembly instructions
      • To define main as the entry point of the program:
        • In MARS, toggle Settings → Initialize Program Counter to global main if defined
        • Place .globl main under .text

Instruction Types

  • Arithmetic instructions
  • Logical instructions
  • Data movement instructions
    • Register ↔ register
    • Memory ↔ register (load/store)
  • System calls
  • Comparison instructions
  • Control transfer instructions
    • Unconditional (jump)
      • Plus stack pointer (i.e. for function calls)
    • Conditional (branch)

Arithmetic Instructions

add $dst, $src1, $src2     # $dst = $src1 + $src2
sub $dst, $src1, $src2     # $dst = $src1 - $src2
 
mult $src1, $src2          # {hi, lo} = $src1 * $src2
div $src1, $src2           # lo = $src1 / $src2
                           # hi = $src1 % $src2
  • Examples:
add $t3, $t1, $t2          # $t3 = $t1 + $t2
sub $t3, $t1, $t2          # $t3 = $t1 - $t2
mult $t1, $t2              # {hi, lo} = $t1 * $t2

Arithmetic Instructions (Immediate)

addi $dst, $src1, $imm     # $dst = $src1 + $imm
 
addi $t3, $t1, 5           # $t3 = $t1 + 5
addi $t3, $t1, -20         # $t3 = $t1 + (-20)
 
addi $t3, $t1, $t2         # WRONG! (can't use register as immediate)
add  $t3, $t1, $t2         # CORRECT!

Bit-Wise Logical Instructions

Data Movement Instructions

  • Move (copy) value between registers:
    • add $dst, $src, $zero → e.g. add $t1, $t0, $zero
    • move $dst, $src → e.g. move $t1, $t0
  • Load value → from a memory address into a register
  • Store value → from a register into a memory destination

MIPS Pseudo-Instructions

  • Pseudo-instructions are provided by the assembler but do NOT have the operation implemented in hardware
  • They get translated into multiple simple instructions
  • Example: move $t0, $t1
    • Translates into → addu $t0, $t1, 0
    • Other possible translations:
      • add $t0, $t1, $0
      • or $t0, $t1, $0

Example of MIPS Pseudo-Instruction

  • mult (basic instruction)
    • mult $t0, $t1 → hi, lo = t1
  • mul (pseudo-instruction)
    • mul $dst, $src1, $src2
    • Translates into:
      • mult $src1, $src2
      • mflo $dst

System Calls

  • MIPS provides a special syscall instruction to obtain services from the operating system
    • Many services are provided in the MIPS simulators → reading/writing files, console
  • To use syscall system services:
    1. Load the service number in register $v0
    2. Based on the spec, load argument values into register $aX
    3. Launch the syscall instruction
    4. Retrieve return values, if any, from result registers
  • Example → end program:
li $v0, 10
syscall

MIPS Instruction Set References

MIPS Emulator

Discussion 2: Bit Masking, Multiply by Shift-and-Add, Conditional Statements

Bit Masking

  • Use bitwise logical instructions to:
    • Set bit values to 1
    • Clear bit values to 0
    • Query bit values
  • Mask → a special binary number used together with logical instructions to set, clear, or query specific bits

Setting Bits (OR)

  • OR truth table
  • OR with 0 → no change
  • OR with 1 → result is 1

Clearing Bits (AND)

  • AND truth table
  • AND with 0 → result is 0
  • AND with 1 → no change

Querying Bits (AND)

  • Use bitwise AND to query specific bits
    • Clear all bits except the ones you want to query
  • Example: query the value of bit 3
  • In assembly (assume $t0 has the input number)

Example: Clearing/Querying Multiple Bits

  • Example → query odd bits of an integer (= clear even bits of an integer)
    • Use AND operation
      • AND with 0 → 0
      • AND with 1 → no change
    • Mask will be 32 bits wide
    • Use 1010_1010_1010_1010_1010_1010_1010_1010 as mask
      • 0 for bits we want to clear
      • 1 for bits that need to keep their value
      • 0xAAAAAAAA in hex OR 2863311530 in decimal
        • You can’t write binary literals in MIPS assembly
  • In assembly:
li $t1, 0xaaaaaaaa
and $t0, $t0, $t1

Multiplication & Division (MIPS)

  • Special registers → lo, hi
  • 32 × 32 multiplication, 64-bit result
    • mult $t0, $t1 → result in {hi, lo}
  • 32-bit division → 32-bit quotient, 32-bit remainder
    • div $t0, $t1 → quotient in lo, remainder in hi
  • Moves from lo/hi special registers
    • mflo $t2
    • mfhi $t3
  • Or use mul $t2, $t0, $t1 (pseudo-instruction)

Sequential Binary Multiplication

         1 0 1 1   (multiplicand = 11)
       × 1 1 1 0   (multiplier   = 14)
       ==========
         0 0 0 0
       1 0 1 1
     1 0 1 1
 + 1 0 1 1
 ==================
   1 0 0 1 1 0 1 0  (product = 154)

Multiply by Shift-and-Add

  • Binary number 0000_0001_0010 → position values are 2ⁱ (128, 64, 32, 16, 8, 4, 2, 1)
    • Digit × position value → 0 + 0 + 0 + 16 + 0 + 0 + 2 + 0 = 18
  • If we shift the number 1 bit to the left (<< 1) → 0000_0010_0100
    • Now: 0 + 0 + 32 + 0 + 0 + 4 + 0 + 0 = 36
    • Shifting one bit left doubles the number (18 → 36)
  • Key insight:
    • Shifting an integer by n bits to the left = multiplying by 2ⁿ
    • Shifting an integer by n bits to the right = dividing by 2ⁿ
  • How to multiply $t2 by 2 using shift in assembly → sll $t2, $t2, 1
  • Multiply by 16 → 16 = 2⁴ → sll $t2, $t2, 4
  • Multiply by 18 → 18 = 16 + 2 → shift-and-add both

General Case: Multiply N by M (non-power of two)

  • Decompose M into sum of powers of two (use binary representation)
  • Multiply N by each component using shift
  • Add all partial results together
  • Example: 13 · N
    • 13 = 8 + 4 + 1 (binary: 0000_1101)
    • 13·N = (8 + 4 + 1)·N = (N<<3) + (N<<2) + N

Assembly Control Flow

  • Assembly instructions execute sequentially
  • Program Counter ($pc) → special register pointing to the next instruction to execute
  • Default behavior: PC ← PC + 4
  • Memory layout example:
Address        Instruction
0x0040_0000    addi $t0, $t1, 1
0x0040_0004    and  $t2, $t1, $t0
0x0040_0008    move $t3, $t2
...

Changing the Execution Flow

  • Use jump instructions to skip a block of code
  • Equivalent to changing $pc to something other than PC + 4
  • Example: jump from addi $t0, $t1, 1 directly to move $t3, $t2, skipping the and in between
  • Need to know either:
    • The destination address, OR
    • The relative offset
  • Two flavors: unconditional jump or conditional branch

Unconditional Jumps

  • j label → jumps to first instruction after the label
    • PC ← coded 26-bit address of label
  • jal label → jump and link
    • Link → next instruction address after jal is loaded into $ra
    • Used for function calls (future discussion)
  • jr $reg → jumps to address in $reg
    • PC ← $reg
    • e.g. jr $ra

Conditional Branches

  • Compare on:
    • Equality/inequality of two registers
      • beq $t0, $t1, target → branch to target if $t0 == $t1
    • >, <, >=, <= of a register and 0
      • bgtz $t0, target → branch if $t0 > 0
ConditionInstruction
equalitybeq
inequalitybne
> 0bgtz
< 0bltz
≥ 0bgez
≤ 0blez

slt Instruction

  • slt $dst, $s, $t → if $s < $t, set $dst = 1, else $dst = 0
  • Used to build pseudo-instructions for comparing two registers:
Pseudo-instructionSyntaxExpansion
branch if greater thanbgt $s, $t, Cslt $at, $t, $s / bne $at, $zero, C
branch if less thanblt $s, $t, Cslt $at, $s, $t / bne $at, $zero, C
branch if greater than or equalbge $s, $t, Cslt $at, $s, $t / beq $at, $zero, C
branch if less than or equalble $s, $t, Cslt $at, $t, $s / beq $at, $zero, C

If Statements

  • Python:
if condition:
    statements
  • Statements enclosed by if only execute when the condition is true
  • Example:
if x > 5:
    z = 7
x = x + 1
  • Steps to execute:
    1. Evaluate the condition x > 5
    2. If false → skip z = 7 and continue from x = x + 1
    3. If true → execute z = 7, then continue from x = x + 1

Compiling If Statements

  • Use a jump/branch to skip a block of code
  • Template:
if condition:
    <<statements>>

translates to:

    test NEGATED condition
    if true → jump to EndOfIf
    <<statements>>
EndOfIf:
  • Negated condition being true = original condition being false
  • Example: if (a < b) a++
move $t0, $a0
move $t1, $a1
bge $t0, $t1, EndOfIf   # bge is the complement of "<"
addi $t0, $t0, 1
EndOfIf:

If…Else Statement Translation

  • Template:
if (condition):
    THEN block
else:
    ELSE block

translates to:

    test NEGATED condition
    bxx ElseCode
    <<THEN block>>
    j EndOfIf
ElseCode:
    <<ELSE block>>
EndOfIf:
  • Example: max = (a > b) ? a : b; max = max + 2
# $a0 = a, $a1 = b, $t2 = max
move $t0, $a0
move $t1, $a1
ble $t0, $t1, ElseCode
    move $t2, $a0
    j EndOfIf
ElseCode:
    move $t2, $t1
    # no need for j here → falls through
EndOfIf:
    addi $t2, $t2, 2

Complex If Statements: AND (&&)

  • if (a == b && a > c) { <<stmts>> } is equivalent to nested ifs:
if (a == b) {
    if (a > c) { <<stmts>> }
}
  • Assembly:
move $t0, $a0   # a
move $t1, $a1   # b
move $t2, $a2   # c
 
bne $t0, $t1, end_if
ble $t0, $t2, end_if
<<statements>>
end_if:

Complex If Statements: OR (||)

  • if (a == b || a > c) { <<stmts>> } is equivalent to:
if (a == b) {
    <<stmts>>
} else {
    if (a > c) { <<stmts>> }
}
  • Assembly:
move $t0, $a0   # a
move $t1, $a1   # b
move $t2, $a2   # c
 
bne $t0, $t1, else_code
<<statements>>
j end_if
else_code:
    ble $t0, $t2, end_if
    <<statements>>
end_if:
  • Alternative method (branch into the if body):
beq $t0, $t1, if_code
bgt $t0, $t2, if_code
<<rest of program>>
# NOTE: add j to skip around if_code
if_code:
<<statements>>
# NOTE: add j instructions as needed to control flow

Sample Problem: Implement max

  • int max(int a, int b, int c) → returns the maximum of three ints
  • C implementation:
int max(int a, int b, int c) {
    int max = a;
    if (b > max) max = b;
    if (c > max) max = c;
    return max;
}
  • Assembly (params in $a0, $a1, $a2, return in $v0):
# int max = a
move $t0, $a0
# if (b > max) max = b
ble $a1, $t0, check_c
move $t0, $a1
check_c:
# if (c > max) max = c
ble $a2, $t0, ret_max
move $t0, $a2
ret_max:
move $v0, $t0

Lab1 / Register Convention Notes

  • Do NOT use $sX (saved) registers in Lab 1 unless you follow the register convention
    • Register convention → $sX values must be preserved across function calls
    • If you modify $sX, you must restore it before the callee exits
  • Use temporary registers $t0$t9 instead
    • More than enough for Lab 1
    • Tip: reuse a $t register once its value is no longer needed
  • All data in hardware is binary; hex/decimal are just different views
    • 14 → 0x0E00001110
  • You can’t write binary literals in MIPS assembly
    • YES → li $t0, 10 or li $t0, 0xA
    • NO → li $t0, 00000000000000000000000000001010

Lecture 4: Boolean Algebra & Combinational Logic

Continuing from Lecture 3

Laws of Boolean Algebra

  • True for all x, y in B = ({0, 1}, +, ·)
  • Shown mostly for the AND form → similar laws exist for OR
  • Identity Law → 1 · A = A
  • Null Law → 0 · A = 0
  • Commutativity → x · y = y · x
  • Associativity → (x · y) · z = x · (y · z)
  • Distributivity → x · (y + z) = x · y + x · z
  • Unique Complement → x + x’ = 1; x · x’ = 0
  • Idempotent Law → x · x = x
  • Absorption → x + x · y = x; x · (x + y) = x
  • De Morgan’s → (x + y)’ = x’ · y’
  • Simplification → x + x’y = x + y; x · (x’ + y) = x · y

Using Boolean Laws for Transformation

  • These laws are used to transform functions to a different but equivalent form
  • Example: transforming XOR to NAND-gate equivalent
    • A ⊕ B = A’B + AB’
    • = NOT(NOT(A’B + AB’))
    • = NOT(NOT(A’ · B) · NOT(A · B’))
      • → NAND-gate equivalent

More Complex Boolean Functions

  • Multi-output functions → build a separate boolean function for each output
  • So far only talked about functions with 1-bit inputs → how to build functions of N-bit variables?
  • Consider Out[3:0] = In_A[3:0] AND In_B[3:0]
    • Defined as Out[i] = In_A[i] AND In_B[i], for 0 ≤ i ≤ 3
    • This is true for all N-bit logical functions
  • Construct an N-bit function by:
    • Designing each bit position individually
    • Considering any communication between bits
      • e.g. adder has carry_in and carry_out in each position

More Complex Components

  • Very common logic elements → built from basic AND, OR, NOT gates
  • Multiple gate-level implementations possible for the same “composite” logic element
  • What counts as “composite” depends on the underlying hardware

Composite Gates

  • From the standpoint of Boolean Algebra:
    • NAND and NOR
      • In some technologies NAND is a basic gate → depends on HW implementation
    • XOR
  • For chip design, engineers get a library with a variety of different gate types and sizes
    • But it is a hardware compiler that actually uses them
    • Designs are written in VHDL (Very High-level Definition Language)
    • “Compiled” by hardware compiler or synthesizer

Multiplexor (MUX)

  • A boolean function → multiple inputs, control inputs, one output
  • Selects one of the many inputs as its output depending on the value of the control inputs
  • n-to-1 multiplexor → n inputs and 1 output
    • n is typically 2, 4, 8, …
  • Number of control inputs = log₂(n)
  • Standard symbol and truth table:

MUX Details

  • The diagram above is a truth table for a 2-to-1 MUX
  • Can be simplified using boolean algebra
  • Implementation: combine AND, OR, NOT gates to match the truth table expression

Encoders

  • 2ⁿ inputs and n outputs
  • Function table for an 8-input encoder → 8 inputs and 3 outputs

Discussion 3: Loops

Translation of If Statements

  • C code:
if (condition) {
    <<statements>>
}
  • Translated to:
    test for negated condition
    bXX EndOfIf       # if negated condition true → skip body
    <<statements>>
EndOfIf:
  • Negated condition being true ≡ original condition being false → skip the body

Translation of While Loops

  • C code:
while (condition) {
    <<loop body>>
}
  • Template:
BeginningOfWhile:
    test for negated condition
    bXX EndOfWhile
    <<loop body>>
    j BeginningOfWhile
EndOfWhile:

Example: While Loop Translation

  • C code:
i = 0;
sum = 0;
while (i < n) {
    sum += i;
    i++;
}
  • MIPS assembly ($t0 = i, $t1 = sum, $t2 = n):
BeginningOfWhile:
    bge  $t0, $t2, EndOfWhile   # if i >= n, exit loop
    add  $t1, $t1, $t0          # sum += i
    addi $t0, $t0, 1            # i++
    j    BeginningOfWhile
EndOfWhile:

For Loops in C

  • General form:
for (<<init>>; <<condition>>; <<update>>) {
    <<loop body>>
}
  • Equivalent while-loop form:
<<init>>
while (<<condition>>) {
    <<loop body>>
    <<update>>
}

Translation of For Loops

  • Rewrite as while loop first, then apply while-loop template
  • Pattern:
<<init>>
BeginningOfWhile:
    test for negated condition
    bXX EndOfWhile
    <<loop body>>
    <<update>>
    j BeginningOfWhile
EndOfWhile:

Example: For Loop Translation

  • C code:
sum = 0;
for (i = 0; i < n; i++) {
    sum += i;
}
  • MIPS assembly ($t0 = sum, $t1 = n, $t2 = i):
li   $t0, 0            # sum = 0
# assume $t1 = n
li   $t2, 0            # i = 0
BeginningOfWhile:
    bge  $t2, $t1, EndOfWhile
    add  $t0, $t0, $t2     # sum += i
    addi $t2, $t2, 1       # i++
    j    BeginningOfWhile
EndOfWhile:

Lecture 5: Composite Logic Elements & ALU

Announcements

  • Midterm Thursday April 30th, IN CLASS

Midterm Topics

  • ISA
  • Assembly
  • Binary numbers and arithmetic
  • Boolean functions
  • Gate-level design (mux, decoder, half-adder, full-adder, etc.)
  • Transistor implementation of a gate
  • One’s and two’s complement and arithmetic
  • Anything covered in class, lab, or discussion by April 23rd inclusive
  • (Read the book), look at lecture slides, lab/discussion notes
    • Difference with the book is the transistor design used
  • Any material on the above topics covered in assignments OR class slides OR lab OR discussions can be on the test
  • One page, double-sided of notes IS allowed
  • NO calculators, NO books

Outline

  • More composite logic elements → decoder, comparator, shifter
  • 2’s complement arithmetic (covered in discussion)
  • Adders → half adder, full adder
  • ALU
  • Reading: Sec. 3.2.2–3.2.3 and Fig. 4-2 for ALU functional description (except Programmed Logic Arrays)

Where We Are

  • So far → gate-level design for boolean functions, plus some composite gates
  • Now → more composite gates + how to assemble an ALU

Decoders

  • n inputs and 2ⁿ outputs
  • Function table for an 8-bit decoder → 3 inputs and 8 outputs
  • Exactly one output line is high, determined by the binary value of the inputs

Comparator

  • Boolean function, 2 inputs, 1 output
  • Out = 1 when two inputs are equal, 0 otherwise
  • XOR gate behavior:
    • Out = 0 if inputs are both 0 or both 1
    • Out = 1 if one input is 1 and the other is 0
  • 1-bit comparator → NOT(XOR(A, B))
  • 2-bit comparator → NOT(OR(XOR(A1, B1), XOR(A0, B0)))
    • Only 1 when every bit pair matches

Shifters

  • Shift-by-one circuits → n-bit input shifted by one bit to the left or right depending on a 1-bit control input
  • Example: figure 3-17, page 96
  • Implemented with other gates:
    • Use AND to “disable” an input → requires a control input
    • Use OR to combine the different possibilities (shift left vs shift right vs no shift)

Signed Arithmetic (details in discussion)

  • So far only dealt with unsigned numbers → positive by default
  • Need a representation for negative numbers:
    • Sign-magnitude → devote one bit to sign
    • One’s complement
    • Two’s complement
  • Only 2’s complement is used for integer arithmetic
    • Others used in floating-point representation
  • Subtraction = addition of 2’s complement of subtrahend
  • Bit n−1 is the sign bit → number is positive if this bit is 0, negative otherwise

Half Adder

  • Circuit with 2 binary inputs and 2 outputs → generates sum and carry out
  • Called “half” because there is no carry input
  • Truth table:
ABSumCarry
0000
0110
1010
1101
  • Boolean equations:
    • Sum = A XOR B
    • Carry = A AND B

Full Adder

  • Full adder has an additional carry input (Cin) → 3 inputs (A, B, Cin), 2 outputs (Sum, Carry)
  • Sum minterms → Sum = A'B'Cin + A'BCin' + AB'Cin' + ABCin = Σ(M1, M2, M4, M7)
  • Carry minterms → Carry = A'BCin + AB'Cin + ABCin' + ABCin = Σ(M3, M5, M6, M7)
    A  B  Cin | Sum Carry   Min
    ----------|-----------  ---
    0  0  0   |  0   0       0
    0  0  1   |  1   0       1
    0  1  0   |  1   0       2
    0  1  1   |  0   1       3
    1  0  0   |  1   0       4
    1  0  1   |  0   1       5
    1  1  0   |  0   1       6
    1  1  1   |  1   1       7

Arithmetic Circuit Summary

  • Half adder → 2-bit input, 2-bit output
  • Full adder → 3-bit input, 2-bit output

Ripple Carry Adder

  • How to add multi-bit numbers → chain full adders together
  • Carry out of bit i feeds into carry in of bit i+1 → carry “ripples” through
     An-1    Bn-1         A1   B1       A0    B0
      |       |            |    |        |    |
      v       v            v    v        v    v
   +-------+            +-------+      +-------+
   |  FA   |  <- ... <- |  FA   | <--  |  FA   | <- Cin (usually 0)
   +-------+            +-------+      +-------+
    |   |                |    |         |    |
    v   v                v    v         v    v
   Cout Sn-1             C1   S1        C0   S0

Subtraction

  • How to implement subtraction → add the 2’s complement of the subtrahend
  • Steps:
    • Invert bits of subtrahend (1’s complement)
    • Add 1 using the least-significant carry-in → free, since LSB has an unused Cin input
  • How to detect overflow → watch carry in and out of the MSB (Most Significant Bit)
    • Also an LSB (Least Significant Bit)
  • Details covered in discussion

Hardware Peculiarities — Why We Need a Mux

  • Consider:
if (A == B)
    C = D & E;
else
    C = D | F;
  • Software executes either AND or OR, not both
  • Hardware is always active → both AND and OR generate outputs whether we need them or not
  • Solution → select the desired output
    • Outputs not selected are forced to 0 via control signals
    • Selected output passes through (1 · OUT)
    • All outputs are OR’d together → but only one is actually “passed through”
  • Control input is typically generated by the control unit
  • This 3-gate (4 counting the NOT) select structure is a multiplexor!

ALU (Arithmetic Logic Unit)

  • A functional unit that performs:
    • Arithmetic operations → ADD, SUB, MPY
    • Logical operations → AND, OR, XOR, NOT
  • Operates on data types → 8-, 16-, 32-, or 64-bit values
  • Inputs: A (n bits), B (n bits), Cin, M (mode select), S1 S0 (operation select)
  • Outputs: F (n bits), Cout
           A[n-1..0]    B[n-1..0]
              |            |
              v            v
          +----------------------+
   Cin -->|         ALU          |--> Cout
          +----------------------+
              |        ^   ^  ^
              v        |   |  |
           F[n-1..0]   M   S1 S0
                       (mode) (op select)

ALU Bit Slice

  • Build the ALU from n identical bit slices, each handling one bit position
  • Slice i inputs → Ai, Bi, Ci (carry in), M, S1, S0
  • Slice i outputs → Fi, Ci+1 (carry out to next slice)
       Ai  Bi   M S1 S0
        |   |    | | |
        v   v    v v v
     +----------------+
Ci ->|    Slice i     |-> Ci+1
     +----------------+
            |
            v
            Fi

ALU Design → Logic Unit + Arithmetic Unit + Mux

  • Each bit slice contains:
    • Logic unit → computes logical operations on Ai, Bi
    • Arithmetic unit → computes arithmetic operations on Ai, Bi, Ci
    • Mux (controlled by M) → picks logic or arithmetic result for Fi

Logic Unit

  • Controlled by S1, S0 → selects one of 4 logical operations:
S1S0Fi
00Ai
01NOT(Ai)
10Ai XOR Bi
11Ai XNOR Bi

Arithmetic Unit

  • Built around a full adder with preprocessed inputs X and Y based on S1, S0:
S1S0XY
00A0
01A’0
10AB
11A’B
  • Implementation tricks:
    • X = S0 XOR A → flipping A when S0 = 1 gives A’ (one’s complement)
    • Y = S1 · B → masks B with S1
  • Combined with a Cin of 1 when inverting → gives 2’s complement, enabling subtraction

Packaging Gates

  • Many ways to package gates into integrated circuits (ICs)
  • Different scales of integration:
    • SSI (Small-Scale Integrated) → e.g., TI SN74 series → 4 gates per IC
    • MSI (Medium-Scale Integrated)
    • LSI (Large-Scale Integrated)
    • VLSI (Very Large-Scale Integrated) → e.g., Pentium 4 with several million “gates”
  • Different package types → DIP, SSOP, quad flatpack, etc.

SSI: Small-Scale Integration (DIPs)

  • DIP = Dual Inline Package
  • 14 to 64 pins per package
  • Typical contents → 6 INV, or 4 2-input gates, or 3 3-input gates, or 2 complex gates
  • Example: 7400 → 3/4 in × 1/3 in, 14 pins
          +---U---+
       1 -|       |- 14  VCC
       2 -|       |- 13
       3 -|  7400 |- 12
       4 -|       |- 11
       5 -|       |- 10
       6 -|       |-  9
  GND  7 -|       |-  8
          +-------+

Discussion 4: ASCII & Memory Access

Representation

  • Data in a computer is just binary (0s and 1s)
  • Same 32-bit pattern 1111…11111110 can mean different things:
    • Decimal (signed): -2
    • Hex: 0xFFFFFFFE
    • As four bytes: four separate characters
  • Characters/strings → ASCII → 1 byte per character

ASCII (American Standard Code for Information Interchange)

  • Each text character has a unique 8-bit value
  • Examples: S = 0x53, a = 0x61, A = 0x41
  • Lower-case and upper-case differ by 0x20 (32)
    • A → a: +32
    • a → A: −32
  • Other encodings exist (UTF-8, Unicode) but ICS 51 sticks with ASCII

Strings

  • Strings = arrays of characters in consecutive bytes of memory
  • Variable length → need a way to mark the end
  • Convention: NULL-terminated with '\0' (0x00)
  • "Hello!"0x48 65 6C 6C 6F 21 007 bytes, not 6 (the \0 counts)

CPU ↔ Memory Interaction

  • Initially everything lives in memory: object code (text) + program data
  • Registers are much faster storage but there are only 32 of them
  • CPU sends address → memory returns data/instruction
   CPU                            Memory
 ┌──────┐    addresses ───→     ┌─────────┐
 │ ALU  │    ←─── data ───→     │ Object  │
 │ Regs │    ←── instr  ───     │  code + │
 └──────┘                        │  data   │
                                 └─────────┘

What’s in Memory

  • Text → instructions
  • Data
    • Global/static → allocated before program begins
    • Dynamic → allocated at runtime (heap, stack)
  • Memory size: at most 2³² = 4 GB → addresses 0x00000000 to 0xFFFFFFFF

Memory is Byte-Addressable

  • From the programmer’s POV memory is a giant array of bytes (8-bit elements)
  • Each byte has its own address: 0, 1, 2, 3, …
  • This is true for both MIPS and x86

MIPS Memory Map

0xFFFFFFFC ┌──────────────┐
           │   Reserved   │
0x80000000 ├──────────────┤
0x7FFFFFFC │   Stack      │ ← $sp, $fp  (grows down)
           │      ↓       │
           │ Dynamic Data │
           │      ↑       │
0x10010000 │   Heap       │ ← $gp
0x1000FFFC ├──────────────┤
           │  Static Data │
0x10000000 ├──────────────┤
0x0FFFFFFC │     Text     │ ← $pc, $ra
0x00400000 ├──────────────┤
0x003FFFFC │   Reserved   │
0x00000000 └──────────────┘
  • C example mapping:
    • int f, g, y; int k[10]; (globals) → Static Data
    • int x[10]; (local), return addr → Stack
    • malloc(...)Heap (Dynamic)
    • li $t0, 2, add $v0, $t2, $t3 (code) → Text

Arrays (HLL)

  • Same-type objects stored contiguously under one name
  • Syntax in C: Type Name[Size]
    • int A[10] → 4 bytes/int × 10 = 40 bytes
    • char B[20] → 1 byte/char × 20 = 20 bytes
  • Can even be an array of arrays (e.g. char* sports[5] → array of strings)

Declaring Globals/Arrays in MIPS Assembly

  • Goes in the .data segment only
  • Format: label: data_type comma-separated-list
  • Common data types:
    • .asciiz → null-terminated ASCII string
    • .word → 32-bit integers
    • .byte → 8-bit chars or short ints
    • .space N → reserve N bytes (uninitialized)
  • Memory is word-aligned → 32-bit words must start at addresses divisible by 4
    • Faster/simpler for the processor
    • Mostly transparent to HLL/compiler

1D Array Layout Examples

  • char foo[5] = {'1','2','3','4','5'}; → MIPS: foo: .byte '1','2','3','4','5' → 5 bytes
  • char foo[5] = {1,2,3,4,5}; → MIPS: foo: .byte 1,2,3,4,5 → still 5 bytes, but stores raw values 1..5 (NOT ASCII codes for ‘1’..‘5’)
  • int foo[5] = {16, 2, 77, 40, 12071}; → MIPS: foo: .word 16,2,77,40,12071 → 4 bytes × 5 = 20 bytes

Lab 1 Data Section (preview)

  • Mixes all three flavors:
    • .asciiz → strings (new_line, space, triple_range_lbl, …)
    • .word → arrays of ints (triple_range_test_data, swap_bits_test_data, …)
    • .byte → arrays of chars (hex_digits: .byte '0','1',…,'F')
  • In MARS, the data segment view can toggle between ASCII and hex display

Pointers in C → MIPS Equivalents

  • C & (get address) → MIPS la (load address)
    • ptr = &x;la $reg, var_name
    • e.g. la $t0, hex_digits
  • C * (dereference, get value) → MIPS lw/lb/lbu (read) and sw/sb (write)
    • int x = *ptr;lw $dst, imm($reg)
    • *ptr = 5;sw $src, imm($reg)
    • $reg holds the address; imm($reg) = address + offset

Loading From Memory

  • lw $dst, imm($reg) → load a 32-bit word
  • lb $dst, imm($reg) → load a byte, sign-extend into the upper bits (use for 2’s complement)
  • lbu $dst, imm($reg) → load a byte, zero-extend into the upper bits (use for unsigned, e.g. pixels 0–255)

Writing to Memory

  • sw $src, imm($reg) → store a word from the full register
  • sb $src, imm($reg) → store the least significant byte of $src

Example: lb vs lw

hex_digits: .byte 'a','b','c','d'    # 0x61, 0x62, 0x63, 0x64

la $t0, hex_digits
lb $t1, 1($t0)   # → $t1 = 0x00000062     ('b')
lw $t2, 0($t0)   # → $t2 = 0x64636261     (little-endian: 'd''c''b''a')

Size Directives + Endian-ness

  • Setup: my_int: .word 0xDDCCBBAA, la $t0, my_int (assume $t0 = 500)
  • Little-endian layout (MARS default → x86-style):
addr:   500   501   502   503
byte:   0xAA  0xBB  0xCC  0xDD     ← LSB at lowest address
InstructionResultWhy
lb $t1, 0($t0)0xFFFFFFAAbyte 0xAA, sign-extended (negative)
lbu $t1, 0($t0)0x000000AAbyte 0xAA, zero-extended
lw $t1, 0($t0)0xDDCCBBAAfull word reassembled
  • For lw, the address must be word-aligned (multiple of 4) — else error

lb vs lbu — Why It Matters

  • Same byte 0xAA interpreted differently:
    • lb0xFFFFFFAA−86 (signed)
    • lbu0x000000AA170 (unsigned, e.g. pixel value 0–255)
  • Matching instruction families:
    • Signed: add, addi, bge (expansion uses slt)
    • Unsigned: addu, addiu, sltu

Endian-ness

  • Specifies how multi-byte data is laid out in byte-addressable memory
  • Little-endian → LSB at the lowest address (x86, MARS)
  • Big-endian → LSB at the highest address
  • Same 0xDDCCBBAA stored at address 500:
              Big-Endian      Little-Endian
   500          0xDD              0xAA
   501          0xCC              0xBB
   502          0xBB              0xCC
   503          0xAA              0xDD
  • lw $t1, 0($t0)0xDDCCBBAA either way (the load reassembles correctly)
  • But lb $t1, 0($t0) → reads the byte AT address 500 → 0xFFFFFFDD (BE) vs 0xFFFFFFAA (LE)
  • MIPS can be configured either way; MARS uses little-endian, QtSpim follows the host

Loading Array Elements

  • int array[8] = {1, 2, 3, 4, 5, 6, 7, 8}; → each int = 4 bytes
  • To load array[2]:
la $t0, array
lw $t1, 8($t0)        # offset = 2 · 4 = 8 bytes ✓
 
# WRONG: lw $t1, 2($t0)   → not word-aligned, error
# WRONG: lb $t1, 2($t0)   → loads ONE byte from the middle of array[0]
  • Alternative form (computed address):
la   $t0, array
li   $t2, 8
add  $t3, $t0, $t2
lw   $t1, 0($t3)

Loading Many Elements — Hard-coded vs Loop

  • int array[32] = {1, 2, …, 32};
  • Hard-coded → don’t do this:
la $t0, array
lw $t1, 0($t0)
lw $t1, 4($t0)
lw $t1, 8($t0)
...
lw $t1, 124($t0)
  • Loop → scalable:
la   $t0, array
li   $t2, 32          # counter
my_loop:
    lw   $t1, 0($t0)
    addi $t0, $t0, 4
    addi $t2, $t2, -1
    bne  $t2, $zero, my_loop

Demo 1: ASCII — Same Byte, Different Syscalls (4_ascii_demo.asm)

  • A char in MIPS is just a byte → how it prints depends on the syscall code
  • 'a' = 0x61 = 97 → same bits, four different views
.data
 
.text
.globl main
 
# utility
print_nl:
    li $a0, '\n'
    li $v0, 11
    syscall
    jr $ra
 
# main
main:
    li $s0, 'a'
 
    li $v0, 11        # print_char → 'a'
    move $a0, $s0
    syscall
    jal print_nl
 
    li $v0, 1         # print_int → 97
    move $a0, $s0
    syscall
    jal print_nl
 
    li $v0, 34        # print_hex → 0x00000061
    move $a0, $s0
    syscall
    jal print_nl
 
    li $v0, 35        # print_bin → 0...01100001
    move $a0, $s0
    syscall
 
    li $v0, 10
    syscall
  • Syscall codes seen here:
    • 11 → print character
    • 1 → print integer
    • 34 → print hex
    • 35 → print binary
    • 4 → print string
    • 10 → exit

Demo 2: Data Section & Memory Layout (4_data_demo.asm)

  • .data declares labeled storage; .text is where code lives
  • la loads the address of a label, then lw/lb read from it with byte offsets
  • Offsets are in bytes, not elements → lw $a0, 8($s3) reads the 3rd word (index 2)
.data
 
my_int:    .word 0xDDCCBBAA
 
hello_str: .asciiz "Hello!"
bye_str:   .asciiz "Bye..."
 
int_var:   .word 67305985
 
int_arr:   .word 1, 2, 3, 4
int8_arr:  .byte 1, 2, 3, 4
char_arr:  .byte '1', '2', '3', '4'
 
res_10byte: .space 10
 
dummy_arr: .byte 1, 2, 3, 4
 
.text
.globl main
 
# utility
print_nl:
    li $a0, '\n'
    li $v0, 11
    syscall
    jr $ra
 
# main
main:
 
    la $t0, my_int
    lw $t1, 1($t0)       # unaligned load — reads bytes at offset 1..4
 
    la $s0, hello_str
    lb $a0, 5($s0)       # byte at offset 5 → '!'
    li $v0, 11
    syscall
    jal print_nl
 
    move $a0, $s0        # print whole string → "Hello!"
    li $v0, 4
    syscall
    jal print_nl
 
    la $s1, char_arr
    lb $a0, 1($s1)       # byte at offset 1 → '2'
    li $v0, 11
    syscall
    jal print_nl
 
    la $s3, int_arr
    lw $a0, 8($s3)       # word at offset 8 → int_arr[2] = 3
    li $v0, 1
    syscall
 
    li $a0, '\n'
    li $v0, 11
    syscall
 
    la $s3, int_arr
    lw $a0, 4($s3)       # word at offset 4 → int_arr[1] = 2
    li $v0, 1
    syscall
 
    # end program
    li $v0, 10
    syscall

Summary

  • ASCII chars are 8-bit (1 byte) values → e.g. 'a' = 0x61
  • MIPS Memory Map → text, static data, dynamic data, reserved
  • Declare static/global data in .data (.word, .byte, .asciiz, .space N)
  • Get an address into a register → la $reg, var_name
  • Read/write memory using $reg as a base + byte offset:
    • Load: lw, lb (signed), lbu (unsigned)
    • Store: sw, sb
  • Indexing rule: offset = index · element_size (4 for .word, 1 for .byte)
  • Word loads/stores require word-aligned addresses (multiples of 4)
  • MARS = little-endian → LSB at lowest address
  • Same bits, different meanings — interpretation is decided by the instruction width and signed/unsigned variant, not the data itself

Discussion 5: Function Calls, Register Convention & the Stack

Terminology

  • Caller → the calling function (e.g. main)
  • Callee → the called function (e.g. sum)
void main() {
  int y;
  y = sum(42, 7);
  ...
}
int sum(int a, int b) {
  return (a + b);
}

Function Conventions (the contract)

  • Caller:
    • passes arguments to the callee
    • “jumps” to the callee
  • Callee:
    • performs the function
    • returns the result to the caller
    • returns to the point of call
    • must not overwrite registers/memory the caller still needs

Calling with Jumps

  • Unconditional jump → j, or jal + jr
  • j label → jump to the first instruction after labelPC ← 26-bit address of label
  • jal labeljump and link
    • jumps (same as j)
    • stores the address of the instruction after the jal into $ra$ra ← PC + 4
  • jr $reg → jumps to the address held in $regPC ← $reg (e.g. jr $ra)

MIPS Function Conventions

  • Call a function → jal (jump and link)
  • Return from a function → jr $ra (jump register)
  • Arguments → $a0–$a3
  • Return value → $v0

MIPS Register Set

NameReg #Usage
$00the constant value 0
$at1assembler temporary (reserved)
$v0–$v12–3function return values
$a0–$a34–7function arguments
$t0–$t78–15temporaries (not preserved across calls)
$s0–$s716–23saved variables (preserved across calls)
$t8–$t924–25more temporaries
$k0–$k126–27OS temporaries (reserved for OS)
$gp28global pointer
$sp29stack pointer
$fp30frame pointer
$ra31function return address

How jal / jr Work Together

0x00400200 main:   jal simple        # jumps to simple; $ra = PC + 4 = 0x00400204
0x00400204         add $s0, $s1, $s2
...
0x00401020 simple: jr $ra            # jumps to address in $ra (0x00400204)
  • voidsimple doesn’t return a value
  • jal → jumps to simple, sets $ra = 0x00400204
  • jr $ra → jumps back to 0x00400204 (the instruction right after the call)

Passing Arguments & Return Valuediffofsums(2, 3, 4, 5)

int main() {
  int y;
  y = diffofsums(2, 3, 4, 5);   // 4 arguments
}
int diffofsums(int f, int g, int h, int i) {
  int result;
  result = (f + g) - (h + i);
  return result;                // return value
}
# $s0 = y
main:
  addi $a0, $0, 2      # argument 0 = 2
  addi $a1, $0, 3      # argument 1 = 3
  addi $a2, $0, 4      # argument 2 = 4
  addi $a3, $0, 5      # argument 3 = 5
  jal  diffofsums      # call function
  add  $s0, $v0, $0    # y = returned value
 
# $s0 = result
diffofsums:
  add $t0, $a0, $a1    # $t0 = f + g
  add $t1, $a2, $a3    # $t1 = h + i
  sub $s0, $t0, $t1    # result = (f + g) - (h + i)
  add $v0, $s0, $0     # put return value in $v0
  jr  $ra              # return to caller

The Problem: Clobbered Registers

  • diffofsums overwrites 3 registers → $t0, $t1, $s0
  • $s0 is supposed to be preserved across calls → if main uses $s0 to control program flow, the program executes incorrectly
  • Why not worry about the $tX registers? → by convention they’re not preserved, so the caller already knows not to rely on them
  • Fix → diffofsums can use the stack to temporarily store and restore registers

Register Convention (partial)

  • $t0–$t9 → temporaries → values can change when a subprogram is called (not preserved)
  • $s0–$s8 → saved values → preserved across subprogram calls

Register Convention (full)

Preserved (Callee-Saved)Nonpreserved (Caller-Saved)
$s0–$s7$t0–$t9
$ra (caller-saved)$a0–$a3
$sp$v0–$v1
stack above $spstack below $sp

Prologue / Epilogue Pattern

  • If the callee needs a preserved ($sX) register, it must wrap the body:
    • Prologue → push (store) the old value onto the stack
    • Epilogue → pop (restore) the old value before returning
  • This keeps the caller’s $s0 “good” across the call (the callee failed to follow register convention otherwise)

The Stack

  • Memory used to temporarily save variables
  • Like a stack of dishes → last-in-first-out (LIFO)
  • Expands → uses more memory when more space is needed
  • Contracts → frees memory when the space is no longer needed
  • Grows down → from higher to lower memory addresses
  • $sp (stack pointer) → points to the top of the stack

MIPS Memory Map

Address        Segment
0xFFFFFFFC  ┌──────────────┐
            │   Reserved   │
0x80000000  ├──────────────┤
0x7FFFFFFC  │  Stack   ↓   │ ← $sp   (stack grows down)
            │ Dynamic Data │
            │  Heap    ↑   │
0x10010000  ├──────────────┤
0x1000FFFC  │ Static Data  │
0x10000000  ├──────────────┤
0x0FFFFFFC  │     Text     │  (instructions)
0x00400000  ├──────────────┤
0x003FFFFC  │   Reserved   │
0x00000000  └──────────────┘

push → $sp moves down (allocate)    pop → $sp moves up (free)

Storing Saved Registers on the Stack (callee-saved)

# $s0 = result
diffofsums:
  addi $sp, $sp, -4    # Prologue: make space for 1 register
  sw   $s0, 0($sp)     #   store $s0 on stack
                       #   (no need to store $t0 or $t1)
  add  $t0, $a0, $a1   # $t0 = f + g            ┐
  add  $t1, $a2, $a3   # $t1 = h + i            │ Body
  sub  $s0, $t0, $t1   # result = (f+g) - (h+i) │
  add  $v0, $s0, $0    # return value in $v0    ┘
  lw   $s0, 0($sp)     # Epilogue: restore $s0 from stack
  addi $sp, $sp, 4     #   deallocate stack space
  jr   $ra             # return to caller

The Stack During the diffofsums Call

Address Data         Address Data         Address Data
  FC     ?  ← $sp       FC     ?             FC     ?   ← $sp
  F8                    F8    $s0 ← $sp      F8
  F4                    F4                   F4
  F0                    F0                   F0
  (a) Initial State     (b) push($s0)        (c) pop()
  • push → store $s0 at F8, $sp moves down to F8
  • pop → restore $s0, $sp moves back up to FC

Nested Function Calls — the $ra Problem

  • mainsimpleeasy
0x00400020 main:   jal simple        # $ra = PC + 4 = 0x00400024
0x00400024         add $s0, $s1, $s2
...
0x00400120 simple: jal easy          # $ra = PC + 4 = 0x00400124  ← OVERWRITES old $ra!
0x00400124         jr  $ra
...
0x00400220 easy:   jr  $ra
  • Problem → the inner jal easy overwrites $ra (0x004000240x00400124)
  • When simple runs jr $ra it jumps to 0x00400124 (back inside simple) instead of returning to mainOops… where is 0x00400024?
  • $ra only holds one return address → nesting clobbers it

Fix → Save $ra on the Stack

proc1:
  addi $sp, $sp, -4    # make space on stack
  sw   $ra, 0($sp)     # save $ra on stack
  jal  proc2
  ...
  lw   $ra, 0($sp)     # restore $ra from stack
  addi $sp, $sp, 4     # deallocate stack space
  jr   $ra             # return to caller

Recursive Function Calls

int factorial(int n) {          // 3! = 3·2·1 = 6
  if (n <= 1) return 1;
  else return (n * factorial(n-1));
}
  • factorial(3) → 3 · factorial(2) → 2 · factorial(1) → return 1
  • Each call needs its own n (and $ra) preserved across the recursive call → use $sX registers + the stack ($tX would be clobbered by the recursive call)
0x90 factorial: addi $sp, $sp, -8    # Prologue: make space (2 words)
0x94            sw   $ra, 0($sp)     #   store $ra
0x98            sw   $s0, 4($sp)     #   store $s0
0x9C            addi $t0, $0, 2
0xA0            slt  $t0, $a0, $t0   # n < 2 ?  (i.e. n <= 1)
0xA4            beq  $t0, $0, else   # no  → go to else
0xA8            addi $v0, $0, 1      # yes → return 1
0xAC            j    end             #       jump to epilogue
0xB0 else:      move $s0, $a0        # save n into $s0
0xB4            addi $a0, $s0, -1    # n = n - 1
0xB8            jal  factorial       # recursive call
0xBC            mul  $v0, $s0, $v0   # n · factorial(n-1)
0xC0 end:       lw   $ra, 0($sp)     # Epilogue: restore $ra
0xC4            lw   $s0, 4($sp)     #   restore $s0
0xC8            addi $sp, $sp, 8     #   release space
0xCC            jr   $ra             # return

Stack During the Recursive Call (3! = 6)

Each frame = 2 words: [ $ra @ low addr, $s0 @ high addr ]

      Deepest recursion              Unwinding (returns)
   Address Data                   Address Data
     FC                  ← $sp       FC            $v0 = 6  (final)
     F8    $s0                       F8    $s0
     F4    $ra                       F4    $ra     ← $sp   $s0=3, $v0 = 3·2 = 6
     F0    $s0 (0x3)                 F0    $s0 (0x3)
     EC    $ra (0xBC)                EC    $ra     ← $sp   $s0=2, $v0 = 2·1 = 2
     E8    $s0 (0x2)                 E8    $s0 (0x2)
     E4    $ra (0xBC)    ← $sp       E4    $ra     ← $sp   $s0=1, $v0 = 1
  • Each recursive call pushes its own $ra + $s0 (the current n)
  • On the way back, each frame is popped and the multiply runs → 1 → 2·1 → 3·2 = 6

Function Call Summary

  • Caller:
    • put arguments in $a0–$a3
    • save any needed registers ($ra, maybe $t0–$t9)
    • jal callee
    • look for the result in $v0
    • restore registers
  • Callee:
    • save registers that might be disturbed ($s0–$s7)
    • perform the function
    • put the result in $v0
    • restore saved registers
    • jr $ra

Discussion 6: System Calls (syscall)

What syscall Is

  • syscall → requests a system service from the OS/simulator
  • Used for:
    • Input/output → read from console / file / external device; print to console, write to file / device
    • random numbers, time, sleep, …
  • $v0 → holds the service number (which service you want)
    • plus other argument values, if any
  • Full list → MARS help menu (F1), or the MIPS syscall function reference in MARS

Steps to Use a syscall

  1. Load the service number into $v0
  2. Load argument values (if any) into $a0, $a1, $a2, or $f12 as specified
  3. Issue the syscall instruction
  4. Retrieve return values (if any) from the specified result registers

Lab0: Two Services (no global variables)

  • Print an integer to the console → service 1
li   $v0, 1
move $a0, $s0
syscall
  • Exit the program → service 10
li $v0, 10
syscall

Printing Characters & Strings (Lab2 main)

  • Print a string → service 4
li $v0, 4
la $a0, change_case_output
syscall
  • How do you know the end of a string? → the NULL terminator '\0'
  • Print the 1st char → service 11 (print char)
li $v0, 11
la $t0, change_case_output    # address of the string
lb $a0, 0($t0)                # load the 1st byte (the char value)
syscall
  • For the 2nd char → use offset 1($t0)
  • Address vs. value distinction:
    • la $t0, label → loads the address
    • lb $a0, 0($t0) → loads the value of the char at that address

Common Print Services

Service$v0Arguments
print integer1$a0 = integer
print string4$a0 = address of string
print char11$a0 = char value
exit10

Demo: Print globals & arrays (4_data_demo.asm)

  • Print a string
  • Print a single character from a string
  • Print elements from a char array
  • Print elements from an int array

File I/O — Must Open First

  • Must open a file before reading/writing it
  • Concept → like C FILE *fptr = fopen("myFile.dat", "r"); or Python f = open("myFile.dat", "r") → returns a file descriptor
  • In MIPS assembly → low-level → use syscall to obtain the file descriptor directly
  • Read → reads bytes into a buffer (like C fgets(str, 4, fptr) / Python f.read(4))

File I/O syscall Table

ServiceCodeArgumentsResult
open file13$a0 = addr of null-terminated filename; $a1 = flags; $a2 = mode$v0 = file descriptor (negative if error)
read from file14$a0 = file descriptor; $a1 = addr of input buffer; $a2 = max chars to read$v0 = # chars read (0 if EOF, negative if error)
write to file15$a0 = file descriptor; $a1 = addr of output buffer; $a2 = # chars to write$v0 = # chars written (negative if error)
close file16$a0 = file descriptor
  • Open flags → 0 = read, 1 = write

Example: Open / Write / Close (from F1 Help Menu)

.data
fout:   .asciiz "testout.txt"                         # filename for output
buffer: .asciiz "The quick brown fox jumps over the lazy dog."
 
.text
# Open (for writing) a file that does not exist
li   $v0, 13          # system call for open file
la   $a0, fout        # output file name
li   $a1, 1           # flags → open for writing (0 = read, 1 = write)
li   $a2, 0           # mode is ignored
syscall
move $s6, $v0         # save the file descriptor
 
# Write to the file just opened
li   $v0, 15          # system call for write to file
move $a0, $s6         # file descriptor
la   $a1, buffer      # address of buffer to write from
li   $a2, 44          # hardcoded buffer length (# chars)
syscall
 
# Close the file
li   $v0, 16          # system call for close file
move $a0, $s6         # file descriptor to close
syscall
  • Be careful with $sX here → register convention applies
    • may replace with $tX, or use $sp to save/restore
  • To read a file instead → change the syscall types (open with flag 0, then use service 14)

Summary

  • syscall → ask the OS/MARS for a service; the service number goes in $v0
  • Pattern → set $v0 (+ $a0/$a1/$a2/$f12 args) → syscall → read result registers
  • Common codes → 1 print int, 4 print string, 11 print char, 10 exit
  • Files → must open (13) before read (14) / write (15), then close (16); $v0 returns the file descriptor (negative = error)

Lecture 6: Memory — Latches

Today’s Objectives

  • Ideal vs real gates
  • Sequential circuits
  • Latch
  • Clock
  • Brief look at processor organization
  • Reading: Sec. 3.3.1–3.3.4

Storage Elements

  • So far → only memoryless digital functions (combinational logic)
  • Now we need to store program data and intermediate values → sequential logic
  • The ISA tells us there are registers and memory → e.g. ADD R4, R6:
    • where does the data reside? What are R4, R6? How do they work?
  • Built from gates, similar to combinational logic, BUT:
    • require feedback → new state = F(current state, inputs)
    • require timing to be taken into account → assume non-zero gate delay → introduce the clock
  • A combinational circuit has latency DT → output changes DT seconds after an input change, and keeps changing as long as the inputs change
  • New state → a boolean function depending on the inputs from DT earlier + the state from DT earlier

SR Latch (Set-Reset) — a basic storage element

  • Two cross-coupled NOR gates → Q and Q’ are fed back as inputs → this is what makes it sequential
  • A bi-stable circuit that stores a binary value:
    • Q = 1 → “1” state
    • Q = 0 → “0” state
  • Can only be in one of these states → output change requires S or R to be applied
   R ──▶[ NOR ]──┬──────────▶ Q
           ▲     │
           └──── │ ────┐   (cross-coupled: Q feeds the bottom NOR,
           ┌──── │ ────┘    Q' feeds the top NOR)
           ▼     │
   S ──▶[ NOR ]──┴──────────▶ Q'
  • Behavior:
    • S = 1, R = 0 → Set → Q = 1
    • S = 0, R = 1 → Reset → Q = 0
    • S = R = 0 → hold (Q unchanged)
    • S = R = 1 → forbidden → unstable; Q may oscillate before settling to 1 or 0

SR Latch Truth Table

QtSRQt+1
0000
0010
0101
011?
1001
1010
1101
111?
  • ? → undefined output (the S = R = 1 problem)

Problem with the SR Latch

  • Combinational functions take time to settle → the output may vary for a while before reaching the correct value
  • If S or R change, Q changes immediately → but we want changes to happen only when desired and once per clock cycle
  • Fix → modify the latch to incorporate the clock as an input

The Clock

  • We’ll build a computer with many latches → need to make sure things happen when required
  • Example arithmetic operation:
    • take data from 2 registers → run through the ALU → store to another register
    • these three steps must occur at the right time
  • Solution → generate and distribute a special signal called the clock
    • keeps time → like a conductor for an orchestra or a drummer in a band
  • Any state change in the system is referenced to the clock
  • The clock is the basic time unit for executing instructions (an instruction may take several clocks)
  • Must account for signal propagation through gates → typical gate delay today < 100 ps (10⁻¹⁰ s)

Clocked SR Latch

  • We want gate outputs to change only at specific times → add circuitry so changes occur only when the clock is active (Clock = 1)
  • Add two gates in front of the NOR latch, gated by the clock:
    • if Clock = 0 → neither S nor R reaches the NOR gates → Q holds
    • if Clock = 1 → S/R pass through → Q can change
   S ───┬──[ AND ]──┐
        │     │     │
 Clock ─┼─────┤  [ SR latch ]──▶ Q, Q'
        │     │     │
   R ───┴──[ AND ]──┘
  • Q only changes when Clock = 1
  • Timing → S, R come from elsewhere in the system; they can only take effect after the clock goes high (DT1 later); Q then changes DT2 later, due to gate propagation delay

Still Have Problems

  • (A) Undefined state when S = R = 1 → solved by the D latch (next)
  • (B) Inputs can change as long as the clock is high → half a clock cycle → that’s too long
    • could play with the duty cycle (% of the cycle that is “1”) → generate a very narrow “1” pulse → but hard, especially at 4 GHz
    • “sample” the input only while the clock is “1”
    • better solution → flip-flops (later)

D Latch — avoiding S = R = 1

  • Only one data input: D (plus the clock)
  • D drives S directly; D’ (inverted D) drives R → S and R can never both be 1
  • When the clock becomes 1:
    • D = 1 → the circuit remembers 1 (Q = 1)
    • D = 0 → the circuit remembers 0 (Q = 0)
   D ──┬────────────[ AND ]──┐
       │              │       │
       │      Clock ──┤   [ SR latch ]──▶ Q, Q'
       │              │       │
       └──[ NOT ]─────[ AND ]──┘

8-Bit Memory

  • Use 8 D latches → an 8-bit memory
  • 8 inputs (D0–D7), all written at the same time → all 8 latches share the same clock
        D0 ──[ D Latch ]── Q0
        D1 ──[ D Latch ]── Q1
        D2 ──[ D Latch ]── Q2
         ⋮                  ⋮
        D7 ──[ D Latch ]── Q7
                 ▲
              clock (shared)

Registers

  • A register → a group of latches/flip-flops treated as one unit, instead of addressing each storage element individually
   Data_in[7:0] ──▶│  Register A  │──▶ Data_out[7:0]
                        ▲
                      Clock

Lecture 7: Flip-Flops & RAM

Flip-Flops

  • D latches still have a problem → input changes while the clock is “1” still propagate
  • Another (better) solution to the SR-latch input-change problem
  • Two approaches:
    • Master-slave flip-flops
    • Edge-triggered flip-flops
  • Change can occur only on a clock edge (transition 0→1 or 1→0) → exactly one change per clock cycle, output stable for the whole clock period

D Flip-Flop Timing

  • D (data) can wander, but Q only updates on the active clock edge → Q is stable between edges

Master-Slave D Flip-Flop

  • Two D latches in series:
    • Master latch → clocked by Clk
    • Slave latch → clocked by Clk’ (the inverted clock)
   D ──▶│ D    Q │── Qm ──▶│ D    Q │──▶ Q
        │ Master │         │ Slave  │
 Clk ─┬▶│ Clk    │     ┌──▶│ Clk Q' │──▶ Q'
      │  └────────┘     │   └────────┘
      └────[ NOT ]──────┘  Clk'
  • When Clk is high → the master is transparent → Qm follows D (slave holds, since Clk’ = 0)
  • When Clk is low → the slave is transparent → Q follows Qm (master holds)
  • The output changes at the clock edge that falls from 1 → 0 → this D FF is negative-edge triggered

Random Access Memory (RAM)

  • Access time → from address presented to data out
  • Constant access time → independent of the address (unlike tape or disk, which are sequential)
  • Semiconductor memory → RAM and ROM
  • Many bits on one IC → each bit is a “latch”
  • Two types (speed / cost / size tradeoff):
    • SRAM (Static) → faster, fewer bits → each bit is a latch
    • DRAM (Dynamic) → denser/cheaper → each bit stores charge on a tiny capacitor
  • Tradeoff → higher speed costs more per bit and limits capacity

Memory Cell Organization

  • SRAM cell → a clocked D-latch-style cell:
   Data_in ──▶│ D       │
   Clock ─────▶│ C    Q │──▶ Data_out
   Write ─────▶│ En      │
  • DRAM cell → one transistor controls access to a capacitor that stores charge:
    • Write → put charge in
    • Read → let charge flow out (destructive → needs refresh)

Register File

  • An array of registers (R0…R7) addressed as a unit
  • Write path → an address decoder drives the Enable of the selected register → only that register loads Data_in on the clock
  • Read path → an address-controlled mux selects the addressed register’s Q → Data_out
  • For 8 registers → a 3-bit address
   Data_in ─┬──▶│ D  R0  Q │─┐
            ├──▶│ D  R1  Q │─┤
            │        ⋮       ├─[ MUX ]──▶ Data_out
            └──▶│ D  R7  Q │─┘     ▲
                   ▲               │
   Address ─▶[ decoder ]───────────┘
   (3 bits)   (drives Enable)   (selects output)
                   ▲
                 Clock

Lecture 8: More Memory — Organization & Decoding

Today’s Objectives

  • Continue the study of memory organization
  • Reading: Sec. 3.3.1–3.3.4, Sec. 2.1 pp. 39–42, Sec. 4.1.1–2 (next time)

Review

  • SR latch → basic storage device, but the output changes as the inputs change
  • Clock & clocked latches → better control of when the output changes
  • Flip-flops → best way to store data, based on the clock’s rising edge → one change per clock cycle, output stable for the whole period
  • Registers → aggregate a number of latches/flip-flops, controlled as a unit
  • Register file → array of registers (see Lecture 7)

Memory Chips

  • A memory IC is typically organized as a cell matrix → NxN (good for layout)
  • At each position → a single cell (1 bit)
  • To get a wider output → replicate the matrix
    • e.g. a 128 Mb IC may have 8 blocks (individual NxN matrices)
  • Memory is built out of multiple chips
    • e.g. a 256 MB DIMM may have 16 × 128 Mb ICs → eight are read/written at once, delivering 8 bits (a byte)

Memory Organization & Decoding

  • A decoder circuit selects the appropriate word in memory based on the address provided
  • Two steps:
    • first → a row is read out of the cell matrix, N bits at once
    • second → a bit (column) is selected from that row and output
  • The output goes through a tri-state buffer → 3 states: 0, 1, floating (electrically disconnected)
  • The larger the memory → the longer it takes to read
  • SRAM is faster than DRAM but has fewer bits:
    • SRAM “bit” → a latch
    • DRAM “bit” → a special transistor (capacitor + access transistor)

Memory IC Organization (NxN, 1 bit wide)

   Data_in
      │
      ▼
   ┌───────────────────────┐
   │  ▢   ▢   •••   ▢       │ ◀── row selected by Row Address
   │  ▢   ▢          ▢      │
   │         •••            │     (▢ = single cell)
   │  ▢   ▢          ▢      │
   └───┬───┬───────┬────────┘
  Row  │   │       │
  Addr ▼   ▼       ▼
       [  column MUX  ] ◀── Column Address
              │
              ▼
         [ tri-state ] ◀── Output Enable (OE)
              │
              ▼
           Data_out
  • Row Address → row decoder selects one row (N bits read at once)
  • Column Address → mux selects 1 bit from that row
  • OE → enables the tri-state buffer onto the output

RAM Control

  • Multiple chips are connected together to → get a wider word, or get a larger memory
  • Need to be able to → select a chip and enable its output
  • Control signals:
    • CS (chip select) → nothing happens inside a RAM unless it is selected
    • OE (output enable) → allows direct connection of IC outputs (tri-state onto a shared bus)
    • R/W’ → read or write

A 4-Chip Memory System

   Data_Bus ═══════════╦════════════╦════════════╦═══════════
                       │            │            │
               ┌───────┴───┐ ┌──────┴────┐      ┌┴──────────┐
   Addr ──────▶│ Din  Dout │ │ Din  Dout │ •••  │ Din  Dout │
   R/W' ──────▶│ Addr RAM0 │ │ Addr RAM1 │      │ Addr RAM3 │
               │ R/W'    OE│ │ R/W'    OE│      │ R/W'    OE│
               │ CS        │ │ CS        │      │ CS        │
               └────▲──────┘ └────▲──────┘      └────▲──────┘
                    │             │                  │
   Address ─▶[ decoder ]─CS0──────┴─CS1──────────────┴─CS3
   Output_enable ────────(OE, shared)
   R/W' ─────────────────(shared)
  • Low address bits → Addr (shared by all chips)
  • High address bits → decoder → one CS per chip → selects the active chip
  • Only the selected chip (with OE asserted) drives the shared data bus

Lecture 9: Disk Memory, Error Detection/Correction & Compression

Today’s Objectives

  • Disk memory
  • Error correction & detection
  • Full data path organization (next lecture)
  • Reading: Sec. 2.3.2 (disk), Sec. 4.1.1 (ignore JVM stuff), class slides

Disk Memory

  • Disk → magnetic memory
  • “Sequential” access (mechanical positioning)
  • Reads/writes are performed on blocks of data → more efficient this way
  • Much slower than memory → milliseconds to access, even at 15K RPM
  • Much larger than main memory (DRAM) → multiple TB common (and growing)

Disk Structure

  • A disk = a stack of platters, each with two recording surfaces
  • A mechanical arm moves the read/write head across the surfaces
  • Each surface → data on concentric circles called tracks, further divided into sectors (marked by a sector mark)
   ════════ Surface 1     ◀── direction of arm
   ════════ Surface 2
   ════════ Surface 3
   ════════ Surface 4
        ⋮
   (head moves in/out to reach the desired track)

   One platter (top view):  Track 0 (outer) … Track N (inner),
   each divided into sectors

Disk Organization — Terms

  • Track → sequence of bits at the same radius on one surface
  • Cylinder → set of tracks at the same radius across all platters
  • Sector → fixed-length segment within a track
  • Seek time → time to position the head over a track
  • Rotational latency → time for the disk to rotate the desired sector under the head
  • Head transfer rate → how fast data transfers once the head is in position

Parameters

  • Rotational speed → 3600 to 15K RPM → determines how soon the data appears under the head
  • Seek speed → track-to-track → milliseconds
  • Average time to find data → ½ rotational delay + average seek time
    • ~100 ns for DRAM vs ~15 ms for disk → orders of magnitude slower
  • Transfer rate → ~100 MB/sec

Operation

  • Address supplied as (platter, track, sector)
    1. Seek → move the arm + R/W head assembly to the desired track
    1. Locate the sector → wait for the platter to rotate to the right place
    1. Read/write the sector
  • Can read more at once → a whole cylinder → read the track from all platters at once

Optimization

  • Read an entire sector / track / cylinder → helps amortize the startup cost
  • Cache the read data in the controller’s memory → satisfies future requests from RAM (works due to locality)
  • The disk controller + OS may reorder disk I/O requests to minimize seek time

Error Detection & Correction

  • Data stored in bulk, or transmitted, can be corrupted
  • Causes of corruption are diverse; occurrence is relatively infrequent

Error Detection — Parity (simplest method)

  • When writing/sending a block → add a parity bit indicating whether the number of 1s is even (or odd)
  • When reading/receiving → recompute and check the count
  • If the written parity ≠ the read parity → an error was detected
                 P │←──── Data bits ────→
   Transmitted:  0 │ 1 0 0 1 0 1 1     (four 1s → even → P = 0)
                 P │       ↓ error
   Received:     0 │ 0 0 0 1 0 1 1     (three 1s → odd → mismatch ⟹ error!)
  • Detects a single-bit error, but cannot say which bit, and cannot fix it

Error Correction — Hamming Codes

  • Use multiple parity bits, each checking an overlapping subset of the data bits
  • The combined parity bits name the position of the wrong bit → so it can be corrected

Compression — Huffman Coding

  • A lossless compression algorithm
  • Each symbol is encoded in a bit string whose length is “proportional” to its frequency of occurrence
    • frequent symbols (e.g. “e”) → shorter codes; rare symbols (e.g. “x”) → longer codes → saves a lot of bits on real text
  • Built by constructing a prefix tree → guarantees no encoding is a prefix of another (so it decodes unambiguously)
  • Build bottom-up → repeatedly combine the two lowest-probability nodes

Huffman Example

  • Symbols and probabilities:
SymbolProbability
A0.3
B0.3
C0.2
D0.1
E0.1
  • Build (combine the two lowest each step; label branches 1 / 0):
    • D(0.1) + E(0.1) → DE(0.2)
    • C(0.2) + DE(0.2) → CDE(0.4)
    • A(0.3) + B(0.3) → AB(0.6)
    • AB(0.6) + CDE(0.4) → root(1.0)
                root
            1 /      \ 0
            AB         CDE
          1/  \0     1/    \0
          A    B     C      DE
                           1/  \0
                           D    E
  • Resulting encoding:
SymbolEncoding
A11
B10
C01
D001
E000
  • Example → AABCBDAE11 11 10 01 10 001 11 000 = 11111001100011100018 bits
  • A fixed 3-bit code → 8 chars × 3 bits = 24 bits → Huffman saves 6 bits here

Lecture 10: Data Path & Control Unit

Back to the CPU — the Final Stretch

  • Goal → design the control unit and put it all together
  • Reading: Sec. 4.1.1 (ignore JVM stuff), class slides

Data Path

  • The core of the CPU → a register file feeding an ALU
  • Read two operands from the register file (Reg_addr A, Reg_addr B) onto buses A and B
  • The ALU computes a result; flags N, C, Z are produced; the Result bus writes back to the register file
   Result addr ─▶┌───────────────┐◀─ Reg_addr A
                 │ Register File │◀─ Reg_addr B
                 └──┬─────────┬──┘
                    │ A       │ B
                    ▼         ▼
   ALU op ─▶ ┌───────────────────┐ ─▶ Z
   Cin   ─▶  │        ALU        │ ─▶ N
             └─────────┬─────────┘ ─▶ C
                       │ Result
                       └──────────────▶ (back to Register File)

Data Path Operation

  • Read operands from the register file → at the current rising clock edge → place on buses A and B
  • Specify the ALU operation
  • “Wait” for the ALU → designed to fit into 1 clock cycle (no real wait)
  • Set flags → N, C, Z
  • Write the RESULT bus to the selected register → really happens on the next rising edge

Adding Memory Access

  • Memory for our machine → a linear array of words (each word 2/4/8 bytes), byte-addressable even if organized as words
  • Two steps → generate a memory address, then access memory at that address
    • Read → wait for data ready → read into the Memory Data Register → move to the register file or ALU
    • Write → put data out to memory
  • Need two new registers → a clocked interface to the asynchronous memory:
    • MAR (Memory Address Register)
    • MDR (Memory Data Register)
  • Control signals → R/W’ and D_ready (data ready on read / “done” on write)

Register Transfer Language (RTL)

  • A shorthand for describing register-to-register (and all other) data-path operations, and control logic
  • Everything comes from and goes to a register → so it can be described as register transfers through combinational logic
  • Notation:
    • variables = (contents of) registers
    • operations = assignment, arithmetic, logical, etc.
    • memory = an array → M[4] (or R[4]) = location 4 of memory (or the register file)
    • assignment → R[4] ← R[4] + R[6]
    • conditional → Z ← 1 if R[4] = 0
    • simultaneous ops (in time) → MDR ← Mem[MAR] ; PC ← PC + 2
  • Example → RTL for Ld Rj, Ri:
    • MAR ← Ri (or Ri + displacement)
    • MDR ← Mem[MAR] if D_ready = 1
    • Rj ← MDR

Instruction Execution & Fetch

  • Instruction execution includes steps beyond the specified operation → the same data path is reused for both data processing and these extra steps → keep augmenting it until it can do everything
  • Need logic to execute the control algorithm → a control unit
  • Instruction fetch → get an instruction from memory to start → needs two more registers:
    • PC (program counter) → PC ← PC + 2
    • IR (instruction register) → IR ← Mem[PC]
  • Plus the control signals to drive all this

Full (Augmented) Data Path

  • Adds the memory-interface registers and the selection muxes around the core register-file / ALU:
            ┌────────┐       EnblRes│      Addr_A│  Addr_B│
            │        │  Addr_Res ▼  ▼            ▼        ▼
            │ Memory │◀────────▶┌───────────────┐
            │        │          │ Register File │
            └───┬────┘          └───┬───────┬───┘    Const
       Sel_mem  │   ┌──[ PC ]       │       │         │
         [MUX]──┴───┤  [ MAR ]    [Sel_A] [Sel_B]◀────┘
                    │  [ MDR ]──┐    │ A     │ B
                    │  [ IR ]   │    ▼       ▼
                    │           │ ┌───────────────┐
                    └───────────┘ │      ALU      │◀── Op
              D_ready             └───────┬───────┘
                                          │ Result
              ──▶ To Control Unit (flags Z / N / C, IR opcode)
  • New registers → PC, MAR, MDR, IR
  • New muxes → Sel_mem (address source: PC vs Result), Sel_A (ALU A input: reg file vs PC), Sel_B (ALU B input: reg file vs Const)
  • Write-enable controls → Wr_PC, Wr_MAR, Wr_MDR, Wr_IR, EnblRes

Control Unit

  • Generates the control signals for the data path
  • Encoded as microinstructions (control words, CW) → a format with various fields
  • Driven by a microcode engine → supplies one CW per cycle

Control Word (CW) Format

  • Groups of control signals → memory, ALU, PC, etc.
| R/W' | Sel_M | Wr_MAR | Wr_MDR | Wr_IR | EnblRes | Sel_A | Sel_B | Op | Wr_PC | ... |
|◀──── Memory Control Group ────▶|◀────────── ALU / Reg Control ─────────▶| PC | Etc.|

Microcode Engine

  • Cycle (do forever):
    • fetch the m-instruction → CW ← m-Mem[m-PC]
    • update m-PC → m-PC ← CW.Next (a field in the CW; more general than m-PC ← m-PC + 1)
    • apply the CW → run a wire from each CW bit out as a control signal
    • wait for the end of the cycle
  • Very simple → a tiny CPU whose only job is “fetch m-instruction, update m-PC”
  • Location N in the m-store → the first CW of the m-code → the first CW of CPU instruction-fetch → on hardware reset, m-PC ← N
  • m-PC uses an extra next-address field in the CW itself → simplifies branching and allows easy relocation of m-instructions
  • Two “next” addresses:
    • Next1 → next sequential address (same as m-PC + 1, but eliminates an adder)
    • Next2 → conditional m-branch target

Microcode Engine Organization

        ┌─────────────────┐   Addr     m-PC
        │ m-program store │◀──────────── │
        └────────┬────────┘              ▲
            CW   │ Data              [ MUX ]◀── B_sel
   ┌─────────────▼───────────────────────┐  ▲ ▲ ▲ ▲
   │ Mem | ALU | Reg | Branch | PC | Next1 | Next2 |  Reset_addr   IR.opcode
   └──┬──────────────────┬─────────────────┘    ▲
      │ control          │ Branch field          │
      ▼ signals          ▼                 [ B_sel Generator ]◀── Z, N, D_ready
   (to CPU data path)                       ◀── IR (Opcode, R1, R2)
  • The CW’s Next1 (sequential) and Next2 (branch target) feed a mux
  • The B_sel Generator picks the next m-PC source using the Branch field + the selected flag

m-Branches

  • Reasons to have a microcode branch:
    • waiting for memory (D_ready = 1)
    • implementing CPU branch instructions (BN, BZ, …)
    • instruction decode → branch to the right CW for the fetched opcode
  • Conditional m-branch → if cond then m-PC ← Next2 else m-PC ← Next1
  • Unconditional m-branch → m-PC ← Next1

Branch Field in the CW

  • 4 bits → specify which input signal to branch on (one-hot — only one bit may be 1):
    • 1000 → branch on Z
    • 0100 → branch on N
    • 0010 → branch on Opcode (instruction decode)
    • 0001 → branch on D_ready
    • 0000 → do not branch
  • RTL for the m-branch:
    • Z type → if Z = 1 then m-PC ← Next2 else Next1
    • N type → if N = 1 then m-PC ← Next2 else Next1
    • Opcode → m-PC ← IR.opcode (no arithmetic — just jam the value in)
    • D_ready → if D_ready = 1 then m-PC ← Next2 else Next1
    • 0000m-PC ← Next1
  • B_sel Generator → selects the next m-PC via a mux, based on the branch type + the value of the selected control signal

Instruction Decode

  • Implemented as a microcode branch → m-PC ← IR[opcode] → no arithmetic, just jam the value in
  • At each opcode address → a group of CWs implementing all the steps of that instruction (I-fetch, PC update, I-decode, the RR ALU op, …)
  • For a whole instruction group → use further decoding on some opcode bits to pick the actual instruction

Microprograms

  • Represented with symbolic notation (equivalent to the m-store contents, minus labels)
  • Tools (compilers/assemblers) generate the CW bit strings → nobody would want to do it by hand
  • The full m-program implements all CPU instructions
  • Convention for writing them → only show the signals that must be “1” for the next clock (the rest are don’t-care)
    • use Iden = pass-through ALU function → Result ← B
    • register specifiers x, y come from IR.r1, IR.r2