Introduction

My study notes for Computer Architecture & Organization. I took the class during 22-2; these notes were written during the 22 winter break.


What is a Computer?

The Von Neumann Architecture

Just like a program, the basic architecture of a computer follows an input -> process -> output flow.

Typical computer organization


Instruction Set Architecture

ISA

  • Processor’s instruction set
    • The set of assembly language instructions
  • Programmer-accessible registers within the processor
    • Size of each programmer-accessible register
    • Instructions that can use each register
  • Information necessary to interact with memory
    • Memory alignment
  • How the processor reacts to interrupts, from the programmer’s point of view

MIPS

  • Goals of instruction set design for MIPS
    • Maximize performance and minimize cost, and reduce design time (of compiler and hardware)
    • By the simplicity of Hardware!
  • MIPS design principles
    • Simplicity favors regularity
    • Smaller is faster
    • Make the common case fast
    • Good design demands good compromises

Basic Procedures

How procedure calls are carried out: control is transferred to the callee and then back to the caller.

A stack is used to implement the design principles: it keeps the calling convention simple, which makes the common case fast.
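As a rough sketch, the push-on-call / pop-on-return behavior looks like the following (the frame contents and register names here are illustrative assumptions, not a full calling-convention spec):

```python
# Illustrative model of a MIPS-style call stack using a Python list.
def call(stack, return_addr, saved_regs):
    """Procedure entry: push a frame (like addi $sp,$sp,-N then sw)."""
    stack.append({"ra": return_addr, "saved": saved_regs})

def ret(stack):
    """Procedure exit: pop the frame and recover $ra (like lw then jr $ra)."""
    frame = stack.pop()
    return frame["ra"]

stack = []
call(stack, return_addr=0x00400018, saved_regs={"$s0": 7})
ra = ret(stack)  # control transfers back to the saved return address
```

Pushing and popping whole frames is what keeps the common case, a plain call and return, simple and fast.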

Translation and Startup

Like I learned in DS, the whole process is depicted as follows (in DS, the linker & loader were the focus; in CA, the assembler is the focus; personally, I want to learn more about the compiler..):

Loading a Program

Load from image file on disk into memory

  1. Read header to determine segment sizes
  2. Create virtual address space
    • Done by the OS
    • Virtual address & memory explained later..
  3. Copy text and initialized data into memory
  4. Set up arguments on stack
  5. Initialize registers (including $sp, the stack pointer; $fp, the frame pointer; and $gp, the global pointer)
  6. Jump to startup routine
    • Copies arguments to $a0, … and calls main
    • When main returns, do exit syscall
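The six steps above can be sketched like this (the "image" dict stands in for an executable file format; its field names are invented for the sketch, since a real loader parses actual headers such as ELF):

```python
# Toy loader following the six steps above.
def load_program(image):
    header = image["header"]                                  # 1. read segment sizes
    memory = {}                                               # 2. stand-in for the virtual address space
    memory["text"] = image["text"][:header["text_size"]]      # 3. copy text
    memory["data"] = image["data"][:header["data_size"]]      # 3. copy initialized data
    stack = list(image["args"])                               # 4. set up arguments on the stack
    regs = {"$sp": 0x7FFFFFFC, "$fp": 0x7FFFFFFC, "$gp": 0x10008000}  # 5. init registers
    pc = header["entry"]                                      # 6. jump to the startup routine
    return memory, stack, regs, pc

image = {
    "header": {"text_size": 2, "data_size": 1, "entry": 0x00400000},
    "text": [0x20080005, 0x2009000A],
    "data": [42],
    "args": ["prog", "arg1"],
}
memory, stack, regs, pc = load_program(image)
```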

Pipelining

Pipelined Datapath

There are 5 stages:

  1. IF : Instruction Fetch
  2. ID : Instruction Decode
  3. EX : Execute Operation / Calculate Address
  4. MEM : Access Memory
  5. WB : Write Result Back to Register

Control signals are also passed along the pipeline, because the next decoded instruction would otherwise overwrite them. Carrying the control signals forward preserves the action each instruction still has to perform.
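A quick sanity check of why pipelining helps: with S stages and N instructions (and no hazards), execution takes S + (N - 1) cycles instead of S × N.

```python
# Ideal (hazard-free) pipeline timing for an S-stage pipeline.
def pipelined_cycles(n_instructions, n_stages=5):
    # The first instruction takes n_stages cycles; after that,
    # one instruction completes per cycle.
    return n_stages + (n_instructions - 1)

def unpipelined_cycles(n_instructions, n_stages=5):
    # Without pipelining, each instruction occupies all stages alone.
    return n_stages * n_instructions

# 10 instructions on the 5-stage pipeline: 14 cycles instead of 50.
```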

Hazards

As instructions are executed back to back, conflicts may happen.

  • There are 3 classes
    • Structural Hazards
      • A required resource is busy
    • Data Hazards
      • Need to wait for a previous instruction to complete its data read/write
      • Solutions : forwarding, stalling, compiler scheduling
    • Control Hazards
      • Deciding on a control action depends on a previous instruction
      • Solutions : stalling, reducing branch delay, branch prediction
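A minimal sketch of spotting a data hazard between adjacent instructions (the (dest, src1, src2) register tuples are an assumed encoding for the example):

```python
# An instruction that reads a register written by the instruction right
# before it has a data hazard: it needs forwarding, or a stall (e.g.
# when the producer is a load).
def has_data_hazard(prev, curr):
    """prev/curr are (dest, src1, src2) register-name tuples (assumed encoding)."""
    dest = prev[0]
    return dest is not None and dest in (curr[1], curr[2])

# add $t0, $t1, $t2 followed by sub $t3, $t0, $t4 -> hazard on $t0
hazard = has_data_hazard(("$t0", "$t1", "$t2"), ("$t3", "$t0", "$t4"))
```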

Exceptions and Interrupts

  • Exceptions
    • “Unexpected” events within the CPU
      • overflow, …
  • Interrupt
    • From an external I/O controller
  • Performance is sacrificed to deal with them
    1. Read the cause of the problem
    2. Transfer to the related handler
    3. Determine the required action
    4. If restartable
      • Take corrective action
      • Use the EPC (Exception Program Counter) to return to the program
    5. Else terminate program
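The handling steps can be sketched like this (the cause names and handler names are invented for the example):

```python
# Toy exception dispatch following the steps above.
def handle_exception(cause, epc, restartable):
    handlers = {"overflow": "overflow_handler", "io": "io_handler"}  # step 2
    handler = handlers.get(cause, "default_handler")
    if restartable:
        # step 4: take corrective action, then resume at the address
        # saved in the EPC (Exception Program Counter)
        return handler, epc
    return handler, None  # step 5: terminate the program

result = handle_exception("overflow", 0x00400040, restartable=True)
```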

Performance

Since efficiency is one of the shared goals of DS, Algorithms, OS, etc., knowing how to calculate performance is crucial.

CPI : Clock Cycles Per Instruction
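CPI plugs into the classic performance equation: CPU time = instruction count × CPI ÷ clock rate. A quick check:

```python
# CPU time = instruction count * CPI / clock rate (the standard equation).
def cpu_time(instruction_count, cpi, clock_rate_hz):
    return instruction_count * cpi / clock_rate_hz

# 10^9 instructions at CPI 2.0 on a 2 GHz clock take 1 second;
# halving CPI halves the CPU time.
t = cpu_time(1e9, 2.0, 2e9)
```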

Memory

Memory Hierarchy

Nothing to explain.. as intuitive as it can be.

Principle of Locality

  • Temporal Locality
    • Items accessed recently are likely to be accessed again soon
    • e.g. loops
  • Spatial Locality
    • Items near those accessed recently are likely to be accessed soon
    • e.g. array data

Memory Hierarchy Levels

  • Hit : access satisfied by the upper level
  • Miss : accessed data is absent, so the block is copied from the lower level

Cache

  • Direct Mapped
    • One block (and one tag) per set, so each memory block maps to exactly one cache location.
    • position of block = (num_block) % (num_cache_block)
  • Set Associative
    • Multiple blocks (and tags) per set; every tag in the set is compared on a lookup.
    • position of block = (num_block) % (num_cache_set)
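The two modulo formulas in action, using block 12 and an 8-block cache:

```python
# Where block 12 lands, depending on the cache organization.
def direct_mapped_index(block_number, num_cache_blocks):
    return block_number % num_cache_blocks

def set_index(block_number, num_sets):
    return block_number % num_sets

dm = direct_mapped_index(12, 8)  # direct mapped, 8 blocks -> index 4
sa = set_index(12, 4)            # 2-way set associative, 4 sets -> set 0
```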

Multilevel Caches

  • Primary Cache
    • Attached to CPU
    • Small, but fast
  • Level-2 Cache
    • Services misses from primary cache
    • Larger, slower, but still faster than main memory
  • Main Memory
    • Services misses from L-2 cache
  • Some high-end systems include L-3 cache
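The payoff of the hierarchy shows up in average memory access time, AMAT = hit time + miss rate × miss penalty (the cycle counts and miss rates below are invented for illustration):

```python
# AMAT = hit time + miss rate * miss penalty. With two levels, the
# AMAT of L2 becomes the miss penalty seen by L1.
def amat(hit_time, miss_rate, miss_penalty):
    return hit_time + miss_rate * miss_penalty

l2_amat = amat(hit_time=5, miss_rate=0.05, miss_penalty=100)     # misses go to main memory
l1_amat = amat(hit_time=1, miss_rate=0.02, miss_penalty=l2_amat) # misses go to L2
```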

Cache Coherence

  • Suppose 2 CPU cores share a physical address space
    • If CPU A writes 1 to X
      • CPU A and CPU B’s content would be different
    • [Snooping protocol](https://www.techopedia.com/definition/332/snooping-protocol)

Writes

  • On data-write hit
    • Write Through
      • Update cache & memory
    • Write Back
      • Update cache only
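A toy model of the two policies on a write hit (eviction and write-back of dirty blocks are omitted to keep the sketch short):

```python
# Write-through updates the cache and memory together; write-back
# updates only the cache and marks the block dirty (memory is updated
# later, when the block is evicted).
def write(cache, memory, addr, value, policy):
    cache[addr] = {"value": value, "dirty": policy == "write-back"}
    if policy == "write-through":
        memory[addr] = value

cache, memory = {}, {0x10: 0}
write(cache, memory, 0x10, 42, "write-back")    # memory still holds the stale 0
write(cache, memory, 0x20, 7, "write-through")  # memory updated immediately
```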

Virtual Memory

This topic is very important

The virtual page number is looked up in the TLB and, on a miss, in the page table. The physical page number found there is then combined with the page offset to access physical memory (or disk storage, on a page fault).

This second diagram shows a better picture of the flow.
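The flow can be sketched as follows (4 KiB pages are assumed; the dict-based TLB and page table are stand-ins for the real hardware structures):

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages

def translate(vaddr, tlb, page_table):
    vpn, offset = divmod(vaddr, PAGE_SIZE)  # split into page number + offset
    if vpn in tlb:                          # TLB hit
        ppn = tlb[vpn]
    elif vpn in page_table:                 # TLB miss -> walk the page table
        ppn = page_table[vpn]
        tlb[vpn] = ppn                      # fill the TLB for next time
    else:
        raise LookupError("page fault: bring the page in from disk")
    return ppn * PAGE_SIZE + offset         # physical address

tlb, page_table = {}, {1: 5}
paddr = translate(0x1234, tlb, page_table)  # VPN 1 -> PPN 5
```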

SIMD

  • Operate elementwise on vectors of data
    • All processors execute the same instruction at the same time
    • Simplifies synchronization
    • Reduced instruction control hardware
    • Works best for highly data-parallel applications
      • e.g. GPU
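The elementwise idea in scalar Python (a real SIMD unit does all lanes in a single instruction; the comprehension here just mimics the lanes):

```python
# One "instruction" (addition) applied across every lane of two vectors.
def simd_add(a, b):
    return [x + y for x, y in zip(a, b)]

lanes = simd_add([1, 2, 3, 4], [10, 20, 30, 40])  # all lanes add in lockstep
```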

BUS

  • Shortened form of the Latin omnibus.
  • Shared communication channel
    • Parallel set of wires for data and synchronization of data transfer
    • Can become a [bottleneck](https://www.techopedia.com/definition/14630/von-neumann-bottleneck)
  • Performance limited by physical factors
    • Wire length, number of connections
  • More recent alternative : high-speed serial connections with switches
    • Like networks

I/O

Memory-Mapped I/O

  • Certain addresses are not regular memory addresses
  • Instead, they correspond to registers in I/O devices
  • The devices are not accessed directly, only through these addresses and registers.
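A toy address decode (the device address range is invented for the example; real memory maps are platform-specific):

```python
# Stores to the device range land in I/O registers instead of RAM.
DEVICE_BASE, DEVICE_END = 0xFFFF0000, 0xFFFF000C  # assumed device range

def store(ram, device_regs, addr, value):
    if DEVICE_BASE <= addr < DEVICE_END:
        device_regs[addr - DEVICE_BASE] = value  # goes to a device register
    else:
        ram[addr] = value                        # ordinary memory store

ram, device_regs = {}, {}
store(ram, device_regs, 0xFFFF0008, 1)  # hits the device
store(ram, device_regs, 0x10010000, 7)  # hits RAM
```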

Polling and Interrupt

  • OS needs to know when
    • I/O has completed an operation
    • I/O has encountered an error
  • Polling
    • The OS repeatedly reads the status register to see whether it is time for the next I/O operation
  • Interrupt
    • When the I/O device completes an operation or needs attention, it “interrupts” the processor.
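A minimal polling loop (the device and its status register are simulated; the point is the repeated status reads):

```python
# The OS spins on the status register until the device reports ready.
class FakeDevice:
    """Simulated device whose status register reads ready on the 3rd read."""
    def __init__(self):
        self.reads = 0
    def status_ready(self):
        self.reads += 1
        return self.reads >= 3

def poll(device, max_checks=1000):
    for checks in range(1, max_checks + 1):
        if device.status_ready():
            return checks  # number of status reads spent waiting
    raise TimeoutError("device never became ready")

checks_needed = poll(FakeDevice())  # those wasted reads are polling's cost
```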

Multithreading

  • Performing multiple [threads](https://www.geeksforgeeks.org/thread-in-operating-system/) of execution in parallel
  • Fine-grain Multithreading
    • Switch threads after each cycle
  • Coarse-grain Multithreading
    • Only switch on long stall
    • e.g. L-2 cache miss
  • SMT
    • Schedule instructions from multiple threads
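Fine-grain multithreading as a toy schedule (threads are just lists of instruction labels here; stalls and coarse-grain switching are omitted):

```python
# Round-robin issue: the pipeline switches threads every cycle.
def fine_grain_schedule(threads, cycles):
    issued, pcs = [], [0] * len(threads)
    for cycle in range(cycles):
        t = cycle % len(threads)      # switch thread each cycle
        if pcs[t] < len(threads[t]):  # thread still has instructions left
            issued.append((t, threads[t][pcs[t]]))
            pcs[t] += 1
    return issued

order = fine_grain_schedule([["a1", "a2"], ["b1", "b2"]], cycles=4)
```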