[Study] Computer Architecture

Introduction

My study notes of Computer Architecture & Organization. Took the class during 22-2. Note written during 22-Winter.

What is Computer?

The Von Neumann Architectue

Just like coding, the basic architecture of a computer has an input -> process -> output.

Typical computer organization

Instruction Set Architecture

ISA

Processor’s Instruction set
- The set of assembly language instructions
Programmer accessible register within processor
- Size of each programmer-accessible registers
- Instruction that can use each register
Information necessary to interact with memory
- Memory alignment
How processor reacts to interrupt from the programming view point
MIPS
Goals of instruction set design for MIPS
- Maximize performance and minimize cost, and reduce design time (of compiler and hardware)
- By the simplicity of Hardware!
MIPS design principles
- Simplicity favors regularity
- Smaller is faster
- Make the common case fast
- Good design demands goom compromises

Basic Procedures

How procedures are conducted. Control is transferred.

Uses stack to implement the design principles. Makes code simple, therefore making the common cases fast.

Translation and Startup

Like I learned in DS, the whole process is depicted as the following (In DS, the Linker & Loader was the focus; In CA, the Assembler was focused; Personally, I want to learn more about the Compiler..):

Loading a Program

Load from image file on disk into memory

Read header to determine segment sizes
Create virtual address space
- Done by the OS
- Virual address & memory explained later..
Copy text and initialized data into memory
Set up arguments on stack
Initialize registers (including $sp; stack pointer, $fp; frame pointer, $gp; global pointer)
Jump to startup routine
- Copies arguments to $a0, … and calls main
- When main returns, do exit syscall

Pipelining

Pipelined Datapath

There exists 5 stages

IF : Instruction Fetch

ID : Instruction Decode

EX : Execute Operation Calculate Address

MEM : Access Memory

WB : Write Result Back to Register

Control signals are also passed along the pipline because the next decoded instruction may overwrite the control signals. Saving the control preserves the action to be done.

Hazards

As instructions are executed continuously right after another, conflicts may happen.

There exists 3 classes

Structure Hazards

A required resource is busy

Data Hazards

Need to wait for previous instruction to compelete its data read/write

Solutions : forwarding, stalling, compiler scheduling

Control Hazards :

Deciding on control action depends on previous instruction

Solutions : stall, reducing branch delay, branch prediction

Exceptions and Interrupts

Exceptions
- “Unexpected” events within the CPU
  - overflow, …
Interrupt
- From an external I/O controller
Performance is sacrificed to deal with them
1. Read the problem
2. Transfer to related handler
3. Determine the required action
4. If restartable
  - Take corrective action
  - EPC (Exception Program Counter) to return to program
5. Else terminate program

Performance

As one of the goals of DS, Algorithm, OS, etc. is efficiency, calculating performance is crucial.

CPI : Clocks Per Instruction

Memory

Memory Hierarchy

Nothing to explain.. as intuitive as it can be.

Principle of Locality

Temporal Locality
- Items accessed recently are likely to be accessed again soon
- e.g. loops
Spatial Locality
- Items near those accessed reccently are likely to be accessed soon
- e.g. array data

Memory Heirarchy Levels

Hit : access satisfied by upper level
Miss : accessed data is absent so block copied from lower level.

Cache

Direct Mapped
- One tag per block.
- position of block = (num_block) % (num_cache_block)
Set Associative
- Multiple tag per block.
- position of block = (num_block) % (num_cache_set)

Multilevel Caches

Primary Cache
- Attached to CPU
- Small, but fast
Level-2 Cache
- Services misses from primary cache
- Larger, slower, but still faster than main memory
Main Memory
- Services misses from L-2 cache
Some high-end systems include L-3 cache

Cache Coherence

Suppose 2 CPU cores share a physical address space
- If CPU A writes 1 to X
  - CPU A and CPU B’s content would be different
- [Snoop protocol] (https://www.techopedia.com/definition/332/snooping-protocol)

Writes

On data-write hit
- Write Through
  - Update cache & memory
- Write Back
  - Update cache only

Virtual Memory

This topic is very important

Virtual page number is used to access virtual address in TLB or/then Page table. Then the actual physical page address in the table is used to access Physical memory or Disk storage.

This second diagram shows a better picture of the flow.

SIMD

Operate elementwise on vectors of data
- All processors execute the same instruction at the same time
- Simplifies synchronization
- Reduced instruction control hardware
- Works best for highly data-parallel applications
  - e.g. GPU

BUS

Shortened form of the Latin omnibus.
Shared communication channel
- Parallel set of wires for data and synchronization of data transfer
- Can become a [bottleneck] (https://www.techopedia.com/definition/14630/von-neumann-bottleneck)
Performance limited by physical factors
- Wire length, number of connections
More recent alternative : high-speed serial connections with switches
- Like networks

I/O

Memory-Mapped I/O

Certain adresses are not regular memory addresses
Instead, they correspond to registers in I/O devices
They are not accessed directly, only through through address + registers.

Polling and Interrrupt

OS needs to know when
- I/O has completed an operation
- I/O has encountered an error
Polling
- OS checks the status register to check if it is time for next I/O operation
Interrupt
- When the I/O device completes an operation or needs attention, it “interrupts” the processor.

Multithreading

Performing multiple [threads] (https://www.geeksforgeeks.org/thread-in-operating-system/) of execution in parallel
Fine-grain Multithreading
- Switch threads after each cycle
Coarse-grain Multithreading
- Only switch on long stall
- e.g. L-2 cache miss
SMT
- Schedule instructions from multiple threads