Processes
- The operating system is a layer of abstraction / an intermediary between hardware and software applications
- abstraction
- safety
- Each process has its own address space, open files, data, etc.
- Functions to interact with processes are:
    - `int fork(void)`: creates a new process that continues from the same line of execution → returns `0` in the child and returns the child's pid in the parent
    - `int waitpid(int pid, int *status, int options)`: wait for the child with pid of `pid`, or `-1` for any child → the exit status is returned in `status`
    - `void exit(int status)`: have the current process exit → a `status` of `0` means success
    - `int kill(int pid, int sig)`: sends signal `sig` to process `pid` → e.g. `sig` can be an interrupt signal
    - `int execve(char *prog, char **argv, char **envp)`: replaces the currently running program with `prog`, with arguments `argv`
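A minimal sketch tying these calls together (assumes a POSIX system; `/bin/ls` is just an illustrative program to exec):

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();                  /* returns 0 in the child, the child's pid in the parent */
    if (pid < 0) {
        perror("fork");
        exit(1);
    }
    if (pid == 0) {
        /* child: replace this program with /bin/ls */
        char *argv[] = { "ls", "-l", NULL };
        char *envp[] = { NULL };
        execve("/bin/ls", argv, envp);
        perror("execve");                /* only reached if execve fails */
        exit(1);
    }
    int status;
    waitpid(pid, &status, 0);            /* parent: wait for that specific child */
    if (WIFEXITED(status))
        printf("child exited with status %d\n", WEXITSTATUS(status));
    return 0;
}
```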
POSIX File Interface
- Functions in the POSIX file interface:
    - `open`: returns a fd (file descriptor)
    - `close`: closes the file for a fd
    - `read`: copies data from the file into the address space - so threads share the file data
    - `write`: copies data from the address space into the file
    - `lseek`: enables non-sequential reading/writing → given a byte offset, read/write at that position in the file
    - `dup2(oldfd, newfd)`: makes `newfd` an exact copy of `oldfd` → same open file and byte offset → closes `newfd` if it is already open → returns `-1` on error
        - whatever `oldfd` referenced before, `newfd` now also references - often used for redirecting `stdin` and `stdout`
    - `stdin`, `stdout`, and `stderr` have their own default fds: `0`, `1`, `2` respectively
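A small sketch of the usual redirection pattern with `open` + `dup2` (the filename `out.txt` is just an example):

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* open (or create) the file that should receive the output */
    int fd = open("out.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    dup2(fd, STDOUT_FILENO);   /* fd 1 (stdout) now refers to the same open file */
    close(fd);                 /* fd 1 keeps the file open, so fd is no longer needed */

    printf("this line ends up in out.txt\n");
    return 0;
}
```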
PCB (Process Control Block)
- The OS maintains data about each process - in the PCB
- It tracks the state of each process
    - registers, open files, other essential info, etc.
- Context switching is the act of switching which process is in the running state
- Example: we have P0 and P1
    - P0 is interrupted or makes a `syscall` → save its state to PCB0 (its location in the PCB) → load P1's data from PCB1 (idle between saving PCB0 and loading PCB1)
    - run for a while
    - P1 is interrupted or makes a `syscall` → save its state to PCB1 → load P0's data from PCB0 (idle between saving PCB1 and loading PCB0)
    - etc.
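A hypothetical sketch of the kind of state a PCB might hold (field names are invented for illustration; real kernels differ in layout and detail):

```c
/* hypothetical PCB layout -- not taken from any particular kernel */
struct pcb {
    int           pid;                              /* process id */
    enum { READY, RUNNING, BLOCKED } state;         /* scheduling state */
    unsigned long regs[16];                         /* saved general-purpose registers */
    unsigned long pc;                               /* saved program counter */
    unsigned long sp;                               /* saved stack pointer */
    void         *page_table;                       /* address space */
    int           open_fds[64];                     /* open file descriptors */
    struct pcb   *next;                             /* e.g. link in the ready queue */
};
```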
POSIX Thread API
- The functions for this are:
    - `int pthread_create(pthread_t *thr, pthread_attr_t *attr, void *(*fn)(void *), void *arg)`: creates a new thread `thr` and runs function `fn` with argument `arg`
    - `void pthread_exit(void *return_value)`: destroys the current thread → return value in `return_value`
    - `int pthread_join(pthread_t thread, void **return_value)`: wait for `thread` to exit and receive its `return_value`
    - `void pthread_yield()`: yields the current thread → scheduler runs another ready thread
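A minimal sketch using `pthread_create` and `pthread_join` (compile with `-pthread`; note that `pthread_yield` is a non-portable extension, the standard spelling is `sched_yield`):

```c
#include <pthread.h>
#include <stdio.h>

/* thread body: receives the void * passed as arg to pthread_create */
static void *worker(void *arg) {
    int id = *(int *)arg;
    printf("hello from thread %d\n", id);
    return (void *)(long)(id * 10);      /* value handed back through pthread_join */
}

int main(void) {
    pthread_t thr;
    int id = 1;
    void *ret;

    pthread_create(&thr, NULL, worker, &id);   /* attr == NULL -> default attributes */
    pthread_join(thr, &ret);                   /* wait for the thread, collect its return value */
    printf("thread returned %ld\n", (long)ret);
    return 0;
}
```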
- There are 3 threading models, each with pros and cons
    - 1:1
        - expensive context switching → switching processes every time we switch threads
        - everything requires a `syscall`
    - 1:n
        - if one thread blocks → blocks everything
        - does not take advantage of multiple cores
    - n:m
        - requires locking and other mechanisms to prevent issues (data races, deadlocks, etc.)
        - difficult to efficiently manage the `m` user threads over `n` kernel threads → the kernel does not know the importance or order of user threads
Procedure Calls
- The steps are:
    - pass the first 6 args in registers; if more than 6 args → push the rest onto the stack
    - push caller-save registers onto the stack
    - push a stack frame / call the sub-procedure
    - push callee-save registers
    - do stuff
    - restore callee-save registers
    - pop the stack frame → return to the caller
    - restore caller-save registers
Thread Switching
- Here is the order of execution:
    - scheduler decides which thread to run → `Sched_Scheduler()`
    - switch to that thread → `Sched_switch()`
    - perform the context switch → `Thread_switchArch()`
    - push callee-saved registers into `struct switchframe` → `switchstack()`
- If we were in a hardware interrupt, instead of the first 2 steps we would have:
    - `struct trapframe` → create the trapframe
    - `trap_entry` → identify which device caused the interrupt
    - interrupt handler → handle the interrupt (which requires switching to the thread running the handler)
Sequential Consistency
- Maintaining program order on individual processors
- Ensuring ==write atomicity==
- cache coherence → writes are synchronized across all memory
Caching/Memory Models
- WB (Write-Back): this is the default caching policy. It allows writes to be initially performed in the cache, marking the portion as dirty. The actual write-back to main memory occurs when the item is about to be removed from the cache. The tradeoff is that it is more complex to implement.
- WT (Write-Through): In this policy, all writes are immediately written back to main memory. The tradeoff is that writes take longer since they have to be immediately propagated to main memory.
- UC (Uncacheable): This is used for device memory, where caching is not desirable. For example, when a key is pressed on the keyboard, you wouldn't want the system to read a cached value.
- WC (Write-Combining): In this policy, writes are temporarily stored in a buffer and occur in a burst. This is useful for optimizing write operations to memory in certain scenarios.
x86 Atomicity
- `lock` → makes the memory instruction atomic
    - ordering applies with respect to other `lock` instructions; otherwise, instructions can be re-ordered
- `xchg` → always a `lock` instruction. Exchanges the value stored in a register with a value in main memory
- `cmpxchg` → compares 2 values and performs the exchange if they are equal; used with the `lock` prefix to make it atomic
- `lfence` → load fence → all `load` instructions prior to the fence (above it) must complete before continuing to the next instruction
- `sfence` → store fence → all `store` instructions prior to the fence must complete
- `mfence` → both `lfence` and `sfence` → all loads and stores above it must complete
- atomics and fences are a means of achieving sequential consistency
- while sequential consistency can help us in multi-threaded code, we don't want to depend on this memory model → it is not the same for all machines
- hence why we look towards locks, semaphores, cv, etc.
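A small sketch of how these instructions can be emitted from C, assuming GCC/Clang inline assembly on x86-64 (in practice you would normally use the C11 atomics below instead):

```c
static long counter = 0;

/* atomically increment counter: the lock prefix makes the
   read-modify-write of the memory operand atomic */
static void atomic_inc(void) {
    __asm__ volatile("lock incq %0" : "+m"(counter));
}

/* full fence: all loads and stores before this point must
   complete before any load or store after it */
static void full_fence(void) {
    __asm__ volatile("mfence" ::: "memory");
}
```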
Mutex (Lock)
Condition Variables
Semaphores
SpinLock
C11 Atomics
- The C library has its own atomic support (`<stdatomic.h>`), where we can declare a variable with `_Atomic(int) x`. Standard operations on this with other atomic variables are guaranteed to be atomic.
- Has its own atomic flag implementation: `atomic_flag`
    - cannot directly store to and load from this → use `atomic_flag_test_and_set(&my_flag)` and `atomic_flag_clear(&my_flag)`
    - must be used in a lock-free implementation → instead of acquire lock → critical code → release lock, we can simply check `atomic_flag_test_and_set(&my_flag)` → enter the critical section if `false` is returned (see the sketch below)
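A minimal sketch of that lock-free spin pattern built on `atomic_flag`:

```c
#include <stdatomic.h>

static atomic_flag my_flag = ATOMIC_FLAG_INIT;

/* spin until test_and_set returns false, i.e. until we are the
   ones who flipped the flag from false to true */
static void lock(void) {
    while (atomic_flag_test_and_set(&my_flag))
        ;   /* someone else holds it -> keep spinning */
}

static void unlock(void) {
    atomic_flag_clear(&my_flag);   /* set the flag back to false */
}
```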
- The functions we need to know:
    - `atomic_flag_test_and_set(atomic_flag *my_flag)`: tests and sets the flag atomic variable to true (returns the previous value: `true` if it was already set, `false` otherwise)
        - this uses the default memory ordering; `atomic_flag_test_and_set_explicit` will allow us to pass it a memory ordering
    - `atomic_flag_clear(atomic_flag *my_flag)`: sets the flag atomic variable to false
    - `atomic_fetch_add_explicit(atomic_type *object, type operand, memory_order order)`: adds `operand` onto what `object` points to
    - `atomic_store_explicit(atomic_type *object, type value, memory_order order)`: sets the value of the atomic object to `value`
    - `atomic_load_explicit(volatile atomic_type *object, memory_order order)`: reads the value of `object`
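For example, a shared counter built on these (relaxed ordering is enough here because only the count itself matters, not its ordering relative to other data):

```c
#include <stdatomic.h>

static _Atomic(int) hits = 0;

/* safe to call concurrently from many threads */
void record_hit(void) {
    atomic_fetch_add_explicit(&hits, 1, memory_order_relaxed);
}

int read_hits(void) {
    return atomic_load_explicit(&hits, memory_order_relaxed);
}
```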
- For the above, the possible memory orderings are (remember, these only apply to operations on atomic variables):
    - **memory_order_relaxed**: No specific memory ordering is required → operations can be reordered freely
    - **memory_order_consume**: A slightly relaxed version of **memory_order_acquire**.
    - **memory_order_acquire**: Reads and writes to memory that occur after the load cannot be reordered before it.
        - For example, if a thread acquires a lock, all the reads and writes that occur after the lock is acquired cannot be moved before the lock is acquired. This provides guarantees regarding the ordering of operations with respect to acquiring a lock.
    - **memory_order_release**: Reads and writes to memory that occur before the store cannot move after it.
        - For example, if a thread releases a lock (stores a value of 0), all the writes to a variable must be completed before the lock is released. This ensures proper ordering of operations with respect to releasing a lock.
    - **memory_order_acq_rel**: A combination of **memory_order_acquire** and **memory_order_release**.
    - **memory_order_seq_cst**: Full sequential consistency.
        - The most restrictive memory ordering, ensuring that all operations appear to be executed in a single, global order.
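A sketch of the classic release/acquire hand-off: the producer's write to `data` cannot move after the release store, and the consumer's reads cannot move before the acquire load, so the consumer is guaranteed to see `data == 42`:

```c
#include <stdatomic.h>

static int data;                   /* plain, non-atomic payload */
static _Atomic(int) ready = 0;     /* flag guarding the payload */

void producer(void) {
    data = 42;                                             /* must be visible before the flag */
    atomic_store_explicit(&ready, 1, memory_order_release);
}

void consumer(void) {
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;                                                  /* spin until the flag is set */
    /* the acquire load synchronizes with the release store, so this reads 42 */
    int value = data;
    (void)value;
}
```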
Cache Coherence
- Need cache coherence → caches need to be consistent with each other after a write
- Caches can differ from each other → how to solve?
    - Bus (older): system bus to share memory
        - can "snoop" on the bus to see what is being written, and whether it overwrites data inside the cache
        - poor scalability
- A cache line is the amount of data transferred between different caches or main memory. Each one is either:
    - **Modified** (exclusive) → only this cache has this version
    - **Shared** → one or more caches have this copy of the data
    - **Invalid** → no cache has this valid data
MCS Lock
- We are looking to improve on spinlocks
    - During each iteration of a spin, the thread interacts with the memory location associated with the spinlock → via the memory bus
- Each core has a `qnode` structure (see the sketch below for what it looks like)
- A `lock` is a pointer to a `qnode`
    - `lock` is used to point to a list of cores holding or waiting to hold the MCS lock
    - `lock` will always point to the most recently pushed item of the queue
- When the `lock` is pointing to `NULL`, we know the lock is free
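A sketch of the `qnode` structure as MCS locks are commonly presented (the notes reference a figure that is not reproduced here, so the field names below are the conventional ones, not necessarily the course's):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* one qnode per core/thread that holds or waits for the lock */
struct qnode {
    _Atomic(struct qnode *) next;    /* the core queued behind us, or NULL */
    atomic_bool             locked;  /* true while this core must keep spinning */
};

/* the MCS lock itself is just a pointer to the most recently
   queued qnode; NULL means the lock is free */
typedef _Atomic(struct qnode *) mcs_lock;
```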