Welcome to Fang's Blog
A Panicking Note For 2020
Books/Papers To Be Finished: Numerical Analysis: Theory and Experiments; The Concepts and Practice of Mathematical Finance; What Every…
Notes From The Concepts and Practice of Mathematical Finance -- 2. Pricing Methodologies
These notes are about how to price options and derivative products. A derivative product is a product whose value is determined by the behavior of…
Notes From The Concepts and Practice of Mathematical Finance -- 1. Risk
1. What is risk? Every transaction can be viewed as the buying and selling of risk. Risk can be regarded as a synonym for uncertainty…
Notes From What Every Programmer Should Know About Memory
This series contains my notes taken from What Every Programmer Should Know About Memory. 1. Introduction Background: mass storage and memory…
Using OCaml for Scientific Computing -- 2. Ndarray
(These are my study notes on OCaml Scientific Computing, 1st Edition.) Ndarray Types The Ndarray module is built on top of OCaml’s Bigarray module. C…
Using OCaml for Scientific Computing -- 1. Setup & Conventions
Intro Use Owl in Toplevel Load Owl in utop with the following commands. owl-top is Owl’s toplevel library, which will automatically load…
CUDA Unified Virtual Address Space & Unified Memory
Unified Virtual Address Space (UVA) From CUDA 4.0 and on, UVA has been an important feature. It puts all CUDA execution, host and GPUs, in…
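As a rough illustration of what a single address space enables (a minimal sketch of my own, not code from the post), managed memory built on top of UVA lets the same pointer be dereferenced on the host and inside a kernel:

```cpp
#include <cuda_runtime.h>
#include <cstdio>

__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;
    float *data = nullptr;
    cudaMallocManaged(&data, n * sizeof(float));   // one pointer, visible to host and GPU
    for (int i = 0; i < n; ++i) data[i] = 1.0f;    // host writes directly
    scale<<<(n + 255) / 256, 256>>>(data, n, 2.0f);
    cudaDeviceSynchronize();                       // wait before the host reads results
    printf("data[0] = %f\n", data[0]);
    cudaFree(data);
    return 0;
}
```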
Several Things About CUDA Resource Assignment
Motivated by a CUDA puzzle I tried to solve today, I’d like to talk more about resource assignment. A Puzzle problem Adding two big arrays…
Memory Alignment For CUDA
In the fifth post of the CUDA series (The CUDA Parallel Programming Model - 5. Memory Coalescing), I put up a note on the effect of memory…
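One common way to get aligned rows for 2D data is pitched allocation; the sketch below is my own example (names are hypothetical), not code from the post:

```cpp
#include <cuda_runtime.h>

// Allocate a width x height float image with padded rows so that every row
// starts on an alignment boundary; the padded row size comes back in *pitchBytes.
float *allocPitched(size_t width, size_t height, size_t *pitchBytes) {
    float *d_img = nullptr;
    cudaMallocPitch((void **)&d_img, pitchBytes, width * sizeof(float), height);
    // Element (r, c) lives at ((char *)d_img + r * *pitchBytes) + c * sizeof(float).
    return d_img;
}
```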
Recap: GPU Latency Tolerance and Zero-Overhead Thread-Scheduling
I briefly talked about how CUDA processors hide long-latency operations such as global memory accesses through their warp-scheduling…
CUDA Dynamic Parallelism
Have you wondered if it’s possible to launch nested kernels (i.e., a kernel calling another kernel) in CUDA? Well, this is where dynamic…
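A minimal sketch of the idea (my own example, not code from the post): a parent kernel launches a child grid from device code. This needs compute capability 3.5 or higher and relocatable device code, e.g. `nvcc -arch=sm_35 -rdc=true nested.cu -lcudadevrt`.

```cpp
#include <cuda_runtime.h>
#include <cstdio>

__global__ void child(int parentId) {
    printf("child thread %d launched by parent thread %d\n", threadIdx.x, parentId);
}

__global__ void parent() {
    // Each parent thread launches its own child grid from the device.
    child<<<1, 4>>>(threadIdx.x);
}

int main() {
    parent<<<1, 2>>>();
    cudaDeviceSynchronize();
    return 0;
}
```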
Some CUDA Related Questions
In this post, I talk about what happens when blocks are assigned to SMs, as well as CUDA code optimization across GPU architecture. Before…
CUDA Programming - 2. CUDA Variable Type Qualifiers
CUDA Variable Type Qualifiers lifetime == kernel? If the lifetime of a variable is within a kernel execution, it must be declared within…
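A short sketch of the scope/lifetime rules referred to above (my own example, assuming a block size of 256):

```cpp
__device__ float d_scale = 2.0f;          // device global: lives for the whole application

__global__ void kernel(float *out) {
    __shared__ float tile[256];           // shared: one copy per block, lives for the kernel launch
    float local = 1.0f;                   // automatic: one copy per thread, register/local memory
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = local * d_scale;
    __syncthreads();
    out[i] = tile[threadIdx.x];
}
```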
Packing Files With Reprozip On MacOS Via Vagrant
I recently had to pack a project with Reprozip where all the dependencies can be nicely preserved. Reprozip uses ptrace and thus only works…
The CUDA Parallel Programming Model - 9. Interleave Operations by Stream
In the last post, we saw how full concurrency can be achieved amongst streams. Here I’d like to talk about how CUDA operations from…
The CUDA Parallel Programming Model - 8. Concurrency by Stream
In the previous posts, we have sometimes assumed that only one kernel is launched at a time. But this is not all that kernels can do. They…
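As a rough sketch of the mechanism (my own example; names are hypothetical, and the host buffers would need to be pinned for copies to actually overlap), work issued into different streams is allowed to run concurrently:

```cpp
#include <cuda_runtime.h>

__global__ void work(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] += 1.0f;
}

void run(float *h_a, float *h_b, float *d_a, float *d_b, int n) {
    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    // Copy + kernel in stream 1, copy + kernel in stream 2: the two streams may overlap.
    cudaMemcpyAsync(d_a, h_a, n * sizeof(float), cudaMemcpyHostToDevice, s1);
    work<<<(n + 255) / 256, 256, 0, s1>>>(d_a, n);

    cudaMemcpyAsync(d_b, h_b, n * sizeof(float), cudaMemcpyHostToDevice, s2);
    work<<<(n + 255) / 256, 256, 0, s2>>>(d_b, n);

    cudaStreamSynchronize(s1);
    cudaStreamSynchronize(s2);
    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
}
```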
The CUDA Parallel Programming Model - 7.Tiling
There’s an intrinsic tradeoff in the use of device memories in CUDA: the global memory is large but slow, whereas the shared memory is small…
The CUDA Parallel Programming Model - 6. More About Memory
The compute-to-global-memory-access ratio has major implications on the performance of a CUDA kernel. Programs whose execution speed is…
The CUDA Parallel Programming Model - 5. Memory Coalescing
This post talks about a key factor in CUDA kernel performance: accessing data in global memory. CUDA applications tend to process a…
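A quick sketch of the contrast (my own example, not code from the post): when the threads of a warp touch consecutive addresses, the hardware can combine the accesses into a few transactions; strided access cannot be combined as well.

```cpp
__global__ void coalesced(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];            // thread k reads element k: coalesced
}

__global__ void strided(const float *in, float *out, int n, int stride) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = i * stride;
    if (j < n) out[i] = in[j];            // neighbouring threads are 'stride' elements apart: poorly coalesced
}
```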
The CUDA Parallel Programming Model - 4. Syncthreads Examples
This is the fourth post in a series about what I learnt in my GPU class at NYU this past fall. Here I talk about barrier synchronization…
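A minimal barrier sketch (my own, assuming a block size of 256): every thread must reach __syncthreads() before any thread in the block proceeds, so the shared tile is fully written before it is read back in reversed order.

```cpp
__global__ void reverse_block(float *data) {
    __shared__ float tile[256];
    int t = threadIdx.x;
    tile[t] = data[blockIdx.x * blockDim.x + t];
    __syncthreads();                       // barrier: the whole tile is complete for the block
    data[blockIdx.x * blockDim.x + t] = tile[blockDim.x - 1 - t];
}
```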
The CUDA Parallel Programming Model - 3. More On Thread Divergence
This is the third post in a series about what I learnt in my GPU class at NYU this past fall. Here I dive a bit deeper than the previous…
The CUDA Parallel Programming Model - 2. Warps
This is the second post in a series about what I learnt in my GPU class at NYU this past fall. This will be mostly about warps, why using…
The CUDA Parallel Programming Model - 1. Concepts
I took the Graphics Processing Units course at NYU this past fall. This is the first post in a series about what I learnt. Buckle up for…
CUDA Programming - 1. Matrix Multiplication
Notation: the two input matrices, the output matrix, a row counter, and a column counter; an output element is identified by its position in the vertical direction and its position…
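A minimal naive matrix-multiplication kernel in that spirit (a sketch of my own; the names here are not necessarily the post’s notation). Each thread computes one output element from one row of the first matrix and one column of the second.

```cpp
__global__ void matMul(const float *A, const float *B, float *C, int width) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;   // vertical position
    int col = blockIdx.x * blockDim.x + threadIdx.x;   // horizontal position
    if (row < width && col < width) {
        float sum = 0.0f;
        for (int k = 0; k < width; ++k)
            sum += A[row * width + k] * B[k * width + col];
        C[row * width + col] = sum;
    }
}
```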