# Welcome to Fang's Blog

# A Panicking Note For 2020

Books/Papers To Be Finished: Numerical Analysis: Theory and Experiments; The Concepts and Practice of Mathematical Finance; What Every…

# Notes From The Concepts and Practice of Mathematical Finance -- 2. Pricing Methodologies

These notes are about how to price options and derivative products. A derivative product is a product whose value is determined by the behavior of…

# Notes From The Concepts and Practice of Mathematical Finance -- 1. Risk

1. What is risk? Every transaction can be viewed as the buying and selling of risk. Risk can be regarded as a synonym for uncertainty…

# Notes From What Every Programmer Should Know About Memory

This series contains my notes taken from What Every Programmer Should Know About Memory. 1. Introduction Background: mass storage and memory…

# Using OCaml for Scientific Computing -- 2. Ndarray

(This is my study note on OCaml Scientific Computing 1st Edition) Ndarray Types The Ndarray module is built on top of OCaml’s Bigarray module. C…

# Using OCaml for Scientific Computing -- 1. Setup & Conventions

Intro Use Owl in Toplevel Load Owl in utop with the following commands. owl-top is Owl’s toplevel library, which will automatically load…

# CUDA Unified Virtual Address Space & Unified Memory

Unified Virtual Address Space (UVA) From CUDA 4.0 onward, UVA has been an important feature. It puts all CUDA execution, host and GPUs, in…

# Several Things About CUDA Resource Assignment

Motivated by a CUDA puzzle I tried to solve today, I’d like to talk more about resource assignment. A puzzle Adding two big arrays…

# Memory Alignment For CUDA

In the fifth post of the CUDA series (The CUDA Parallel Programming Model - 5. Memory Coalescing), I put up a note on the effect of memory…

# Recap: GPU Latency Tolerance and Zero-Overhead Thread-Scheduling

I briefly talked about how CUDA processors hide long-latency operations such as global memory accesses through their warp-scheduling…

# CUDA Dynamic Parallelism

Have you wondered if it’s possible to launch nested kernels (i.e. a kernel calls another kernel) in CUDA? Well, this is where dynamic…

# Some CUDA Related Questions

In this post, I talk about what happens when blocks are assigned to SMs, as well as CUDA code optimization across GPU architecture. Before…

# CUDA Programming - 2. CUDA Variable Type Qualifiers

CUDA Variable Type Qualifiers lifetime == kernel? If the lifetime of a variable is within a kernel execution, it must be declared within…

# Packing Files With Reprozip On MacOS Via Vagrant

I recently had to pack a project with Reprozip where all the dependencies can be nicely preserved. Reprozip uses ptrace and thus only works…

# The CUDA Parallel Programming Model - 9. Interleave Operations by Stream

In the last post, we saw how full concurrency can be achieved amongst streams. Here I’d like to talk about how CUDA operations from…

# The CUDA Parallel Programming Model - 8. Concurrency by Stream

In the previous posts, we have sometimes assumed that only one kernel is launched at a time. But this is not all that kernels can do. They…

# The CUDA Parallel Programming Model - 7. Tiling

There’s an intrinsic tradeoff in the use of device memories in CUDA: the global memory is large but slow, whereas the shared memory is small…

# The CUDA Parallel Programming Model - 6. More About Memory

The compute-to-global-memory-access ratio has major implications on the performance of a CUDA kernel. Programs whose execution speed is…

# The CUDA Parallel Programming Model - 5. Memory Coalescing

This post talks about a key factor in CUDA kernel performance: accessing data in the global memory. CUDA applications tend to process a…

# The CUDA Parallel Programming Model - 4. Syncthreads Examples

This is the fourth post in a series about what I learnt in my GPU class at NYU this past fall. Here I talk about barrier synchronization…

# The CUDA Parallel Programming Model - 3. More On Thread Divergence

This is the third post in a series about what I learnt in my GPU class at NYU this past fall. Here I dive a bit deeper than the previous…

# The CUDA Parallel Programming Model - 2. Warps

This is the second post in a series about what I learnt in my GPU class at NYU this past fall. This will be mostly about warps, why using…

# The CUDA Parallel Programming Model - 1. Concepts

I took the Graphics Processing Units course at NYU this past fall. This is the first post in a series about what I learnt. Buckle up for…

# CUDA Programming - 1. Matrix Multiplication

Notation: the input matrices, the output matrix, and the row and column counters used to index each element by its position in the vertical direction and position…