Understanding Flash Attention: Writing the Algorithm from Scratch in Triton
Why is Flash Attention so fast? Find out how Flash Attention works. Afterward, we'll polish our understanding by writing a GPU kernel of the algorithm in Triton.
Coding something complex is not simple (wow). I'll help you. In this section, I publish general articles about code. It can be Swift, C++ or even PHP. I'm limited to the languages I know ;)
8 posts
It's all about making your models run faster, from flicking a magic “compile” switch to writing your own custom GPU code. In each step, we’ll implement an innocent softmax function, but things are about to get dark by the end.
If all machine learning engineers want one thing, it's faster model training — maybe after good test metrics.
When you see something that does not work in an omnipresent framework, you believe it can't be completely broken, right?
Binary search trees are mostly hard. Writing a red-black tree is a nightmare. Here, I'm going to explain one of the easiest, yet efficient and powerful, balanced binary trees: the treap, or Cartesian tree.
Everything you need to know about the new Swift asynchronous features. Async await, main actor, task, async get, and possible use cases: all covered.
I’m going to tell you about the internals of the Mach-O file and give an introduction to the simple relocatable object file structure.
A skip list is a nice structure that lets you perform insertions, searches, and n-th maximum lookups. In this post, I focus on skip list indexing.