newsreiner-flashcards.vercel.appIrregularChat: AI & Autonomy3w ago
The discussion breaks down the main systems and economics behind large-model training and inference. First, it explains why token latency is the maximum of compute time and memory time. Compute scales