OUYANG, Mark; ZHANG, Fengrui. CUDA-Optimized Inference Engine for Large-Scale Language Models: Design, Kernels, and Latency Improvements. Journal of Theory and Practice in Engineering and Technology, [S. l.], v. 2, n. 5, p. 1–9, 2025. Available at: https://woodyinterpub.com/index.php/jtpet/article/view/291. Accessed: 28 Mar. 2026.