NVIDIA QuartzNet. This repository is a PyTorch implementation of QuartzNet. It provides scripts to train the QuartzNet 10x5 model from scratch on the LibriSpeech dataset, with greedy decoding results that improve on those reported in the original paper.

NVIDIA NeMo Framework is a scalable, cloud-native generative AI framework built for researchers and PyTorch developers working on Large Language Models (LLMs), Multimodal Models (MMs), Automatic Speech Recognition (ASR), Text to Speech (TTS), and Computer Vision (CV). It is designed to help you efficiently create, customize, and deploy new generative AI models by leveraging existing ...

QuartzNet is an end-to-end neural acoustic model for automatic speech recognition that provides high accuracy at a low memory footprint. It is based on efficient time-channel separable convolutions (Figure 1). QuartzNet [ASR-MODELS6] is a version of the Jasper [ASR-MODELS7] model with separable convolutions and larger filters, and it can achieve performance similar to Jasper with an order of magnitude fewer parameters.

QuartzNet comes from the paper "Deep Automatic Speech Recognition with 1D Time-Channel Separable Convolutions". It was trained with CTC loss on the LibriSpeech dataset and achieves state-of-the-art (SOTA) accuracy, with a Word Error Rate (WER) in the range of 4.19 to 10.98%.
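
To give a rough sense of why time-channel separable convolutions keep the model small, the sketch below counts the weights in a single standard 1D convolution and in its separable counterpart, where a depthwise filter over time (one per input channel) is followed by a pointwise 1x1 convolution across channels. The function names and the layer sizes are assumptions chosen for illustration, not values from the repository or the paper, and biases and normalization layers are ignored.

```python
def conv1d_params(c_in: int, c_out: int, kernel: int) -> int:
    """Weight count of a standard 1D convolution (bias omitted)."""
    return kernel * c_in * c_out


def tcs_conv1d_params(c_in: int, c_out: int, kernel: int) -> int:
    """Weight count of a time-channel separable 1D convolution (bias omitted).

    Depthwise part: one length-`kernel` filter per input channel (time axis).
    Pointwise part: a 1x1 convolution that mixes channels.
    """
    depthwise = kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


# Illustrative sizes only (assumed, not taken from the source).
c_in, c_out, kernel = 256, 256, 33
standard = conv1d_params(c_in, c_out, kernel)
separable = tcs_conv1d_params(c_in, c_out, kernel)
print(f"standard:  {standard} weights")
print(f"separable: {separable} weights")
print(f"ratio:     {standard / separable:.1f}x")
```

For these assumed sizes, the separable layer needs roughly 29 times fewer weights than the standard one. The saving grows with kernel width, which is consistent with the description of QuartzNet as using larger filters while staying much smaller than Jasper.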