The corpus covers more than ten languages, including English, Hindi, Tamil, Telugu, Bengali, Oriya, and Assamese. Recordings were collected through several methods, including reading aloud, conversation, and sentence construction, and span a range of domains such as digital time, shopping, travel, medical education, personal and place names, politics, economy, sports, and entertainment.
The dataset was recorded in quiet office and home environments by 200 speakers (123 male and 77 female). All speakers were professionally screened to ensure standard pronunciation and clear articulation. The recording scripts draw on text material such as news.