Speech Tokenizers

1 minute read


A speech tokenizer, as the name suggests, converts a continuous speech waveform into discrete tokens (usually called units). It bridges the gap between speech and text representations and simplifies the manipulation of speech signals.

Speech Language Models

5 minute read


With the popularity of language modeling, there have been many advances in speech language models that leverage their in-context learning capability for speech synthesis.

Voice Synthesis

6 minute read


Speech synthesis with controllable voice is a challenging task.




Reinforcement Learning Based Text Style Transfer without Parallel Training Corpus


Text style transfer rephrases a text from a source style (e.g., informal) to a target style (e.g., formal) while keeping its original meaning. Despite the success existing works have achieved using a parallel corpus for the two styles, transferring text style has proven significantly more challenging when there is no parallel training corpus. In this paper, we address this challenge by using a reinforcement-learning-based generator-evaluator architecture. Our generator employs an attention-based encoder-decoder to transfer a sentence from the source style to the target style. Our evaluator is an adversarially trained style discriminator with semantic and syntactic constraints that score the generated sentence for style, meaning preservation, and fluency. Experimental results on two different style transfer tasks (sentiment transfer and formality transfer) show that our model outperforms state-of-the-art approaches. Furthermore, we perform a manual evaluation that demonstrates the effectiveness of the proposed method using subjective metrics of generated text quality.


