.. CSTR NN-TTS documentation master file, created by
   sphinx-quickstart on Mon Feb 15 11:11:14 2016.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Welcome to CSTR's NN-TTS documentation!
=======================================

Contents:

.. toctree::
   :maxdepth: 2


Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`

Get Started
=========================

Required softwares/tools
------------------------

The system has been tested in linux environment, and the following packages are required.
 * Python 2.6/2.7
 * Theano 0.6/0.7 (http://deeplearning.net/software/theano/). 0.8 version is not tested yet.
 * Bandmat 0.5: https://pypi.python.org/pypi/bandmat/0.5
 * SPTK: http://sp-tk.sourceforge.net/


Data Preparation for a neural network (NN) based speech synthesis system
------------------------------------------------------------------------

To build a NN system, you need to prepare linguistic features as system input and acoustic features as system output. Please follow the instructions in this section to prepare your data.

Input Linguistic Features
+++++++++++++++

Neural networks take vectors as input, so the alphabet representation of linguistic features needs to be vectorized.

1. **HTS style**: Please check the HTS demo for the HTS style labels (http://hts.sp.nitech.ac.jp/).
    * Provide HTS full-context labels with state-level alignments.
    * Provide a question file that matches the HTS labels. 
    * The questions in the question file will be used to convert the full-context labels into binary and/or numerical features for vectorization. It is suggested to do a manual selection of the questions, as the number of questions will affect the dimensionality of the vectorized input features.
    * Different from the HTS format question, the NN system also supports to extract numerical values using '**CQS**', e.g., ** CQS "Pos_C-Word_in_C-Phrase(Fw)" {:(\d+)+}**, where '**:**' and '**+**' are separators, and '**(\d+)**' is a regular expression to match a numerical feature.

2. **"Composed" style**:

3. **Direct *vectorized* input**: If you prefer to do vectorization yourself, you can feed the system binary files directly. Please prepare your binary files with the following instructions:
    * Align the input feature vectors with the acoustic features. Input and output features should have the same number of frames.
    * Store the data in binary format with '**float32**' precision.
    * In the config file, use an empty question file, and set *appended_input_dim* to be the dimensionality of the input vector.
    * Note: voice conversion can use this kind of direct vectorized input. 


Output Acoustic Features
+++++++++++++++

The default setting is assuming you use the STRAIGHT vocoder (c version). This vocoder is free for academic users. The output includes 
    * mel-cepstral coeffcients (MCC),
    * band aperiodicities (BAP),
    * Fundamental frequency (F0) in logarithmic scale. 
Please provide the three features in binary format with 'float32' precision, in the config file, provide the dimensionality of each feature, for example
    * [Outputs] mgc  :  60
    * [Outputs] dmgc :  180
**dmgc** means the dimensionality of MCC with delta and delta delta features. If **dmgc** is set to 60, only the static features are used.
Please also tell the file extension for each feature, for example
    * [Extensions] mgc_ext : .mgc
    * [Extensions] bap_ext : .bap
    * [Extensions] lf0_ext : .lf0
   
The open-source WORLD vocoder is also supported. The modified version for SPSS can be found in the repository.
    
If you have your preferred vocoder, please try to give a nick name to each feature to match the supported ones.

Recipes
---------------

In the system, several recipes for standard neural network architectures are provided. They are described below:

Architecture
+++++++++++++++

The system supports a flexible way to change neural network architectures by changing the config file in the [Architecture] section:
    * hidden_layer_size  : [512, 512, 512, 512]
    * hidden_layer_type  : ['TANH', 'TANH', 'TANH', 'TANH']
By default, feedforward neural network is used. But the system supports various types of hidden layers:
    * **'TANH'** : The Hyperbolic Tangent activation function
    * **'RNN'**  : The simple but standard recurrent neural network unit
    * **'LSTM'** : The standard Long Short-Term Memory unit
    * **'GRU'**  : The gated recurrent unit
    * **'SLSTM'**: The simplified LSTM unit
    * **'BLSTM'**: The bidirectional LSTM unit
You can define your own architecture by choosing a hidden unit at each hidden layer. For each type of hidden layer, please check the **Models** section.

Deep Feedforward Neural Network
+++++++++++++++

An example config file can be found in the './recipes/dnn' directory. Please use 'submit.sh ./run_lstm.py ./recipes/dnn/feed_foward_dnn.conf' to build the feedforward neural network. Please modify the config file to adapt to your own working environment (e.g., data path).

Mixture Density Neural Network
+++++++++++++++

(Deep) Long Short-Term Memory (LSTM) based Recurrent Neural Network (RNN)
+++++++++++++++

An example config file is provided in './recipes/dnn/hybrid_lstm.conf'. Follow the same recipe as that in the deep feedforward neural network section.

(Deep/Hybrid) Bidirectional LSTM-based RNN
+++++++++++++++

Example config files are provided in './recipes/blstm' directory. 'blstm.conf' is for multiple bidrectional LSTM layers, while 'hybrid_blstm.conf' is for a hybrid architecture, that uses several feedforward layers at the bottom, and one BLSTM layer at the top.

Variants of LSTM
----------------

This recipe is to support the paper by Wu & King (ICASSP 2016). Several variants of LSTMs are provided. Please use the corresponding config files to do the experiments.


Stacked Bottlenecks
+++++++++++++++


Trajectory modelling
+++++++++++++++

Models
=========================

Deep Feedforward/Recurrent Networks
----------------

This is something I want to say that is not in the docstring.


.. autoclass:: models.deep_rnn.DeepRecurrentNetwork
   :members:  __init__, build_finetune_functions, parameter_prediction
   
   
Layers
=========================

Recurrent Neural Network units
----------------

This is something I want to say that is not in the docstring.

.. autoclass:: layers.gating.VanillaRNN
   :members:  __init__, recurrent_as_activation_function
   
.. autoclass:: layers.gating.LstmBase
   :members:  __init__, recurrent_fn, lstm_as_activation_function
   
.. autoclass:: layers.gating.VanillaLstm
   :members:  __init__, lstm_as_activation_function   

.. autoclass:: layers.gating.LstmNFG
   :members:  __init__, lstm_as_activation_function   
   
.. autoclass:: layers.gating.LstmNIG
   :members:  __init__, lstm_as_activation_function   
   
.. autoclass:: layers.gating.LstmNOG
   :members:  __init__, lstm_as_activation_function   
   
.. autoclass:: layers.gating.LstmNoPeepholes
   :members:  __init__, lstm_as_activation_function   
   
.. autoclass:: layers.gating.SimplifiedLstm
   :members:  __init__, lstm_as_activation_function   
   
.. autoclass:: layers.gating.GatedRecurrentUnit
   :members:  __init__, lstm_as_activation_function   
   
                     
I/O functions
=========================

Binary I/O collections
----------------

.. autoclass:: io_funcs.binary_io.BinaryIOCollection
   :members:  load_binary_file, array_to_binary_file, load_binary_file_frame
   
   
Utils
=========================

Data Provider
----------------

.. autoclass:: utils.providers.ListDataProvider
   :members:  __init__, reset, make_shared, load_next_utterance, load_next_partition

Front-end
========================

Label normalisation
---------------

.. autoclass:: frontend.label_normalisation.HTSLabelNormalisation
    :members: __init__, load_labels_with_state_alignment, pattern_matching, pattern_matching_binary, pattern_matching_continous_position, load_question_set, load_question_set_continous, wildcards2regex