This can be done by manipulating the body text of the API call to include XML TTS tags, which are detailed here (http://msdn.microsoft.com/en-us/library/ms717077(v=vs.85).aspx).
We have tested the rate, spell and silence tags.
When sending this via either API, the whole message body needs to be wrapped in an XML tag (the tag name doesn't matter, as long as the body has one). For instance, a message body would look like this:
<test>This is a test. I'm now going to spell Esendex <spell>Esendex</spell>And now I'll sleep for a second<silence msec="1000"/> And now I'm back.</test>
However, to send this via the Esendex REST API, you then need to wrap the XML in a CDATA section, e.g.:
<![CDATA[<test>This is a test. I'm now going to spell Esendex <spell>Esendex</spell>And now I'll sleep for a second<silence msec="1000"/> And now I'm back.</test>]]>
