HOGG AND CRAIG - holy grail - 2008/04/09 9:40 p.m.

Uploaded on Apr 9, 2008

Start with a speech waveform. I will call this a "time track" - it is a "function of time".
We can derive other "time tracks". For example: (1) we can filter the data in various ways, e.g. to obtain comb-filtered data; (2) from speech waveform data, we could obtain tracks of formant data, e.g. by using Huckvale's SFS system.

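Given two time tracks of equal length, where the time steps are the same (e.g. steps at 100 Hz or 10,000 Hz), we can compute a "dot product". A "correlation coefficient" is a normalised dot product: subtract each track's mean, then divide the dot product by the product of the two tracks' lengths (norms).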
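
To make the two definitions concrete, here is a minimal sketch in plain Python. The function names `dot` and `correlation` are my own labels for illustration, and tracks are assumed to be plain lists of samples taken at the same rate.

```python
import math

def dot(a, b):
    """Dot product of two time tracks with the same number of samples."""
    if len(a) != len(b):
        raise ValueError("tracks must have the same number of samples")
    return sum(x * y for x, y in zip(a, b))

def correlation(a, b):
    """Correlation coefficient: dot product of the mean-removed tracks,
    divided by the product of their norms."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    norm = math.sqrt(dot(da, da) * dot(db, db))
    if norm == 0.0:
        raise ValueError("correlation is undefined for a constant track")
    return dot(da, db) / norm
```

The result of `correlation` always lies between -1 and 1, whatever the loudness of the two tracks, which is the point of normalising.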

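If one time track is longer than the other, we can calculate successive "dot products" by stepping the shorter track along the longer track, calculating a dot product for each step. This is usually called a "cross-correlation" (or sliding dot product) in the literature. A "wavelet transform", I think, is a related idea, in which the short track is a wavelet that is applied at several scales.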
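
The stepping procedure can be sketched as follows. This is only an illustration: the name `sliding_dot` is hypothetical, the step is assumed to be one sample, and no normalisation is applied.

```python
def sliding_dot(short, long):
    """Step the shorter track along the longer one, one sample at a time,
    and return the dot product computed at each step."""
    n = len(short)
    if n == 0 or n > len(long):
        raise ValueError("short track must be non-empty and no longer than long track")
    return [
        sum(s * x for s, x in zip(short, long[i:i + n]))
        for i in range(len(long) - n + 1)
    ]
```

For a short track of n samples and a long track of m samples, this gives m - n + 1 dot products, one for each position where the short track fits completely inside the long one.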

Let me suppose we have tracks which can extend from the past into the future. Let us take an arbitrary piece of one track, say of length 0.01 s or 100 s. By shifting this piece, in steps, against another track, and computing a dot product at each step, we obtain a sequence of dot products. This sequence is itself a new "time track".

In this way, starting with N tracks, we can create many new tracks. For example, by doing dot products with a sliding window for each pair of the N tracks, we obtain N*(N-1)/2 new tracks.

All these tracks (the original time tracks, and the new computed tracks) are assumed to be tied to a given time point. So for each point in a sequence of time points, we have a set of N original tracks + N*(N-1)/2 derived tracks. The derived tracks can then be processed against the original tracks and the derived tracks to obtain yet more tracks.
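
As a rough sketch of how the derived tracks multiply, the code below builds one sliding-window dot-product track for each pair of original tracks. The names `windowed_dot` and `derive_tracks` are hypothetical, and I assume here that each derived value is tied to the time point where its window starts.

```python
from itertools import combinations

def windowed_dot(a, b, window):
    """Dot product of two tracks over a sliding window of `window` samples.
    The value at index t covers samples t .. t + window - 1 of both tracks."""
    if window < 1:
        raise ValueError("window must be at least 1 sample")
    n = min(len(a), len(b))
    return [
        sum(x * y for x, y in zip(a[t:t + window], b[t:t + window]))
        for t in range(n - window + 1)
    ]

def derive_tracks(tracks, window):
    """Given a dict of named tracks, return one derived track per pair of names."""
    return {
        (i, j): windowed_dot(tracks[i], tracks[j], window)
        for i, j in combinations(sorted(tracks), 2)
    }
```

With N = 3 original tracks this yields 3 derived tracks, with N = 10 it yields 45, and feeding the derived tracks back in as inputs makes the count grow much faster still.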

What will we do with all this data?

Given how much data can be generated in this way, an alternative proposal, of building a physical model and letting it run to generate speech, may be starting to look attractive.

Another project, that might be simpler, is to develop a kit to automatically build a two-language audio dictionary from dubbed speech. Since we have dubbed speech, we can snip corresponding audio frames, say of length 1.0 s or 2.0 s. We then convert the audio in each language to codebook data, and hopefully can find codebook sequences which correspond to morphemes, words, or phrases in each language. It is desirable to wait till such a project can be run, off the shelf, in no more than 5 minutes, by using existing tools.
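
The "convert to codebook data" step could look something like the sketch below. This is only one possible reading: I assume each short audio frame has already been turned into a feature vector, that a codebook (a list of reference vectors) already exists, and that each frame is replaced by the index of its nearest codebook entry. The names `nearest_code` and `encode` are hypothetical.

```python
def nearest_code(frame, codebook):
    """Index of the codebook entry closest to `frame` (squared Euclidean distance)."""
    def sq_dist(u, v):
        return sum((x - y) ** 2 for x, y in zip(u, v))
    return min(range(len(codebook)), key=lambda k: sq_dist(frame, codebook[k]))

def encode(frames, codebook):
    """Replace each feature vector by its codebook index, giving a codebook sequence."""
    return [nearest_code(frame, codebook) for frame in frames]
```

The search for recurring sequences would then be done on these lists of small integers, one list per language, and not on the raw audio.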

Perhaps I am wasting my time. Maybe I need to concentrate on experiencing the alchemy of speech, in conversation with people. This can at least be my focus for the next 4 days.
(17.10.2010. Above: "focus for the next 4 days". Perhaps I could better say "focus for the next 4 years".)

Category:

Education

License:

Standard YouTube License
