Inspired by Numenta's HTM algorithm.
This encodes a stream of data into a hierarchy of sequences of sequences.
Every sequence is made of smaller sequences, and no piece of a sequence is ever stored more than once. This leads to compression opportunities.
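To make the "never stored twice" idea concrete, here is a minimal sketch. It is not the actual code behind this project: the class name `SequenceStore`, the fixed pairing of neighbours, and the method names are all assumptions made for illustration. Each node is either a single character or a pair of smaller nodes, and a lookup table guarantees an identical piece is reused instead of stored again.

```python
class SequenceStore:
    """Hypothetical store: each distinct sequence is kept exactly once."""

    def __init__(self):
        self.ids = {}    # node key -> node id
        self.nodes = []  # node id -> key: a character, or a (left_id, right_id) pair

    def intern(self, key):
        # Reuse the existing node if this exact piece was stored before.
        if key not in self.ids:
            self.ids[key] = len(self.nodes)
            self.nodes.append(key)
        return self.ids[key]

    def encode(self, text):
        if not text:
            raise ValueError("nothing to encode")
        # Bottom level: one node per distinct character.
        level = [self.intern(ch) for ch in text]
        # Pair neighbours (an assumed grouping rule) until one root remains.
        while len(level) > 1:
            nxt = [self.intern((level[i], level[i + 1]))
                   for i in range(0, len(level) - 1, 2)]
            if len(level) % 2:
                nxt.append(level[-1])  # odd element is carried up unchanged
            level = nxt
        return level[0]

    def decode(self, node_id):
        # Walk back down the hierarchy to recover the original text.
        key = self.nodes[node_id]
        if isinstance(key, str):
            return key
        left, right = key
        return self.decode(left) + self.decode(right)


store = SequenceStore()
root = store.encode("abababab")
print(store.decode(root), len(store.nodes))
```

With this pairing rule, the eight characters of "abababab" need only five nodes: "a", "b", "ab", "abab", and the root. Encoding "abab" afterwards adds nothing new, because that sequence is already in the store.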
While building the tree we temporally link nodes using context.
This system works well for many tasks:
Able to extract content back out of the HTM very fast.
Able to sort content by occurrences.
Able to predict the next letter, next word, next paragraph, or next web page based on search.
Able to order predictions by probability.
Able to recursively feed predictions back into itself to get larger ones.
Able to handle AND and OR operators in search.
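The prediction items in the list above can be sketched with plain occurrence counts. This is a simplified stand-in, not the HTM itself: the class `NextSymbolPredictor`, its `context_len` parameter, and the fixed-length context are assumptions for illustration. It counts what followed each context, orders the candidates by probability, and feeds its own top prediction back in to get a longer one.

```python
from collections import Counter, defaultdict


class NextSymbolPredictor:
    """Hypothetical predictor built on counts of what followed each context."""

    def __init__(self, context_len=2):
        self.context_len = context_len
        self.counts = defaultdict(Counter)  # context -> Counter of next symbols

    def learn(self, symbols):
        # symbols may be a string (letters) or a list (words, pages, ...).
        for i in range(self.context_len, len(symbols)):
            context = tuple(symbols[i - self.context_len:i])
            self.counts[context][symbols[i]] += 1

    def predict(self, context):
        # Return (symbol, probability) pairs, most likely first.
        seen = self.counts.get(tuple(context[-self.context_len:]))
        if not seen:
            return []
        total = sum(seen.values())
        return [(sym, n / total) for sym, n in seen.most_common()]

    def extend(self, context, steps):
        # Feed the top prediction back in to build a longer prediction.
        out = list(context)
        for _ in range(steps):
            ranked = self.predict(out)
            if not ranked:
                break
            out.append(ranked[0][0])
        return out[len(context):]


p = NextSymbolPredictor(context_len=2)
p.learn("the cat the cow the cat")
print(p.predict(" c"))                 # 'a' ranked ahead of 'o'
print("".join(p.extend("the", 4)))     # prints " cat"
```

The same class works at the word level if `learn` is given a list of words instead of a string, since it only relies on slicing and counting.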
May offer more compression than Huffman encoding by first encoding the structure the data shares (holistic hierarchy compression), whereas Huffman coding is typically applied to a stream in blocks of about 30k characters.
This, together with the links between nodes that are not shown, is at the heart of billion-dollar technologies. VCs do not want to fund research even when most of it is done. It is taking an extra year to get a demo ready that will excite VCs.
htmtutor 4 months ago
Compression testing successful!
The size of the hierarchy grows more slowly than the size of the text files put into it.
For video compression it automatically reuses the same backgrounds.
Generalizing nodes will add lossy compression if desired.
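One way to check a growth claim like this is to count stored nodes as text is added. The sketch below is an illustration under assumptions, not the test that was actually run: the function `node_growth` and the pair-the-neighbours grouping rule are hypothetical. It records how many characters have gone in and how many distinct nodes exist after each text.

```python
def node_growth(texts):
    """Return a (characters_seen, nodes_stored) pair after each text is added."""
    seen = set()   # every distinct node; nested tuples stand in for the hierarchy
    chars = 0
    history = []
    for text in texts:
        chars += len(text)
        level = list(text)
        seen.update(level)  # bottom level: single characters
        while len(level) > 1:
            # Pair neighbours (assumed grouping rule); identical pairs collapse in the set.
            nxt = [(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
            if len(level) % 2:
                nxt.append(level[-1])  # odd element carried up, already in `seen`
            seen.update(nxt)
            level = nxt
        history.append((chars, len(seen)))
    return history


print(node_growth(["abababab", "abababab", "abababcd"]))
```

In this toy run the second text adds no nodes at all and the third adds only a few, so the node count falls behind the character count. How strongly that holds on real text depends on how much the input repeats and on the grouping rule, since fixed pairing is sensitive to alignment.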
htmtutor 4 months ago