This paper presents a novel, fully customized, scalable, high-performance FastICA processor architecture. It is designed in an algorithm-aware manner to mitigate the failure modes inherent in the FastICA algorithm. The proposed architecture completes one 8-channel ICA operation in 0.32 milliseconds at a clock frequency of 555 MHz, which is 10 times faster than the closest competitor. This makes the architecture well suited to real-time applications such as EEG signal processing. This article was authored by Said Muhammad Reza Shahshahani and Hamid Reza Mariani.
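
For a sense of scale, the reported timing can be converted into a clock-cycle budget. This is a back-of-the-envelope calculation, assuming the quoted 0.32 ms is spent entirely at a constant 555 MHz clock (an assumption; the abstract does not break the time down further). The symbol $N_{\text{cycles}}$ is introduced here for illustration only:

\[
N_{\text{cycles}} \approx \left(0.32 \times 10^{-3}\ \text{s}\right) \times \left(555 \times 10^{6}\ \text{Hz}\right) = 1.776 \times 10^{5}\ \text{cycles per 8-channel ICA operation}.
\]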