Welcome to the session on terminologies used in big data analytics. In this session, we will discuss a few terminologies that are commonly used in big data environments. At the end of this session, you will be able to compare these terminologies and use them correctly when working with big data environments.

The terminologies we will study in this lesson are: in-memory analytics, in-database processing, symmetric multiprocessor system, massively parallel processing, parallel and distributed systems, and shared-nothing architecture. Let us see them one by one.

Starting with in-memory analytics: traditionally, data resides on non-volatile storage, so whenever we want to access it, the hard disk must be read. This has a few disadvantages. The processing may be slow, and the fetched data must be transferred from the hard disk to main memory, which takes additional time. One way to address this is to pre-process the stored data, for example by building data cubes, computing aggregations in tables, or executing some queries ahead of time to materialize intermediate results.

In-memory analytics addresses the problem differently: all the relevant data is kept in RAM, the primary storage, and the analytics is performed directly on that in-memory data. What are the advantages of this?
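The disk-versus-RAM trade-off above can be sketched in a few lines of Python. This is a toy illustration, not a real analytics engine: the file name, columns, and queries are all hypothetical. The point is that the disk-based path re-reads and re-parses the file for every query, while the in-memory path pays the parsing cost once and answers every later query from RAM.

```python
import csv
import os
import tempfile

# Create a small sample data set on disk (hypothetical sales data).
path = os.path.join(tempfile.mkdtemp(), "sales.csv")
with open(path, "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["type", "id", "amount"])
    w.writerows(("sale", i, i * 10) for i in range(1000))

def total_from_disk():
    # Disk-based analytics: every query re-reads and re-parses the file.
    with open(path) as f:
        r = csv.reader(f)
        next(r)  # skip header
        return sum(int(row[2]) for row in r)

# In-memory analytics: parse once, keep the relevant column in RAM...
with open(path) as f:
    r = csv.reader(f)
    next(r)
    amounts_in_memory = [int(row[2]) for row in r]

def total_in_memory():
    # ...then every subsequent query touches only RAM, not the disk.
    return sum(amounts_in_memory)

assert total_from_disk() == total_in_memory()
```

Repeating each query many times would make the difference visible: the in-memory version avoids both the disk read and the transfer step on every call.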
Since the data is handled in RAM, access is very fast, deployment is rapid, the insights obtained from the analytics are better, minimal IT involvement is required, and the infrastructure requirement is small. In-memory analytics supports all of this.

Next comes in-database processing, which is also called in-database analytics. Usually, enterprise data, that is, online transaction processing (OLTP) data, is cleaned and stored in an enterprise data warehouse, and the warehouse data is later exported to separate programs for analytics and processing. In other words, it combines a data warehouse with analytics systems, but the data moves from the data warehouse out to the programs. In-database processing instead performs the computation and analytics inside the database itself, so the export step of moving data from the warehouse to the programs is eliminated, and hence it saves time.

What are the advantages of this? Besides saving time, it provides parallel processing, it supports partitioning because the work happens inside the database, it offers scalability, and optimization features can be added. Data retrieval and analysis are much faster because we read directly from the database, and the information is more secure because it stays stored in the database.

Next comes the symmetric multiprocessor system, which is called SMP.
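The export-versus-in-database contrast can be shown concretely with SQLite (chosen here only as a convenient example; the lecture does not name a specific database). The export-style path pulls every row out of the database and aggregates in the program, while the in-database path pushes the aggregation into the database with GROUP BY, so only the small summarized result crosses the boundary.

```python
import sqlite3

# Hypothetical sales table inside the database.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
con.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100), ("east", 250), ("west", 300)],
)

# Export-style: move every row to the program, then aggregate there.
rows = con.execute("SELECT region, amount FROM sales").fetchall()
exported = {}
for region, amount in rows:
    exported[region] = exported.get(region, 0) + amount

# In-database processing: the database computes the aggregate itself;
# only one result row per region is moved out.
in_db = dict(con.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"))

assert exported == in_db  # same answer, far less data movement
```

With millions of rows the export path moves all of them, while the in-database path still moves only one row per region, which is exactly the time saving described above.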
In an SMP system, identical processors share the main memory, they all have full access to the input/output devices, and all of them are controlled by a single operating system instance. SMP is a tightly coupled multiprocessor system. Each processor typically has its own cache memory, connected to the system bus, which provides faster access than main memory. So this is the SMP scenario: main memory is shared by all the processors, every processor has its own faster cache, all processors access the shared I/O devices, and a bus arbiter coordinates access to the shared bus.

Next comes massively parallel processing. What is MPP? It is the coordinated processing of a program by a number of processors working in parallel. The difference between SMP and MPP is that in MPP, every processor involved has its own operating system instance and its own dedicated memory, and the processors work on different parts of the same program.
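The shared-memory versus private-memory distinction can be mimicked in software, with the caveat that this is only an analogy, not real SMP or MPP hardware: Python threads stand in for processors. In the SMP-style version all workers update one shared total (and must coordinate with a lock, just as shared memory must be arbitrated); in the MPP-style version each worker keeps a private partial result, writing only to its own slot, and the partials are combined only at the end, like nodes exchanging messages.

```python
import threading

data = list(range(100))
chunks = [data[i::4] for i in range(4)]  # split the work 4 ways

# SMP-style: one shared memory location, guarded by a lock.
shared_total = [0]
lock = threading.Lock()

def smp_worker(chunk):
    partial = sum(chunk)
    with lock:                  # coordinate access to shared memory
        shared_total[0] += partial

threads = [threading.Thread(target=smp_worker, args=(c,)) for c in chunks]
for t in threads: t.start()
for t in threads: t.join()

# MPP-style: each worker has its own private result; results are
# combined only by a final exchange (each worker writes its own slot).
def mpp_worker(chunk, outbox, idx):
    outbox[idx] = sum(chunk)    # no concurrently shared state is mutated

outbox = [None] * 4
threads = [threading.Thread(target=mpp_worker, args=(c, outbox, i))
           for i, c in enumerate(chunks)]
for t in threads: t.start()
for t in threads: t.join()
mpp_total = sum(outbox)

assert shared_total[0] == mpp_total == sum(data)
```

Both styles compute the same answer; what differs is where the intermediate state lives, which is precisely the SMP/MPP distinction in the lecture.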
Let us now compare parallel systems and distributed systems, using properties such as memory, control, processor interconnection, and main focus. The main focus of a parallel system is performance, typically for scientific computing. A distributed system, on the other hand, aims not only at performance, cost, and scalability, but also at reliability, availability, and information and resource sharing. Regarding memory, a parallel system is tightly coupled and the memory is shared, whereas a distributed system is loosely coupled with distributed memory. For control, a parallel system uses a global clock, whereas a distributed system has no such global clock. Processor interconnection is on the order of Tbps in a parallel system, versus Gbps in a distributed system.

Now pause the video and look at the two architectures: which one is the parallel system and which one is the distributed system? The first is a parallel system, where memory is shared among separate processors. In the distributed system, neither the memory nor the processors are shared; every node works independently.

Next, shared-nothing architecture. In this family, the most common architectures are shared memory, shared disk, and shared nothing. Let us see them one by one. Shared memory means a common central memory is shared by all the processors. In shared disk, a collection of disks is shared among all the processors, while every processor also has its own private memory.
In shared nothing, neither memory nor disk is shared; the processors work in parallel, each with its own disk and its own memory. Let us see the architecture. In the shared-nothing architecture, every node has its own CPU, its own memory, and its own disk, and the nodes are connected through a network. They work with each other in a distributed environment, but nothing is shared among them; they are interconnected only through the network.

In the shared-nothing architecture, fault isolation is an important property. Why? Because nothing is shared, a fault in a particular processor is automatically isolated; no other node is interconnected with it through shared components, so no other node is affected. A failure is confined to the single node where it occurs, which is a major advantage.

Talking about scalability: if the disk is shared and constrained by bandwidth and contention, the system effectively becomes tightly coupled, so a distributed shared-disk system compromises scalability. If instead every node has its own access to its critical data, the system performs and scales better.

This is all about the shared-nothing architecture, and with it we have covered all the terminologies used in the big data environment. These are my references. Thank you.
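The shared-nothing idea above can be sketched as a toy partitioned store. All names here are hypothetical: each Node object owns its own private dictionary (standing in for its own memory and disk), a hash function routes each key to exactly one owning node, and a query is answered by scatter-gather, asking every node for its local result and combining them, with plain function calls standing in for the network.

```python
class Node:
    """One shared-nothing node: private CPU, memory, and disk."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.store = {}          # this node's own storage; never shared

    def put(self, key, value):
        self.store[key] = value

    def local_sum(self):
        # Each node can only compute over the data it owns.
        return sum(self.store.values())

nodes = [Node(i) for i in range(3)]

def route(key):
    # Hash partitioning: exactly one node owns each key.
    return nodes[hash(key) % len(nodes)]

# Load data: every record lands on a single owning node.
for k in range(30):
    route(k).put(k, k * 2)

# Scatter-gather query over the "network": combine per-node results.
total = sum(n.local_sum() for n in nodes)
assert total == sum(k * 2 for k in range(30))
```

Fault isolation also falls out of this structure: if one Node object were lost, only the keys routed to it would be affected, while every other node could still answer queries over its own partition.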