Cassandra. This is Dr. Anita Poojar, professor in the computer science and engineering department at Volchin Institute of Technology, SolarPore. At the end of this session, learners will be familiar with what Cassandra is, what its different features are, and how to install Cassandra on the Windows operating system.

Cassandra is an open source, high performance distributed database management system designed for storing huge volumes of data on a large number of commodity servers. Commodity servers are basically inexpensive, low-end machines with common specifications. Cassandra was initiated at Facebook as a project for inbox search and was accepted into the Apache Incubator in March 2009. It became an Apache top-level project in February 2010. Since then, Cassandra has been known as Apache Cassandra.

These are some of the features of Cassandra. Cassandra is highly scalable, that is, it is massively scalable. It is also known to have elastic scalability. Scalability is the ability to expand or shrink the system as per the requirement. Cassandra is linearly scalable: because it scales horizontally, the throughput of the system increases as the number of nodes in the cluster increases. Therefore, it maintains a very quick response time.

Cassandra also provides built-in and customizable replication strategies. A replication strategy determines which nodes to place replicas on. There are two types of strategies, SimpleStrategy and NetworkTopologyStrategy. SimpleStrategy is used when the Cassandra cluster contains only one data center, that is, replicas are spread across one data center. NetworkTopologyStrategy is used when the cluster contains more than one data center, that is, replicas are spread across more than one data center. A replication factor can also be specified here by the admin or the owner of the database. The replication factor decides how many replicas of the data have to be stored in the cluster.
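To make the idea of a replication strategy and a replication factor concrete, here is a small Python sketch of how SimpleStrategy places replicas: the first replica goes to the node that owns the key's position on the ring, and the remaining replicas go to the next nodes clockwise. This is only a toy model. The ring of four nodes, the token values 0 to 75 and the function name `replicas_for_token` are assumptions made up for illustration, not Cassandra's real code or API.

```python
from bisect import bisect_left

def replicas_for_token(ring, key_token, replication_factor):
    """Return the nodes that would hold replicas for key_token.

    ring is a list of (token, node_name) pairs sorted by token.
    This is a toy model of SimpleStrategy, not Cassandra's implementation.
    """
    tokens = [t for t, _ in ring]
    # First node whose token is >= the key's token; wrap around past the end.
    start = bisect_left(tokens, key_token) % len(ring)
    # A cluster cannot hold more distinct replicas than it has nodes.
    count = min(replication_factor, len(ring))
    # Walk clockwise around the ring, one node per replica.
    return [ring[(start + i) % len(ring)][1] for i in range(count)]

# Hypothetical four-node ring with made-up token values.
ring = [(0, "node1"), (25, "node2"), (50, "node3"), (75, "node4")]

print(replicas_for_token(ring, key_token=30, replication_factor=3))
# prints ['node3', 'node4', 'node1']
print(replicas_for_token(ring, key_token=80, replication_factor=2))
# prints ['node1', 'node2']
```

With a replication factor of 3, any one of the three listed nodes can fail and the data is still available on the other two.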
Cassandra has a very good feature, that is, it provides high availability. Cassandra replicates data, and all the nodes in Cassandra are independent of each other, that is, each of them is able to handle the read and write requests of the user on its own. So even if one node fails, the work does not stop, and this is a desirable feature for critical applications which cannot afford to have failures. There is no single point of failure in Cassandra, so it provides high availability.

Cassandra has a peer-to-peer architecture, that means it does not have a master-slave architecture. All the nodes in Cassandra are connected in a ring, or cluster, and they are interconnected with each other. The nodes are independent of each other and have no specially defined roles: all nodes play the same role. Every node can handle a read request or a write request irrespective of whether the data is stored on that node or not.

Cassandra is called a NoSQL type of database, that is, a "not only SQL" type of database. These databases are non-relational and are commonly described as schema-free. Such databases can store huge amounts of data, that is, in terms of petabytes.

Cassandra is called a column-oriented database (more precisely, it is classed as a wide-column store). There are two types of orientation in databases. One is row orientation and the second one is column orientation. In row orientation, when the contents of the database tables are stored in the computer's memory, they are stored row-wise, that is, the contents of the first row are followed by the contents of the second row, followed by the contents of the third row, linearly, adjacent to each other. In column-oriented databases, the contents of the table are stored column-wise, that is, the contents of the first column are followed by the contents of the second column, followed by the contents of the third column, and so on.
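The difference between the two orientations can be seen with a small Python sketch. It takes one table and flattens it once row-wise and once column-wise, the way the values would sit next to each other in storage. The table contents are made-up values used only for illustration.

```python
# A small table with three rows and three columns (made-up values).
rows = [
    (1, "Asha", "Pune"),
    (2, "Ravi", "Pune"),
    (3, "Meera", "Mumbai"),
]

# Row orientation: all values of row 1, then row 2, then row 3.
row_layout = [value for row in rows for value in row]

# Column orientation: all values of column 1, then column 2, then column 3.
# zip(*rows) regroups the same values by column instead of by row.
column_layout = [value for column in zip(*rows) for value in column]

print(row_layout)
# prints [1, 'Asha', 'Pune', 2, 'Ravi', 'Pune', 3, 'Meera', 'Mumbai']
print(column_layout)
# prints [1, 2, 3, 'Asha', 'Ravi', 'Meera', 'Pune', 'Pune', 'Mumbai']
```

In the column-wise layout, values of the same column, such as the two "Pune" entries, end up adjacent to each other.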
Now if there are duplicate entries in a column, they can be compressed, and the number of memory locations or the storage required for storing the table can be reduced. This is the advantage of a column-oriented database. Also, when data is partitioned for parallel processing, it is partitioned column-wise. The first column is given to the first processor, the second column is given to the second processor and so on, so that they can be processed in parallel. In this way the data is processed faster in a column-oriented database. Column-oriented databases are more suitable for data analytics, whereas row-oriented relational databases are more suitable for transaction processing.

Cassandra provides automatic data distribution across all nodes that participate in a ring or database cluster. Cassandra is designed to run on cheap commodity hardware machines. As the number of machines increases, it continues to perform fast writes and can store hundreds of terabytes of data without sacrificing read efficiency. Even today Cassandra is being used by some of the biggest companies such as Facebook, Twitter, Cisco, eBay etc.

Now let's take a pause, think and try to answer these questions. In Cassandra every row has the same set of columns. Is it true or false? The correct answer is false. Cassandra is a NoSQL type of database which does not force every row to follow one predefined set of columns. That means every row in a table may have a different set of columns, and even the order of columns may differ. For example, suppose there are two students, student A and student B, and we are storing the student records in a table as student enrollment number, student name, mail ID and contact number. Student A may have two mail IDs and two contact numbers, so that particular row has six columns, whereas student B may have only one mail ID and one contact number, so the total number of columns in student B's record will be only four. So Cassandra may not have the same set of columns in every row.
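The student example can be sketched in Python by treating each row as a dictionary of column name to value. This is only a way to picture rows that carry different sets of columns. The column names and all the values below are hypothetical, and this is not how Cassandra stores data internally.

```python
# Each row is modelled as a dict: column name -> value (all values made up).
student_a = {
    "enrollment_no": "E001",
    "name": "Student A",
    "mail_id_1": "a@example.com",
    "mail_id_2": "a.alt@example.com",
    "contact_no_1": "9000000001",
    "contact_no_2": "9000000002",
}
student_b = {
    "enrollment_no": "E002",
    "name": "Student B",
    "mail_id_1": "b@example.com",
    "contact_no_1": "9000000003",
}

table = [student_a, student_b]

# Count the columns actually present in each row.
column_counts = {row["name"]: len(row) for row in table}
print(column_counts)
# prints {'Student A': 6, 'Student B': 4}

# Columns that student A has and student B does not.
print(sorted(student_a.keys() - student_b.keys()))
# prints ['contact_no_2', 'mail_id_2']
```

Student B's row simply does not contain the extra columns. Nothing has to be stored for them.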
The second question is: Cassandra ensures fault tolerance. Is it true or false? Yes, it's true, because Cassandra has a peer-to-peer network architecture and it uses a replication strategy. Every node is independent of every other node, and they all play the same role. Every node has the capability to handle read and write requests of the client irrespective of whether the data is stored on it or not. So even if one node fails, the work does not stop. Data can be fetched from its replicas, or the read or write request can be sent to other nodes for processing. Hence Cassandra ensures fault tolerance.

Now let's see the difference between NoSQL databases and relational databases. NoSQL databases do not have a fixed schema, whereas relational databases have a fixed schema. NoSQL databases are highly scalable, whereas relational databases are less scalable in comparison. NoSQL databases generally do not support full transactions, whereas relational databases support the ACID properties of transactions. NoSQL databases use a very simple query language, which in Cassandra's case is similar to SQL, whereas relational databases use SQL, that is, Structured Query Language. NoSQL databases support the eventual consistency property, whereas relational databases support the strong consistency property. NoSQL databases do not support joins between two tables, whereas relational databases support joins of two tables using join queries.

Now let's see how to install Cassandra on the Windows operating system, preferably 64-bit. The prerequisites are the DataStax Community Edition setup and an installed JDK. Now let's see the installation steps. Run the DataStax Community Edition setup. A window appears for that setup. Click on the Next button. The end user license agreement window pops up. Read the agreement and tick the checkbox below it to say that you accept the agreement. Then click on the Next button. It asks for the destination where DataStax Community Edition should be installed. So specify the destination and click on Next.
The service configuration window pops up. Tick both the checkboxes. Click on Install. The installation process begins, and its progress is shown by the status bar. When the installation reaches 100%, click on the Next button. The installation is completed here, and it asks you to click the Finish button. After clicking the Finish button, you can open the Cassandra shell. For this, go to the Start menu, search for Cassandra CQL Shell and run it. After running the shell, we see the following command prompt, that is cqlsh. Now you can run any Cassandra commands or queries at this command prompt. These are some of the references.