
Cassandra Setup on Linux (Ubuntu 12.04)

I thought setting up a distributed, NoSQL, Big Data database must be a complex process.
However, when I actually did it for Cassandra, I found it as easy as installing a simple OS package! Cassandra is surprisingly simple to set up and play around with.
It feels more like a plug-and-play engine.

The Cassandra database is very well explained in the book Cassandra: The Definitive Guide (O'Reilly, Nov 2010).

So, here is how to do it…

A) Prerequisites: JDK 1.6 or a greater version.
      If you don't have it, download and configure it first.
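      To quickly check which Java version is on your PATH (java -version is a standard JDK command), run:

            java -version

      Any version reported as 1.6 or higher is fine for Cassandra 1.1.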
B) Cassandra on Linux         
1. Create a user for Cassandra
root@PTS0012:/usr/lib# id cassandra
uid=506(cassandra) gid=0(root) groups=0(root)
root@PTS0012:/usr/lib#
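
If the cassandra user does not exist yet, you can create it first (a minimal sketch; useradd is standard on Ubuntu, and -m creates /home/cassandra, which step 3 relies on):

root@PTS0012:/usr/lib# useradd -m cassandra
root@PTS0012:/usr/lib# passwd cassandra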

2. Create directories for the Cassandra database
root@PTS0012:/usr/lib# mkdir -p /var/log/cassandra
root@PTS0012:/usr/lib# chown -R cassandra /var/log/cassandra
root@PTS0012:/usr/lib# mkdir -p /var/lib/cassandra
root@PTS0012:/usr/lib# chown -R cassandra /var/lib/cassandra
root@PTS0012:/usr/lib#
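
These two paths match the defaults in conf/cassandra.yaml for Cassandra 1.1: /var/lib/cassandra holds the data files, commit log, and saved caches, while /var/log/cassandra holds the logs. The relevant stock settings look like this:

data_file_directories:
    - /var/lib/cassandra/data
commitlog_directory: /var/lib/cassandra/commitlog
saved_caches_directory: /var/lib/cassandra/saved_caches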

3. Set up apache-cassandra-1.1.4
   Download Apache Cassandra from http://cassandra.apache.org/download/
   I downloaded apache-cassandra-1.1.4-bin.tar.gz under /home/cassandra/
   Log in as the cassandra user...
            i)  Unpack it with command
                        gzip -d apache-cassandra-1.1.4-bin.tar.gz
            ii) Extract files from tar archive with command
                        tar -xvf apache-cassandra-1.1.4-bin.tar
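                Note: the two steps above can also be combined into one, since gzip decompression is a standard tar option (nothing Cassandra-specific):
                        tar -xzvf apache-cassandra-1.1.4-bin.tar.gz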
            iii) Edit the file ./apache-cassandra-1.1.4/conf/cassandra.yaml to set:
                        a) Cluster name: set the cluster name right the first time, as it is difficult to change later on.
                                    cluster_name: 'PEGCluster'
                        b) Open the listener for network-wide identification by setting the listen address to the host name or IP.
                                    listen_address: pts0012
                        c) Enable network-wide access to your server. Note that 0.0.0.0 accepts client (Thrift) connections on all interfaces, so make sure that is acceptable on your network.
                                    rpc_address: 0.0.0.0
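                A quick way to confirm all three edits at once (assuming you are still in /home/cassandra, where the tarball was unpacked):
                        grep -E '^(cluster_name|listen_address|rpc_address):' apache-cassandra-1.1.4/conf/cassandra.yaml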
            iv) Now move to ./apache-cassandra-1.1.4/bin and start Cassandra
cassandra@PTS0012:~/apache-cassandra-1.1.4/bin$ ./cassandra
xss =  -ea -javaagent:./../lib/jamm-0.2.5.jar -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms1996M -Xmx1996M -Xmn400M -XX:+HeapDumpOnOutOfMemoryError -Xss160k
INFO 13:25:50,972 Logging initialized
INFO 13:25:50,978 JVM vendor/version: Java HotSpot(TM) 64-Bit Server VM/1.7.0
INFO 13:25:50,979 Heap size: 2051014656/2051014656
INFO 13:25:50,979 Classpath:./../conf:./../build/classes/main:./../build/classes/thrift:./../lib/antlr-3.2.jar:./../lib/apache-cassandra-1.1.4.jar:./../lib/apache-cassandra-clientutil-1.1.4.jar:./../lib/apache-cassandra-thrift-1.1.4.jar:./../lib/avro-1.4.0-fixes.jar:./../lib/avro-1.4.0-sources-fixes.jar:./../lib/commons-cli-1.1.jar:./../lib/commons-codec-1.2.jar:./../lib/commons-lang-2.4.jar:./../lib/compress-lzf-0.8.4.jar:./../lib/concurrentlinkedhashmap-lru-1.3.jar:./../lib/guava-r08.jar:./../lib/high-scale-lib-1.1.2.jar:./../lib/jackson-core-asl-1.9.2.jar:./../lib/jackson-mapper-asl-1.9.2.jar:./../lib/jamm-0.2.5.jar:./../lib/jline-0.9.94.jar:./../lib/json-simple-1.1.jar:./../lib/libthrift-0.7.0.jar:./../lib/log4j-1.2.16.jar:./../lib/metrics-core-2.0.3.jar:./../lib/servlet-api-2.5-20081211.jar:./../lib/slf4j-api-1.6.1.jar:./../lib/slf4j-log4j12-1.6.1.jar:./../lib/snakeyaml-1.6.jar:./../lib/snappy-java-1.0.4.1.jar:./../lib/snaptree-0.1.jar:./../lib/jamm-0.2.5.jar
INFO 13:25:50,981 JNA not found. Native methods will be disabled.
INFO 13:25:50,992 Loading settings from file:/home/cassandra/apache-cassandra-1.1.4/conf/cassandra.yaml
INFO 13:25:51,144 DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
INFO 13:25:51,371 Global memtable threshold is enabled at 652MB
INFO 13:25:51,826 Initializing key cache with capacity of 97 MBs.
INFO 13:25:51,840 Scheduling key cache save to each 14400 seconds (going to save all keys).
INFO 13:25:51,842 Initializing row cache with capacity of 0 MBs and provider org.apache.cassandra.cache.SerializingCacheProvider
INFO 13:25:51,847 Scheduling row cache save to each 0 seconds (going to save all keys).
INFO 13:25:51,945 Opening /var/lib/cassandra/data/system/LocationInfo/system-LocationInfo-he-2 (163 bytes)
INFO 13:25:51,945 Opening /var/lib/cassandra/data/system/LocationInfo/system-LocationInfo-he-1 (234 bytes)
INFO 13:25:52,001 Couldn't detect any schema definitions in local storage.
INFO 13:25:52,002 Found table data in data directories. Consider using the CLI to define your schema.
INFO 13:25:52,024 completed pre-loading (2 keys) key cache.
INFO 13:25:52,144 Replaying /var/lib/cassandra/commitlog/CommitLog-2329355385680836.log, /var/lib/cassandra/commitlog/CommitLog-2329355488502539.log
INFO 13:25:52,148 Replaying /var/lib/cassandra/commitlog/CommitLog-2329355385680836.log
INFO 13:25:52,173 Finished reading /var/lib/cassandra/commitlog/CommitLog-2329355385680836.log
INFO 13:25:52,173 Replaying /var/lib/cassandra/commitlog/CommitLog-2329355488502539.log
INFO 13:25:52,174 Finished reading /var/lib/cassandra/commitlog/CommitLog-2329355488502539.log
INFO 13:25:52,193 Enqueuing flush of Memtable-Versions@922432532(83/103 serialized/live bytes, 3 ops)
INFO 13:25:52,194 Writing Memtable-Versions@922432532(83/103 serialized/live bytes, 3 ops)
INFO 13:25:52,321 Completed flushing /var/lib/cassandra/data/system/Versions/system-Versions-he-1-Data.db (247 bytes) for commitlog position ReplayPosition(segmentId=2331377289149064, position=0)
INFO 13:25:52,332 Log replay complete, 3 replayed mutations
INFO 13:25:52,345 Cassandra version: 1.1.4
INFO 13:25:52,345 Thrift API version: 19.32.0
INFO 13:25:52,348 CQL supported versions: 2.0.0,3.0.0-beta1 (default: 2.0.0)
INFO 13:25:52,392 Loading persisted ring state
INFO 13:25:52,397 Starting up server gossip
INFO 13:25:52,407 Enqueuing flush of Memtable-LocationInfo@2041478995(29/36 serialized/live bytes, 1 ops)
INFO 13:25:52,408 Writing Memtable-LocationInfo@2041478995(29/36 serialized/live bytes, 1 ops)
INFO 13:25:52,504 Completed flushing /var/lib/cassandra/data/system/LocationInfo/system-LocationInfo-he-3-Data.db (80 bytes) for commitlog position ReplayPosition(segmentId=2331377289149064, position=363)
INFO 13:25:52,515 Starting Messaging Service on port 7000
INFO 13:25:52,525 Using saved token 54110604618015102145492158359085502976
INFO 13:25:52,527 Enqueuing flush of Memtable-LocationInfo@15670808(53/66 serialized/live bytes, 2 ops)
INFO 13:25:52,528 Writing Memtable-LocationInfo@15670808(53/66 serialized/live bytes, 2 ops)
INFO 13:25:52,638 Completed flushing /var/lib/cassandra/data/system/LocationInfo/system-LocationInfo-he-4-Data.db (163 bytes) for commitlog position ReplayPosition(segmentId=2331377289149064, position=544)
INFO 13:25:52,640 Node localhost/127.0.0.1 state jump to normal
INFO 13:25:52,642 Bootstrap/Replace/Move completed! Now serving reads.
cassandra@PTS0012:~/apache-cassandra-1.1.4/bin$
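
Note that the startup script forks Cassandra into the background by default; while experimenting, it can be handy to keep it attached to the terminal with the -f (foreground) flag:

cassandra@PTS0012:~/apache-cassandra-1.1.4/bin$ ./cassandra -f

The "JNA not found" line in the log above is harmless for a test setup; on Ubuntu, installing the libjna-java package (an assumption: that is the stock package name on 12.04) and restarting Cassandra should make it disappear. You can also verify the node is up from the same bin directory with ./nodetool -h localhost ring.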


4) Connecting to Cassandra:
            Log in as the cassandra user
                i) move to /home/cassandra/apache-cassandra-1.1.4/bin
               ii) cassandra@PTS0012:~/apache-cassandra-1.1.4/bin$ ./cassandra-cli
                                    Connected to: "PEGCluster" on 127.0.0.1/9160
                                    Welcome to Cassandra CLI version 1.1.4

                                    Type 'help;' or '?' for help.
                                    Type 'quit;' or 'exit;' to quit.

                                    [default@unknown]


Congratulations! Your Cassandra server is now up and running as a new single-node cluster called "PEGCluster", listening on port 9160.
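
As a next step, you can define a simple schema from the same CLI session. This is a minimal sketch: the keyspace name DemoKS and column family Users are illustrative only, and SimpleStrategy with replication_factor 1 suits a single-node cluster like this one:

[default@unknown] create keyspace DemoKS
    with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy'
    and strategy_options = {replication_factor:1};
[default@unknown] use DemoKS;
[default@DemoKS] create column family Users with comparator = UTF8Type;

Afterwards, describe DemoKS; should list the keyspace together with its Users column family.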
