Apache Cassandra Learning Step by Step (3): Samples ABC

====16 Feb 2012, by Bright Zheng (IT进行时)====

4. Samples ABC

We learn Cassandra step by step, covering the concepts and the Java API usage of each operation by means of:

1. Concept Introduction

2. CLI

3. Java Sample Code

4.1. Get a Single Column by a Key

4.1.1. Sample Code

public QueryResult<HColumn<String,String>> execute() {       

        ColumnQuery<String, String, String> columnQuery = HFactory.createStringColumnQuery(keyspace);

        columnQuery.setColumnFamily("Npanxx");

        columnQuery.setKey("512204");

        columnQuery.setName("city");

        QueryResult<HColumn<String, String>> result = columnQuery.execute();

       

        return result;

    }


4.1.2.  Sample Code run by Maven

C:\projects_learning\learning-cassandra-tutorial>mvn -e exec:java -Dexec.args="get" -Dexec.mainClass="com.datastax.tutorial.TutorialRunner"

The output is:

[INFO] --- exec-maven-plugin:1.1.2-Beta1:java (default-cli) @ cassandra-tutorial ---

HColumn(city=Austin)

[INFO] ------------------------------------------------------------------------

[INFO] BUILD SUCCESS

4.1.3.  CLI

[default@Tutorial] get Npanxx['512204']['city'];

=> (column=city, value=Austin, timestamp=1329234388328000)

Elapsed time: 16 msec(s).

4.2. Get multiple columns by a Key

4.2.1. Sample Code

public QueryResult<ColumnSlice<Long,String>> execute() {

        SliceQuery<String, Long, String> sliceQuery =

            HFactory.createSliceQuery(keyspace, stringSerializer, longSerializer, stringSerializer);

        sliceQuery.setColumnFamily("StateCity");

        sliceQuery.setKey("TX Austin");

       

        //way 1: set multiple columnNames

        sliceQuery.setColumnNames(202L, 203L, 204L);

       

        //way 2: use setRange

        // change 'reversed' to true to get the columns in reverse order

        //sliceQuery.setRange(202L, 204L, false, 5);

        

        QueryResult<ColumnSlice<Long, String>> result = sliceQuery.execute();

        return result;

    }
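To make the two ways of selecting columns more concrete, here is a minimal sketch in plain Java (no Hector, no running cluster) that models one row as a sorted map and applies a setRange-style slice to it. The class SliceRangeSketch and its helpers are hypothetical names made up for illustration; the column data is copied from the StateCity row 'TX Austin' shown later in this post. The sketch assumes that in reversed mode the start name must be the larger one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model only: a row is a map of column name -> value kept in comparator order.
class SliceRangeSketch {

    // Returns up to 'count' columns whose names lie between start and finish (inclusive).
    // Assumption of this sketch: when reversed is true, start is the larger name.
    static List<Map.Entry<Long, String>> slice(NavigableMap<Long, String> row,
                                               long start, long finish,
                                               boolean reversed, int count) {
        NavigableMap<Long, String> view = reversed
                ? row.subMap(finish, true, start, true).descendingMap()
                : row.subMap(start, true, finish, true);
        List<Map.Entry<Long, String>> out = new ArrayList<>();
        for (Map.Entry<Long, String> column : view.entrySet()) {
            if (out.size() == count) {
                break; // the count argument caps the slice
            }
            out.add(column);
        }
        return out;
    }

    // Columns of the 'TX Austin' row as listed by the CLI in this post.
    static NavigableMap<Long, String> sampleRow() {
        NavigableMap<Long, String> row = new TreeMap<>();
        row.put(202L, "30.27x097.74");
        row.put(203L, "30.27x097.74");
        row.put(204L, "30.32x097.73");
        row.put(205L, "30.32x097.73");
        row.put(206L, "30.32x097.73");
        return row;
    }

    public static void main(String[] args) {
        // Same arguments as the commented-out setRange call above.
        System.out.println(slice(sampleRow(), 202L, 204L, false, 5));
        // Reversed: start from the larger name and walk down, at most 2 columns.
        System.out.println(slice(sampleRow(), 206L, 202L, true, 2));
    }
}
```

With the sample data, the forward slice yields columns 202, 203 and 204, the same three columns as way 1, because only three names fall inside the range even though the count is 5.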

4.2.2. Sample Code run by Maven

C:\projects_learning\learning-cassandra-tutorial>mvn -e exec:java -Dexec.args="get_slice_sc" -Dexec.mainClass="com.datastax.tutorial.TutorialRunner"

The output is:

[INFO] --- exec-maven-plugin:1.1.2-Beta1:java (default-cli) @ cassandra-tutorial ---

ColumnSlice([HColumn(202=30.27x097.74), HColumn(203=30.27x097.74), HColumn(204=30.32x097.73)]

[INFO] ------------------------------------------------------------------------

[INFO] BUILD SUCCESS

4.2.3. CLI(TODO)

TODO: According to the CLI syntax, can't Cassandra get multiple columns with a single 'get' command?

4.3. Get multiple rows by a set of Keys

4.3.1. Sample Code

public QueryResult<Rows<String,String,String>> execute() {

        MultigetSliceQuery<String, String, String> multigetSlicesQuery =

            HFactory.createMultigetSliceQuery(keyspace, stringSerializer, stringSerializer, stringSerializer);

        multigetSlicesQuery.setColumnFamily("Npanxx");

        multigetSlicesQuery.setColumnNames("city","state","lat","lng");       

        multigetSlicesQuery.setKeys("512202","512203","512205","512206");

        QueryResult<Rows<String, String, String>> results = multigetSlicesQuery.execute();

        return results;

    }

4.3.2. Sample Code run by Maven

C:\projects_learning\learning-cassandra-tutorial>mvn -e exec:java -Dexec.args="multiget_slice" -Dexec.mainClass="com.datastax.tutorial.TutorialRunner"

The output is:

[INFO] --- exec-maven-plugin:1.2:java (default-cli) @ cassandra-tutorial ---

Rows({

512205=Row(512205,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.32), HColumn(lng=097.73), HColumn(state=TX)])),

512206=Row(512206,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.32), HColumn(lng=097.73), HColumn(state=TX)])),

512203=Row(512203,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.27), HColumn(lng=097.74), HColumn(state=TX)])),

512202=Row(512202,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.27), HColumn(lng=097.74), HColumn(state=TX)]))})

[INFO] ------------------------------------------------------------------------

[INFO] BUILD SUCCESS

4.3.3. CLI(TODO)

TODO: N/A?

4.4. Get Slices from a Range of Rows by Key

4.4.1. Sample Code

GetRangeSlicesForStateCity.java

public QueryResult<OrderedRows<String,String,String>> execute() {

        RangeSlicesQuery<String, String, String> rangeSlicesQuery =

            HFactory.createRangeSlicesQuery(keyspace, stringSerializer, stringSerializer, stringSerializer);

        rangeSlicesQuery.setColumnFamily("Npanxx");

        rangeSlicesQuery.setColumnNames("city","state","lat","lng");       

        rangeSlicesQuery.setKeys("512202", "512205");

        rangeSlicesQuery.setRowCount(5);

        QueryResult<OrderedRows<String, String, String>> results = rangeSlicesQuery.execute();

        return results;

    }

Important Note: The result is NOT what you might expect. You might expect the 4 rows 512202 to 512205, but that is not what comes back, because the keys are ordered by the partitioner (RandomPartitioner here), which orders rows by a hash of the key rather than by the key itself. The partitioner can be configured in /conf/cassandra.yaml, but changing it is not recommended. The actual result is shown under "Sample Code run by Maven".
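To see why a key range does not behave like a numeric range, here is a minimal plain-Java sketch that sorts the sample keys by an MD5-based token. It assumes RandomPartitioner derives the token as the absolute value of the key's MD5 digest read as a big integer; the class TokenOrderSketch and its methods are hypothetical names used only for illustration, and the sketch does not talk to Cassandra at all.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class TokenOrderSketch {

    // Assumed token function: |MD5(key)| interpreted as a big integer.
    static BigInteger token(String key) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(key.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(digest).abs();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    // Returns a copy of the keys ordered by token instead of by key value.
    static List<String> inTokenOrder(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort(Comparator.comparing(TokenOrderSketch::token));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("512202", "512203", "512204", "512205", "512206");
        for (String key : inTokenOrder(keys)) {
            System.out.println(key + " -> " + token(key));
        }
    }
}
```

Running it prints the keys in token order, which in general has nothing to do with their numeric order. That is the effect the note above describes: a range from "512202" to "512205" walks the tokens between those two keys, not the keys between them.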

4.4.2. Sample Code run by Maven

C:\projects_learning\learning-cassandra-tutorial>mvn -e exec:java -Dexec.args="get_range_slices" -Dexec.mainClass="com.datastax.tutorial.TutorialRunner"

The output is:

[INFO] --- exec-maven-plugin:1.2:java (default-cli) @ cassandra-tutorial ---

Rows({

512202=Row(512202,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.27), HColumn(lng=097.74), HColumn(state=TX)])),

512206=Row(512206,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.32), HColumn(lng=097.73), HColumn(state=TX)])),

512205=Row(512205,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.32), HColumn(lng=097.73), HColumn(state=TX)]))

})

[INFO] ------------------------------------------------------------------------

[INFO] BUILD SUCCESS

4.4.3. CLI(TODO)

TODO: N/A

4.5. Get Slices from a Range of Rows by Columns

4.5.1. Sample Code

GetSliceForAreaCodeCity.java

public QueryResult<ColumnSlice<String,String>> execute() {

        SliceQuery<String, String, String> sliceQuery =

            HFactory.createSliceQuery(keyspace, stringSerializer, stringSerializer, stringSerializer);

        sliceQuery.setColumnFamily("AreaCode");

        sliceQuery.setKey("512");

        // change the 'reversed' argument to true to get the columns in descending order

        // gets up to 5 columns "between" Austin and Austin__204 according to the comparator

        sliceQuery.setRange("Austin", "Austin__204", false, 5);

 

        QueryResult<ColumnSlice<String, String>> result = sliceQuery.execute();

 

        return result;

    }

4.5.2. Sample Code run by Maven

C:\projects_learning\learning-cassandra-tutorial>mvn -e exec:java -Dexec.args="get_slice_acc" -Dexec.mainClass="com.datastax.tutorial.TutorialRunner"

The output is:

[INFO] --- exec-maven-plugin:1.2:java (default-cli) @ cassandra-tutorial ---

ColumnSlice([

HColumn(Austin__202=30.27x097.74),

HColumn(Austin__203=30.27x097.74),

HColumn(Austin__204=30.32x097.73)

])

[INFO] ------------------------------------------------------------------------

[INFO] BUILD SUCCESS

4.5.3. CLI

N/A

4.6. Get Slices from Indexed Columns

4.6.1. Sample Code

GetIndexedSlicesForCityState.java

public QueryResult<OrderedRows<String, String, String>> execute() {

        IndexedSlicesQuery<String, String, String> indexedSlicesQuery =

            HFactory.createIndexedSlicesQuery(keyspace, stringSerializer, stringSerializer, stringSerializer);

        indexedSlicesQuery.setColumnFamily("Npanxx");

        indexedSlicesQuery.setColumnNames("city","lat","lng");

        indexedSlicesQuery.addEqualsExpression("state", "TX");

        indexedSlicesQuery.addEqualsExpression("city", "Austin");

        indexedSlicesQuery.addGteExpression("lat", "30.30");

        QueryResult<OrderedRows<String, String, String>> result = indexedSlicesQuery.execute();

        

        return result;

    }
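The three expressions in the query above can be read as a filter over rows. The following plain-Java sketch applies the same conditions to an in-memory copy of the Npanxx sample rows that appear in the outputs of this post. It is only a model of the query semantics, not of how Cassandra evaluates an index; the class IndexedFilterSketch and its helpers are hypothetical names, and it assumes that 'lat' values are compared as strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class IndexedFilterSketch {

    // Builds one Npanxx-style row.
    static Map<String, String> row(String city, String state, String lat, String lng) {
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("city", city);
        columns.put("state", state);
        columns.put("lat", lat);
        columns.put("lng", lng);
        return columns;
    }

    // Sample rows copied from the outputs shown in this post.
    static Map<String, Map<String, String>> sampleRows() {
        Map<String, Map<String, String>> rows = new LinkedHashMap<>();
        rows.put("512202", row("Austin", "TX", "30.27", "097.74"));
        rows.put("512203", row("Austin", "TX", "30.27", "097.74"));
        rows.put("512204", row("Austin", "TX", "30.32", "097.73"));
        rows.put("512205", row("Austin", "TX", "30.32", "097.73"));
        rows.put("512206", row("Austin", "TX", "30.32", "097.73"));
        return rows;
    }

    // state == 'TX' AND city == 'Austin' AND lat >= '30.30' (string comparison, an assumption).
    static List<String> matchingKeys(Map<String, Map<String, String>> rows) {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> entry : rows.entrySet()) {
            Map<String, String> columns = entry.getValue();
            boolean matches = "TX".equals(columns.get("state"))
                    && "Austin".equals(columns.get("city"))
                    && columns.get("lat").compareTo("30.30") >= 0;
            if (matches) {
                keys.add(entry.getKey());
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(matchingKeys(sampleRows()));
    }
}
```

The sketch selects 512204, 512205 and 512206, the same three rows the real query returns, although Cassandra hands them back in its own order rather than in insertion order.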

4.6.2. Sample Code run by Maven

 

The output is:

[INFO] --- exec-maven-plugin:1.2:java (default-cli) @ cassandra-tutorial ---

Rows({512204=Row(

512204,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.32), HColumn(lng=097.73)])),

512206=Row(512206,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.32), HColumn(lng=097.73)])),

512205=Row(512205,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.32), HColumn(lng=097.73)]))})

[INFO] ------------------------------------------------------------------------

[INFO] BUILD SUCCESS

4.6.3. CLI

[default@Tutorial] get npanxx where state='TX' and city='Austin' and lat>'30.30';

-------------------

RowKey: 512204

=> (column=city, value=Austin, timestamp=1329299521508000)

=> (column=lat, value=30.32, timestamp=1329299521540000)

=> (column=lng, value=097.73, timestamp=1329299521555000)

=> (column=state, value=TX, timestamp=1329299521524000)

-------------------

RowKey: 512206

=> (column=city, value=Austin, timestamp=1329299521618000)

=> (column=lat, value=30.32, timestamp=1329299521633000)

=> (column=lng, value=097.73, timestamp=1329299522491000)

=> (column=state, value=TX, timestamp=1329299521618000)

-------------------

RowKey: 512205

=> (column=city, value=Austin, timestamp=1329299521555000)

=> (column=lat, value=30.32, timestamp=1329299521586000)

=> (column=lng, value=097.73, timestamp=1329299521602000)

=> (column=state, value=TX, timestamp=1329299521571000)

 

3 Rows Returned.

Elapsed time: 16 msec(s).

 

4.7. Insertion

4.7.1. Sample Code

InsertRowsForColumnFamilies.java

public QueryResult<?> execute() {

        Mutator<String> mutator = HFactory.createMutator(keyspace,stringSerializer);

       

        mutator.addInsertion("CA Burlingame", "StateCity", HFactory.createColumn(650L, "37.57x122.34",longSerializer,stringSerializer));

        mutator.addInsertion("650", "AreaCode", HFactory.createStringColumn("Burlingame__650", "37.57x122.34"));

        mutator.addInsertion("650222", "Npanxx", HFactory.createStringColumn("lat", "37.57"));

        mutator.addInsertion("650222", "Npanxx", HFactory.createStringColumn("lng", "122.34"));

        mutator.addInsertion("650222", "Npanxx", HFactory.createStringColumn("city", "Burlingame"));

        mutator.addInsertion("650222", "Npanxx", HFactory.createStringColumn("state", "CA"));                

       

        MutationResult mr = mutator.execute();

        return null;

    }

4.7.2. Sample Code run by Maven

Omitted

4.7.3. CLI

[default@Tutorial] set StateCity['CA Burlingame']['650']='37.57x122.34';

[default@Tutorial] set AreaCode['650']['Burlingame__650']='37.57x122.34';

[default@Tutorial] set Npanxx['650222']['lat']='37.57';

4.8. Deletion

4.8.1. Sample Code

InsertRowsForColumnFamilies.java

public QueryResult<?> execute() {

        Mutator<String> mutator = HFactory.createMutator(keyspace,stringSerializer);

       

        //Mutator.addDeletion(String key, String cf, String columnName, Serializer<String> nameSerializer)

        //columnName as null means to delete the whole row.

        mutator.addDeletion("CA Burlingame", "StateCity", null, stringSerializer);

        mutator.addDeletion("650", "AreaCode", null, stringSerializer);

        mutator.addDeletion("650222", "Npanxx", null, stringSerializer);

        // adding a non-existent key like the following will cause the insertion of a tombstone

        // mutator.addDeletion("652", "AreaCode", null, stringSerializer);

        MutationResult mr = mutator.execute();

        return null;

 

    }

4.8.2. Sample Code run by Maven

Omitted…

4.8.3. CLI

[default@Tutorial] del StateCity['CA Burlingame'];

[default@Tutorial] del AreaCode['650'];

[default@Tutorial] del Npanxx['650222'];

Important Note: Whichever you use, Java code or the CLI, a deletion still leaves the row key in place, marked with a tombstone (hehe, a really good name). The 'list' command still returns it, like this:

[default@Tutorial] list StateCity;

Using default limit of 100

-------------------

RowKey: CA Burlingame

-------------------

RowKey: TX Austin

=> (column=202, value=30.27x097.74, timestamp=1329297768323000)

=> (column=203, value=30.27x097.74, timestamp=1329297768338000)

=> (column=204, value=30.32x097.73, timestamp=1329297768354000)

=> (column=205, value=30.32x097.73, timestamp=1329297768370000)

=> (column=206, value=30.32x097.73, timestamp=1329297768385000)


2 Rows Returned.

Elapsed time: 16 msec(s).

As you can see, two rows are returned, even though the row 'CA Burlingame' has been deleted!

Even worse, deleting a non-existent key causes what is called the 'insertion of a tombstone', which means it adds one more row to the Column Family!!!

Fortunately, the 'get' command no longer retrieves it.

[default@Tutorial] get StateCity['CA Burlingame'];

Returned 0 results.

Elapsed time: 0 msec(s).
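The behaviour above (the key shows up in 'list' but 'get' returns nothing) can be pictured with a toy in-memory model. This is only a sketch of the observable behaviour under simplifying assumptions, not Cassandra's storage engine: the class TombstoneSketch and its methods are hypothetical, and the model simply clears the columns on delete, whereas a real node keeps old data on disk until compaction.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TombstoneSketch {

    // Toy row: live columns plus an optional row-level deletion marker.
    static final class Row {
        final Map<String, String> columns = new TreeMap<>();
        Long deletedAtMicros; // null means the row carries no tombstone
    }

    private final Map<String, Row> rows = new TreeMap<>();

    void set(String key, String column, String value) {
        rows.computeIfAbsent(key, k -> new Row()).columns.put(column, value);
    }

    // A delete writes a marker. The key stays, and an unknown key gets created.
    void deleteRow(String key, long nowMicros) {
        Row row = rows.computeIfAbsent(key, k -> new Row());
        row.columns.clear(); // simplification of this toy model
        row.deletedAtMicros = nowMicros;
    }

    // Like 'get': only live columns are visible.
    Map<String, String> get(String key) {
        Row row = rows.get(key);
        if (row == null) {
            return Collections.<String, String>emptyMap();
        }
        return Collections.unmodifiableMap(row.columns);
    }

    // Like 'list': every stored key is visible, tombstoned or not.
    List<String> listKeys() {
        return new ArrayList<>(rows.keySet());
    }

    public static void main(String[] args) {
        TombstoneSketch stateCity = new TombstoneSketch();
        stateCity.set("TX Austin", "202", "30.27x097.74");
        stateCity.set("CA Burlingame", "650", "37.57x122.34");
        stateCity.deleteRow("CA Burlingame", 1L);
        stateCity.deleteRow("652", 2L); // key that never existed
        System.out.println(stateCity.listKeys());
        System.out.println(stateCity.get("CA Burlingame"));
    }
}
```

In this model the listing contains three keys, including the one that never held any data, while get on a deleted key comes back empty. That mirrors the two CLI observations above.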

 

Go deeper? Please read on.

When will Cassandra remove these tombstones? As far as I know, there are two ways:

1. Wait until gc_grace_seconds has elapsed (Not verified yet)

The gc_grace_seconds is set per CF and can be updated without a restart.

How to get gc_grace_seconds? Simply use CLI:

[default@Tutorial] show schema;

create column family StateCity

  with column_type = 'Standard'

  and comparator = 'LongType'

  and default_validation_class = 'UTF8Type'

  and key_validation_class = 'UTF8Type'

  and rows_cached = 0.0

  and row_cache_save_period = 0

  and row_cache_keys_to_save = 2147483647

  and keys_cached = 200000.0

  and key_cache_save_period = 14400

  and read_repair_chance = 1.0

  and gc_grace = 864000   // 10 days, OMG

  and min_compaction_threshold = 4

  and max_compaction_threshold = 32

  and replicate_on_write = true

  and row_cache_provider = 'ConcurrentLinkedHashCacheProvider'

and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy';
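The gc_grace value in the schema is in seconds. As far as I understand, a tombstone only becomes eligible for removal once that many seconds have passed since the deletion, and the removal itself happens during a later compaction. A minimal plain-Java sketch of the arithmetic follows; the class GcGraceSketch, its method names, and the use of a write timestamp in microseconds as the deletion time are assumptions made for illustration.

```java
import java.time.Duration;
import java.time.Instant;

class GcGraceSketch {

    // Value of gc_grace taken from the schema output above, in seconds.
    static final long GC_GRACE_SECONDS = 864000L;

    // Earliest moment at which a tombstone written at the given time may be dropped.
    // Assumption: the deletion time is given in microseconds since the epoch,
    // like the timestamps printed by the CLI.
    static Instant purgeableAfter(long deletionTimestampMicros) {
        Instant deletedAt = Instant.ofEpochSecond(deletionTimestampMicros / 1_000_000L);
        return deletedAt.plusSeconds(GC_GRACE_SECONDS);
    }

    static boolean isPurgeable(long deletionTimestampMicros, Instant now) {
        return now.isAfter(purgeableAfter(deletionTimestampMicros));
    }

    public static void main(String[] args) {
        long sampleMicros = 1329297768323000L; // a timestamp format sample from this post
        System.out.println("gc_grace in days: " + Duration.ofSeconds(GC_GRACE_SECONDS).toDays());
        System.out.println("purgeable after:  " + purgeableAfter(sampleMicros));
        System.out.println("purgeable now?    " + isPurgeable(sampleMicros, Instant.now()));
    }
}
```

Under these assumptions, a compaction run minutes after the delete would not be expected to drop the tombstone, since 864000 seconds is 10 days.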

 

2. The Compaction event (under investigation but no luck yet)

Compaction is triggered automatically.

But how can compaction be triggered manually? Use nodetool:

C:\java\apache-cassandra-1.0.7\bin>nodetool -h localhost flush Tutorial

Starting NodeTool

C:\java\apache-cassandra-1.0.7\bin>nodetool -h localhost compact Tutorial

Starting NodeTool

Then we can see some logging messages in the Cassandra console.

But as I found, the tombstones are still here. (WHY???)

C:\java\apache-cassandra-1.0.7\bin>sstable2json ..\runtime\data\Tutorial\StateCity-hc-9-Data.db

{

"4341204275726c696e67616d65": [["650","37.57x122.34",1329316454906000]],

"54582041757374696e": [["202","30.27x097.74",1329297768323000], ["203","30.27x097.74",1329297768338000], ["204","30.32x097.73",1329297768354000], ["205","30.32x097.73",1329297768370000], ["206","30.32x097.73",1329297768385000]],

"616263": []

}
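The row keys in the sstable2json output are hex-encoded bytes. A small plain-Java sketch for turning them back into readable keys follows; the class HexKeySketch and its method are hypothetical helper names, and the sketch assumes the keys are UTF-8 strings, as the key_validation_class in the schema suggests.

```java
import java.nio.charset.StandardCharsets;

class HexKeySketch {

    // Decodes a hex string (two characters per byte) into a UTF-8 string.
    static String decodeHexKey(String hex) {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decodeHexKey("4341204275726c696e67616d65")); // CA Burlingame
        System.out.println(decodeHexKey("54582041757374696e"));         // TX Austin
        System.out.println(decodeHexKey("616263"));                     // abc
    }
}
```

So the three entries in the dump are the rows 'CA Burlingame', 'TX Austin' and 'abc', and the last one carries an empty column list.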

And they still appear in the output of the list command. (Argh, a ghost that refuses to leave? Big why???)

[default@Tutorial] list statecity;

Using default limit of 100

-------------------

RowKey: CA Burlingame

-------------------

RowKey: TX Austin

=> (column=202, value=30.27x097.74, timestamp=1329297768323000)

=> (column=203, value=30.27x097.74, timestamp=1329297768338000)

=> (column=204, value=30.32x097.73, timestamp=1329297768354000)

=> (column=205, value=30.32x097.73, timestamp=1329297768370000)

=> (column=206, value=30.32x097.73, timestamp=1329297768385000)

-------------------

RowKey: abc

 

3 Rows Returned.

Elapsed time: 31 msec(s).

 

 

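Let me grumble a little here:

1. Perhaps because I have not dug deep enough yet, the CLI feels rather weak to me; it suits initial modeling (DDL) and simple data analysis;

2. The tombstone cleanup question has not been finally verified yet. I am shelving it for now as an open case, and will add to or correct this once I have an answer.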
