Archiving Cassandra Data

WSO2 BAM uses MapReduce jobs to archive Cassandra data. As a result, you can archive a large amount of data using a cluster of Hadoop nodes. You can run the archive process manually or schedule it using a cron expression, as explained below:

  1. Log in to the BAM management console and select the Archive Data menu under the Configure menu.


    Archiving data manually

  2. The Cassandra data archive configuration opens. Select the Date Range option to manually archive data within a specific date range.


    The configuration parameters are explained below:

    Parameter                    Description
    Stream name                  In the BAM data model, a stream name maps to a Cassandra column family. Provide the stream name to archive the data stored under that stream.
    Version                      Version of the stream. Specifies which version to archive when there are multiple versions under the same stream name (as recommended).
    Date range                   Start and end dates of the data to archive. E.g., from 25/01/2013 00:00:00 AM to 03/02/2013 00:00:00 AM
    Username/Password            Cassandra username and password (same as the BAM credentials).
    External Cassandra cluster   Connection URL: connection details of the Cassandra cluster, given as a comma-separated list of host:port entries. E.g., 10.100.60.150:9160,10.100.60.151:9160
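
The connection URL in the example is a comma-separated list of host:port entries. The following is a minimal Python sketch of how such a value can be checked before it is entered in the form. It is not part of BAM; the function name parse_connection_url is hypothetical, and the validation rules (numeric port, non-empty host) are assumptions based only on the example value.

```python
def parse_connection_url(url):
    """Split 'host:port,host:port' into a list of (host, port) tuples."""
    nodes = []
    for entry in url.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing comma or doubled separators
        # Split on the last colon so the host part is kept whole.
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {entry!r}")
        nodes.append((host, int(port)))
    if not nodes:
        raise ValueError("connection URL contains no nodes")
    return nodes


if __name__ == "__main__":
    for host, port in parse_connection_url("10.100.60.150:9160,10.100.60.151:9160"):
        print(host, port)
```

An entry without a port, such as 10.100.60.150 on its own, raises a ValueError in this sketch, which makes a mistyped value visible before the archive job is submitted.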

    Scheduling the archive

  3. Select the Below this number of days option to schedule an archive process. For example:


    The configuration parameters are explained below:

    Parameter         Description
    No of days        Keeps only the last 'n' days of data in the column family. For example, according to the above configuration, the system keeps only the data from the last 90 days and archives the older data.
    Cron expression   Schedules the archive process. For example, according to the above configuration, the archive job runs every day at midnight.
    • The name of the archive column family is <original column family name> + _arch.
    • Cassandra column family names are generated with underscores (_) in place of the dots (.) in the stream name. When archiving, specify the stream name with dots. For example, if the column family is org_wso2_bam_phone_retail_store_kpi, enter org.wso2.bam.phone.retail_store.kpi as the stream name. Note that the underscore in retail_store is kept, because it is part of the stream name itself.
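
The two naming rules above can be sketched in Python as follows. This is an illustration only, assuming the mapping is a plain replacement of dots with underscores followed by the _arch suffix, as the example suggests; the function names are hypothetical and do not correspond to any BAM API.

```python
ARCHIVE_SUFFIX = "_arch"  # suffix stated in the note above


def column_family_name(stream_name):
    """Column family for a stream: dots become underscores (assumed rule)."""
    return stream_name.replace(".", "_")


def archive_column_family_name(stream_name):
    """Archive column family: <original column family name> + _arch."""
    return column_family_name(stream_name) + ARCHIVE_SUFFIX


if __name__ == "__main__":
    stream = "org.wso2.bam.phone.retail_store.kpi"
    print(column_family_name(stream))          # org_wso2_bam_phone_retail_store_kpi
    print(archive_column_family_name(stream))  # org_wso2_bam_phone_retail_store_kpi_arch
```

The mapping is not reversible: from org_wso2_bam_phone_retail_store_kpi alone there is no way to tell that retail_store contains a genuine underscore. This is why the dotted stream name has to be entered by hand rather than derived from the column family name.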
  4. Click Submit once you are done.

    Once you submit a scheduled archive, the system creates a Hive script and executes it.
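
As a rough illustration of what the No of days setting means for a scheduled run, the Python sketch below computes the cutoff before which data would be archived. It does not reproduce the generated Hive script. The function names are hypothetical, and two details are assumptions: that the cutoff is measured from the time the job runs, and that data stamped exactly at the cutoff is kept.

```python
from datetime import datetime, timedelta


def archive_cutoff(run_time, days_to_keep):
    """Return the instant before which data is considered old enough to archive."""
    if days_to_keep < 0:
        raise ValueError("days_to_keep must not be negative")
    return run_time - timedelta(days=days_to_keep)


def should_archive(event_time, run_time, days_to_keep):
    """True if the event is older than the retention window (strict comparison assumed)."""
    return event_time < archive_cutoff(run_time, days_to_keep)


if __name__ == "__main__":
    run_time = datetime(2013, 5, 1, 0, 0, 0)  # a midnight run, as in the example schedule
    print(archive_cutoff(run_time, 90))
    print(should_archive(datetime(2013, 1, 15), run_time, 90))
    print(should_archive(datetime(2013, 4, 15), run_time, 90))
```

If the scheduler accepts Quartz-style cron expressions, which is an assumption here, an expression such as 0 0 0 * * ? corresponds to a run every day at midnight.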

  5. Go to the Analytics -> List menu of the management console, select your script, and click the Schedule Script link associated with it to change the schedule time of your script.

    Note that step 5 does not apply to the manual archiving process, which only executes the Hive query but does not save it.