cassandra

Description

This module enables Puppet to install, configure and run Apache Cassandra nodes. It is a spin-off of several years of experience running Cassandra in a production environment, with Puppet maintaining the configuration of the nodes. Over its evolution the module has proven useful for Cassandra versions ranging from early 1.1 through many 2.x, 3.0 and 3.11 releases to the latest 5.0 releases, and for multiple distributions, e.g. DSE, Apache and others. Efforts are made to keep the generally available Apache open-source releases supported.

Leveraging the declarative nature of the Puppet DSL, the module treats the Cassandra configuration as a declarative description of the desired state and enforces it in the node configuration. Configuration values not declared in the manifest are left untouched by this module.

Designed to fit into a roles/profiles design pattern, this module focuses strictly on Cassandra node configuration and deliberately leaves out aspects that depend on the use case and the infrastructure environment.

Setup

What cassandra affects

This module affects the following components:

  • It installs the packages cassandra and cassandra-tools.

  • It takes control over the file .cassandra.in.sh located in the Cassandra user’s home directory.

  • It takes control over the files cassandra-rackdc.properties and cassandra-topology.properties.

  • Optionally it can take control over the jvm.options file and its successors for multiple Java versions and variants, used to configure Java parameters.

  • It manages the content of cassandra.yaml by merging a configuration hash into the file as it is found on the node, i.e. as installed from the package.

Note that the module does not control the contents of cassandra-env.sh and many other files, which might lead to conflicts during package updates.

Setup Requirements

Installing and running Cassandra requires some other settings not covered by this module, e.g.:

  • access to a package repository providing the necessary packages for your operating system distribution

  • installation of a suitable Java runtime environment

  • setup of a proper clock synchronisation, e.g. NTP

  • configuration of additional settings, e.g. kernel parameters, firewall, etc.

Beginning with cassandra

Include the module in your node manifests (or in your role or profile module):

contain cassandra

Usage

Once included in the node manifest the module can be configured via Hiera.

Installation versions

By default the latest available version of the cassandra and cassandra-tools packages is installed. The default settings prevent automatic upgrades.

If you want Puppet to install a specific version, e.g. 4.0.0, add the following parameter to your Hiera data:

cassandra::cassandra_ensure: 4.0.0

The ensure value of the tools package defaults to cassandra::cassandra_ensure but can be set differently, e.g.:

cassandra::tools_ensure: latest

If you don’t want to install the tools package, you can set:

cassandra::tools_ensure: absent

If your package names differ from those of the default Apache releases, you can change the cassandra_package and/or tools_package settings, e.g.:

cassandra::cassandra_package: dsc22

Main node configuration

The module gives you access to the main configuration file, cassandra.yaml, through the config parameter. This parameter may contain a hash resembling the structure of cassandra.yaml, which is merged into the current content of the cassandra.yaml file on the node. This merge happens only on the node itself.

Thus the config parameter should contain only those settings you want to differ from the defaults, i.e. the ones you want to change on the node. Keep in mind that the structure of this hash must match the structure of cassandra.yaml, e.g.:

cassandra::config:
  cluster_name: Example Cassandra cluster
  endpoint_snitch: PropertyFileSnitch
  seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
        - seeds: 10.0.0.1,10.0.1.1
  listen_address: "%{facts.networking.ip}"

As seen for listen_address in the example, you can use Hiera interpolation to access facts when setting up the Cassandra node.

For a deeper understanding of this merge procedure refer to the cataphract/yaml_settings module, which is used to merge the config hash with the cassandra.yaml on the node.
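The effect of the merge can be illustrated with a small sketch. The file excerpts below are hypothetical and heavily shortened, and the values are assumptions chosen for illustration. The point is only that keys declared in cassandra::config are set, while undeclared keys keep the value found in the packaged file.

```yaml
# Illustrative sketch only: values are assumptions, not taken from a real node.

# 1) Excerpt of cassandra.yaml as installed from the package
cluster_name: Test Cluster
num_tokens: 16
endpoint_snitch: SimpleSnitch
---
# 2) Hiera data
cassandra::config:
  cluster_name: Example Cassandra cluster
  endpoint_snitch: PropertyFileSnitch
---
# 3) Expected excerpt of cassandra.yaml after the Puppet run
cluster_name: Example Cassandra cluster   # declared, set by the module
num_tokens: 16                            # not declared, left untouched
endpoint_snitch: PropertyFileSnitch       # declared, set by the module
```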

Setting up initial_token

If your Cassandra setup relies on setting initial_token within cassandra.yaml, the module provides a dedicated feature. You may set the initial_tokens parameter to a hash that maps a node key to the initial token of each node. The module will look up the initial_token for each node by the configured node_key and set the initial_token within the config hash. If no entry is found for a node, an error is raised which stops the Puppet agent on that node. Example:

cassandra::initial_tokens:
  node01.server.lan: '0'
  node02.server.lan: '56713727820156410577229101238628035242'
  node03.server.lan: '113427455640312821154458202477256070484'

The node_key parameter, which defaults to $facts['networking']['fqdn'], can be changed if you want to map the initial_token by some other ID.
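As a sketch of such a mapping, the tokens could be keyed by the short hostname instead of the FQDN. This assumes that node_key accepts any string and that the networking.hostname fact yields node01, node02 and node03 on the respective nodes. The hostnames are hypothetical.

```yaml
# Assumption: node_key takes the value to look up, here the short hostname fact.
cassandra::node_key: "%{facts.networking.hostname}"

# The keys of initial_tokens must then match the short hostnames.
cassandra::initial_tokens:
  node01: '0'
  node02: '56713727820156410577229101238628035242'
  node03: '113427455640312821154458202477256070484'
```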

Rack and DC settings

When using the GossipingPropertyFileSnitch class as the endpoint snitch of your cluster, you can manage the cassandra-rackdc.properties file through the rackdc parameter of this module:

cassandra::rackdc:
  dc: dc1
  rack: rackA

The optional parameters prefer_local and dc_suffix are also accepted in the rackdc hash.
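A rackdc hash using the optional keys might look like the following sketch. The value types shown (a boolean for prefer_local, a string for dc_suffix) and the suffix itself are assumptions; check the module reference for the accepted types.

```yaml
cassandra::rackdc:
  dc: dc1
  rack: rackA
  prefer_local: true      # assumption: boolean value
  dc_suffix: _analytics   # assumption: hypothetical suffix string
```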

Topology settings

When using the endpoint snitch classes PropertyFileSnitch or GossipingPropertyFileSnitch, you might want to control the contents of the cassandra-topology.properties file. This is enabled through the topology parameter of this module, which contains a multi-level hash mapping arrays of your nodes to the racks and the racks to the datacenters. For example:

cassandra::topology:
  dc1:
    rackA:
    - 10.0.0.1
    - 10.0.0.2
    rackB:
    - 10.0.0.3
    - 10.0.0.4
  dc2:
    rackA:
    - 10.0.1.1
    - 10.0.1.2
    rackB:
    - 10.0.1.3
    - 10.0.1.4

This results in a cassandra-topology.properties file containing:

10.0.0.1=dc1:rackA
10.0.0.2=dc1:rackA
10.0.0.3=dc1:rackB
10.0.0.4=dc1:rackB
10.0.1.1=dc2:rackA
10.0.1.2=dc2:rackA
10.0.1.3=dc2:rackB
10.0.1.4=dc2:rackB

Note #1: when the topology parameter is left undef (the default), the module removes the cassandra-topology.properties file from your nodes. This is intended behaviour to support the migration from PropertyFileSnitch to GossipingPropertyFileSnitch.

Note #2: while many configuration changes notify the service to restart, the module suppresses this for updates to the cassandra-topology.properties file. Changes to the topology do not restart the Cassandra node, because both snitch classes that use this file reload it at runtime to read its updated content.

Setting the runtime environment

The module provides a variety of settings for the runtime environment and the Java VM.

Environment variables

The defined type cassandra::environment::variable can be used to create variables in the environment of the Cassandra process. Typical examples are MAX_HEAP_SIZE, HEAP_NEWSIZE, JAVA_HOME, LOCAL_JMX and other Cassandra-specific variables.

Using the environment parameter will create cassandra::environment::variable instances, e.g.:

cassandra::environment:
  MAX_HEAP_SIZE: 8G
  HEAP_NEWSIZE: 2G
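The other variables mentioned above can be set the same way. The following sketch assumes a Debian-style OpenJDK 11 path, which is hypothetical and must be adapted to your nodes. Note the quoting of the LOCAL_JMX value: an unquoted no is read as a boolean by YAML 1.1 parsers, which is probably not what you want for an environment variable.

```yaml
cassandra::environment:
  JAVA_HOME: /usr/lib/jvm/java-11-openjdk-amd64   # hypothetical path
  LOCAL_JMX: 'no'                                 # quoted to keep it a string
```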

JVM options

The defined type cassandra::environment::jvm_option adds options to the JVM Cassandra runs on.

Using the jvm_options parameter will create instances of cassandra::environment::jvm_option, e.g.:

cassandra::jvm_options:
  - verbose:gc
  - server

Java runtime settings

Within the cassandra::java namespace there are components for setting up:

  • Java agents through defined type cassandra::java::agent

  • Properties through defined type cassandra::java::property

  • runtime options through defined type cassandra::java::runtimeoption

  • advanced runtime options through defined type cassandra::java::advancedruntimeoption

  • garbage collector settings through class cassandra::java::gc

Using the java parameter will create instances of the above, e.g.:

cassandra::java:
  properties:
    cassandra.consistent.rangemovement: false
    cassandra.replace_address: 10.0.0.2
  agents:
    jmx_prometheus_javaagent.jar: 8080:config.yaml
  runtime_options:
    check: jni
  adv_runtime_options:
    LargePageSizeInBytes: 2m
    UseLargePages: true
    AlwaysPreTouch: true

Java garbage collection settings

Deprecation notice: cassandra::java_gc and the class cassandra::java::gc are now deprecated. Consider using JVM option sets instead.

Settings for the Java garbage collector can be made by instantiating the cassandra::java::gc class. Using the java_gc parameter will instantiate the class, e.g.:

cassandra::java_gc:
  collector: g1
  params:
    maxGCPauseMillis: 300

JVM option sets

Since version 2.2 the module supports a new approach to setting up options of the Cassandra Java runtime. The JVM option set feature controls the jvm.options file of Cassandra 3.x and the multiple option files used by Cassandra 4.0 and later for different Java versions and use case scopes.

Note that many of these settings will override settings made with cassandra::java, and all of them will collide with cassandra::java_gc if set. Thus remove cassandra::java_gc entirely when starting with cassandra::jvm_option_sets and consider migrating cassandra::java settings to cassandra::jvm_options.

JVM options for Cassandra 3.x

By default, the jvm.options file is controlled. The defined type cassandra::jvm_option_set takes the parameters options, sizeoptions, properties and advancedoptions and adds the settings to the jvm.options file.

Use the cassandra::jvm_option_sets parameter to build instances of cassandra::jvm_option_set type. For example:

cassandra::jvm_option_sets:
  example:
    options:
      - ea
      - server
    sizeoptions:
      Xms: 4G
      Xmx: 4G
      Xmn: 800M
    advancedoptions:
      LargePageSizeInBytes: 2m
      UseLargePages: true
      AlwaysPreTouch: true
    properties:
      cassandra.start_rpc: false

For all options, advanced runtime options and properties in the above example, the Puppet module takes control only over the corresponding lines in the jvm.options file and sets the desired options. Other settings within jvm.options are not touched by Puppet.

The top-level ID (example) is the name of the option set and allows options to be grouped. Multiple option sets are allowed; however, the same option must not be set in different option sets.
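A minimal sketch of grouping options into several option sets follows. The set names heap and largepages are arbitrary, hypothetical names, and each option appears in exactly one set.

```yaml
cassandra::jvm_option_sets:
  heap:                  # hypothetical set name
    sizeoptions:
      Xms: 4G
      Xmx: 4G
  largepages:            # hypothetical set name
    advancedoptions:
      UseLargePages: true
      LargePageSizeInBytes: 2m
```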

To remove specific settings from jvm.options, prefix options with a tilde ~, and set properties, sizeoptions and advancedoptions to undef (denoted with a tilde ~ in Hiera). The example below shows how to remove specific settings.

cassandra::jvm_option_sets:
  remove:
    options:
      - ~ea
    sizeoptions:
      Xmn: ~
    advancedoptions:
      FlightRecorder: ~
    properties:
      cassandra.initial_token: ~

Cassandra 4.0 and later JVM setup

Cassandra 4.0 and later use distinct option files for server operation and client tools, for Java version independent options and for different Java versions. Set the optsfile parameter to jvm, jvm8, jvm11 or jvm17 and the variant parameter to server or clients accordingly.

cassandra::jvm_option_sets:
  java8example:
    optsfile: jvm8
    variant: server
    advancedoptions:
      ThreadPriorityPolicy: 42
  java11example:
    optsfile: jvm11
    variant: server
    advancedoptions:
      UseConcMarkSweepGC: false
      CMSParallelRemarkEnabled: ~
      UnlockExperimentalVMOptions: true
      UseZGC: true

Reference

Automatically generated reference documentation is available in the REFERENCE.md file or at rtib.github.io/puppet-cassandra/.

Limitations

For supported operating systems and dependencies, see metadata.json.

Extensive integration tests are run nightly to assure quality and compatibility with upcoming releases.

The current integration test matrix:

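| Cassandra branch | OS distro    | JDK [1]    |
|------------------|--------------|------------|
| 3.0              | Debian:9     | OpenJDK-8  |
| 3.0              | Ubuntu:16.04 | OpenJDK-8  |
| 3.0              | Ubuntu:18.04 | OpenJDK-8  |
| 3.11             | Debian:9     | OpenJDK-8  |
| 3.11             | Ubuntu:16.04 | OpenJDK-8  |
| 3.11             | Ubuntu:18.04 | OpenJDK-8  |
| 4.0              | Debian:9     | OpenJDK-8  |
| 4.0              | Debian:10    | OpenJDK-11 |
| 4.0              | Ubuntu:16.04 | OpenJDK-8  |
| 4.0              | Ubuntu:18.04 | OpenJDK-8  |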

[1]: Note that this module does not manage any JDK installation. The JDK versions listed here are installed automatically via dependencies when the module installs the latest available Cassandra version from the release branch.

Development

The module is developed using a recent Puppet Development Kit, and validated and extensively tested using Puppet Litmus. Automated workflows, implemented with GitHub Actions, are run on demand and nightly, performing validation, spec tests and continuous integration tests.

This is an open project, and you are welcome to contribute to this module. Currently, there is no contribution guide specific to this module; general information about the workflow may apply. In case of questions feel free to open an issue or join the community Slack channels #forge-modules, #puppet, #puppet-dev, #testing on slack.puppet.com.

Issues and pull requests will be addressed in a timely manner, according to community best practice. Releases are published on demand, after a set of sufficiently important changes has been merged and all tests have succeeded. There may be automated rule enforcement in place to provide a healthy issue lifecycle.