Introduction

1. Introducing Spring Cloud Data Flow for OpenShift

This project provides support for orchestrating long-running (streaming) and short-lived (task/batch) data microservices on OpenShift 3.

2. Spring Cloud Data Flow

Spring Cloud Data Flow is a cloud-native orchestration service for composable data microservices on modern runtimes. With Spring Cloud Data Flow, developers can create and orchestrate data pipelines for common use cases such as data ingest, real-time analytics, and data import/export.

The Spring Cloud Data Flow architecture consists of a server that deploys Streams and Tasks. Streams are defined using a DSL or visually through the browser-based designer UI. Streams are based on the Spring Cloud Stream programming model, while Tasks are based on the Spring Cloud Task programming model. The sections below provide more information about creating your own custom Streams and Tasks.
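For example, a minimal stream definition using the DSL (entered via the Data Flow Shell) could look like the sketch below; the app names assume the out-of-the-box http, transform and log starters have been registered:

dataflow:>stream create --name ingest --definition "http | transform --expression=payload.toUpperCase() | log"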

For more details about the core architecture components and the supported features, please review Spring Cloud Data Flow’s core reference guide. There are several samples available for reference.

3. Spring Cloud Stream

Spring Cloud Stream is a framework for building message-driven microservice applications. Spring Cloud Stream builds upon Spring Boot to create standalone, production-grade Spring applications, and uses Spring Integration to provide connectivity to message brokers. It provides opinionated configuration of middleware from several vendors, introducing the concepts of persistent publish-subscribe semantics, consumer groups, and partitions.

For more details about the core framework components and the supported features, please review Spring Cloud Stream’s reference guide.

There’s a rich ecosystem of Spring Cloud Stream Application-Starters that can be used either as standalone data microservice applications or in Spring Cloud Data Flow. For convenience, we have generated RabbitMQ and Apache Kafka variants of these application-starters, which are available from the Maven repository and Docker Hub as Maven artifacts and Docker images, respectively.

Do you have a requirement to develop custom applications? No problem. Refer to this guide to create custom stream applications. There are several samples available for reference.

4. Spring Cloud Task

Spring Cloud Task makes it easy to create short-lived microservices. We provide capabilities that allow short-lived JVM processes to be executed on demand in a production environment.

For more details about the core framework components and the supported features, please review Spring Cloud Task’s reference guide.

There’s a rich ecosystem of Spring Cloud Task Application-Starters that can be used either as standalone data microservice applications or in Spring Cloud Data Flow. For convenience, the generated application-starters are available from the Maven repository. There are several samples available for reference.
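As an illustrative sketch, an individual task starter can also be registered from the Maven repository via the Data Flow Shell (the artifact version shown is only an example):

dataflow:>app register --name timestamp --type task --uri maven://org.springframework.cloud.task.app:timestamp-task:1.1.0.RELEASE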

Features

The Data Flow Server for OpenShift includes the following features over and above those of the Kubernetes Server.

5. Support for Maven Resource

Possibly the most prominent feature of the OpenShift Server, besides the ability to deploy to OpenShift itself, is support for Maven resources. The OpenShift Server supports Docker resources (docker://) just like the Kubernetes Server, but can additionally handle Maven resources (maven://), enabled by the OpenShift Build mechanism.

For example, both the below app registrations (via the Data Flow Shell) are valid and supported:

dataflow:>app register --name http-mvn --type source --uri maven://org.springframework.cloud.stream.app:http-source-rabbit:1.1.0.RELEASE
dataflow:>app register --name http-docker --type source --uri docker:springcloudstream/http-source-rabbit:1.1.0.RELEASE

See the Getting Started section for examples of deploying both Docker and Maven resource types.

6. Build Hashing for Maven Resource Apps

When deploying Maven resource (maven://) based apps, an OpenShift Build will be triggered to build the Docker image that will in turn be deployed. It is not efficient to trigger a new build for an app that has already been deployed when no changes are detected in the Maven Jar artifact; the resulting image would be essentially identical every time.

To help with this, the OpenShift Server will create a hash of the Maven artifact located in the local cache. On subsequent deployments of the same app (same Maven artifact) this hash will first be checked against existing builds and, if found, a new build will not be triggered; the existing image will be used instead.

This feature can be disabled by specifying spring.cloud.deployer.openshift.forceBuild=true as either a deployer property (affects all deployed apps) or a deployment property (on a per-app basis).
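For example, a sketch of forcing a rebuild of a single Maven based app at deployment time, assuming a hypothetical stream named mystream containing an app named time-mvn:

dataflow:>stream deploy mystream --properties "app.time-mvn.spring.cloud.deployer.openshift.forceBuild=true"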

7. Volumes and Volume Mounts

Volumes and volume mounts provide the ability for a Spring Cloud Stream application to access persistent storage made available on the OpenShift cluster. The supported volume and volume mount types are determined by the underlying kubernetes-model library. All of the volume types that have a generated model are supported.

Volumes and volume mounts can be specified as server deployer properties as well as app deployment properties specified at deployment time. Both ways of defining the volumes and volume mounts are identical: in both cases they are specified as a JSON representation of the kubernetes-client model.

Volumes and volume mounts defined at deployer level will be added to all deployed apps. This is handy for common shared folders that should be available to all apps.

Below is an example of volumes and volume mounts defined as a server deployer property in the ConfigMap:

spring.cloud.deployer.openshift:
  volumes:
    - name: testhostpath
      hostPath:
        path: /test/hostPath

    - name: testpvc
      persistentVolumeClaim:
        claimName: testClaim
        readOnly: true

    - name: testnfs
      nfs:
        server: 10.0.0.1:111
        path: /test/nfs

  volumeMounts:
    - name: testhostpath
      mountPath: /test/hostPath

    - name: testpvc
      mountPath: /test/pvc

    - name: testnfs
      mountPath: /test/nfs
      readOnly: true
The default value for readOnly is false, i.e. the container requests read/write access.

Below is an example of the deployment property variation (via the Data Flow Shell) of defining volumes and volume mounts:

dataflow:>stream create --name test --definition "time | file"
Created new stream 'test'

dataflow:>stream deploy test --properties "app.file.spring.cloud.deployer.openshift.deployment.volumes=[{name: testhostpath, hostPath: { path: '/test/override/hostPath' }}],app.file.spring.cloud.deployer.openshift.deployment.volumeMounts=[{name: 'testhostpath', mountPath: '/test/hostPath'}]"

Getting Started

The Data Flow Server for OpenShift extends the Kubernetes Server implementation and therefore many of the configuration options and concepts are similar and can in fact be used with the OpenShift server.

Refer to the Spring Cloud Data Flow Server for Kubernetes reference guide.

8. Deploying Streams on OpenShift

The following guide assumes that you have an OpenShift 3 cluster available. This includes both the OpenShift Origin and OpenShift Container Platform offerings.

If you do not have an OpenShift cluster available, see the next section, which describes running a local OpenShift Origin cluster for development/testing; otherwise, continue to Installing the Data Flow Server using OpenShift templates.

8.1. A local OpenShift cluster with minishift

There are a few ways to stand up a local OpenShift Origin cluster on your machine for testing. For the purpose of this guide, the minishift tool will be used.

8.1.1. Installation and Getting Started

Install minishift as per the instructions here. Once you have installed minishift successfully, you can start up an OpenShift instance with minishift start.

$ minishift start --memory 4096 --cpus 4 --deploy-router
Starting local OpenShift cluster...
oc is now configured to use the cluster.
Run this command to use the cluster:
oc login --username=admin --password=admin
$
The --deploy-router option deploys the default HAProxy Router, which is required to expose and access the Spring Cloud Data Flow UI and other tools.
OpenShift Console

The OpenShift Console is a valuable interface into your cluster, so it is recommended that you open the console with:

$ minishift console
Opening OpenShift console in default browser...
$

A browser window will open with the console login page. Log in with the admin/admin credentials.

OpenShift Console
Make sure you wait for the docker-registry and router deployments to successfully deploy before continuing. These resources are deployed to the default project.
oc CLI tool

You can also manage the local cluster with the oc CLI tool. If you do not have the oc tool installed, follow the instructions here.

Login and use the local instance with:

$ oc login --username=admin --password=admin
Login successful.

You have access to the following projects and can switch between them with 'oc project <projectname>':

  * default
    kube-system
    openshift
    openshift-infra

Using project "default".
$

8.2. Creating a new Project

To group the resources created as part of this guide, create a new Project. You can do this using the Console or oc tool. Below is an example using the 'oc' tool:

$ oc new-project scdf --description="Spring Cloud Data Flow"
Now using project "scdf" on server "https://192.168.64.13:8443".
...
$
The IP address (192.168.64.13) assigned will vary each time you use minishift start, so adjust accordingly. The active project should be scdf (check with oc project) and should be the project used for the rest of this guide.

8.3. Installing the Data Flow Server using OpenShift templates

To install a Data Flow Server and supporting infrastructure components to OpenShift, we will use OpenShift templates. Templates allow you to deploy a predefined set of resources with sane default configurations which can be optionally configured via parameters for specific environments.

The templates for the Data Flow Server for OpenShift are available in the src/etc/openshift directory in this project’s GitHub repository. Several templates are available, covering the Data Flow Server on its own as well as variants that include ephemeral datasources and the Kafka or RabbitMQ binder.

8.3.1. Installing the OpenShift templates

You can install the above templates using the OpenShift Console or oc tool. You would have to clone or download the Data Flow Server for OpenShift project and import the templates in the src/etc/openshift directory one by one using the Console or oc create -f ...

However, a more convenient and recommended way of installing all the templates is to run the following:

$ curl https://raw.githubusercontent.com/donovanmuller/spring-cloud-dataflow-server-openshift/v1.1.0.RELEASE/src/etc/openshift/install-templates.sh | bash
Installing OpenShift templates into project 'scdf'...
Archive:  /tmp/scdf-openshift-templates.zip
  inflating: /tmp/scdf-openshift-templates/scdf-ephemeral-datasources-kafka-template.yaml
  inflating: /tmp/scdf-openshift-templates/scdf-ephemeral-datasources-rabbitmq-template.yaml
  inflating: /tmp/scdf-openshift-templates/scdf-ephemeral-datasources-template.yaml
  inflating: /tmp/scdf-openshift-templates/scdf-sa.yaml
  inflating: /tmp/scdf-openshift-templates/scdf-template.yaml
Installing template '/tmp/scdf-openshift-templates/scdf-ephemeral-datasources-kafka-template.yaml'
template "spring-cloud-dataflow-server-openshift-ephemeral-kafka" replaced
Installing template '/tmp/scdf-openshift-templates/scdf-ephemeral-datasources-rabbitmq-template.yaml'
template "spring-cloud-dataflow-server-openshift-ephemeral-rabbitmq" replaced
Installing template '/tmp/scdf-openshift-templates/scdf-ephemeral-datasources-template.yaml'
template "spring-cloud-dataflow-server-openshift-ephemeral-datasources" replaced
Installing template '/tmp/scdf-openshift-templates/scdf-sa.yaml'
serviceaccount "scdf" replaced
Installing template '/tmp/scdf-openshift-templates/scdf-template.yaml'
template "spring-cloud-dataflow-server-openshift" replaced
Adding 'edit' role to 'scdf' Service Account...
Adding 'scdf' Service Account to the 'anyuid' SCC...
Templates installed.
$

This will download all the templates and install them into the scdf project by default. It will also create and configure a required Service Account mentioned below. A different project can be specified by appending -s <project> after the bash command above, as shown below.
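For example, to install the templates into a different project (the project name below is hypothetical):

$ curl https://raw.githubusercontent.com/donovanmuller/spring-cloud-dataflow-server-openshift/v1.1.0.RELEASE/src/etc/openshift/install-templates.sh | bash -s my-project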

8.3.2. Creating and configuring Service Accounts

The Data Flow Server requires a Service Account (named scdf), which grants it access to perform actions such as reading ConfigMaps and Secrets, creating Builds, etc.

To create the scdf Service Account, use the oc tool from the src/etc/openshift directory:

$ oc create -f scdf-sa.yaml
...
If you used the install-templates.sh script above to install the templates, the scdf Service Account would have already been created for you.

The scdf Service Account must have the edit role added to it in order to have the correct permissions to function properly. Add the edit role with the following:

$ oc policy add-role-to-user edit system:serviceaccount:scdf:scdf
...
If you used the install-templates.sh script above to install the templates, the scdf Service Account would already have the edit role added to it.

The scdf Service Account also needs to be added to the anyuid Security Context Constraint to allow the MySQL Pod to run using the root user. By default, OpenShift starts a Pod using a random user ID. Add the Service Account to the anyuid SCC group with:

$ oc adm policy add-scc-to-user anyuid system:serviceaccount:scdf:scdf
If you used the install-templates.sh script above to install the templates, the scdf Service Account is already added to the anyuid SCC.

8.3.3. Installing the Data Flow Server

For this guide we’ll use the Data Flow Server with ephemeral Datasources and Kafka binder template to start a Data Flow Server in the scdf project. First, using the OpenShift Console, click the Add to Project button. You should see the list of templates mentioned above. Choose the spring-cloud-dataflow-server-openshift-ephemeral-kafka template.

Data Flow Server template

Default configuration values are provided but can be updated to meet your needs if necessary.

To avoid deployments failing due to long image pull times, you can manually pull the required images. Note that you should first change your local Docker client to use the Docker engine in the minishift VM:

$ eval $(minishift docker-env)
$ curl https://raw.githubusercontent.com/donovanmuller/spring-cloud-dataflow-server-openshift/v1.1.0.RELEASE/src/etc/openshift/pull-images.sh | bash

The above step is optional as OpenShift will also pull the required images. However, depending on your network speed, deployments may fail due to timeout. If this happens, simply start another deployment of the component by clicking the Deploy button when viewing the deployment.

After updating the configuration values or leaving the default values, click the Create button to deploy this template.

Pulling the various Docker images may take some time, so please be patient. Once all the images have been pulled, the various pods will start and should all appear as dark blue circles.

Data Flow Server deployed
The Data Flow Server will by default deploy apps only in the project in which it is itself deployed; i.e. a Data Flow Server deployed in the default project will not be able to deploy applications to the scdf project. The recommended configuration is one Data Flow Server per project.

Verify that the Data Flow Server has started successfully by clicking on the exposed Route URL.

Data Flow Server UI
The UI is mapped to /dashboard

If you’d like to reset or perhaps try another template, you can remove the Data Flow Server and other resources created by the template with:

$ oc delete all --selector=template=scdf
$ oc delete cm --selector=template=scdf
$ oc delete secret --selector=template=scdf

8.4. Download and run the Spring Cloud Data Flow Shell

Download and run the Shell, targeting the Data Flow Server exposed via a Route.

$ wget http://repo.spring.io/release/org/springframework/cloud/spring-cloud-dataflow-shell/1.1.0.RELEASE/spring-cloud-dataflow-shell-1.1.0.RELEASE.jar
$ java -jar spring-cloud-dataflow-shell-1.1.0.RELEASE.jar --dataflow.uri=http://scdf-kafka-scdf.192.168.64.15.xip.io/

  ____                              ____ _                __
 / ___| _ __  _ __(_)_ __   __ _   / ___| | ___  _   _  __| |
 \___ \| '_ \| '__| | '_ \ / _` | | |   | |/ _ \| | | |/ _` |
  ___) | |_) | |  | | | | | (_| | | |___| | (_) | |_| | (_| |
 |____/| .__/|_|  |_|_| |_|\__, |  \____|_|\___/ \__,_|\__,_|
  ____ |_|    _          __|___/                 __________
 |  _ \  __ _| |_ __ _  |  ___| | _____      __  \ \ \ \ \ \
 | | | |/ _` | __/ _` | | |_  | |/ _ \ \ /\ / /   \ \ \ \ \ \
 | |_| | (_| | || (_| | |  _| | | (_) \ V  V /    / / / / / /
 |____/ \__,_|\__\__,_| |_|   |_|\___/ \_/\_/    /_/_/_/_/_/

1.1.0.RELEASE

Welcome to the Spring Cloud Data Flow shell. For assistance hit TAB or type "help".
dataflow:>

8.5. Registering Stream applications with Docker resource

Now register all the out-of-the-box stream applications built with the Kafka binder, using the Docker resource type, in bulk with the following command.

For more details, review how to register applications.

dataflow:>app import --uri http://bit.ly/stream-applications-kafka-docker
Successfully registered applications: [source.tcp, sink.jdbc, source.http, sink.rabbit, source.rabbit, source.ftp, sink.gpfdist, processor.transform, source.loggregator, source.sftp, processor.filter, sink.cassandra, processor.groovy-filter, sink.router, source.trigger, sink.hdfs-dataset, processor.splitter, source.load-generator, processor.tcp-client, source.time, source.gemfire, source.twitterstream, sink.tcp, source.jdbc, sink.field-value-counter, sink.redis-pubsub, sink.hdfs, processor.bridge, processor.pmml, processor.httpclient, source.s3, sink.ftp, sink.log, sink.gemfire, sink.aggregate-counter, sink.throughput, source.triggertask, sink.s3, source.gemfire-cq, source.jms, source.tcp-client, processor.scriptable-transform, sink.counter, sink.websocket, source.mongodb, source.mail, processor.groovy-transform, source.syslog]

8.6. Deploy a simple stream in the shell

Create a simple ticktock stream definition and deploy it immediately using the following command:

dataflow:>stream create --name ticktock --definition "time | log" --deploy
Created new stream 'ticktock'
Deployment request has been sent

Watch the OpenShift Console as the two application resources are created and the Pods are started. Once the Docker images are pulled and the Pods are started up, you should see the Pods with dark blue circles:

ticktock stream deployed

You can also verify the deployed apps using the oc tool

$ oc get pods
NAME                     READY     STATUS      RESTARTS   AGE
...
ticktock-log-0-2-it3ja   1/1       Running     0          7m
ticktock-time-2-sxqnp    1/1       Running     0          6m

To verify that the stream is working as expected, tail the logs of the ticktock-log app either using the OpenShift Console:

ticktock-log logs

or the oc tool:

$ oc logs -f ticktock-log
...
...  INFO 1 --- [afka-listener-1] log-sink                                 : 11/29/16 14:49:59
...  INFO 1 --- [afka-listener-1] log-sink                                 : 11/29/16 14:50:01
...  INFO 1 --- [afka-listener-1] log-sink                                 : 11/29/16 14:50:02
...  INFO 1 --- [afka-listener-1] log-sink                                 : 11/29/16 14:50:03
...  INFO 1 --- [afka-listener-1] log-sink                                 : 11/29/16 14:50:04
...  INFO 1 --- [afka-listener-1] log-sink                                 : 11/29/16 14:50:05
...  INFO 1 --- [afka-listener-1] log-sink                                 : 11/29/16 14:50:06
...

8.7. Registering Stream applications with Maven resource

The distinguishing feature of the Data Flow Server for OpenShift is that it has the capability to deploy applications registered with the Maven resource type in addition to the Docker resource type. Using the ticktock stream example above, we will create a similar stream definition but using the Maven resource versions of the apps.

For this example we will register the apps individually using the following command:

dataflow:>app register --type source --name time-mvn --uri maven://org.springframework.cloud.stream.app:time-source-kafka:1.1.0.RELEASE
Successfully registered application 'source:time-mvn'
dataflow:>app register --type sink --name log-mvn --uri maven://org.springframework.cloud.stream.app:log-sink-kafka:1.1.0.RELEASE
Successfully registered application 'sink:log-mvn'
We couldn’t bulk import the Maven version of the apps as we did for the Docker versions because the app names would conflict, as the names defined in the bulk import files are the same across resource types. Hence we register the Maven apps with a -mvn suffix.

8.8. Deploy a simple stream in the shell

Create a simple ticktock-mvn stream definition and deploy it immediately using the following command:

dataflow:>stream create --name ticktock-mvn --definition "time-mvn | log-mvn" --deploy
Created new stream 'ticktock-mvn'
Deployment request has been sent
There could be a slight delay once the above command is issued. This is due to the Maven artifacts being resolved and cached locally. Depending on the size of the artifacts, this could take some time.

Watch the OpenShift Console as the two application resources are created. Notice that this time, instead of the Pods being started immediately, a Build is started for each app. The Build will execute and create a Docker image, using the default Dockerfile, containing the app. The resultant Docker image will be pushed to the internal OpenShift registry, and the deployment will be triggered once the image has been successfully pushed. The deployment will then scale the app Pod up, starting the application.

ticktock-maven stream deployed
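The progress of a Build can also be followed with the oc tool; for example (a sketch, where the build name is inferred from the Pod listing shown below):

$ oc get builds
$ oc logs -f build/ticktock-mvn-time-mvn-1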

To verify that the stream is working as expected, tail the logs of the ticktock-log-mvn app using the oc tool:

$ oc get pods
NAME                             READY     STATUS      RESTARTS   AGE
...
ticktock-mvn-log-mvn-0-1-agpl6   1/1       Running     0          4m
ticktock-mvn-log-mvn-1-build     0/1       Completed   0          1h
ticktock-mvn-time-mvn-1-12ikj    1/1       Running     0          1m
ticktock-mvn-time-mvn-1-build    0/1       Completed   0          1h

$ oc logs -f ticktock-mvn-log-mvn-0-1-agpl6
...  INFO 1 --- [afka-listener-1] log-sink                                 : 11/29/16 18:34:23
...  INFO 1 --- [afka-listener-1] log-sink                                 : 11/29/16 18:34:25
...  INFO 1 --- [afka-listener-1] log-sink                                 : 11/29/16 18:34:26
...  INFO 1 --- [afka-listener-1] log-sink                                 : 11/29/16 18:34:27

9. Deploying Tasks on OpenShift

Deploying Task applications using the Data Flow Server for OpenShift is a similar affair to deploying Stream apps. Therefore, for brevity, only the Maven resource version of the task will be shown as an example.

9.1. Registering Task application with Maven resource

This time we will bulk import the Task applications, as we do not have any Docker resource versions imported that would cause naming conflicts. Import all Maven task applications with the following command:

dataflow:>app import --uri http://bit.ly/1-0-1-GA-task-applications-maven

9.2. Launch a simple task in the shell

Let’s create a simple task definition and launch it.

dataflow:>task create task1 --definition "timestamp"
dataflow:>task launch task1

Note that when the task is launched, an OpenShift Build is started to build the relevant Docker image containing the task app. Once the Build has completed successfully, pushing the built image to the internal registry, a bare Pod is started, executing the task.

Verify that the task executed successfully by executing these commands:

dataflow:>task list
╔═════════╤═══════════════╤═══════════╗
║Task Name│Task Definition│Task Status║
╠═════════╪═══════════════╪═══════════╣
║task1    │timestamp      │complete   ║
╚═════════╧═══════════════╧═══════════╝

dataflow:>task execution list
╔═════════╤══╤═════════════════════════════╤═════════════════════════════╤═════════╗
║Task Name│ID│         Start Time          │          End Time           │Exit Code║
╠═════════╪══╪═════════════════════════════╪═════════════════════════════╪═════════╣
║task1    │1 │Wed Nov 30 13:13:02 SAST 2016│Wed Nov 30 13:13:02 SAST 2016│0        ║
╚═════════╧══╧═════════════════════════════╧═════════════════════════════╧═════════╝

You can also view the task execution status by using the Data Flow Server UI.

9.2.1. Cleaning up completed tasks

If you want to delete the Build and Pod created by this task execution, execute the following:

dataflow:>task destroy --name task1

Configuration

As the OpenShift Server is based on the Kubernetes Server, most of the configuration is identical. Please see the Kubernetes Server reference guide for configuration options.

10. Maven Configuration

The Maven configuration is important for resolving Maven app artifacts. The following example, taken from the Data Flow Server only template, configures a remote Maven repository:

maven:
  resolvePom: true
  remote-repositories.spring:
    url: http://repo.spring.io/libs-snapshot
    auth:
      username:
      password:

The resolvePom property is important for determining the build strategy used. See the OpenShift templates for reference.

More configuration options can be seen in the Configure Maven Properties section in the Data Flow reference documentation.

11. Dockerfile Resolution Strategies

The Data Flow Server for OpenShift uses the Docker build strategy. The default strategy for resolving a Dockerfile is to use the built-in Dockerfile included in the OpenShift deployer. However, there are three other strategies available:

  • If a remote Git URI is specified when creating the stream/task definition using the spring.cloud.deployer.openshift.build.git.uri property, this repository will be used and takes highest precedence.

  • If src/main/docker/Dockerfile is detected in the Maven artifact Jar, then it is assumed that the Dockerfile will exist in that location in a remote Git repository. In that case, the Git repository source is used in conjunction with the Docker build strategy. The remote Git URI and ref are extracted from the <scm><connection></connection></scm> and <scm><tag></tag></scm> tags in the pom.xml (if available) of the Maven Jar artifact. For example, if the <scm><connection> value was scm:git:git@github.com:spring-cloud/spring-cloud-dataflow.git, then the remote Git URI would be parsed as ssh://git@github.com:spring-cloud/spring-cloud-dataflow.git. In short, the Dockerfile from the remote Git repository for the app being deployed will be used (OpenShift actually clones the Git repo) as the source for the image build. Of course, you can customise this Dockerfile however you like.

  • The other strategy uses the contents of a Dockerfile located in one of three locations as the Dockerfile source (a full deploy command using the first location is sketched after this list):

    • The file system location of a Dockerfile indicated by the spring.cloud.deployer.openshift.deployment.dockerfile deployment property. E.g. --properties "spring.cloud.deployer.openshift.deployment.dockerfile=/tmp/deployer/Dockerfile". The contents of this file will be used as the source input for the build.

    • The inline Dockerfile content as provided in the spring.cloud.deployer.openshift.deployment.dockerfile deployment property. E.g. --properties "spring.cloud.deployer.openshift.deployment.dockerfile=FROM java:8\n RUN wget ..."

    • The default Dockerfile provided by the OpenShift deployer.
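For example, a sketch of a full deploy command using the file system location option, for a hypothetical stream named test containing an app named custom:

dataflow:>stream deploy test --properties "app.custom.spring.cloud.deployer.openshift.deployment.dockerfile=/tmp/deployer/Dockerfile"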

Server Implementation

12. Server Properties

The Spring Data Flow Server for OpenShift is a specialisation of the Spring Cloud Data Flow Server for Kubernetes. Therefore, all properties supported by the Kubernetes Server are supported by the OpenShift server.

The spring.cloud.deployer.kubernetes prefix should be replaced with spring.cloud.deployer.openshift.

See Data Flow Server for Kubernetes reference documentation for supported properties.
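For example, an existing Kubernetes Server deployment property can be passed to the OpenShift Server by simply swapping the prefix; a sketch assuming the memory deployment property supported by the Kubernetes deployer:

dataflow:>stream deploy ticktock --properties "app.log.spring.cloud.deployer.openshift.memory=512Mi"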

12.1. OpenShift Specific Properties

The following properties are specific to the Data Flow OpenShift Server.

Name Usage Example Description

Force Build

spring.cloud.deployer.openshift.forceBuild=true

Ignore the build hashing feature when deploying streams and always trigger a new build for Maven based apps

Default Routing Subdomain

spring.cloud.deployer.openshift.defaultRoutingSubdomain=oscp.mydomain.com

Provide the routing subdomain used when building Route URLs.

Default Image Tag

spring.cloud.deployer.openshift.defaultImageTag=latest

The default Docker image tag to be used when creating Build and DeploymentConfig resources

Application Properties

The following application properties are supported by the Data Flow Server for OpenShift. These properties are passed as application properties when defining streams or tasks. Below is an example of defining a stream:

dataflow:>stream create --name test --definition "time | custom --spring.cloud.deployer.openshift.build.git.uri=https://github.com/donovanmuller/timely-application-group.git | log"
Created new stream 'test'

Note the application property spring.cloud.deployer.openshift.build.git.uri=https://github.com/donovanmuller/timely-application-group.git.

13. Supported Application Properties

Name Usage Example Description

Build Git URI

spring.cloud.deployer.openshift.build.git.uri=https://github.com/donovanmuller/timely-application-group.git

The Git remote repository URI that will contain a Dockerfile in src/main/docker of that repository. See here

Git Branch Reference

spring.cloud.deployer.openshift.build.git.ref=master

The Git branch/reference for the repository specified by spring.cloud.deployer.openshift.build.git.uri. See here

Dockerfile Path

spring.cloud.deployer.openshift.build.git.dockerfile=src/main/docker

The location, relative to the project root of the Git repository, where the Dockerfile is located.

Git Repository Secret

spring.cloud.deployer.openshift.build.git.secret=github-secret

If the remote Git repository requires authentication, use this secret. See here

Deployment Properties

The following deployment properties are supported by the Data Flow Server for OpenShift. These properties are passed as deployment properties when deploying streams or tasks. Below is an example of deploying a stream definition:

dataflow:>stream create --name test --definition "time | custom | log"
Created new stream 'test'

dataflow:>stream deploy test --properties "app.custom.spring.cloud.deployer.openshift.defaultDockerfile=Dockerfile.nexus"
Deployment request has been sent for stream 'test'

Note the deployment property app.custom.spring.cloud.deployer.openshift.defaultDockerfile=Dockerfile.nexus.

14. Supported Deployment Properties

Name Usage Example Description

Force Build

spring.cloud.deployer.openshift.forceBuild=true

A flag (true/false) indicating whether to ignore the build hashing feature when deploying streams and always trigger a new build for Maven based apps

Service Account

spring.cloud.deployer.openshift.deployment.service.account=scdf

OpenShift ServiceAccount that containers should run under

Docker Image Tag

spring.cloud.deployer.openshift.image.tag=latest

The Docker image tag for the Image Stream used when creating the Deployment

Inline Dockerfile

spring.cloud.deployer.openshift.deployment.dockerfile='FROM java:8\nRUN echo "Custom Dockerfile…​"'

An inline Dockerfile that will be used to build the Docker image. Only applicable to Maven resource apps

Node Selector

spring.cloud.deployer.openshift.deployment.nodeSelector=region: primary,role: streams

A comma separated list of node selectors (in the form name: value) which will determine where the app’s Pods get assigned

Default Provided Dockerfile

spring.cloud.deployer.openshift.defaultDockerfile=Dockerfile.nexus

Specify which default Dockerfile to use when building Docker images. There are currently two supported default Dockerfiles

Create Route

spring.cloud.deployer.openshift.createRoute=true

A flag (true/false) indicating whether a Route should be created for the app. Analogous to spring.cloud.deployer.kubernetes.createLoadBalancer

Route Host Name

spring.cloud.deployer.openshift.deployment.route.host=myapp.mycompany.com

Provide a Route host value that the created Route will expose as the URL to the app

Volume Mounts

spring.cloud.deployer.openshift.deployment.volumeMounts=[{name: 'testhostpath', mountPath: '/test/hostPath'}, {name: 'testpvc', mountPath: '/test/pvc'}, {name: 'testnfs', mountPath: '/test/nfs'}]

A list of kubernetes-model supported volume mounts. Specified as a JSON representation

Volumes

spring.cloud.deployer.openshift.deployment.volumes=[{name: testhostpath, hostPath: { path: '/test/override/hostPath' }}, {name: 'testpvc', persistentVolumeClaim: { claimName: 'testClaim', readOnly: 'true' }}, {name: 'testnfs', nfs: { server: '10.0.0.1:111', path: '/test/nfs' }}]

A list of kubernetes-model supported volumes. Specified as a JSON representation. Volumes must have corresponding volume mounts, otherwise they will be ignored

Labels

spring.cloud.deployer.openshift.deployment.labels=project=test,team=a-team

A comma separated list of labels (in the form name=value) that will be added to the app

Create Node Port

spring.cloud.deployer.openshift.createNodePort=true

Create a NodePort instead of a Route. Specify either "true" or a port number at deployment time. The value "true" will choose a random port; if a number is given, it must be in the range configured for the cluster (service-node-port-range, default is 30000-32767)
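As an example, several of the above properties can be combined in a single deploy command; a sketch for a hypothetical stream named test containing an app named custom:

dataflow:>stream deploy test --properties "app.custom.spring.cloud.deployer.openshift.createRoute=true,app.custom.spring.cloud.deployer.openshift.deployment.route.host=myapp.mycompany.com"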

‘How-to’ guides

This section provides answers to some common ‘how do I do that…​’ type of questions that often arise when using Spring Cloud Data Flow.

15. Deploying Custom Stream App as a Maven Resource

This section walks you through deploying a simple Spring Cloud Stream based application, packaged as a Maven artifact, to OpenShift. The source code for this app is available in the following GitHub repository.

This guide assumes that you have gone through the Getting Started section and are using a local minishift instance of OpenShift. Adjust the steps accordingly if you are using an existing OpenShift cluster.

15.1. Deploy a Nexus Repository

For OpenShift to build the Docker image that will be deployed, it must be able to resolve and download the custom app’s Jar artifact. This means that the custom app must be deployed to an accessible Maven repository.

Assuming the local minishift OpenShift environment discussed in the Getting Started section, we will deploy a Nexus container to which we can deploy our custom application. Deploying the Nexus image is trivial thanks to an OpenShift template available here.

Make sure you have configured the scdf Service Account mentioned in the Getting Started section, as this account is used by the Nexus deployment.

Using the oc tool, upload the Nexus template with:

$ oc create -f https://raw.githubusercontent.com/donovanmuller/spring-cloud-dataflow-server-openshift/v1.1.0.RELEASE/src/etc/openshift/nexus-template.yaml
...

Once uploaded, open the OpenShift Console (you can use minishift console), authenticate and navigate to the scdf project. Then click Add to Project and select the nexus template:

Data Flow Server template

The default configurations should be sane. However, you must provide the OpenShift Route host value. This value will depend on your minishift and OpenShift Router environment but should be similar to nexus-scdf.192.168.64.15.xip.io, where scdf is the project name and 192.168.64.15.xip.io is the default routing subdomain.

Data Flow Server route configuration

Click Create when you’re happy with the configuration. Wait for the Nexus image to be pulled and the deployment to be successful. Once the Pod has been scaled successfully you should be able to access the Nexus UI by clicking on the Route URL (nexus-scdf.192.168.64.15.xip.io in this example).

The default credentials for Nexus are admin/admin123 or deployment/deployment123

15.2. Configuring the Data Flow Server for OpenShift

We need to configure the Data Flow Server to use this new Nexus instance as a remote Maven repository. If you have an existing deployment from the Getting Started section, you will have to change its configuration.

There are a few ways to do this, but the approach described here is to remove the existing deployment and use the existing Data Flow Server with ephemeral Datasources and Kafka binder template to deploy the updated configuration.

Remove the current environment using the oc tool (assuming you used the Kafka template):

$ oc delete all --selector=template=scdf-kafka
$ oc delete cm --selector=template=scdf-kafka
$ oc delete secret --selector=template=scdf-kafka

Next click Add to Project and select the spring-cloud-dataflow-server-openshift-ephemeral-kafka template. Use the following values for the Maven configuration items:

Configuration Parameter Value

Remote Maven repository name

nexus

Remote Maven repository URL

nexus-scdf.192.168.64.15.xip.io/content/groups/public (use your Route URL for Nexus here)

Remote Maven repository username

deployment

Remote Maven repository password

deployment123

Data Flow Server Maven configuration

Click Create and wait for the deployment to complete successfully.

15.3. Cloning and Deploying the App

The next step is to deploy our custom app into the Nexus instance. First, though, we need to clone the custom app source:

$ git clone https://github.com/donovanmuller/timezone-processor-kafka.git
$ cd timezone-processor-kafka

Next, we deploy the application into our Nexus repository with:

$ ./mvnw -s .settings.xml deploy -Dnexus.url=http://nexus-scdf.192.168.64.15.xip.io/content/repositories/snapshots
...
Uploading: http://nexus-scdf.192.168.64.15.xip.io/content/repositories/snapshots/io/switchbit/timezone-processor-kafka/maven-metadata.xml
Uploaded: http://nexus-scdf.192.168.64.15.xip.io/content/repositories/snapshots/io/switchbit/timezone-processor-kafka/maven-metadata.xml (294 B at 6.0 KB/sec)
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 13.156 s
[INFO] Finished at: 2016-11-30T20:42:54+02:00
[INFO] Final Memory: 35M/302M
[INFO] ------------------------------------------------------------------------
Substitute the value for -Dnexus.url with the URL matching your Nexus instance.

15.4. Deploying the Stream

Now that our custom app is ready, let’s register it with the Data Flow Server. Using the Data Flow Shell, targeted to our OpenShift deployed instance, register the timezone app with:

dataflow:>app register --name timezone --type processor --uri maven://io.switchbit:timezone-processor-kafka:1.0-SNAPSHOT
Successfully registered application 'processor:timezone'

The assumption is that the out-of-the-box apps have been imported previously as part of the Getting Started section. If the apps are not imported, import them now with:

dataflow:>app import --uri http://bit.ly/stream-applications-kafka-docker

It does not really matter whether the Docker or Maven out-of-the-box apps are registered.

Now we can define a stream using our timezone processor with:

dataflow:>stream create --name timezoney --definition "time | timezone | log"
Created new stream 'timezoney'

and deploy it with:

dataflow:>stream deploy timezoney --properties "app.timezone.timezone=Africa/Johannesburg,app.timezone.spring.cloud.deployer.openshift.defaultDockerfile=Dockerfile.nexus"
Deployment request has been sent for stream 'timezoney'
We provide two deployment properties to the timezone app. The first is the timezone to which the input times should be converted. The second informs the Data Flow Server that it should use the provided default Dockerfile (Dockerfile.nexus) that supports Nexus repositories, instead of the default Dockerfile.artifactory.

You should see a build being triggered for the timezone app, which will download the timezone-processor-kafka Maven artifact from the Nexus repository and build the Docker image. Once the build is successful, the app will be deployed alongside the other apps.

timezoney stream deployed

View both the timezoney-timezone-0 and timezoney-log-0 apps for the expected log outputs.

timezoney stream logs
timezoney stream logs

Once you’re done, destroy the stream with:

dataflow:>stream destroy timezoney
Destroyed stream 'timezoney'