Channel: Pivotal Engineering Journal

Greenplum for Kubernetes Operator


Greenplum for Kubernetes

Greenplum is a massively parallel processing (MPP) database that coordinates multiple PostgreSQL instances to support distributed transactions and data storage. Greenplum is well known as an online analytical processing (OLAP) database. This blog discusses provisioning, deploying, managing, and tearing down a Greenplum cluster at large scale.

Design Rationale

The greatest leaps the Greenplum for Kubernetes team has taken since its inception are containerizing Greenplum and leveraging Kubernetes to manage those containers in any cloud.

Leveraging Kubernetes to deploy Greenplum took a few revisions to make the deployment cloud native and tolerant of real-world failures. Greenplum for Kubernetes initially used bare pods (one pod per segment) with anti-affinity rules, deploying each pod individually on each node with the help of bash scripts. However, this setup was not robust enough to handle container terminations.

To handle some of those issues, the team moved to StatefulSets, designing the deployment around at least three of them: one for the master, one for primary segments, and one for mirror segments. StatefulSets also maintain the relationship between compute and storage. More about StatefulSets: https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/

From the user's point of view, two concerns arose:

  • Why should users have to create three StatefulSets and manage them themselves?
  • This is also not a purely declarative deployment of Greenplum.

Those concerns led us to the "Operator pattern", which originally came from CoreOS. An operator takes away significant manual work and makes it easy to manage the Greenplum clusters that users create. That is how the operator for Greenplum for Kubernetes evolved.

Greenplum for Kubernetes Operator

Workflow

Steps 2-4 and 6-9 happen behind the scenes.

  1. The Kubernetes admin requests a greenplum-operator deployment. This is a one-time deployment for a given Kubernetes cluster, with the permissions needed to create Greenplum clusters in any namespace.
  2. The Greenplum operator pod is created after the request is processed.
  3. The operator pod registers the Custom Resource Definition for Greenplum in etcd, the Kubernetes key-value store.
  4. The operator starts the controller that handles requests for the custom resource kind GreenplumCluster.
  5. A Greenplum user creates a manifest file describing the desired cluster and submits it to Kubernetes.
  6. Kubernetes verifies the definition against etcd and returns an error if the Custom Resource Definition is not found.
  7. Informers in Kubernetes save the request to a queue in the custom controller created by the operator.
  8. The controller processes the queue one item at a time and asks the scheduler to create, update, or delete the given Greenplum deployment.
  9. The scheduler processes the request, creates the StatefulSets, and oversees pod creation and placement according to the resource requirements specified in the manifest file.
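The queue-driven portion of this workflow (steps 7-9) can be sketched in Go. Everything below, the Event type and the toy scheduler, is illustrative only, not the operator's actual API:

```go
package main

import "fmt"

// Event is a simplified stand-in for a watch notification that the
// operator's informer would enqueue (field names here are illustrative).
type Event struct {
	Kind    string // "create", "update", or "delete"
	Cluster string // name of the GreenplumCluster resource
}

// controller drains the queue one event at a time, mirroring steps 7-9:
// the informer fills the queue, and the controller asks the scheduler to
// create, update, or delete the corresponding deployment.
func controller(queue <-chan Event, schedule func(Event) string) []string {
	var actions []string
	for ev := range queue {
		actions = append(actions, schedule(ev))
	}
	return actions
}

func main() {
	queue := make(chan Event, 2)
	queue <- Event{Kind: "create", Cluster: "gp-analytics"}
	queue <- Event{Kind: "update", Cluster: "gp-analytics"}
	close(queue)

	// A toy "scheduler" that just records what it would do.
	schedule := func(ev Event) string {
		return fmt.Sprintf("%s statefulsets for %s", ev.Kind, ev.Cluster)
	}
	for _, action := range controller(queue, schedule) {
		fmt.Println(action)
	}
}
```

The key property this sketch preserves is ordering: the controller handles one request at a time, so a create for a cluster is always processed before a later update to the same cluster.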

Operator Advantages

Operators perform CRUD operations on the apps deployed in Kubernetes. Managing a Kubernetes app with an operator brings the following benefits:

  • Users only need to convert their requirements into a manifest file and submit it to Kubernetes with kubectl.
  • The operator can validate user input and respond with a proper error message, giving users a declarative deployment experience.
  • The operator can update the status of the app during and after its deployment.
  • The operator is created once per cluster.
  • Updates are handled appropriately when a user requests a change to an existing deployment.
  • The operator can check for duplicate deployments.
  • Operators enable their apps to be treated as Kubernetes-native resources.

References

  1. https://coreos.com/operators/
  2. http://greenplum-kubernetes.docs.pivotal.io

Troubleshooting Obscure OpenSSH Failures


Abstract

By using tcpdump to troubleshoot an elusive error, we uncovered a man-in-the-middle (MITM) ssh proxy installed by our information security (InfoSec) team to harden a set of machines that were accessible from the internet. The proxy in question was Palo Alto Networks' (PAN) Layer 7 proxy (it operates on any port, not solely ssh's port 22), and we discovered it when we observed a failure to negotiate algorithms during the ssh key exchange.

The Problem

In our team's Concourse CI pipelines, we create new Pivotal Cloud Foundry (PCF) environments, subject them to a rigorous battery of tests, and then destroy them. Among our tests is the Container Networking Acceptance test suite (NATS or CNATS, not to be confused with the NATS messaging bus), which runs many cf ssh commands to test app-to-app connectivity.

The error was elusive but inconvenient: it would cause an entire test suite to fail. Our only clue was a cryptic ssh failure:

Error opening SSH connection: ssh: handshake failed: EOF

Let’s be clear: we’re not using OpenSSH in our tests. Sure, we’re using the SSH protocol as implemented by the Golang library, but we’re not using the command line tool which so many of us know and love. In other words, we type cf ssh instead of ssh.

The purpose of this specialized SSH implementation is to allow users of our Pivotal Application Service (PAS) software to connect to their applications, typically to debug.

Once again, though, it's not quite OpenSSH. For one thing, our server side binds to port 2222, not sshd's 22. Also, both the client and the server are written in Golang, not C.

Defining the problem

The problem wasn’t consistent. In fact, over the course of a 20-minute test run, it would only appear once.

It didn’t appear everywhere—one of our environments, maintained in San Francisco, seemed immune to the problem. In fact, the problem reared its ugly head only in our San Jose environments.

And, strangest of all, the problem only occurred on the first connection attempt. The first time cf ssh was run, it would fail, but subsequent attempts succeeded.

We attempted connecting from workstations in Palo Alto, San Francisco, and Santa Monica. The behavior remained consistent: the first attempt would fail, and the remaining would succeed.

We tried using ssh as a client instead of cf ssh. Same behavior: first would fail, remainder succeed.

We tried bringing up sshd as a server. The results surprised us: no failures. Not one. Our ssh-proxy failed, but sshd didn't. What was going on?

We knew it was time for tcpdump. If we were going to get any further, we needed to examine the raw packets.

Using tcpdump on our Server

We ran tcpdump on our server (the “Diego Brain”) to determine what was happening during failed cf ssh connections. We discovered that, from the Diego Brain’s perspective, the user was shutting down the connection (by sending a FIN packet).

From the standpoint of the Diego brain (192.168.2.6), the user (10.80.130.32) terminates (FIN) the session immediately after key exchange negotiation

We dug deeper — was there anything happening in the key exchange that caused the connection to shut down?

Yes, there was something happening: the client and the Diego Brain could not agree on a common key exchange algorithm.

These were the key exchange algorithms offered by the Diego Brain. Note that they are the ones included in Golang's ssh package:

  • “curve25519-sha256@libssh.org”
  • “ecdh-sha2-nistp256”
  • “ecdh-sha2-nistp384”
  • “ecdh-sha2-nistp521”
  • “diffie-hellman-group14-sha1”
  • “diffie-hellman-group1-sha1”

These were the key exchange algorithms offered by the client:

  • “diffie-hellman-group-exchange-sha256”
  • “diffie-hellman-group-exchange-sha1”

We believe the client shut down the connection because the two sides could not agree on a common key exchange algorithm. But the client and server were both written in Golang, so their algorithm lists should have been identical. In fact, both Diffie-Hellman group exchange methods are explicitly considered legacy protocols by the Golang maintainers. Why was the client's list different, and why did it include legacy protocols?
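The failed negotiation can be reproduced with a small Go sketch of the SSH rule (RFC 4253: the first algorithm in the client's preference list that the server also supports wins), using the two lists captured above:

```go
package main

import "fmt"

// negotiate picks the first algorithm in the client's preference list that
// the server also supports, which is how SSH key-exchange negotiation works
// (RFC 4253). An empty result means the handshake must fail.
func negotiate(client, server []string) string {
	supported := make(map[string]bool)
	for _, algo := range server {
		supported[algo] = true
	}
	for _, algo := range client {
		if supported[algo] {
			return algo
		}
	}
	return ""
}

func main() {
	// The list offered by the Diego Brain (Golang's ssh package defaults).
	diegoBrain := []string{
		"curve25519-sha256@libssh.org",
		"ecdh-sha2-nistp256",
		"ecdh-sha2-nistp384",
		"ecdh-sha2-nistp521",
		"diffie-hellman-group14-sha1",
		"diffie-hellman-group1-sha1",
	}
	// The list offered by the mystery "client" (as we later learned, the proxy).
	proxy := []string{
		"diffie-hellman-group-exchange-sha256",
		"diffie-hellman-group-exchange-sha1",
	}
	if negotiate(proxy, diegoBrain) == "" {
		fmt.Println("no common key-exchange algorithm: handshake fails")
	}
}
```

The two lists are disjoint, so negotiation returns nothing and the connection is torn down, exactly the FIN we saw in the packet trace.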

At this point we also noticed that the SSH protocol version string was unexpected: it was SSH-2.0-PaloAltoNetworks_0.2. We decided to trace the packets from the client.

Using tcpdump on our Client

We ran tcpdump on our client and attempted to connect (via ssh, not our custom cf ssh client) to our Diego Brain. We found the unexpected SSH protocol string again, SSH-2.0-PaloAltoNetworks_0.2, but this time it was our Diego Brain presenting it:

A packet trace of two attempted connections to port 2222; the first failed, and the second succeeded.

But the SSH protocol SSH-2.0-PaloAltoNetworks_0.2 was only presented when the connection subsequently failed. In the diagram above, we can see that the ostensible Diego Brain shut down the connection by sending a FIN packet (packet 19) to our client.

IOPS to the Rescue

We contacted IOPS, the Pivotal organization which maintains the network, who explained that the firewall is configured to intercept and proxy all ssh connections originating from or terminating at the San Jose datacenter in order to prevent ssh tunnel attacks, since the San Jose environments are accessible from the internet.

Our Conclusions

Our networking model was wrong:

Unbeknownst to us, the Palo Alto Networks firewall was intercepting our ssh traffic.

We concluded that our cf ssh connection actually works this way:

  • Our firewall attempts to proxy all ssh connections to San Jose.
  • When the proxy contacts the backend, it finds no common key exchange algorithm and cannot establish a connection.
  • When it cannot establish a connection with the server, it sends a FIN to the client (seen as EOF).
  • It then whitelists the (client IP, server IP, server port) tuple for a period of time (we think 15-20 minutes) and does not attempt to proxy during that window.
  • It will still attempt to proxy new client connections from different IP addresses during that time.

Our final resolution was a workaround: in each test suite that runs cf ssh, we "prime the pump" by running one cf ssh command, which we expect to fail, before running the rest of the suite.

Stateful Apps in Kubernetes


Kubernetes is available across all public clouds nowadays, including Pivotal’s own PKS, which runs in the cloud and can also be run “on prem”, on the premises of an enterprise. Kubernetes promises to run any workload, including stateless and stateful applications.

A typical distinction between the two types of Kubernetes apps, stateless and stateful, is based on the manner in which data within the app is saved. However, for this discussion of Kubernetes apps, let's adopt a definition of stateless/stateful that classifies apps by the resilience of their service. Services provided by stateless apps can be easily managed with a pool of containers that, together, deliver a redundant, resilient service. Removing a portion of the containers will not interrupt the ongoing delivery of the stateless app's service.

Defining a Stateless App by Its Resilience

Within this discussion, stateful apps are those apps that require context such that the service provided by the stateful app cannot be continuously sustained when an arbitrary container is removed. Redundancy of certain key containers is not possible.

Defining a Stateful App by Its Susceptibility to Failure

One example of a stateful app is a database that stores its data on a particular volume, where that volume cannot be shared for some reason. Another example might be a legacy app that, while capable of being refactored to be stateless in the future, is undergoing a “lift and shift” port into Kubernetes before it is refactored to be stateless.

The vast majority of Kubernetes literature is written for stateless apps. Stateless apps are in the “sweet spot” for Kubernetes operational expectations and its optimizations. Meanwhile, designing and building an app that keeps state can be a challenge, with many fewer examples in the literature.

What are some of the most important considerations for developing and running stateful apps in Kubernetes? What expectations of Kubernetes operational patterns must be modified?

Subsequent posts in this series of blogs about Stateful Apps in Kubernetes focus on topics that may require special attention for stateful apps:

Provisioning Stateful Kubernetes Containers that Work Hard and Stay Alive


(This blog is the second installment of a four-part series)

By default, all containers are free from limits but subject to eviction

By default, Kubernetes places very few limits on a container. A default, “best effort” container can take as many resources as it needs until the Kubernetes system decides that the container should be evicted, typically because system memory has become scarce.

Each container runs on a virtual machine called a "node". By sharing all the resources of a node, best-effort containers gain several advantages:

  • Containers can sustain a burst of service, expanding their resource usage as needed, particularly for short-term spikes
  • Resources on the node are not reserved for idle containers
  • Containers that have gotten into trouble with unexpected resource problems, such as infinite loops or memory leaks, will be automatically evicted
  • Containers that have been evicted may be automatically restarted (if they have a restartable specification and resources permit)

This resolution of resource contention implies that the eviction of a given container is no big deal. This assumption is generally true for a stateless app, in which many containers may share the service load. However, for stateful apps, container eviction could cause disruption of service.

How containers are monitored, and where to find evidence of evictions

There are at least two concurrent systems for monitoring the resource usage of containers in Kubernetes. One is the Kubernetes kubelet monitor which observes whether containers are exceeding their stated limits. Another is the Linux Out-of-memory (OOM) killer which runs on each node, watching the RAM available on that node. Fundamentally, these monitors are protecting the liveliness of the node and its Kubernetes system functions.

After evicting a container, the Kubelet monitor generally provides info about the eviction within the status of a cluster, such as when queried with kubectl get pods. For example, restartable containers will restart after eviction and will have a non-zero number in the column “Restarts”. Containers that cannot be restarted after eviction by kubelet are left in an error state in the column “Status”.

To see eviction in action, see a sample from Kubernetes documentation that demonstrates a simple container that gets evicted. This eviction results from the sample container exceeding its stated memory limits. Importantly, this sample overage does not threaten the system containers on the node–there is still plenty of memory on the node to run Kubernetes. Thus, in this sample, the kubelet monitor has sufficient memory and CPU to do its eviction responsibility; the node itself is not under pressure.

The Linux OOM Killer, on the other hand, gets invoked when the node itself is starved for memory. In that situation, Kubernetes system containers, which control Kubernetes system logging, may be threatened and may not function as designed. Thus, there may not be much evidence of the Linux OOM killer in Kubernetes status reports. Evictions by the OOM tool may only show up in the syslog of a given node. To access that information, use the platform tools (gcloud on GKE, bosh on PKS) to obtain a shell on the node.

Setting limits on containers helps increase their priority, but does not prevent eviction

Kubernetes literature emphasizes how a developer can set a container’s requested resources and its limits inside the pod definition. These limits are not maintained in a benevolent way. Resource limits help the optimistic scheduling of containers onto nodes, but the limits also serve as explicit thresholds for eviction:

A Container can exceed its memory request if the Node has memory available. But a Container is not allowed to use more than its memory limit. If a Container allocates more memory than its limit, the Container becomes a candidate for termination.

Kubernetes has an algorithm for deciding the order in which containers will be evicted when the node becomes threatened with CPU/RAM starvation. This algorithm favors a “guaranteed” container, which means all resources have a stated “limit” amount that equals the “request” amount, versus a “burstable” container with a request that is lower than its limit. The order of eviction is:

  • best-effort
  • burstable
  • guaranteed

In order to be the last to be evicted, a container must state limits for CPU and memory, and either state requests equal to those limits, or state no request so that the request is set to the limit automatically. These requirements then qualify the container as a “guaranteed” container, the last to be evicted for resource issues (assuming that the container itself is well-behaved, staying inside those limits).
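The classification above can be sketched for a single container and a single resource; note this is a simplification, since Kubernetes derives the QoS class from every container in the pod and from both CPU and memory:

```go
package main

import "fmt"

// qosClass reproduces the three Kubernetes QoS classes described above from
// one container's request and limit (values in arbitrary units; a real pod
// is classified across all containers and both cpu and memory).
func qosClass(request, limit int) string {
	switch {
	case request == 0 && limit == 0:
		return "BestEffort" // nothing stated: first to be evicted
	case limit > 0 && request == limit:
		return "Guaranteed" // limit stated and equal to request: evicted last
	default:
		return "Burstable" // request below limit, or only one of the two stated
	}
}

func main() {
	fmt.Println(qosClass(0, 0))       // BestEffort
	fmt.Println(qosClass(512, 1024))  // Burstable
	fmt.Println(qosClass(1024, 1024)) // Guaranteed
}
```

The eviction order in the list above is simply the reverse of this ranking: BestEffort goes first, Guaranteed last.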

Noisy neighbors, controlling the neighborhood

How can stateful containers be protected against eviction when neighbors are using lots of resources?

First, consider the neighborhood: all the containers that can be scheduled on a worker node. The Kubernetes kubelet bases its eviction decisions on metrics at the node level, so containers on a node are at the mercy of "noisy" neighbors. If a stateful container must stay alive, it is best to deploy it into a predictably quiet neighborhood. One simple way to do this is to have few or no neighbors, that is, to run the stateful app on a dedicated cluster.

The idea of a dedicated cluster seems to go against one of the design goals of Kubernetes, which is to optimize resource usage. Kubernetes was designed to share node resources optimally. A typical Kubernetes system stretches to handle spikes in a particular app while, given sufficient container redundancy on multiple nodes, other apps are not harmed in terms of service availability.

Stateful apps need different priorities. When operating a stateful app, resources may need to be allocated more statically in order to preserve the stateful app’s service availability.

In addition to considering a heterogeneous neighborhood, consider the stateful app’s containers as neighbors themselves. How can one of the app’s containers be protected against the others within the same app, that might take up its resources? One way is to dedicate a node per container, or at least carefully determine a topology where multiple containers can run within the same node. The topology of one node per container is simple to describe and may help with provisioning for high availability.

In summary, for the easiest configuration to assure a stateful app’s performance and protection, dedicate the Kubernetes cluster to the stateful app (“single-tenant”), and dedicate a node per single stateful container. That way, each node can be tuned for the single purpose of running the stateful app. More complex topologies are possible. Start as simply as possible.

Node capacity

To determine the largest amount of resources available to any pod on a node, two key metrics are available: a node’s “capacity” and its “allocatable” attribute, which can be requested as:

kubectl get nodes -o json

which results in the following sample output (cropped to show just the attributes in question), querying a Minikube instance that was started with 4 GB:

"allocatable": {
    "cpu": "4",
    "ephemeral-storage": "15564179840",
    "hugepages-2Mi": "0",
    "memory": "3934296Ki",
    "pods": "110"
},
"capacity": {
    "cpu": "4",
    "ephemeral-storage": "16888216Ki",
    "hugepages-2Mi": "0",
    "memory": "4036696Ki",
    "pods": "110"
},

In this output, “capacity” reflects the entire allotment given to Minikube, while “allocatable” is a calculation after system pods are running. In Kubernetes 1.12, the pod scheduler protects the system’s resources (i.e., the non-allocatable portion).

The following bash commands can parse this memory capacity on a node:

mem_cap_string=$(kubectl get nodes -o jsonpath='{.items[0].status.capacity.memory}')
mem_cap_int=$(echo ${mem_cap_string} | sed 's/[^0-9]*//g')   # numeric part, e.g. 4036696
mem_units=$(echo ${mem_cap_string} | sed 's/[^a-zA-Z]*//g')  # unit suffix, e.g. Ki

This “mem_cap_int” value can be used as an upper bound for total container resources within the node.
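To compare that capacity against container limits in a common unit, the quantity string can be converted to bytes. Here is a minimal Go sketch that handles only the binary suffixes seen in the output above; Kubernetes itself uses the far more general resource.Quantity type:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMemory converts a Kubernetes memory quantity such as "3934296Ki"
// into bytes. Only plain numbers and the binary suffixes Ki/Mi/Gi are
// handled here.
func parseMemory(q string) (int64, error) {
	multipliers := map[string]int64{
		"": 1, "Ki": 1 << 10, "Mi": 1 << 20, "Gi": 1 << 30,
	}
	num := strings.TrimRight(q, "KMGi") // strip the unit suffix, if any
	suffix := q[len(num):]
	mult, ok := multipliers[suffix]
	if !ok {
		return 0, fmt.Errorf("unknown suffix %q", suffix)
	}
	n, err := strconv.ParseInt(num, 10, 64)
	if err != nil {
		return 0, err
	}
	return n * mult, nil
}

func main() {
	// The "allocatable" memory from the Minikube output above, in bytes.
	b, err := parseMemory("3934296Ki")
	if err != nil {
		panic(err)
	}
	fmt.Println(b) // 4028719104
}
```

Summing the byte values of all container limits and checking the total against the node's allocatable bytes is the arithmetic the scheduler performs when packing pods onto a node.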

Limits managed within the app itself

Even with the recommendations above, including a dedicated cluster, dedicated nodes, and limits that qualify containers as "guaranteed" (evicted last), a container can still be evicted for resource usage beyond its stated limits. How can a stateful app ensure that its containers are "well behaved" and stay within their limits?

An app must manage its own resource usage internally, particularly with regard to the measure that commonly causes eviction: memory.

cgroups to enforce limits: best choice for CPU

One tool for an app to self-limit its resource usage is to enforce limits using Linux cgroups. cgroups are part of the Linux kernel. A cgroup setting can kill processes that go over, for example, memory limits, or can throttle processes to a maximum CPU usage. If an app manages cgroups for all the child processes it creates, the app has a chance to successfully stay within the limits of the app’s declared resource usage.

However, cgroups are not gentle for memory or disk space. Any child process that goes over a memory or disk space quota will be terminated by the cgroup without graceful notification. So cgroups for memory management require the app to forgive traumatic termination of child processes.

cgroups are extremely useful for CPU throttling of child processes because this involves a simple manipulation of CPU allocation, not a traumatic termination of the process. According to dynamic business conditions, an app could use CPU cgroups to dial up and down the CPU usage of existing child processes.
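As a sketch of what such dialing up and down looks like, the cgroup v2 CPU controller takes a quota and a period through its "cpu.max" file; the helper below only formats that value, and the path in the comment is illustrative (actually writing it requires the cgroup access described next):

```go
package main

import "fmt"

// cpuMax formats a value for the cgroup v2 "cpu.max" control file: the
// allowed quota in microseconds per scheduling period. An app would write
// the result to the group's cpu.max file, e.g.
// /sys/fs/cgroup/<group>/cpu.max, to throttle the processes in that group
// to roughly `cores` CPUs.
func cpuMax(cores float64, periodUs int) string {
	quota := int(cores * float64(periodUs))
	return fmt.Sprintf("%d %d", quota, periodUs)
}

func main() {
	// Throttle a child-process group to half a CPU...
	fmt.Println(cpuMax(0.5, 100000)) // 50000 100000
	// ...then, as business conditions change, dial it up to two full CPUs.
	fmt.Println(cpuMax(2.0, 100000)) // 200000 100000
}
```

Because only the quota value changes, the child processes keep running throughout, which is exactly the gentleness that memory cgroups lack.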

Getting access to cgroups is a challenge for containers, one that includes, in Kubernetes 1.10, a remapping of the node’s /sys/fs/cgroup file system and various additional configurations.

Additional strategies for managing resources within an app

Having established the overall picture of resources, in particular how to maximize an app’s usage of the capacity available within a node, the responsibility of resource management falls upon the app itself, to be well-behaved within its guaranteed limits. Assuming the app has an architecture wherein a parent process spawns child processes for each request, that app’s responsibilities include:

  • determining the percentage of memory and disk resources that are allocated to the parent process
  • ensuring that the parent process does not exceed these limits
  • determining the number of child processes that can run in parallel, within the resources remaining after the parent process
  • ensuring that each child process stays within some limit
  • denying or blocking on requests for new child processes while the maximum number of parallel child processes are running

That list of responsibilities starts to sound like a resource management framework, of which Kubernetes itself is one. This points up the obvious: if any stateful app can be re-architected to become stateless, resource management can be outsourced to Kubernetes. Otherwise, a stateful app must devote a significant amount of logic to managing resource allocation.

Stateful Apps, a 4-part series

Storing Stateful Data that Outlives a Container or a Cluster; Optimizing for Local Volumes


(This blog is the third installment of a four-part series)

Kubernetes can automatically provision “remote persistent” volumes with random names

Several types of storage volumes have built-in Kubernetes storage classes that enable dynamic provisioning, creating remote persistent volumes as needed when a container is spun up for the first time. This provisioning of storage is useful when the data need not outlive the cluster, such as in a development cluster. In such an environment, containers can come and go, and any re-created container will remount the persistent disk previously created for it, as long as the cluster lives. The dynamically generated volumes have names produced by the underlying storage class, typically a random string.

Storage that is automatically provisioned is also, in general, deleted by Kubernetes when the corresponding PersistentVolume object is deleted, such as when the cluster is deleted. In other words, the default reclaim policy for dynamically provisioned volumes instructs Kubernetes to delete the volume when finished. This can be changed in the storage specification.

Even when retained, these volumes would be hard to track and remount in a new cluster because their names are typically generated as a random string.

StatefulSet offers a predictable, automated naming pattern with a default “retain” policy

StatefulSets offer, through their "volumeClaimTemplates" attribute, control over the name of any generated volume. Combined with the "persistentVolumeReclaimPolicy" attribute, which defaults to "Retain", StatefulSets can easily generate volumes with well-defined names that persist past the lifetime of the initial set.
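The naming pattern is predictable enough to compute by hand: each claim is the template name, the StatefulSet name, and the pod ordinal joined with hyphens. The helper and example names below are illustrative:

```go
package main

import "fmt"

// pvcName reproduces the predictable claim name a StatefulSet derives from
// one of its volumeClaimTemplates: "<template name>-<set name>-<ordinal>".
func pvcName(template, statefulSet string, ordinal int) string {
	return fmt.Sprintf("%s-%s-%d", template, statefulSet, ordinal)
}

func main() {
	// A hypothetical StatefulSet "greenplum-segment" with a claim template
	// "pgdata" yields claims that can be found again by name later.
	for ordinal := 0; ordinal < 3; ordinal++ {
		fmt.Println(pvcName("pgdata", "greenplum-segment", ordinal))
	}
}
```

Because the name is a pure function of the set name and ordinal, a retained volume can be tracked down and re-bound by name, unlike the random strings of ordinary dynamic provisioning.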

Reaching outside of Kubernetes to create volumes

In production, a typical deployment strategy requires that storage for long-lived data continues to persist no matter what happens with any cluster using that data (assuming no process explicitly deletes the data). It is best to provision with names that provide easy tracking and content identification. This can be done within a given IaaS platform and provided to the Kubernetes cluster, to be mounted by name.

For example, in Google Cloud Platform, a command like

gcloud compute disks create --size=20GB my-sample-volume-for-content-xyz

will create a volume with a given name. This volume will continue to exist for as long as the GCE account specifies. One way to access this volume within Kubernetes is to refer to the volume by name, such as with a pod yaml like:

apiVersion: v1
kind: Pod
metadata:
 name: my-host
 labels:
   app: my-app
spec:
 hostname: my-host
 containers:
 - name: gpdb
   image: gcr.io/my-project/my-image
   volumeMounts:
   - name: pgdata
     mountPath: /greenplum # example mount path; adjust for the image
 volumes:
 - name: pgdata
   persistentVolumeClaim:
     claimName: my-claim-gce
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
 name: my-claim-gce
 labels:
   app: my-app
spec:
 accessModes:
   - ReadWriteOnce
 storageClassName: "" # the storageClassName has to be specified but can be empty
 resources:
   requests:
     storage: 10Gi
 selector:
   matchLabels:
     app: my-app
---
apiVersion: v1
kind: PersistentVolume
metadata:
 name: my-host-pv
 labels:
   app: my-app 
spec:
 capacity:
   storage: 10Gi
 accessModes:
   - ReadWriteOnce
 gcePersistentDisk:
   pdName: my-sample-volume-for-content-xyz # this is the linkage with a pre-created volume

Local Persistent Volumes may offer performance gains, at the cost of complexity

Kubernetes 1.10 added, as a beta feature, access to Local Persistent Volumes. Particularly in "raw block" mode, local persistent volumes imply a significant performance gain, but at the cost of deployment challenges. If a stateful app's performance depends on storage throughput, this tradeoff may be worth investigating. For example, Salesforce has described their preference for local persistent volumes.

Local persistent volumes are, by definition, local to the nodes on which they have been physically attached, and cannot “travel” to another node in a manner like a network-mediated “remote persistent” volume. Therefore, when stateful data is present on a local persistent volume, a stateful app must manage to recreate the app with containers landing on the same node that already has the data. This is much more complex and much less flexible than having a standard, remote persistent volume where any node can generally mount any remote volume.

Rescheduling containers onto the nodes where their data already resides

Kubernetes has some automatic affinity when replacing a container into an existing deployment. Remote Persistent Volumes that were mounted when a container was initially launched will generally be matched and remounted to a container that is recreated while the original deployment is still in effect.

However, when a wholesale change happens, such as when a Kubernetes cluster is wiped and a new one is recreated to house an app, how can that app find any existing data, particularly in light of local volumes that cannot be moved with the benefit of networking as are remote volumes?

One strategy is to use DaemonSets to investigate all nodes and attach labels that will help Kubernetes assign containers to an appropriate location.

In other words, the steps include:

  • A daemon runs on each node, perhaps as a privileged container, investigating any storage found (particularly local), initializing and validating as necessary, and finally labeling the node appropriately
  • The stateful app’s orchestration (e.g., an operator) adds selectors to container specifications to ensure each stateful container will be scheduled on a node that matches its storage expectation

This kind of deployment might fail if there is a gap in the storage, such as a local volume gone missing. At such times, manual intervention may be necessary.
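The matching step in the second bullet above is plain label comparison: a node qualifies only if it carries every key/value pair the container's selector demands. A sketch, with hypothetical label names:

```go
package main

import "fmt"

// matchesSelector performs the label-matching step: a node qualifies only
// if it carries every key/value pair in the selector.
func matchesSelector(nodeLabels, selector map[string]string) bool {
	for key, want := range selector {
		if nodeLabels[key] != want {
			return false
		}
	}
	return true
}

func main() {
	// Labels a storage-inspecting daemon might have attached to a node
	// after finding and validating a local volume there.
	node := map[string]string{
		"storage/content": "segment-3",
		"storage/ready":   "true",
	}
	// Selector the orchestrator adds to the container that needs that data.
	selector := map[string]string{"storage/content": "segment-3"}
	fmt.Println(matchesSelector(node, selector)) // true
}
```

If no node matches, the pod stays Pending, which is the "gap in the storage" failure mode where manual intervention comes in.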

Stateful Apps, a 4-part series

Managing Stateful Apps with the Operator Pattern; Orchestration Considerations


(This blog is the fourth installment of a four-part series)

The Operator Pattern

The Operator Pattern stipulates a process that is registered with the Kubernetes system layer, listening to Kubernetes system events with the responsibility to manage a particular set of resources. All the logic that initializes and maintains a service is thereby encapsulated into a Kubernetes deployment, which we’ll call the “Operator” hereafter.

This pattern aligns with many of the requirements for stateful apps, so the Operator Pattern is a popular implementation choice. An Operator is one way to orchestrate all the constituent pieces of a stateful app into a holistic service. An Operator can be called an orchestrator.

Recreating failed containers

When a constituent container is missing, Kubernetes may automatically attempt to recreate the container, depending on several factors. For example, theStatefulSet resource typically specifies that Kubernetes will recreate any missing member of such a set. Alternatively, an orchestrator can monitor a deployment of pods to maintain the required set of containers.

Options for automation and self-healing

Given a constituent container has failed and been recreated automatically by Kubernetes, consider its effect on a stateful app. The service may also have failed as a result of the failed constituent. To the extent that the app guarantees High Availability for the given failure case, automated recovery of the service is expected. The orchestrator must recognize the need to reintegrate the recreated container(s) into the overall service. (The very wide range of potential error cases presents a challenge to flawlessly recover without manual intervention.)

Liveness and readiness probes offer orchestration options; DNS entries contingent on readiness

Given the need for a stateful app to orchestrate its components into a holistic service, a typical design for an orchestrator involves signalling to and from constituent containers to help move them into desired states.

Kubernetes offers health probes that can be used for decentralized signalling, including “liveness” and “readiness” health probes.

First, basic Kubernetes definitions: a failed liveness probe will cause the container to be killed. A failed readiness probe will cause a container to stop handling requests for its declared service.

Passing the readiness probe means that Kubernetes adds a DNS entry for the passing container; conversely, failing the probe will omit or remove the DNS entry. Importantly, a “live-but-not-ready” container is still part of the network, but not addressable via DNS.

One mechanism for decentralized discovery of constituent containers is for these containers to bring themselves up into a live-but-not-ready state, and announce themselves via some discovery mechanism. For example, upon creation, a container could add its IP address to a config map or other shared discovery location. Meanwhile, an orchestrator can monitor changes to a config map with standard file-status monitoring tools. When all the containers that will constitute a service become available in a wait state, an orchestrator can manage initialization of the service.

Operator Discovery Cycle

Starting with fewer features

Many stateful apps require complex orchestration to assemble various pieces into a single service. On “Day 1”, this orchestration must stand up and integrate all the pieces that constitute a service. On “Day 2”, this orchestration must upgrade, repair, and resize all the pieces that constitute the (already-started) service.

When developing a stateful app from scratch, consider putting off day-2 tasks in favor of starting small. Given manual intervention by administrators to help manage day-2 operations, a development team can postpone building those features to focus on making Day 1 work smoothly. For example, consider a typical day-2 task: upgrading an app release. For a stateful app, an upgrade could easily mean a service outage. When users are given a choice between automated upgrades at unscheduled times and manual upgrades with scheduling, users may choose to schedule. In that case, building out automated upgrading may be deprioritized.


Transferring Time-based One-time Passwords to a New Smartphone


Abstract

Smartphone authenticator apps such as Google Authenticator and Authy implement software tokens that are “two-step verification services using the Time-based One-time Password Algorithm (TOTP) and HMAC-based One-time Password algorithm (HOTP)”

Smartphone TOTP, a form of Two-factor authentication (2FA), displays a 6-digit code derived from a shared secret, updating every thirty seconds.

The shared secret is presented only once to the user, typically with a QR (Quick Response) Code which is scanned in by the authenticator app.

By using a simple QR app (not an authenticator app) to read in the shared secret, and storing the shared secret in a secure manner, one can easily recover the state of the authenticator app on a replacement phone.

Google Authenticator Screenshot

The Google Authenticator app running on an Android phone displaying the TOTP codes for several services, including Okta (shown), GitHub, and LastPass

This procedure is designed for an Android phone and a macOS workstation, but can be adapted to an iOS phone or Linux workstation, and, with some work, to a Windows workstation.

Procedure

0. Scan in the QR URL

When scanning in a new TOTP code, rather than bringing up Google Authenticator, we use the Android Camera’s built-in QR Code reader (we’re not familiar with iOS/iPhones, but we assume there is an equivalent feature):

Scanning in a TOTP QR code

The Android Camera has a QR code reader mini-app. The launch button (see arrow) displays when the camera recognizes a QR code

The gentle reader should rest assured that all secrets in this blog post are fakes and that we would not deliberately leak our TOTP secrets in such an indiscreet manner.

Once in the QR mini-app, we copy the link to the clipboard by pressing the “duplicate” icon. A typical link would be similar to the following (an example TOTP from Slack):

otpauth://totp/Slack (Cloud Foundry):bcunnie@pivotal.io?secret=CBL5RAL4MSCFFKMX&issuer=Slack

Note that the link has a scheme of “otpauth”, an authority of “totp”, and the secret (key) is presented as a key-value pair query component (“secret=CBL5RAL4MSCFFKMX”). For those interested in more detail, the Google Authenticator GitHub Wiki is an excellent resource (where you will discover, among other things, that the key is Base32-encoded).

Copying the TOTP URL

We copy the link to the clipboard by pressing the "duplicate" icon
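The link’s structure described above can be taken apart with nothing more than shell parameter expansion. The snippet below is our own illustration (not part of the original post), using the example Slack URL from earlier; the variable names are arbitrary:

```shell
# Dissect an otpauth:// URL using plain parameter expansion.
url='otpauth://totp/Slack (Cloud Foundry):bcunnie@pivotal.io?secret=CBL5RAL4MSCFFKMX&issuer=Slack'

scheme="${url%%://*}"        # scheme: otpauth
rest="${url#*://}"
authority="${rest%%/*}"      # authority: totp
query="${url#*\?}"           # query component: secret=...&issuer=Slack
secret="${query%%&*}"        # first key-value pair
secret="${secret#secret=}"   # the Base32-encoded key itself

echo "$scheme / $authority / $secret"   # prints: otpauth / totp / CBL5RAL4MSCFFKMX
```

The same expansions work in any POSIX shell, so no extra tooling is needed to inspect a scanned URL.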

1. Copy the URL to a password manager

We copy the URL to a password manager. In our case we use LastPass [LastPass], but we believe any password manager will do.

We are interested in alternative secure storage mechanisms (e.g. Vault, 1Password) for the secrets. For those of you so inclined, pull requests describing alternatives are welcome.

We copy the URL to a “secure note”, one line per URL. We name the secure note totp.txt.

This is what our secure note looks like (the keys have been changed to protect the innocent):

otpauth://totp/Okta:bcunnie@pivotal.io?secret=ILOVEMYDOG
otpauth://totp/GitHub:brian.cunnie@gmail.com?secret=mycatisgreat2
otpauth://totp/LastPass:bcunnie@pivotal.io?secret=LETSNOTFORGETMYWIFE
otpauth://totp/LastPass:brian.cunnie@gmail.com?secret=ormylovelychildren
otpauth://totp/AWS:bcunnie@pivotal.io?secret=SOMETIMESIFORGETMYCHILDRENSNAMES
otpauth://totp/AWS:brian.cunnie@gmail.com?secret=theyrealwaysgettingintotrouble
otpauth://totp/Google:brian.cunnie@gmail.com?secret=ILETMYWIFEDEALWITHIT
otpauth://totp/Pivotal%20VPN:bcunnie@pivotal.io?secret=computersaremucheasiertohandlethankids
otpauth://totp/Coinbase:brian.cunnie@gmail.com?secret=SOMETIMESIHIDEINMYOFFICE
otpauth://totp/Joker:brian.cunnie@gmail.com?secret=buttheyopenthedoortoseehwatImdoing
otpauth://totp/Discord:brian.cunnie@gmail.com?secret=THEYGETBOREDPRETTYQUICKLY
otpauth://totp/namecheap:brian.cunnie@gmail.com?secret=soIplayminecraftwiththem

2. Display the QR code to your terminal

We make sure we have a utility which displays QR codes to our terminal; we have found qrencode quite adequate, and on macOS it’s installed as easily as brew install qrencode (assuming the homebrew package manager is already installed).

We use a three-line shell script, totp.sh, to display the QR codes in our terminal. Our invocation uses the LastPass CLI to display our TOTP secrets and pipe them to our shell script:

lpass show --note totp.txt | totp.sh

A parade of QR codes scrolls on our terminal, and we use our authenticator app to scan them in. We have been able to scan as many as 12 different QR codes in under a minute!
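The three-line totp.sh itself is not reproduced in the post; the sketch below is one possible version of what such a script might contain, written as a function for clarity. The function name totp_qr and the QR_CMD override hook are our own additions, and it assumes qrencode is installed as described above:

```shell
# Sketch of totp.sh: read otpauth:// URLs on stdin and render each one
# as a QR code in the terminal. QR_CMD is an override hook (useful for
# testing); it defaults to `qrencode -t UTF8`.
totp_qr() {
  while IFS= read -r url; do
    case "$url" in
      otpauth://*)
        printf '%s\n' "$url"                        # label the code with its URL
        printf '%s' "$url" | ${QR_CMD:-qrencode -t UTF8}
        ;;
    esac
  done
}
```

Sourced into a shell, it would be invoked the same way as the original: `lpass show --note totp.txt | totp_qr`, producing one QR code per line of the secure note.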

We recommend scanning the QR code from your terminal, not from the site’s web page, into the authenticator to ensure that the URL (and secret) have been correctly copied.

TOTP Alternatives

SMS 2FA

SMS 2FA transparently migrates to new phones (as long as the phone number doesn’t change), but has been faulted for being vulnerable to Signaling System 7 (SS7) attacks. [0] [1] [2]

U2F 2FA

“Universal 2nd Factor (U2F) is an open authentication standard that strengthens and simplifies two-factor authentication (2FA) using specialized Universal Serial Bus (USB) or near-field communication (NFC) devices”

U2F’s advantage is that its secret is never shared (it never leaves the key), so the secret itself is difficult to compromise. The downside is that the secret is stored in a physical key, so if the key is lost or broken, the 2FA must be reset. Also, adoption of U2F is not as widespread as TOTP: Slack, for example, offers TOTP 2FA but not U2F 2FA as of this writing.

Further Reading

CNET describes a procedure which doesn’t require storing the secrets but does require visiting each TOTP-enabled site to scan in a new QR code. It also requires the old phone (it’s not much help if you lose your phone).

Protectimus suggests saving screenshots of your secret keys, a simple solution for the non-technical user. They also describe a very interesting mechanism to extract the keys from a rooted phone using adb and SQLite, a technique which may be useful for users who already have a rich set of TOTP sites but have not stored the URLs in a password manager.

Footnotes

[LastPass] The security-minded reader might ask, “Wait, you’re storing your TOTP secrets in LastPass, but isn’t that also where you’re storing your passwords? Isn’t that a poor choice — to store both your secrets and passwords in the same place?”

To which we reply, “Yes, it is often a poor choice to store both your secrets and passwords in the same place, but never fear — we don’t store our passwords in LastPass. Yes, we are aware that the intent of LastPass is to store passwords, but that’s not what we use it for. Instead, we store our passwords in a blowfish2-encrypted flat file in a private repo. We use LastPass for storing items that are sensitive but not passwords (e.g. TOTP keys).”

Using Greenplum to access Minio distributed object storage server


Pivotal Greenplum Database® (GPDB) is an advanced, fully featured, open source data warehouse. GPDB provides powerful and rapid analytics on petabyte-scale data volumes. Greenplum 5.17.0 adds support for accessing highly scalable cloud object storage systems such as Amazon S3, Azure Data Lake, Azure Blob Storage, and Google Cloud Storage.

Minio is a high performance distributed object storage server, designed for large-scale private cloud infrastructure. Since Minio supports the S3 protocol, GPDB can also access a Minio server that is deployed on-premise or in the cloud. One of the advantages of Minio is its pluggable storage backend, which supports DAS, JBODs, and external storage backends such as NAS, Google Cloud Storage, and Azure Blob Storage.

In this post, you will learn how to set up Greenplum with Minio in 10 minutes.

Use cases

Storing cold data

Enterprises are leveraging external storage systems to store cold data, such as historical sales data and old transaction data, that can be effectively stored on systems like Minio distributed object storage. Whenever Greenplum customers want to run analytics workloads on such datasets, they can leverage PXF to dynamically load the data from Minio into their Greenplum cluster. Since Minio provides virtual storage for Kubernetes, local drives, NAS, Azure, GCP, Cloud Foundry, and DC/OS, this use case enables import and export operations to those virtual storage systems.

Sharing data with external systems

Typically, enterprises need to share data with multiple RDBMSs and systems across the organization. One data sharing pattern is to store the data in a distributed object storage system such as Minio. Greenplum users export existing data into Minio so other applications can access the shared data from Minio.

How to configure Minio in Greenplum

You can configure GPDB to access external data sources such as Minio, S3, and any S3-compatible object storage, including Dell EMC Elastic Cloud Storage (ECS).

  1. Login as gpadmin.

    $ su - gpadmin
  2. Create a PXF Server Configuration.

    $ mkdir -p $PXF_CONF/servers/minio

    *Note: A PXF server configuration in $PXF_CONF/servers is analogous to Foreign Data Wrapper Servers where each server represents a distinct remote system you want to connect to.

  3. Copy the provided minio template into the server.

    $ cp $PXF_CONF/templates/minio-site.xml $PXF_CONF/servers/minio
    $ cat $PXF_CONF/servers/minio/minio-site.xml
    <?xml version="1.0" encoding="UTF-8"?>
    <configuration>
      <property>
        <name>fs.s3a.endpoint</name>
        <value>YOUR_MINIO_URL</value>
      </property>
      <property>
        <name>fs.s3a.access.key</name>
        <value>YOUR_AWS_ACCESS_KEY_ID</value>
      </property>
      <property>
        <name>fs.s3a.secret.key</name>
        <value>YOUR_AWS_SECRET_ACCESS_KEY</value>
      </property>
      <property>
        <name>fs.s3a.fast.upload</name>
        <value>true</value>
      </property>
      <property>
        <name>fs.s3a.path.style.access</name>
        <value>true</value>
      </property>
    </configuration>
  4. Configure YOUR_MINIO_URL, YOUR_AWS_ACCESS_KEY_ID, and YOUR_AWS_SECRET_ACCESS_KEY properties in $PXF_CONF/servers/minio/minio-site.xml.

    $ sed -i "s|YOUR_MINIO_URL|http://minio1:9000|" $PXF_CONF/servers/minio/minio-site.xml
    $ sed -i "s|YOUR_AWS_ACCESS_KEY_ID|minio|" $PXF_CONF/servers/minio/minio-site.xml
    $ sed -i "s|YOUR_AWS_SECRET_ACCESS_KEY|minio123|" $PXF_CONF/servers/minio/minio-site.xml
    *Note: the BSD sed that ships with macOS requires an empty backup suffix for in-place edits; on macOS, use `sed -i '' ...` instead of `sed -i ...`.
  5. Use psql to create an external table that uses the minio server to access the stocks.csv text file in our minio testbucket.

    CREATE EXTERNAL TABLE stock_fact_external (
    stock text,
    stock_date text,
    price text)
    LOCATION('pxf://testbucket/stocks.csv?PROFILE=s3:text&SERVER=minio')
    FORMAT 'TEXT';
  6. Use SQL query to retrieve data from Minio. This query returns the resultset from Minio servers that are preloaded with sample files under testbucket.

    gpadmin=# select count(*) from stock_fact_external;
    count
    -------
       561
    (1 row)
    
    gpadmin=# select * from stock_fact_external limit 10;
     stock  | stock_date | price
    --------+------------+-------
     symbol | date       | price
     MSFT   | Jan 1 2000 | 39.81
     MSFT   | Feb 1 2000 | 36.35
     MSFT   | Mar 1 2000 | 43.22
     MSFT   | Apr 1 2000 | 28.37
     MSFT   | May 1 2000 | 25.45
     MSFT   | Jun 1 2000 | 32.54
     MSFT   | Jul 1 2000 | 28.4
     MSFT   | Aug 1 2000 | 28.4
     MSFT   | Sep 1 2000 | 24.53
    (10 rows)
    

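The macOS caveat on `sed -i` in step 4 can also be sidestepped entirely by redirecting to a temporary file, which behaves identically with GNU and BSD sed. A minimal, self-contained illustration (the sample file name below is ours, not part of the PXF configuration):

```shell
# Portable alternative to `sed -i`: write to a temp file, then move it back.
printf '<value>YOUR_MINIO_URL</value>\n' > minio-site-sample.xml
sed 's|YOUR_MINIO_URL|http://minio1:9000|' minio-site-sample.xml > minio-site-sample.xml.tmp &&
  mv minio-site-sample.xml.tmp minio-site-sample.xml
cat minio-site-sample.xml   # <value>http://minio1:9000</value>
```

The `&&` ensures the original file is only replaced if the sed invocation succeeds.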
Conclusion

This post describes how to configure Greenplum to access Minio. For more details, please see the example in this GitHub repository. For more information about PXF, please read this page.

In summary, you can use Minio, a distributed object storage server, to dynamically scale your Greenplum clusters.


Eureka, Zuul, and Cloud Configuration - Pivotal Cloud Foundry


Overview

In a previous post I explained how you could create several components to build a Netflix stack for local development. Now, I want to explain how Pivotal Cloud Foundry makes this much easier. If you do not have a PCF instance to use, you can create a free PWS account or use the latest version of PCF Dev (make sure to use the -s scs flag) to run PCF on your laptop (which still needs a PWS account sans credit card). And if you do have a PCF instance but do not see the Spring Configuration Server or Service Registry, you should ask your PCF Operator to install Spring Cloud Services for PCF.

The code for this tutorial is located here; note that it is on the pcf-deployment branch. The final outcome will be a very simplified version of a Netflix stack configured for Pivotal Cloud Foundry. Two PCF services will be created: the Service Registry, which will discover clients configured to be discovered, and a Spring Cloud Configuration server, which will look for property files to serve to their respective, configured applications. PCF will also host a Zuul router and filter (zuul), and an API (netflix-protected) that uses a property file from the Cloud Configuration server. Finally, both zuul and netflix-protected will be discoverable by the Service Registry server.

Spring Cloud Configuration

We should update the local Spring Cloud Configuration server to use the same configuration files the PCF Spring Cloud Configuration server will use. Create a ~/zuulreka-config/configurations folder to keep the properties files for the API. We can move ~/zuulreka-config/components/cloud-config/src/main/resources/netflix-protected.yml to the new configurations folder.

external:
  property: hello world!

management:
  endpoints:
    web:
      exposure:
        include: refresh

~/zuulreka-config/configurations/netflix-protected.yml

I consolidated ~/zuulreka-config/components/cloud-config/src/main/resources/application.yml and ~/zuulreka-config/components/cloud-config/src/main/resources/bootstrap.yml and deleted ~/zuulreka-config/components/cloud-config/src/main/resources/application.yml, because they were both configuring the config server and I like fewer files - you can leave them as is, if you prefer. Also note that the default native profile was removed and the cloud.config.server.git.uri property was added. Because the location is not in the ~/zuulreka-config/components/cloud-config/src/main/resources anymore, the native profile is unnecessary and the location needs to be defined.

server:
  port: 9999

spring:
  application:
    name: cloud-config
  cloud:
    config:
      server:
        git:
          uri: ${user.home}/workspace/zuulreka-config/configurations
  output:
    ansi:
      enabled: always

eureka:
  client:
    serviceUrl:
      defaultZone: ${EUREKA_HOST:http://localhost:8282}/eureka

~/zuulreka-config/components/cloud-config/src/main/resources/bootstrap.yml

Because Spring Cloud Configuration needs the config.server.git.uri to be a Git repository, we can fake that by initializing the configurations directory as a Git repository and committing the latest properties we want the netflix-protected API to use. This will create a submodule that we do not want to keep, so before you commit, make sure to run rm -rf ~/zuulreka-config/configurations/.git.

Starting everything locally will have the same exact effect as before with the only difference being where the local Cloud Configuration server loads properties from.

Pivotal Cloud Foundry

To start interfacing with Pivotal Cloud Foundry, download the cf-cli. You can read through the linked cf-cli documentation or you can run cf help from your terminal to see what commands are available. The cli will be how we will manage the services we create and applications we deploy.

Login to your PCF instance by using cf login or cf login --sso for a code to use if you are using a single sign-on solution.

To see what services and plans are available, run the command cf marketplace.

The two services we will need specifically are p-service-registry and p-config-server, both with a plan of standard. The service registry will run in PCF, waiting for clients to connect to it, so it only needs to be created with a plan and given a name. Running cf create-service help will show how to create a service. To create the service registry, run cf create-service p-service-registry standard service-registry. As the output suggests, run cf service service-registry to check the status of the service's creation.

To make the configurations for the Configuration Server easy to remember, create config-server-configuration.json at the root of the project with this content:

{
  "git": {
    "uri": "https://github.com/bjstks/zuulreka-config",
    "label": "pcf-deployment",
    "searchPaths": "configurations"
  }
}

~/zuulreka-config/config-server-configuration.json

The uri will be the location of the GitHub repository where the configuration files are located, the label is the branch name in this case, and the searchPaths are the relative path from the root of that repository. You can find more about the Git configuration here. With those configurations, create the configuration server that will house your applications' property files with cf create-service p-config-server standard config-server -c config-server-configuration.json. Run cf service config-server to check the status of the service's creation.
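A malformed file passed via -c will make the create-service call fail, so it can be worth validating the JSON first. A quick sanity check, re-creating the file here so the snippet stands alone, and assuming python3 is available:

```shell
# Recreate the configuration file from above and confirm it parses as JSON.
cat > config-server-configuration.json <<'EOF'
{
  "git": {
    "uri": "https://github.com/bjstks/zuulreka-config",
    "label": "pcf-deployment",
    "searchPaths": "configurations"
  }
}
EOF
python3 -m json.tool config-server-configuration.json > /dev/null && echo "valid JSON"
```

Any JSON linter works equally well here; json.tool is just a convenient one that ships with Python.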

These services can also be created through the user interface from the Marketplace section or from a specific org and space from the Services tab. On the services page, you can select Add a Service to find and create the service registry and configuration server. Once you find and select the services, the wizard will allow you to input the service names and configurations.

Spring Boot Web application

If you want to see how to create this component from scratch, check out my previous post. For this post, however, I am only going to show how to update the netflix-protected component. The only changes that need to be made are to update the dependencies so that the application will connect to the PCF instances of the Service Registry and Cloud Configuration servers, update how it connects to those resources, disable security (as opposed to configuring it, because some of the added dependencies will add Spring Security to the classpath), and create a manifest.yml file that will describe to PCF how this application should run.

To update the netflix-protected component, add to the ~/zuulreka-config/components/netflix-protected/build.gradle file by changing the Spring dependency management plugin to also import the spring-cloud-services-dependencies from io.pivotal.spring.cloud:spring-cloud-services-dependencies:2.0.1.RELEASE. Also, include the io.pivotal.spring.cloud:spring-cloud-services-starter-service-registry dependency within the dependencies section.

buildscript {
    ext {
        springBootVersion = '2.0.2.RELEASE'
        springCloudVersion = 'Finchley.SR1'
    }
    repositories {
        mavenCentral()
    }
    dependencies {
        classpath("org.springframework.boot:spring-boot-gradle-plugin:${springBootVersion}")
    }
}

apply plugin: 'java'
apply plugin: 'org.springframework.boot'
apply plugin: 'io.spring.dependency-management'

group = 'io.template'
version = '0.0.1-SNAPSHOT'
sourceCompatibility = 1.8

repositories {
    mavenCentral()
}

dependencyManagement {
    imports {
        mavenBom "org.springframework.cloud:spring-cloud-dependencies:${springCloudVersion}"
        mavenBom "io.pivotal.spring.cloud:spring-cloud-services-dependencies:2.0.1.RELEASE"
    }
}

dependencies {
    runtime('org.springframework.boot:spring-boot-devtools')

    compile(
            'org.springframework.boot:spring-boot-starter-web',
            'org.springframework.boot:spring-boot-starter-actuator',
            'org.springframework.cloud:spring-cloud-starter-config',
            'org.springframework.cloud:spring-cloud-starter-netflix-eureka-client',
            'io.pivotal.spring.cloud:spring-cloud-services-starter-service-registry')

    testCompile('org.springframework.boot:spring-boot-starter-test')
}

~/zuulreka-config/components/netflix-protected/build.gradle

Because this new dependency will add spring-security to the classpath, we either need to disable security or configure a username and password. Security is not the focus of this tutorial, so we will disable security altogether by overriding the WebSecurityConfigurerAdapter. While we are there, we should allow CSRF requests, because we will need to POST, through the Zuul router, to the /refresh endpoint to refresh the Config Server properties.

package io.template.zuulrekaconfig;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.netflix.eureka.EnableEurekaClient;
import org.springframework.security.config.annotation.web.builders.HttpSecurity;
import org.springframework.security.config.annotation.web.configuration.EnableWebSecurity;
import org.springframework.security.config.annotation.web.configuration.WebSecurityConfigurerAdapter;

@EnableWebSecurity
@EnableEurekaClient
@SpringBootApplication
public class NetflixProtectedApplication extends WebSecurityConfigurerAdapter {

    @Override
    protected void configure(HttpSecurity http) throws Exception {
        http.csrf().disable()
            .authorizeRequests().anyRequest().permitAll();
    }

    public static void main(String[] args) {
        SpringApplication.run(NetflixProtectedApplication.class, args);
    }
}

~/zuulreka-config/components/netflix-protected/src/main/java/io/template/zuulrekaconfig/NetflixProtectedApplication.java

When running locally, we will want to connect to our localhost servers, but when we deploy our API to PCF we will want our application to connect to the PCF Services we created earlier. Because all PCF services have a specific structure when the application binds, if the PCF Config Server exists, then the property vcap.services.config-server.credentials.uri should exist and our application will connect to it, similarly for the Service Registry and the vcap.services.service-registry.credentials.uri property. However, if the application is started locally, the application will try to connect to the localhost environment.

server:
  port: 8181
spring:
  application:
    name: netflix-protected
  cloud:
    config:
      uri: ${vcap.services.config-server.credentials.uri:http://localhost:9999}
  output:
    ansi:
      enabled: always

eureka:
  client:
    serviceUrl:
      defaultZone: ${vcap.services.service-registry.credentials.uri:http://localhost:8282}/eureka

~/zuulreka-config/components/netflix-protected/src/main/resources/bootstrap.yml

Next we will create an Application Manifest that describes our application to PCF, to cut down on the things we would have to configure manually each time we deployed. For our manifest, we will need to set our application's name, the buildpack, the path to the jar file to be deployed, and the names of the services to bind to once it is deployed.

applications:
- name: netflix-protected
  buildpacks:
  - java_buildpack_offline
  path: build/libs/netflix-protected-0.0.1-SNAPSHOT.jar
  services:
  - config-server
  - service-registry

~/zuulreka-config/components/netflix-protected/manifest.yml

Now we can build and push the netflix-protected component from the ~/zuulreka-config/components/netflix-protected directory with ./gradlew clean assemble && cf push -f manifest.yml.

Spring Zuul Router & Filtering

If you want to see how to create a Zuul Router & Filter, check out my previous post. First, update ~/zuulreka-config/components/zuul/build.gradle file by changing the Spring dependency management plugin to also import the spring-cloud-services-dependencies from io.pivotal.spring.cloud:spring-cloud-services-dependencies:2.0.1.RELEASE. Also, include the io.pivotal.spring.cloud:spring-cloud-services-starter-service-registry dependency within the dependencies section.

buildscript {
    ext {
        springBootVersion = '2.0.2.RELEASE'
        springCloudVersion = 'Finchley.SR1'
    }
    repositories {
        mavenCentral()
    }
    dependencies {
        classpath("org.springframework.boot:spring-boot-gradle-plugin:${springBootVersion}")
    }
}

apply plugin: 'java'
apply plugin: 'org.springframework.boot'
apply plugin: 'io.spring.dependency-management'

group = 'io.template'
version = '0.0.1-SNAPSHOT'
sourceCompatibility = 1.8

repositories {
    mavenCentral()
}

dependencyManagement {
    imports {
        mavenBom "org.springframework.cloud:spring-cloud-dependencies:${springCloudVersion}"
        mavenBom "io.pivotal.spring.cloud:spring-cloud-services-dependencies:2.0.1.RELEASE"
    }
}

dependencies {
    compile(
            'org.springframework.cloud:spring-cloud-starter-config',
            'org.springframework.cloud:spring-cloud-starter-netflix-zuul',
            'org.springframework.cloud:spring-cloud-starter-netflix-eureka-client',
            'io.pivotal.spring.cloud:spring-cloud-services-starter-service-registry')

    testCompile('org.springframework.boot:spring-boot-starter-test')
}

Then, to update the application to connect to the service registry when deployed to PCF, or to your local instance when run locally, change the defaultZone property in ~/zuulreka-config/components/zuul/src/main/resources/bootstrap.yml.

eureka:
  client:
    serviceUrl:
      defaultZone: ${vcap.services.service-registry.credentials.uri:http://localhost:8282}/eureka

~/zuulreka-config/components/zuul/src/main/resources/bootstrap.yml

More information on VCAP_SERVICES can be found here; the most important part to note is that the name service-registry should match the service's name, and the trailing /eureka should be outside of the curly braces. The syntax ${SOME_ENVIRONMENT_VARIABLE:http://localhost:8282} will try to use the environment variable SOME_ENVIRONMENT_VARIABLE and, if it does not find it, will use http://localhost:8282 explicitly.
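Spring's ${NAME:default} placeholder behaves much like the shell's ${NAME:-default} parameter expansion (note the extra dash in the shell form). The analogy below may help build intuition; the eureka.example.com host is made up for illustration:

```shell
# Shell analogue of Spring's property placeholder fallback.
unset EUREKA_HOST
echo "${EUREKA_HOST:-http://localhost:8282}/eureka"    # prints http://localhost:8282/eureka

EUREKA_HOST=http://eureka.example.com
echo "${EUREKA_HOST:-http://localhost:8282}/eureka"    # prints http://eureka.example.com/eureka
```

In both worlds, the text after the separator is used only when the named variable is unset.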

We will configure the zuul PCF deployment with ~/zuulreka-config/components/zuul/manifest.yml, which will configure the name of the application, the buildpack for PCF to use, the location of the assembled jar to be deployed, the number of instances it needs to run, and the services it should bind to. We do not need to do a lot with our manifest, but to learn about your options, check out these docs.

applications:
- name: zuul
  buildpacks:
  - java_buildpack_offline
  path: build/libs/zuul-0.0.1-SNAPSHOT.jar
  services:
  - service-registry

~/zuulreka-config/components/zuul/manifest.yml

Now we can build and push the zuul component from the ~/zuulreka-config/components/zuul directory with ./gradlew clean assemble && cf push -f manifest.yml.

Finale

The service registry needs about a minute and a half to register both applications. When it does, we can send a GET request to the Zuul router to get a response from the API we deployed. Using httpie, type http get https://zuul.apps.pcfone.io/netflix-protected/hello to see the response. Note that your domain could differ from apps.pcfone.io.

Now, to check that the @RefreshScope annotation still works in PCF - change the external.property for the netflix-protected application to say hello universe!. Also add, commit, and push those changes to your repository. Next, type http post https://netflix-protected.apps.pcfone.io/actuator/refresh to have the API refetch the updated properties. Now typing http get https://zuul.apps.pcfone.io/netflix-protected/hello should yield the updated properties.

I hope these tutorials have been useful! Please reach out and leave me some feedback or just ask for some clarification at bstokes@pivotal.io.

Reference

Testing Spring filters without pain


The Spring framework has grown and changed at a massive pace over the last few years. It has evolved from XML configured beans to annotation based beans, from a synchronous to a non-blocking and reactive programming paradigm, from application server deployments to stand-alone microservice deployments.

The list goes on.

This evolution is tangible for most of its core classes and interfaces. But there are a few of these that have barely changed since they were initially implemented, and yet they are central cornerstones when developing Spring Boot microservices. One of these classes is the org.springframework.web.filter.GenericFilterBean which was initially developed back in 2003.

The org.springframework.web.filter.GenericFilterBean is the base class you should extend when creating your own filters. It simplifies the filter initialisation by being bean-friendly and reduces the number of methods to implement from the servlet Filter interface to just one, the classic doFilter(ServletRequest request, ServletResponse response, FilterChain chain). This method gives you the opportunity to observe the request or response and take action accordingly, for example, vetoing the rest of the filter chain.

Now that we know a little bit more about filters, let’s see how we can implement one and, more importantly, test it with high confidence and fast feedback loops. This article aims to show you how to test http filters effectively, by avoiding unnecessary mocking and instead appropriately using the test toolkit that Spring offers.

Imagine that you have to create an http filter that injects the email of the user who initiated the request into each controller of your app. The filter will get the user id from the query parameters, call a helper service to get the email for that user, and finally pass this information down the filter chain.

Since HTTP requests are immutable, the only practical way to pass additional information down the filter chain is by setting attributes on the request. Setting a request attribute from the filter is straightforward, but the contract for how the attribute is consumed in the controller is a bit loose, so you will want to capture it well in your filter test.

Before writing the test, let’s start by defining a simple skeleton for our base filter class:

@Component
class UserFilter(private val userService: UserService) : GenericFilterBean() {
   override fun doFilter(request: ServletRequest, response: ServletResponse, chain: FilterChain) {
       // implementation goes here
   }
}

Notice that the filter extends the GenericFilterBean mentioned above and takes as a constructor parameter a handy UserService, which will allow us to obtain the email for a given user id. We define this service ourselves; for this example we are only going to use an interface and forget about its actual implementation.

The signature for the UserService interface is the following:

interface UserService {
   fun getUserEmail(userId: String): String
}

If we think about the implementation of the filter, these are the steps we need to follow in the code:

  1. extract the userId from the query parameters
  2. find the email associated to the user id via the userService
  3. set the email as an attribute on the request
  4. continue down the filter chain

This implementation captures the steps outlined above:

override fun doFilter(request: ServletRequest, response: ServletResponse, chain: FilterChain) {
   val userId = request.getParameter("userId") // 1
   val userEmail = userService.getUserEmail(userId) // 2
   request.setAttribute("userEmail", userEmail) // 3
   chain.doFilter(request, response) // 4
}

Now let’s move on and see what approaches we have to test this filter.

Test approach #1: Death by 1,000 mocks

The first way to test that a filter is doing its job is by observing the interactions it plays with the objects that are given to it (in this case the request, response and chain) and then verify that it interacts as expected by calling the right methods of the objects it has references to.

We will see that this way of testing is very hairy and highly ineffective!

Let’s create the test:

@Test
fun `user filter adds the user email in a request attribute`() {
}

We then have to mock the request, response and chain:

val request = mock(ServletRequest::class.java)
val response = mock(ServletResponse::class.java)
val chain = mock(FilterChain::class.java)

We are going to stub the ServletRequest.getParameter method to return “13” each time we ask for the userId value in the query parameter string:

Mockito.`when`(request.getParameter("userId")).thenReturn("13")

Next, we are going to stub our userService and return the email of an arbitrary user each time a user with an id 13 comes along:

val userService: UserService = Mockito.mock(UserService::class.java)
Mockito.`when`(userService.getUserEmail("13")).thenReturn("han.solo@rebelalliance.com")

Now we create the filter and we pass the userService that will return Han Solo’s email whenever a user id of 13 is present:

val userFilter = UserFilter(userService)

We can now invoke the doFilter method providing the arguments mocked above:

userFilter.doFilter(request, response, chain)

The last step is to verify the interactions with the mocked objects:

verify(request).setAttribute("userEmail", "han.solo@rebelalliance.com")
verify(chain).doFilter(request, response)
verifyNoMoreInteractions(chain)
verifyZeroInteractions(response)

The test in its grandiose shape looks like this:

@Test
fun `user filter adds the user email in a request attribute`() {
   val request = mock(ServletRequest::class.java)
   val response = mock(ServletResponse::class.java)
   val chain = mock(FilterChain::class.java)
   Mockito.`when`(request.getParameter("userId")).thenReturn("13")

   val userService: UserService = Mockito.mock(UserService::class.java)
   Mockito.`when`(userService.getUserEmail("13")).thenReturn("han.solo@rebelalliance.com")

   val userFilter = UserFilter(userService)
   userFilter.doFilter(request, response, chain)

   verify(request).setAttribute("userEmail", "han.solo@rebelalliance.com")
   verify(chain).doFilter(request, response)
   verifyNoMoreInteractions(chain)
   verifyZeroInteractions(response)
}

What just happened here? Well my friend, we have just stepped into mock hell, and we need to get out of it quickly: in addition to the monumental effort it takes to mock all these objects, we have not learned a single thing about HTTP filters. This test assumes many things about the mocked objects, and it can’t possibly prove that the filter works as expected.

Besides preventing you from actually learning how filters work, this style of test means that every time you refactor your filter, chances are you will have to refactor the tests too. In addition, the test describes the implementation, but it does not capture the expected behaviour of the filter. It is just an X-ray of the implementation, a step-by-step description of what the filter needs to do.

So how can we escape mock hell?

There are better ways to test an HTTP filter. Spring provides awesome tools to test the web layer. The next approach I am going to describe leverages Spring’s MockMvc test framework. This approach will give you a higher level of confidence that your filter is doing the right thing, because it puts the filter to work with the components that are affected down the chain, like the controller.

Test approach #2: Using a real controller in the test to interact with the filter

Let’s go back to our test class and create a private controller that will return, in the body of the response, the request attribute value that the filter is supposed to inject into the request object:

@RestController
private class TestController {
   @GetMapping("/test")
   fun test(@RequestAttribute userEmail: String): String = userEmail
}

The first step to escape from the mock hell created before is to remove all the contents from the previous test. The only thing we are going to keep is stubbing the UserService. Here it makes sense to stub the retrieval of the user email and delegate this to a mock (although you could also initialise the DB locally with the data needed for the test and let the filter hit it in your test).

@Test
fun `user filter adds the user email in a request attribute`() {
    val userService: UserService = Mockito.mock(UserService::class.java)
    Mockito.`when`(userService.getUserEmail("13")).thenReturn("han.solo@rebelalliance.com")
   // more to come
}

The next step is to use the MockMvcBuilders class from Spring and add the controller just created, along with the filter:

val mockMvc = MockMvcBuilders
       .standaloneSetup(TestController())
       .addFilter<StandaloneMockMvcBuilder>(UserFilter(userService))
       .build()

Finally, we call this mockMvc instance with a URL that maps to the test controller and with a query parameter including the userId with a value of “13”. Then we will set the expectation that the response content should be the email of Han Solo:

mockMvc
       .perform(MockMvcRequestBuilders.get("/test?userId=13"))
       .andExpect(status().isOk)
       .andExpect(content().string("han.solo@rebelalliance.com"))

Now if we take a step back, the new filter test looks like this:

@Test
fun `user filter adds the user email in a request attribute`() {
   val userService: UserService = Mockito.mock(UserService::class.java)
   Mockito.`when`(userService.getUserEmail("13")).thenReturn("han.solo@rebelalliance.com")
   val mockMvc = MockMvcBuilders
           .standaloneSetup(TestController())
           .addFilter<StandaloneMockMvcBuilder>(UserFilter(userService))
           .build()
   mockMvc
           .perform(MockMvcRequestBuilders.get("/test?userId=13"))
           .andExpect(status().isOk)
           .andExpect(content().string("han.solo@rebelalliance.com"))
}

Notice the following:

In this test we are actually putting the filter to work with the test controller. We have learned that if you add a parameter annotated with @RequestAttribute to the controller function, the controller will extract the attribute injected by the filter and you will be able to consume it.

By using a test controller and adding the filter to it, we are simulating how our filter will behave once it has been registered with the application controllers. The feedback loop of testing this filter is very quick and the confidence level is high, because we know that the contracts are met when the test passes - for instance, we know that the request attribute is properly injected.

You could argue that this is not a unit-test because we are testing it with different layers of the application, like the controller defined in the test. I prefer not to obsess too much about which layer I am testing the code on, but rather ask myself whether I am accurately capturing the essence of the class I am testing and making sure I don’t end up in mock hell as option #1 leads to.

How we moved a massively parallel Postgres database onto Kubernetes

If you’ve ever wondered what type of applications are the best candidates to run on Kubernetes, distributed applications that scale out, such as Greenplum, are certainly at the top of the list, as we’ve discovered.

Our project started in late 2017 as an investigation of how Greenplum can best benefit from containers and a container management platform such as Kubernetes. At that point, Greenplum already had containers (see PL/Container) triggered by SQL queries with UDFs (user defined functions). These containers were, and still are, useful in isolating the functions and managing resources in a more granular and precise manner. Our next step was to move all of Greenplum into containers. Would this be an unnatural act, an okay solution that could be considered “lifting and shifting” of a legacy application, or a joining of two technologies that fit like hand in glove?

In our exploration, we first packaged all of Greenplum to run inside a single container. It was certainly not a workable solution, but it was a step in the right direction that let us test out and learn how isolation, resource management, and storage work in this environment. One important point to note here is that containers are not a new technology that introduces another layer between the executing code and the hardware. Rather, containers simply use basic Linux kernel constructs such as cgroups, namespaces, and chroot that have been added to the kernel over the last four decades.

Kubernetes Intro

At runtime, your application code that is running in a container is as close to the hardware as any piece of code that is not inside a container. This should not be confused with the abstraction created by the layers of images used to package containers to make them portable, because that’s not a runtime concern. Therefore, you still get bare-metal performance from code that executes inside a container. And, if you want to layer in VMs for the benefits that VMs provide, that also is possible and your code runs as efficiently on VMs inside a container as it does on VMs directly.

After we tinkered with running Greenplum in a single container, we realized that this was not the ideal mapping for Greenplum in containers. Since Greenplum is an MPP (Massively Parallel Processing/Postgres) system, we needed to, at the very least, spread the cluster across many containers and somehow get these containers to know about each other and work together. That’s when Kubernetes came into focus.

Kubernetes is the open source container management platform that allowed us to break Greenplum out of a single container and run it as a truly distributed scale-out database. Kubernetes orchestrates groups of containers called pods. A pod can have one or more containers that can share storage. In our case, the storage was set up as PVs (persistent volumes) and these PVs can be either local or remote depending on the storage class chosen by the user.

Again, the containers that make up a pod are just runtime constructs created by using the Linux kernel. Kubernetes orchestrates the pods via its scheduler by choosing a worker node for each pod to run on. A Kubernetes cluster also has master nodes where the scheduler and other administrative functions live. We assisted Kubernetes in making the scheduling decision by specifying (anti)affinity rules and node selectors, which are a simplified form of such rules. If in the future we need more complex logic, Kubernetes also allows us to define our own custom scheduler.
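As a sketch of what such rules look like (the labels and values below are hypothetical, not our actual configuration), a pod spec can combine a node selector with an anti-affinity rule to keep two segment pods off the same node:

```yaml
# Hypothetical pod spec fragment: keep segment pods on separate worker nodes
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: greenplum-segment      # hypothetical pod label
        topologyKey: kubernetes.io/hostname
nodeSelector:
  workload-type: greenplum              # hypothetical node label
```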

For our containers, we also defined CPU and memory bounds, called requests and limits in Kubernetes. The available resources on the worker nodes are then enforced under the hood by the Linux kernel’s cgroups feature.
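For example, a container spec declares these bounds like so (the numbers are purely illustrative):

```yaml
# Hypothetical container fragment: requests are what the scheduler reserves,
# limits are what cgroups enforce at runtime
resources:
  requests:
    cpu: "4"
    memory: 16Gi
  limits:
    cpu: "4"
    memory: 16Gi
```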

Kubernetes provided a great platform for Greenplum by allowing us to create clear boundaries between compute and storage, by letting us distribute the compute across a cluster with precise controls, and by giving us the knobs for management of resources.

Greenplum, in turn, proved to be a great tenant for Kubernetes because it distributes user data across its cluster and processes SQL and PL queries on those data in parallel. Let’s take a quick look at Greenplum’s architecture. It has a master, a standby master, and a number of primary segments with their mirror segments. Each of these is a Postgres database instance along with a number of libraries and tools for AI/Machine Learning, Graph, Text analytics, etc.

Greenplum

When a query is received, the master creates a plan that distributes the execution across the primary segments, allowing Greenplum to return results in a fraction of the time. The more primary segments and the more parallelized your data, the faster your analysis.

With this distributed, scale-out architecture of Greenplum, when compute and/or storage resources need to be increased, we can do so by simply adding more segments. Similarly, on Kubernetes, an application can increase capacity by increasing the number of pods. This was an obvious one-to-one mapping between the atomic units of Greenplum and Kubernetes: a Greenplum segment mapped to a Kubernetes pod.

Map pod to segment

It was primarily this relationship in the distributed nature of Greenplum and Kubernetes which made for a strong architectural fit. So, seeing this, we were able to move Greenplum out of a single container and onto a highly available, Linux-kernel-based container orchestrator and run the same exact Greenplum database to provide the same exact user experience. The overall solution, as it turned out, had a number of additional benefits.

One such benefit was the Operator Pattern, which was the way Kubernetes allowed for adding custom logic to control an application. This meant that we could build a Greenplum operator, a first-class citizen of Kubernetes, and use this operator to automate day-1 and day-2 operations such as deployment, failover, expansion, and upgrades and patching of not just Greenplum but a whole set of related tools and components altogether.
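With an operator in place, a user describes the desired cluster declaratively and the operator reconciles it. A sketch of such a custom resource might look like the following; the apiVersion, kind, and field names here are hypothetical, not the shipped API:

```yaml
# Hypothetical GreenplumCluster custom resource
apiVersion: greenplum.example.com/v1
kind: GreenplumCluster
metadata:
  name: my-greenplum
spec:
  masterAndStandby:
    cpu: "2"
    memory: 8Gi
  segments:
    primarySegmentCount: 4   # mirrors are created alongside primaries
```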

We should note, though, that automated upgrades and patching don’t mean zero-downtime rolling upgrades; rather, during a scheduled maintenance window, the human operator can perform an upgrade by simply triggering the Greenplum operator to take all the necessary steps, preserving the state and data of the database in the process.

Another big benefit was that we could handle application and library dependency management in our CI pipelines so that the customers didn’t have to. Greenplum and its rich ecosystem could now be tested for security, configuration, integration, networking as well as dependencies and then released in one package ready to create a new Greenplum workbench or upgrade from an existing one. Of course, the user could still customize which components to enable and how.

There was one caveat though. The performance was only as good as the underlying hardware. Just as with bare-metal deployments, our hardware needed to provide adequate CPU, memory, disk IO, network IO for the pods. Kubernetes couldn’t magically make Greenplum fly. And building the right infrastructure for Kubernetes, especially if we were going to run a database on it, was not straightforward. This area remains a challenge for us and for the larger Kubernetes community. We have seen deployments of stateless applications in production environments, but databases are relatively new to this scene.

Nevertheless, our investigation brought us to a place where we see tremendous opportunity and benefits in running an MPP database such as Greenplum on Kubernetes. We evaluated whether this would be an unnatural act, an okay solution that could be considered “lifting and shifting” of a legacy application, or a joining of two technologies that fit like hand in glove. And we strongly believe that massively parallel Postgres databases and Kubernetes fit like hand in glove.

For more on this topic, see Pivotal Engineering Journal articles by our team:

Posts

Stateful Apps in Kubernetes

Kubernetes is available across all public clouds nowadays, including Pivotal’s own PKS, which runs in the cloud and can also be run “on prem”, on the premises of an enterprise. Kubernetes promises to run any workload, stateless or stateful. The typical distinction between stateless and stateful Kubernetes apps is based on the manner in which data within the app is saved. For this discussion, however, let’s adopt a definition of stateless/stateful that classifies apps by the resilience of their service.

Provisioning Stateful Kubernetes Containers that Work Hard and Stay Alive

(This blog is the second installment of a four-part series) By default, all containers are free from limits but subject to eviction By default, Kubernetes places very few limits on a container. A default, “best effort” container can take as many resources as it needs until the Kubernetes system decides that the container should be evicted, typically because system memory has become scarce. Each container runs within a virtual machine which is called a “node”.

Storing Stateful Data that Outlives a Container or a Cluster; Optimizing for Local Volumes

(This blog is the third installment of a four-part series) Kubernetes can automatically provision “remote persistent” volumes with random names Several types of storage volumes have built-in Kubernetes storage classes that enable provisioning volumes in a dynamic fashion, creating remote persistent volumes as necessary when a container is spun up for the first time. This provisioning of storage is useful for a scenario where the cluster lifetime is definitive, such as within a development cluster.

Managing Stateful Apps with the Operator Pattern; Orchestration Considerations

(This blog is the fourth installment of a four-part series) The Operator Pattern The Operator Pattern stipulates a process that is registered with the Kubernetes system layer, listening to Kubernetes system events with the responsibility to manage a particular set of resources. All the logic that initializes and maintains a service is thereby encapsulated into a Kubernetes deployment, which we’ll call the “Operator” hereafter. This pattern aligns with many of the requirements for stateful apps, so the Operator Pattern is a popular implementation choice.

Transferring Time-based One-time Passwords to a New Smartphone

Abstract Smartphone authenticator apps such as Google Authenticator and Authy implement software tokens that are “two-step verification services using the Time-based One-time Password Algorithm (TOTP) and HMAC-based One-time Password algorithm (HOTP)" Smartphone TOTP, a form of Two-factor authentication (2FA), displays a 6-digit code derived from a shared secret, updating every thirty seconds. The shared secret is presented only once to the user, typically with a QR (Quick Response) Code which is scanned by the authenticator app.

Using Greenplum to access Minio distributed object storage server

Pivotal Greenplum Database® (GPDB) is an advanced, fully featured, open source data warehouse. GPDB provides powerful and rapid analytics on petabyte-scale data volumes. Greenplum 5.17.0 brings support for accessing highly scalable cloud object storage systems such as Amazon S3, Azure Data Lake, Azure Blob Storage, and Google Cloud Storage. Minio is a high-performance distributed object storage server designed for large-scale private cloud infrastructure. Since Minio supports the S3 protocol, GPDB can also access a Minio server deployed on-premises or in the cloud.

Eureka, Zuul, and Cloud Configuration - Local Development

Overview A couple of recent projects I have been on have started our engagement with the Netflix stack described here, and because I wanted to have a way to quickly prototype, I set up this demo. This will be a Spring Boot API that uses Spring Cloud Configuration, Eureka Service Discovery, and a Zuul router. Hopefully, by the end of the demo, you will see how easy it is to create this popular use case.