Zalando Tech Blog – Technical Articles Lora Vardarova

How to choose a JavaScript framework?


Developers are often biased about their technology choices. At the beginning of the year, I was about to start working on a new product and my team could choose any tech stack. I did not want to be one of those biased developers who choose the framework they like. I wanted to make an informed and educated decision. I already had experience with React and AngularJS. I had a good knowledge of Angular and experience with TypeScript. But what about Vue, the framework that most JavaScript developers wanted to learn according to the State of JavaScript 2017 survey?

A friend of mine likes saying that JavaScript frameworks are like weeds: every day a new framework gets released. It does feel like this, doesn’t it? I was quite skeptical about Vue when it was released, and to be honest, I was quite skeptical about Vue for a long time after it was released. Did we really need another JavaScript framework? I did not really think so. But I had some free time on my hands and decided to use it to learn Vue so that I could make an informed decision about which framework to choose.

History Lesson
AngularJS was started as a side project at Google around 2009. Later it was open-sourced and v1.0 was officially released in 2011.

React was developed at Facebook. It was open-sourced at JSConf US in May 2013.

Vue was created by Evan You after working for Google using AngularJS in a number of projects. He wanted to extract what he really liked about AngularJS and build something lightweight. Vue was originally released in February 2014.

Angular 2.0 was announced at the ng-Europe conference in September 2014. The drastic changes in the 2.0 version created considerable controversy among developers. The final version was released in September 2016.

Side note: AngularJS and Angular 2.0, which was later simply called Angular, are two different frameworks. The naming really caused a lot of confusion. I believe that the Angular team would have been better off choosing a different name.

In December 2016 Angular 4 was announced, skipping version 3 to avoid confusion due to the misalignment of the router package's version, which was already distributed as v3.3.0. The final version was released in March 2017.

Nowadays, developers are looking for smaller, faster, and simpler technologies. All three frameworks (Angular, React and Vue) are doing lots of work in this direction. You can expect pretty good performance from any of them; benchmarks show similar results across the three.

In April 2017, Facebook announced React Fiber, a new core algorithm of React. It was released in September 2017.

Angular 5 was released in November 2017. Key improvements in Angular 5 included support for progressive web apps, and a build optimizer.

Google pledged to do twice-a-year upgrades. Angular 6 was released in April 2018. Angular 7 will be released September/October 2018.

At the beginning of 2018, a schedule was announced for phasing out AngularJS: after the release of 1.7.0, active development on AngularJS will continue until the end of June 2018. Afterwards, 1.7 will be supported until June 2021 as a long-term support release.

The bottom line is that these three frameworks, React, Vue and Angular, are quite mature. And it is likely they’ll be around for a while.

Key Concepts
React uses the Virtual DOM pattern. React creates an in-memory data structure cache, computes the resulting differences, and then updates the browser's displayed DOM efficiently.

React is all about components. Your React codebase is basically just one large pile of big components that call smaller components. Props are how components talk to each other; they are the data, which is passed to the child component from the parent component. It’s important to note that React’s data flow is unidirectional: data can only go from parent components to their children, not the other way around.

The component approach means that both HTML and JavaScript code live in the same file. React’s way to achieve this is the JSX language. It allows us to write HTML-like syntax which gets transformed into lightweight JavaScript objects.
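As a tiny illustration (the component and prop names are invented for this example), a parent passing data down to a child via props could look like this in JSX, written here with TypeScript types:

import React from "react";

// Props are the child component's only input; data flows from parent to child.
type GreetingProps = { name: string };

const Greeting = (props: GreetingProps) => <h1>Hello, {props.name}!</h1>;

// The parent decides what data the child receives.
const App = () => <Greeting name="Ada" />;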

To build an Angular application you define a set of components for every UI element, screen, and route. An application will always have a root component that contains all other components. Components have well-defined inputs and outputs, and a lifecycle.

The idea behind dependency injection is that if you have a component that depends on a service, you do not create that service yourself. Instead, you request one in the constructor, and the framework will provide it for you. This allows you to depend on interfaces, not concrete types, which results in more decoupled code and improves testability.
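A minimal sketch of what this can look like in Angular (the service and component names are invented, and module setup is omitted):

import { Component, Injectable } from "@angular/core";

@Injectable()
export class ProductService {
  getProducts(): string[] {
    return ["shoes", "t-shirt"];
  }
}

@Component({
  selector: "product-list",
  template: `<li *ngFor="let p of products">{{ p }}</li>`,
  providers: [ProductService],
})
export class ProductListComponent {
  products: string[];

  // The service is requested in the constructor; Angular's injector provides it.
  constructor(service: ProductService) {
    this.products = service.getProducts();
  }
}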

Property bindings make Angular applications interactive.

Vue also makes use of the Virtual DOM like React.

In Vue.js the state of the DOM is just a reflection of the data state. You connect the two together by creating "ViewModel" objects. When you change the data, the DOM updates automatically.

You create small, decoupled units so that they are easier to understand and maintain. In Vue the components are ViewModels with pre-defined behaviours. The UI is a tree of components.

In Vue the HTML, JS and CSS for each component live in the same file. Some people hate single-file components, some love them. I personally think that they are very handy and can make you more productive, as they reduce context switching.

Ecosystems
This table shows the libraries you may be familiar with in React, Angular or Vue, alongside their equivalents in the other frameworks:


It is important to note here that Angular is somewhat more prescriptive. Some developers do not like this and prefer to have freedom choosing the tools they use. It is more of a personal preference.

Lessons Learned
I had the idea to build a small app with all three frameworks and compare them. And so I did. But it was completely unnecessary because of TodoMVC, a project which offers the same Todo application implemented using MV* concepts in most of today's popular JavaScript frameworks.

TodoMVC is supposed to help you select an MV* framework. But the Todo app is way too simple.

If you are new to web development and you are learning a new framework, the TodoMVC is probably a good start.

But if you are experienced and would like to build real-world, more complex applications there are better alternatives.

Some better alternatives are RealWorld and HNPWA.

RealWorld allows you to choose any frontend (React, Angular, Vue and even more) and any backend (Node, Scala etc) and see how they power a real-world full-stack medium.com clone.

HNPWA is a collection of unofficial Hacker News clients built with a number of popular JavaScript frameworks. Each implementation is a complete Progressive Web App that utilises different progressive technologies.

Lesson #1 The Todo App is too simple.
Use RealWorld or HNPWA to see what a real-world application would look like. Play with them, build on them and learn.

Lesson #2 Documentation is very important.
Good documentation helps you to get started quickly. Vue really excels at documentation. This is one of the reasons why it is so easy to get started with Vue.

React and Angular also have good documentation, though still not as good as Vue’s in my opinion.

The main problem with the Angular documentation is that often you will stumble upon documentation about AngularJS instead and it can be very confusing and frustrating. That is why I said earlier that the Angular team would have been better off if they had chosen a different name for Angular.

Lesson #3 Community is important.
When documentation fails, you learn that community is also very important. You want to be sure that it will be easy to get help if you get stuck and cannot find information in the documentation. You want to choose a framework whose corresponding communities are extensive and helpful; communities where you can discuss best practices and get solid advice.

Ultimately, you need to answer the following question: Would it be easy to hire more developers who are experts or willing to work with and learn this framework?

Other questions worth asking when choosing a JavaScript framework

How high is the “Bus Factor?”
The “Bus Factor” is a number equal to the number of team members who, if run over by a bus, would adversely affect a project. To put it more simply: Can other people continue working on your projects if you are hit by a bus tomorrow?  

Remember that talent is hard to hire. You need to know how easy it is to find developers for each of the frameworks. Also, what does the learning curve look like for each framework? Again, I think that Vue really excels here. It has the lowest learning curve of the three.

What does the product roadmap look like?
Is it just a prototype? Choose whatever, learn something new.

Would it have a single function that would never change? Do you have to ship it quickly? Choose whatever you are most familiar with.

Is your product business critical? Probably it is a good decision to be more conservative in your choice.

Is the product going to evolve, have new features, etc.? It should be scalable in that case.

Wrap Up
There is a point in your programming career when you realise that there isn’t a best framework. All the frameworks solve the same problems but in different ways. Is it a good thing that there are so many alternatives? Yes. In my opinion, the competition between Angular, Vue, React, and the other frameworks out there is very healthy. It brings a lot of innovation and improvements in the entire JavaScript ecosystem. We all benefit from that no matter which framework we work with.

We are developers. We like fighting about all sorts of important things like tabs versus spaces, trailing commas, etc. Joking aside, it is somehow in our blood to fight about silly things. I feel that we should appreciate the improvements all these JavaScript frameworks bring. Because there isn’t a best framework.

Don’t ask what the best framework is, ask what the most suitable framework for your product and your team is.


Work with open minds like Lora. Have a look at the Zalando jobs site.

Zalando Tech Blog – Technical Articles Pascal Pompey

Debunking the myth of the data science bubble

We’ve all read articles indicating the looming decline of data science. Some coined the term ‘data science bubble,’ some even went so far as to set a date for the ‘death of data science’ (they give it five years before the bubble implodes). This reached a point where anyone working in the field needed to start paying attention to these signals. I investigated the arguments backing this ‘imminent death’ diagnosis, detected some biases, and drafted an early answer on LinkedIn; the Zalando communication team picked up on it and, following their encouragement, I prepared a revised version for the Zalando Blog. This post doesn’t aim at making any bold predictions about the future without proper evidence. I have always found these to be relatively pointless. It just aims to point out that, for all the noise, there is no solid reason to believe that any of us should worry about our jobs in the years to come. In fact, the very arguments used to predict a ‘data science bubble’ can be turned around as reasons not to worry.

The arguments used by proponents of the data science bubble are generally of three sorts:

1- Increased commoditization

2- Data scientists should not become software engineers

3- Full automation

Increased Commoditization:
It is clear that data science work is getting increasingly commoditized: almost all ML frameworks now come with libraries of off-the-shelf models that are pre-architectured, pre-trained and pre-tuned. Want to do image classification? Download a pre-trained ResNet for your favorite deep-learning framework and you are almost ready to go. The net effect is that a single well-rounded data scientist can now solve in a week what a full team couldn't solve in six months 10 years ago.

Does that mean less demand for data scientists? Certainly not. It only means that investing in data science is now viable for a lot of domains for which data science was simply too expensive or too complex before. Hence a rising demand for data science and data scientists. It is useful to take software engineering as a comparison here. Over the years, most of the complexity around programming has been abstracted and commoditized. Only a few could start anything in assembly; C made it much easier to develop complex projects; Java commoditised memory management, and so on. Did it make the demand for software engineers vanish? Certainly not. On the contrary, it increased their productivity and hence their net value to any organisation.

Data-scientists should not become software engineers:
I strongly disagree with this assessment: one wouldn’t believe the number of data science projects that end up as a PowerPoint presentation with pretty graphs and then just an ignominious death. Why? Because data scientists often lack the ability to make their projects deliver continuous value in a well-maintained and monitored production environment. 95% of the data science projects I see do not make it past the POC stage. Going beyond the POC requires a software engineering mindset.

It is still rare to find data scientists actually capable of (1) putting a model in a production environment, and then (2) guaranteeing that machine-learning-based value is continuously delivered, monitored and maintained in the long run. Sadly, that is precisely where the ROI for any data science investment lies. I am not sure pushing data scientists to move towards management would help there: chronic over-powerpointing and the urge for serial POCs that never make it beyond the MVP stage is very much a management-induced sickness. I am not saying data scientists should become software engineers but, if anything, data scientists need better engineering and software architecture abilities, not less.

The risk of automation
Full automation is very unlikely, because in many regards data science is still more an art than it is a technique. There is a huge gap between the ‘Hello MNIST’ TensorFlow example and applying ML to a new domain for which no golden dataset or known model archetype exists. Ever had to use crowdsourcing for gathering labels? Ever ventured into the uncharted territories of ML? Ever had to solve a problem for which you couldn’t piggyback on an existing git repo? You will know what I am talking about…

And there we enter the real discussion: data scientists that are not able to go beyond the TensorFlow MNIST CNN example, the ResNet boilerplate or the vanilla word2vec + LSTM archetype are indeed going to become extinct, the same way no programmer can make a living out of the ‘Hello World’ code he/she wrote during the first year of college. But for those who know how to go beyond that and make ML actually work in a continuous delivery environment, there is a bright future ahead, and there are good reasons to think it will span much longer than the five years to come.

Sources:

https://blogs.oracle.com/datawarehousing/the-end-of-the-data-scientist-bubble

https://towardsdatascience.com/the-data-science-bubble-99fff9821abb

https://medium.com/@TebbaVonMathenstien/are-programmers-headed-toward-another-bursting-bubble-528e30c59a0e


Zalando Tech Blog – Technical Articles Eugen Kiss

An economic perspective on testing

Testing is a controversial topic. People have strong convictions about testing approaches. Test Driven Development is the most prominent example. Clear empirical evidence is missing, which invites strong claims. I advocate for an economic perspective towards testing. Secondly, I claim that focussing too much on unit tests is not the most economic approach. I coin this testing philosophy “Lean Testing.”

The main argument is as follows: different kinds of tests have different costs and benefits. You have finite resources to distribute into testing. You want to get the most out of your tests, so use the most economic testing approach. For many domains (e.g. GUIs), tests other than unit tests give you more bang for your buck.

Confidence and Tests

The article 'Write tests. Not too many. Mostly integration' and the related video by Kent C. Dodds express the ideas behind Lean Testing well. He introduces three dimensions with which to measure tests:

  • Cost (cheap vs. expensive)
  • Speed (fast vs. slow)
  • Confidence (low vs. high) (click doesn't work vs. checkout doesn't work)

The following is the 'Testing Trophy' suggesting how to distribute your testing resources.

Compared to Fowler's Testing Pyramid, confidence as a dimension is added. Another difference is that unit tests do not cover the largest area.

One of Kent C. Dodds' major insights is that you should actually consider the confidence a test gives you: "The more your tests resemble the way your software is used, the more confidence they can give you."

Return on Investment of Tests

The Return on investment (ROI) of an end-to-end test is higher than that of a unit test. This is because an end-to-end test covers a greater area of the code base. Even taking into account higher costs, it provides disproportionally more confidence.


Plus, end-to-end tests exercise the critical paths that your users actually take, whereas unit tests may cover corner cases that are never or only very seldom encountered in practice. The individual parts may work but the whole might not. The previous points can be found in 'Unit Test Fetish' by Martin Sústrik.

Further, Kent C. Dodds claims that integration tests provide the best balance of cost, speed and confidence. I subscribe to that claim. Unfortunately, we don't have empirical evidence showing that this is actually true. Still, my argument goes like this: end-to-end tests provide the greatest confidence. If they weren't so costly to write and slow to run, we would only use end-to-end tests (although better tools like Cypress mitigate these downsides). Unit tests are less costly to write and faster to run, but they test only a small part that might not even be critical. Integration tests lie somewhere between unit tests and end-to-end tests, so they provide the best balance.

As an aside: the term “integration test,” and even more so “end-to-end test,” seems to generate intense fear in some people. Such tests are supposedly brittle, hard to set up and slow to run. The main idea is to just not mock so much.

In the React context of Kent C. Dodds’ article, integration testing refers to not using shallow rendering. An integration test covers several components at once. Such a test is easier to write and more stable, since you do not have to mock so much and you are less likely to test implementation details.
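As a rough sketch of such a test, here using React Testing Library with a hypothetical LoginForm component and labels:

import React from "react";
import { render, screen, fireEvent } from "@testing-library/react";
import { LoginForm } from "./LoginForm"; // hypothetical component under test

test("submits the entered credentials", () => {
  const onSubmit = jest.fn();
  // Render the real component tree: no shallow rendering, no mocked children.
  render(<LoginForm onSubmit={onSubmit} />);

  fireEvent.change(screen.getByLabelText("Email"), { target: { value: "jane@example.com" } });
  fireEvent.change(screen.getByLabelText("Password"), { target: { value: "secret" } });
  fireEvent.click(screen.getByRole("button", { name: "Log in" }));

  expect(onSubmit).toHaveBeenCalledWith({ email: "jane@example.com", password: "secret" });
});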

In the backend world, an integration test would run against a real database and make real HTTP requests (to your controller endpoints). It is no problem to spin up a Docker database container beforehand and have its state reset after each test. Again, these tests run fast, are easy to write, reliable and resilient against code changes.
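A sketch of what such a backend test could look like in TypeScript, assuming a Jest setup, an Express app exported from "./app", a helper that resets the Dockerized Postgres database, and supertest for real HTTP requests (all of these are assumptions for illustration):

import request from "supertest";
import { app } from "./app";                     // hypothetical Express application
import { resetDatabase } from "./test-helpers";  // hypothetical helper that truncates the test database

describe("POST /articles", () => {
  beforeEach(async () => {
    await resetDatabase(); // reset the state of the real database before every test
  });

  it("stores the article and returns it via the API", async () => {
    // Real HTTP request against the real controller, with the real database underneath.
    const created = await request(app)
      .post("/articles")
      .send({ name: "Running shoe" })
      .expect(201);

    await request(app)
      .get(`/articles/${created.body.id}`)
      .expect(200, { id: created.body.id, name: "Running shoe" });
  });
});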

Code Coverage

Another point is that code coverage has diminishing returns. In practice most people agree, as most projects set the lower bound for coverage to around 80%. There is actually supporting research such as 'Exploding Software-Engineering Myths.' What follows are general arguments.

Even with 100% code coverage you trust your dependencies. They can, in principle, have 0% code coverage.

For many products, it is acceptable to have the common cases work but not the exotic ones (Unit Test Fetish). If you miss a corner case bug due to low code coverage that affects 0.1% of your users you might survive. If your time to market increases due to high code coverage demands you might not survive. And "just because you ran a function or ran a line does not mean it will work for the range of inputs you are allowing" (source).

Code Quality and Unit Tests

There is the claim that making your code unit-testable will improve its quality. Many arguments and some empirical evidence in favor of that claim exist, so I will shed light on the other side.


The article ‘Unit Test Fetish’ states that unit tests are an anti-architecture device. Architecture is what makes software able to change. Unit tests ossify the internal structure of the code. Here is an example:

"Imagine you have three components, A, B and C. You have written extensive unit test suite to test them. Later on you decide to refactor the architecture so that functionality of B will be split among A and C. you now have two new components with different interfaces. All the unit tests are suddenly rendered useless. Some test code may be reused but all in all the entire test suite has to be rewritten."

This means that unit tests increase maintenance liabilities because they are less resilient against code changes. Coupling between modules and their tests is introduced! Tests are system modules as well. See ‘Why Most Unit Testing is Waste’ for these points.

There are also some psychological arguments. For example, if you value unit-testability, you would prefer a program design that is easier to test than a design that is harder to test but is otherwise better, because you know that you'll spend a lot more time writing tests. Some further points can be found in 'Giving Up on Test-First Development'.

The article 'Test-induced Design Damage' by David Heinemeier Hansson claims that to accommodate unit testing objectives, code is worsened through otherwise needless indirection. The question is whether extra indirection and decoupled code are always better. Do they not have a cost? What if you decouple two components that are always used together? Was it worth decoupling them? You can claim that indirection is always worth it, but you cannot, at least, dismiss harder navigation inside the code base and at run-time.

Conclusion

An economic point of view helps to reconsider the Return on Investment of unit tests. Consider the confidence a test provides. Integration tests provide the best balance between cost, speed and confidence. Be careful about code coverage as too high aspirations there are likely counter-productive. Be skeptical about the code-quality improving powers of making code unit-testable.

To make it clear, I do not advocate never writing unit tests. I hope that I have provided a fresh perspective on testing. In a future article, I plan to present how to concretely implement a good integration test for both a frontend and a backend project.

If you desire clear, albeit unnuanced, instructions, here is what you should do: Use a typed language. Focus on integration and end-to-end tests. Use unit tests only where they make sense (e.g. pure algorithmic code with complex corner cases). Be economic. Be lean.



Additional Notes

One of the problems of discussing the costs and benefits of unit tests is that the boundary between unit and integration tests is fuzzy. The terminology is not completely unambiguous so people tend to talk at cross purposes.

To make it clear, low code coverage does not imply fewer bugs. As the late Dijkstra said (1969): “Testing shows the presence, not the absence of bugs.”

There is research that didn’t find Test Driven Development (TDD) improving coupling and cohesion metrics. TDD and unit tests aren’t synonyms but in the context of this article it’s still interesting: ‘Does Test-Driven Development Really Improve Software Design Quality?’ Another article ‘Unit Testing Doesn’t Affect Codebases the Way You Would Think’ analyzes code bases and finds that code with more unit tests has more cyclomatic complexity per method, more lines of code per method and similar nesting depth.

This article focused on which kinds of tests to distribute your automated testing budget across. Let's take a step back and consider reducing the automated testing budget altogether. Then we'd have more time to think about the problems, find better solutions and explore. This is especially important for GUIs, as often there is no 'correct' behavior but there is 'good' behavior. Paradoxically, reducing your automated testing budget might lead to a better product. See also ‘Hammock Driven Development’.

There is a difference between library and app code. The former has different requirements and constraints where 100% code coverage via unit tests likely makes sense. There is a difference between frontend and backend code. There is a difference between code for nuclear reactors and games. Each project is different. The constraints and risks are different. Thus, to be lean, you should adjust your testing approach to the project you're working on.


Come work with our tech team. Open job positions here!

Zalando Tech Blog – Technical Articles Michal Raczkowski

Decoupled styling in UI components

Styling isolation

Styling isolation achieved via CSS-modules, various CSS-in-JS solutions or Shadow-DOM simulation is already a commonly used and embraced pattern. This important step in CSS evolution was really necessary for UI components to be used with more confidence. No more global scope causing name conflicts and CSS leaking in and out! The entire component across HTML/JS/CSS is encapsulated.

Styling API - exploration

I expect CSS technology to offer much more in the future. The encapsulation usually comes hand in hand with the interface, for accessing what was hidden in an organised way. There are different ways to provide styling-APIs, for customising the component CSS from the outside.

One of the simplest methods is to support modifiers; flags for the component, used to change appearance, behavior or state:

<MyComponent type="large" />

This is convenient if there are a few predefined modifiers. But what if the number of different use cases grows? The number of modifiers could easily go off the scale if we combined many factors, especially for non-enum values like "width" or "height".

Instead we could expose separate properties that provide a certain level of flexibility:

<MyComponent color="red" border="2" borderColor="black" />

In such cases different modifiers can simply be constructed by users of the component. But what if the number of CSS properties is large? This solution also doesn't scale nicely. Another con is that any modification of the component's styles will likely force us to change the API as well.

Another solution is to expose the class that will be attached to the root element (let’s assume it's not a global class and proper CSS isolation technique is in place):

<MyComponent className="my-component-position" />

Attaching a class from the outside will effectively overwrite the root element CSS. This is very convenient for positioning the component, with such CSS properties as: "position," "top," "left," "z-index," "width," and "flex.” Positioning of the component is rarely the responsibility of the component itself. In most cases it is expected to be provided from outside. This solution is very convenient and more flexible than former proposals. But it’s limited to setting the CSS only for the component's root element.

The combination of the above solutions would likely allow us to address many use cases, but is not perfect, especially for component libraries, where simple, generic and consistent API is very important.

Decoupled styling

I'd like to take a step back and rethink the whole idea of styling-API for components. The native HTML elements come with minimal CSS, enough to make the elements usable. The users are expected to style them themselves. We are not talking about "customisation", as there is basically no inherent styling in place to "customise". Users inject styling, via a “class” attribute or “className” property:

<button class="fashion-store-button" />

In latest browsers like Chrome, we can also set the styling for more complex HTML5 elements like video elements:

<video class="fashion-store-video" />

.fashion-store-video::-webkit-media-controls-panel {
 background-color: white;
}

Thanks to Shadow DOM and webkit pseudo-elements, users can set the styles not only for the root element, but also for important inner parts of the video component. However, webkit pseudo-elements are poorly documented and seem to be unstable. It’s even worse for custom elements, because currently it’s not possible to customise the inner parts of elements (::shadow and /deep/ have been deprecated), although there are other proposals that will likely fill the gap.

Let's summarise the native approach, which I call "decoupled styling":

  1. A component is responsible only for its functionality (and API) and comes with minimal or no styling
  2. A component's styling is expected to be injected from the outside
  3. There is a styling API in place to style the inner parts


Benefits

The nature of styling is change, the nature of functionality (and API) is stability. It makes perfect sense to decouple both. Decoupled styling actually solves many issues that UI-component library developers and users are facing:

  • styling couples components together
  • same changelog for styling and functionality/API causes upgrading issues (e.g. forced migrations)
  • limited resilience - changes in styling propagate to all parts of the frontend project
  • costs of rewriting components to implement a new design
  • costs of rewriting/abandoning projects, because of outdated components
  • limitations of styling-API to address different use cases
  • bottleneck of component library constantly adjusted for different use cases


API proposal

In the world of custom UI components, many components are constructed from other components. Contrary to the native HTML/CSS approach of injecting a single class name, here we need an API for accessing the nested components. Let’s look at the following proposal for the API.

Imagine a “Dialog” component that contains two instances of a “Button” component (“OK” and “Cancel” buttons). The Dialog component wants to set the styling for OK button but leave the styling for the Cancel button unchanged (default):

<Button classes={{icon: "button-icon", text: "button-text"}}>OK</Button>
<Button>Cancel</Button>

We used the “classes” property to inject the CSS classes for two of the Button’s internal elements: the icon and the text. All properties are optional. It’s up to the component itself to define its styling API (the set of class names referencing its child elements).

To use Dialog with its default, minimal styling:

<Dialog />

But for cases where we want to adjust the styles, we will inject them:

<Dialog classes={{root: "dialog-position"}} />

We injected a class that will be attached to the root element. But we can do much more:

<Dialog classes={{
 root: "dialog-position",
 okButton: {
   icon: "dialog-ok-button-icon",
   text: "dialog-ok-button-text"
 }
}} />

The example above shows how we can access every level of the nested component structure in the Dialog. We’ve set the CSS classes for the root element and the OK button. By doing that we effectively overwrite the styling for the OK button that is preset inside the Dialog.

In the same way we will be able to set the styling for components that contain Dialogs, and further up, to the highest level of the application. On the root level of the application, defining the styles will practically mean defining the application theme.
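To make the contract more tangible, here is a rough TypeScript sketch of how such a nested “classes” API and the underlying deep merge could be typed (the interfaces and the merge helper are illustrative, not taken from the repositories linked below):

// Styling API of the Button: every class name is optional.
interface ButtonClasses {
  root?: string;
  icon?: string;
  text?: string;
}

// Styling API of the Dialog references the Buttons it nests.
interface DialogClasses {
  root?: string;
  okButton?: ButtonClasses;
  cancelButton?: ButtonClasses;
}

// Default, minimal styling defined inside the Dialog component.
const defaultClasses: DialogClasses = {
  root: "dialog-default",
  okButton: { icon: "ok-icon-default", text: "ok-text-default" },
};

// Hypothetical deep merge: injected class names overwrite the defaults level by level.
function mergeClasses(defaults: DialogClasses, injected: DialogClasses = {}): DialogClasses {
  return {
    ...defaults,
    ...injected,
    okButton: { ...defaults.okButton, ...injected.okButton },
    cancelButton: { ...defaults.cancelButton, ...injected.cancelButton },
  };
}

const classes = mergeClasses(defaultClasses, {
  root: "dialog-position",
  okButton: { icon: "dialog-ok-button-icon" },
});
// classes.okButton is now { icon: "dialog-ok-button-icon", text: "ok-text-default" }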


Implementation

I implemented two examples using React and TypeScript, first with CSS Modules and second with Emotion (CSS-in-JS library). Both are based on the same concept:

  • default, minimal styling for components is predefined as an isolated set of classes
  • styling-API (set of class names) is defined using TypeScript interface, with all properties optional
  • components allow injection of a class names object (via the “classes” parameter) which is “deeply merged” with the default class names object, overwriting the styles

React, TypeScript, CSS Modules: https://github.com/mrac/decoupled-styling-css-modules
React, TypeScript, Emotion: https://github.com/mrac/decoupled-styling-css-in-js

Conclusion

Decoupling styling from UI components may be a step towards making them really reusable, drawing from the original idea behind Cascading Style Sheets to separate the presentation layer from UI logic and markup. Defining boundaries between UI logic and markup on one side and styling on the other would likely change the way UX designers collaborate with engineers. Here designers would style components based on the API provided by engineers. It would be easier to specify what constitutes a breaking change within that contract. Putting an ever-changing skin on top of what is stable would likely save costs, reduce friction and contribute to software quality.

Zalando Tech Blog – Technical Articles Kaiser Anwar Shad

Comparing Redux, MobX & setState in React

Introduction

React is a declarative, efficient, and flexible JavaScript library for building user interfaces. Compared to other frontend libraries and frameworks, React’s core concept is simple but powerful: it makes it painless to design simple views and renders them efficiently using a virtual DOM. However, I don’t want to go into detail about the virtual DOM here. Rather, I want to show three ways you can manage state in React. This post assumes a basic understanding of the following state management approaches; if you need a refresher, check out the docs first.

  1. setState: React itself ships with built-in state management in the form of a component’s `setState` method, which will queue a render operation. For more info => reactjs.org
  2. MobX: This is a simple and scalable library that works by transparently applying functional reactive programming (TFRP), guided by the philosophy: ‘Anything that can be derived from the application state, should be derived. Automatically.’ For more info => mobx.js.org
  3. Redux: Maybe the most popular state management solution for React. The core concepts are a single source of truth, immutable state, and state transitions that are initiated by dispatching actions and applied with pure functions (reducers). For more info => redux.js.org


Location

  1. setState is used locally in the component itself. If multiple children need to access a parent’s local state, the data can either be passed from the state down as props or, with less piping, using React 16’s new Context API.
  2. MobX can be located in the component itself (local) or in a store (global). So depending on the use case the best approach can be used.
  3. Redux provides the state globally, meaning the state of the whole application is stored in an object tree within a single store.


Synchronicity

  1. setState is asynchronous.*
  2. MobX is synchronous.
  3. Redux is synchronous.

*Why asynchronous? Because delaying reconciliation in order to batch updates can be beneficial. However, it can also cause problems when, e.g., the new state doesn’t differ from the previous one. It makes it generally harder to debug issues. For more details, check out the pros and cons.
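Because of this batching, reading this.state right after calling setState may still return the old value; the functional updater form is the usual way around it. A minimal sketch (the component is invented for the example):

import React from "react";

class Counter extends React.Component<{}, { count: number }> {
  state = { count: 0 };

  increment = () => {
    // this.state.count may be stale here because updates are batched,
    // so we derive the new state from the previous one instead.
    this.setState(prev => ({ count: prev.count + 1 }));
  };

  render() {
    return <button onClick={this.increment}>{this.state.count}</button>;
  }
}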


Subscription

  1. setState is implicit, because it directly affects the state of the component. Changing the state of child components can be done via passing props (or the Context API in React 16).
  2. MobX is implicit, because it is similar to setState with direct mutation. Component re-renders are derived from run-time usage of observables (see the sketch after this list). To achieve more explicitness/observability, actions can (and generally should) be used to change state.
  3. Redux is explicit, because a state represents a snapshot of the whole application state at a point in time. It is easy to inspect as it is a plain old object. State transformations are explicitly labeled and performed with actions.
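A minimal sketch of this implicit subscription model in MobX (the store shape is invented for the example):

import { observable, action, autorun } from "mobx";

// Any code that dereferences store.count inside a reaction is subscribed implicitly.
const store = observable({ count: 0 });

const increment = action(() => {
  store.count += 1; // direct mutation, similar in spirit to setState
});

// autorun is a reaction: it runs once immediately and re-executes whenever
// the observables it dereferences change.
autorun(() => console.log(`count is ${store.count}`)); // logs "count is 0"

increment(); // triggers the reaction again: logs "count is 1"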

Mutability*

  1. setState is mutable, because the state can be changed directly through it.
  2. MobX is mutable, because actions can change the state of the component.
  3. Redux is immutable, because state can’t be changed in place. Changes are made with pure functions which transform the state tree.

* With mutability the state can be changed directly, so the new state overrides the previous one. Immutability protects the state from direct changes; in Redux, instead of changing the state directly, you dispatch actions that transform the state tree into a new version.
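A small sketch of that contrast using plain Redux (the counter state shape is invented for the example):

import { createStore } from "redux";

type State = { count: number };
type Action = { type: "INCREMENT" } | { type: "RESET" };

// A pure function: it never mutates the previous state, it returns a new tree.
const reducer = (state: State = { count: 0 }, action: Action): State => {
  switch (action.type) {
    case "INCREMENT":
      return { ...state, count: state.count + 1 };
    case "RESET":
      return { count: 0 };
    default:
      return state;
  }
};

const store = createStore(reducer);
store.dispatch({ type: "INCREMENT" }); // state transitions are explicit, labeled actions
console.log(store.getState());         // { count: 1 }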


Data structure

  1. setState -
  2. MobX: a graph; references can go in multiple directions and even form loops. The state stays denormalized and nested.
  3. Redux: a tree, which is a special kind of graph with only one direction: from parent to child. The state is normalized as in a database; entities reference each other only by identifiers or keys.

Observing Changes

  1. setState -
  2. MobX: reactions do not produce new values; instead they produce side effects and can change the state.
  3. Redux: an object describes what happened (which action was emitted).

Conclusion

Before starting to write your application you should think about which problem you want to solve. Do you really need an extra library for state management, or does React’s built-in setState fulfil your needs? Depending on the complexity, you should extend it. If you prefer the mutable way and want the bindings handled automatically, then MobX can fit your needs. If you want to have a single source of truth (storing state in an object tree within a single store) and keep state immutable, then Redux can be the more suitable solution.

Hopefully this post gave you a brief overview of the different ways to manage state in React. Before you start with one of these libraries, I recommend going through the docs of each. There are a lot more treasures to discover!



Check out our open Software Engineering positions on our jobs page.

Zalando Tech Blog – Technical Articles Holger Schmeisky

Sharing successful large scale agile experiences


Zalando has been known for radical approaches to agility since 2015. In order to keep growing and stay successful, we took the next step in 2017, forming around 30 business units. Each business unit is formed around one particular business problem, topic or product with end-to-end responsibility. All disciplines needed are inside the business unit, from commercial roles to tech teams.

Challenges in large scale product groups

Looking at this setup, we experience challenges. You’re probably familiar with them if you work in a similar setup or if your company is around the size of one of our business units (<100 people).

  • Who takes product decisions at this size with several teams and product people?
  • How to keep the focus on the actual product with so many technical components and intermediate steps?
  • How to enable 50 engineers to understand their everyday task contribution to the overall quarterly goals?
  • How to do sprint planning with 20 people?
  • How to handle cross-cutting concerns like 24/7 and platform work in a feature team setup?

By far the biggest question was however: How can this work inside Zalando?


Our Solution Approach

How do we support these 30+ business units in reaching their business goals through agile working? Rome was not built in a day. We knew we had to work through process and collaboration.

We used the power of our network and collected successful solutions from all units. The first and most important observation was that no solution can be mechanically copied, but always has to be adapted to the specific needs of the unit (“There are no best practices, only practices that work in a specific context”). To enable this adaptation and learning, in addition to the bare facts we collected:

  1. the story and motivation around the solutions
  2. the details of how they are adopted
  3. the (contact details of the) people who created them

For the first factor, we invited people from these teams to teach-back sessions, open to everyone, to share their experiences in a try/avoid format.

Secondly, from these we created a 20-page guide on how to structure large teams, with background details. Finally, we connected people we talked to who have similar challenges to the pioneers, because all advice needs to be adapted to the specific BU needs.

Concrete Examples

For example, the Fashion Store Apps group (5 teams) struggled with their narrow product definition: each platform and the API were treated as separate products, with separate teams, backlogs, product owners, etc. These needed to be managed, synchronized, and aligned, and code needed to be integrated. As you can imagine, somewhere along the way the focus on the customer gets hard to find. To address this, the team redefined the product as “Fashion Store Apps,” reorganized the teams to reflect this, and merged all backlogs into one.

Another example is how Personalization (6 teams) increased the understanding of the goals and unlocked possibilities. As is typical in a large organization, goals and concrete products were difficult to define for this department, and usually the understanding did not carry over to the engineering and data science teams. To tackle this, everyone (including engineers) took responsibility for creating or refining the press releases that underlie the epics for the upcoming quarter. Ideas to achieve goals are as likely to come from Product as they are to come from delivery teams. The concrete outcome is an aligned and commonly understood overview of the next quarter’s sprints. This led to much higher involvement and identification during the quarter, and to more motivated teams.

A LeSS introduction backwards

These are only two examples from many more instances of how we scale agile at Zalando. The whole approach is somehow a LeSS introduction backwards. We take note of which trials work, and we find a high similarity to the LeSS framework without ever using the word or the whole framework. The practices emerged by themselves as they made sense to the people inside the organization. As one engineering lead put it after reading a LeSS book, “It’s nice to see that we were not the only ones with these ideas.”

Our key learning, directed at all fellow Agile Coaches and Agile Change Agents, is to not implement frameworks, but to source from working solutions and share the successes.

Eventually we will end up in a form of LeSS organization without anybody inside Zalando connecting emotionally to the framework itself.

If you would like to learn more, feel free to reach out to agility-coaching@zalando.de or have a look at our open position.

Many thanks for the input and support of our colleagues Samir Hanna, Tobias Leonhard and Frank Ewert.

Zalando Tech Blog – Technical Articles Michal Michalski

Insights on Zalando's event-driven microservice architecture

As discussed in my previous blog post, Kafka is one of the key components of our event-driven microservice architecture in Zalando’s Smart Product Platform. We use it for sequencing events and building an aggregated view of data hierarchies. This post expands on what I previously wrote about the one-to-many data model and introduces more complex many-to-many relationships.

To recap: to ensure the ordering of all the related entities in our hierarchical data model (e.g. Media for Product and the Product itself) we always use the same partition key for all of them, so they end up sequenced in a single partition. This works well for a one-to-many relationship: Since there’s always a single “parent” for all the entities, we can always “go up” the hierarchy and eventually reach the topmost entity (“root” Product), whose ID we use to derive the correct partition key. For many-to-many relationships, however, it’s not so straightforward.

Let’s consider a simpler data model that only defines two entities: Products (e.g. Shoes, t-shirt) and Attributes (e.g. color, sole type, neck type, washing instructions, etc., with some extra information like translations). Products are the “core” entities we want to publish to external, downstream consumers and Attributes are meta-data used to describe them. Products can have multiple Attributes assigned to them by ID, and single Attributes may be shared by many Products. There’s no link to a Product in Attribute.

Given the event stream containing Product and Attribute events, the goal is to create an “aggregation” application, that consumes both event types: “resolves” the Attribute IDs in Product entities into full Attribute information required by the clients and sends these aggregated entities further down the stream. This assumes that Attributes are only available in the event stream, and calling the Attribute service API to expand IDs to full entities is not feasible for some reason (access control, performance, scalability, etc.).

Because Attributes are “meta data”, they don’t form a hierarchy with the Product entity; they don’t “belong” to them, they’re merely “associated” with them. It means that it’s impossible to define their “parent” or “root” entity and, therefore, there’s also no single partition key they could use to be “co-located” with the corresponding Products in a single partition. They must be in many (potentially: all) of them.

This is where the Kafka API comes in handy! While Kafka is probably best known for its key-based partitioning capabilities (see: ProducerRecord(String topic, K key, V value) in Kafka’s Java API), it’s also possible to publish messages directly to a specific partition using the alternative, probably less-known ProducerRecord(String topic, Integer partition, K key, V value). This, on its own, allows us to broadcast an Attribute event to all the partitions in a given topic, but if we don’t want to hardcode the number of partitions in a topic, we need one more thing: the producer’s ability to provide the list of partitions for a given topic via the partitionsFor method.

The complete Scala code snippet for broadcasting events could now look like this:

import scala.collection.JavaConverters._
Future.traverse(producer.partitionsFor(topic).asScala) { pInfo =>
 val record = new ProducerRecord[String, String](topic, pInfo.partition, partitionKey, event)

// send the record
}

I intentionally didn’t include the code to send the record, because Kafka’s Java client returns a Java Future, so converting this response to a Scala Future would require some extra code (i.e. using Promise), which could clutter this example. If you’re curious about how this could be done without the awful, blocking Future { javaFuture.get } or similar (please, don’t do this!), you can have a look at the code here.

This way we made the same Attribute available in all the partitions, for all the “aggregating” Kafka consumers in our application. Of course it carries consequences and there’s a bit more work required to complete our goal.

Because the relationship information is stored in Product only, we need to persist all the received Attributes somewhere, so when a new Product arrives, we can immediately expand the Attributes it uses (let’s call it “Attribute Local View”, to emphasise it’s a local copy of Attribute data, not a source of truth). Here is the tricky part: Because we’re now using multiple, parallel streams of Attribute data (partitions), we need an Attribute Local View per partition! The problem we’re trying to avoid here, which would occur in case of a single Attribute Local View, is overwriting the newer Attribute data coming from “fast” partition X, by older data coming from a “slow” partition Y. By storing Attributes per partition, each Kafka partition’s consumer will have access to its own, correct version of Attribute at any given time.
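To make the idea concrete, here is a minimal sketch of such a per-partition local view, written in TypeScript purely for illustration (the actual service described here is not TypeScript, and the types are invented):

type Attribute = { id: string; version: number; payload: unknown };

// One local view entry per partition: key = `${partitionId}:${attributeId}`.
// A slow partition can never overwrite data that a fast partition has already advanced.
const attributeLocalView = new Map<string, Attribute>();

function storeAttribute(partition: number, attribute: Attribute): void {
  attributeLocalView.set(`${partition}:${attribute.id}`, attribute);
}

function resolveAttribute(partition: number, attributeId: string): Attribute | undefined {
  // The consumer of partition N only ever reads the copy written for partition N.
  return attributeLocalView.get(`${partition}:${attributeId}`);
}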

While storing Attributes per partition might be as simple as adding the Kafka partition ID to the primary key in the table, it may cause two potential problems. First of all, storing multiple copies of the same data means – obviously – that the storage space requirements for the system are significantly higher. While this might not be a problem (in our case Attributes are really tiny compared to the “core” entities), it is definitely something that has to be taken into account during capacity planning. In general, this technique is primarily useful for problems where the broadcast data set is small.

Secondly, by associating the specific versions of Attributes with partition IDs, the already difficult task of increasing numbers of partitions becomes even more challenging, as Kafka’s internal topic structure has now “leaked” to the database. However, we think that growing the number of partitions is already a big pain (breaking the ordering guarantees at the point where partitions were added!) that requires careful preparations and additional work (e.g. migrating to the new topic with more partitions, rather than adding partitions “in place” to the existing one), so it’s a tradeoff we accepted. Also, to reduce the risk of extra work we try to carefully estimate the number of partitions required for our topics and tend to overprovision a bit.

If what I just described sounds familiar to you, you might have been using this technique without even knowing what it is called; it’s a broadcast join. It belongs to a wider category of so-called map-side joins, and you can find different implementations of it in libraries like Spark or Kafka Streams. However, what makes this implementation significantly different is the fact that it reacts to data changes in real time. Events are broadcast as they arrive, and local views are updated accordingly. The updates to aggregations on product changes are instant as well.

Also, while this post assumes that only Product update may trigger entity aggregation, the real implementation we’re using is doing it on Attribute updates as well. While, in principle, it’s not a difficult thing to do (a mapping of Attribute-to-Product has to be maintained, as well as the local view of the last seen version of a Product), it requires significantly more storage space and carries some very interesting performance implications as single Attribute update may trigger an update for millions of Products. For that reason I decided to keep this topic out of the scope of this post.

As you just saw, you can handle many-to-many relationships in an event-driven architecture in a clean way using Kafka. You’ll benefit from not risking outdated information and not resorting to direct service calls, which might be undesirable or even impossible in many cases. As usual, it comes at a price, but if you weigh the pros and cons carefully upfront, you might be able to make a well-educated decision to your benefit.

Like Michal's work and want to be part of the action at our Fashion Insights Center in Dublin? Keep up to date with our Dublin jobs.

Zalando Tech Blog – Technical Articles Oleksandr Volynets

How we migrated the Zalando Logistics Operating Services to Java 8

“Never touch working code!” goes the old saying. How often do you disregard this message and touch a big monolithic system? This article tells you why you should ignore common wisdom and, in fact, do it even more often.


Preface

Various kinds of migration are a natural part of software development. Do you remember the case when the current database didn’t scale enough? Or when a new tech stack was needed because the existing one did not meet changing requirements? Or perhaps a hard migration from a monolithic application to a microservice architecture? There are also smaller-scale migrations, like upgrading to a newer version of a dependency, e.g. Spring, or of the Java Runtime Environment (JRE). This is the story of how a relatively simple task, the migration from Java 7 to Java 8, was performed on a large-scale monolithic application that is critical to the business.

Zalos as the service for Logistics Operations

Zalos (Zalando Logistics System) is a set of Java services, backend and frontend, containing submodules that operate most functions inside the warehouses run by Zalando. The scale of Zalos can be summarized by the following statistics:

  • more than 80,000 git commits,
  • more than 70 active developers in 2017,
  • almost 500 maven submodules,
  • around 13,000 Java classes with 1.3m lines of code, plus numerous production and test resource files,
  • around 600 PostgreSQL tables and more than 3,000 stored procedures in use.

Zalos 2, denoted as just Zalos below, is the second generation of the system and has grown to this size over the past five years. Patterns that were easy to adopt at the time for scaling up architectural functionality have quickly become a bottleneck as the number of teams maintaining the system has grown. Zalos is deployed to all Zalando warehouses every second week, and every week there is a special procedure to create a new release branch. Each deployment takes about five hours, and branching takes about the same time. Adding urgent patches on top, regular deployment and maintenance operations consume a significant portion of each team’s time.

Now, what happens if such a system is left unmaintained for a while? Package dependencies and Java libraries become obsolete and, as a consequence, security risks grow. Then, one day, one of the core infrastructure systems has to change its SSL certificate, and this causes downtime in all relevant legacy systems running a deprecated Java version. For the logistics services these problems might become a big disaster, and you start thinking: “What does it take to migrate Zalos from Java 7 to Java 8?”


Migration? Easy!

With some basic experience with Java 9, the option to go even further was rejected pretty fast: a combination of Java 9 modularity and 500 submodules doesn’t look very promising. Well, bad luck. What else do you need to keep in mind for Java 8 support? Spring? Sure. GWT? Maybe. Guava? Oh yes. Generics? This too.

This is a good time to talk about the tech stack of Zalos. It contains backend as well as frontend parts, both running Spring 3. The backend uses PostgreSQL databases via the awesome sprocwrapper library. Both backend and frontend rely on Zalando-internal parent packages to take care of dependency management. The frontend engine is GWT 2.4 with some SmartGWT widgets. And, to mention one more challenge, it uses Maven overlays with JavaScript, but more on this later.

Our first strategy was to bump as many package dependencies as we could: Spring 4, which fully supports Java 8; GWT 2.8.2, which already supports Java 9; Guava 23.0; and so on. But we use GWT 2.4, so that upgrade alone would be a jump of over five years of development. The hard dependency on our internal Zalando parent packages ruled out a major Spring upgrade too. And Guava 23 has deprecated some methods, so we would have needed to change quite an amount of code: again, a failure.

Let’s try another strategy then: bump as little as we can. This strategy worked much better. We only needed Spring 3.2.13 and Guava 20.0, plus required upgrades like javassist and org.reflections. The matrix of compatible versions is shown in the appendix. The GWT dependency was left untouched, although it limits our client code to Java 7. A compromise, but not a blocker: there is little active development of new GWT code anyway.

Now to overlays, or in our case “Dependency Hell”. Overlays are a Maven feature for including dependencies from a WAR or ZIP file: the complete package is “inlined” as is, together with all of its dependencies. This means that if an overlay brings in a different version of spring-core, you get two versions of spring-core in the final WAR artifact. When the application starts, it gets confused about which version to use for which parts of the application, and various ClassNotFound exceptions pop up. Bad luck: all WAR overlays had to be republished with updated dependencies.


Go-live or don’t rush?

It took just two weeks of highly motivated and self-driven work by two people to crack the problem and run the 500-module monolith on a laptop with Java 8. It took two more weeks to deploy it to the staging environment after fixing multiple issues. After that, it took two more months to finally deploy it to the production environment. Why so long? Because we deal with a system of the utmost criticality, and it comes with several serious constraints:

  1. Deployments. Deployment to production lasts up to five hours and must not interfere with any other deployment, due to internal limitations of the deployment system. With production deployments taking absolute priority, there isn’t much time for experimenting with the migration. Solution? Tweaking the deployment service reduced deployment time by about one third, giving us some freedom to experiment on a staging environment.
  2. Development. There are still about 25 commits per day to the main branch. Breaking it would have a significant impact on feature development, and it isn’t easy to experiment with JDK versions from a feature branch. This isn’t good, but there is a still more serious constraint.
  3. Warehouse operations. They are the backbone of an e-commerce company and must not be interrupted by the migration. The risk of any bug has to be carefully minimized to maintain service liveness.

To address at least two of these constraints, we created a concrete three-step plan for executing the migration safely while remaining able to roll back at any time:

  1. Upgrade all packages to versions compatible with both Java 7 and Java 8, without changing the runtime version. This ensured that nothing changed for deployment.
  2. Switch to the Java 8 runtime (JRE) while keeping the source code in Java 7 mode. This step ensured that we could safely change the deployment settings without touching the code and dependencies.
  3. Switch to Java 8 development mode to fully support Java 8 features. No major deployment changes were needed for this step.

In addition to the staging environment, every step was carefully tested on a so-called beta environment, which operates on production data.


Outlook

The migration was completed despite some failed attempts a few years ago. Several things have happened since: the service has become a little more stable and secure, the code can now be written with lambdas, method references, etc., and the deployment service has been improved too. But most importantly, the legacy system got attention. Even though we had one camp of people who said, “We tried that before, why do you want to try again?”, there was also a second camp saying, “You are crazy, but yeah, do it”. No matter what was tried before and in what manner, it is never too late to try again.
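To give a flavour of the kind of cleanup Java 8 enables, here is a generic before/after snippet (not actual Zalos code) contrasting a Java 7 anonymous class with a Java 8 method reference and lambda:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class Java8Flavour {

    public static void main(String[] args) {
        // Example values only.
        List<String> warehouses = new ArrayList<>(Arrays.asList("Erfurt", "Lahr", "Moenchengladbach"));

        // Java 7 style: an anonymous class for a simple comparator.
        Collections.sort(warehouses, new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return a.compareToIgnoreCase(b);
            }
        });

        // Java 8 style: a method reference and a lambda express the same intent in one line each.
        warehouses.sort(String::compareToIgnoreCase);
        warehouses.forEach(name -> System.out.println(name));
    }
}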

Keep your legacy code under careful supervision: add code quality metrics, minimize maintenance efforts, optimize release cycles. With this, you will stop having “legacy nightmares” and instead have a well-maintained piece of code.

Appendix

Here is a list of Maven dependencies and related changes that finally made everything work together:


In addition, the following compilation and runtime settings were required:

  • <source> and <target> properties for maven-compiler-plugin set to 1.8
  • Tomcat 7, i.e. run services with “mvn tomcat7:run-war” and not “mvn tomcat:run-war”, which uses Tomcat 6 by default.

Come work with us! Have a look at our jobs page.

Zalando Tech Blog – Technical Articles Rohit Sharma

Using Akka cluster-sharding and Akka HTTP on Kubernetes

This article captures the implementation of an application, deployed on Kubernetes, that serves data stored in cluster-sharded actors over HTTP.

Use case: an application serving data over HTTP at a high request rate, with latency on the order of 10 ms and only limited database IOPS available.

My initial idea was to cache the data in memory, which worked pretty well for some time. But it meant larger instances, due to the duplication of cached data across the instances behind the load balancer. As an alternative, I wanted to use Kubernetes for this problem and do a proof of concept (PoC) of a distributed cache with Akka cluster-sharding and Akka HTTP on Kubernetes.

This article is by no means a complete tutorial on Akka cluster-sharding or Kubernetes. It outlines the knowledge I gained while doing this PoC. The code for this PoC can be found here.

Let’s dig into the details of this implementation.

To form an Akka cluster, there needs to be a pre-defined, ordered set of contact points, often called seed nodes. Each Akka node will try to register itself with the first node from the list of seed nodes. Once all the seed nodes have joined the cluster, any new node can join the cluster programmatically.

The ordered part is important here, because if the first seed node changes frequently, then the chances of split-brain increase. More info about Akka clustering can be found here.

So, the challenge here with Kubernetes was the ordered set of predefined nodes, and here StatefulSets and Headless Services come to the rescue.

StatefulSet guarantees stable and ordered pod creation, which satisfies the requirement for our seed nodes, and the Headless Service is responsible for their deterministic discovery in the network. So, the first node will be “<application>-0”, the second will be “<application>-1”, and so on.

(Here, <application> is replaced by the actual name of the application.)

The DNS for the seed nodes will be of the form:

<application-name>-<ordinal>.<service-name>.<namespace>.svc.cluster.local
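For illustration, the sketch below shows roughly how a node could join the cluster programmatically from such addresses, reading the AKKA_ACTOR_SYSTEM_NAME and AKKA_SEED_NODES environment variables defined in the StatefulSet manifest further below. It assumes that remoting and the cluster actor provider are configured elsewhere (e.g. in application.conf), so treat it as a sketch rather than the PoC’s actual bootstrap code:

import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

import akka.actor.ActorSystem;
import akka.actor.Address;
import akka.actor.AddressFromURIString;
import akka.cluster.Cluster;

public class ClusterBootstrap {

    public static void main(String[] args) {
        // These environment variables mirror the ones defined in the StatefulSet manifest below.
        String systemName = System.getenv("AKKA_ACTOR_SYSTEM_NAME"); // e.g. distributed-cache-system
        String seedNodes = System.getenv("AKKA_SEED_NODES");         // comma-separated host:port list

        // Remoting host/port and the cluster actor provider are assumed to be configured
        // in application.conf (e.g. from AKKA_REMOTING_BIND_PORT and the pod's DNS name).
        ActorSystem system = ActorSystem.create(systemName);

        // Turn the stable StatefulSet DNS names into Akka addresses and join the cluster.
        List<Address> addresses = Arrays.stream(seedNodes.split(","))
            .map(hostPort -> AddressFromURIString.parse("akka.tcp://" + systemName + "@" + hostPort))
            .collect(Collectors.toList());
        Cluster.get(system).joinSeedNodes(addresses);
    }
}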

Steps:

1. Start by creating the Kubernetes resources. First, the Headless Service, which is responsible for the deterministic discovery of seed nodes (pods), can be created using the following manifest:
kind: Service
apiVersion: v1
metadata:
  name: distributed-cache
  labels:
    app: distributed-cache
spec:
  clusterIP: None
  selector:
    app: distributed-cache
  ports:
    - port: 2551
      targetPort: 2551
      protocol: TCP

Note that “clusterIP” is set to “None”, which indicates that this is a Headless Service.

2. Create a StatefulSet, the manifest responsible for ordered pod creation:

apiVersion: "apps/v1beta2"
kind: StatefulSet
metadata:
name: distributed-cache
spec:
selector:
  matchLabels:
    app: distributed-cache
serviceName: distributed-cache
replicas: 3
template:
  metadata:
    labels:
      app: distributed-cache
  spec:
    containers:
     - name: distributed-cache
       image: "localhost:5000/distributed-cache-on-k8s-poc:1.0"
       env:
         - name: AKKA_ACTOR_SYSTEM_NAME
           value: "distributed-cache-system"
         - name: AKKA_REMOTING_BIND_PORT
           value: "2551"
         - name: POD_NAME
           valueFrom:
             fieldRef:
               fieldPath: metadata.name
         - name: AKKA_REMOTING_BIND_DOMAIN
           value: "distributed-cache.default.svc.cluster.local"
         - name: AKKA_SEED_NODES
           value: "distributed-cache-0.distributed-cache.default.svc.cluster.local:2551,distributed-cache-1.distributed-cache.default.svc.cluster.local:2551,distributed-cache-2.distributed-cache.default.svc.cluster.local:2551"
       ports:
        - containerPort: 2551
       readinessProbe:
        httpGet:
          port: 9000
          path: /health

3. Create a Service, which will be responsible for routing traffic to the pods (the Ingress below exposes it to the outside internet):

apiVersion: v1
kind: Service
metadata:
  labels:
    app: distributed-cache
  name: distributed-cache-service
spec:
  selector:
    app: distributed-cache
  type: ClusterIP
  ports:
    - port: 80
      protocol: TCP
      # this needs to match your container port
      targetPort: 9000

4. Create an Ingress, which defines a set of rules to route traffic from the outside internet to services:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: distributed-cache-ingress
spec:
  rules:
    # DNS name your application should be exposed on
    - host: "distributed-cache.com"
      http:
        paths:
          - backend:
              serviceName: distributed-cache-service
              servicePort: 80

And the distributed cache is ready to use.
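For completeness, here is a minimal sketch of what the sharded cache entities and their shard region could look like with the classic cluster-sharding API. The message types, entity logic and shard count are hypothetical; the PoC’s actual code is in the linked repository:

import akka.actor.AbstractActor;
import akka.actor.ActorRef;
import akka.actor.ActorSystem;
import akka.actor.Props;
import akka.cluster.sharding.ClusterSharding;
import akka.cluster.sharding.ClusterShardingSettings;
import akka.cluster.sharding.ShardRegion;

public class CacheSharding {

    // Hypothetical messages; the cache key doubles as the entity ID.
    public static final class Put {
        public final String key;
        public final String value;
        public Put(String key, String value) { this.key = key; this.value = value; }
    }

    public static final class Get {
        public final String key;
        public Get(String key) { this.key = key; }
    }

    // One entity actor per cache key, holding the cached value in memory.
    public static class CacheEntity extends AbstractActor {
        private String value;

        @Override
        public Receive createReceive() {
            return receiveBuilder()
                .match(Put.class, put -> { value = put.value; })
                .match(Get.class, get -> getSender().tell(value == null ? "" : value, getSelf()))
                .build();
        }
    }

    private static String keyOf(Object msg) {
        if (msg instanceof Put) return ((Put) msg).key;
        if (msg instanceof Get) return ((Get) msg).key;
        return null;
    }

    // Routes messages to shards and entities based on the cache key.
    private static final ShardRegion.MessageExtractor EXTRACTOR = new ShardRegion.MessageExtractor() {
        @Override public String entityId(Object msg) { return keyOf(msg); }
        @Override public Object entityMessage(Object msg) { return msg; }
        @Override public String shardId(Object msg) {
            String key = keyOf(msg);
            return key == null ? null : String.valueOf(Math.abs(key.hashCode() % 30));
        }
    };

    // Starts (or connects to) the shard region on this node.
    public static ActorRef startRegion(ActorSystem system) {
        return ClusterSharding.get(system).start(
            "cacheEntity",
            Props.create(CacheEntity.class),
            ClusterShardingSettings.create(system),
            EXTRACTOR);
    }
}

An Akka HTTP route would then ask() the returned shard region ActorRef with Get and Put messages keyed by the cache key.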

Summary

This article covered Akka cluster-sharding on Kubernetes: the prerequisite of an ordered set of seed nodes with deterministic discovery in the network, and how it can be satisfied with StatefulSets and Headless Services.

This approach of caching data in a distributed fashion offered the following advantages:

  • Fewer database lookups, saving database IOPS
  • Efficient usage of resources; fewer instances as a result of no duplication of data
  • Lower latencies to serve data

This PoC opens up new doors for thinking about how we cache data in memory. Give it a try (all steps to run it locally are mentioned in the Readme).

Interested in working at Zalando Tech? Our job openings are here.