1/n - How do I ... in FP : Validation
Here is a small, post-it-like article, just to remember some useful patterns we run into in everyday life as software developers. This post is the first in a (potentially infinite) stream.
How do I validate things
The case
The basics. Let's say we are in a web app exposing a JSON REST endpoint with lots of parameters. As a client, you do not want to fall into this loop:
- create JSON payload
- make HTTP request to test it
- receive a Bad Request response indicating an invalid parameter
- fix the parameter
- re-send the request
- get back another Bad Request response because of a different invalid parameter
- and so on.
The same also applies to validating a simple web form. As a user, you want a chance to fix your mistakes once and for all. In this post we will deal with the configuration of an application which is backed by lightbend config.
Lightbend Config is a simple library that loads a configuration written in HOCON (a format similar to JSON, but with comments) and lets us retrieve configuration elements using a simple path like somepart.subpart.property. It is written in Java and is not really oriented towards functional programming: it throws exceptions, for instance. In this post we won't use anything more complicated than:
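import com.typesafe.config.Config;
import com.typesafe.config.ConfigFactory;

// Loads the default configuration (application.conf on the classpath)
Config conf = ConfigFactory.load();
int bar1 = conf.getInt("foo.bar");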
This example should speak for itself about how simple this library is to use.
One more dependency will be added to the project: Guava, for these reasons:
- highlight how to deal with an external object
- we will deal with hostnames and ports
- because the aim of this post is not to discuss valid schemes in hostnames and ports.
Specification
The configuration we will validate is not a toy situation, but a real-world use case: the configuration of a Kafka Streams application, consisting mainly of two parts:
- the configuration of the kafka stream part
- some business configuration.
For the Kafka part, it is simple. A Kafka stream application only requires two parameters : a list of valid hosts as bootstrap servers and an application identifier :
kafka {
bootstrapServers: ["localhost:9092", "localhost:9093"]
applicationId: salesApp
}
Let's say the application is used in a sales context, where each element of the input stream corresponds to a sale made with a budget. Within a defined time window, and depending on the kind of item sold, a threshold determines when a review of the period that just ended is emitted to the output stream.
The application logic won't be a concern in this post and the only thing interesting will be the configuration needed for the threshold :
app {
thresholdA : 2003
thresholdB : 4878
thresholdC : 7889
}
To add more constraints, the following relation must be enforced :
0 <= thresholdA < thresholdB < thresholdC <= 10000
Solutions in Object Oriented / Java land
When looking at this problem, we can see two solutions:
- using the features of JSR 303 (bean validation) and JSR 380.
- doing it by hand with if statements
Let's see what we can come up with using each of them...
Let's use bean validation
JSR 303 was created to solve exactly this kind of validation problem. In this section, we will try to use it to validate our model. As usual with Java standards, there are several implementations. In this article, we will use hibernate-validator, which is the reference implementation of the specification.
First thing to note is the list of requirements: hibernate-validator depends on Unified EL (JSR 341) and CDI (not sure about that one though).
We get a hint of how the Hibernate implementation works internally when reading the section about SecurityManager: the framework relies heavily on reflection at runtime.
Validating things in the JSR 303 context requires an instance of Validator. Creating one is easy: we only need a call to a factory.
ValidatorFactory factory = Validation.buildDefaultValidatorFactory();
Validator validator = factory.getValidator();
Before actually validating anything, we will start with a model: the BusinessConfig class, which holds the three thresholds (first shot):
public final class BusinessConfig {
@Min(0)
public int thresholdA;
public int thresholdB;
@Max(10000)
public int thresholdC;
public BusinessConfig(int thresholdA, int thresholdB, int thresholdC){
this.thresholdA = thresholdA;
this.thresholdB = thresholdB;
this.thresholdC = thresholdC;
}
// Omitted imports & getters/setters for brevity
}
Using the @Min and @Max annotations, we easily specify constraints on thresholdA and thresholdC. The constraint on thresholdB is different, as it depends on the other two values. To cope with that, we will have to write a class-level constraint, which means:
- writing a custom annotation
- writing a validator for it
Writing the annotation is straightforward :
@Target({ElementType.TYPE, ElementType.ANNOTATION_TYPE})
@Retention(RetentionPolicy.RUNTIME)
@Constraint(validatedBy = {ThresholdConstraintValidator.class})
public @interface ThresholdConstraint {
String message() default "{eu.enhan.validation.java.jsr303.ThresholdConstraint.message}";
Class<?>[] groups() default {};
Class<? extends Payload>[] payload() default {};
}
We can see that we added a validatedBy parameter so that the annotation is wired to the ThresholdConstraintValidator class. Note that, as expected, the retention of the annotation is RUNTIME, which is evidence that the framework will do some reflection at runtime.
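To make the link between RUNTIME retention and reflection concrete, here is a minimal sketch using only the JDK. The AtLeast annotation, the SampleThresholds bean and the ReflectiveCheck class are hypothetical names invented for this illustration. This is not how hibernate-validator is implemented, only the general idea of reading an annotation while the program runs.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Hypothetical constraint annotation, standing in for something like @Min.
// Without RUNTIME retention, getAnnotation() below would always return null.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface AtLeast {
    int value();
}

// Hypothetical bean carrying the annotation.
final class SampleThresholds {
    @AtLeast(0)
    public int thresholdA;

    SampleThresholds(int thresholdA) {
        this.thresholdA = thresholdA;
    }
}

final class ReflectiveCheck {
    // Inspects the public int fields of the bean and checks every @AtLeast found.
    static List<String> violations(Object bean) {
        List<String> result = new ArrayList<>();
        for (Field field : bean.getClass().getFields()) {
            AtLeast constraint = field.getAnnotation(AtLeast.class);
            if (constraint == null || field.getType() != int.class) {
                continue;
            }
            try {
                if (field.getInt(bean) < constraint.value()) {
                    result.add(field.getName() + " must be at least " + constraint.value());
                }
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read field " + field.getName(), e);
            }
        }
        return result;
    }
}
```

Calling ReflectiveCheck.violations(new SampleThresholds(-1)) returns a single message about thresholdA, while a bean holding 5 yields an empty list. Nothing is checked by the compiler here: the constraint is only discovered when the code runs.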
The implementation of the ThresholdConstraintValidator class is not complex either:
public class ThresholdConstraintValidator implements ConstraintValidator<ThresholdConstraint, BusinessConfig> {
@Override
public boolean isValid(BusinessConfig value, ConstraintValidatorContext context) {
if (value == null)
return false;
return value.thresholdA < value.thresholdB && value.thresholdB < value.thresholdC;
}
}
One line is enough to specify how to validate thresholdB.
One last thing to do is to add our ThresholdConstraint annotation to the BusinessConfig class before a first run with a program such as:
ValidatorFactory factory = Validation.buildDefaultValidatorFactory();
Validator validator = factory.getValidator();
Config config = ConfigFactory.load();
int tA = config.getInt("app.thresholdA");
int tB = config.getInt("app.thresholdB");
int tC = config.getInt("app.thresholdC");
BusinessConfig businessConfig = new BusinessConfig(tA, tB, tC);
Set<ConstraintViolation<BusinessConfig>> constraintViolations = validator.validate(businessConfig);
If we run it in a standard main() and add a loop to display the violations, we will quickly see that our custom validator is not called at all. That is because we need to add it to the resources/META-INF/services/javax.validation.ConstraintValidator file. Time to run again.
This time you will get an error at runtime :
Exception in thread "main" javax.validation.UnexpectedTypeException: HV000150: The constraint eu.enhan.validation.java.jsr303.ThresholdConstraint defines multiple validators for the type eu.enhan.validation.java.jsr303.BusinessConfig: eu.enhan.validation.java.jsr303.ThresholdConstraintValidator, eu.enhan.validation.java.jsr303.ThresholdConstraintValidator. Only one is allowed.
This is apparently due to the annotation @Constraint(validatedBy = {ThresholdConstraintValidator.class}) in ThresholdConstraint, which seems to register the same validator a second time. Putting an empty array there solves the issue.
We can customize the path on which the error appears, to make it clear that the error is on thresholdB, using the context provided as a parameter of isValid(). The method becomes:
@Override
public boolean isValid(BusinessConfig value, ConstraintValidatorContext context) {
if (value == null)
return false;
boolean valid = value.thresholdA < value.thresholdB && value.thresholdB < value.thresholdC;
if (!valid) {
context.disableDefaultConstraintViolation();
context.buildConstraintViolationWithTemplate("thresholdB should verify thresholdA < thresholdB < thresholdC")
.addPropertyNode("thresholdB")
.addConstraintViolation();
}
return valid;
}
This makes it possible to report the constraint violation on the correct field.
Now, let's continue by validating the Kafka configuration. As seen before, it is a simple bean with 2 fields: a list of hosts and an application id. As we did with BusinessConfig, we first have to extract the values from the config. This is the easy part:
List<String> rawBootstrapServers = config.getStringList("kafka.bootstrapServers");
String applicationId = config.getString("kafka.applicationId");
Once done, let's convert these strings into a Kafka configuration. As stated before, and to try out how each solution integrates with external code, we will use Guava's HostAndPort class to model Kafka bootstrap servers.
HostAndPort already comes with built-in validation: there is no public constructor, only factory methods that either throw an exception or return a valid instance. How do we integrate it with the bean validation specification, then? Surprisingly, the documentation does not mention such cases, so validating beans that would require modification (i.e. adding annotations or public constructors) does not seem to be supported. We will have to do it by hand.
List<HostAndPort> bootstrapServers = new ArrayList<>();
List<String> invalidHosts = new ArrayList<>();
for (String rawBootstrapServer : rawBootstrapServers) {
try {
bootstrapServers.add(HostAndPort.fromString(rawBootstrapServer).withDefaultPort(9092));
} catch (IllegalArgumentException e) {
invalidHosts.add(rawBootstrapServer);
}
}
This results in an awkward situation where the results of the validation process cannot be merged together easily. And we did not handle the case where the bootstrapServers list is empty, or where a parameter is missing. Clearly, we should do it in a different manner.
In this use case, bean validation appears to be pretty weak, as it does not handle the case where the beans we want to validate come from another library. Going further, the result of the validation is a set of ConstraintViolation, which is designed to be displayed via the standard message mechanism. Transforming these violations into something else does not seem to be an easy task, but I might be wrong, as I did not dig further.
However, externalizing common behavior is an excellent idea, and the specification does answer a specific kind of validation problem.
Let's do it by hand
With the Bean Validation spec, we hit some cases where integrating with external beans is tedious. What if we did the validation by hand? Is the boilerplate that big? Will it clearly highlight the benefits of Hibernate Validator?
To get started with this solution, we just need to write the code. In idiomatic Java, this solution will be based on the standard library for lists, sets, etc.
To avoid the inflexibility of ConstraintViolation, we will write our own structure to handle errors. The configuration file can contain these mistakes:
- completely bad syntax (config wasn't able to parse the file)
- one parameter is absent
- list of bootstrap servers is empty
- a host string is not a valid host and port
- thresholdA is too low
- thresholdC is too high
- thresholdB is not between thresholdA and thresholdC
This could be translated into the following type hierarchy:
public class ConfigErrors {
private ConfigErrors() {
}
interface ConfigError {
}
static class CouldNotParse implements ConfigError {
}
static class ParameterIsMissing implements ConfigError {
public final String parameterName;
public ParameterIsMissing(String parameterName) {
this.parameterName = parameterName;
}
}
static class NoBootstrapServers implements ConfigError {
}
static class InvalidHost implements ConfigError{
public final String incorrectValue;
public final int positionInArray;
public InvalidHost(String incorrectValue, int positionInArray) {
this.incorrectValue = incorrectValue;
this.positionInArray = positionInArray;
}
}
static class ThresholdATooLow implements ConfigError {
public final int incorrectValue;
public final int minAllowedValue;
public ThresholdATooLow(int incorrectValue, int minAllowedValue) {
this.incorrectValue = incorrectValue;
this.minAllowedValue = minAllowedValue;
}
}
static class ThresholdCTooHigh implements ConfigError {
public final int incorrectValue;
public final int maxAllowedValue;
public ThresholdCTooHigh(int incorrectValue, int maxAllowedValue) {
this.incorrectValue = incorrectValue;
this.maxAllowedValue = maxAllowedValue;
}
}
static class ThresholdBNotInBetween implements ConfigError {
public final int incorrectValue;
public final int suppliedA;
public final int suppliedC;
public ThresholdBNotInBetween(int incorrectValue, int suppliedA, int suppliedC) {
this.incorrectValue = incorrectValue;
this.suppliedA = suppliedA;
this.suppliedC = suppliedC;
}
}
}
Pretty verbose, but having total control of this type hierarchy makes it easy to add methods to the interface (such as a toJSON(), for instance) or to use a visitor pattern to traverse a list of errors.
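As a rough illustration of what "adding a method to the interface" could look like, here is a reduced sketch with only two error cases. The names DescribableError, MissingParameter, HostInvalid, ErrorReport and the describe() method are made up for this example and are not part of the hierarchy above.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical, simplified stand-in for the ConfigError interface,
// extended with one rendering method.
interface DescribableError {
    String describe();
}

final class MissingParameter implements DescribableError {
    final String parameterName;

    MissingParameter(String parameterName) {
        this.parameterName = parameterName;
    }

    @Override
    public String describe() {
        return "missing parameter: " + parameterName;
    }
}

final class HostInvalid implements DescribableError {
    final String incorrectValue;
    final int positionInArray;

    HostInvalid(String incorrectValue, int positionInArray) {
        this.incorrectValue = incorrectValue;
        this.positionInArray = positionInArray;
    }

    @Override
    public String describe() {
        return "invalid host at index " + positionInArray + ": " + incorrectValue;
    }
}

final class ErrorReport {
    // Renders every accumulated error on its own line.
    static String render(List<DescribableError> errors) {
        return errors.stream()
                .map(DescribableError::describe)
                .collect(Collectors.joining("\n"));
    }
}
```

Because we own the interface, each error decides how it is rendered, and the reporting code never needs to inspect concrete types.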
Validating the bootstrapServers parameter is straightforward, as it is almost the same code we used before, except that we output the error as an InvalidHost. A failure on bootstrapServers may also be caused by an empty list. Taking all this into consideration, we come up with this (for now, we set aside the case where a parameter is absent):
List<String> rawBootstrapServers = config.getStringList("kafka.bootstrapServers");
List<ConfigErrors.ConfigError> errors = new ArrayList<>();
List<HostAndPort> bootstrapServers = new ArrayList<>();
if (rawBootstrapServers.isEmpty()) {
errors.add(new ConfigErrors.NoBootstrapServers());
} else {
String rawBootstrapServer;
for (int i = 0; i < rawBootstrapServers.size(); i++) {
rawBootstrapServer = rawBootstrapServers.get(i);
try {
bootstrapServers.add(HostAndPort.fromString(rawBootstrapServer).withDefaultPort(9092));
} catch (IllegalArgumentException e) {
errors.add(new ConfigErrors.InvalidHost(rawBootstrapServer, i));
}
}
}
Note that we created a list of ConfigError to accumulate errors: at the end of the whole validation procedure, we just have to check whether this list is empty to determine if the validation was successful. We also created a list of bootstrapServers to keep the valid ones.
The business config is also quite easy, even if verbose (still assuming that config does not throw exceptions):
int tA = config.getInt("app.thresholdA");
int tB = config.getInt("app.thresholdB");
int tC = config.getInt("app.thresholdC");
if (tA < 0) {
errors.add(new ThresholdATooLow(tA, 0));
}
if (tC > 10000) {
errors.add(new ThresholdCTooHigh(tC, 10000));
}
if (tB >= tC || tB <= tA) {
errors.add(new ThresholdBNotInBetween(tB, tA, tC));
}
Now, let's see how to handle the case where config.getXXX() throws exceptions. A pre-Java 8 solution could look like this:
List<String> rawBootstrapServers;
try {
rawBootstrapServers = config.getStringList("kafka.bootstrapServers");
} catch (ConfigException.Missing | ConfigException.WrongType ex) {
errors.add(new ConfigErrors.ParameterIsMissing("kafka.bootstrapServers"));
rawBootstrapServers = null;
}
// and then :
if (rawBootstrapServers != null) {
// Verifications as we did earlier...
}
Continuing with thresholdA, we will have:
Integer tA;
try {
tA = config.getInt("app.thresholdA");
} catch (ConfigException.Missing | ConfigException.WrongType ex) {
errors.add(new ConfigErrors.ParameterIsMissing("app.thresholdA"));
tA = null;
}
// And then
if (tA != null && tA < 0) {
errors.add(new ConfigErrors.ThresholdATooLow(tA, 0));
}
The exact same code applies to thresholdC. To validate thresholdB, though, we depend on both thresholdA and thresholdC, so we need something a bit more complicated:
if (tA != null && tB != null && tC != null && (tB >= tC || tB <= tA)) {
errors.add(new ConfigErrors.ThresholdBNotInBetween(tB, tA, tC));
}
There is a lot of code which has the same shape :
TypeXXX p;
try {
p = config.getTypeXXX("path.to.param");
} catch (ConfigException.Missing | ConfigException.WrongType ex) {
errors.add(new ConfigErrors.ParameterIsMissing("path.to.param"));
p = null;
}
First thing we could do is to transform this pattern into a method :
TypeXXX getTypeXXX(Config cfg, String path, List<ConfigError> errors) {
try {
return cfg.getTypeXXX(path);
} catch (ConfigException.Missing | ConfigException.WrongType ex) {
errors.add(new ConfigErrors.ParameterIsMissing(path));
return null;
}
}
This way, we factor out the try/catch pattern and keep accumulating errors in an external list. We then need to write the same method for every type we use in our config (Integer, String, List<String>). This is still tedious, and we may wonder whether we could make this solution generic over the type of data to extract. The idea is to decorate the call to config.getXXX() with the try/catch pattern. Taking advantage of Java 8 features, we create a generic get() method which requires a Supplier<XXX>.
public static <A> A get(String path, List<ConfigErrors.ConfigError> errors, Supplier<A> extractor) {
try {
return extractor.get();
} catch (ConfigException.Missing | ConfigException.WrongType ex) {
errors.add(new ConfigErrors.ParameterIsMissing(path));
return null;
}
}
If we put aside the runtime exceptions, which do not appear at all in the supplier signature, we can see that functions are a nice way to inject behavior into a general algorithm. Our get() function does not depend on Config at all. Yet it ensures that we return null if the extractor threw an exception (one of those documented for config.getXXX(), though). One major flaw of this design is precisely that it relies on such exceptions. What if the extractor is not based on config and throws another kind of exception? Another flaw is the list of errors that we mutate by adding elements to it. What will happen if we run validations in parallel? We would then need a thread-safe implementation of List. Also, testing it requires testing the side effect (adding to the list in case of error). Example of a test:
@Test
void get_should_return_null_when_throwing_exception() {
// Given
List<ConfigErrors.ConfigError> errors = new ArrayList<>();
// When
String str = Idiomatic2Sample.get("some.path", errors, () -> {throw new ConfigException.Missing("some.path");} );
// Then
Assertions.assertThat(str).isNull();
Assertions.assertThat(errors).hasSize(1);
}
Not much work, but asserting only on the result object would be easier and less error-prone: we are unlikely to forget to assert on the result, while we may well forget to assert on the errors list when writing the test.
We can go further still by avoiding null and benefiting from the Optional abstraction brought by Java 8:
<A> Optional<A> get(String path, List<ConfigErrors.ConfigError> errors, Supplier<A> extractor) {
try {
return Optional.ofNullable(extractor.get());
} catch (ConfigException.Missing | ConfigException.WrongType ex) {
errors.add(new ConfigErrors.ParameterIsMissing(path));
return Optional.empty();
}
}
This does not change much from our previous version of get(). What changes is how we use the returned optional:
// Validating thresholdA
Optional<Integer> tA = get("app.thresholdA", errors, () -> config.getInt("app.thresholdA"));
tA.ifPresent(p -> {
if (p < 0 ){
errors.add(new ConfigErrors.ThresholdATooLow(p, 0));
}
});
There, we can perform the validation without a null check. The same applies to thresholdC. But for thresholdB, we also need both tA and tC to be non-empty. We have some possibilities. The first one is to use nested ifPresent() calls:
tA.ifPresent(a -> {
tC.ifPresent(c -> {
tB.ifPresent(b -> {
if (b >= c || b <= a){
errors.add(new ConfigErrors.ThresholdBNotInBetween(b, a, c));
}
});
});
});
How horrible! And, think about it: at the end of the process, when we use the three threshold values to do something useful in our application, we will again have the same kind of code. There must be another way to combine those optionals. If we had two optionals, what would it mean to combine them? An answer to this question could be: combining two optionals returns an optional which is non-empty only if both inputs are non-empty, and whose content is the result of combining both contained values. In code, this translates to the following signature, which says "given optionals a and b and a function comb that combines the elements contained in a and b, return an optional":
public static <A,B,C> Optional<C> combine(Optional<A> a, Optional<B> b, BiFunction<A,B,C> comb)
The implementation really looks like our nested ifPresent() above:
public static <A,B,C> Optional<C> combine(Optional<A> a, Optional<B> b, BiFunction<A,B,C> comb) {
return a.flatMap( aValue ->
b.map(bValue -> comb.apply(aValue, bValue)
)
);
}
It looks like magic, but we just used the API of Optional:
- flatMap takes as parameter a function which takes as input the content of the optional and itself returns an optional
- map applies a function taking as input the content of the optional
The benefit of such a function is that it works for any types A, B, and C.
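To see the behavior on concrete values, here is a small self-contained sketch. It repeats the combine function so that the snippet compiles on its own; the OptionalCombine class name is only a placeholder for this illustration.

```java
import java.util.Optional;
import java.util.function.BiFunction;

final class OptionalCombine {
    // Same shape as the combine function above: the result is non-empty
    // only when both inputs are non-empty.
    static <A, B, C> Optional<C> combine(Optional<A> a, Optional<B> b, BiFunction<A, B, C> comb) {
        return a.flatMap(aValue -> b.map(bValue -> comb.apply(aValue, bValue)));
    }
}
```

With both values present, combine(Optional.of(2003), Optional.of(4878), (x, y) -> x < y) yields an optional containing true. As soon as one input is empty, for instance combine(Optional.<Integer>empty(), Optional.of(1), (x, y) -> x + y), the combining function is never called and the result is empty.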
Generalizing it to combine three optionals could be as simple as :
public static <A,B,C,D> Optional<D> combine(Optional<A> a, Optional<B> b, Optional<C> c, TriFunction<A,B,C,D> comb){
return a.flatMap( aValue ->
b.flatMap(bValue ->
c.map(cValue ->
comb.apply(aValue, bValue, cValue)
)
)
);
}
But Java does not provide such a TriFunction (a function which has 3 parameters). Note how we nest flatMap/map, though, in what looks like a recursive structure. We could extrapolate up to a combine taking n optionals as parameters... But we hit the limits of the language to express such things in an easy and readable way.
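One way around the missing type is to declare it ourselves. The sketch below assumes a hand-written TriFunction interface (it does not exist in java.util.function) and a placeholder class OptionalCombine3; the ordered() helper is only there to show the three-way combination applied to the thresholds.

```java
import java.util.Optional;

// Hypothetical interface: the JDK has Function and BiFunction, but nothing for 3 arguments.
@FunctionalInterface
interface TriFunction<A, B, C, D> {
    D apply(A a, B b, C c);
}

final class OptionalCombine3 {
    // Two flatMap calls and a final map: one more level of nesting per extra optional.
    static <A, B, C, D> Optional<D> combine(Optional<A> a, Optional<B> b, Optional<C> c,
                                            TriFunction<A, B, C, D> comb) {
        return a.flatMap(aValue ->
                b.flatMap(bValue ->
                        c.map(cValue -> comb.apply(aValue, bValue, cValue))));
    }

    // True when the thresholds are strictly ordered; empty when any of them is missing.
    static Optional<Boolean> ordered(Optional<Integer> tA, Optional<Integer> tB, Optional<Integer> tC) {
        return combine(tA, tB, tC, (x, y, z) -> x < y && y < z);
    }
}
```

Each additional optional requires a new interface and a new overload, which is exactly the kind of boilerplate that makes the n-ary version impractical in plain Java.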
Let's go back to our validation problem. What do we really expect from validation ? When we validate something, we expect either :
- a success, ie. the correct value,
- a list of errors
Then, instead of using Optional, can we do better? Optional models the presence or absence of a value. In our case, we need to model the correctness of a value and the possible errors. Such a type would be similar to Optional but, instead of being just empty, it would contain errors.
We won't dig further in this direction using Java: the more we try to abstract, the more awkward and the less readable the code becomes.
We encountered some drawbacks in this "by hand" version:
- accumulation of errors was done using a list that we passed around and mutated, causing side effects on it, for instance,
- composing the validated results is not clear: we are certain to be in a valid configuration only when inside the if (errors.isEmpty()) block.
But we also scratched the surface of some functional programming aspects:
- enforcing genericity instead of runtime reflection,
- we used some functions, especially in the get() method, to supply some behavior.
For the next step, we will switch to Scala and Kotlin to make things easier, and we will dig into Either to start with.
Solutions in FP
Scala and Kotlin are better suited than Java when it comes to statically and strongly typed functional programming. Both languages offer features which make things easier. For instance, in Kotlin, the BiFunction has type (A, B) -> C. In Scala, it is (A, B) => C. With this notation, it is easy to see which types are inputs and which one is the output.
Meet Either
As we stated at the end of the Java part, validation should be a process returning either the correct value (x)or an error. We will see how we can handle multiple errors later. This duality of the validation result (either success or failure) can be modeled like this (in Java):
public abstract class Either<L, R> {
private Either() {
}
public static final class Left<L, R> extends Either<L, R> {
public final L left;
public Left(L left) {
this.left = left;
}
}
public static final class Right<L, R> extends Either<L, R> {
public final R right;
public Right(R right) {
this.right = right;
}
}
}
As with the previous example of ConfigError, this kind of construct is incredibly verbose in Java. In a nutshell, we just intended to create an abstract type Either which may only have two implementations: Left and Right. This is called a sealed hierarchy (well, in Java, it is not really sealed and checked by the compiler, but you get the idea). Either is parametric in L (representing errors on the left side) and R (representing successes on the right side).
If you are stuck with Java, since no such type exists in the standard library, you can check Vavr, which provides an implementation of a type called Either.
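For readers who want to see how such a type can be consumed in plain Java, here is a minimal sketch of an Either exposing a single fold method that forces the caller to handle both cases. SimpleEither and SimpleEitherDemo are names invented for this illustration; this is not the Vavr API.

```java
import java.util.function.Function;

// A sketch only: a two-case type whose sole way out is fold().
abstract class SimpleEither<L, R> {
    // Private constructor: only the factories below can create subclasses.
    private SimpleEither() {
    }

    // Applies onLeft to an error, or onRight to a success.
    abstract <T> T fold(Function<? super L, ? extends T> onLeft,
                        Function<? super R, ? extends T> onRight);

    static <L, R> SimpleEither<L, R> left(L error) {
        return new SimpleEither<L, R>() {
            @Override
            <T> T fold(Function<? super L, ? extends T> onLeft,
                       Function<? super R, ? extends T> onRight) {
                return onLeft.apply(error);
            }
        };
    }

    static <L, R> SimpleEither<L, R> right(R value) {
        return new SimpleEither<L, R>() {
            @Override
            <T> T fold(Function<? super L, ? extends T> onLeft,
                       Function<? super R, ? extends T> onRight) {
                return onRight.apply(value);
            }
        };
    }
}

final class SimpleEitherDemo {
    // Both branches must be supplied, so the error case cannot be forgotten.
    static String describe(SimpleEither<String, Integer> result) {
        return result.fold(error -> "invalid: " + error, value -> "thresholdA = " + value);
    }
}
```

Here describe(SimpleEither.right(2003)) gives "thresholdA = 2003" and describe(SimpleEither.left("missing")) gives "invalid: missing". The fold method plays the role that exhaustive pattern matching plays in languages that support it.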
In Kotlin, things like this are easier since the language supports sealed hierarchies :
sealed class Either<out L, out R>{
data class Left<out L, out R>(val left: L): Either<L, R>()
data class Right<out L, out R>(val right: R): Either<L, R>()
}
val e: Either<ConfigError, Int> = // Some code
when (e){
is Either.Left -> // Handle error case
is Either.Right -> // Handle normal case
}
The compiler checks that we handle all cases in the when statement, which is a better Java switch. In Kotlin, the Either data type has an implementation in Arrow, a library for functional programming which we will use from now on.
Scala enables such constructs in a similar manner :
sealed abstract class Either[+L, +R]
case class Right[+L, +R](value: R) extends Either[L,R]
case class Left[+L, +R](value: L) extends Either[L,R]
In Scala, the Either data type is part of the standard library.
Now, let's see how Either can be helpful in our validation context.
First, let's rewrite in Kotlin our hierarchy of errors :
sealed class ConfigError {
object CouldNotParse: ConfigError()
data class ParameterIsMissing(val parameterName: String): ConfigError()
object NoBootstrapServers: ConfigError()
data class InvalidHost(val incorrectValue: String, val positionInArray: Int) : ConfigError()
data class ThresholdATooLow(val incorrectValue: Int, val minAllowedValue: Int): ConfigError()
data class ThresholdCTooHigh(val incorrectValue: Int, val maxAllowedValue: Int): ConfigError()
data class ThresholdBNotInBetween(val incorrectValue: Int, val suppliedA: Int, val suppliedC: Int): ConfigError()
}
Then, we could change the signature of our previous get() function to:
fun <A> get(path: String, extractor: (String) -> A) : Either<ConfigError, A> = try {
Either.right(extractor(path))
} catch (e: ConfigException.Missing){
Either.left(ConfigError.ParameterIsMissing(path))
}
// And use it :
val tAe: Either<ConfigError, Int> = get("app.thresholdA", { p -> config.getInt(p)})
Conceptually, compared to the Java version, we just changed 3 things:
- the extractor is a function taking a path as parameter (this avoids repeating the path for config)
- there is no List<ConfigError> as parameter
- the error, if any, is returned as a left, while a success is returned as a right.
Just to be on par, here is the same in Scala :
// Error types:
abstract sealed class ConfigError
object CouldNotParse extends ConfigError
case class ParameterIsMissing(parameterName: String) extends ConfigError
object NoBootstrapServers extends ConfigError
case class InvalidHost(incorrectValue: String, positionInArray: Int) extends ConfigError
case class ThresholdATooLow(incorrectValue: Int, minAllowedValue: Int) extends ConfigError
case class ThresholdCTooHigh(incorrectValue: Int, maxAllowedValue: Int) extends ConfigError
case class ThresholdBNotInBetween(incorrectValue: Int, suppliedA: Int, suppliedC: Int) extends ConfigError
// The get function
def get[A](path: String, extractor: String => A): Either[ConfigError, A] = try {
Right(extractor(path))
} catch {
case _: ConfigException.Missing => Left(ParameterIsMissing(path))
case _: ConfigException.WrongType => Left(CouldNotParse)
}
// Usage :
val tAe = get("app.thresholdA", p => config.getInt(p))
So far so good. Let's now validate our constraints on thresholds. Like Optional, Either has a map and a flatMap method. This allows us to do (in Kotlin):
val tAe = get("app.thresholdA", { p -> config.getInt(p)}).flatMap { unvalidatedTa ->
if (unvalidatedTa < 0 )
Either.left(ConfigError.ThresholdATooLow(unvalidatedTa, 0))
else
Either.right(unvalidatedTa)
}
and in Scala:
val tAe = get("app.thresholdA", p => config.getInt(p)).flatMap{ notValidated =>
if (notValidated < 0)
Left(ThresholdATooLow(notValidated, 0))
else
Right(notValidated)
}
What did we do? We just added a flatMap whose parameter (the function) is only called if the Either instance is a Right. We can then write the same kind of validation for thresholdC, and the reading of thresholdB. Let's see how to validate thresholdB by combining several instances (3 in our case) of Either.
In Kotlin with Arrow it comes down to:
val tBe = Either.monad<ConfigError>().binding{
val ta = unvalidatedTAE.bind()
val tb = unvalidatedTBE.bind()
val tc = unvalidatedTCE.bind()
if (ta < tb && tb < tc)
Either.right(tb).bind()
else
Either.left(ConfigError.ThresholdBNotInBetween(tb, ta, tc)).bind()
}
In Scala:
val tBe: Either[ConfigError, Int] = for {
  a <- unvalidatedTAE
  b <- unvalidatedTBE
  c <- unvalidatedTCE
  // this last generator turns the whole comprehension into a Left when the ordering is violated
  validB <- if (a < b && b < c) Right(b) else Left(ThresholdBNotInBetween(b, a, c))
} yield validB
Remember how we chained flatMap to combine optionals? Here, with the support of the language/library, we can reduce this nesting. In reality, it is the exact same thing, except that it is more readable.
Let's focus now on how to build the BusinessConfig. Naively, we could be tempted to reuse the same mechanism we used to validate thresholdB and, instead of creating an Either instance, create a BusinessConfig. Unfortunately, because we never did anything to accumulate the errors (remember, underneath it is flatMap and map), this gives fail-fast feedback: only the first error is reported. This shows that we are still missing an abstraction. How do we accumulate the errors, then?
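The fail-fast behavior can be reproduced with a few lines of plain Java. The sketch below uses a hand-rolled, right-biased FailFastEither and two hypothetical checks (atLeast and atMost); none of these names come from Arrow, cats or Vavr.

```java
import java.util.function.Function;

// Minimal right-biased Either: flatMap only runs the next step on a success.
final class FailFastEither<L, R> {
    private final L error;
    private final R value;
    private final boolean success;

    private FailFastEither(L error, R value, boolean success) {
        this.error = error;
        this.value = value;
        this.success = success;
    }

    static <L, R> FailFastEither<L, R> left(L error) {
        return new FailFastEither<>(error, null, false);
    }

    static <L, R> FailFastEither<L, R> right(R value) {
        return new FailFastEither<>(null, value, true);
    }

    <T> FailFastEither<L, T> flatMap(Function<R, FailFastEither<L, T>> next) {
        if (success) {
            return next.apply(value);
        }
        // The next step is skipped: whatever it would have reported is lost.
        return FailFastEither.<L, T>left(error);
    }

    <T> FailFastEither<L, T> map(Function<R, T> f) {
        return flatMap(v -> FailFastEither.<L, T>right(f.apply(v)));
    }

    String describe() {
        return success ? "Right(" + value + ")" : "Left(" + error + ")";
    }
}

final class FailFastDemo {
    static FailFastEither<String, Integer> atLeast(String name, int value, int min) {
        return value < min
                ? FailFastEither.<String, Integer>left(name + " too low: " + value)
                : FailFastEither.<String, Integer>right(value);
    }

    static FailFastEither<String, Integer> atMost(String name, int value, int max) {
        return value > max
                ? FailFastEither.<String, Integer>left(name + " too high: " + value)
                : FailFastEither.<String, Integer>right(value);
    }

    // Chains both checks the same way a for-comprehension or binding block would.
    static FailFastEither<String, String> both(int a, int c) {
        return atLeast("thresholdA", a, 0).flatMap(validA ->
                atMost("thresholdC", c, 10000).map(validC -> validA + ".." + validC));
    }
}
```

With two invalid inputs, FailFastDemo.both(-5, 20000).describe() returns "Left(thresholdA too low: -5)": the problem with thresholdC is never evaluated, so the user would have to fix the first error before learning about the second.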
Validated and non empty lists to the rescue
If you browse the Arrow documentation, you may have seen mentions of Validated. This data type is designed for what we need and is the whole point of this post: validation.
Very similar to Either, it supports NonEmptyList as its left type. A NonEmptyList is a list which we know, at compile time, is not empty. The type is often aliased to Nel. Validated instances can be obtained from Either instances and vice versa.
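To give an intuition of how non-emptiness can be guaranteed by construction, here is a rough Java sketch. NonEmptyListSketch is a name invented for this example; the real NonEmptyList types in Arrow and cats offer a much richer API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// The head is mandatory, so no instance can ever be empty.
final class NonEmptyListSketch<A> {
    private final A head;
    private final List<A> tail;

    private NonEmptyListSketch(A head, List<A> tail) {
        this.head = head;
        this.tail = Collections.unmodifiableList(new ArrayList<>(tail));
    }

    // At least one element is required by the signature itself.
    @SafeVarargs
    static <A> NonEmptyListSketch<A> of(A head, A... rest) {
        return new NonEmptyListSketch<>(head, Arrays.asList(rest));
    }

    // Always safe: no exception, no Optional.
    A head() {
        return head;
    }

    // Concatenating two non-empty lists gives a non-empty list.
    NonEmptyListSketch<A> concat(NonEmptyListSketch<A> other) {
        List<A> newTail = new ArrayList<>(tail);
        newTail.add(other.head);
        newTail.addAll(other.tail);
        return new NonEmptyListSketch<>(head, newTail);
    }

    List<A> toList() {
        List<A> all = new ArrayList<>();
        all.add(head);
        all.addAll(tail);
        return all;
    }
}
```

This matters for validation: an invalid result carrying a non-empty list of errors can never claim to be a failure while holding zero errors.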
In Kotlin:
// earlier
val tAe: Either<ConfigError, Int> = // see before
// and then
val tAV: Validated<ConfigError, Int> = Validated.fromEither(tAe)
val tBV: Validated<ConfigError, Int> = Validated.fromEither(tBe)
val tCV: Validated<ConfigError, Int> = Validated.fromEither(tCe)
And in Scala:
val tAV: Validated[ConfigError, Int] = tAe.toValidated
val tBV: Validated[ConfigError, Int] = tBe.toValidated
val tCV: Validated[ConfigError, Int] = tCe.toValidated
Validated comes with a built-in ability (more on that in another post, I guess) to combine values on the left, provided the left type is able to accumulate. So we lift the left type to Nel. This is done in Kotlin with:
val tAV: ValidatedNel<ConfigError, Int> = Validated.fromEither(tAe).toValidatedNel()
val tBV: ValidatedNel<ConfigError, Int> = Validated.fromEither(tBe).toValidatedNel()
val tCV: ValidatedNel<ConfigError, Int> = Validated.fromEither(tCe).toValidatedNel()
And in Scala with:
val tAV: ValidatedNel[ConfigError, Int] = tAe.toValidatedNel
val tBV: ValidatedNel[ConfigError, Int] = tBe.toValidatedNel
val tCV: ValidatedNel[ConfigError, Int] = tCe.toValidatedNel
The composition itself uses several concepts underneath and the Kotlin version makes it explicit:
val businessConfValidationResult: ValidatedNel<ConfigError, BusinessConfig> =
ValidatedNel.applicative(Nel.semigroup<ConfigError>()).map(tAV, tBV, tCV, {
BusinessConfig(it.a, it.b, it.c)
}).fix()
What is this applicative? And what the hell is a semigroup? Answering those questions would take a lot of time and is not the purpose of this post (I might write about them later though), but using those concepts does not require understanding them completely. Here are the things to understand:
- Applicative and Semigroup are what we call typeclasses.
- A typeclass is somehow a set of properties (laws). In Java, think about the Comparable<A> interface: it defines a behavior and you can provide instances of it for any type A.
- To use a typeclass, you need instances of it. You can see an instance of a typeclass as the proof or evidence of the capability of a type. If we take the example of Comparable<A>, you can then write functions that are completely ignorant of type A but that you can use as long as their parameters are able to compare. In languages like Scala and Kotlin, the main advantage is that passing instances of typeclasses can be done implicitly: the compiler is able to inject the correct instance, if any, or raise a compilation error if no instance could be found. Mechanisms differ between Kotlin and Scala, however, but this is again a complex topic in its own right which we won't discuss here (and now).
- Applicative is what makes the validation not fail fast: it allows running independent computations.
- Semigroup is simpler: it just defines on a type A a combination function which takes as input two elements of A and returns an element of the same type A. This function must be associative, which is to say, in pseudo code: comb(comb(x,y),z) is equivalent to comb(x, comb(y,z)). An example of a semigroup is the + function on Int. With a semigroup, we gain the capability to combine.
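To make the last bullets more concrete, here is a small Scala sketch of a semigroup typeclass, written by hand with the standard library only. It is an illustration under my own naming (Semigroup, SemigroupSketch, combineAll, intAddition and listConcat are all made up for this example) and not the actual definition found in Cats or Arrow.

```scala
// Hypothetical, minimal typeclass: NOT the definition used by Cats or Arrow.
trait Semigroup[A] {
  def combine(x: A, y: A): A
}

object SemigroupSketch {
  // Instance for Int: the + function mentioned above.
  implicit val intAddition: Semigroup[Int] = new Semigroup[Int] {
    def combine(x: Int, y: Int): Int = x + y
  }

  // Instance for List[E]: concatenation, which is how errors can pile up.
  implicit def listConcat[E]: Semigroup[List[E]] = new Semigroup[List[E]] {
    def combine(x: List[E], y: List[E]): List[E] = x ++ y
  }

  // Knows nothing about A, except that two values of A can be combined.
  // The compiler injects the instance, or refuses to compile if there is none.
  def combineAll[A](first: A, rest: List[A])(implicit S: Semigroup[A]): A =
    rest.foldLeft(first)(S.combine)

  val total: Int = combineAll(1, List(2, 3))
  val errors: List[String] = combineAll(List("a"), List(List("b"), List("c")))
}
```

The same combineAll works for both Int and List[String] because the only thing it asks from its type parameter is a Semigroup instance; this is the capability Validated relies on to accumulate its left side.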
Back to our Kotlin code, we can see that we explicitly summon an instance of Applicative for ValidatedNel via a function call. This call needs a parameter because ValidatedNel will need to accumulate the errors (the left type); that is why we pass as a parameter the instance of Semigroup for NonEmptyList<ConfigError>. All this can be seen as correctly building an Applicative instance for the type ValidatedNel<ConfigError, ?> where ? will be determined when using the instance.
The type of ValidatedNel.applicative(Nel.semigroup<ConfigError>()) is Applicative<ValidatedPartialOf<NonEmptyList<ConfigError>>>. This looks complicated but it is, in fact, simple. Validated<E,A> is a generic type with two "holes": E and A. ValidatedPartialOf<NonEmptyList<ConfigError>> just means that we fixed the type E and made a new type with only one hole. The whole point of this is that we delegate the accumulation on the E type, in our case Nel<ConfigError>, to its semigroup instance, and that we keep the combination of the A types for ourselves, in the function passed as argument to the map function.
The Scala version only highlights the Applicative, as the language automatically summons the required instance of Semigroup for us.
type ValidationRes[A] = ValidatedNel[ConfigError, A] // We fix the E type
val businessConfValidationResult: ValidatedNel[ConfigError, BusinessConfig] = Applicative[ValidationRes].map3(tAV, tBV, tCV)(BusinessConfig)
If you do not use IntelliJ, you can even make the explicit call to Applicative disappear:
val businessConfValidationResult: ValidatedNel[ConfigError, BusinessConfig] = (tAV, tBV, tCV).mapN(BusinessConfig)
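To see the difference between failing fast and accumulating, here is a self-contained Scala sketch, assuming cats-core is on the classpath. DemoError, Missing, NotAnInt and DemoConfig are hypothetical stand-ins for the ConfigError and BusinessConfig types of this post, and the three Either values are made up for the occasion.

```scala
import cats.data.ValidatedNel
import cats.syntax.all._

object AccumulationSketch {
  // Hypothetical stand-ins for ConfigError and BusinessConfig.
  sealed trait DemoError
  final case class Missing(path: String) extends DemoError
  final case class NotAnInt(path: String) extends DemoError

  final case class DemoConfig(a: Int, b: Int, c: Int)

  // Two of the three readings failed.
  val tAe: Either[DemoError, Int] = Left(Missing("business.a"))
  val tBe: Either[DemoError, Int] = Right(2)
  val tCe: Either[DemoError, Int] = Left(NotAnInt("business.c"))

  // Either is fail-fast: the for-comprehension stops at the first Left,
  // so only the error on business.a is reported.
  val failFast: Either[DemoError, DemoConfig] =
    for { a <- tAe; b <- tBe; c <- tCe } yield DemoConfig(a, b, c)

  // ValidatedNel accumulates: the three values are checked independently
  // and both errors end up in the NonEmptyList on the left.
  val accumulated: ValidatedNel[DemoError, DemoConfig] =
    (tAe.toValidatedNel, tBe.toValidatedNel, tCe.toValidatedNel)
      .mapN(DemoConfig.apply)
}
```

With Either, whoever fixes business.a will discover the problem on business.c only on the next run, which is exactly the loop described at the beginning of this post.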
traverse
The last thing we need to do now is to validate that our configuration is a valid Kafka configuration. Using what we just learned, we will use Validated again. The interesting part is of course validating the list of hosts.
First, let's create a function which will validate only one host.
In Kotlin :
fun validateHost(rawString: String, index: Int): ValidatedNel<ConfigError, HostAndPort> = try {
HostAndPort.fromString(rawString).withDefaultPort(9092).validNel<ConfigError, HostAndPort>()
} catch (e: IllegalArgumentException) {
ConfigError.InvalidHost(rawString, index).invalidNel<ConfigError, HostAndPort>()
}
And in Scala:
def validateHost(rawString: String, index: Int): ValidatedNel[ConfigError, HostAndPort] = try {
HostAndPort.fromString(rawString).withDefaultPort(9092).validNel
} catch {
case e: IllegalArgumentException => InvalidHost(rawString, index).invalidNel
}
It is pretty similar and easy to understand, now that you are getting familiar with Validated. We added the index parameter just to keep track of where it failed, in case of error.
So, now, let's validate a list of strings to make sure that they are valid hosts and ports. In Kotlin, it will look like this:
// We assume rawList is a List<String>
rawList.withIndex()
.map { validateHost(it.value, it.index) }.k()
.traverse(Validated.applicative<Nel<ConfigError>>(Nel.semigroup()), { it })
.fix()
Again, we summon Applicative and Semigroup to accumulate the results of the validation we perform on each string in the initial rawList, and this results in a ValidatedNel<ConfigError, ListK<HostAndPort>>. You may have noticed the fix() and the k() calls: they are here to overcome some limitations in Kotlin. fix() is here to help with higher kinds, and k() is here to wrap some standard datatypes to enable typeclass support.
The Scala version is pretty similar :
rawList.zipWithIndex.traverse[ValidationRes, HostAndPort]{ t =>
validateHost(t._1, t._2)
}(Applicative[ValidationRes])
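Here is a runnable Scala sketch of the same traversal that does not need Guava, assuming cats-core is on the classpath. BadHost is a hypothetical error type (the post uses InvalidHost), and this validateHost is a deliberately naive stand-in for HostAndPort.fromString: it only accepts "host" or "host:port" with a numeric port, and does not check the port range.

```scala
import cats.data.ValidatedNel
import cats.syntax.all._

object TraverseSketch {
  // Hypothetical error type; the post uses InvalidHost instead.
  final case class BadHost(raw: String, index: Int)

  // We fix the E type, as with ValidationRes above.
  type Res[A] = ValidatedNel[BadHost, A]

  // Naive stand-in for HostAndPort.fromString(...).withDefaultPort(9092).
  def validateHost(raw: String, index: Int): Res[(String, Int)] =
    raw.split(":", -1).toList match {
      case host :: Nil if host.nonEmpty =>
        (host, 9092).validNel
      case host :: port :: Nil
          if host.nonEmpty && port.nonEmpty && port.length <= 5 && port.forall(_.isDigit) =>
        (host, port.toInt).validNel
      case _ =>
        BadHost(raw, index).invalidNel
    }

  // One Res for the whole list: either every host, or every error.
  def validateHosts(rawList: List[String]): Res[List[(String, Int)]] =
    rawList.zipWithIndex.traverse[Res, (String, Int)] {
      case (raw, index) => validateHost(raw, index)
    }
}
```

The thing to notice is the shape of the result: we started with a list of validations and ended with a single validation of a list, which is what traverse buys us.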
What's left ? Combining instances of ValidatedNel once again, and we know that it is easy thanks to Applicative and Semigroup.
We will stop here for this FP trip. The complete code used in this post is available on Github, if you want to read the complete example.
Conclusion
First, I do not know if you made it to the end, but if you did, thank you !
Through this post we went from a Java based solution built upon a popular framework to a solution in Kotlin and Scala using functional programming libraries.
Amongst the interesting things to highlight, we saw that with genericity and abstraction, we came up with solutions that do not rely on runtime reflection.
Then, to write this post, I had to run the Java samples again and again, while I wrote the Kotlin and Scala versions running the samples only two or three times. That is another advantage of strongly typed functional programming: the compiler is here to help you, and to prevent you from running your program if it knows it will fail at runtime.
We also saw that, despite being complex, concepts like Applicative are easy to use with the help of the type system, and moreover we did not even notice the mathematical theory (which is called Category Theory) underneath. Fully understanding these concepts takes a long time, but it is rewarding.
Where to go next ?
In this post, we used some typeclasses but there are many more. If you want to dig further into this subject, here is a list of resources to learn more:
- Cats documentation and the doc for Validated and Applicative
- Arrow documentation
- Bartosz Milewski's blog post on Applicative functors
- The red book which is the perfect introduction to functional programming on the JVM
Of course we also left open doors in this post, and I may write again on those we put aside here.
That's it for now. In the end, it wasn't a small post. Thanks for reading.
Updated on April 15th 2018 with some clarifications thanks to Paco