Generating Avro Schemas in Kotlin

When working with Avro schemas, the most popular approach is to generate Java or Kotlin classes based on predefined Avro schemas. This method is particularly useful if your application acts as a consumer or if the schemas are provided by someone else. However, if you are the sole publisher for a topic, you might prefer to generate Avro schemas directly from your model classes.

A handy tool for this task is the avro4k library. This third-party library simplifies the process of generating Avro schemas from Kotlin data classes. It leverages the powerful kotlinx.serialization library, making the integration straightforward and efficient.

The kotlinx.serialization library is well-documented and provides extensive resources to help developers get started. You can explore its official documentation here. However, a significant limitation is that the provided setup guides are primarily focused on Gradle. If you are using Maven (as I usually do), you might find the lack of specific instructions frustrating.

In this tutorial, I will guide you through setting up a Maven build, writing a simple Kotlin model, and generating an Avro schema from it using avro4k and kotlinx.serialization.

Setting Up a Maven Project for Kotlinx Serialization and Avro4k

In this section, I will walk through the steps to set up a Maven project for generating Avro schemas from Kotlin data classes using the kotlinx.serialization and avro4k libraries. We will start by creating a new Maven project, then configure the necessary dependencies, and finally set up the Kotlin Maven Plugin.

Step 1: Create a New Project from Maven Archetype

First, create a new Maven project using the Kotlin archetype org.jetbrains.kotlin:kotlin-archetype-jvm with version 2.0.0. This archetype provides a basic project structure for Kotlin applications.
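
If you prefer the command line over the IDE wizard, the invocation might look like this (the groupId and artifactId below are placeholders; choose your own):

mvn archetype:generate \
    -DarchetypeGroupId=org.jetbrains.kotlin \
    -DarchetypeArtifactId=kotlin-archetype-jvm \
    -DarchetypeVersion=2.0.0 \
    -DgroupId=com.lukaszpalt \
    -DartifactId=avro4k-tutorial \
    -DinteractiveMode=false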

Step 2: Replace JUnit 4 with JUnit 5

The default setup includes JUnit 4, which we need to replace with JUnit 5 to take advantage of the latest testing features. Update your pom.xml to include the JUnit 5 dependency.
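
For reference, a minimal change is to remove the junit:junit dependency and add the JUnit Jupiter aggregate artifact instead; the version below is only an example:

<dependency>
    <groupId>org.junit.jupiter</groupId>
    <artifactId>junit-jupiter</artifactId>
    <version>5.10.2</version>
    <scope>test</scope>
</dependency>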

Step 3: Add the avro4k Dependency

Next, add the avro4k dependency to your pom.xml.

<dependency>
    <groupId>com.github.avro-kotlin.avro4k</groupId>
    <artifactId>avro4k-core</artifactId>
    <version>${avro4k-core.version}</version>
</dependency>

The avro4k-core.version property resolves to version 1.10.1.
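
If you keep versions in a Maven properties block, as this snippet assumes, the corresponding entry could look like this:

<properties>
    <avro4k-core.version>1.10.1</avro4k-core.version>
</properties>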

This library contains all the logic that we need to generate Avro schemas directly from Kotlin data classes.

Step 4: Configure the Kotlin Maven Plugin

Now, configure the Kotlin Maven Plugin to include the kotlinx.serialization compiler plugin. Add the following configuration to your pom.xml:

<build>
    <sourceDirectory>src/main/kotlin</sourceDirectory>
    <testSourceDirectory>src/test/kotlin</testSourceDirectory>
    <plugins>
        <plugin>
            <groupId>org.jetbrains.kotlin</groupId>
            <artifactId>kotlin-maven-plugin</artifactId>
            <version>${kotlin-dependencies.version}</version>
            <executions>
                <execution>
                    <id>compile</id>
                    <phase>compile</phase>
                    <goals>
                        <goal>compile</goal>
                    </goals>
                </execution>
                <execution>
                    <id>test-compile</id>
                    <phase>test-compile</phase>
                    <goals>
                        <goal>test-compile</goal>
                    </goals>
                </execution>
            </executions>
            <configuration>
                <languageVersion>${kotlin.version}</languageVersion>
                <jvmTarget>${java.version}</jvmTarget>
                <compilerPlugins>
                    <plugin>kotlinx-serialization</plugin>
                </compilerPlugins>
            </configuration>
            <dependencies>
                <dependency>
                    <groupId>org.jetbrains.kotlin</groupId>
                    <artifactId>kotlin-maven-serialization</artifactId>
                    <version>${kotlin-dependencies.version}</version>
                </dependency>
            </dependencies>
        </plugin>
    </plugins>
</build>

Two parts of this configuration are crucial for setting up Maven to work properly with kotlinx.serialization: compilerPlugins and the plugin's dependencies.

First, you have to add the kotlinx-serialization plugin to the compilerPlugins section. This section declares additional compiler plugins to be applied during compilation. However, it is only a declaration: it will not automatically download the necessary dependencies.

To make the plugin available to the compiler, add the following definition to the dependencies section of the plugin configuration.

<dependency>
    <groupId>org.jetbrains.kotlin</groupId>
    <artifactId>kotlin-maven-serialization</artifactId>
    <version>${kotlin-dependencies.version}</version>
</dependency>

This setup ensures that the Kotlin serialisation plugin is correctly applied during the build process.

Step 5: Re-import the Project in IntelliJ IDEA

If you are using IntelliJ IDEA as your development environment, re-import the Maven project to apply the new configurations. Occasionally, IntelliJ IDEA’s compiler might not recognize symbols from the Kotlin serialisation library. If this happens, invalidate caches and restart the IDE.

Implementing a Simple Model for Avro Serialisation

To demonstrate the use of Kotlinx Serialization and Avro4k, let’s implement a simple model for a book. For simplicity, we will keep all the related classes in one file since the model is not very big. Here is how you can define the necessary classes:

enum class CurrencyType {
    USD, EUR, NOK // Can be extended if needed.
}

data class Price(
    var amount: BigDecimal,
    var currency: CurrencyType
)

data class Author(
    val firstName: String,
    val lastName: String
)

data class Book(
    val title: String,
    val author: Author,
    val publicationYear: Year,
    val numberOfPages: Int,
    val price: Price,
    val isbn: String
)

In this model:

  • CurrencyType is an enum class representing the currency type for the book price. It includes USD, EUR, and NOK but can be extended if needed.
  • Price is a data class that holds the amount and currency type of the book’s price.
  • Author is a data class that contains the first and last name of the book’s author.
  • Book is a data class that includes details such as the title, author, publication year, number of pages, price, and ISBN of the book.

With the model in place, we can add the @Serializable annotation to every class, like in the following example.

@Serializable
data class Book

Custom Serializers for Unsupported Types in Kotlinx Serialization

When working with kotlinx.serialization, you might encounter types that do not have default serialisers provided by the library. In our book model, the Year and BigDecimal classes fall into this category. To handle these types, we need to implement custom serialisers. Here is how you can do it.

Every custom serialiser must implement the KSerializer interface, providing custom logic for the descriptor field as well as the deserialize and serialize functions. Let's do it for the Year type.

import kotlinx.serialization.KSerializer
import kotlinx.serialization.descriptors.PrimitiveKind
import kotlinx.serialization.descriptors.PrimitiveSerialDescriptor
import kotlinx.serialization.descriptors.SerialDescriptor
import kotlinx.serialization.encoding.Decoder
import kotlinx.serialization.encoding.Encoder
import java.time.Year

class YearSerializer : KSerializer<Year> {

    override val descriptor: SerialDescriptor
        get() = PrimitiveSerialDescriptor("YearSerializer", PrimitiveKind.STRING)

    override fun deserialize(decoder: Decoder): Year {
        return Year.parse(decoder.decodeString())
    }

    override fun serialize(encoder: Encoder, value: Year) {
        encoder.encodeString(value.toString())
    }
}

We start by specifying the descriptor: SerialDescriptor value, which describes the serialised form as a string; the framework needs it to assign the proper Avro type to a given value. The deserialize method converts a string back into a Year object using Year.parse, while the serialize method transforms a Year object into its string representation. This custom serialiser ensures that Year values are properly converted to and from their string form during serialisation and deserialisation.
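
Before wiring it into the model, you can sanity-check the serialiser in isolation. The snippet below is a throwaway sketch that round-trips a Year through kotlinx.serialization's JSON format; it is not part of the tutorial project:

import kotlinx.serialization.json.Json
import java.time.Year

fun main() {
    // Encode a Year to its serialised string form and decode it back.
    val encoded = Json.encodeToString(YearSerializer(), Year.of(2024))
    println(encoded) // prints "2024" (a JSON string)
    val decoded = Json.decodeFromString(YearSerializer(), encoded)
    println(decoded) // prints 2024
}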

Similarly, we can implement the serialiser for the BigDecimal type.

class BigDecimalSerializer : KSerializer<BigDecimal> {
    override val descriptor: SerialDescriptor
        get() = PrimitiveSerialDescriptor("BigDecimal", PrimitiveKind.STRING)

    override fun deserialize(decoder: Decoder): BigDecimal {
        return BigDecimal(decoder.decodeString())
    }

    override fun serialize(encoder: Encoder, value: BigDecimal) {
        encoder.encodeString(value.toString())
    }
}

Updating the Model Classes

With the custom serialisers implemented, the next step is to update the respective fields in our model classes to use them. To do so, we annotate the publicationYear member of the Book class and the amount member of the Price class with @Serializable(with = ...). While we are at it, we also mark Book's isbn field with avro4k's @AvroFixed(13) annotation, which maps the field to an Avro fixed type of 13 bytes, matching the length of an ISBN-13.

Let’s start with the Book class.

@Serializable
data class Book(
    val title: String,
    val author: Author,
    @Serializable(with = YearSerializer::class) val publicationYear: Year,
    val numberOfPages: Int,
    val price: Price,
    @AvroFixed(13) val isbn: String
)

And then move to the Price class.

@Serializable
data class Price(
    @Serializable(with = BigDecimalSerializer::class) var amount: BigDecimal,
    var currency: CurrencyType
) 

The @Serializable annotation can also be applied to individual fields, which allows you to specify a custom serialiser for each of them.

Writing a Test to Generate and Validate the Avro Schema

With our model and custom serializers set up, we are ready to write a test to generate the Avro schema from the Book class and validate it against the expected schema. This process ensures that the schema generated by the avro4k library matches our predefined expectations.

Step 1: Define the Expected Schema

First, we define the expected Avro schema as a JSON string. This schema should reflect the structure of our Book data class, including nested fields and custom serializers.

val expectedSchema = """
            {
              "type" : "record",
              "name" : "Book",
              "namespace" : "com.lukaszpalt",
              "fields" : [ {
                "name" : "title",
                "type" : "string"
              }, {
                "name" : "author",
                "type" : {
                  "type" : "record",
                  "name" : "Author",
                  "fields" : [ {
                    "name" : "firstName",
                    "type" : "string"
                  }, {
                    "name" : "lastName",
                    "type" : "string"
                  } ]
                }
              }, {
                "name" : "publicationYear",
                "type" : "string"
              }, {
                "name" : "numberOfPages",
                "type" : "int"
              }, {
                "name" : "price",
                "type" : {
                  "type" : "record",
                  "name" : "Price",
                  "fields" : [ {
                    "name" : "amount",
                    "type" : "string"
                  }, {
                    "name" : "currency",
                    "type" : {
                      "type" : "enum",
                      "name" : "CurrencyType",
                      "symbols" : [ "USD", "EUR", "NOK" ]
                    }
                  } ]
                }
              }, {
                "name" : "isbn",
                "type" : {
                  "type" : "fixed",
                  "name" : "isbn",
                  "size" : 13
                }
              } ]
            }
        """.trimIndent()

Step 2: Generate the Schema from the Model

Next, we generate the actual Avro schema from the Book class using the avro4k library.

val actualSchema = Avro
    .default
    .schema(Book.serializer())
    .toString(true)

This code uses the Avro.default.schema method to generate the schema and converts it to a pretty-printed JSON string for easier comparison.

Step 3: Assert the Schemas Match

Finally, we assert that the generated schema matches the expected schema.

assertEquals(expectedSchema, actualSchema)
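
Putting the three steps together, the complete test might look like the sketch below. The class and test names are my own, and the import of Avro assumes avro4k 1.x's package layout:

import com.github.avrokotlin.avro4k.Avro
import org.junit.jupiter.api.Assertions.assertEquals
import org.junit.jupiter.api.Test

class BookSchemaTest {

    @Test
    fun `should generate the expected Avro schema for Book`() {
        // Paste the JSON document from Step 1 here.
        val expectedSchema = """
            ...
        """.trimIndent()

        val actualSchema = Avro
            .default
            .schema(Book.serializer())
            .toString(true)

        assertEquals(expectedSchema, actualSchema)
    }
}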

Summary

Generating Avro schemas directly from Kotlin data classes is made straightforward with tools like avro4k and kotlinx.serialization. By setting up a Maven project and configuring the Kotlin Maven Plugin, you can seamlessly derive Avro schemas from your Kotlin classes. This approach simplifies integration, especially when you're the sole producer for a given topic or when you own the model for a given domain.

The avro4k library is quite powerful and allows more than I demonstrated in this tutorial. You may be particularly interested in the following sections of its documentation.

  • Schema definition options.
  • Serialisation and deserialisation options.

You can find the complete working code for this tutorial in my GitHub repository.

Understanding Serverless Computing. Not Just an Architecture Style

Understanding IT architecture styles is crucial for anyone involved in the design and development of software systems. These styles provide a framework of principles, patterns, and guidelines that shape the structure, organisation, and behaviour of IT applications. However, there is often confusion between architectural styles and deployment models, with serverless computing frequently misinterpreted as an architecture style; it is quite common to see serverless listed as a new style alongside microservices or event-driven architecture. In this article, I aim to clarify this misconception, emphasising that serverless is primarily a deployment model rather than an architectural style. By exploring the comprehensive role of architectural styles and the distinct nature of serverless computing, I will highlight why understanding this distinction is essential for effectively leveraging serverless in your IT solutions.

What is an IT Architecture Style?

An IT architecture style is a comprehensive set of principles, patterns, and guidelines that define the overall structure, organisation, and behaviour of an information technology system or application. It provides a high-level framework for designing, developing, and deploying IT solutions. Regardless of the specific architectural style you apply, they all share several common features.

  • Principles and Concepts. A software architecture style defines a set of principles, concepts, and practices that guide the design of software systems. These principles and concepts provide a common vocabulary and shared understanding among software architects.
  • Structural Elements. It defines the structural elements of a software system, such as components, modules, and layers, organised in a way that promotes modularity, flexibility, and scalability.
  • Patterns and Templates. Architecture styles typically provide patterns and templates that help architects make decisions about system structure and component usage.
  • Implementation Technologies. An architecture style may specify particular implementation technologies, such as programming languages, frameworks, or libraries, chosen based on their suitability for the style.
  • Quality Attributes. They consider the quality attributes of a software system, such as performance, reliability, maintainability, and security, providing guidelines to meet these attributes.
  • Trade-offs. Architecture styles often involve trade-offs between different quality attributes or design goals, such as prioritising performance over flexibility or security over ease of use.
  • Standards and Conventions. They include standards and conventions for naming, coding style, documentation, and other aspects of software design, ensuring consistency and maintainability across the system.

Serverless Computing: A Deployment Model, Not an Architecture Style

Having defined what an architectural style encompasses, let’s explore the nature of serverless computing and how it differs fundamentally from an architecture style.

  • Deployment Model. Serverless computing is primarily a deployment model. It is often associated with “Functions as a Service” (FaaS) platforms, where developers deploy individual functions to the cloud, and the cloud provider manages the scaling and underlying infrastructure. This model focuses on how the system is deployed and managed, rather than how it is designed and structured.
  • Limited Scope. While serverless computing offers benefits such as scalability, cost-effectiveness, and increased developer productivity, it doesn’t address all aspects of architecture. For instance, it does not provide solutions for data management, security, or integration with other systems.
  • Compatibility with Various Architectures. Serverless can be used with various architectural styles, such as microservices, event-driven architecture, or even hexagonal architecture (though the latter is a poor fit for going serverless). This flexibility indicates that serverless is not inherently tied to any specific architecture but can be used in different contexts depending on the system’s needs.
  • Vendor Lock-in. Serverless computing often relies on proprietary services and APIs from cloud providers, which can lead to vendor lock-in. This dependency can make it challenging or costly to switch providers or bring the system in-house.

The Distinction: Architecture Style vs. Deployment Model

An architectural style provides a comprehensive template for designing your entire IT system, irrespective of its size. It encompasses a broad set of principles and guidelines for system structure, component organisation, and quality attributes. Serverless computing, on the other hand, is a runtime environment to which you can deploy parts of your application. It serves as an implementation platform for applications designed using a particular architectural style.

In summary, while serverless computing offers a powerful and flexible deployment model, it does not fulfil the role of an architectural style. Instead, it complements various architectural styles by providing a scalable, cost-effective environment for running application components. Understanding this distinction is crucial for effectively leveraging serverless computing in your IT solutions.

Deploy Azure Functions with Maven

Azure Functions is a serverless compute service provided by Microsoft Azure, designed to help developers run event-driven code without the need to manage infrastructure. Whether you’re automating workflows, building APIs, or processing data streams, Azure Functions offers a scalable, cost-effective solution. By abstracting away the complexities of server management, it allows developers to focus on writing the logic that drives their applications.

For Java developers, Azure Functions offers first-class support, making it easier than ever to integrate Java-based solutions into the Azure ecosystem. This support includes a robust set of tools and libraries specifically designed to streamline the development and deployment process for Java applications. If you’re eager to learn more, Microsoft has prepared decent documentation covering this topic. There you’ll find information about the programming framework, how to implement a Function, and how to deploy it. In this post, I’d like to take a look at the latter and present it to you in a slightly different light than usual.

Traditionally, deploying Azure Functions involves tools like Azure CLI, Bicep, or Terraform. These tools are robust and widely used, offering powerful features for managing Azure resources. However, Java developers also have a more seamless and better integrated option to choose from – Maven.

Maven, a build automation tool primarily used for Java projects, can be an excellent choice for deploying Azure Functions. The Azure Functions Maven Plugin allows developers to define their infrastructure as code, simplifying the deployment process and ensuring consistency across environments. It’s a great choice if you’re starting your journey with Azure Functions or have no other infrastructure in the cloud. It’s easy to use and allows you to keep your infrastructure definition as close to your build definition as possible.

The Azure Functions Maven Plugin allows you to use an infrastructure-as-code approach similar to Bicep or Terraform. You can declare the desired infrastructure, and the plugin handles provisioning and state management. This means your Function App won’t be reprovisioned every time you build your application, saving time and resources. While this might seem abstract, seeing it in action will clarify its efficiency and ease of use. Let’s dive into the practical aspects of deploying your Azure Functions with Maven and see how this approach can streamline your development process.

I’m using Maven v3.9.6 in this tutorial. You can find a working example of this configuration here.

First, let’s open the pom.xml file in the azure-functions-sdk module. Within this file, you’ll find the declaration of the azure-functions-maven-plugin. At the core of this declaration is the <plugin> element. In Maven, a plugin is a collection of goals, which are specific tasks or actions. In our case, we’re using the azure-functions-maven-plugin to facilitate the deployment of Azure Functions. Let’s take a look at the definition.

<plugin>
    <groupId>com.microsoft.azure</groupId>
    <artifactId>azure-functions-maven-plugin</artifactId>
    <version>${azure.functions.maven.plugin.version}</version>
    <configuration>
        <appName>azure-functions-sdk</appName>
        <resourceGroup>azure-functions-sdk-rg</resourceGroup>
        <appServicePlanName>azure-functions-sdk-sp</appServicePlanName>
        <pricingTier>Consumption</pricingTier>
        <region>norwayeast</region>
        <runtime>
            <os>linux</os>
            <javaVersion>17</javaVersion>
        </runtime>
        <appSettings>
            <property>
                <name>FUNCTIONS_EXTENSION_VERSION</name>
                <value>~3</value>
            </property>
        </appSettings>
    </configuration>
    <executions>
        <execution>
            <goals>
                <goal>package</goal>
            </goals>
        </execution>
    </executions>
</plugin>

There are quite a few things in this declaration you might be unfamiliar with, so let’s crack them one by one.

Plugin Coordinates

(This is pretty standard, but let’s quickly go through them, especially if you’re new to Maven.)

  • <groupId>: Specifies the group identifier for the plugin. In this case, it’s com.microsoft.azure.
  • <artifactId>: Identifies the plugin itself. We’re using the azure-functions-maven-plugin.
  • <version>: Specifies the version of the plugin. The ${azure.functions.maven.plugin.version} variable points to a specific version, such as 1.24.0 in our example.

Configuration Section

  • <appName>: The name of your Azure Functions application, set to azure-functions-sdk.
  • <resourceGroup>: Defines the Azure resource group where your functions will be deployed, here named azure-functions-sdk-rg.
  • <appServicePlanName>: The Azure App Service Plan where your functions will run, set to azure-functions-sdk-sp.
  • <pricingTier>: Specifies the pricing tier for your functions. It’s set to Consumption, since it’s the cheapest and most popular option. You can also choose a Premium or Dedicated tier here.
  • <region>: Determines the Azure region for deployment, set to norwayeast in our case. Note that free trial subscriptions may have limitations on available regions. Check Azure’s documentation for your specific options.
  • <runtime>: Configures the runtime settings:
    • <os>: Specifies the operating system, which is linux.
    • <javaVersion>: Sets the Java version used for the functions runtime, set to 17.
  • <appSettings>: Defines application settings specific to your functions. They’re exposed as environment variables to your Functions runtime. Here we only specify the Functions version itself, but you can put active profiles or other environment-dependent key-value pairs here.

Executions Section

  • <execution>: Defines when and how the plugin’s goals should be executed. The package goal is responsible for packaging your Azure Functions for deployment. Note that this phase does not deploy anything to Azure; it merely prepares the package. The actual deployment requires invoking a plugin-specific goal, which we will cover later.

Great! Now that we have our configuration in place, the next step is to run the build and deploy our Azure Functions. First, we need to prepare the artefact by executing the clean package command. You can do this from the right-hand side menu in IntelliJ IDEA. This step takes a few seconds to complete on my machine.
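
If you prefer the terminal, the equivalent command, run from the module’s directory, is:

mvn clean package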

Once the packaging is done, the next step is to execute a plugin-specific goal: deploy. It’s essential to use the correct deployment goal to ensure our code gets deployed to Azure. By default, Maven’s deploy goal sends artefacts to the artefact repository specified in the <distributionManagement> section of your POM file, such as Nexus. However, we want to deploy our code directly to Azure. To do this, we need to specify the deployment goal from the Azure Functions Maven Plugin.

Before proceeding, ensure you have Azure CLI installed on your machine. The installation steps vary depending on your operating system, and you can find detailed instructions in the Azure CLI installation guide. I used Homebrew to install it on my machine, but you can choose any way that suits you best.

Once you have Azure CLI installed, open the terminal and type az login. This command initiates a session from your local machine to Azure’s REST API, which is necessary for the plugin to function correctly.

Now, let’s start the deployment process. Issue the following command from the directory where the POM file is located:

mvn azure-functions:deploy

As the deployment progresses, you’ll see logs indicating that all the necessary infrastructure is being provisioned, including the Resource Group, Function App, Application Insights, Storage Account, and App Service Plan. Once the deployment is complete, you can navigate to the Azure Portal to verify that everything is correctly set up and running. This is a great feature in itself, because generic tools like Bicep force you to define each related resource independently. Moreover, all the necessary secrets (like the connection string to the Storage Account) are added to your App Settings.

And that’s it! By following these steps, you have successfully deployed your Azure Functions using Maven. It may feel a bit odd in the beginning, especially if you’re already experienced with Bicep or Terraform, but I find this approach quite useful in scenarios where I don’t have a lot of infrastructure in Azure, or where it’s provided by an external team. In such cases, the ability to define my Function App in the POM file and use the same tool for build, test, and deployment contributes to a great developer experience. Moreover, the automatic creation of all related resources and configuration lets me get a function up and running in a couple of minutes, making Maven-based deployment a great option for learning and prototyping.