Generating Avro Schemas in Kotlin

When working with Avro schemas, the most popular approach is to generate Java or Kotlin classes based on predefined Avro schemas. This method is particularly useful if your application acts as a consumer or if the schemas are provided by someone else. However, if you are the sole publisher for a topic, you might prefer to generate Avro schemas directly from your model classes.

A handy tool for this task is avro4k, a third-party library that generates Avro schemas from Kotlin data classes. It builds on the kotlinx.serialization library, which keeps the integration straightforward.

The kotlinx.serialization library is well documented, and its official documentation offers plenty of resources to help developers get started. However, a significant limitation is that the setup guides focus primarily on Gradle. If you use Maven (as I usually do), the lack of specific instructions can be frustrating.

In this tutorial, I will guide you through setting up a Maven build, writing a simple Kotlin model, and generating an Avro schema from it using avro4k and kotlinx.serialization.

Setting Up a Maven Project for kotlinx.serialization and avro4k

In this section, I will walk through the steps to set up a Maven project for generating Avro schemas from Kotlin data classes using the kotlinx.serialization and avro4k libraries. We will start by creating a new Maven project, then add the necessary dependencies, and finally configure the Kotlin Maven Plugin.

Step 1: Create a New Project from Maven Archetype

First, create a new Maven project using the Kotlin archetype org.jetbrains.kotlin:kotlin-archetype-jvm with version 2.0.0. This archetype provides a basic project structure for Kotlin applications.

Step 2: Replace JUnit 4 with JUnit 5

The default setup includes JUnit 4, which we replace with JUnit 5 to take advantage of its newer testing features. To do so, swap the JUnit 4 dependency in your pom.xml for the JUnit 5 one.
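As a rough sketch, the test dependencies could look like the fragment below. The version number 5.10.2 is my assumption and not something this setup depends on, so use whichever JUnit 5 release fits your project. The kotlin-test-junit5 artifact is only needed if you use the kotlin.test assertions. If the archetype added junit:junit or kotlin-test-junit to your pom.xml, remove them. Note that an old Maven Surefire Plugin may not pick up JUnit 5 tests, so you might need to upgrade it as well.

```xml
<!-- Versions are assumptions; adjust them to your project. -->
<dependency>
    <groupId>org.junit.jupiter</groupId>
    <artifactId>junit-jupiter</artifactId>
    <version>5.10.2</version>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.jetbrains.kotlin</groupId>
    <artifactId>kotlin-test-junit5</artifactId>
    <version>${kotlin-dependencies.version}</version>
    <scope>test</scope>
</dependency>
```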

Step 3: Add the avro4k Dependency

Next, add the avro4k dependency to your pom.xml.

<dependency>
    <groupId>com.github.avro-kotlin.avro4k</groupId>
    <artifactId>avro4k-core</artifactId>
    <version>${avro4k-core.version}</version>
</dependency>

The Maven property avro4k-core.version is set to 1.10.1.
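For reference, the properties section of the pom.xml might look roughly like this. Only the avro4k version comes from this tutorial. The other three properties are referenced by the plugin configuration in Step 4, and the values shown for them are assumptions for illustration.

```xml
<properties>
    <!-- Stated in this tutorial. -->
    <avro4k-core.version>1.10.1</avro4k-core.version>
    <!-- Assumed values; pick the ones matching your toolchain. -->
    <kotlin-dependencies.version>2.0.0</kotlin-dependencies.version>
    <kotlin.version>2.0</kotlin.version>
    <java.version>17</java.version>
</properties>
```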

This library contains all the logic that we need to generate Avro schemas directly from Kotlin data classes.

Step 4: Configure the Kotlin Maven Plugin

Now, configure the Kotlin Maven Plugin to include the kotlinx.serialization compiler plugin. Add the following configuration to your pom.xml:

<build>
    <sourceDirectory>src/main/kotlin</sourceDirectory>
    <testSourceDirectory>src/test/kotlin</testSourceDirectory>
    <plugins>
        <plugin>
            <groupId>org.jetbrains.kotlin</groupId>
            <artifactId>kotlin-maven-plugin</artifactId>
            <version>${kotlin-dependencies.version}</version>
            <executions>
                <execution>
                    <id>compile</id>
                    <phase>compile</phase>
                    <goals>
                        <goal>compile</goal>
                    </goals>
                </execution>
                <execution>
                    <id>test-compile</id>
                    <phase>test-compile</phase>
                    <goals>
                        <goal>test-compile</goal>
                    </goals>
                </execution>
            </executions>
            <configuration>
                <languageVersion>${kotlin.version}</languageVersion>
                <jvmTarget>${java.version}</jvmTarget>
                <compilerPlugins>
                    <plugin>kotlinx-serialization</plugin>
                </compilerPlugins>
            </configuration>
            <dependencies>
                <dependency>
                    <groupId>org.jetbrains.kotlin</groupId>
                    <artifactId>kotlin-maven-serialization</artifactId>
                    <version>${kotlin-dependencies.version}</version>
                </dependency>
            </dependencies>
        </plugin>
    </plugins>
</build>

Two parts of this configuration are crucial for making Maven work with kotlinx.serialization: the compilerPlugins section and the dependencies section of the plugin.

First, you have to add the kotlinx-serialization plugin to the compilerPlugins section. This section lists additional compiler plugins that extend or modify the compilation. However, it is only a declaration. Maven will not download the plugin's artifact automatically.

To make the artifact available, you also have to add the following definition to the dependencies section of the plugin configuration.

<dependency>
    <groupId>org.jetbrains.kotlin</groupId>
    <artifactId>kotlin-maven-serialization</artifactId>
    <version>${kotlin-dependencies.version}</version>
</dependency>

This setup ensures that the Kotlin serialisation plugin is correctly applied during the build process.
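A quick way to check that the compiler plugin is actually applied is to compile a throwaway class. The sketch below is only an illustration: PluginCheck is a hypothetical name, and I assume that the kotlinx.serialization runtime is on the classpath (it should arrive transitively with avro4k-core).

```kotlin
import kotlinx.serialization.ExperimentalSerializationApi
import kotlinx.serialization.Serializable

// Hypothetical throwaway class, used only to check the build setup.
@Serializable
data class PluginCheck(val ok: Boolean)

// Some descriptor members may be marked experimental, hence the opt-in.
@OptIn(ExperimentalSerializationApi::class)
fun main() {
    // serializer() is generated by the compiler plugin. If the plugin is
    // not applied, this call is an unresolved reference and the build fails.
    val descriptor = PluginCheck.serializer().descriptor
    println(descriptor.serialName)
}
```

If this compiles with mvn compile, the plugin declaration and its dependency are wired up correctly.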

Step 5: Re-import the Project in IntelliJ IDEA

If you are using IntelliJ IDEA as your development environment, re-import the Maven project to apply the new configurations. Occasionally, IntelliJ IDEA’s compiler might not recognize symbols from the Kotlin serialisation library. If this happens, invalidate caches and restart the IDE.

Implementing a Simple Model for Avro Serialisation

To demonstrate the use of Kotlinx Serialization and Avro4k, let’s implement a simple model for a book. For simplicity, we will keep all the related classes in one file since the model is not very big. Here is how you can define the necessary classes:

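import com.github.avrokotlin.avro4k.AvroFixed
import kotlinx.serialization.Serializable
import java.math.BigDecimal
import java.time.Year

enum class CurrencyType {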
    USD, EUR, NOK // Can be extended if needed.
}

data class Price(
    @Serializable(with = BigDecimalSerializer::class) var amount: BigDecimal,
    var currency: CurrencyType
)

data class Author(
    val firstName: String,
    val lastName: String
)

data class Book(
    val title: String,
    val author: Author,
    @Serializable(with = YearSerializer::class) val publicationYear: Year,
    val numberOfPages: Int,
    val price: Price,
    @AvroFixed(13) val isbn: String
)

In this model:

  • CurrencyType is an enum class representing the currency type for the book price. It includes USD, EUR, and NOK but can be extended if needed.
  • Price is a data class that holds the amount and currency type of the book’s price.
  • Author is a data class that contains the first and last name of the book’s author.
  • Book is a data class that includes details such as the title, author, publication year, number of pages, price, and ISBN of the book.

Having the model in place, we can add the @Serializable annotation to the declaration of every class, as in the following abbreviated example.

@Serializable
data class Book

Custom Serializers for Unsupported Types in Kotlinx Serialization

When working with Kotlinx serialisation, you might encounter types that do not have default serialisers provided by the library. In our book model, the Year and BigDecimal classes fall into this category. To handle these types, we need to implement custom serialisers. Here is how you can do it.

Every custom serialiser must implement the KSerializer interface, providing custom logic for the descriptor property as well as the deserialize and serialize functions. Let’s do it for the Year type.

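import java.time.Year
import kotlinx.serialization.KSerializer
import kotlinx.serialization.descriptors.PrimitiveKind
import kotlinx.serialization.descriptors.PrimitiveSerialDescriptor
import kotlinx.serialization.descriptors.SerialDescriptor
import kotlinx.serialization.encoding.Decoder
import kotlinx.serialization.encoding.Encoder

class YearSerializer : KSerializer<Year> {

    // Describes the serialised form: a single string value.
    override val descriptor: SerialDescriptor =
        PrimitiveSerialDescriptor("YearSerializer", PrimitiveKind.STRING)

    override fun deserialize(decoder: Decoder): Year {
        return Year.parse(decoder.decodeString())
    }

    override fun serialize(encoder: Encoder, value: Year) {
        encoder.encodeString(value.toString())
    }
}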

We start by specifying the descriptor: SerialDescriptor property, which describes the serialised form as a string. avro4k relies on this descriptor to assign the proper Avro type to the value, which is why publicationYear ends up as a string in the generated schema. The deserialize method converts a string back into a Year object using Year.parse. Conversely, the serialize method transforms a Year object into its string representation. Together, they ensure that Year values are converted to and from their string form during serialisation and deserialisation.

Similarly, we can implement the serialiser for the BigDecimal type.

class BigDecimalSerializer : KSerializer<BigDecimal> {

    // Describes the serialised form: a single string value.
    override val descriptor: SerialDescriptor =
        PrimitiveSerialDescriptor("BigDecimal", PrimitiveKind.STRING)

    override fun deserialize(decoder: Decoder): BigDecimal {
        return BigDecimal(decoder.decodeString())
    }

    override fun serialize(encoder: Encoder, value: BigDecimal) {
        encoder.encodeString(value.toString())
    }
}
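To see why a plain string is a reasonable wire format for both types, here is a small sketch that mirrors the conversions used by the two serialisers without involving any serialisation library. The helper names (encodeYear, decodeYear, encodeAmount, decodeAmount) are hypothetical and exist only for this illustration.

```kotlin
import java.math.BigDecimal
import java.time.Year

// Same conversions as in the serialisers: toString() to encode,
// the type's own parser or constructor to decode.
fun encodeYear(value: Year): String = value.toString()
fun decodeYear(text: String): Year = Year.parse(text)

fun encodeAmount(value: BigDecimal): String = value.toString()
fun decodeAmount(text: String): BigDecimal = BigDecimal(text)

fun main() {
    val year = Year.of(2024)
    println(decodeYear(encodeYear(year)) == year)         // true

    val amount = BigDecimal("19.90")
    println(encodeAmount(amount))                          // 19.90
    println(decodeAmount(encodeAmount(amount)) == amount)  // true

    // BigDecimal equality takes the scale into account, so keeping
    // the trailing zero in the string form matters.
    println(BigDecimal("19.90") == BigDecimal("19.9"))     // false
}
```

The string form keeps the scale of the amount, so a price of 19.90 comes back as 19.90 rather than 19.9.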

Updating the Model Classes

With the custom serialisers implemented, the next step is to point the respective properties of our model classes at them. To do so, annotate the publicationYear property of the Book class and the amount property of the Price class with @Serializable(with = ...).

Let’s start with the Book class.

@Serializable
data class Book(
    val title: String,
    val author: Author,
    @Serializable(with = YearSerializer::class) val publicationYear: Year,
    val numberOfPages: Int,
    val price: Price,
    @AvroFixed(13) val isbn: String
)

And then move to the Price class.

@Serializable
data class Price(
    @Serializable(with = BigDecimalSerializer::class) var amount: BigDecimal,
    var currency: CurrencyType
) 

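The @Serializable annotation can also be applied to individual properties, where its with parameter specifies a custom serialiser for that property. The @AvroFixed(13) annotation on the isbn property comes from avro4k and maps the property to an Avro fixed type of 13 bytes instead of a plain string.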

Writing a Test to Generate and Validate the Avro Schema

With our model and custom serializers set up, we are ready to write a test to generate the Avro schema from the Book class and validate it against the expected schema. This process ensures that the schema generated by the avro4k library matches our predefined expectations.

Step 1: Define the Expected Schema

First, we define the expected Avro schema as a JSON string. This schema should reflect the structure of our Book data class, including the nested records and the string types produced by the custom serialisers.

val expectedSchema = """
            {
              "type" : "record",
              "name" : "Book",
              "namespace" : "com.lukaszpalt",
              "fields" : [ {
                "name" : "title",
                "type" : "string"
              }, {
                "name" : "author",
                "type" : {
                  "type" : "record",
                  "name" : "Author",
                  "fields" : [ {
                    "name" : "firstName",
                    "type" : "string"
                  }, {
                    "name" : "lastName",
                    "type" : "string"
                  } ]
                }
              }, {
                "name" : "publicationYear",
                "type" : "string"
              }, {
                "name" : "numberOfPages",
                "type" : "int"
              }, {
                "name" : "price",
                "type" : {
                  "type" : "record",
                  "name" : "Price",
                  "fields" : [ {
                    "name" : "amount",
                    "type" : "string"
                  }, {
                    "name" : "currency",
                    "type" : {
                      "type" : "enum",
                      "name" : "CurrencyType",
                      "symbols" : [ "USD", "EUR", "NOK" ]
                    }
                  } ]
                }
              }, {
                "name" : "isbn",
                "type" : {
                  "type" : "fixed",
                  "name" : "isbn",
                  "size" : 13
                }
              } ]
            }
        """.trimIndent()

Step 2: Generate the Schema from the Model

Next, we generate the actual Avro schema from the Book class using the avro4k library.

val actualSchema = Avro
    .default
    .schema(Book.serializer())
    .toString(true)

This code uses the Avro.default.schema method to generate the schema and converts it to a pretty-printed JSON string for easier comparison.

Step 3: Assert the Schemas Match

Finally, we assert that the generated schema matches the expected schema.

assertEquals(expectedSchema, actualSchema)
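Comparing JSON strings is sensitive to whitespace and formatting. As an alternative sketch, you can parse the expected JSON with the Avro Schema.Parser and compare Schema objects instead. The example below does this for the nested Author record only, to keep it short. It assumes that the Author class lives in the com.lukaszpalt package (inferred from the namespace in the expected schema) and that JUnit 5 is on the test classpath. The class name AuthorSchemaTest is hypothetical.

```kotlin
package com.lukaszpalt

import com.github.avrokotlin.avro4k.Avro
import org.apache.avro.Schema
import org.junit.jupiter.api.Assertions.assertEquals
import org.junit.jupiter.api.Test

class AuthorSchemaTest {

    @Test
    fun `generated Author schema equals the parsed expected schema`() {
        // The Author record from the expected Book schema,
        // with its namespace written out explicitly.
        val expectedJson = """
            {
              "type" : "record",
              "name" : "Author",
              "namespace" : "com.lukaszpalt",
              "fields" : [
                { "name" : "firstName", "type" : "string" },
                { "name" : "lastName", "type" : "string" }
              ]
            }
        """.trimIndent()

        // Parse the expected JSON into a Schema so that formatting
        // differences do not affect the comparison.
        val expected: Schema = Schema.Parser().parse(expectedJson)
        val actual: Schema = Avro.default.schema(Author.serializer())

        assertEquals(expected, actual)
    }
}
```

The same pattern should work for the full Book schema; the string-based comparison remains useful when you also want to pin down the exact pretty-printed output.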

Summary

Generating Avro schemas directly from Kotlin data classes is straightforward with tools like avro4k and kotlinx.serialization. Once the Maven project and the Kotlin Maven Plugin are configured, you can derive Avro schemas from your Kotlin classes with a few lines of code. This approach simplifies integration, especially when you are the sole producer for a given topic or you define the model for a given domain.

The avro4k library is quite powerful and allows more than I demonstrated in this tutorial. You may be particularly interested in the following sections of its documentation.

  • Schema definition options.
  • Serialisation and deserialisation options.

You can find the complete working code for this tutorial in my GitHub repository.