When working with Avro schemas, the most popular approach is to generate Java or Kotlin classes based on predefined Avro schemas. This method is particularly useful if your application acts as a consumer or if the schemas are provided by someone else. However, if you are the sole publisher for a topic, you might prefer to generate Avro schemas directly from your model classes.
A handy tool for this task is avro4k, a third-party library that generates Avro schemas from Kotlin data classes. It builds on the kotlinx.serialization library, which keeps the integration straightforward.
The kotlinx.serialization library is well documented. You can explore its official documentation here. However, its setup guides focus primarily on Gradle. If you are using Maven (as I usually do), you might find the lack of specific instructions frustrating.
In this tutorial, I will guide you through setting up a Maven build, writing a simple Kotlin model, and generating an Avro schema from it using avro4k and kotlinx.serialization.
Setting Up a Maven Project for Kotlinx Serialization and Avro4k
In this section, I will walk through the steps to set up a Maven project for generating Avro schemas from Kotlin data classes using the kotlinx.serialization and avro4k libraries. We will start by creating a new Maven project, then configure the necessary dependencies, and finally set up the Kotlin Maven Plugin.
Step 1: Create a New Project from Maven Archetype
First, create a new Maven project using the Kotlin archetype org.jetbrains.kotlin:kotlin-archetype-jvm with version 2.0.0. This archetype provides a basic project structure for Kotlin applications.
Step 2: Replace JUnit 4 with JUnit 5
The default setup includes JUnit 4, which we replace with JUnit 5 to take advantage of the latest testing features. Update your pom.xml to include the JUnit 5 dependency.
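The tutorial does not show the dependency itself, so here is a minimal sketch of what it could look like. The property name junit-jupiter.version is an assumption made for illustration; define it in your properties section with the JUnit 5 release you want to use, and remove the JUnit 4 dependency that the archetype generated.

```xml
<!-- Sketch only: junit-jupiter.version is a hypothetical property name. -->
<dependency>
    <groupId>org.junit.jupiter</groupId>
    <artifactId>junit-jupiter</artifactId>
    <version>${junit-jupiter.version}</version>
    <scope>test</scope>
</dependency>
```

Note that the Maven Surefire plugin needs to be at version 2.22.0 or newer to discover JUnit 5 tests.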
Step 3: Add the avro4k Dependency
Next, add the avro4k dependency to your pom.xml.
<dependency>
    <groupId>com.github.avro-kotlin.avro4k</groupId>
    <artifactId>avro4k-core</artifactId>
    <version>${avro4k-core.version}</version>
</dependency>
The avro4k-core.version property is set to 1.10.1.
This library contains all the logic that we need to generate Avro schemas directly from Kotlin data classes.
Step 4: Configure the Kotlin Maven Plugin
Now, configure the Kotlin Maven Plugin to include the kotlinx.serialization compiler plugin. Add the following configuration to your pom.xml:
<build>
    <sourceDirectory>src/main/kotlin</sourceDirectory>
    <testSourceDirectory>src/test/kotlin</testSourceDirectory>
    <plugins>
        <plugin>
            <groupId>org.jetbrains.kotlin</groupId>
            <artifactId>kotlin-maven-plugin</artifactId>
            <version>${kotlin-dependencies.version}</version>
            <executions>
                <execution>
                    <id>compile</id>
                    <phase>compile</phase>
                    <goals>
                        <goal>compile</goal>
                    </goals>
                </execution>
                <execution>
                    <id>test-compile</id>
                    <phase>test-compile</phase>
                    <goals>
                        <goal>test-compile</goal>
                    </goals>
                </execution>
            </executions>
            <configuration>
                <languageVersion>${kotlin.version}</languageVersion>
                <jvmTarget>${java.version}</jvmTarget>
                <compilerPlugins>
                    <plugin>kotlinx-serialization</plugin>
                </compilerPlugins>
            </configuration>
            <dependencies>
                <dependency>
                    <groupId>org.jetbrains.kotlin</groupId>
                    <artifactId>kotlin-maven-serialization</artifactId>
                    <version>${kotlin-dependencies.version}</version>
                </dependency>
            </dependencies>
        </plugin>
    </plugins>
</build>
Two parts of this configuration are crucial for making Maven work with kotlinx.serialization: the compilerPlugins section and the plugin's dependencies.
First, add the kotlinx-serialization plugin to the compilerPlugins section. This section lists additional compiler plugins that enhance or modify compilation. However, it is only a declaration; it does not download the artifact that implements the plugin.
To supply that artifact, add the following definition to the dependencies section of the plugin configuration.
<dependency>
    <groupId>org.jetbrains.kotlin</groupId>
    <artifactId>kotlin-maven-serialization</artifactId>
    <version>${kotlin-dependencies.version}</version>
</dependency>
This setup ensures that the Kotlin serialisation plugin is correctly applied during the build process.
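The snippets above reference several Maven properties. A possible properties section is sketched below. Only the avro4k-core.version value (1.10.1) comes from this tutorial; the other three values are assumptions chosen for illustration and should be adapted to your toolchain.

```xml
<properties>
    <!-- Assumption: Kotlin plugin and library version, matching the archetype version used earlier. -->
    <kotlin-dependencies.version>2.0.0</kotlin-dependencies.version>
    <!-- Assumption: Kotlin language level passed to languageVersion. -->
    <kotlin.version>2.0</kotlin.version>
    <!-- Assumption: JVM bytecode target passed to jvmTarget. -->
    <java.version>17</java.version>
    <!-- Stated in this tutorial. -->
    <avro4k-core.version>1.10.1</avro4k-core.version>
</properties>
```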
Step 5: Re-import the Project in IntelliJ IDEA
If you are using IntelliJ IDEA as your development environment, re-import the Maven project to apply the new configurations. Occasionally, IntelliJ IDEA’s compiler might not recognize symbols from the Kotlin serialisation library. If this happens, invalidate caches and restart the IDE.
Implementing a Simple Model for Avro Serialisation
To demonstrate the use of Kotlinx Serialization and Avro4k, let’s implement a simple model for a book. For simplicity, we will keep all the related classes in one file since the model is not very big. Here is how you can define the necessary classes:
enum class CurrencyType {
    USD, EUR, NOK // Can be extended if needed.
}

data class Price(
    @Serializable(with = BigDecimalSerializer::class) var amount: BigDecimal,
    var currency: CurrencyType
)

data class Author(
    val firstName: String,
    val lastName: String
)

data class Book(
    val title: String,
    val author: Author,
    @Serializable(with = YearSerializer::class) val publicationYear: Year,
    val numberOfPages: Int,
    val price: Price,
    @AvroFixed(13) val isbn: String
)
In this model:
CurrencyType is an enum class representing the currency type for the book price. It includes USD, EUR, and NOK but can be extended if needed.
Price is a data class that holds the amount and currency type of the book’s price.
Author is a data class that contains the first and last name of the book’s author.
Book is a data class that includes details such as the title, author, publication year, number of pages, price, and ISBN of the book.
With the model in place, we can add the @Serializable annotation to every class, as in the following example.
@Serializable
data class Book
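For the two classes that need no custom serialiser, the annotated declarations could look like the sketch below. It assumes the kotlinx.serialization annotations are on the classpath and that the compiler plugin from the Maven setup is active; the classes are annotated here simply to follow the "every class" rule above.

```kotlin
import kotlinx.serialization.Serializable

// Annotated for consistency with the rest of the model.
@Serializable
enum class CurrencyType {
    USD, EUR, NOK
}

// All properties are Strings, so no custom serialiser is required here.
@Serializable
data class Author(
    val firstName: String,
    val lastName: String
)
```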
Custom Serializers for Unsupported Types in Kotlinx Serialization
When working with Kotlinx serialisation, you might encounter types that do not have default serialisers provided by the library. In our book model, the Year and BigDecimal classes fall into this category. To handle these types, we need to implement custom serialisers. Here is how you can do it.
Every custom serialiser must implement the KSerializer interface, providing custom logic for the descriptor property as well as the deserialize and serialize functions. Let’s do it for the Year type.
class YearSerializer : KSerializer<Year> {

    override val descriptor: SerialDescriptor
        get() = PrimitiveSerialDescriptor("YearSerializer", PrimitiveKind.STRING)

    override fun deserialize(decoder: Decoder): Year {
        return Year.parse(decoder.decodeString())
    }

    override fun serialize(encoder: Encoder, value: Year) {
        encoder.encodeString(value.toString())
    }
}
We start by specifying the descriptor: SerialDescriptor value, which describes the serialised form as a string. The schema generator needs it to assign the right Avro type to the value. The deserialize method converts a string back into a Year object using Year.parse. Conversely, the serialize method transforms a Year object into its string representation. This custom serialiser ensures that Year values are properly converted to and from their string forms during serialisation and deserialisation.
Similarly, we can implement the serialiser for the BigDecimal type.
class BigDecimalSerializer : KSerializer<BigDecimal> {

    override val descriptor: SerialDescriptor
        get() = PrimitiveSerialDescriptor("BigDecimal", PrimitiveKind.STRING)

    override fun deserialize(decoder: Decoder): BigDecimal {
        return BigDecimal(decoder.decodeString())
    }

    override fun serialize(encoder: Encoder, value: BigDecimal) {
        encoder.encodeString(value.toString())
    }
}
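Both serialisers rely on the string form being lossless. The sketch below exercises the same conversions using only the JDK, without any encoder or decoder. The helper names roundTripYear and roundTripDecimal are hypothetical and exist only for this illustration.

```kotlin
import java.math.BigDecimal
import java.time.Year

// Mirrors YearSerializer: toString on the way out, Year.parse on the way in.
fun roundTripYear(value: Year): Year = Year.parse(value.toString())

// Mirrors BigDecimalSerializer: toString on the way out, the String constructor on the way in.
fun roundTripDecimal(value: BigDecimal): BigDecimal = BigDecimal(value.toString())

fun main() {
    val year = Year.of(2024)
    val amount = BigDecimal("12.50")

    // BigDecimal.equals also compares the scale, so this confirms that "12.50" does not become "12.5".
    check(roundTripYear(year) == year)
    check(roundTripDecimal(amount) == amount)

    println("${roundTripYear(year)} ${roundTripDecimal(amount)}") // prints "2024 12.50"
}
```

Keeping the scale matters for prices, which is one reason to prefer a string over a floating-point representation here.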
Updating the Model Classes
With the custom serialisers implemented, the next step is to update the respective fields in our model classes to use them. To do so, annotate the relevant members of the Book and Price classes.
Let’s start with the Book class.
@Serializable
data class Book(
    val title: String,
    val author: Author,
    @Serializable(with = YearSerializer::class) val publicationYear: Year,
    val numberOfPages: Int,
    val price: Price,
    @AvroFixed(13) val isbn: String // Maps to an Avro fixed type of size 13.
)
And then move to the Price class.
@Serializable
data class Price(
    @Serializable(with = BigDecimalSerializer::class) var amount: BigDecimal,
    var currency: CurrencyType
)
The @Serializable annotation can also be applied to individual fields, where it lets you specify a custom serialiser for that field.
Writing a Test to Generate and Validate the Avro Schema
With our model and custom serializers set up, we are ready to write a test to generate the Avro schema from the Book class and validate it against the expected schema. This process ensures that the schema generated by the avro4k library matches our predefined expectations.
Step 1: Define the Expected Schema
First, we define the expected Avro schema as a JSON string. This schema should reflect the structure of our Book data class, including nested fields and custom serializers.
val expectedSchema = """
    {
      "type" : "record",
      "name" : "Book",
      "namespace" : "com.lukaszpalt",
      "fields" : [ {
        "name" : "title",
        "type" : "string"
      }, {
        "name" : "author",
        "type" : {
          "type" : "record",
          "name" : "Author",
          "fields" : [ {
            "name" : "firstName",
            "type" : "string"
          }, {
            "name" : "lastName",
            "type" : "string"
          } ]
        }
      }, {
        "name" : "publicationYear",
        "type" : "string"
      }, {
        "name" : "numberOfPages",
        "type" : "int"
      }, {
        "name" : "price",
        "type" : {
          "type" : "record",
          "name" : "Price",
          "fields" : [ {
            "name" : "amount",
            "type" : "string"
          }, {
            "name" : "currency",
            "type" : {
              "type" : "enum",
              "name" : "CurrencyType",
              "symbols" : [ "USD", "EUR", "NOK" ]
            }
          } ]
        }
      }, {
        "name" : "isbn",
        "type" : {
          "type" : "fixed",
          "name" : "isbn",
          "size" : 13
        }
      } ]
    }
""".trimIndent()
Step 2: Generate the Schema from the Model
Next, we generate the actual Avro schema from the Book class using the avro4k library.
val actualSchema = Avro
    .default
    .schema(Book.serializer())
    .toString(true)
This code uses the Avro.default.schema method to generate the schema and converts it to a pretty-printed JSON string for easier comparison.
Step 3: Assert the Schemas Match
Finally, we assert that the generated schema matches the expected schema.
assertEquals(expectedSchema, actualSchema)
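Comparing whole JSON strings is strict about whitespace. As a complementary sketch, the test below checks the structure of the generated schema instead. It assumes JUnit 5 and avro4k 1.x are on the test classpath and that the Book class from this tutorial lives in the same package. The class name BookSchemaTest and the test method name are hypothetical.

```kotlin
import com.github.avrokotlin.avro4k.Avro
import org.apache.avro.Schema
import org.junit.jupiter.api.Assertions.assertEquals
import org.junit.jupiter.api.Test

class BookSchemaTest {

    @Test
    fun `generated schema has the expected structure`() {
        // Book is the model class defined earlier in this tutorial.
        val schema: Schema = Avro.default.schema(Book.serializer())

        // The top-level type is a record named after the class.
        assertEquals(Schema.Type.RECORD, schema.type)
        assertEquals("Book", schema.name)

        // Fields appear in declaration order.
        assertEquals(
            listOf("title", "author", "publicationYear", "numberOfPages", "price", "isbn"),
            schema.fields.map { it.name() }
        )

        // The custom serialisers map Year to string, and @AvroFixed(13) yields a fixed type.
        assertEquals(Schema.Type.STRING, schema.getField("publicationYear").schema().type)
        assertEquals(Schema.Type.FIXED, schema.getField("isbn").schema().type)
        assertEquals(13, schema.getField("isbn").schema().fixedSize)
    }
}
```

A structural check like this tolerates formatting changes in the pretty-printer, while the full string comparison catches any unexpected change in the schema.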
Summary
Generating Avro schemas directly from Kotlin data classes is straightforward with tools like avro4k and kotlinx.serialization. By setting up a Maven project and configuring the Kotlin Maven Plugin, you can generate Avro schemas from your Kotlin classes as part of a regular build. This approach simplifies integration, especially when you are the sole producer for a given topic or you define a model for a given domain.
The avro4k library is quite powerful and allows more than I demonstrated in this tutorial. You may be particularly interested in the following sections of its documentation.
You can find the complete working code for this tutorial in my GitHub repository.