Apache Arrow Rust: arrow-avro

Major contributor to arrow-avro (Arrow and Avro conversion) and author of the public launch announcement.

Announcement

Highlights

Implemented Arrow-to-Avro conversion paths that preserve schema fidelity for nested data.
Improved developer ergonomics with clearer APIs and examples for integration workflows.
Authored launch messaging and technical documentation for the public release.

Screenshots

Schema conversion and serialization flow.

What it is

arrow-avro (imported as arrow_avro) is Apache Arrow Rust’s official Arrow-native Avro bridge. It converts between Apache Avro and Apache Arrow by decoding and encoding column by column and moving data in Arrow RecordBatches (batch in, batch out), instead of materializing per-row Avro values and rebuilding columns afterward.
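To make the column-by-column idea concrete, here is a minimal, standard-library-only sketch. It is not arrow-avro's implementation or API: the two-field record (`id: long`, `name: string`), the `Columns` struct, and `decode_batch` are all hypothetical names chosen for illustration. It relies only on the Avro binary encoding rules that a `long` is a zigzag varint and a `string` is a `long` length followed by UTF-8 bytes, and it appends each decoded field straight into a column buffer (with Arrow-style string offsets) rather than building a per-row value first.

```rust
/// Read one Avro `long` (zigzag-encoded varint) from `buf`, advancing `*pos`.
fn read_long(buf: &[u8], pos: &mut usize) -> Option<i64> {
    let mut value: u64 = 0;
    let mut shift: u32 = 0;
    loop {
        let byte = *buf.get(*pos)?;
        *pos += 1;
        if shift >= 64 {
            return None; // varint too long for a 64-bit value
        }
        value |= u64::from(byte & 0x7F) << shift;
        if byte & 0x80 == 0 {
            break;
        }
        shift += 7;
    }
    // Undo the zigzag mapping: 0, 1, 2, 3, ... -> 0, -1, 1, -2, ...
    Some(((value >> 1) as i64) ^ -((value & 1) as i64))
}

/// Hypothetical column buffers for records shaped like { id: long, name: string }.
#[derive(Debug)]
struct Columns {
    ids: Vec<i64>,
    name_offsets: Vec<i32>, // Arrow-style offsets into `name_bytes`
    name_bytes: Vec<u8>,
}

/// Decode `rows` consecutive records directly into column buffers.
/// No intermediate per-row value is ever constructed.
fn decode_batch(buf: &[u8], rows: usize) -> Option<Columns> {
    let mut cols = Columns {
        ids: Vec::with_capacity(rows),
        name_offsets: vec![0],
        name_bytes: Vec::new(),
    };
    let mut pos = 0usize;
    for _ in 0..rows {
        // Field 1: id
        cols.ids.push(read_long(buf, &mut pos)?);
        // Field 2: name (length prefix, then raw bytes)
        let len = read_long(buf, &mut pos)?;
        if len < 0 {
            return None;
        }
        let end = pos.checked_add(len as usize)?;
        // A real decoder would also validate UTF-8 here.
        cols.name_bytes.extend_from_slice(buf.get(pos..end)?);
        pos = end;
        if cols.name_bytes.len() > i32::MAX as usize {
            return None;
        }
        cols.name_offsets.push(cols.name_bytes.len() as i32);
    }
    Some(cols)
}
```

The point of the sketch is the shape of the loop: each field's bytes go straight from the input into its column, so there is no second pass that turns rows into columns.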

It’s designed to cover both files and streaming / schema-registry pipelines:

OCF (Object Container Files) for file-based I/O (with optional block compression)
SOE (Single‑Object Encoding) plus Confluent and Apicurio Schema Registry wire formats for message streams
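The streaming formats above differ mainly in the small prefix placed before the Avro body. The following standard-library-only sketch shows the two prefixes whose layouts are fixed by their published formats: Avro single-object encoding (the two-byte marker `C3 01`, then an 8-byte little-endian CRC-64-AVRO schema fingerprint) and the Confluent wire format (magic byte `0x00`, then a 4-byte big-endian schema ID). The `Framing` enum and both function names are hypothetical and are not arrow-avro's API. Apicurio framing is left out because the width of its ID depends on how the registry client is configured.

```rust
/// Message prefix recognized from the leading bytes (hypothetical type).
#[derive(Debug, PartialEq)]
enum Framing<'a> {
    /// Avro single-object encoding: C3 01 + 8-byte LE schema fingerprint.
    SingleObject { fingerprint: u64, body: &'a [u8] },
    /// Confluent wire format: 0x00 + 4-byte BE schema ID.
    Confluent { schema_id: u32, body: &'a [u8] },
}

/// Split a single-object-encoded message into fingerprint and Avro body.
fn parse_single_object(msg: &[u8]) -> Option<Framing<'_>> {
    if msg.len() < 10 || msg[0] != 0xC3 || msg[1] != 0x01 {
        return None;
    }
    let mut fp = [0u8; 8];
    fp.copy_from_slice(&msg[2..10]);
    Some(Framing::SingleObject {
        fingerprint: u64::from_le_bytes(fp),
        body: &msg[10..],
    })
}

/// Split a Confluent-framed message into schema ID and Avro body.
fn parse_confluent(msg: &[u8]) -> Option<Framing<'_>> {
    if msg.len() < 5 || msg[0] != 0x00 {
        return None;
    }
    let mut id = [0u8; 4];
    id.copy_from_slice(&msg[1..5]);
    Some(Framing::Confluent {
        schema_id: u32::from_be_bytes(id),
        body: &msg[5..],
    })
}
```

In either case the extracted fingerprint or ID is only a lookup key: the decoder still needs the matching writer schema from a local store or a registry before it can interpret the body.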

The API is intentionally Arrow-first and minimal: tunable batch sizing, projection, schema resolution/evolution (reader vs. writer schemas), and optional StringViewArray support for faster string handling, so downstream compute stays vectorized end to end. See the official docs and the launch announcement.
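As a rough illustration of what reader vs. writer schema resolution involves for records, the sketch below follows the name-matching rule from the Avro specification: writer fields the reader does not ask for are skipped, and reader fields the writer never wrote must come from a default. It is a simplified, standard-library-only model, not arrow-avro's implementation. The `Step` enum and `plan` function are hypothetical, and aliases, type promotion, and the error raised when a missing field has no default are all ignored.

```rust
/// What to do with each writer field, in writer order (hypothetical type).
#[derive(Debug, PartialEq)]
enum Step {
    /// Decode the field into the reader column at this index.
    Decode { reader_index: usize },
    /// The reader did not request this field: skip over its bytes.
    Skip,
}

/// Build a per-field decode plan by matching field names.
/// Returns the plan plus the reader column indices that need default values.
fn plan(writer: &[&str], reader: &[&str]) -> (Vec<Step>, Vec<usize>) {
    let mut steps = Vec::with_capacity(writer.len());
    for w in writer {
        match reader.iter().position(|r| r == w) {
            Some(i) => steps.push(Step::Decode { reader_index: i }),
            None => steps.push(Step::Skip),
        }
    }
    let mut defaults = Vec::new();
    for (i, r) in reader.iter().enumerate() {
        if !writer.contains(r) {
            defaults.push(i);
        }
    }
    (steps, defaults)
}
```

Projection falls out of the same plan: a reader schema that lists only a few fields turns every other writer field into a `Skip`, so those columns are never built.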

What I contributed

Played a major role in taking arrow-avro to a production-ready release within the Arrow Rust ecosystem, focusing on correctness, performance, and an ergonomic RecordBatch-first API.
Helped drive feature completeness across real-world Avro pipelines: OCF ingestion/egress, streaming decoders/encoders for SOE and schema-registry framing (Confluent + Apicurio), and schema evolution via reader/writer schema resolution.
Authored the public launch announcement and wrote the bulk of the official crate documentation (runnable quickstarts, streaming examples, and “which API should I use?” guidance).

Outcome / impact

Made Avro and Arrow interchange significantly faster and more “Arrow-native” by aligning conversion with Arrow’s vectorized execution model (decode directly into Arrow builders; avoid per-row overhead).
Reduced integration friction for teams that use Avro on disk and on the wire, enabling a single, upstream crate for OCF files and Kafka-style schema-registry messages.
In the public launch benchmarks, the Arrow-first approach delivered order-of-magnitude improvements over a row-centric pipeline (up to ~33× faster reads with projection pushdown, and up to ~18× faster writes in the benchmarked cases).

Tech (high-level)

Rust · Apache Arrow (arrow-rs) · Apache Avro · Confluent & Apicurio Schema Registry · Schema resolution · Projection pushdown