OpenVINO Go Bindings

We needed fast inference on Intel CPUs. The existing options didn't work for us.

The Problem

At Accretional, we're building AI agents that run in the cloud. These agents need to execute ML models quickly and efficiently, often processing hierarchical structures like code ASTs, multi-level documents, and nested data.

The standard approach is ONNX Runtime, and there's a decent Go library for it. But we run on Intel infrastructure, and OpenVINO consistently benchmarks faster on Intel CPUs. The problem: there were no production-quality Go bindings for OpenVINO.

I could have wrapped the existing C API, but that would mean maintaining Go code that fights against the underlying C++ semantics. Instead, I wrote a C++ wrapper that exposes an idiomatic C interface, then built Go bindings on top of that.

Building the Bridge

The architecture has three layers. At the bottom, a C++ wrapper that talks directly to OpenVINO's native API. This layer handles all the C++ complexity: exception handling, memory management, type conversions. It exposes a clean C interface that CGO can understand.

The middle layer is the internal CGO package. It translates between C types and Go types, manages memory ownership, and handles the calling conventions. This is the plumbing that makes everything work.

The top layer is the public Go API. This is what developers actually use. It's designed to feel like native Go: methods return errors, resources implement Close(), and the naming follows Go conventions.

core, err := openvino.NewCore()
if err != nil {
    log.Fatal(err)
}
defer core.Close()

model, err := core.ReadModel("model.onnx")
compiled, err := core.CompileModel(model, "CPU",
    openvino.PerformanceHint(openvino.PerformanceModeThroughput),
)

request, err := compiled.CreateInferRequest()
err = request.SetInputTensor("input", data, shape, openvino.DataTypeFloat32)
err = request.Infer()

output, err := request.GetOutputTensor("output")
result, err := output.GetDataAsFloat32()

What Made It Interesting

The technical challenge wasn't just wrapping an API. It was building features that the reference implementation doesn't have.

Async inference was the first priority. OpenVINO's C++ API supports non-blocking inference: start a request, do other work, wait for completion. The existing Go options didn't have this. I implemented StartAsync(), Wait(), WaitFor(), and context-aware variants that integrate with Go's cancellation patterns.

State management came next. For models with recurrent structures, you need to pass state between inference calls. I wrapped OpenVINO's VariableState API, which lets you query, get, set, and reset model states. This is critical for processing hierarchical data where context flows between levels.

Batch operations required understanding how OpenVINO handles tensor batching. The API lets you set multiple tensors for a single input port, but only if the model was compiled with that batch dimension. I built helpers that make this pattern safe and ergonomic.

Memory management was about control. Production systems need to release memory explicitly, pre-allocate output tensors to avoid copies, and understand exactly when allocations happen. The API exposes these controls.

The Comparison

I wrote detailed benchmarks comparing this to onnxruntime_go. The results confirmed what we expected: OpenVINO is faster on Intel CPUs, and the native async support provides better throughput than simulating concurrency with goroutines.

For hierarchical data processing specifically, the combination of async inference, state management, and batch operations provides patterns that the ONNX Runtime bindings simply don't support.

Open Source

The library is open source under Accretional's GitHub organization. It's designed for production use: comprehensive tests, CI pipeline, documentation, and examples. If you're running Go workloads on Intel hardware and need ML inference, this is a viable option.

The real test was using it ourselves. We deploy this in production, processing real traffic, handling real models. That usage drives the development: every feature exists because we needed it.