Why We Chose Kalasim: Our Journey Finding the Right Simulation Engine
When we started building digital twins for AIMatrix, we knew we needed a robust discrete event simulation engine. What we didn’t know was how hard it would be to find the right one.
After evaluating six different simulation frameworks over three months, we settled on Kalasim. Here’s the story of why, including the dead ends, surprises, and lessons learned along the way.
The Digital Twin Challenge
Our digital twins needed to simulate complex real-world processes: manufacturing lines, supply chains, customer service flows. These aren’t simple systems - they have:
- Multiple interacting entities (machines, people, orders, inventory)
- Stochastic events (equipment failures, demand spikes)
- Resource constraints (limited capacity, shared equipment)
- Complex scheduling and prioritization logic
We needed a simulation engine that could handle this complexity while integrating with our Kotlin-based AMX Engine.
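To make those requirements concrete, here is a minimal sketch of a discrete event loop in plain Kotlin (standard library only, not Kalasim's API). It models one capacity-constrained machine fed by randomly arriving orders. The names `Event`, `Simulation`, and `runFactory`, and the mean times of 5 and 4 time units, are assumptions chosen for illustration.

```kotlin
import java.util.PriorityQueue
import kotlin.math.ln
import kotlin.random.Random

// One scheduled occurrence: the simulated time it fires at and what it does.
class Event(val time: Double, val action: () -> Unit)

// Minimal event loop: always run the earliest pending event next.
class Simulation {
    private val queue = PriorityQueue<Event>(compareBy<Event> { it.time })
    var now = 0.0
        private set

    fun schedule(delay: Double, action: () -> Unit) {
        queue.add(Event(now + delay, action))
    }

    fun run(until: Double) {
        while (true) {
            val next = queue.peek() ?: break
            if (next.time > until) break
            queue.poll()
            now = next.time
            next.action()
        }
    }
}

// Single machine (capacity 1) fed by randomly arriving orders.
// Returns the number of orders completed within the horizon.
fun runFactory(seed: Int, horizon: Double): Int {
    val rng = Random(seed)
    val sim = Simulation()
    var waiting = 0      // orders queued for the machine
    var busy = false     // resource constraint: one order at a time
    var completed = 0

    // Exponential sample with the given mean (inverse-transform method).
    fun exp(mean: Double) = -mean * ln(1.0 - rng.nextDouble())

    fun startNext() {
        if (busy || waiting == 0) return
        waiting--
        busy = true
        sim.schedule(exp(4.0)) {   // stochastic processing time
            busy = false
            completed++
            startNext()
        }
    }

    fun arrive() {
        waiting++
        startNext()
        sim.schedule(exp(5.0)) { arrive() }   // stochastic next arrival
    }

    sim.schedule(0.0) { arrive() }
    sim.run(horizon)
    return completed
}
```

Calling `runFactory(seed = 1, horizon = 1000.0)` twice returns the same count because the random generator is seeded. A real engine adds priorities, multi-capacity resources, and statistics on top of this core loop.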
The Evaluation Journey
Initial Requirements:
- Discrete event simulation capabilities
- Good performance (millions of events)
- Integration with existing Kotlin/JVM ecosystem
- Active community and documentation
- Extensibility for custom behaviors
The Contenders:
- SimPy (Python) - Popular, well-documented
- SUMO (C++) - High-performance, traffic-focused
- MASON (Java) - Academic pedigree, agent-based
- Kalasim (Kotlin) - New, promising, JVM-native
- Arena/AnyLogic - Commercial solutions
- Custom-built - Roll our own
Why SimPy Didn’t Work for Us
SimPy was our first choice. It’s mature, has great documentation, and a large community, and our initial prototype looked promising.
The problems:
- Language barriers: Our team was primarily JVM-based. Context switching between Python and Kotlin created friction.
- Integration complexity: Getting Python simulation results back into our Kotlin agents required complex serialization.
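- Performance concerns: For large-scale simulations, Python’s interpreter overhead slowed the event loop, and the GIL kept us from running simulation replications on parallel threads within one process.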
- Deployment overhead: Managing Python dependencies in our Kubernetes-based infrastructure added complexity.
The MASON Experiment
MASON seemed perfect - Java-based, agent-oriented, from George Mason University with solid academic backing.
Why it didn’t stick:
- Steep learning curve: MASON’s agent-based approach required rethinking our simulation model design.
- Limited discrete event features: Primarily designed for agent-based modeling, not the process-oriented simulations we needed.
- Documentation gaps: The documentation is academic in tone, with few practical examples for business process simulation.
- Integration friction: Java interop with Kotlin worked, but wasn’t as seamless as pure Kotlin.
Enter Kalasim
Kalasim caught our attention because it was built specifically for discrete event simulation in Kotlin, and its syntax looked clean.
What attracted us:
- Native Kotlin: Perfect language alignment with our existing codebase
- Sequence-based processes: Intuitive way to model complex workflows
- Resource management: Built-in support for constrained resources
- Statistics collection: Automatic collection of simulation metrics
- Coroutine integration: Builds on Kotlin’s coroutine support
The Integration Reality
Getting Kalasim working in production wasn’t smooth sailing. Here are the challenges we hit:
Performance Tuning
Our first large-scale simulation (simulating a week of operations for a 50-machine factory) was painfully slow.
Optimization strategies that worked:
- Event batching: Group similar events together
- Selective monitoring: Only track metrics we actually need
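Both strategies can be sketched in a few lines of plain Kotlin. This is an illustration under assumptions rather than Kalasim's API: `SensorEvent`, `batchByTime`, and `Metrics` are hypothetical names, and grouping by timestamp is just one way to define "similar" events.

```kotlin
// A hypothetical event type carrying a timestamp and a reading.
data class SensorEvent(val time: Long, val machineId: Int, val value: Double)

// Event batching: group events that share a timestamp so each batch is
// handled in one pass instead of one scheduler operation per event.
fun batchByTime(events: List<SensorEvent>): List<Pair<Long, List<SensorEvent>>> =
    events.groupBy { it.time }
        .toSortedMap()
        .map { (time, batch) -> time to batch }

// Selective monitoring: only metrics named in `enabled` are recorded;
// everything else is dropped before it costs memory or time.
class Metrics(private val enabled: Set<String>) {
    private val sums = mutableMapOf<String, Double>()

    fun record(name: String, value: Double) {
        if (name in enabled) sums[name] = (sums[name] ?: 0.0) + value
    }

    fun total(name: String): Double? = sums[name]
}
```

With `Metrics(setOf("throughput"))`, a call to `record("queueLength", 5.0)` is silently ignored, so disabled metrics never accumulate state.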
Memory Management
Long-running simulations led to memory accumulation. Kalasim’s event scheduling could pile up events in memory.
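One general way to keep an event queue small is to schedule lazily: each event schedules its successor when it fires, instead of every event being queued up front. The sketch below compares the peak queue size of both approaches in plain Kotlin. The function names are hypothetical, and this illustrates the general technique rather than a Kalasim-specific fix.

```kotlin
import java.util.PriorityQueue

// Eager: every arrival for the whole horizon is queued up front,
// so the queue holds `count` entries at its peak.
fun peakQueueEager(count: Int, interval: Double): Int {
    val queue = PriorityQueue<Double>()
    for (i in 0 until count) queue.add(i * interval)
    val peak = queue.size
    while (queue.isNotEmpty()) queue.poll()
    return peak
}

// Lazy: each arrival schedules its successor when it fires,
// so at most one arrival is pending at any time.
fun peakQueueLazy(count: Int, interval: Double): Int {
    val queue = PriorityQueue<Double>()
    queue.add(0.0)
    var fired = 0
    var peak = queue.size
    while (queue.isNotEmpty()) {
        val now = queue.poll()
        fired++
        if (fired < count) queue.add(now + interval)
        peak = maxOf(peak, queue.size)
    }
    return peak
}
```

For 10,000 arrivals the eager version peaks at 10,000 queued entries while the lazy version never holds more than one.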
Custom Distribution Support
Kalasim’s built-in distributions didn’t cover all our use cases. We needed custom failure distributions based on real equipment data.
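A common way to build a distribution from observed data is inverse-transform sampling over the sorted observations. The sketch below is plain Kotlin with a hypothetical `EmpiricalDistribution` class. It does not plug into Kalasim's distribution types, and the sample values in the usage note are made up for illustration.

```kotlin
import kotlin.random.Random

// Samples from the empirical distribution of observed values by
// inverse-transform sampling, with linear interpolation between
// neighbouring order statistics.
class EmpiricalDistribution(observations: List<Double>) {
    private val sorted = observations.sorted()

    init {
        require(sorted.size >= 2) { "need at least two observations" }
    }

    // Quantile for p in [0, 1].
    fun quantile(p: Double): Double {
        val pos = p * (sorted.size - 1)
        val lower = pos.toInt().coerceAtMost(sorted.size - 2)
        val frac = pos - lower
        return sorted[lower] + frac * (sorted[lower + 1] - sorted[lower])
    }

    // Draw one value, e.g. hours until the next failure.
    fun sample(rng: Random): Double = quantile(rng.nextDouble())
}
```

Given observed times between failures of 100, 300, and 200 hours, `quantile(0.5)` returns the median of 200.0 and every sample falls between 100 and 300. Interpolation never produces values outside the observed range, so tail behaviour needs a fitted parametric model instead.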
Integration with AMX Engine
The biggest win was integrating Kalasim simulations directly with our AI agents.
This allowed our agents to run quick “what-if” simulations to inform their decisions.
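The shape of such a what-if query can be sketched without any simulation library. The example below assumes a single FIFO station with a mean interarrival time of 5 minutes and uses the Lindley recursion to estimate average waiting time. `Scenario`, `averageWait`, and `bestScenario` are hypothetical names, not part of AMX Engine or Kalasim.

```kotlin
import kotlin.math.ln
import kotlin.math.max
import kotlin.random.Random

// A candidate configuration the agent wants to evaluate (hypothetical fields).
data class Scenario(val name: String, val meanServiceMinutes: Double)

// Average queueing delay for a single FIFO station, using the Lindley
// recursion: wait[n+1] = max(0, wait[n] + service[n] - interarrival[n+1]).
fun averageWait(scenario: Scenario, seed: Int, orders: Int = 10_000): Double {
    val rng = Random(seed)
    fun exp(mean: Double) = -mean * ln(1.0 - rng.nextDouble())
    var wait = 0.0
    var total = 0.0
    repeat(orders) {
        total += wait
        val service = exp(scenario.meanServiceMinutes)
        val interarrival = exp(5.0)   // assumed mean of 5 minutes between orders
        wait = max(0.0, wait + service - interarrival)
    }
    return total / orders
}

// Run every scenario with the same seed (common random numbers) and
// pick the one with the lowest average wait.
fun bestScenario(scenarios: List<Scenario>, seed: Int): Scenario =
    scenarios.minByOrNull { averageWait(it, seed) } ?: error("no scenarios")
```

Reusing one seed across scenarios means each candidate sees the same random draws, so differences in the result come from the configuration rather than from sampling noise.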
Current Limitations and Workarounds
What we’re still struggling with:
- Debugging complex models: When simulations produce unexpected results, tracing the cause through thousands of events is challenging.
- Visualization: Kalasim has limited built-in visualization. We built custom dashboards for monitoring simulation runs.
- Model validation: Ensuring our simulations accurately represent reality requires constant calibration against real-world data.
- Documentation gaps: As a newer framework, some advanced use cases lack documentation. We’ve contributed back several examples.
Our workarounds:
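For the debugging problem, one generic approach is a bounded trace buffer that keeps only the most recent events, so the run-up to an anomaly can be inspected without logging every event of a long run. This is a minimal sketch in plain Kotlin; `TraceBuffer` and its methods are hypothetical names, not a Kalasim feature.

```kotlin
// Keeps only the most recent `capacity` trace entries so long runs
// do not accumulate an unbounded event log.
class TraceBuffer(private val capacity: Int) {
    private val entries = ArrayDeque<String>()

    init {
        require(capacity > 0) { "capacity must be positive" }
    }

    fun record(time: Double, message: String) {
        if (entries.size == capacity) entries.removeFirst()   // evict the oldest
        entries.addLast("t=$time $message")
    }

    // Most recent entries, oldest first, optionally filtered by a keyword.
    fun recent(keyword: String = ""): List<String> =
        entries.filter { keyword in it }
}
```

Filtering with something like `recent("machine-3")` narrows the retained window to a single entity, which is usually the first step when tracing an unexpected result.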
Would We Choose Kalasim Again?
Yes, with caveats. For our specific use case - discrete event simulation integrated with Kotlin-based AI systems - Kalasim was the right choice. The language alignment and seamless integration with our existing code made it worth the rough edges.
Choose Kalasim if:
- You’re already in the JVM ecosystem
- You need discrete event simulation (not agent-based modeling)
- You can invest time in learning a newer framework
- Integration with your existing codebase is important
Look elsewhere if:
- You need battle-tested stability (SimPy is more mature)
- You want extensive built-in visualization
- You’re doing primarily agent-based modeling
- You need commercial support
What’s Next
We’re working on:
- Better debugging tools: Custom visualization and tracing capabilities
- Model validation frameworks: Automated comparison against real-world data
- Performance optimization: Exploring parallel execution for large simulations
- Contributing back: Sharing our extensions and improvements with the Kalasim community
Building digital twins with simulation engines is complex, regardless of which tool you choose. Kalasim gave us the foundation we needed, but success required significant engineering investment to make it production-ready.
The payoff has been worth it - our agents can now predict outcomes, identify bottlenecks, and recommend optimizations based on sophisticated simulation models. That capability is transforming how our customers use AIMatrix for process optimization.