Beyond Neptune API compatibility and feature parity, we offer much more.
We believe in the following principles:
Performance is extremely important now, and even more so for future systems. Much of the work will be done by AI agents, and they will only be as fast as the systems they interact with.
Our motto is: If it's slow, it's a bug!
We built Minfx because, even though machine learning is a field more than a decade old, the tooling is still not good enough. We were frustrated with the existing providers, so we decided to build best-in-class tooling for ML scientists and engineers ourselves.
Click any card to learn more
We are extremely fast while ensuring high reliability. Reliability is so important we have a dedicated page about it.
Our architecture is designed from the ground up for performance—async logging that never blocks your training, efficient binary protocols, and smart batching.
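As an illustration of the non-blocking design (a minimal sketch with illustrative names, not our actual client), a logger can enqueue in O(1) and let a background thread batch and ship records:

```python
import queue
import threading

class AsyncLogger:
    """Sketch of non-blocking logging: the training loop only enqueues;
    a background thread batches records and ships them off the hot path."""

    def __init__(self):
        self._q = queue.Queue()
        self.shipped = []  # stands in for the network send
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def log(self, name, value):
        # O(1) and never blocks on I/O
        self._q.put((name, value))

    def _drain(self):
        while True:
            item = self._q.get()
            if item is None:
                break
            batch = [item]
            # smart batching: grab whatever else is already queued
            while not self._q.empty():
                nxt = self._q.get()
                if nxt is None:
                    self._q.put(None)  # keep the shutdown signal
                    break
                batch.append(nxt)
            self.shipped.extend(batch)  # one "network call" per batch

    def close(self):
        self._q.put(None)
        self._worker.join()
```

The training loop pays only the cost of a queue put; batching amortizes the per-request overhead of the actual transport.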
We obsess over every microsecond because we know time is valuable. Our background in algo-trading helps us engineer efficient solutions.
Our service is highly reliable. We log data concurrently to multiple storage backends in different locations, so you can be sure your data is never lost.
We will publish latency benchmarks soon.
While you can get a quick overview of your runs in the UI, we get out of the way so you can analyze your data locally.
You can use your favorite tools (pandas, Polars, etc.) for analysis, and bulk-download exactly the data you need, with high bandwidth and low latency.
We plan to write a custom Polars extension to make fetching data straightforward, as if you had terabytes of data at your fingertips.
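To illustrate the intended workflow (the table shape and column names here are hypothetical, fabricated inline in place of a real bulk download), local analysis with pandas is just ordinary dataframe work:

```python
import pandas as pd

# Assume run metrics were bulk-downloaded into a long-format table
# (run_id, step, loss). A tiny stand-in frame for illustration:
df = pd.DataFrame({
    "run_id": ["a", "a", "b", "b"],
    "step":   [1, 2, 1, 2],
    "loss":   [0.9, 0.7, 1.1, 0.8],
})

# Typical local analysis: best loss per run, then the overall winner.
best_per_run = df.groupby("run_id")["loss"].min()
winner = best_per_run.idxmin()
```

The same frame drops directly into Polars, matplotlib, or a notebook, with no vendor-specific query language in between.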
Anything you see on the screen, your teammates can see too. Just send them a link.
We truly mean everything, reproduced pixel-perfect.
Find ways to save money. We help you pick the right instance and batch size to find the Pareto frontier of performance vs. cost.
Compare training efficiency across different GPU configurations and make data-driven decisions about your infrastructure spending.
Ask for insights about your metrics, runs, etc. in natural language.
We compile appropriate programs that fetch and analyze data for you.
It's like having an AI assistant that understands your ML workflow and can help you explore your experiment data conversationally.
Detailed tracking: data versions, histograms of features, samples of data.
Peek into samples of inputs and outputs to verify your model behaves as expected.
Catch data drift early and understand exactly what data went into each training run.
We help you set up straightforward one-click deployment of your ML models, and monitor their performance.
Go from experiment to production in seconds. Select your best run, click deploy, and we update your endpoint configuration accordingly.
Monitor your deployed models with the same familiar interface you use for experiments. Track inference latency, throughput, and model drift in real-time.
Specify what you'd like to sweep straight from the UI, with no hassle.
Define parameter ranges, choose your search strategy (grid/sampling/Bayesian/Optuna/Hyperband/etc.), and let us handle the orchestration.
We'll help you find what to sweep to get the best performance—our built-in analysis identifies which hyperparameters actually matter for your model.
The search can be automatically stopped early when validation loss stops improving.
Visualize the sweep progress with parallel coordinates plots and identify the optimal configuration faster.
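As a sketch of what the orchestration does under the hood (simplified here to random search with early stopping; the function and parameter names are illustrative, not our API):

```python
import random

def run_sweep(objective, space, budget, patience=3, seed=0):
    """Random-search sweep sketch: sample configs from `space`, track the
    best, and stop early after `patience` consecutive non-improvements."""
    rng = random.Random(seed)
    best, best_cfg, stale = float("inf"), None, 0
    for _ in range(budget):
        cfg = {name: rng.choice(values) for name, values in space.items()}
        loss = objective(cfg)
        if loss < best:
            best, best_cfg, stale = loss, cfg, 0
        else:
            stale += 1
            if stale >= patience:
                break  # validation loss stopped improving
    return best_cfg, best
```

Grid, Bayesian, and Hyperband strategies slot into the same loop; only the way the next `cfg` is chosen changes.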
Detailed tracking of GPU performance via NVIDIA's DCGM.
Monitor GPU utilization, memory bandwidth, thermal throttling, and more.
Identify bottlenecks in your training pipeline and ensure you're getting the most out of your hardware investment.
Multiple loggers per device for a single run. Track distributed training seamlessly.
Works across multiple GPUs, nodes, and even clusters.
Whether you're using DataParallel, DistributedDataParallel, or custom sharding strategies, we've got you covered.
Detailed monitoring: per-layer gradient norms, optimizer state, histograms of losses over time, and learning rate schedules.
When you encounter gradient spikes, we help you find the exact minibatches causing them so you can debug training instabilities faster.
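One simple way such spikes can be flagged (a sketch, not necessarily our detector) is comparing each minibatch's gradient norm against a short rolling baseline:

```python
def spike_batches(grad_norms, window=5, factor=3.0):
    """Return minibatch indices whose gradient norm exceeds `factor` times
    the mean of the previous `window` norms."""
    spikes = []
    for i in range(window, len(grad_norms)):
        baseline = sum(grad_norms[i - window:i]) / window
        if grad_norms[i] > factor * baseline:
            spikes.append(i)
    return spikes
```

Mapping a flagged index back to the exact minibatch is what turns "training blew up at step 40k" into a reproducible bug report.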
We track every float edge case (inf, NaN, +0, -0, signaling NaNs) and display each appropriately.
These edge cases matter in numerical computing. Spot numerical issues immediately in the UI.
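In Python terms, the buckets look like this (a sketch; a full tracker would also distinguish quiet from signaling NaNs, which requires inspecting the bit pattern):

```python
import math

def classify(x):
    """Bucket a float into the edge cases worth surfacing in a UI."""
    if math.isnan(x):
        return "nan"
    if math.isinf(x):
        return "+inf" if x > 0 else "-inf"
    if x == 0.0:
        # +0.0 == -0.0 in comparisons; the sign bit needs copysign
        return "+0" if math.copysign(1.0, x) > 0 else "-0"
    return "finite"
```

Note that `-0.0 == 0.0` is true in IEEE 754 arithmetic, which is exactly why tooling that only compares values silently loses the sign of zero.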
Nanosecond precision for timestamps. Never lose timing details.
Correlate events across distributed systems with confidence and measure micro-optimizations accurately.
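Python's own clock APIs show why integer nanoseconds help (standard-library example, independent of our client):

```python
import time

# Wall-clock stamps as integer nanoseconds since the epoch: no float
# rounding, so closely spaced events keep distinct, ordered timestamps.
t_wall = time.time_ns()

# For measuring durations, a monotonic clock at the same resolution
# is the right tool (immune to NTP adjustments).
t0 = time.perf_counter_ns()
checksum = sum(range(1_000))
elapsed_ns = time.perf_counter_ns() - t0
```

Storing timestamps as 64-bit integer nanoseconds avoids the precision loss of float seconds, where values near the current epoch resolve only to roughly a microsecond.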
Time series anomaly detection in your metrics and runs.
We search for things you might not have noticed.
Get alerts when something looks off—sudden changes in loss trends, unusual metric patterns, or unexpected behavior—before it becomes a bigger problem.
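A rolling z-score is one classic way to flag such points (a simplified sketch, not our production detector):

```python
import statistics

def anomalies(series, window=10, threshold=4.0):
    """Flag indices that deviate from the rolling mean of the preceding
    `window` points by more than `threshold` rolling standard deviations."""
    flagged = []
    for i in range(window, len(series)):
        prev = series[i - window:i]
        mu = statistics.fmean(prev)
        sd = statistics.stdev(prev)
        if sd > 0 and abs(series[i] - mu) > threshold * sd:
            flagged.append(i)
    return flagged
```

In practice the detector also has to tolerate trends (a steadily decreasing loss is not an anomaly), which is why the baseline is rolling rather than global.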
When you have a regression in performance or accuracy, we help you find when it started.
Bisect through your experiment history to identify the commit, configuration change, or data update that caused the regression.
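The bisection idea is the same as `git bisect` applied to run history (a sketch; `is_good` stands in for whatever check defines the regression):

```python
def find_regression(runs, is_good):
    """Given chronological `runs` where runs[0] is known good and runs[-1]
    is known bad, return the first bad run using O(log n) checks."""
    lo, hi = 0, len(runs) - 1  # invariant: runs[lo] good, runs[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(runs[mid]):
            lo = mid
        else:
            hi = mid
    return runs[hi]
```

With hundreds of runs between "it worked" and "it doesn't", the logarithmic number of checks is what makes re-evaluating candidates affordable.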
Plot pareto-frontiers, parallel coordinates plots, histograms over time, and more.
Understand complex relationships in your experiments at a glance with multi-dimensional visualizations.
Our logging keeps working through spot-instance preemptions and tracks them out of the box.
We track preemptions automatically, save state gracefully, and help you resume training seamlessly.
Take advantage of 60-90% cloud cost savings without worrying about losing your experiment data.
Use the web interface, or the native desktop app for enhanced performance. View your data even when offline.
The web interface works seamlessly across all browsers and devices, giving you full access to your experiments from anywhere.
For power users, our native desktop app on Linux/Mac/Windows provides a faster, more responsive experience with larger offline storage.
Both interfaces share the same features and data, so you can switch between them based on your workflow needs.
Continue logging experiments even without internet connectivity.
Train models in air-gapped environments, on the go, or in locations with unreliable internet without losing any data.
Our client intelligently buffers all logs locally and syncs them in the background once connectivity is available.
Access previously synced experiment data offline to review past runs, compare metrics, and prepare presentations without needing an active connection.
Is there something you'd like to see? Let us know!
View Migration Offer →