Add per-instance data transfer metering #3685
Open
peterschmidt85 wants to merge 8 commits into `master` from …
…limits

Adds a configurable per-job outbound data transfer quota (AWS only) that terminates jobs when total external traffic exceeds the threshold. Metering uses iptables byte counters on the shim (host level), excluding private/VPC traffic. The shim notifies the runner via a new `/api/terminate` endpoint, so the server reads the termination reason through the existing `/api/pull` flow, following the same pattern as the log quota. Configured via `DSTACK_SERVER_DATA_TRANSFER_QUOTA_PER_JOB_AWS` (bytes, 0 = unlimited).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
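The quota semantics in this commit (private/VPC destinations excluded, 0 meaning unlimited, termination once the threshold is passed) can be sketched in Python as below. This is only an illustration: in the PR the counting is done by iptables on the host, and the helpers `is_external`, `read_quota` and `quota_exceeded` are hypothetical names. The excluded ranges are the ones listed in the PR summary; the environment variable name comes from the commit message, and treating an unset variable as 0 is an assumption.

```python
import ipaddress
import os

# Private/VPC ranges excluded from metering, as listed in the PR summary.
EXCLUDED_NETWORKS = tuple(
    ipaddress.IPv4Network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "169.254.0.0/16")
)

# Env var named in the commit message: quota in bytes, 0 = unlimited.
QUOTA_ENV = "DSTACK_SERVER_DATA_TRANSFER_QUOTA_PER_JOB_AWS"


def is_external(destination: str) -> bool:
    """Hypothetical helper: True if traffic to this IPv4 destination is metered."""
    addr = ipaddress.IPv4Address(destination)
    return not any(addr in net for net in EXCLUDED_NETWORKS)


def read_quota(environ=os.environ) -> int:
    """Quota in bytes; an unset or empty variable is assumed to mean 0 (unlimited)."""
    raw = environ.get(QUOTA_ENV, "").strip()
    quota = int(raw) if raw else 0
    if quota < 0:
        raise ValueError(f"{QUOTA_ENV} must be >= 0, got {quota}")
    return quota


def quota_exceeded(external_bytes: int, quota: int) -> bool:
    """Termination condition: a non-zero quota that external traffic has gone past."""
    return quota > 0 and external_bytes > quota
```

A later commit in this PR removes quota enforcement and keeps only the metering, so `quota_exceeded` reflects this intermediate design only.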
…uota

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…orting

Replace the per-job quota termination approach with per-instance passive metering. The shim starts an iptables-based netmeter at startup that continuously tracks outbound external bytes. The server reads this via the existing `/api/instance/health` endpoint during periodic health checks (~60s) and captures a final reading before instance termination.

Changes:
- Netmeter: per-instance chain (`dstack-nm`), no quota, exposes `Bytes()`
- Shim: starts netmeter at boot, reports via `InstanceHealthResponse`
- Server: stores `data_transfer_bytes` on `InstanceModel`, final read at termination
- Removed: quota enforcement, `/api/terminate` endpoint, `DATA_TRANSFER_QUOTA_EXCEEDED`

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
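The netmeter pattern described here (a counter refreshed in the background, with a cheap thread-safe `Bytes()` read) can be illustrated with a small Python sketch. The actual shim is written in Go; this class only mirrors the idea. `NetMeter` and its `read_counter` callable are hypothetical: `read_counter` stands in for reading the byte counter of the `dstack-nm` iptables chain, and the 10s default interval follows the polling period given in the PR summary.

```python
import threading
from typing import Callable


class NetMeter:
    """Caches the latest value of a cumulative byte counter, refreshed in the background."""

    def __init__(self, read_counter: Callable[[], int], interval: float = 10.0) -> None:
        # Hypothetical reader standing in for the iptables chain's byte counter.
        self._read_counter = read_counter
        self._interval = interval
        self._lock = threading.Lock()
        self._bytes = 0
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._poll()  # first reading synchronously, so bytes() is meaningful at once
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()

    def bytes(self) -> int:
        # Lock-guarded read, playing the role of the atomic Bytes() in the Go shim.
        with self._lock:
            return self._bytes

    def _poll(self) -> None:
        value = self._read_counter()
        with self._lock:
            self._bytes = value

    def _run(self) -> None:
        # Event.wait() returns True once stop() is called, which ends the loop.
        while not self._stop.wait(self._interval):
            try:
                self._poll()
            except Exception:
                # Keep the last good value if a single read fails.
                continue
```

Because the health endpoint only calls `bytes()`, a health request never waits on iptables. The request just returns the value from the most recent poll.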
…om SSH tunnel

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
7ff682d to 8ec5b15
r4victor (Collaborator) reviewed on Mar 26, 2026
```diff
- return self.health_response.dcgm is not None
+ return (
+     self.health_response.dcgm is not None
+     or self.health_response.data_transfer_bytes is not None
```
I don't think having `data_transfer_bytes` should mean there are health checks present – we wouldn't want to create health checks only because of `data_transfer_bytes`. Overall, having `data_transfer_bytes` as part of health checks is confusing.
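One possible way to address this comment is to keep the health-check predicate DCGM-only and extract the byte counter through a separate path that always runs on each health poll and once more before termination. The sketch below is hypothetical and is not what the PR does. `InstanceHealthResponse` and `data_transfer_bytes` are names from the PR. `InstanceRecord` stands in for `InstanceModel`, and `has_health_checks` and `record_data_transfer` are invented helper names.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class InstanceHealthResponse:
    # Only the two fields discussed in the review; other fields omitted.
    dcgm: Optional[Any] = None
    data_transfer_bytes: Optional[int] = None


@dataclass
class InstanceRecord:
    # Stand-in for InstanceModel; only the new column is modeled.
    data_transfer_bytes: Optional[int] = None


def has_health_checks(health: InstanceHealthResponse) -> bool:
    # The byte counter no longer decides whether a health check is recorded.
    return health.dcgm is not None


def record_data_transfer(
    instance: InstanceRecord, health: Optional[InstanceHealthResponse]
) -> None:
    # Meant to run on every health poll and once before termination;
    # keeps the previous value when the shim reports nothing.
    if health is not None and health.data_transfer_bytes is not None:
        instance.data_transfer_bytes = health.data_transfer_bytes
```

With this split, an instance without DCGM still gets its `data_transfer_bytes` updated, but no health-check row is created for it.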
Summary
Adds per-instance outbound data transfer metering to track billable network traffic.
- …`dstack-nm` chain) at boot that counts outbound bytes to external IPs, excluding private/VPC traffic (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 169.254.0.0/16)
- …`GET /api/instance/health` and stored on `InstanceModel.data_transfer_bytes`

Files changed
Shim:
- `netmeter/` package (iptables chain setup, 10s polling, atomic `Bytes()` read), started at shim boot in `main.go`, reported via `InstanceHealthResponse.data_transfer_bytes`

Server:
- `InstanceModel.data_transfer_bytes` column + Alembic migration, extraction in instance health check (both `pipeline_tasks` and `scheduled_tasks`), final read in termination path, `Instance` API model

Test plan
- …`Bytes()` read
- …`data_transfer_bytes` = 22.1 MB (includes apt-get, Docker pull overhead)

🤖 Generated with Claude Code